Abstract: Soil As Metagenomics' Greatest Challenge: The Great Prairie Project. ASA, CSSA and SSSA Annual Meetings (San Antonio, TX)

Poster Number 132

Division: S03 Soil Biology & Biochemistry
Session: Advanced Techniques for Assessing and Interpreting Microbial Community Function: II

Wednesday, October 19, 2011

Henry Gonzalez Convention Center, Hall C


James M. Tiedje¹, Adina Howe¹, Rachel Lamendella², Rachel Mackelprang², C. Titus Brown¹, Susannah G. Tringe³, Jordan Fish¹, Qiong Wang¹, James R. Cole¹, Patrick S. Chain⁴, Tijana Glavina del Rio³ and Janet K. Jansson⁵
(1) Michigan State University, East Lansing, MI
(2) Lawrence Berkeley National Lab, Berkeley, CA
(3) DOE Joint Genome Institute, Walnut Creek, CA
(4) Genome Sciences Group, Los Alamos National Lab, Los Alamos, NM
(5) Earth Sciences Division, LBNL, Berkeley, CA

Soil is the greatest challenge for metagenomics due to its enormous microbial and gene diversity. Hence, the DOE Joint Genome Institute (JGI) selected soil metagenomics as a Grand Challenge for testing metagenomic sequencing, assembly, and annotation. Samples were collected along a transect across the Midwest prairie from paired native prairie and long-term agricultural sites in Wisconsin, Iowa, and Kansas. Seven soil cores were collected from each of the six sampling sites within a 10 m area. 16S rRNA gene pyrosequencing analyses demonstrated that the native prairie and agricultural communities differed. Metagenome (shotgun) sequencing was also performed on DNA from the central core at each site using the 454 and Illumina (GAII and HiSeq) platforms. This yielded more than 1.5 terabases (Tb) of sequence, with 200-300 gigabases (Gb) per sample, the largest amount of sequence for any metagenomic project to date.

We used two analysis approaches. The first assembles as much of the data as possible into contigs of 1,000 bp or longer. Because this amount of data cannot be assembled by standard methods, we developed a novel pre-filtering approach that partitions the dataset into groups of reads that are likely to assemble together. Assembly of the Iowa corn (cultivated) Illumina sequence (176 Gb) on the Amazon cloud yielded 148,053 contigs longer than 1,000 bp. Based on k-mer abundances, we estimate a maximum coverage of 2-6x for the Iowa corn soil sequencing effort. Our partitioning assembly approach works well for large datasets, scales to commodity hardware, and has a freely available implementation.

The second approach uses all of the (short) reads to analyze targeted ecofunctional genes. Using nifH (a nitrogenase gene) as a prototype, we retrieved 1.1 million reads with a nifH HMM, of which 524 had nitrogenase as their best BLAST hit. This corresponds to one nifH gene per 370 microbes. We are also developing a nucleating assembly approach to recover more of the biogeochemically important genes.
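
The abstract does not spell out how the partitioning works. As an illustration only, the sketch below assumes that reads sharing at least one k-mer (directly, or through a chain of overlapping reads) belong in the same partition, and groups them with a union-find structure. K-mers are merged with their reverse complements so that reads from opposite strands connect. The names `partition_reads` and `canonical` are hypothetical, and a tool handling hundreds of gigabases would need a far more memory-efficient graph representation than the Python dictionary used here.

```python
from collections import defaultdict

_COMP = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMP)[::-1])


def partition_reads(reads: list[str], k: int) -> list[list[int]]:
    """Group read indices so that reads linked by shared k-mers share a partition."""
    parent = list(range(len(reads)))

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen: dict[str, int] = {}  # canonical k-mer -> first read containing it
    for idx, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            km = canonical(read[i:i + k])
            if km in first_seen:
                union(first_seen[km], idx)
            else:
                first_seen[km] = idx

    groups: dict[int, list[int]] = defaultdict(list)
    for idx in range(len(reads)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


print(partition_reads(["ACGTACGTGG", "GTGGCCATTA", "TTTTTTTTTT"], 4))
```

The first two reads share the k-mer `GTGG` and land together, while the poly-T read forms its own partition. In this sketch each partition could then be assembled independently, which is why partitioning reduces the memory any single assembly needs.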
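
The coverage estimate is described only as being based on k-mer abundances. One common way to obtain such an estimate, assumed here rather than taken from the project, is to count every k-mer, treat the median abundance of a read's k-mers as that read's k-mer coverage C_k, and convert it to base coverage with C = C_k * L / (L - k + 1) for read length L. The function names are hypothetical, reads are assumed to be at least k bases long, and for brevity k-mers are counted on the given strand only, whereas real k-mer counters usually merge each k-mer with its reverse complement.

```python
from collections import Counter
from statistics import median


def count_kmers(reads: list[str], k: int) -> Counter:
    """Count every k-mer across all reads (given strand only, for brevity)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def read_coverage(read: str, counts: Counter, k: int) -> float:
    """Estimate base coverage for one read from its median k-mer abundance."""
    abundances = [counts[read[i:i + k]] for i in range(len(read) - k + 1)]
    kmer_cov = median(abundances)          # k-mer coverage C_k
    n_kmers = len(read) - k + 1            # L - k + 1
    return kmer_cov * len(read) / n_kmers  # C = C_k * L / (L - k + 1)


def max_coverage(reads: list[str], k: int) -> float:
    """Highest per-read coverage estimate in the dataset."""
    counts = count_kmers(reads, k)
    return max(read_coverage(r, counts, k) for r in reads)


print(max_coverage(["ACGTAC"] * 3, 3))
```

Three identical 6 bp reads give each 3-mer an abundance of 3, which converts to a base coverage of 3 * 6 / 4 = 4.5x.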
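
The figure of one nifH gene per 370 microbes implies normalizing gene hits to the number of genomes sampled, but the abstract does not give the read length, gene length, or average genome size behind it. A minimal sketch of one common normalization follows, with every input hypothetical: convert matching reads to gene copies (matching bases divided by gene length), convert total sequence to genome equivalents (total bases divided by an assumed average genome size), and take the ratio.

```python
def gene_copies(hit_reads: int, read_len: float, gene_len: float) -> float:
    """Gene copies sequenced: bases in matching reads divided by gene length."""
    return hit_reads * read_len / gene_len


def genome_equivalents(total_bases: float, avg_genome_size: float) -> float:
    """Number of average-sized genomes represented by the sequence."""
    return total_bases / avg_genome_size


def genomes_per_gene(hit_reads: int, read_len: float, gene_len: float,
                     total_bases: float, avg_genome_size: float) -> float:
    """How many genomes were sampled for each gene copy detected."""
    return (genome_equivalents(total_bases, avg_genome_size)
            / gene_copies(hit_reads, read_len, gene_len))


# Illustrative inputs only; these are not the project's values.
print(genomes_per_gene(hit_reads=100, read_len=100, gene_len=1000,
                       total_bases=1e9, avg_genome_size=5e6))  # 20.0
```

Because both quantities are expressed in coverage units, read length and sequencing depth cancel out of the ratio, leaving genomes per gene copy.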


Abstract number: 355-1
