355-1 Soil as Metagenomics' Greatest Challenge: The Great Prairie Project



Wednesday, October 19, 2011
Henry Gonzalez Convention Center, Hall C, Street Level

James Tiedje1, Adina Howe1, Rachel Lamendella2, Rachel Mackelprang2, C. Titus Brown1, Susannah G. Tringe3, Jordan Fish1, Qiong Wang1, James R. Cole1, Patrick S. Chain4, Tijana Glavina del Rio3 and Janet K. Jansson5
(1)Michigan State University, East Lansing, MI
(2)Lawrence Berkeley National Lab, Berkeley, CA
(3)DOE Joint Genome Institute, Walnut Creek, CA
(4)Genome Sciences Group, Los Alamos National Lab, Los Alamos, NM
(5)Earth Sciences Division, LBNL, Berkeley, CA
Soil is the greatest challenge for metagenomics because of its enormous microbial and gene diversity. For this reason, JGI selected soil metagenomics as a Grand Challenge for testing metagenomic sequencing, assembly, and annotation. Samples were collected along a transect across the Midwestern prairie from paired native prairie and long-term agricultural sites in Wisconsin, Iowa, and Kansas. Seven soil cores were collected from a 10 m area at each of the six sampling sites. Pyrosequencing of 16S rRNA genes showed that the native prairie and agricultural communities differed. Metagenome (shotgun) sequencing was also performed on DNA from the central core at each site using the 454 and Illumina (GAII and HiSeq) platforms. This yielded more than 1.5 terabases of sequence, 200-300 gigabases per sample, the largest amount of sequence for any metagenomic project to date. We used two analysis approaches. The first is to assemble as much of the data as possible into contigs of 1000 bp or longer. Because this amount of data cannot be assembled by standard methods, we developed a novel pre-filtering approach that partitions the dataset into groups of reads that are likely to assemble together. Assembly of the Illumina sequence (176 Gb) from the cultivated Iowa corn soil, using Amazon cloud resources, produced 148,053 contigs (>1000 bp). Based on k-mer abundances, we estimate 2-6x maximum coverage for the sequencing effort on the Iowa corn soil. Our partitioning assembly approach works well for large datasets, scales to commodity hardware, and has a freely available implementation. The second approach uses all (short) reads to analyze targeted ecofunctional genes. Using the nitrogenase gene nifH as a prototype, we retrieved 1.1 million reads with a nifH hidden Markov model (HMM), of which 524 had nitrogenase as their best BLAST hit. This corresponds to 1 nifH gene per 370 microbes. We are also developing a nucleating assembly approach to recover more of the biogeochemically important genes.
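The idea of partitioning reads before assembly can be illustrated with a minimal sketch. It assumes, as a simplification rather than the authors' published method, that two reads belong to the same partition whenever they share at least one k-mer, either directly or through a chain of other reads, and it groups them with a union-find (disjoint-set) structure. A k-mer and its reverse complement are treated as the same k-mer. The names `canonical`, `UnionFind`, and `partition_reads` are hypothetical and do not refer to the freely available implementation mentioned above.

```python
from collections import defaultdict

# Complement table for canonical k-mers (assumes reads contain only A, C, G, T).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self, n: int):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree under the larger one
        self.size[ra] += self.size[rb]


def partition_reads(reads: list[str], k: int) -> list[list[int]]:
    """Group read indices so that reads linked by shared k-mers end up together."""
    uf = UnionFind(len(reads))
    first_read_with: dict[str, int] = {}  # canonical k-mer -> first read containing it
    for idx, read in enumerate(reads):
        for i in range(len(read) - k + 1):  # reads shorter than k contribute no k-mers
            km = canonical(read[i:i + k])
            owner = first_read_with.setdefault(km, idx)
            if owner != idx:
                uf.union(owner, idx)
    groups: dict[int, list[int]] = defaultdict(list)
    for idx in range(len(reads)):
        groups[uf.find(idx)].append(idx)
    return sorted(groups.values())


if __name__ == "__main__":
    reads = ["ACGTACGGA", "CGGATTCA", "TTTTGGGG"]
    print(partition_reads(reads, k=4))  # [[0, 1], [2]]
```

Here the first two reads share the k-mer CGGA and fall into one partition, while the third read shares no k-mer with either. Because partitions in this sketch are independent, each could be assembled separately and in parallel. The exact Python dictionary used here would not fit in memory for a 176 Gb dataset, so a practical tool would need a more compact k-mer representation; that is an assumption about scaling, not a description of the authors' implementation.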
Division: S03 Soil Biology & Biochemistry
Session: Advanced Techniques for Assessing and Interpreting Microbial Community Function: II