Picking OTUs for use in PICRUSt

PICRUSt (doi: 10.1038/nbt.2676) is a software package designed to predict metagenome functional content from marker gene (e.g., 16S rRNA) surveys and full genomes. This tutorial covers how to pick OTUs from 16S rRNA sequence data for use with PICRUSt.


This tutorial requires that the Quality filtering step in Single-end sequencing has been completed and that PICRUSt is installed on your system. Warning: PICRUSt 1.0.0 requires version 1.3.1 of the biom-format package (to install it, run pip install biom-format==1.3.1 from the command line; for more information see http://biom-format.org/).

PICRUSt requires a closed-reference OTU table computed against the Greengenes reference clustered at 97% identity. Download the reference database (Greengenes, version 2013/05, clustered at 97% identity):

wget ftp://ftp.fmach.it/metagenomics/micca/dbs/gg_2013_05.tar.gz
tar -zxvf gg_2013_05.tar.gz
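
Before running the OTU picking step, it can be worth confirming that the reference file was extracted and is not empty. The following is a minimal sketch, assuming the archive places 97_otus.fasta in the current directory (the file name used in the next command); the function name count_fasta_records is hypothetical and is not part of micca or PICRUSt.

```shell
# Count FASTA records by counting header lines (lines starting with '>').
# count_fasta_records is a hypothetical helper, not a micca command.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Report the number of reference sequences only if the file is present.
if [ -f 97_otus.fasta ]; then
    count_fasta_records 97_otus.fasta
fi
```

A count of zero, or no output at all, would suggest that the download or extraction did not complete as expected.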

Run the micca closed-reference protocol:

micca otu -m closed_ref -i filtered.fasta -o closed_ref_otus -r 97_otus.fasta -d 0.97 -t 4
cd closed_ref_otus

Report the sample summary:

micca tablestats -i otutable.txt -o tablestats
head tablestats/tablestats_samplesumm.txt

Sample       Depth   NOTU    NSingle
Mw_03        1084    132     39
Mw_06        1387    122     27
Mw_11        1485    155     44
Mw_07        1528    150     36
Mw_01        1537    143     35
Mw_15        1565    144     35
Mw_14        1610    149     42
Mw_02        1670    143     43
Mw_12        1710    153     54
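
Since head shows only the first lines of the summary, the smallest sequencing depth can also be computed directly rather than read off by eye. This is a small sketch, assuming the summary file is whitespace-delimited with a header line and the depth in the second column, as in the output above; min_depth is a hypothetical helper name.

```shell
# Print the smallest value in the Depth column (column 2), skipping the header.
# min_depth is a hypothetical helper, not a micca command.
min_depth() {
    awk 'NR > 1 && NF >= 2 && (min == "" || $2 + 0 < min) { min = $2 + 0 }
         END { print min }' "$1"
}

# Run against the tablestats output if it exists in the current directory.
if [ -f tablestats/tablestats_samplesumm.txt ]; then
    min_depth tablestats/tablestats_samplesumm.txt
fi
```

For the nine samples listed above, the smallest depth is 1084 (sample Mw_03).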

Rarefying the OTU table before the PICRUSt analysis is generally a good idea (see https://groups.google.com/forum/#!topic/picrust-users/ev5uZGUIPrQ), so we will rarefy the table to 1084 sequences per sample, the lowest depth in the summary above, using tablerare:

micca tablerare -i otutable.txt -o otutable_rare.txt -d 1084
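
One way to check the result is to sum the counts in each sample column of the rarefied table; every sample that was kept should total 1084. The sketch below assumes that otutable_rare.txt is a tab-delimited text file with a header line, OTU identifiers in the first column, and one column of integer counts per sample. The helper name sample_depths is hypothetical.

```shell
# Print "sample<TAB>total" for every sample column of a tab-delimited OTU table.
# Assumes a header row and OTU IDs in column 1; sample_depths is a hypothetical helper.
sample_depths() {
    awk -F'\t' '
        NR == 1 { ncol = NF; for (i = 2; i <= NF; i++) name[i] = $i; next }
        { for (i = 2; i <= ncol; i++) total[i] += $i }
        END { for (i = 2; i <= ncol; i++) printf "%s\t%d\n", name[i], total[i] }
    ' "$1"
}

# Run against the rarefied table if it exists in the current directory.
if [ -f otutable_rare.txt ]; then
    sample_depths otutable_rare.txt
fi
```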

Convert the rarefied OTU table to the BIOM format, replacing the OTU IDs with the original sequence IDs, using the tobiom command:

micca tobiom -i otutable_rare.txt -o tables.biom -u otuids.txt

Normalize the OTU table by dividing the abundance of each OTU by its known or predicted 16S copy number:

normalize_by_copy_number.py -i tables.biom -o normalized_otus.biom
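
To make the arithmetic behind this step concrete, here is a toy illustration only; it is not how normalize_by_copy_number.py is implemented and it does not read BIOM files. It assumes a hypothetical three-column, tab-delimited input (OTU ID, count, 16S copy number) and divides each count by its copy number.

```shell
# Toy illustration of copy-number normalization: count / copy number.
# Input layout (OTU ID, count, copy number) and normalize_toy are hypothetical.
normalize_toy() {
    awk -F'\t' '$3 > 0 { printf "%s\t%.2f\n", $1, $2 / $3 }' "$1"
}
```

For example, an OTU with 10 reads and 2 copies of the 16S gene would contribute an abundance of 5.00, whereas an OTU with 9 reads and 3 copies would contribute 3.00.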

Create the final metagenome functional predictions:

predict_metagenomes.py -i normalized_otus.biom -o metagenome_predictions.biom

Now you can analyze the PICRUSt predicted metagenome: http://picrust.github.io/picrust/tutorials/downstream_analysis.html#downstream-analysis-guide.