Supported file formats

Sequence files

FASTA and FASTQ (Sanger/Illumina 1.8+ format, phred+33) files are supported. micca provides the convert command to convert between sequence file formats.
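
As an illustration of the phred+33 encoding mentioned above, the following minimal sketch decodes a FASTQ quality string by subtracting 33 from each character's ASCII code. The function names are hypothetical and are not part of micca.

```python
def phred33_scores(quality):
    """Decode a FASTQ quality string that uses an ASCII offset of 33."""
    return [ord(ch) - 33 for ch in quality]


def error_probabilities(scores):
    """Convert Phred scores Q to error probabilities P = 10^(-Q/10)."""
    return [10 ** (-q / 10) for q in scores]


if __name__ == "__main__":
    # 'I' is ASCII 73, '5' is 53, '+' is 43 and '!' is 33.
    print(phred33_scores("II5+!"))  # [40, 40, 20, 10, 0]
```

A Phred score of 20 therefore corresponds to an estimated error probability of 0.01 for that base.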

Taxonomy files

Taxonomy files map sequence IDs to taxonomy. Input taxonomy files must be TAB-delimited files in which each row takes one of the following forms:

  1. SEQID[TAB]k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__;g__;
  2. SEQID[TAB]Bacteria;Firmicutes;Clostridia;Clostridiales;;;
  3. SEQID[TAB]Bacteria;Firmicutes;Clostridia;Clostridiales

Compatible taxonomy files are available from the Greengenes (http://greengenes.secondgenome.com/downloads), QIIME-formatted SILVA (https://www.arb-silva.de/download/archive/qiime/) and UNITE (https://unite.ut.ee/repository.php) databases.

The output taxonomy file returned by classify is a TAB-delimited file where each row is always in the format:

SEQID[TAB]Bacteria;Firmicutes;Clostridia;Clostridiales
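
The relationship between the three accepted input forms and the single output form can be sketched as follows. This is only an illustration, not micca's own implementation: the function names are hypothetical, and the rank-prefix pattern (one lowercase letter followed by two underscores) is an assumption based on the Greengenes-style example above.

```python
import re

# Assumed Greengenes-style rank prefix such as "k__" or "g__".
RANK_PREFIX = re.compile(r"^[a-z]__")


def normalize_lineage(lineage):
    """Strip rank prefixes and drop trailing empty ranks from a lineage."""
    ranks = [RANK_PREFIX.sub("", rank.strip()) for rank in lineage.split(";")]
    while ranks and ranks[-1] == "":
        ranks.pop()
    return ";".join(ranks)


def normalize_row(row):
    """Normalize one SEQID[TAB]lineage row."""
    seqid, lineage = row.rstrip("\n").split("\t", 1)
    return seqid + "\t" + normalize_lineage(lineage)


if __name__ == "__main__":
    rows = [
        "SEQID\tk__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__;g__;",
        "SEQID\tBacteria;Firmicutes;Clostridia;Clostridiales;;;",
        "SEQID\tBacteria;Firmicutes;Clostridia;Clostridiales",
    ]
    for row in rows:
        print(normalize_row(row))
```

All three rows print as SEQID[TAB]Bacteria;Firmicutes;Clostridia;Clostridiales.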

OTU table and taxonomy tables

The OTU table returned by otu is an OTU x sample, TAB-delimited text file, containing the number of times an OTU is found in each sample:

OTU     Mw_01 Mw_02 Mw_03 ...
DENOVO1 151   178   177   ...
DENOVO2 339   181   142   ...
DENOVO3 533   305   63    ...
DENOVO4 166   299   115   ...
...     ...   ...   ...   ...
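
A TAB-delimited OTU table of this shape can be loaded with the Python standard library alone. The sketch below is one possible reader, not part of micca; the function name is hypothetical, and the example rows reuse the first counts shown above.

```python
import csv
import io


def read_otu_table(handle):
    """Read an OTU x sample table into (samples, {otu: {sample: count}})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # the first column holds the OTU identifiers
    counts = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        counts[row[0]] = dict(zip(samples, (int(v) for v in row[1:])))
    return samples, counts


if __name__ == "__main__":
    example = io.StringIO(
        "OTU\tMw_01\tMw_02\tMw_03\n"
        "DENOVO1\t151\t178\t177\n"
        "DENOVO2\t339\t181\t142\n"
    )
    samples, counts = read_otu_table(example)
    print(counts["DENOVO1"]["Mw_02"])  # 178
```

With a real file, pass an open file object instead of the io.StringIO buffer.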

The tabletotax command returns the “taxonomy tables” for each taxonomic level, e.g.:

OTU                                Mw_01 Mw_02 Mw_03 ...
Bacteria;Bacteroidetes             1363  1543  1168  ...
Bacteria;Cyanobacteria/Chloroplast 0     0     0     ...
Bacteria;Firmicutes                6257  5780  6761  ...
Bacteria;Lentisphaerae             0     1     0     ...
...                                ...   ...   ...   ...
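
To make the idea of a per-level taxonomy table concrete, the sketch below sums OTU counts that share the same lineage truncated to a given number of ranks. It is a simplified illustration and not the tabletotax implementation: the function name, the "Unclassified" label for OTUs without a taxonomy entry, and the lineages assigned to the OTUs in the example are all assumptions made for demonstration.

```python
from collections import defaultdict


def collapse_by_level(counts, taxonomy, level, unclassified="Unclassified"):
    """Sum per-sample OTU counts by lineage truncated to `level` ranks.

    counts:   {otu: {sample: count}}
    taxonomy: {otu: "Rank1;Rank2;..."}
    """
    table = defaultdict(lambda: defaultdict(int))
    for otu, per_sample in counts.items():
        lineage = taxonomy.get(otu)
        if lineage is None:
            key = unclassified  # assumed label, chosen for this sketch
        else:
            key = ";".join(lineage.split(";")[:level])
        for sample, count in per_sample.items():
            table[key][sample] += count
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(row) for key, row in table.items()}


if __name__ == "__main__":
    counts = {
        "DENOVO1": {"Mw_01": 151, "Mw_02": 178},
        "DENOVO2": {"Mw_01": 339, "Mw_02": 181},
        "DENOVO3": {"Mw_01": 533, "Mw_02": 305},
    }
    # Hypothetical lineages, for illustration only.
    taxonomy = {
        "DENOVO1": "Bacteria;Firmicutes;Clostridia",
        "DENOVO2": "Bacteria;Firmicutes;Bacilli",
        "DENOVO3": "Bacteria;Bacteroidetes",
    }
    for lineage, row in collapse_by_level(counts, taxonomy, level=2).items():
        print(lineage, row)
```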

Sample data

The sample data file contains all of the information about the samples. In QIIME this file is called the mapping file. In micca, the sample data file must be a TAB-delimited text file with one row per sample. The first column must be the sample identifier (assigned in merge, split or mergepairs):

ID    Group Altitude
Mw_01 Mw1   492
Mw_02 Mw1   492
Mw_09 Mw1   492
Mw_12 Mw1   492
...   ...   ...
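
The layout described above can be checked with a short script before running an analysis. The sketch below reads a sample data file keyed on its first column and reports OTU-table samples that have no matching row. The function names are hypothetical and the checks are a suggestion, not something micca is documented to perform.

```python
import csv
import io


def read_sample_data(handle):
    """Read a TAB-delimited sample data file into {sample_id: {column: value}}."""
    reader = csv.DictReader(handle, delimiter="\t")
    if not reader.fieldnames:
        raise ValueError("sample data file has no header row")
    id_column = reader.fieldnames[0]  # the first column is the sample identifier
    samples = {}
    for row in reader:
        sample_id = row[id_column]
        if sample_id in samples:
            raise ValueError("duplicate sample identifier: " + sample_id)
        samples[sample_id] = {k: v for k, v in row.items() if k != id_column}
    return samples


def missing_samples(otu_table_samples, sample_data):
    """Return OTU-table sample names that have no row in the sample data."""
    return [name for name in otu_table_samples if name not in sample_data]


if __name__ == "__main__":
    example = io.StringIO(
        "ID\tGroup\tAltitude\n"
        "Mw_01\tMw1\t492\n"
        "Mw_02\tMw1\t492\n"
    )
    data = read_sample_data(example)
    print(missing_samples(["Mw_01", "Mw_02", "Mw_03"], data))  # ['Mw_03']
```

Values are kept as strings; convert numeric columns such as Altitude explicitly where needed.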

Phylogenetic tree

Only the Newick format is supported.