msa

$ micca msa --help
usage: micca msa [-h] -i FILE -o FILE [-m {muscle,nast}]
                 [--muscle-maxiters MUSCLE_MAXITERS] [--nast-template FILE]
                 [--nast-id NAST_ID] [--nast-threads NAST_THREADS]
                 [--nast-mincov NAST_MINCOV] [--nast-strand {both,plus}]
                 [--nast-notaligned FILE] [--nast-hits FILE] [--nast-nofilter]
                 [--nast-notrim]

micca msa performs a multiple sequence alignment (MSA) on the input
file in FASTA format. micca msa provides two approaches for MSA:

* MUSCLE (doi: 10.1093/nar/gkh340). It is one of the most widely used
  multiple sequence alignment programs;

* Nearest Alignment Space Termination (NAST) (doi:
  10.1093/nar/gkl244). MICCA provides a very fast and memory
  efficient reimplementation of the NAST algorithm. The algorithm is
  based on VSEARCH (https://github.com/torognes/vsearch). It requires
  a pre-aligned database of sequences (--nast-template). For 16S
  data, a good template file is the Greengenes Core Set
  (http://greengenes.lbl.gov/Download/Sequence_Data/Fasta_data_files/
  core_set_aligned.fasta.imputed).

optional arguments:
  -h, --help            show this help message and exit

arguments:
  -i FILE, --input FILE
                        input FASTA file (required).
  -o FILE, --output FILE
                        output MSA file in FASTA format (required).
  -m {muscle,nast}, --method {muscle,nast}
                        multiple sequence alignment method (default muscle).

MUSCLE specific options:
  --muscle-maxiters MUSCLE_MAXITERS
                        maximum number of MUSCLE iterations. Set to 2 for a
                        good compromise between speed and accuracy (>=1,
                        default 16).

NAST specific options:
  --nast-template FILE  multiple sequence alignment template file in FASTA
                        format.
  --nast-id NAST_ID     sequence identity threshold to consider a sequence a
                        match (0.0 to 1.0, default 0.75).
  --nast-threads NAST_THREADS
                        number of threads to use (1 to 256, default 1).
  --nast-mincov NAST_MINCOV
                        reject sequence if the fraction of alignment to the
                        template sequence is lower than MINCOV. This parameter
                        prevents low-coverage alignments at the end of the
                        sequences (default 0.75).
  --nast-strand {both,plus}
                        search both strands or the plus strand only (default
                        both).
  --nast-notaligned FILE
                        write unaligned sequences in FASTA format.
  --nast-hits FILE      write hits to a TAB-delimited file with the query
                        sequence id, the template sequence id and the
                        identity.
  --nast-nofilter       do not remove positions which are gaps in every
                        sequence (useful if you want to apply a Lane mask
                        filter before the tree inference).
  --nast-notrim         force alignment of the entire candidate sequence
                        (i.e. do not trim the candidate sequence to the
                        region bounded by the start and end points of the
                        alignment span).
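
To illustrate the filtering that --nast-nofilter disables, the following awk sketch drops every alignment column that is a gap in all records. It is not micca's implementation: it assumes that "-" is the only gap character, and the three sample records are made up for the illustration.

```shell
# Hypothetical alignment; columns 2 and 4 are gaps in every record.
msa=$(mktemp)
cat > "$msa" <<'EOF'
>seq1
A-C-G
>seq2
A-T-G
>seq3
A-C--
EOF

filtered=$(awk '
    /^>/ { n++; id[n] = $0; next }      # header line starts a new record
         { seq[n] = seq[n] $0 }         # join wrapped sequence lines
    END {
        width = length(seq[1])
        # Mark every column in which at least one record has a non-gap character.
        for (i = 1; i <= n; i++)
            for (c = 1; c <= width; c++)
                if (substr(seq[i], c, 1) != "-") keep[c] = 1
        # Rebuild each record from the kept columns only.
        for (i = 1; i <= n; i++) {
            out = ""
            for (c = 1; c <= width; c++)
                if (c in keep) out = out substr(seq[i], c, 1)
            print id[i]
            print out
        }
    }
' "$msa")

printf '%s\n' "$filtered"
rm -f "$msa"
```

Column 5 is kept even though seq3 has a gap there, because the other two records do not.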

Examples

De novo MSA using MUSCLE:

    micca msa -i input.fasta -o msa.fasta
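
Because the output is a multiple sequence alignment in FASTA format, every record should have the same length once wrapped lines are joined. The sketch below is one way to check this with awk; the small alignment written to a temporary file is a made-up stand-in for msa.fasta.

```shell
# Hypothetical three-record alignment standing in for msa.fasta.
msa=$(mktemp)
cat > "$msa" <<'EOF'
>seq1
ACG-TT
>seq2
AC--TT
>seq3
ACGATT
EOF

# Join wrapped sequence lines per record, then list each distinct record length.
lengths=$(awk '
    /^>/ { if (seq != "") print length(seq); seq = ""; next }
         { seq = seq $0 }
    END  { if (seq != "") print length(seq) }
' "$msa" | sort -n | uniq)

# A well-formed alignment yields exactly one distinct length.
if [ "$(printf '%s\n' "$lengths" | wc -l)" -eq 1 ]; then
    echo "alignment width: $lengths"
else
    echo "records have differing lengths:" $lengths
fi
rm -f "$msa"
```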

Template-based MSA using NAST, with the Greengenes alignment
(clustered at 97% similarity) as template, 4 threads and the default
sequence identity threshold of 75%:

    micca msa -i input.fasta -o msa.fasta -m nast --nast-threads 4 \
    --nast-template greengenes_2013_05/rep_set_aligned/97_otus.fasta
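
Adding --nast-hits FILE to a NAST run writes a TAB-delimited file with the query sequence id, the template sequence id and the identity. The sketch below shows one way to summarise such a file with awk. The file contents are hypothetical sample values, including the scale used for the identity column, and only the three-column layout is taken from the option description.

```shell
# Hypothetical sample in the layout described for --nast-hits
# (query id, template id, identity); the values are made up.
hits=$(mktemp)
printf 'seq1\ttplA\t0.98\nseq2\ttplB\t0.81\nseq3\ttplA\t0.76\n' > "$hits"

# Count how many query sequences matched each template sequence.
per_template=$(awk -F'\t' '{ n[$2]++ } END { for (t in n) print t "\t" n[t] }' "$hits" | sort)

# Find the lowest identity among the reported hits (forcing numeric comparison).
min_identity=$(awk -F'\t' 'NR == 1 || $3 + 0 < min { min = $3 + 0 } END { print min }' "$hits")

printf '%s\n' "$per_template"
echo "lowest identity: $min_identity"
rm -f "$hits"
```

A template that attracts most of the queries, or a lowest identity close to the --nast-id threshold, may be worth a closer look before tree inference.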