msa

$ micca msa --help
usage: micca msa [-h] -i FILE -o FILE [-m {muscle,nast}]
                 [--muscle-maxiters MUSCLE_MAXITERS] [--nast-template FILE]
                 [--nast-id NAST_ID] [--nast-threads NAST_THREADS]
                 [--nast-mincov NAST_MINCOV] [--nast-strand {both,plus}]
                 [--nast-notaligned FILE] [--nast-hits FILE] [--nast-nofilter]

micca msa performs a multiple sequence alignment (MSA) on the input
file in FASTA format. micca msa provides two approaches for MSA:

* MUSCLE (doi: 10.1093/nar/gkh340), one of the most widely used
  multiple sequence alignment programs;

* Nearest Alignment Space Termination (NAST) (doi:
  10.1093/nar/gkl244). MICCA provides a very fast and memory
  efficient reimplementation of the NAST algorithm. The algorithm is
  based on VSEARCH (https://github.com/torognes/vsearch). It requires
  a pre-aligned database of sequences (--nast-template). For 16S
  data, a good template file is the Greengenes Core Set
  (http://greengenes.lbl.gov/Download/Sequence_Data/Fasta_data_files/
  core_set_aligned.fasta.imputed).

optional arguments:
  -h, --help            show this help message and exit

arguments:
  -i FILE, --input FILE
                        input FASTA file (required).
  -o FILE, --output FILE
                        output MSA file in FASTA format (required).
  -m {muscle,nast}, --method {muscle,nast}
                        multiple sequence alignment method (default muscle).

MUSCLE specific options:
  --muscle-maxiters MUSCLE_MAXITERS
                        maximum number of MUSCLE iterations. Set to 2 for a
                        good compromise between speed and accuracy (>=1,
                        default 16).

NAST specific options:
  --nast-template FILE  multiple sequence alignment template file in FASTA
                        format.
  --nast-id NAST_ID     sequence identity threshold to consider a sequence a
                        match (0.0 to 1.0, default 0.75).
  --nast-threads NAST_THREADS
                        number of threads to use (1 to 256, default 1).
  --nast-mincov NAST_MINCOV
                        reject sequence if the fraction of alignment to the
                        template sequence is lower than MINCOV. This parameter
                        prevents low-coverage alignments at the end of the
                        sequences (default 0.75).
  --nast-strand {both,plus}
                        search both strands or the plus strand only (default
                        both).
  --nast-notaligned FILE
                        write not aligned sequences in FASTA format.
  --nast-hits FILE      write hits on a TAB delimited file with the query
                        sequence id, the template sequence id and the
                        identity.
  --nast-nofilter       do not remove positions which are gaps in every
                        sequence (useful if you want to apply a Lane mask
                        filter before the tree inference).

Examples

De novo MSA using MUSCLE:

    micca msa -i input.fasta -o msa.fasta
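
A faster de novo run can be sketched by lowering the number of MUSCLE iterations to 2, the speed/accuracy compromise suggested in the help text above. The file names below are hypothetical, and the guard around the command is only there so the snippet does nothing harmful when micca or the input file is missing:

```shell
# Hypothetical file names; adjust to your data.
input=input.fasta
output=msa_fast.fasta

if command -v micca >/dev/null 2>&1 && [ -f "$input" ]; then
    # --muscle-maxiters 2 trades some accuracy for speed (default is 16).
    micca msa -i "$input" -o "$output" -m muscle --muscle-maxiters 2
else
    echo "micca or $input not found; skipping" >&2
fi
```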

Template-based MSA using NAST, with the Greengenes alignment
(clustered at 97% similarity) as template, 4 threads and the default
sequence identity threshold of 75%:

    micca msa -i input.fasta -o msa.fasta -m nast --nast-threads 4 \
    --nast-template greengenes_2013_05/rep_set_aligned/97_otus.fasta
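
When --nast-hits is used, the resulting TAB-delimited file (query id, template id, identity) can be inspected with standard tools. The sketch below is illustrative only: it writes three made-up rows to a temporary file instead of running micca, assumes one hit line per aligned query, and makes no assumption about the identity scale beyond it being numeric.

```shell
# In a real run, the hits file would come from something like:
#   micca msa -i input.fasta -o msa.fasta -m nast \
#     --nast-template TEMPLATE --nast-hits hits.tsv \
#     --nast-notaligned notaligned.fasta
# (TEMPLATE and the output file names are placeholders.)

# Made-up sample data in the documented column order:
# query id, template id, identity.
hits=$(mktemp)
printf 'q1\tt10\t0.98\nq2\tt11\t0.81\nq3\tt10\t0.92\n' > "$hits"

# Number of hit lines (assumed to be one per aligned query).
n_hits=$(awk 'END { print NR }' "$hits")

# Query whose identity to its template is lowest.
worst=$(awk -F '\t' 'NR == 1 || $3 + 0 < min { min = $3 + 0; id = $1 }
                     END { print id }' "$hits")

echo "hits: $n_hits, lowest-identity query: $worst"
rm -f "$hits"
```

Queries near the --nast-id threshold are natural candidates for manual review, and the sequences written by --nast-notaligned can be counted in a similar way with `grep -c '>'`.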