mergepairs

$ micca mergepairs --help
usage: micca mergepairs [-h] -i FILE [FILE ...] -o FILE [-r FILE]
                        [-l MINOVLEN] [-d MAXDIFFS] [-p PATTERN] [-e REPL]
                        [-s SEP] [--notmerged-fwd FILE] [--notmerged-rev FILE]

micca mergepairs merges paired-end sequence reads into one sequence.

A single merging of a pair of FASTQ files can be simply performed
using both -i/--input and -r/--reverse options.

When the option -r/--reverse is not specified:

1. you can indicate several forward files (with the option -i/--input);

2. the reverse file name will be constructed by replacing the string
   '_R1' in the forward file name with '_R2' (typical in Illumina
   file names, see options -p/--pattern and -e/--repl);

3. after the merging of the paired reads, different samples will be
   merged in a single file and sample names will be appended to the
   sequence identifier (e.g. >SEQID;sample=SAMPLENAME), as in 'micca
   merge' and 'micca split'. Sample names are defined as the leftmost
   part of the file name split at the first occurrence of '_'
   (-s/--sep option). Whitespace characters in names will be replaced
   with a single character underscore ('_').

micca mergepairs wraps VSEARCH (https://github.com/torognes/vsearch).
Statistical testing of significance is performed in a way similar to
PEAR (doi: 10.1093/bioinformatics/btt593). The quality of merged bases
is computed as in USEARCH (doi: 10.1093/bioinformatics/btv401).

optional arguments:
  -h, --help            show this help message and exit

arguments:
  -i FILE [FILE ...], --input FILE [FILE ...]
                        forward FASTQ file(s), Sanger/Illumina 1.8+ format
                        (phred+33) (required).
  -o FILE, --output FILE
                        output FASTQ file (required).
  -r FILE, --reverse FILE
                        reverse FASTQ file, Sanger/Illumina 1.8+ format
                        (phred+33).
  -l MINOVLEN, --minovlen MINOVLEN
                        minimum overlap length (default 32).
  -d MAXDIFFS, --maxdiffs MAXDIFFS
                        maximum number of allowed mismatches in the overlap
                        region (default 8).
  -p PATTERN, --pattern PATTERN
                        when the reverse filename is not specified, it will be
                        constructed by replacing 'PATTERN' in the forward file
                        name with 'REPL' (default _R1).
  -e REPL, --repl REPL  when the reverse filename is not specified, it will be
                        constructed by replacing 'PATTERN' in the forward file
                        name with 'REPL' (default _R2).
  -s SEP, --sep SEP     when the reverse file name is not specified, sample
                        names are appended to the sequence identifier (e.g.
                        >SEQID;sample=SAMPLENAME). Sample names are defined as
                        the leftmost part of the file name split at the first
                        occurrence of 'SEP' (default _).
  --notmerged-fwd FILE  write not merged forward reads.
  --notmerged-rev FILE  write not merged reverse reads.
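
The input options above require Sanger/Illumina 1.8+ (phred+33) FASTQ files, in which each quality character is the ASCII code of the quality score plus 33. As a quick way to sanity-check a quality string by hand, the sketch below decodes single characters. The function name phred33_score is hypothetical and not part of micca.

```shell
#!/bin/sh
# Illustrative helper, not part of micca: decode one phred+33 quality
# character into its integer quality score (ASCII code minus 33).
phred33_score() {
    # A leading single quote makes printf print the character's numeric code.
    code=$(printf '%d' "'$1")
    echo $((code - 33))
}

phred33_score 'I'   # prints 40
phred33_score '!'   # prints 0
```

Scores that come out negative, or characters far above 'J', suggest the file uses a different encoding (for example the older phred+64).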

Examples

Merge reads with a minimum overlap length of 50 and at most 3
mismatches allowed in the overlap region:

    micca mergepairs -i reads1.fastq -r reads2.fastq -o merged.fastq \
    -l 50 -d 3

Merge several Illumina paired reads (typically named *_R1*.fastq and
*_R2*.fastq):

    micca mergepairs -i *_R1*.fastq -o merged.fastq --notmerged-fwd \
    notmerged_fwd.fastq --notmerged-rev notmerged_rev.fastq
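
When -r/--reverse is omitted, the help text says that the reverse file name and the sample name are both derived from each forward file name. The shell sketch below previews those derived names before running the merge. It is not micca's own code: the helpers reverse_name and sample_name are hypothetical, and the sketch assumes that only the first occurrence of the pattern is replaced, that the sample name is taken from the base name without its directory, and that the pattern contains no characters special to sed. It also skips the whitespace-to-underscore replacement.

```shell
#!/bin/sh
# Illustrative sketch only; these helpers are hypothetical, not part of micca.
pattern="_R1"   # default of -p/--pattern
repl="_R2"      # default of -e/--repl
sep="_"         # default of -s/--sep

# Build the reverse file name from a forward file name.
# Assumption: only the first occurrence of $pattern is replaced.
reverse_name() {
    printf '%s\n' "$1" | sed "s/$pattern/$repl/"
}

# Derive the sample name: the part of the base name before the first $sep.
# Assumption: any leading directory is dropped first.
sample_name() {
    base=$(basename "$1")
    name=${base%%"$sep"*}
    printf '%s\n' "$name"
}

fwd="sampleA_S1_L001_R1_001.fastq"
echo "reverse: $(reverse_name "$fwd")"   # reverse: sampleA_S1_L001_R2_001.fastq
echo "sample:  $(sample_name "$fwd")"    # sample:  sampleA
```

Looping over the forward files with these helpers and testing each derived reverse name with `[ -e ... ]` is one way to spot a missing mate file, or two files that would collapse into the same sample name, before starting a long run.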