
Mate-pair Reads Alignment

Posted on 2016-11-25 | In Bioinformatics

Library types

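For genomic sequencing we usually build small-insert libraries (<1 kb) of **paired-end reads** (L-> <-R) and large-insert libraries of **mate-pair reads** (<-L R->). The main differences between the two are the relative orientation of read 1 and read 2 and the distance separating them.

Nearly all mainstream aligners support paired-end reads. How, then, should **mate-pair reads** be aligned?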

reverse complement

For standard Illumina mate-pair (MP) preps, reverse-complement the reads with fastx-toolkit, then align them with standard paired-end parameters using bwa or bowtie.

fastx-toolkit reverse complement

FASTQ/A Reverse Complement

$ fastx_reverse_complement -h
usage: fastx_reverse_complement [-h] [-r] [-z] [-v] [-i INFILE] [-o OUTFILE]

version 0.0.6
   [-h]         = This helpful help screen.
   [-z]         = Compress output with GZIP.
   [-i INFILE]  = FASTA/Q input file. default is STDIN.
   [-o OUTFILE] = FASTA/Q output file. default is STDOUT.
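
To make the reverse-complement step concrete, here is a minimal Python sketch of what happens to a single FASTQ record: the bases are complemented and reversed, and the quality string is reversed so each score stays attached to its base. The function names and the example record are hypothetical and are not part of fastx-toolkit.

```python
from typing import Tuple

# Illustrative only: a hand-rolled reverse complement, not fastx-toolkit code.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_complement_record(header: str, seq: str, qual: str) -> Tuple[str, str, str]:
    """Reverse-complement the bases and reverse the qualities of one FASTQ record."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    return header, reverse_complement(seq), qual[::-1]


# Hypothetical record used only for demonstration.
record = reverse_complement_record("@read1/1", "AACGT", "IIHG#")
print(record)
```

After this transformation, an outward-facing mate pair (<-L R->) reads like an inward-facing paired-end pair (L-> <-R), which is why standard paired-end parameters can then be used.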

bowtie2

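Alternatively, mate-pair reads can be aligned directly, without reverse-complementing them first, by setting bowtie2's --fr/--rf/--ff, -I, and -X options.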
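
As a rough sketch of this approach, the Python helper below assembles a bowtie2 command line for untouched mate-pair reads. It only builds the argument list and does not run bowtie2. The index name, file names, and the 2000 to 5000 bp insert range are placeholders chosen for illustration; substitute the values that match your own library.

```python
import shlex
from typing import List


def bowtie2_mate_pair_cmd(index: str, mates1: str, mates2: str, out_sam: str,
                          min_insert: int, max_insert: int,
                          orientation: str = "rf") -> List[str]:
    """Build (but do not execute) a bowtie2 command for paired reads."""
    if orientation not in ("fr", "rf", "ff"):
        raise ValueError("orientation must be 'fr', 'rf' or 'ff'")
    if not 0 <= min_insert <= max_insert:
        raise ValueError("need 0 <= min_insert <= max_insert")
    return [
        "bowtie2",
        f"--{orientation}",        # mate orientation; rf = outward-facing mates
        "-I", str(min_insert),     # minimum fragment length
        "-X", str(max_insert),     # maximum fragment length
        "-x", index,               # bowtie2 index basename
        "-1", mates1,              # file with mate 1 reads
        "-2", mates2,              # file with mate 2 reads
        "-S", out_sam,             # SAM output file
    ]


# Placeholder names and insert sizes, assumed for the example only.
cmd = bowtie2_mate_pair_cmd("ref_index", "mp_R1.fastq", "mp_R2.fastq",
                            "mp.sam", min_insert=2000, max_insert=5000)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where bowtie2 is installed.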

Aligning pairs

A “paired-end” or “mate-pair” read consists of a pair of mates, called mate 1 and mate 2. Pairs come with a prior expectation about (a) the relative orientation of the mates, and (b) the distance separating them on the original DNA molecule. Exactly what expectations hold for a given dataset depends on the lab procedures used to generate the data. For example, a common lab procedure for producing pairs is Illumina’s Paired-end Sequencing Assay, which yields pairs with a relative orientation of FR (“forward, reverse”), meaning that if mate 1 came from the Watson strand, mate 2 very likely came from the Crick strand and vice versa. This protocol also yields pairs where the expected genomic distance from end to end is about 200-500 base pairs.

Paired-end options

-I/--minins
The minimum fragment length for valid paired-end alignments. E.g. if -I 60 is specified and a paired-end alignment consists of two 20-bp alignments in the appropriate orientation with a 20-bp gap between them, that alignment is considered valid (as long as -X is also satisfied). A 19-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -I constraint is applied with respect to the untrimmed mates.
The larger the difference between -I and -X, the slower Bowtie 2 will run. This is because larger differences between -I and -X require that Bowtie 2 scan a larger window to determine if a concordant alignment exists. For typical fragment length ranges (200 to 400 nucleotides), Bowtie 2 is very efficient.
Default: 0 (essentially imposing no minimum)

-X/--maxins
The maximum fragment length for valid paired-end alignments. E.g. if -X 100 is specified and a paired-end alignment consists of two 20-bp alignments in the proper orientation with a 60-bp gap between them, that alignment is considered valid (as long as -I is also satisfied). A 61-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -X constraint is applied with respect to the untrimmed mates, not the trimmed mates.
The larger the difference between -I and -X, the slower Bowtie 2 will run. This is because larger differences between -I and -X require that Bowtie 2 scan a larger window to determine if a concordant alignment exists. For typical fragment length ranges (200 to 400 nucleotides), Bowtie 2 is very efficient.
Default: 500.
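
The worked examples in the -I and -X descriptions can be checked with a few lines of Python. This is only a sketch of the arithmetic (fragment length = mate 1 + gap + mate 2, compared against both bounds), not Bowtie 2's implementation; the function names are made up for illustration, and the defaults mirror the documented ones (0 and 500).

```python
def fragment_length(mate1_len: int, gap: int, mate2_len: int) -> int:
    """End-to-end fragment length spanned by two mates and the gap between them."""
    return mate1_len + gap + mate2_len


def within_insert_bounds(frag_len: int, min_ins: int = 0, max_ins: int = 500) -> bool:
    """True if the fragment length satisfies both -I (min_ins) and -X (max_ins)."""
    return min_ins <= frag_len <= max_ins


# -I 60: two 20-bp mates with a 20-bp gap span 60 bp (valid); a 19-bp gap gives 59 bp (invalid).
print(within_insert_bounds(fragment_length(20, 20, 20), min_ins=60))
print(within_insert_bounds(fragment_length(20, 19, 20), min_ins=60))

# -X 100: a 60-bp gap gives 100 bp (valid); a 61-bp gap gives 101 bp (invalid).
print(within_insert_bounds(fragment_length(20, 60, 20), max_ins=100))
print(within_insert_bounds(fragment_length(20, 61, 20), max_ins=100))
```

For a mate-pair library, both bounds would be raised well above the 500 bp default to bracket the expected insert size.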

--fr/--rf/--ff
The upstream/downstream mate orientations for a valid paired-end alignment against the forward reference strand. E.g., if --fr is specified and there is a candidate paired-end alignment where mate 1 appears upstream of the reverse complement of mate 2 and the fragment length constraints (-I and -X) are met, that alignment is valid. Also, if mate 2 appears upstream of the reverse complement of mate 1 and all other constraints are met, that too is valid. --rf likewise requires that an upstream mate 1 be reverse-complemented and a downstream mate 2 be forward-oriented. --ff requires both an upstream mate 1 and a downstream mate 2 to be forward-oriented.
Default: --fr (appropriate for Illumina’s Paired-end Sequencing Assay).
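
The orientation rules can be summarised as a small truth table. The sketch below is a simplified model, not Bowtie 2's code. The cases with mate 1 upstream, plus the --fr case with mate 2 upstream, come straight from the description above; the mate-2-upstream cases for --rf and --ff are an assumption, derived by reverse-complementing the whole fragment. The function name and the '+'/'-' strand encoding are hypothetical.

```python
def orientation_ok(mode: str, upstream_mate: int,
                   upstream_strand: str, downstream_strand: str) -> bool:
    """Check mate orientation against the forward reference strand.

    mode: 'fr', 'rf' or 'ff'; upstream_mate: 1 or 2;
    strands: '+' (forward) or '-' (reverse-complemented).
    """
    if mode not in ("fr", "rf", "ff"):
        raise ValueError("mode must be 'fr', 'rf' or 'ff'")
    if upstream_mate not in (1, 2):
        raise ValueError("upstream_mate must be 1 or 2")
    strands = (upstream_strand, downstream_strand)
    if mode == "fr":
        # Upstream mate forward, downstream mate reverse-complemented (either mate first).
        return strands == ("+", "-")
    if mode == "rf":
        # Upstream mate reverse-complemented, downstream mate forward.
        # Assumption: the same strand pattern applies when mate 2 is upstream.
        return strands == ("-", "+")
    # mode == "ff": both forward when mate 1 is upstream.
    # Assumption: both reverse-complemented when mate 2 is upstream.
    return strands == (("+", "+") if upstream_mate == 1 else ("-", "-"))


# Mate-pair style alignment (<-L R->): mate 1 upstream on '-', mate 2 downstream on '+'.
print(orientation_ok("rf", 1, "-", "+"))
print(orientation_ok("fr", 1, "-", "+"))
```

Under this model, an outward-facing mate pair passes with --rf and fails with the default --fr, which is why the orientation option has to be set for mate-pair data that has not been reverse-complemented.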

Novoalign


Post author: tiramisutes
Post link: http://tiramisutes.github.io/2016/11/25/mate-pair-reads-Aligner.html
Copyright Notice: All articles in this blog are licensed under BY-NC-SA unless stating additionally.
