I am new to bioinformatics and currently learning how to use Bowtie 2. As written in the manual:
A pair that aligns with the expected relative mate orientation and with the expected range of distances between mates is said to align "concordantly". If both mates have unique alignments, but the alignments do not match paired-end expectations (i.e. the mates aren't in the expected relative orientation, or aren't within the expected distance range, or both), the pair is said to align "discordantly".
I have read about the basics of paired-end sequencing and orientation (in the molecular biology sense). In summary, my understanding is that in paired-end sequencing we sequence both ends of a DNA fragment at the 5' end and the 3' end (we call them mate 1 and mate 2) and by knowing the expected length between the mates we can better align the fragment to a reference genome.
My question is: what is meant by two mates having an expected relative orientation? If I am using Bowtie 2 and giving it one file with all the first mates and another with all the corresponding second mates, how can two mates align without first having the expected orientation?
I don't think that this is the question. Mate-Pair and Paired-End are COMPLETELY different. One (Mate-Pair) is a library construction method. The other (Paired-End) is a method of sequencing. The confusion arises from Pyro/454/Roche terminology wherein they used the term "mate-pair" to describe what Illumina now calls "paired-end" sequencing. The distinction was made (by Illumina) when the technique of constructing "MATE-PAIR" libraries was introduced and when Illumina introduced sequencing from both ends "PAIRED-END".
To further clarify: on Illumina platforms, paired-end sequencing is performed as depicted in the illustration. The reads come in TWO files (R1 and R2). The ORIENTATION of the reads is (R1) FORWARD, (R2) REVERSE. However, both files will have the reads written in 5'->3' orientation.
For example, if you make an interleaved file, the R1 read would be FORWARD and the R2 read would be REVERSE.
Are the orientations determined with regard to their 3' and 5' ends? Does the mapper then assume that, for the given platform, the reads should align in a particular orientation and, if they map differently, issue a warning or error?
Does the mapper (like bwa, for instance) then check all mapping combinations (-><-, ->->, <-<-, <-->), or does it fix the first seq from the pair and rotate the other seq?
Since the sequence in a fastq file is always 5'->3', the orientation is determined by the relative positions of the alignments and which, if either, is reverse complemented. If they don't align with the proper orientation then bit 0x2 in the flag field won't be set.
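To make the flag bits concrete, here is a minimal sketch that decodes the pairing and strand bits of a SAM flag. The bit values come from the SAM specification; the function name describe_flag and the dictionary keys are just names chosen for this illustration.

```python
# Bit values of the SAM FLAG field that relate to pairing and strand.
PAIRED = 0x1          # read is part of a pair
PROPER_PAIR = 0x2     # aligner considers the pair properly aligned
READ_REVERSE = 0x10   # this read aligned as its reverse complement
MATE_REVERSE = 0x20   # the mate aligned as its reverse complement
FIRST_IN_PAIR = 0x40  # this read is mate 1
SECOND_IN_PAIR = 0x80 # this read is mate 2


def describe_flag(flag):
    """Return the pairing and strand information encoded in a SAM flag."""
    return {
        "paired": bool(flag & PAIRED),
        "proper_pair": bool(flag & PROPER_PAIR),
        "read_strand": "-" if flag & READ_REVERSE else "+",
        "mate_strand": "-" if flag & MATE_REVERSE else "+",
        "first_in_pair": bool(flag & FIRST_IN_PAIR),
        "second_in_pair": bool(flag & SECOND_IN_PAIR),
    }


if __name__ == "__main__":
    # 99 and 147 are the flags commonly seen on the two mates of a
    # properly paired -><- alignment; 65 is a paired read without bit 0x2.
    for flag in (99, 147, 65):
        print(flag, describe_flag(flag))
```

Flag 99 decodes to mate 1 on the forward strand with its mate on the reverse strand and bit 0x2 set; flag 65 is also mate 1 of a pair, but bit 0x2 is not set.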
How the search is performed will vary by aligner. Bowtie2, for example, will search for concordant pairs first (at least if memory serves).
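As a rough illustration of what "concordant" means, here is a simplified sketch that classifies a pair by the arrows used above and then applies an orientation and distance test. It is not how Bowtie 2 actually searches, and it ignores the special cases (mates overlapping, containing one another, or dovetailing). The defaults of -><- orientation and a 0 to 500 fragment length mirror Bowtie 2's documented defaults for --fr, -I and -X; the function names and the half-open coordinate convention are assumptions made for this example.

```python
def pair_orientation(start1, is_reverse1, start2, is_reverse2):
    """Return '-><-', '<-->', '->->' or '<-<-' for two aligned mates.

    The mates are ordered by leftmost reference coordinate, then each is
    drawn as '->' (forward strand) or '<-' (reverse strand).
    """
    mates = sorted(
        [(start1, is_reverse1), (start2, is_reverse2)],
        key=lambda mate: mate[0],
    )
    return "".join("<-" if is_reverse else "->" for _, is_reverse in mates)


def is_concordant(start1, end1, is_reverse1,
                  start2, end2, is_reverse2,
                  expected="-><-", min_len=0, max_len=500):
    """Simplified concordance test on one reference sequence.

    Coordinates are half-open [start, end). The fragment length is
    measured from the leftmost to the rightmost aligned base of the pair.
    """
    if pair_orientation(start1, is_reverse1, start2, is_reverse2) != expected:
        return False
    fragment_len = max(end1, end2) - min(start1, start2)
    return min_len <= fragment_len <= max_len


if __name__ == "__main__":
    # Mates facing each other, 250 bases end to end.
    print(is_concordant(100, 150, False, 300, 350, True))
    # Same positions, but the mates face away from each other.
    print(is_concordant(100, 150, True, 300, 350, False))
    # Facing each other, but too far apart.
    print(is_concordant(100, 150, False, 1000, 1050, True))
```

Only the first pair passes: the second has the wrong relative orientation and the third exceeds the maximum fragment length, so both would count as discordant under this simplified rule.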
To clarify what you mean by reads "pointing" to each other, take the following example:
Are these two reads pointing to each other?
The second read wouldn't align to the given stretch of DNA; read1 would map as unpaired.
How then would two reads be pointing to each other? Would it be like in the following example?
I am just having trouble understanding what "pointing to each other" means in this context.
If the original sequence of read1 was ACAAGATGCCATTGTCCC and that for read2 was GGGCAGCGGTGGCCGTGG, then yes.
How did you come up with read 2?
It's the reverse complement of the sequence you posted. I'm guessing that your background isn't biology :)
You guessed right! I just have one more question: why the complement and not the sequence itself?
DNA is double stranded, with the opposite strand being the complement. So, if one strand is
Since sequence is always presented 5' to 3', you need the reverse complement. You'd be well served to take a few biology classes; your life would be easier.
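If it helps to see it spelled out, here is a minimal sketch of taking a reverse complement in Python: swap each base for its pairing partner, then reverse the string so the result again reads 5' to 3'. The function name is just for illustration, and only A, C, G and T (upper or lower case) are handled; any other character is passed through unchanged.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    # Complement every base, then reverse so the result reads 5'->3'.
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    # The two read sequences quoted earlier in the thread.
    for read in ("ACAAGATGCCATTGTCCC", "GGGCAGCGGTGGCCGTGG"):
        print(read, "->", reverse_complement(read))
```

Applying the function twice returns the original sequence, which is a quick sanity check.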
http://www.cureffi.org/2012/12/19/forward-and-reverse-reads-in-paired-end-sequencing/
Hi, I have a small question about that. I have a dataset of Illumina paired-end strand-specific reads. I assembled it de novo using Trinity, and now I'm assessing its quality using several metrics. One of them is aligning the initial reads to the new contigs. For that I'm using Bowtie2, but I wonder: given a forward read and a contig, does Bowtie2 try to align the read to both strands? I always assumed that the answer is yes, but just to be sure.