This is probably a beginner question, but I am not sure how to go about this. I am using bowtie2 to align RNA-Seq Illumina paired-end yeast reads to transcripts reconstructed from those reads using Trinity.
Bowtie2 produces alignments in the SAM format. I've read the SAM specification in detail, but I still don't know how to read the alignments of these paired-end reads so that the two mates of the same alignment are grouped together. For example, the SAM file could contain the record for the first mate, then records for some other reads, and only then the record for the second mate. How do I know that this second mate record belongs to the same alignment as the first mate record?
This bothers me because some reads could map well to multiple positions. If the records for corresponding mates are not written one directly after the other (according to the SAM paper, they don't have to be), I have no idea how to tell apart the different alignments of the same read and its mate. I hope this is clear enough.
Thank you. What do you mean when you say that only one of the reads can have multiple mappings? What does that look like?
By multiple mappings, I was thinking of a case where the paired-end read (both mates, concordantly) maps equally well, or similarly well, to different positions on the same transcript. Is that not possible? If not, how can only one mate have multiple alignments? What happens to the other one then?
I didn't say that only one mate in a pair can have multiple mappings. Rather, I said that such cases are possible. Typically this will happen when you have one read mapping to a unique sequence and the other to a repetitive element. The case you mentioned will also happen and you'll have to use the strategy I outlined in my answer to pair things properly.
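One way to do this pairing, sketched here under stated assumptions (it may differ from the strategy in the answer), is to use only fields defined in the SAM specification: QNAME identifies the read pair, FLAG bits 0x40 and 0x80 say which mate a record describes, and RNEXT/PNEXT give the position of the mate. The sketch assumes that the aligner fills RNEXT/PNEXT for every reported paired alignment so that the two records of one alignment point at each other; check this against your own output. The function names and the example records are made up for illustration.

```python
from collections import defaultdict

# FLAG bits from the SAM specification.
PAIRED, UNMAPPED, MATE_UNMAPPED = 0x1, 0x4, 0x8
FIRST_IN_PAIR = 0x40


def parse_sam_line(line):
    """Pull out the fields needed for pairing (hypothetical helper)."""
    f = line.rstrip("\n").split("\t")
    rname = f[2]
    rnext = rname if f[6] == "=" else f[6]  # "=" means same reference as RNAME
    return {"qname": f[0], "flag": int(f[1]), "rname": rname,
            "pos": int(f[3]), "rnext": rnext, "pnext": int(f[7])}


def pair_key(rec):
    """Key shared by both mates of one alignment: (QNAME, mate1 locus, mate2 locus)."""
    own = (rec["rname"], rec["pos"])
    mate = (rec["rnext"], rec["pnext"])
    if rec["flag"] & FIRST_IN_PAIR:
        return (rec["qname"], own, mate)
    return (rec["qname"], mate, own)


def group_mates(lines):
    """Group records into pairs regardless of their order in the file."""
    pairs = defaultdict(dict)
    for line in lines:
        if line.startswith("@"):  # header line
            continue
        rec = parse_sam_line(line)
        flag = rec["flag"]
        # Skip unpaired reads and pairs where either mate is unmapped.
        if not (flag & PAIRED) or (flag & (UNMAPPED | MATE_UNMAPPED)):
            continue
        which = "mate1" if flag & FIRST_IN_PAIR else "mate2"
        pairs[pair_key(rec)][which] = rec
    # Keep only keys for which both mates were seen.
    return {k: v for k, v in pairs.items() if len(v) == 2}


def example_record(qname, flag, rname, pos, pnext, tlen):
    """Build a made-up SAM record (illustrative data only)."""
    return "\t".join([qname, str(flag), rname, str(pos), "42", "50M",
                      "=", str(pnext), str(tlen), "*", "*"])


# Read pair r1 has two alignments on transcript tA; the records are interleaved.
EXAMPLE_SAM = [
    example_record("r1", 99, "tA", 100, 300, 250),
    example_record("r2", 99, "tB", 50, 200, 200),
    example_record("r1", 355, "tA", 1100, 1300, 250),   # secondary alignment
    example_record("r1", 147, "tA", 300, 100, -250),
    example_record("r2", 147, "tB", 200, 50, -200),
    example_record("r1", 403, "tA", 1300, 1100, -250),  # secondary alignment
]

pairs = group_mates(EXAMPLE_SAM)
for key in sorted(pairs):
    print(key)
```

Because the key contains the positions of both mates, two alignments of the same read pair get different keys even when their records are scattered through the file. A pair in which one mate is unique and the other maps to a repeat also works: the unique mate's position stays the same across keys while the other mate's position changes.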