Question

SAM format paired-end alignment handling

9.6 years ago

darxsys ▴ 240

This is probably a beginner question, but I am not sure how to go about this. I am using bowtie2 to align RNA-Seq Illumina paired-end yeast reads to transcripts reconstructed from those reads using Trinity.

bowtie2 writes its alignments in SAM format. I've read the SAM specification in detail, but I still don't know how to read the alignments of these paired-end reads so that mates belonging to the same alignment are grouped together. For example, the record for the first mate could be written, followed by records for other reads, and only then the record for the second mate. How do I know that this second-mate record belongs to the same alignment as the first-mate record? This bothers me because some reads could map well to multiple positions, and I would then have no way to tell the different alignments of a read and its mate apart if the corresponding mate records are not written one right after the other (according to the SAM paper, they don't have to be). I hope this is clear enough.

RNA-Seq paired-end SAM • 7.3k views

updated 23 months ago by Ram 44k • written 9.6 years ago by darxsys ▴ 240

Ram · Accepted Answer · 2015-05-26

9.6 years ago

Devon Ryan 105k

Have a look at the 7th and 8th columns. The 7th column (RNEXT) gives the reference contig of the mate. The 8th column (PNEXT) gives the mapping position of the mate. Those two combined should allow you to properly match mates. I should note that it's possible to have multiple mappings for only one of the reads in a pair, in which case only one of them will have multiple entries (and their PNEXT and RNEXT values will all be the same).
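To make the matching concrete, here is a minimal Python sketch of that strategy. It assumes a plain-text SAM file read line by line; the function names (`parse_sam_line`, `pair_mates`) are made up for illustration and are not part of any library. It groups records by read name (QNAME), uses FLAG bits 0x40 and 0x80 to tell first mates from second mates, and pairs two records when either one's RNEXT/PNEXT points at the other's RNAME/POS. An RNEXT of "=" means "same reference as RNAME".

```python
from collections import defaultdict


def parse_sam_line(line):
    """Pull the columns needed for mate matching out of one SAM record."""
    f = line.rstrip("\n").split("\t")
    rnext = f[6]
    return {
        "qname": f[0],
        "flag": int(f[1]),
        "rname": f[2],
        "pos": int(f[3]),
        # "=" in RNEXT is shorthand for "same reference as RNAME"
        "rnext": f[2] if rnext == "=" else rnext,
        "pnext": int(f[7]),
    }


def points_at(a, b):
    """True if record a's RNEXT/PNEXT is the location of record b."""
    return (a["rnext"], a["pnext"]) == (b["rname"], b["pos"])


def pair_mates(sam_lines):
    """Return (first_mate, second_mate) tuples for mapped records."""
    by_name = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        rec = parse_sam_line(line)
        if rec["flag"] & 0x4:  # this record is unmapped
            continue
        by_name[rec["qname"]].append(rec)

    pairs = []
    for recs in by_name.values():
        firsts = [r for r in recs if r["flag"] & 0x40]
        seconds = [r for r in recs if r["flag"] & 0x80]
        for a in firsts:
            for b in seconds:
                # One direction is enough: a multi-mapped mate can point at
                # a uniquely mapped mate that points back at only one copy.
                if points_at(a, b) or points_at(b, a):
                    pairs.append((a, b))
    return pairs
```

Calling `pair_mates(open("aln.sam"))` (the file name is a placeholder) would return the pairs regardless of the order in which the records appear in the file. Two alignments of the same mate that start at exactly the same position are not distinguished by this sketch.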

updated 23 months ago by Ram 44k • written 9.6 years ago by Devon Ryan 105k

Thank you. What do you mean when you say that only one of the reads can have multiple mappings? What does that look like?

By multiple mappings, I meant a case where the paired-end read (both mates, concordantly) maps equally or similarly well to different positions on the same transcript. Is that not possible? If it is not, how can only one mate have multiple alignments? What happens to the other one then?

updated 23 months ago by Ram 44k • written 9.6 years ago by darxsys ▴ 240

I didn't say that only one mate in a pair can have multiple mappings. Rather, I said that such cases are possible. Typically this will happen when you have one read mapping to a unique sequence and the other to a repetitive element. The case you mentioned will also happen and you'll have to use the strategy I outlined in my answer to pair things properly.
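If you want to see how often the "one unique mate, one multi-mapped mate" situation occurs in a file, a rough Python sketch is to count mapped records per read name and mate. The helper names below (`count_mate_alignments`, `lopsided_pairs`) are hypothetical; the only things taken from the SAM format are the FLAG bits 0x4 (unmapped), 0x40 (first mate) and 0x80 (second mate).

```python
from collections import Counter


def count_mate_alignments(sam_lines):
    """Count mapped records per (read name, mate number)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:  # unmapped record
            continue
        if flag & 0x40:
            counts[(qname, 1)] += 1
        elif flag & 0x80:
            counts[(qname, 2)] += 1
    return counts


def lopsided_pairs(counts):
    """Read names where one mate has exactly one record and the other has several."""
    names = {qname for qname, _ in counts}
    result = []
    for qname in sorted(names):
        n1, n2 = counts[(qname, 1)], counts[(qname, 2)]
        if min(n1, n2) == 1 and max(n1, n2) > 1:
            result.append(qname)
    return result
```

Read names returned by `lopsided_pairs` are the ones where the repeated mate's records should all carry the same RNEXT and PNEXT, namely the location of the uniquely mapped mate.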

updated 23 months ago by Ram 44k • written 9.6 years ago by Devon Ryan 105k