Find mates between all alignments in a SAM file
9.4 years ago
Pfs ▴ 580

I am interested in finding actual pairs of mate reads in a SAM file, which possibly contains multiple alignments for the same pair of reads.

Can I assume that the QNAME field is the same for mates (except for /1 or /2)? If I have multiple alignments for the same pair, can I trust a quick check on the reciprocal PNEXT field?

Thanks in advance

RNASeq SAM • 7.7k views

Hello Pfs!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=60951

This is typically not recommended as it runs the risk of annoying people in both communities.


Sorry! I was just too eager to get an answer. Thanks for pointing it out.

9.4 years ago

Can I assume that the QNAME field is the same for mates (except for /1 or /2)?

Yes, the QNAME will be identical for both mates. However, the /1 or /2 suffix is typically removed from the QNAME in SAM/BAM output, so you can't tell from the QNAME alone whether a read was the first or the second in the pair. Use the bitwise FLAG (2nd column) instead: bit 0x40 marks the first read in the pair and bit 0x80 the second. Check out this link: https://broadinstitute.github.io/picard/explain-flags.html
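A minimal Python sketch of the FLAG check, assuming plain SAM text input. The bit values (0x1 paired, 0x40 first in pair, 0x80 second in pair) come from the SAM specification. The helper names `normalize_qname` and `mate_number` are hypothetical, and stripping a literal `/1` or `/2` suffix is only a fallback for aligners that leave it in place.

```python
# FLAG bit values from the SAM specification
FLAG_PAIRED = 0x1
FLAG_READ1 = 0x40  # first segment in the template
FLAG_READ2 = 0x80  # last segment in the template


def normalize_qname(qname: str) -> str:
    """Strip a trailing /1 or /2, in case the aligner kept it (hypothetical helper)."""
    if qname.endswith(("/1", "/2")):
        return qname[:-2]
    return qname


def mate_number(flag: int) -> int:
    """Return 1 or 2 for a paired read, or 0 if unpaired or the bits are ambiguous."""
    if not flag & FLAG_PAIRED:
        return 0
    first = bool(flag & FLAG_READ1)
    second = bool(flag & FLAG_READ2)
    if first and not second:
        return 1
    if second and not first:
        return 2
    return 0
```

For example, `mate_number(99)` returns 1 and `mate_number(147)` returns 2, the two FLAG values of a typical properly paired forward/reverse pair.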

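Together with RNEXT (7th column), the PNEXT field (8th column) gives you the position of the mate read. Note that the SAM specification defines RNEXT/PNEXT as the position of the mate's primary alignment, so for secondary alignments the values you see depend on the aligner.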
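A sketch of the reciprocal check described in the question, assuming tab-separated SAM lines with the 11 mandatory columns. `SamRecord`, `parse_sam_line` and `look_like_mates` are hypothetical names. Two records are treated as mates when they share a QNAME, one is flagged as the first read and the other as the second, and each record's RNEXT/PNEXT points at the other's RNAME/POS. Because RNEXT/PNEXT refer to the mate's primary alignment, expect this to work for primary alignments but not necessarily for secondary ones.

```python
from typing import NamedTuple

FLAG_READ1 = 0x40  # first read in the pair
FLAG_READ2 = 0x80  # second read in the pair


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int
    rnext: str
    pnext: int


def parse_sam_line(line: str) -> SamRecord:
    """Keep only the columns needed for the mate check (QNAME, FLAG, RNAME, POS, RNEXT, PNEXT)."""
    f = line.rstrip("\n").split("\t")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), f[6], int(f[7]))


def _mate_ref(rec: SamRecord) -> str:
    # "=" in RNEXT means the mate is on the same reference as RNAME
    return rec.rname if rec.rnext == "=" else rec.rnext


def look_like_mates(a: SamRecord, b: SamRecord) -> bool:
    """True if a and b have the same QNAME, opposite read1/read2 bits and point at each other."""
    opposite_mates = (a.flag & FLAG_READ1 and b.flag & FLAG_READ2) or (
        a.flag & FLAG_READ2 and b.flag & FLAG_READ1
    )
    return (
        bool(opposite_mates)
        and a.qname == b.qname
        and _mate_ref(a) == b.rname and a.pnext == b.pos
        and _mate_ref(b) == a.rname and b.pnext == a.pos
    )
```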

Usually, if a read has multiple alignments, one of them (the one with the best alignment score, or one chosen at random when several alignments tie for best) is marked as the primary alignment and the rest are marked as secondary. You should look for QNAMEs that have secondary alignments. You can use the NH:i tag, which gives the number of hits (reported alignments) for a read, or you can use the bitwise FLAG, where bit 0x100 marks a secondary alignment.

Normally, most such read pairs have one read with a unique alignment, while the other read may have multiple alignments. So you will see a single entry for the uniquely aligned read and several entries for the other read. The aligner decides how many alignments are written to the BAM file, but most aligners let you set this number. Read pairs in which both reads have multiple alignments are relatively less frequent.
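A sketch of the grouping step, assuming uncompressed SAM text (for BAM you would decode it first, for example with `samtools view`). It counts primary and secondary alignments (FLAG bit 0x100) per QNAME and per mate, and records the NH:i value when present. Skipping supplementary alignments (bit 0x800) is a simplifying choice. `count_alignments` and `multimapped_qnames` are hypothetical helper names.

```python
from collections import defaultdict

FLAG_READ1 = 0x40
FLAG_READ2 = 0x80
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def nh_value(fields):
    """Return the NH:i value from the optional columns, or None if the tag is absent."""
    for tag in fields[11:]:
        if tag.startswith("NH:i:"):
            return int(tag[5:])
    return None


def count_alignments(sam_lines):
    """Map QNAME -> {mate: [n_primary, n_secondary, NH]}; mate is 1, 2, or 0 if unpaired."""
    counts = defaultdict(dict)
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & FLAG_SUPPLEMENTARY:
            continue  # chimeric pieces are left out of this count
        mate = 1 if flag & FLAG_READ1 else 2 if flag & FLAG_READ2 else 0
        # NH is taken from the first record seen for this mate
        entry = counts[qname].setdefault(mate, [0, 0, nh_value(fields)])
        if flag & FLAG_SECONDARY:
            entry[1] += 1
        else:
            entry[0] += 1
    return counts


def multimapped_qnames(counts):
    """QNAMEs where at least one mate has a secondary alignment, sorted."""
    return sorted(
        q for q, mates in counts.items()
        if any(n_sec > 0 for _, n_sec, _ in mates.values())
    )
```

Pairs returned by `multimapped_qnames` are the ones where the reciprocal RNEXT/PNEXT check needs extra care, since only the primary alignments are guaranteed to point at each other.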


Thank you for your reply!
