Question

Pysam Parses Certain Read Pairs Out Of Order?



14.4 years ago

User 9996 ▴ 840

hi all,

I am parsing a set of paired-end reads using Pysam. Before parsing my ".bam" file with Pysam, I make sure that it is coordinate-sorted (by calling samtools sort), so that reads with smaller genomic coordinates precede ones with larger genomic coordinates.

I found that for certain read pairs, even though the "first" read end of the pair comes first in the file, pysam marks it as read2 (is_read2 returns True for it), while the "second" read end of the pair is marked as read1. For example, in the following SAM file:

HWUSI-EASXXX_0001:6:99:772:1104#0    147    10    98472853    255    36M    98472914    0    AGACAAGATTTGGCCAAAGCTTCGAGTACTTGCAAG    ggggegggggegggggdgdccggggggfggfggggf    NM:i:0
HWUSI-EASXXX_0001:6:99:772:1104#0    99    10    98472914    255    10M384N26M    =    98472853    0    CTGGTGAAAGGTATAATTGACAGCACAGTCTCAGAG    eWdfegdgeggfagggdgg_dgdggggggfgbe_eg    NM:i:0    XS:A:+    NS:i:0

The read that appears first in the file is the one with the smaller genomic coordinate (98472853). However, is_read2 is true for that first read, while is_read1 is true for the second read (whose genomic coordinate is 98472914).

Does this mean that this read pair is problematic, or is this a technical issue? Any advice on this would be greatly appreciated. thanks.

sam samtools next-gen sequencing python • 3.6k views

updated 6.2 years ago by Ram 44k • written 14.4 years ago by User 9996 ▴ 840

score 5 · Answer 1 · 2010-07-14

First and second reads are defined based on the output from the sequencing machine, not on their genomic position.

If the genomic insert you are sequencing with a pair is oriented in the forward direction, then the reads appear as you expect: read1 first in the genome, then read2. In your case, the insert is oriented in the reverse direction, so read2 comes first in the genome and read1 follows.
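To check this on the two records from the question, it may help to decode their FLAG values (147 and 99) by hand. The sketch below uses only the bit definitions from the SAM specification; the function name `decode_flag` and the dictionary keys are made up for illustration, chosen to mirror the pysam attributes `is_read1`, `is_read2`, `is_reverse` and `mate_is_reverse`, which read the same bits.

```python
# SAM FLAG bits, as defined in the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
REVERSE = 0x10       # this read aligns to the reverse strand
MATE_REVERSE = 0x20  # its mate aligns to the reverse strand
READ1 = 0x40         # first read of the pair, as output by the sequencer
READ2 = 0x80         # second read of the pair


def decode_flag(flag):
    """Hypothetical helper: split a SAM FLAG into the bits relevant here."""
    return {
        "is_paired": bool(flag & PAIRED),
        "is_proper_pair": bool(flag & PROPER_PAIR),
        "is_reverse": bool(flag & REVERSE),
        "mate_is_reverse": bool(flag & MATE_REVERSE),
        "is_read1": bool(flag & READ1),
        "is_read2": bool(flag & READ2),
    }


# (position, flag) for the two records in the question, in file order.
for pos, flag in [(98472853, 147), (98472914, 99)]:
    bits = decode_flag(flag)
    mate = "read1" if bits["is_read1"] else "read2"
    strand = "-" if bits["is_reverse"] else "+"
    print(pos, flag, mate, strand)
# prints:
# 98472853 147 read2 -
# 98472914 99 read1 +
```

So the read2/read1 labels come straight from the FLAG column written by the aligner; coordinate sorting only changes the order of the records, not those bits.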

Whether it is a problem depends on the construction of the library. If it was generated to be directional then you may want to filter out read pairs which are not in the designed orientation. If inserts are cloned so that they can be in either direction then the results you're seeing are expected.
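If you do decide to filter by orientation, one possible approach is to work out from either mate's FLAG which strand read1 landed on, and keep only pairs that match what the protocol should produce. This is only a rough sketch: `read1_is_forward`, `keep_pair` and the `want_read1_forward` parameter are hypothetical names, and which read1 strand counts as the "designed" one (possibly per gene or per strand of the feature) is an assumption you would need to take from your library prep, not something the code can know.

```python
# SAM FLAG bits, as defined in the SAM specification.
REVERSE = 0x10       # this read aligns to the reverse strand
MATE_REVERSE = 0x20  # its mate aligns to the reverse strand
READ1 = 0x40
READ2 = 0x80


def read1_is_forward(flag):
    """Hypothetical helper: strand of read1, judged from either mate's FLAG.

    Returns True/False, or None if the record is marked as neither
    read1 nor read2 (e.g. single-end data).
    """
    if flag & READ1:
        # This record is read1, so look at its own strand bit.
        return not (flag & REVERSE)
    if flag & READ2:
        # This record is read2, so read1 is the mate.
        return not (flag & MATE_REVERSE)
    return None


def keep_pair(flag, want_read1_forward=True):
    """Keep a record only if read1 is on the wanted strand (an assumption)."""
    orientation = read1_is_forward(flag)
    return orientation is not None and orientation == want_read1_forward


# Both records from the question agree that read1 is on the forward strand.
print(read1_is_forward(147), read1_is_forward(99))
```

With pysam, the same test could be applied to each record's `flag` attribute while iterating over the file. Because both mates of a pair give the same answer, a pair is kept or dropped as a whole.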