understanding pysam.AlignmentFile output
1
0
Entering edit mode
4.9 years ago
bitpir ▴ 250

I am new to pysam and I'm trying to figure out the output of each column of pysam.AlignmentFile. My code is as follow:

for b in pysam.AlignmentFile(bam,'rb'):
    print (b)

The output I see is this:

HWI-ST1285:282:C8MPEACXX:6:2211:11377:16289 163 0   10281   12  31S29M40S   0   10369   29  CCCTCACCCTCACCCTCACCCTCACCCTCACCCCTACCCCCAACCCCAACCCCAACCCCACACCCCACCCCAAACCCACCACCCACACCTCCCCCCACCC    array('B', [34, 34, 31, 37, 37, 37, 35, 37, 30, 35, 39, 39, 35, 40, 38, 31, 37, 34, 38, 38, 40, 36, 38, 35, 37, 39, 40, 40, 8, 34, 38, 33, 35, 31, 31, 30, 8, 23, 33, 37, 39, 33, 33, 37, 14, 20, 31, 26, 31, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]) [('NM', 1), ('MD', '5A23'), ('AS', 24), ('XS', 35), ('RG', '6_AD016')]
HWI-ST1285:282:C8MPEACXX:6:2211:11377:16289 83  0   10369   25  58M1I12M1D29M   0   10281   100 CACCCTCACCCTAACCCTAACCCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA    array('B', [2, 2, 33, 35, 32, 27, 27, 18, 33, 34, 30, 32, 23, 7, 33, 35, 33, 33, 31, 28, 33, 32, 32, 27, 24, 18, 6, 25, 33, 34, 32, 26, 29, 21, 32, 33, 30, 30, 26, 13, 36, 39, 34, 35, 28, 22, 32, 40, 38, 34, 40, 37, 40, 38, 37, 28, 30, 21, 41, 39, 38, 35, 39, 37, 33, 38, 39, 39, 35, 37, 36, 34, 38, 39, 36, 34, 34, 39, 37, 38, 36, 36, 36, 40, 40, 38, 38, 32, 35, 39, 37, 37, 26, 33, 32, 37, 35, 27, 30, 31])    [('NM', 5), ('MD', '0A5A63^C28C0'), ('AS', 78), ('XS', 78), ('RG', '6_AD016')]

However, when I use samtools view to view bam, I got this:

 HWI-ST1285:282:C8MPEACXX:6:2211:11377:16289    163 chr1    10282   12  31S29M40S   =   10370   188 CCCTCACCCTCACCCTCACCCTCACCCTCACCCCTACCCCCAACCCCAACCCCAACCCCACACCCCACCCCAAACCCACCACCCACACCTCCCCCCACCC    CC@FFFDF?DHHDIG@FCGGIEGDFHII)CGBD@@?)8BFHBBF/5@;@###################################################    NM:i:1  MD:Z:5A23   AS:i:24 XS:i:35 RG:Z:6_AD016

HWI-ST1285:282:C8MPEACXX:6:2211:11377:16289 83  chr1    10370   25  58M1I12M1D29M   =   10282   -188    CACCCTCACCCTAACCCTAACCCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA    ##BDA<<3BC?A8(BDBB@=BAA<93':BCA;>6AB??;.EHCD=7AIGCIFIGF=?6JHGDHFBGHHDFECGHECCHFGEEEIIGGADHFF;BAFD<?@    NM:i:5  MD:Z:0A5A63^C28C0   AS:i:78 XS:i:78 RG:Z:6_AD016

I notice that the 9th column is different (29 and 100 from pysam and 188 and -188 from samtools view). Why is this so? Also, from the samtools view output, I can see that the difference between 10370 - 10282 is 88, but why is 100 being added to the 9th column (template length) to yield 188?

Thanks for your help!

pysam sam bam • 2.6k views
ADD COMMENT
0
Entering edit mode
4.9 years ago
onestop_data ▴ 330

Hi I debugged it on my side, and I concluded that the 9th column on pysam will be the alignment length (try b.alen in your for loop), but if you read the SAM manual it says that the 9th column is the template length (which you can get from on the pysam object with b.template_length)

The bottom line, you are good to go it is just the order or which column is shown. Install some IDE such as Pycharm and it should be easy for you to debug things like this.

Let me know if it is clear.

ADD COMMENT

Login before adding your answer.

Traffic: 1667 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6