Dear all,
I have indexed bam file, and I want to print all alignments on chromosome "2" using pysam.
My code is
import pysam
bam = pysam.AlignmentFile("Aligned.sortedByCoord_rep1.out.bam", "rb")
for line in bam.fetch("2"):
print line
What I get is:
D00733:162:CADM2ANXX:2:1311:7466:63232 163 18 3328 255 1S47M2S 18 3419 47 TTGCTTAGTGTCCGAAATACCATCCTCAAGGCTAAGAACTAAATCGATTA array('B', [33, 33, 33, 33, 33, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38]) [('NH', 1), ('HI', 1), ('AS', 70), ('nM', 10)]
D00733:162:CADM2ANXX:2:1215:11810:89291 163 18 3356 255 44M6S 18 3441 44 GGCTAAGAACTAAATCGATTATTCTGGCTCGTAACGCATATAATATGGCA array('B', [33, 33, 33, 32, 33, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 35, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 37, 38, 38, 38, 38, 38, 38, 33]) [('NH', 1), ('HI', 1), ('AS', 70), ('nM', 10)]
So there are 2 questions: 1. Why pysam does not preserve format of bam file? 2. Why does it print out chromosome 18 instead of 2?
I would appreciate if somebody points out on what am I doing wrong. Thanks
Thanks for your reply, now it makes sense. My original task is a bit different from what I have posted (In my post I wanted to check whether I do everthing correct).
What I need to do is read everyline of bam file, and save only multimap alignments. So what I was palnning to do is following:
So it seems that this method of pysam will not work for me, will it? Are any other ways of reading bam file line by line? Thank you
As i.sudbery metion in his response, the
has_tag
andget_tag
objects will resolve your issue