Error Filtering BAM file based on mapping qualities in python's pysam
1
1
Entering edit mode
8.1 years ago
ammarsabir15 ▴ 70

I am using this code to filter a bam file for mapping quality but its not working rightly. Instead of producing a new file it is giving strange characters.

pysam.view("-q", "30","-b",filename,"pysam.bam")

Where filename is the name of bam file i.e 'reads.sorted.bam'

Output:

\x1f\x8b\x08\x04\x00\x00\x00\x00\x00\xff\x06\x00BC\x02\x00\xbf\x00sr\xf4e\x9c\xc1\xc0\xc0\xe0\xe0\xe1\xc2\x19\xe6ge\xa8g\xcc\x19\xeco\x95\x9c '....


How can I get a filtered file based on mapping qualities using above code??

next-gen alignment pysam • 3.8k views
ADD COMMENT
0
Entering edit mode

your output looks like a binary header of a valid BAM :

~$ hexdump input.bam 
0000000 8b1f 0408 0000 0000 ff00 0006 4342 0002
(...)
ADD REPLY
0
Entering edit mode

The complete output of my above python script is given below: \x1f\x8b\x08\x04\x00\x00\x00\x00\x00\xff\x06\x00BC\x02\x00\xbf\x00sr\xf4e\x9c\xc1\xc0\xc0\xe0\xe0\xe1\xc2\x19\xe6ge\xa8g\xcc\x19\xeco\x95\x9c\x9f_\x94\x92\x99\x97X\x92\xca\xe5\x10\x1c\xc8\x19\xecg\x95\x9eYcdjhalhhVS\x94\x9aV\xe3\xe7\x1cohdid\xa0gX\xc3\xe9\x03\xd4ifjf\xc9\xe5\x10\xe0\xce\xe9\xe9b\x95T\x9e\xc8\x19\xe0\x07\xa6\x80\x86\x1a\xe8\x99\xeb\x19\x1a\xe9\x16\x19\x1a\x18[r:\xfb\x80\x84\x15\x8a\x13s\x8bS\x15\x8a\x92\x8b\x8a\xf5\xd2\x12\x15\x12s\xf2",,\xf4\x8a\x133\x15"\n\x8aR\x8b\x8b\x81\x9c\xb4\xc4\xe2\x92B.F\xa0\xdb\xe4\x80\x18\xaf\xfd\x0c;\x1d\x18\x18\x00\x99\x18\x95\x8c\xca\x00\x00\x00\x1f\x8b\x08\x04\x00\x00\x00\x00\x00\xff\x06\x00BC\x02\x00\x1b\x00\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00'

Don't know whether it looks like header or not because your command for header produces larger no of lines, as given below

hexdump reads.sorted.bam | head

output:

0000000 8b1f 0408 0000 0000 ff00 0006 4342 0002 0000010 00bf 7273 65f4 c19c c0c0 e0e0 c2e1 e619 0000020 6567 67a8 19cc 6fec 9c95 5f9f 9294 9799 0000030 9258 e5ca 1c10 19c8 67ec 9e95 6359 6a64 0000040 6168 686c 6068 5356 9a94 e356 1ce7 606f 0000050 6468 6469 67a0 c358 03e9 69d4 6a66 c966 0000060 10e5 cee0 e9e9 9562 9e54 19c8 07e0 80a6 0000070 1a86 99e8 19eb e91a 1916 181a 725b fb3a 0000080 8480 8a15 7313 538b 8a15 8b92 f58a 12d2 0000090 1215 f273 2c22 f42c 138a 1533 0a22 528a



hexdump reads.sorted.bam |wc -l

output:

229087

ADD REPLY
0
Entering edit mode
8.1 years ago
John 13k

So you've got bgzipped data but you're missing the BAM header. You do have the BAM EOF line though. I have a sneaking suspicion that in your code you are over-writing the file rather than appending to it.

Doesn't pysam have an example of how to write new BAMs on their read_the_docs thing?

ADD COMMENT
0
Entering edit mode

It is three line code as shown in my post, and yes there is a way in pysam docs to create a new file that is given below ,

    header = { 'HD': {'VN': '1.0'},
   'SQ': [{'LN': 1575, 'SN': 'chr1'},
   {'LN': 1584, 'SN': 'chr2'}] }
   with pysam.AlignmentFile(tmpfilename, "wb", header=header) as outf:
   a = pysam.AlignedSegment()
   a.query_name = "read_28833_29006_6945"
   a.query_sequence="AGCTTAGCTAGCTACCTATATCTTGGTCTTGGCCG"
   a.flag = 99
   a.reference_id = 0
   a.reference_start = 32
   a.mapping_quality = 20
   a.cigar = ((0,10), (2,1), (0,25))
   a.next_reference_id = 0
   a.next_reference_start=199
   a.template_length=167
   a.query_qualities = pysam.qualitystring_to_array("<<<<<<<<<<<<<<<<<<<<<:<9/,&,22;;<<<")
   a.tags = (("NM", 1),
   ("RG", "L1"))
   outf.write(a)

But being a beginner I am not getting how to use pysam.view with this command.

ADD REPLY
0
Entering edit mode

Ahh, ok, i see whats going on, you're not actually writing it to a file anywhere. Turns out pysam.view is an undocumented way to call samtools view with a bunch of parameters. In otherwords, you're not really using pysam in your code, you're just asking samtools to do the filtering using -q.

So to change it to write something out, do:

pysam.view("-q", "30","-b",filename,"-o","pysam.bam")

Now pysam will return with "" or an empty string, because it's just wrapping open. But note that this is a really bad thing. Pysam has totally confused you here i feel. If you want to use samtools to filter, use samtools and call it with a standard python subprocess. That is the "one correct way" to subprocess another program. Pysam shouldn't offer this functionality.

However, you can also replicate -q 30 in pysam, but looking at the a.mapping_quality, and if it's >= 30 writing the read to a new file (using all the code above).

I highly recommend you do that, as you'll learn more about using python and pysam, and then when you want to do something like only write out reads with a quality between X and Y, you can use pysam and not two samtools view commands.

ADD REPLY
0
Entering edit mode

John I can use subprocess but just checking if there is a way to do so in python but it seems that can only go with subprocess because when I use a.mapping_qualities then other errors arise like pysam.calignmentfile.AlignmentFile' object has no attribute 'mapping_quality'

ADD REPLY
0
Entering edit mode

Uh-huh, well, that's a different problem. You are asking how to use pysam to read data from a BAM file. I can't really help you there because i've forgotten how pysam works so i'd have to check the manual, and really you should be doing that.

If you can write a program that prints the mapping quality for each read in the file, you're 3/4 of the way there.

ADD REPLY
1
Entering edit mode

By God I finally cracked it, John thanks for guidance.

import pysam
samfile = pysam.Samfile( "reads.sorted.bam", "rb" )
reverseS=''
forwardS='' 
for pileupcolumn in samfile.pileup():    
           if (pileupcolumn.n <= 3) : # if coverage is less than 3 
                  continue
           else:
                  print("\ncoverage at base %s = %s" % (pileupcolumn.pos,pileupcolumn.n))
           for pileupread in pileupcolumn.pileups:
                  if (pileupread.alignment.mapping_quality <= 25):
                         continue
                  if not pileupread.is_del and not pileupread.is_refskip:                
                         if pileupread.alignment.is_reverse: #negative strand hit
                                    reverseS = pileupread.alignment.query_sequence[pileupread.query_position]            
                                    print("Rev %s" % (reverseS))             
                         else:  
                            forwardS = pileupread.alignment.query_sequence[pileupread.query_position]  
                                    print("Fwd %s" % (forwardS))
samfile.close()
ADD REPLY

Login before adding your answer.

Traffic: 3491 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6