unmapped reads with pysam
5.8 years ago

I have a BAM file produced by BWA-MEM. It contains some unmapped reads, e.g.

D00733:389:CD1T7ANXX:3:1101:1572:2235   77  *   0   0   *   *   0   0   CAGTTTCACTGTATAAATTGCTTATACTTAGACATGCATGGCTTAATCTT  AAB=AFGDGCFFGGGGGGGGGCGGGGGGGGGGGGGGGGGGGGGGGGGGGF  AS:i:0  XS:i:0
D00733:389:CD1T7ANXX:3:1101:1572:2235   141 *   0   0   *   *   0   0   GTATCTTCTAGAGAGAGGGAATGGGCGAGAGAAAAAGAGATTTCGGTTTC  BBB@BGGGGGGGGFGGGGGGGGEGGGGGGDGFGGGGGGGGEGGGGGFGGG  AS:i:0  XS:i:0
D00733:389:CD1T7ANXX:3:1101:6797:2243   77  *   0   0   *   *   0   0   TGTCTGGACCTGGTGAGTTTCCCCGTGTTGAGTCAAATTAAGCCGCAGGC  3A<0BDGGGGGGGGGGGGGFGGGGGFGGGGGGGGGGGGGGGGGGGGGGGG  AS:i:0  XS:i:0
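The FLAG column (77 and 141) already explains why these records have no reference. The following sketch decodes the FLAG bits defined in the SAM specification; the short bit names are informal labels chosen for this example, not pysam attribute names.

```python
from typing import List

# Bit values of the SAM FLAG field, as listed in the SAM specification.
# The short names are informal labels chosen for this example.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> List[str]:
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


if __name__ == "__main__":
    for flag in (77, 141):
        print(flag, decode_flag(flag))
```

This prints `77 ['paired', 'unmapped', 'mate_unmapped', 'first_in_pair']` and `141 ['paired', 'unmapped', 'mate_unmapped', 'second_in_pair']`: both reads have 0x4 (read unmapped) and 0x8 (mate unmapped) set, so they form a pair in which neither mate aligned, which is why RNAME is `*`.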

I use pysam to compute some statistics on the BAM file, but for some reason pysam does not find these unmapped reads:

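import pysam

bam_path = "input.bam"  # placeholder: path to the BWA-MEM output
bam = pysam.AlignmentFile(bam_path, "rb")
for read in bam.fetch():
    fields = read.tostring(bam).split("\t")
    # column 3 (RNAME) is "*" when the read is not placed on any reference
    if fields[2] == "*":
        print(fields)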

The loop prints nothing.

Any ideas how to fix this?

Thanks

5.8 years ago
Asaf 10k

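You should add until_eof=True to the fetch() call. Without it, fetch() walks the index reference by reference and only returns reads placed on a reference, so pairs whose RNAME is * (stored at the end of the file) are never reached. With until_eof=True, every record is read sequentially in file order, unmapped ones included.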

Accepted :) Thank you!

5 months ago
weisburd • 0

You can now get all unmapped read pairs efficiently by running AlignmentFile.fetch('*').

see: https://github.com/pysam-developers/pysam/issues/424#issuecomment-2192755497

AFAIK, fetch('*') only returns reads where both mates are unmapped. When one mate of a pair is mapped and the other is unmapped, the unmapped mate is placed next to its mapped mate in the file, so it should be returned by a regular interval query (e.g. fetch('chr1:12345-54321')). In contrast, pairs in which both mates are unmapped are stored at the end of the BAM/CRAM file and therefore require a special query.
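To make the distinction concrete, here is a small classifier over the FLAG bits. It is a sketch that assumes a coordinate-sorted BAM and an aligner that gives an unmapped mate the position of its mapped mate (as BWA-MEM does); retrieval_route is a hypothetical helper. "interval" means a normal region query should return the read; "unplaced" means the read has no coordinate and sits at the end of the file, so you need fetch('*') or a full until_eof=True scan.

```python
# SAM FLAG bits used below (values from the SAM specification).
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def retrieval_route(flag: int) -> str:
    """Hypothetical helper: say which kind of query should return a read
    from a coordinate-sorted, indexed BAM.

    Assumes the aligner gives an unmapped mate the position of its mapped
    mate (BWA-MEM does this), so such reads sit inside the normal
    coordinate range rather than at the end of the file.
    """
    if not (flag & UNMAPPED):
        return "interval"  # mapped read: found by a normal region query
    if (flag & PAIRED) and not (flag & MATE_UNMAPPED):
        return "interval"  # unmapped, but placed next to its mapped mate
    return "unplaced"  # no coordinate: stored at the end of the file


if __name__ == "__main__":
    for flag in (99, 73, 133, 77, 141):
        print(flag, retrieval_route(flag))
```

With these flags, 99, 73 and 133 come out as "interval", while the question's 77 and 141 come out as "unplaced", matching why a plain index-driven fetch() never reached them.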
