Difference between Pysam Pileup and Fetch
1
1
Entering edit mode
5.3 years ago
aalith ▴ 20

I've been attempting to use pysam to find how many reads support a particular SNP vs how many reads do not. I have not been able to exactly pinpoint what pysam pileup is doing. Is it exactly like the samtools function?

For example, to compare what exactly is focused on for each function, I've run this...

for reads in bam.fetch(chrom, start = pos, stop = pos + 1):
    fetch_reads.append(reads.cigarstring)

for refsection in bam.pileup(chrom, start = pos, stop = pos + 1):
    for reads in refsection.pileups:
        pileup_reads.append(reads.alignment.cigarstring)
        if not reads.is_refskip:
            pileup_reads_norefskip.append(reads.alignment.cigarstring)

print(len(fetch_reads))
print(len(pileup_reads))
print(len(pileup_reads_norefskip))

And I get wildly different numbers for what I think are the number of reads at the particular site I'm interested in. 81,157492, and 4856 respectively. Also, when I try to append get_num_aligned(reads) to a list (supposed to return the number of aligned bases at a certain pileup column position), pysam gives me back a list of lists.

pysam pileup reads bam • 2.8k views
ADD COMMENT
0
Entering edit mode
5.1 years ago

I had many troubles with those stuff too. If i remember correctly if you go for the "fetch" way after calling bam.fetch you also have to check if the outputted reads actually overlap the genomics position you searched for (which is not obvious if you deal with spliced reads maybe) since fetch will just report the inferred overlap looking at read start + read end, it does not guarantee you are really reading the desired base.

You can try something like this to see if the situation gets better:

import pysam
fetch_reads=[]
for read in bam.fetch(str(chr), int(position)-1, int(position)):
    if int(position)-1 in read.get_reference_positions():
        fetch_reads.append(read)

To better understand:

import pysam
position = yourPosition
chr = yourChr
bam=pysam.AlignmentFile(yourBam, "rb")


readnamesA=[]
for read in bam.fetch(str(chr), int(position)-1, int(position)):
    readnamesA.append(read.qname)

readnamesB=[]
for read in bam.fetch(str(chr), int(position)-1, int(position)):
    if int(position)-1 in read.get_reference_positions():
        readnamesB.append(read.qname)

testRead=list(set(readnamesA).difference(set(readnamesB)))[1]

for read in bam.fetch(str(chr), int(position)-1, int(position)): 
    if read.qname == testRead:
        print(read.positions)

You will probably see printed gapped genomics positions of the actually covered bases, among which you won't see your original fetched position

ADD COMMENT

Login before adding your answer.

Traffic: 3200 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6