Question

Pysam pileup dropping reads

0

Entering edit mode

5.2 years ago

yryan ▴ 10

Hi folks,

I'm trying to use pysam's pileup in order to count some base changes occurring at certain points in my transcript of interest using MinION data. However my coverage at each base is often three times as large as the total of reads in my final counts and I'm trying to figure out this discrepancy.

My code is the following:

def stats_4605(input_file):
A = 0
T = 0
C = 0
G = 0
total = 0
samfile = pysam.AlignmentFile(input_file, "rb")
for pileupcolumn in samfile.pileup("ref", 4604, 4605):
    if pileupcolumn.pos != 4604:
        continue
    else:
        print(input_file, "\nCoverage at base %s = %s" %
               (pileupcolumn.pos, pileupcolumn.n))
        for pileupread in pileupcolumn.pileups:
            total += 1
            if not pileupread.is_del and not pileupread.is_refskip:
                if pileupread.alignment.query_sequence[pileupread.query_position] == "A":
                    A += 1
                elif pileupread.alignment.query_sequence[pileupread.query_position] == "T":
                    T += 1
                elif pileupread.alignment.query_sequence[pileupread.query_position] == "C":
                    C += 1
                elif pileupread.alignment.query_sequence[pileupread.query_position] == "G":
                    G += 1
        print("Total counted ", total)
        print("A: ", A, "T: ", T, "C: ", C, "G: ", G)
        print("T:A ratio = ", T/A)
        print('')

samfile.close()

An example of the output I get is:

Coverage at base 4604 = 11547
Total counted  4502
A:  3063 T:  577 C:  4 G:  301
T:A ratio =  0.18837740777015996

So far I think the problem is with the line if not pileupread.is_del and not pileupread.is_refskip:

Deleting pileup.is refskip shows no changes, but pileupread.is_del gives an error message:

    Traceback (most recent call last):
  File "/home/user/PycharmProjects/samp_comp_extractor/4605_a_vs_t.py", line 36, in <module>
    stats_4605("6hr_mapped_sorted.bam")
  File "/home/user/PycharmProjects/samp_comp_extractor/4605_a_vs_t.py", line 20, in stats_4605
    if pileupread.alignment.query_sequence[pileupread.query_position] == "A":
TypeError: string indices must be integers

Any idea if these are related or how I could get around these issues? I've tried generating a VCF for these reads but can't get it to produce an AF segment and would prefer more in depth numbers for my use case.

Cheers!

pysam pileup • 2.7k views

ADD COMMENT • link 5.2 years ago by yryan ▴ 10

0

Entering edit mode

The problem with the code is that pileupread.query_position is not an integer. Not sure why.

ADD REPLY • link 5.2 years ago by i.sudbery 21k

0

Entering edit mode

Oh.... Because if the base is deleted from the "query" (i.e. the read), then it has no position in the query_sequence.

ADD REPLY • link 5.2 years ago by i.sudbery 21k

0

Entering edit mode

Eyeballing it in a genomeviewer there are some deleted bases but looks do be much closer to 1 in 10 reads rather than 5-6/10 reads which my script is returning. I'm not sure if its a nanopore issue since it is only single reads and does have what looks to be many deletions but not enough over my region of interest to cause this drop in reads

ADD REPLY • link 5.2 years ago by yryan ▴ 10

0

Entering edit mode

Yeah, sorry, my comment was over what was causing the error from your code, not what was causing the underlying difference in read counts.

I noticed from the pysam documentation that n() includes bases that don't pass the quality filter. Perhaps pileups excludes these? What does get_num_aligned() return?

ADD REPLY • link 5.2 years ago by i.sudbery 21k

0

Entering edit mode

It gives me the same number that my print("Total counted ", total) line gives me. But in the docs it says :

This method applies a base quality filter and the number is equal to the size of get_query_sequences(), get_mapping_qualities(), etc.

This makes me think its a quality score issue, I'll have a dig around seeing if its possibly to set that cutoff lower but if you have any advice there that would be great!

ADD REPLY • link 5.2 years ago by yryan ▴ 10

0

Entering edit mode

I've found set_min_base_quality(self, min_base_quality) and added that to my code on line 15 just after the else which now works perfectly. Thanks for helping point me in the right direction!

ADD REPLY • link 5.2 years ago by yryan ▴ 10