Finding bases at specific sites using pysam pileup
3
0
Entering edit mode
9.7 years ago
kris_A ▴ 40

Hi,

I am trying to write a python script that finds bases at specific sites in each read (aligning to the site). I am using pysam pileup, but cannot figure out how to use it properly. Could someone help me out with this?

the input data is a list of coordinates and a samfile = pysam.AlignmentFile("bam", "rb")

I just want to iterate through each read in a bam aligning to a single base site, and extract bases at each of the sites.

I tried doing this, but it's outputting each base multiple times.

for pileupcolumn in samfile.pileup('1', 14693, 14694):
    for pileupread in pileupcolumn.pileups:
        print pileupread.alignment.query_sequence[pileupread.query_position]

Thank you!

Kristina

sequence • 12k views
ADD COMMENT
4
Entering edit mode
9.7 years ago
kris_A ▴ 40

I found a solution. Ended up reading-in bam file using pysam.Samfile, and then just extracted sequence fragment at pileupcolumn.pos.

import pysam
samfile = pysam.Samfile(bam, "rb")
for pileupcolumn in samfile.pileup('1', 14693, 14694):
    for pileupread in pileupcolumn.pileups:
        if pileupcolumn.pos == 14693:
            base = pileupread.alignment.query_sequence[pileupread.query_position]
            print base
samfile.close()
ADD COMMENT
2
Entering edit mode
9.7 years ago

I am assuming that you would like to see the base present at 14694 position from all the reads.

Something like this would do:

bamfile=pysam.Samfile("input_bam.bam")

for read in bamfile.pileup('1', 14693, 14694):
    print read.seq[14694-read.pos]

Not tested. You need to take care of forward, reverse reads POS and 0/1 based coordinates and Mainly insertions/deletions.

ADD COMMENT
0
Entering edit mode
9.7 years ago
kris_A ▴ 40

Hi, thanks, I tried, and I'm getting an error (the same applies when I use query_sequence instead of seq

AttributeError: 'pysam.calignmentfile.PileupColumn' object has no attribute 'seq'

As much as I understand, I have to iterate over the reads (pileups) in the pileupcolumn, but then I get the extracted bases repeated many times. Any other ideas? Thank you.

ADD COMMENT
0
Entering edit mode

Can you post entire script?

ADD REPLY
0
Entering edit mode

My initial attempt is posted in my first message, and now I just directly plugged in what you posted earlier. when I got an "attribute is not present" I changed it to

for read in bamfile.pileup('1', 14693, 14694):
    print read.query_sequence[14694-read.query_position]

but that was a useless attempt

ADD REPLY
0
Entering edit mode

The reason I asked for entire script is to see how you are reading the bam file.

pysam.AlignmentFile() or pysam.Samfile() ? If you are using pysam.AlignmentFile(), then try something like this:

for read in bamfile.fetch('1', 14693, 14694):
    print read.seq[14694-read.pos]
ADD REPLY
0
Entering edit mode

So I am using pysam.AlignmentFile(), and tried what you suggest, but now I get:

IndexError: string index out of range
ADD REPLY
0
Entering edit mode

Hi kris_A, Did you solve the problem? As far as I know, if I want the retrieve the base at a specific position of the read by something like the following code. The read.pos is supposed to be the order of the specific read instead of the position of reference.

for read in bamfile.fetch('1', 14693, 14694):
    print read.seq[14694-read.pos]
ADD REPLY

Login before adding your answer.

Traffic: 1571 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6