Parsing BAM to genome feature
7.7 years ago
dzb • 0

Hi folks -- haven't had much success in figuring this out through Googling. Hoping for some help here:

My inputs are as follows: a BAM where each alignment has tags for a barcode and UMI. I also have an annotation file in .gff3 format.

My desired output is a table where each alignment in the BAM is broken down into three columns: barcode, UMI, and a given feature from the annotation file (gene, exon, feature, etc.).

Thanks to recent advice obtained here, I'm familiar with how to parse through the BAM file to pluck out the barcode and UMI, but the process of taking the alignment data and returning the genome feature is beyond my thought process. Any ideas? Many thanks.

RNA-Seq BAM SAM annotation • 1.7k views
7.7 years ago

If you install deepTools, then you can use its API to do this:

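from deeptoolsintervals import Enrichment
import pysam

# Build the interval lookup from one or more annotation files
gtf = Enrichment(["something.gtf", "you can use multiple files.gtf"])
bam = pysam.AlignmentFile("alignments.bam")
for b in bam:
    # Unmapped reads have no reference name or aligned blocks
    if b.is_unmapped:
        continue
    # Feature types overlapping the aligned blocks of this read
    o = gtf.findOverlaps(b.reference_name, b.get_blocks())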

For each alignment, o would be something like frozenset(['start_codon', 'transcript', 'gene', 'exon', 'CDS']), i.e. the set of feature types that the read's aligned blocks overlap. You can find further information on the deeptoolsintervals module on GitHub.

You'll need to combine this with something else to get the barcode and UMI, but it seems like you know how to do that already.
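As a rough sketch of that combining step, something like the following could turn each alignment into a barcode / UMI / feature row. The tag names CB and UB are an assumption (the question does not say which tags hold the barcode and UMI), and the helper names feature_rows and write_table are made up for illustration. The functions only rely on the read exposing is_unmapped, has_tag, get_tag, reference_name and get_blocks, which pysam alignments do, and on a lookup callable that takes a chromosome name and a list of blocks.

```python
import csv
import sys


def feature_rows(reads, find_overlaps, barcode_tag="CB", umi_tag="UB"):
    """Yield (barcode, UMI, features) for each mapped read carrying both tags.

    The default tag names are assumptions; substitute whatever your BAM uses.
    """
    for read in reads:
        # Unmapped reads have no reference name or aligned blocks to look up.
        if read.is_unmapped:
            continue
        # Skip reads missing either tag rather than raising KeyError.
        if not (read.has_tag(barcode_tag) and read.has_tag(umi_tag)):
            continue
        hits = find_overlaps(read.reference_name, read.get_blocks())
        # Treat None or an empty result as "no overlapping feature".
        features = ",".join(sorted(hits)) if hits else "NA"
        yield read.get_tag(barcode_tag), read.get_tag(umi_tag), features


def write_table(rows, handle=sys.stdout):
    """Write the rows as a tab-separated table with a header line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["barcode", "umi", "features"])
    writer.writerows(rows)
```

With the objects from the answer above, you would pass the open BAM as reads and the Enrichment object's overlap lookup method as find_overlaps, then hand the result to write_table. Joining the overlapping feature types into one comma-separated column is just one possible choice; if you only care about a single feature type, you could instead test for its presence in the returned set.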


AWESOME. Thanks for this!
