Hello everyone,
I have a SAM file from TopHat and I want to extract multiple reads sharing the same intron boundary, based on the CIGAR string.
If I understood the question correctly, you'd like to access reads that span the exon/intron boundary and contain the exon, which makes it a bit trickier than a simple intersect.
You can't quite use the CIGAR string alone, since it does not contain the coordinate. Working that out from the alignment start position would take some custom programming effort and would duplicate functionality that already exists in other libraries.
If you are able to use PySam, the pileup() method on the last coordinate of the exon might work. The documentation states:
An alternative way of accessing the data in a SAM file is by iterating over each base of a specified region using the pileup() method. Each iteration returns a PileupColumn which represents all the reads in the SAM file that map to a single base in the reference sequence.
http://pysam.readthedocs.io/en/latest/api.html
You will still need to check that the end of the alignment is past the coordinate.
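If PySam is not an option, here is a rough sketch of the custom route mentioned above: walk the CIGAR string from the POS field and record the reference interval of every N (skipped region) operation, then group read names by that interval. It uses only the standard library. The function names (`introns_from_cigar`, `group_reads_by_intron`) and the sample records are made up for illustration, and it assumes a plain-text SAM file with 1-based POS as in the SAM specification. It has not been checked against real TopHat output.

```python
import re
from collections import defaultdict

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases according to the SAM specification.
REF_CONSUMING = set("MDN=X")


def introns_from_cigar(pos, cigar):
    """Return 1-based inclusive (start, end) intervals for each N operation."""
    introns = []
    ref = pos  # reference coordinate of the next base to be consumed
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return introns


def group_reads_by_intron(sam_lines):
    """Map (chrom, intron_start, intron_end) to the names of reads spanning it."""
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":  # unmapped or no CIGAR available
            continue
        for start, end in introns_from_cigar(pos, cigar):
            groups[(rname, start, end)].append(qname)
    return groups


# Hypothetical records, only to show the call pattern.
demo = [
    "@HD\tVN:1.0\tSO:coordinate",
    "read1\t0\tchr1\t100\t255\t50M200N50M\t*\t0\t0\t*\t*",
    "read2\t0\tchr1\t120\t255\t30M200N70M\t*\t0\t0\t*\t*",
    "read3\t0\tchr1\t100\t255\t100M\t*\t0\t0\t*\t*",
]
# Keep only introns supported by more than one read.
shared = {k: v for k, v in group_reads_by_intron(demo).items() if len(v) > 1}
```

For a real file you would pass an open file handle instead of the `demo` list, e.g. `group_reads_by_intron(open("accepted_hits.sam"))`, where the file name is again just a placeholder. Grouping on the full (start, end) pair finds reads sharing the same intron. To match on a single boundary only, key on the start or the end instead.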
You should know that the old 'Tuxedo' pipeline of TopHat(2) and Cufflinks is no longer the advisable choice for RNA-seq analysis. The software is deprecated or in low maintenance and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR and BBMap, or pseudo-alignment using Salmon.
I am not running TopHat. I already have a SAM file from TopHat, and I want to extract multiple reads sharing the same intron boundary based on the CIGAR string.