Hi,
How can I count the number of reads supporting a specific splice junction (e.g. chr1:15000-20000 where 15000 is the splice donor and 20000 is the splice acceptore position) in a RNA-Seq bam file aligned with STAR ?
Thanks
Hi,
How can I count the number of reads supporting a specific splice junction (e.g. chr1:15000-20000 where 15000 is the splice donor and 20000 is the splice acceptore position) in a RNA-Seq bam file aligned with STAR ?
Thanks
Star uses N's in CIGAR strings for reads that were spliced (over introns). You can use that to count total number of spliced reads.
If you want to do this for 1 specific regions I think its easiest to make a sashimi plot with IGV.
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Doesn't STAR's .junc file provide that count?
Thanks Devon I forget about the SJ.out.tab file. However if someone do not have this file and only have the bam file. Any idea?
Ah, I'd forgotten the exact name, thanks.
If you don't have that then you'd have to follow Irsan's suggestion. Of course, this would likely mean needing to code something since I don't personally know anything that would do it. In theory, that should be possible with a bit of pysam to get the splice junctions. The simplest method would then be to write a script to output only splice junctions and then to pipe that to
sort
and thenuniq -c
followed byawk
to reformat. The performance might not be optimal, but for an ad hoc solution I figure that'd work.In the
*.Log.final.out
the number of introns/splices are recorded but I am not sure if this this is the number of spliced reads. I have checked the manual but couldn't find an answer. A read can be spliced multiple times of course.