Hi Biostars community,
I have BAM files with RNA-seq results from ~1000 single-cell samples. For every transcript in each sample, I would like to calculate what percentage of the spliced transcript's length is covered by aligned reads. It would also be great to have this value for the coding part of each transcript only or, ideally, for every exon.
What would be the most straightforward way to do this?
Thanks a lot!
To rephrase: per sample, you want to count how many reads overlap with an exon? In that case: featureCounts. Calculating a percentage afterward shouldn't be too hard, e.g. using R/Python/....
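If what you are after is the fraction of each exon's length that has at least one read on it (rather than a read count), the percentage step could look roughly like the sketch below. It is only an illustration: it assumes you have already extracted the aligned blocks of your reads as 0-based, half-open (start, end) intervals on one chromosome (for example with something like pysam's get_blocks(), so that spliced reads do not count as covering introns), and the function names and coordinates are made up for the example.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end); an assumption of this sketch


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so that no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bases(exon: Interval, covered: List[Interval]) -> int:
    """Number of exon bases overlapped by the (already merged) covered intervals."""
    exon_start, exon_end = exon
    total = 0
    for start, end in covered:
        overlap = min(end, exon_end) - max(start, exon_start)
        if overlap > 0:
            total += overlap
    return total


def percent_covered(exons: List[Interval], read_blocks: List[Interval]):
    """Return (per-exon percentages, percentage of the whole spliced transcript)."""
    covered = merge_intervals(read_blocks)
    per_exon = []
    covered_total = 0
    length_total = 0
    for exon in exons:
        length = exon[1] - exon[0]
        n_covered = covered_bases(exon, covered)
        per_exon.append(100.0 * n_covered / length if length > 0 else 0.0)
        covered_total += n_covered
        length_total += length
    transcript = 100.0 * covered_total / length_total if length_total > 0 else 0.0
    return per_exon, transcript


# Hypothetical transcript with two 100 bp exons and three aligned read blocks.
exons = [(100, 200), (300, 400)]
read_blocks = [(90, 150), (140, 160), (350, 450)]
per_exon, transcript = percent_covered(exons, read_blocks)
print(per_exon, transcript)
```

Restricting the exon list to the CDS intervals of a transcript would give the same percentage for the coding part only.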
I agree with @WouterDeCoster: you can try featureCounts. You need an annotation file (GTF). In the featureCounts function (R package Rsubread), set isGTFAnnotationFile = TRUE (to indicate that the annotation is a GTF file), GTF.featureType = "exon" (the feature type used for read summarization), and GTF.attrType = "exon_id" (the attribute used to group features).
Alternatively, to quantify transcript abundances from RNA-seq data, kallisto could be another way to do the trick (see the manual: https://pachterlab.github.io/kallisto/manual).
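If you would rather drive the command-line version of featureCounts from a script, the same settings map, as far as I know, onto the -F, -t and -g options. The sketch below only builds the command; the file names are placeholders, the helper name is made up, and it assumes your GTF actually carries an exon_id attribute (Ensembl-style GTFs do, others may not).

```python
import shlex
from typing import List


def build_featurecounts_command(
    gtf_path: str,
    bam_paths: List[str],
    output_path: str,
    feature_type: str = "exon",
    attribute: str = "exon_id",
) -> List[str]:
    """Assemble a featureCounts command line (hypothetical helper, nothing is run here)."""
    return [
        "featureCounts",
        "-a", gtf_path,      # annotation file
        "-F", "GTF",         # annotation format
        "-t", feature_type,  # feature type to summarize reads over
        "-g", attribute,     # attribute used to group features
        "-o", output_path,   # output count table
        *bam_paths,          # one or more input BAM files
    ]


# Placeholder file names, for illustration only.
cmd = build_featurecounts_command(
    "annotation.gtf", ["cell_001.bam", "cell_002.bam"], "exon_counts.txt"
)
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing all BAM files in one call gives one column per sample in the output table, which is convenient with ~1000 cells.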