Question

High amount of intronic/intergenic reads in SMARTer stranded total bulk RNAseq

14 months ago

Mat ▴ 80

Hello,

I have bulk RNA-seq data (SMARTer stranded total RNA with ribo-depletion, 100M paired-end 150 bp reads) from 12 human samples, and the QC stats look like this:

- High mapping rate against the genome (~90% with HISAT2/STAR).
- Low mapping rate against the transcriptome with Salmon (3-30%; 7 samples at or below 10%). Alignment-based quantification using the STAR alignments as input didn't increase the mapping rate.
- According to Qualimap, most of the reads map to intronic regions, followed by intergenic regions, e.g.
  - exonic: 8%
  - intronic: 59%
  - intergenic: 33%
  - overlapping exon: 3%
- After trimming with fastp, around 65-75% of the reads map to the genome uniquely, and 15-25% of the reads are multimapping.
- The average input read length and the average mapped length are both ~280 according to STAR.

This is consistent across all 12 samples.

Are there explanations other than genomic DNA contamination for a high proportion of intronic/intergenic reads, and what else could I check?

A similar question was already asked before: High percentage of intronic/intergenic reads in RNA-seq

Thank you very much. MM

RNA-seq DNA SMARTer • 741 views

updated 14 months ago by Ram 44k • written 14 months ago by Mat ▴ 80

Sounds like genomic DNA contamination to me. Even if you had captured nascent (unspliced) RNA, you should still have a much higher coverage over exons.
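One way to make the exon coverage argument concrete is to compare read density rather than raw read fractions, since introns span far more bases than exons. The sketch below is only an illustration: the function name is made up, and the total exonic and intronic lengths are inputs you would have to compute from your own annotation. A ratio close to 1 would be what uniform sampling of the genome looks like, while RNA-derived reads should give a ratio well above 1.

```python
def exon_to_intron_density_ratio(exon_fraction, exon_bp, intron_fraction, intron_bp):
    """Ratio of read density (reads per base) in exons versus introns.

    exon_fraction / intron_fraction: share of reads assigned to each class,
        e.g. as reported by Qualimap (0.08 for 8%).
    exon_bp / intron_bp: total annotated length of each class in bases.
        These are assumptions supplied by the caller, taken from the same
        annotation that was used for the read assignment.

    The total read count cancels out, so only the fractions are needed.
    """
    if exon_bp <= 0 or intron_bp <= 0:
        raise ValueError("region lengths must be positive")
    if exon_fraction < 0 or intron_fraction <= 0:
        raise ValueError("fractions must be non-negative, intron_fraction > 0")
    exon_density = exon_fraction / exon_bp
    intron_density = intron_fraction / intron_bp
    return exon_density / intron_density
```

Call it with the exonic and intronic fractions from the QC report and the summed lengths from your GTF.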

If you want to check more things, take a subset of the reads, map them to the genome (i.e. produce a BAM file), and visualize the alignments in a genome browser.
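The "take some of the reads" step could look roughly like the sketch below. It uses reservoir sampling so the full FASTQ never has to be held in memory, and it keeps mates together. The function names are hypothetical, and the code assumes standard 4-line FASTQ records with R1 and R2 in the same order.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as 4-line tuples from an open text handle
    (or any iterator over lines)."""
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_pairs(r1_handle, r2_handle, n, seed=0):
    """Reservoir-sample up to n read pairs, keeping mates together.

    Assumes both files list the reads in the same order.
    Returns a list of (r1_record, r2_record) tuples.
    """
    rng = random.Random(seed)
    reservoir = []
    pairs = zip(read_fastq(r1_handle), read_fastq(r2_handle))
    for i, pair in enumerate(pairs):
        if i < n:
            # Fill the reservoir with the first n pairs.
            reservoir.append(pair)
        else:
            # Replace an existing entry with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = pair
    return reservoir


def write_pairs(pairs, out1_handle, out2_handle):
    """Write sampled pairs back out as two FASTQ files."""
    for rec1, rec2 in pairs:
        out1_handle.writelines(rec1)
        out2_handle.writelines(rec2)
```

For gzipped input, open the files with `gzip.open(path, "rt")` and pass the handles in. The two output files can then be aligned with the same HISAT2 or STAR settings as the full data set, and the resulting BAM loaded into a genome browser.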

14 months ago by dsull ★ 6.9k