Question

High percentage of intronic/intergenic reads in RNA-seq

2.3 years ago

lluc.cabus ▴ 20

Hello,

I'm working with RNA-seq libraries prepared from human plasma cell-free RNA, which is very fragmented/degraded (the Bioanalyzer shows peaks around 100-200 bp). When we align these reads, most of them map to intronic/intergenic regions, and fewer than 10% map to exons.

Our libraries are total RNA with rRNA depletion.

Seeing this, we suspected DNA contamination, but after a DNase treatment (and confirming by Qubit that no DNA remained), the samples did not improve: the exonic percentage is unchanged. Also, when we view the BAM files in the UCSC Genome Browser, we see many reads mapping all over the genome, which is characteristic of DNA contamination.

Could there be another explanation than DNA contamination for a high quantity of intronic/intergenic reads?

Thank you very much! Lluc

RNA-seq • 3.3k views

updated 2.3 years ago by yhoogstrate ▴ 150 • written 2.3 years ago by lluc.cabus ▴ 20

What is the length distribution of your reads after scanning/trimming? You know the sample was fragmented/degraded, so perhaps the reads became too short after trimming and are simply aligning by chance. A FastQC sequence length distribution plot will be enough to check.
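If FastQC is not at hand, the same check can be sketched in a few lines of Python. This is only an illustration: it assumes an uncompressed FASTQ with exactly four lines per record, and the function names and the 30 nt cutoff are made up for the example rather than taken from any tool.

```python
from collections import Counter


def read_length_histogram(fastq_lines):
    """Count read lengths from an iterable of FASTQ lines.

    Assumes strict 4-line records (header, sequence, '+', qualities).
    """
    lengths = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the sequence is the second line of each record
            lengths[len(line.strip())] += 1
    return lengths


def fraction_shorter_than(lengths, cutoff=30):
    """Fraction of reads shorter than `cutoff` nt (the cutoff is an arbitrary example)."""
    total = sum(lengths.values())
    if total == 0:
        return 0.0
    short = sum(n for length, n in lengths.items() if length < cutoff)
    return short / total
```

Usage would be along the lines of `with open("trimmed.fastq") as fh: hist = read_length_histogram(fh)`, where the file name is a placeholder. A large fraction of very short reads would support the "aligning by chance" explanation.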

2.3 years ago by GenoMax 148k

We also thought about that, but the STAR output shows that the majority of the aligned reads are uniquely aligned. The avg_input_read_length reported by STAR is >150, so I don't think this is the reason.
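For reference, these numbers can be pulled straight from STAR's `Log.final.out`. The sketch below assumes the usual `key | value` layout of that file and the field names "Average input read length" and "Average mapped length"; the function names are made up for the example. One caveat worth checking: as far as I know, for paired-end data STAR reports the summed length of both mates, so a value >150 does not necessarily mean each mate is that long.

```python
def parse_star_log(text):
    """Parse the 'key | value' lines of a STAR Log.final.out into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as 'UNIQUE READS:' carry no value
        key, _, value = line.partition("|")
        stats[key.strip()] = value.strip()
    return stats


def mapped_to_input_length_ratio(stats):
    """Ratio of average mapped length to average input read length."""
    mapped = float(stats["Average mapped length"])
    total = float(stats["Average input read length"])
    return mapped / total
```

A ratio close to 1 means reads align over nearly their full length, which argues against short fragments aligning by chance.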

2.3 years ago by lluc.cabus ▴ 20

Just to confirm: you are getting 150 bp unique read matches in intronic/intergenic regions? At what depth?

2.3 years ago by GenoMax 148k

The avg_mapped_read_length is very similar to the avg_input_read_length (slightly higher). I don't know whether mapping differs between exonic and intronic/intergenic regions; I will check that.

2.3 years ago by lluc.cabus ▴ 20

If it is only one or a few reads, that may be OK, but if you are seeing pileups equivalent to those of real data, that is puzzling. Any chance you are dealing with a contaminated batch of reagents somewhere?

2.3 years ago by GenoMax 148k

I don't think so, because this has happened with multiple batches of samples and with different RNA extraction/library preparation kits.

2.3 years ago by lluc.cabus ▴ 20

I can confirm that the distribution of mapped reads between genic and intergenic regions is the same.

2.3 years ago by lluc.cabus ▴ 20

This could be a bit far-fetched, but maybe you can clarify what else was sequenced on the same sequencing lane as your samples. A couple of times we saw "leakage" between libraries within the same sequencing lane.

2.3 years ago by grant.hovhannisyan ★ 2.6k

It could be an option, but we have seen this in different sequencing runs. I will ask to see if this could be the reason, thanks!

2.3 years ago by lluc.cabus ▴ 20

This would only apply if your samples were mixed with others and sequenced as a super pool.

2.3 years ago by GenoMax 148k

score 0 · Answer 1 · 2022-08-26

2.3 years ago

yhoogstrate ▴ 150

We once had a few samples with DNA contamination, which we could see by looking at the strand of the intergenic reads: they mapped roughly 50/50 to the two strands. In IGV: color alignments by first-of-pair strand.
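The same strand check can be done numerically instead of by eye. The sketch below is an illustration only: it reads SAM text lines (for example the output of `samtools view` restricted to one region), uses the standard SAM flag bits, and the function names are made up for the example. For paired reads it flips the strand of the second mate so that both mates report the strand of the first in pair.

```python
from collections import Counter


def first_in_pair_strand(flag):
    """Strand of the fragment as seen from the first mate, from a SAM flag."""
    reverse = bool(flag & 0x10)            # read aligned to the reverse strand
    if (flag & 0x1) and (flag & 0x80):     # paired, and this is the second mate
        reverse = not reverse              # report the first mate's strand instead
    return "-" if reverse else "+"


def plus_strand_fraction(sam_lines):
    """Fraction of primary mapped alignments whose first-in-pair strand is '+'.

    Returns None when no usable alignments are found.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue                       # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        flag = int(fields[1])
        if flag & (0x4 | 0x100 | 0x800):   # unmapped, secondary, supplementary
            continue
        counts[first_in_pair_strand(flag)] += 1
    total = counts["+"] + counts["-"]
    return counts["+"] / total if total else None
```

This is most informative when run on a single locus, such as one intron of a gene of known orientation: a stranded RNA library should be heavily skewed to one strand there, while a value near 0.5 points towards DNA.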

2.3 years ago by yhoogstrate ▴ 150

Yes, we have tried that and we see that the reads map to both strands. However, we use a DNase treatment to remove the DNA before library preparation, and the quantification shows that there is no DNA in the samples (or at least so little that it is not detectable).

2.3 years ago by lluc.cabus ▴ 20

In that case, given your description of the data, I believe the question "Could there be another explanation than DNA contamination for a high quantity of intronic/intergenic reads?" should be answered with "no / probably not", despite the Qubit analysis. This assumes that your data is stranded and that appropriate primers etc. were used.

2.3 years ago by yhoogstrate ▴ 150