Question

How do I know if I have DNA contamination in RNA-seq data?

2

5.8 years ago

blur ▴ 280

Hi,

I want to know whether I have DNA contamination in my RNA-seq data. I have one library prepared with DNase and one without, plus another pair with an additional treatment X: DNase(+) vs. DNase(-), and DNase(+)+X vs. DNase(-)+X.

I tried:

1) Checking the overall alignment rate of DNase(+) vs. DNase(-) and DNase(+)+X vs. DNase(-)+X. It is slightly higher in the DNase-free samples.

2) Checking the alignment rate against transcripts. It's all over the place: DNase(+)+X has the lowest rate, then DNase(-) (without treatment), then DNase(+) (without treatment), and finally the least background was in DNase(-)+X. But I figured that might be due to rRNA? If it had not been sufficiently removed in some libraries, those reads would not map to transcripts.

3) Looking at the reads in IGV after alignment to see if there is background/baseline coverage in non-exonic regions.

I am out of ideas. Does anyone know what else I can do?

Thanks!

RNA-Seq contamination • 7.4k views

ADD COMMENT • link updated 5.1 years ago by zebasilio • 0 • written 5.8 years ago by blur ▴ 280

2

Basically, you have tried almost everything. I would suggest a massive effort on step 3.

For example, you might count the number of reads that align over introns in all your samples. The higher the number of such reads, the higher the DNA contamination (assuming that the proportion of pre-mRNA in your samples is the same). I guess it was total RNA (not poly-A selected).
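A minimal sketch of that intron count, in plain Python, might look like the following. It assumes a single chromosome, half-open (start, end) coordinates, and that each read has already been reduced to a contiguous aligned block (a spliced read would be split into its blocks first, otherwise it would appear to cover the intron it jumps over). The function names and the toy coordinates are hypothetical; in practice the intervals would come from your annotation and the spans from your BAM files.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intronic_fraction(read_spans, introns):
    """Fraction of aligned blocks that overlap at least one intron."""
    merged = merge_intervals(introns)
    starts = [s for s, _ in merged]
    hits = 0
    for r_start, r_end in read_spans:
        # Index of the last intron that starts before the read ends.
        i = bisect_right(starts, r_end - 1) - 1
        # Intervals are disjoint and sorted, so only this one can overlap.
        if i >= 0 and merged[i][1] > r_start:
            hits += 1
    return hits / len(read_spans) if read_spans else 0.0


# Toy data (made up for illustration only).
introns = [(100, 200), (300, 400)]
reads = [(50, 90), (90, 110), (150, 160), (200, 300), (390, 450)]
print(intronic_fraction(reads, introns))
```

Computing this fraction per library and comparing DNase(+) against DNase(-) is more informative than any single value, since the baseline depends on how much pre-mRNA the protocol captures.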

I also guess you didn't remove rRNA.

I am not sure, but maybe filtering rRNA reads might give you better estimates.

ADD REPLY • link 5.8 years ago by Fabio Marroni ★ 3.0k

0

Thanks, I'll try it all (the library went through experimental rRNA depletion, but I guess it never hurts to check again).

ADD REPLY • link 5.8 years ago by blur ▴ 280

0

Was this a sample mis-labeling, by any chance?

ADD REPLY • link 5.1 years ago by Kevin Blighe 88k

1

1) Build a FASTA file of intronic sequences.
2) Index it.
3) Run FastQ Screen on your reads against that index.

You should see only a small percentage of reads mapping to the intronic sequences.
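The first step above (building the intronic FASTA) can be sketched in plain Python as follows. This is only a toy illustration: the function names, the transcript name `tx1`, and the sequence are hypothetical, coordinates are assumed to be 0-based and half-open, and a real run would take exons from a GTF/GFF annotation and sequence from the genome FASTA. Note also that a region that is intronic in one transcript can be exonic in another, so in practice exons from all transcripts of a locus should be pooled before taking the gaps.

```python
def introns_from_exons(exons):
    """Return the gaps between exons as half-open (start, end) intervals."""
    ordered = sorted(exons)
    if not ordered:
        return []
    introns = []
    max_end = ordered[0][1]
    for start, end in ordered[1:]:
        # A gap exists only if this exon starts past everything seen so far.
        if start > max_end:
            introns.append((max_end, start))
        max_end = max(max_end, end)
    return introns


def intron_fasta(name, sequence, exons):
    """Format the intronic stretches of `sequence` as FASTA text."""
    lines = []
    for i, (start, end) in enumerate(introns_from_exons(exons), 1):
        lines.append(f">{name}_intron{i}:{start}-{end}")
        lines.append(sequence[start:end])
    return "\n".join(lines) + "\n" if lines else ""


# Toy transcript (made up): two exons separated by one 4-bp intron.
print(intron_fasta("tx1", "AAAACCCCGGGGTTTT", [(0, 4), (8, 12)]), end="")
```

The resulting file would then be indexed with the aligner that FastQ Screen is configured to use.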

ADD REPLY • link 5.8 years ago by cpad0112 21k

0

Hi!

I am trying to check for possible DNA contamination in my FASTQ files. For this I am using RSeQC's read_distribution.py: http://rseqc.sourceforge.net/#read-distribution-py

I look at the percentage of tags assigned to introns as an indicator of DNA contamination.
However, I do not know how high that percentage must be to be considered DNA contamination.

Could you please help me in this regard?

Best, José

ADD REPLY • link 5.1 years ago by zebasilio • 0

0

How high is it? As is often the case, there is no fixed cutoff.

ADD REPLY • link 5.1 years ago by ATpoint 86k

0

Please don't post the same question twice. You should also not use the SUBMIT ANSWER box to ask additional questions in an existing thread.

ATpoint asked you to provide some results/data. Unless you do that, we can't help you, since he has already indicated that there is no fixed cutoff.

ADD REPLY • link 5.1 years ago by GenoMax 148k

score 6 · Answer 1 · 2019-02-27

6

5.8 years ago

michael.ante ★ 3.9k

Hi blur,

Try RSeQC's read_distribution.py. With DNA contamination, you'll observe more intronic and more intergenic 'tags'.
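To compare libraries, the tag counts in the report can be turned into fractions. The sketch below assumes the plain-text layout shown in the RSeQC documentation (three "Total ..." lines followed by a Group / Total_bases / Tag_count / Tags/Kb table); if your version prints something different, the parsing will need adjusting. The helper names are hypothetical and the numbers in the sample report are made up for illustration.

```python
def parse_read_distribution(text):
    """Parse a read_distribution.py report into (totals, per-group tag counts)."""
    totals, groups = {}, {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "Total":
            # e.g. "Total Assigned Tags   900"
            totals[" ".join(fields[:-1])] = int(fields[-1])
        elif len(fields) == 4 and fields[0] != "Group":
            # Group, Total_bases, Tag_count, Tags/Kb
            groups[fields[0]] = int(fields[2])
    return totals, groups


def tag_fractions(totals, groups):
    """Return (intronic fraction, fraction of tags not assigned to any group)."""
    total = totals["Total Tags"]
    intronic = groups["Introns"] / total
    unassigned = (total - totals["Total Assigned Tags"]) / total
    return intronic, unassigned


# Made-up report in the assumed layout.
report = """\
Total Reads                   1000
Total Tags                    1000
Total Assigned Tags           900
=====================================================================
Group               Total_bases         Tag_count           Tags/Kb
CDS_Exons           2000                700                 350.00
Introns             5000                200                 40.00
=====================================================================
"""

totals, groups = parse_read_distribution(report)
print(tag_fractions(totals, groups))
```

Tags that are not assigned to any group serve here as a rough proxy for intergenic signal. Both fractions are only meaningful relative to the matched library, e.g. DNase(-) against DNase(+).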

Cheers,

Michael

ADD COMMENT • link 5.8 years ago by michael.ante ★ 3.9k