How to deal with PCR artifacts in RNASeq data?
7.1 years ago
annen ▴ 30

Hi everyone. I have an RNAseq dataset that I had done at a core facility (they did the library prep, sequencing, and initial analysis). At first, I tried to verify some of the interesting DEGs by qPCR but was unable to do so for most of them (could not verify 18 out of 20), either in the same samples I submitted for sequencing or in an additional cohort.

I went back to the raw data, which I did not have access to before, and found what looked like a lot of PCR artifacts from the library prep. This makes me think that a lot of my DEGs are not actually differentially expressed but instead differ artificially because of the library prep.

My questions are then: 1) Is there any way to confirm PCR artifacts in RNAseq data? 2) How do I deal with this? Looking at the core's analysis, it seems like they performed a deduplication step. I've read that you should not deduplicate RNAseq data... should I redo this analysis?

Thanks for any advice.

RNA-Seq • 2.8k views

Can you clarify "looked like a lot of PCR artifacts from the library prep"? Do you mean you have a lot of duplicates or is it something else? Is it just in one group and not the other?


It's a lot of identical reads. They seem to be in all my experimental groups, but the pattern is not consistent across genes. I'm attaching an example. If I run a deduplication on the data, something like 25% of my reads are thrown out as duplicates.

Example: https://ibb.co/jQ598w
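
A minimal sketch of how a duplicate fraction like that 25% can be estimated, assuming duplicates are defined purely by alignment coordinates (chromosome, start, strand). The read tuples and the function name `duplicate_fraction` are illustrative assumptions, not what the core's pipeline does. In RNA-seq, highly expressed genes can produce reads with identical coordinates without any PCR artifact, so this number alone does not prove over-amplification.

```python
from collections import Counter

def duplicate_fraction(reads):
    """Fraction of reads that repeat the (chrom, start, strand) of another read.

    For each distinct position, the first read is counted as unique and
    every further read at that position is counted as a duplicate.
    """
    reads = list(reads)
    if not reads:
        return 0.0
    counts = Counter(reads)
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(reads)

# Hypothetical alignments: (chromosome, start, strand)
example = [
    ("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 100, "+"),
    ("chr1", 150, "+"), ("chr2", 500, "-"), ("chr2", 500, "-"),
    ("chr2", 900, "-"), ("chr3", 42, "+"),
]
print(duplicate_fraction(example))  # 3 duplicates out of 8 reads
```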


Are there specific artefacts you're concerned about, or do you think that there are biases in the PCR?

If the latter, there are a number of papers now which have mathematically modelled the biases PCR introduces, and you may be able to run your libraries through their analyses to correct for it.


Biases in the PCR. For example, some samples seem to have PCR duplicates (tons of identical reads) in a particular gene while others don't, so the sample with duplicates comes up as differentially expressed when it's really not.

I've read about deduplication algorithms, but it looks like most people say that you should not deduplicate for RNAseq. I'm finding conflicting information on that though.
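
One way to check whether duplicates pile up in a gene in some samples but not in others is to compute a per-gene duplicate fraction for each sample and flag genes where it differs a lot. The sketch below assumes reads have already been assigned to genes. The tuple layout, function names, and the `min_spread` cutoff are hypothetical choices for illustration.

```python
from collections import Counter, defaultdict

def per_gene_duplication(reads):
    """reads: iterable of (gene, start, strand). Returns {gene: duplicate fraction}."""
    total = Counter()
    distinct = defaultdict(set)
    for gene, start, strand in reads:
        total[gene] += 1
        distinct[gene].add((start, strand))
    return {g: 1 - len(distinct[g]) / total[g] for g in total}

def uneven_genes(rates_by_sample, min_spread=0.5):
    """Genes whose duplicate fraction differs by at least min_spread across samples."""
    genes = set().union(*(r.keys() for r in rates_by_sample.values()))
    flagged = {}
    for g in genes:
        # Simplification: a gene with no reads in a sample is treated as 0.0
        vals = [r.get(g, 0.0) for r in rates_by_sample.values()]
        spread = max(vals) - min(vals)
        if spread >= min_spread:
            flagged[g] = spread
    return flagged

# Hypothetical data: geneX is one stack of identical reads in sample_a only
sample_a = [("geneX", 100, "+")] * 4 + [("geneY", 10, "+"), ("geneY", 30, "+")]
sample_b = [("geneX", s, "+") for s in (100, 120, 140, 160)] + \
           [("geneY", 10, "+"), ("geneY", 30, "+")]

rates = {
    "sample_a": per_gene_duplication(sample_a),
    "sample_b": per_gene_duplication(sample_b),
}
print(uneven_genes(rates))
```

A flagged gene is only a candidate for closer inspection. A high duplicate fraction in a short, highly expressed gene can be genuine.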


What's your biological and technical replicate set up? Can you compare between samples that should theoretically be the same and see if the 'duplications' are present in all of them?


I guess it depends on your definition of a biological/technical replicate. I have 3 conditions with 5 replicates each. One of the samples was sequenced twice (from the same library prep) as a sequencing technical replicate. The two links below should be identical. To me, it looks like there are identical/duplicated reads in each, but they are different between the two.

https://ibb.co/iqySdw https://ibb.co/moDtJw
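
To put a number on "different between the two", one option is to measure how much the sets of duplicated positions overlap between the two sequencing runs. This is a rough sketch assuming reads are represented as (chromosome, start, strand) tuples. The function names and the `min_copies` threshold are hypothetical.

```python
from collections import Counter

def duplicated_positions(reads, min_copies=2):
    """Positions covered by at least min_copies identical alignments."""
    counts = Counter(reads)
    return {pos for pos, n in counts.items() if n >= min_copies}

def jaccard(a, b):
    """Overlap of two sets: size of intersection divided by size of union."""
    union = a | b
    if not union:
        return 0.0  # neither run has any duplicated position
    return len(a & b) / len(union)

# Hypothetical alignments from two runs of the same library
run1 = [("chr1", 100, "+")] * 3 + [("chr1", 200, "+")] * 2 + [("chr2", 50, "-")]
run2 = [("chr1", 100, "+")] * 2 + [("chr2", 50, "-")] * 2 + [("chr1", 200, "+")]

overlap = jaccard(duplicated_positions(run1), duplicated_positions(run2))
print(round(overlap, 3))
```

A value near 1 means the same positions are stacked in both runs. A value near 0 means the stacks are mostly run-specific.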

7.1 years ago
theobroma22 ★ 1.2k

You should use qRT-PCR. Compare the 2^(-ΔΔCt) values with the RNAseq count data. Whether it is a time-course or a group-comparison experiment, the two methods should show similar trends in expression level.
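
That comparison can be sketched as follows, assuming the usual 2^(-ΔΔCt) calculation with roughly 100% amplification efficiency and a single reference gene. All Ct values, the RNAseq log2 fold change, and the function names below are hypothetical.

```python
import math

def fold_change_ddct(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Relative expression (treated vs control) by the 2^(-ddCt) method."""
    dct_treated = ct_target_treated - ct_ref_treated  # normalise to reference gene
    dct_control = ct_target_control - ct_ref_control
    ddct = dct_treated - dct_control
    return 2 ** (-ddct)

def same_direction(log2fc_qpcr, log2fc_rnaseq):
    """True when both methods agree on up- versus down-regulation."""
    return (log2fc_qpcr > 0) == (log2fc_rnaseq > 0)

# Hypothetical mean Ct values for one gene
fc = fold_change_ddct(22.0, 18.0, 24.0, 18.0)
log2fc_qpcr = math.log2(fc)
log2fc_rnaseq = 1.6  # hypothetical value taken from the RNAseq results

print(fc, log2fc_qpcr, same_direction(log2fc_qpcr, log2fc_rnaseq))
```

Working on the log2 scale puts qPCR and RNAseq fold changes on the same axis, so they can be plotted or correlated gene by gene.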


Please post all of your code and graphics.


That's what I was initially trying to do and was unable to. With qPCR, I was unable to show any change in genes that had a +/-2-fold change in RNAseq. This is what led me to think there was something funny in the RNAseq analysis.


I would redo the PCR analysis. It's very technical. I wouldn't have a lot of confidence in anything that relies on someone else's fishy technique or on unmatched results.


Make sure you understand and get the melting curve for each primer set!!


I did the qPCR myself and have a ton of experience with that, so I don't think that's the cause. I designed the primers myself (several sets, some of them based on the specific regions in DEGs from the RNAseq data). Still nothing. It's super frustrating.


And you are certain that for each primer set you had a single peak melting curve?
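
For reference, a single-peak check can be sketched by taking the negative derivative of fluorescence with respect to temperature and counting its local maxima. This assumes clean, evenly sampled melt data. Real instrument exports are noisier and usually need smoothing first. The data, function name, and `min_rel_height` cutoff below are hypothetical.

```python
def melt_peaks(temps, fluorescence, min_rel_height=0.2):
    """Temperatures of local maxima in -dF/dT, ignoring minor bumps.

    A peak must reach at least min_rel_height times the tallest value.
    """
    mids, neg_slope = [], []
    for i in range(len(temps) - 1):
        dt = temps[i + 1] - temps[i]
        mids.append((temps[i] + temps[i + 1]) / 2)
        neg_slope.append(-(fluorescence[i + 1] - fluorescence[i]) / dt)
    if not neg_slope or max(neg_slope) <= 0:
        return []
    top = max(neg_slope)
    peaks = []
    for i in range(1, len(neg_slope) - 1):
        rises = neg_slope[i] > neg_slope[i - 1]
        falls = neg_slope[i] >= neg_slope[i + 1]
        if rises and falls and neg_slope[i] >= min_rel_height * top:
            peaks.append(mids[i])
    return peaks

temps = [70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90]
single_product = [100, 99, 98, 96, 90, 70, 40, 20, 14, 12, 11]
two_products = [100, 98, 90, 84, 82, 80, 70, 50, 40, 38, 37]

print(melt_peaks(temps, single_product))  # one melting temperature expected
print(melt_peaks(temps, two_products))    # two peaks suggest a second product
```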


By the way, your post is a bit out of the area of bioinformatics and the Biostars community. Good luck with your research efforts.
