RNA-seq dedupe PCR contamination before or after mapping
8.8 years ago
umn_bist ▴ 390

Will deduping (marking duplicates with Picard) before mapping affect my variant calling? Or is it common practice to dedupe after mapping?

I ask because my FastQC reports still flag Kmer content, overrepresented sequences, and bad GC content, even after trimming adapters and low-quality (Q10) bases with BBDuk. I figured these could be corrected by removing PCR contamination.
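To make concrete what those two FastQC warnings measure, here is a minimal Python sketch that computes per-read GC fraction and lists sequences that make up a large share of a read set. It is not FastQC's implementation: the function names `gc_fraction` and `overrepresented`, the `min_fraction` cutoff, and the toy reads are all assumptions made for illustration.

```python
from collections import Counter


def gc_fraction(seq):
    """Fraction of G/C among the unambiguous (A/C/G/T) bases of one read."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        # Read consists only of N or other ambiguity codes.
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def overrepresented(reads, min_fraction=0.001):
    """Return (sequence, count, fraction) for every exact sequence that
    accounts for at least min_fraction of all reads.

    min_fraction is a hypothetical cutoff chosen for this sketch.
    """
    total = len(reads)
    counts = Counter(reads)
    return [
        (seq, n, n / total)
        for seq, n in counts.most_common()
        if n / total >= min_fraction
    ]


# Toy reads, invented for the example.
reads = ["ACGTACGT", "GGGCCCGG", "ACGTACGT", "ATATATAT"]

for read in reads:
    print(read, round(gc_fraction(read), 2))

print(overrepresented(reads, min_fraction=0.5))
```

A highly expressed transcript and a PCR duplicate look identical to a count like this, which is why these warnings alone cannot tell the two apart.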

RNA-Seq • 3.6k views

It depends on what data you have, but a slight bimodal distribution of GC content in whole-exome data seems to be the norm (I haven't figured out why, but it appears to be commonplace).

I'm working with tumor/normal PE RNA-seq samples from TCGA. The distribution varies across samples: in some the skew is slight, in others drastic. I fear that mapping my reads without correcting the GC and Kmer bias may muddle my variant calling downstream.

http://p08i.imgup.net/ScreenShot4fe0.png

http://i86i.imgup.net/ScreenShot1738.png

http://t38i.imgup.net/ScreenShote90f.png

I highly recommend you look at the GATK Best Practices, which include caveats for using RNA-seq data (provided the samples have suitable depth): https://www.broadinstitute.org/gatk/guide/best-practices.php

8.8 years ago

Overrepresented sequences and skewed GC content are expected in RNA-seq data. They usually come from the most highly expressed transcripts (such as rRNA). However, they can also come from PCR duplicates, and those can completely skew variant calling. For this reason, while people usually don't dedupe RNA-seq data for differential expression analysis, it is still recommended to do so for variant calling.

Reference: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0058815
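As a rough illustration of why duplicate marking is normally done on mapped reads, here is a minimal Python sketch of coordinate-based marking: alignments that share chromosome, start, strand, and mate start are treated as copies of one fragment, and all but the best-scoring one are flagged. This is a simplification, not Picard's actual algorithm; the `Alignment` class, its fields, and `mark_duplicates` are hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical, stripped-down view of one mapped paired-end read."""
    name: str
    chrom: str
    start: int        # 5' alignment start of this read
    strand: str       # "+" or "-"
    mate_start: int   # 5' alignment start of the mate
    qual_sum: int     # sum of base qualities, used to pick the copy to keep
    is_duplicate: bool = False


def mark_duplicates(alignments):
    """Flag all but the highest-scoring alignment at each position key.

    Reads are only flagged, not removed, so downstream tools can choose
    whether to ignore them. Ties keep the first alignment seen.
    """
    best = {}
    for aln in alignments:
        key = (aln.chrom, aln.start, aln.strand, aln.mate_start)
        current = best.get(key)
        if current is None:
            best[key] = aln
        elif aln.qual_sum > current.qual_sum:
            # New read scores higher: demote the previous representative.
            current.is_duplicate = True
            best[key] = aln
        else:
            aln.is_duplicate = True
    return alignments


# Toy alignments, invented for the example.
alns = [
    Alignment("r1", "chr1", 100, "+", 300, 250),
    Alignment("r2", "chr1", 100, "+", 300, 310),  # same fragment position as r1
    Alignment("r3", "chr1", 100, "-", 300, 200),  # opposite strand, different key
    Alignment("r4", "chr2", 100, "+", 300, 100),
]
mark_duplicates(alns)
print([a.name for a in alns if a.is_duplicate])
```

The grouping key is built entirely from mapping coordinates, which is information the reads do not have before alignment.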
