Will deduping (marking duplicates) with Picard before mapping affect my variant calling? Is it common practice to dedupe after mapping?
The reason I ask is that my FastQC reports still flag a lot of Kmer content, overrepresented sequences, and bad GC content. I figured these could be corrected by removing PCR duplicates. This is after trimming adapters and low-quality bases (quality threshold 10) using BBDuk.
It depends on what data you have, but a slightly bimodal distribution of GC content seems to be the norm in whole-exome data (I haven't figured out why, but it appears to be commonplace).
I'm working with tumor/normal paired-end RNA-seq samples from TCGA. The GC distribution varies across the board: in some samples the bimodality is slight, in others it is drastic. I fear that mapping my reads without correcting the GC and Kmer bias may muddle my variant calling downstream.
http://p08i.imgup.net/ScreenShot4fe0.png
http://i86i.imgup.net/ScreenShot1738.png
http://t38i.imgup.net/ScreenShote90f.png
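For comparing how bimodal each sample is outside of FastQC, the per-read GC distribution can be tabulated directly from the FASTQ. This is only a minimal sketch: it assumes an uncompressed FASTQ with four lines per record, and the function names and the 5% bin width are illustrative choices, not anything FastQC itself uses.

```python
import io
from collections import Counter


def gc_percent(seq):
    """Return the GC percentage of one read, ignoring N bases (None if no called bases)."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return None
    return 100.0 * (seq.count("G") + seq.count("C")) / called


def gc_histogram(fastq_handle, bin_width=5):
    """Count reads per GC bin from an open FASTQ handle (assumes 4 lines per record)."""
    counts = Counter()
    for i, line in enumerate(fastq_handle):
        if i % 4 != 1:  # the sequence is the second line of each record
            continue
        gc = gc_percent(line.strip())
        if gc is None:
            continue
        # Bins are labelled by their lower edge; reads at exactly 100% GC go in the top bin.
        bin_start = min(int(gc // bin_width) * bin_width, 100 - bin_width)
        counts[bin_start] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Tiny made-up FASTQ, only to show the call; replace with open("reads.fastq").
    demo = io.StringIO("@r1\nGGCC\n+\nIIII\n@r2\nATAT\n+\nIIII\n@r3\nATGC\n+\nIIII\n")
    for bin_start, n_reads in gc_histogram(demo).items():
        print(f"{bin_start}-{bin_start + 5}% GC: {n_reads} reads")
```

Running this per sample and comparing the resulting tables gives a rough, numeric view of the same curve FastQC plots.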
I highly recommend you look at the GATK Best Practices; they include caveats for using RNA-seq data (provided the samples have suitable depth): https://www.broadinstitute.org/gatk/guide/best-practices.php
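On the ordering question: Picard's MarkDuplicates operates on aligned reads (a SAM/BAM file), so in the GATK RNA-seq workflow duplicates are marked after mapping, not before. A minimal sketch of that step follows, assuming `picard.jar` sits in the working directory; the file names and the helper function are hypothetical.

```python
import shlex


def mark_duplicates_cmd(in_bam, out_bam, metrics, picard_jar="picard.jar"):
    """Build a Picard MarkDuplicates command line for an already-aligned BAM.

    Uses Picard's legacy KEY=value argument style (I=, O=, M=).
    The jar location is an assumption; adjust it to your install.
    """
    return [
        "java", "-jar", picard_jar, "MarkDuplicates",
        f"I={in_bam}",    # aligned, sorted input BAM
        f"O={out_bam}",   # output BAM with duplicates flagged
        f"M={metrics}",   # duplication metrics report
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = mark_duplicates_cmd(
        "sample.sorted.bam", "sample.dedup.bam", "sample.dup_metrics.txt"
    )
    # Print the command for inspection; to execute it, pass cmd to
    # subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

By default this only flags duplicates in the output BAM rather than deleting them, which is what downstream GATK tools expect.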