Hi!
Here is what I have: SNP calls from RNA-Seq (paired end) and SNP calls from WGBS, both performed on the same sample.
I used GATK (UnifiedGenotyper) for the RNA-seq sample, and the WGBS calls were made at 30x coverage.
What I have done is take the overlap between the SNP calls from RNA-seq (irrespective of whether the filter column has a PASS, Low or undetermined tag) and the calls from WGBS (only those with a PASS tag), because a call that is present in both methods can be considered a high-confidence call. The idea is to reduce the number of calls and keep only the high-confidence ones. Let me know if this is the wrong approach.
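To make the overlap concrete, here is a minimal sketch of it in Python, assuming both call sets are plain-text VCF files on the same coordinates and with the same chromosome naming. The function names (`read_snps`, `high_confidence`) and the toy records are made up for illustration and are not taken from any real pipeline or data set. A call counts as shared only when chromosome, position, REF and ALT all match.

```python
import io


def read_snps(handle, pass_only=False):
    """Collect (chrom, pos, ref, alt) keys for single-base substitutions
    from an uncompressed VCF file handle."""
    keys = set()
    for line in handle:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, _qual, flt = fields[:7]
        # Optionally keep only records whose FILTER column is PASS.
        if pass_only and flt != "PASS":
            continue
        # A record may list several ALT alleles separated by commas.
        for allele in alt.split(","):
            if len(ref) == 1 and len(allele) == 1 and allele != ".":
                keys.add((chrom, int(pos), ref.upper(), allele.upper()))
    return keys


def high_confidence(rna_handle, wgbs_handle):
    """RNA-seq calls (any FILTER value) that are also PASS calls in WGBS."""
    rna = read_snps(rna_handle, pass_only=False)
    wgbs = read_snps(wgbs_handle, pass_only=True)
    return sorted(rna & wgbs)


# Invented toy records, only to show the behaviour.
HEADER = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
RNA_VCF = HEADER + (
    "chr1\t100\t.\tA\tG\t50\tLowQual\t.\n"
    "chr1\t200\t.\tC\tT\t60\tPASS\t.\n"
    "chr2\t300\t.\tG\tA\t10\t.\t.\n"
)
WGBS_VCF = HEADER + (
    "chr1\t100\t.\tA\tG\t99\tPASS\t.\n"
    "chr1\t200\t.\tC\tT\t20\tLowQual\t.\n"
    "chr2\t300\t.\tG\tC\t80\tPASS\t.\n"
)

shared = high_confidence(io.StringIO(RNA_VCF), io.StringIO(WGBS_VCF))
print(shared)
```

In this toy input only the chr1:100 A>G call survives: chr1:200 is dropped because its WGBS record is not PASS, and chr2:300 is dropped because the two methods report different alternate alleles. Matching on position alone would have kept the latter.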
Secondly, the RNA-Seq data for GATK was aligned to the hg19 assembly from UCSC (provided by GATK), and after generating the VCF file I used SnpEff to annotate it. For SnpEff I was forced to use GRCh37.75. Is this change in builds a cause for concern, or is it fine?
Thank you for your time
Just to clarify: SnpEff does not "force" you to use GRCh37.75 at all. I provide pre-built databases for RefSeq (hg19), ENSEMBL (GRCh37.*) and KnownGenes (hg19kg). You can use whichever reference genome you prefer.
Although some genes and transcripts differ from hg19 to GRCh, the reference sequence is the same in all three cases. So it's perfectly OK to align to hg19 and annotate with GRCh.
Hi Pablo,
Thanks for pointing me to the hg19 database for SnpEff. I never meant to demean what SnpEff does; it is a wonderful annotation tool.