Question

Confusion about High-Confidence SNP Calls (Overlap between RNA-seq and WGBS Calls) and Genome Assemblies Used for Alignment and Annotation

0

10.1 years ago

Dataminer ★ 2.8k

Hi!

Here is what I have: SNP calls from RNA-Seq (paired-end) and SNP calls from WGBS, both performed on the same sample.

I used GATK (UnifiedGenotyper) for the RNA-seq sample, and for WGBS I had calls at 30x.

What I have done is take the overlap between the SNP calls from RNA-seq (irrespective of whether the FILTER column says PASS, Low, or undetermined) and those from WGBS (only calls with a PASS tag), because if a call is present in both methods, it can be considered a high-confidence call. The idea is to reduce the number of calls and keep only the high-confidence ones. Let me know if this is the wrong approach.

Secondly, the RNA-Seq data for GATK was aligned to the hg19 assembly from UCSC (provided by GATK), and after generating the VCF file I used SNPEff to annotate it. For SNPEff I was forced to use GRCh37.75. Will this change in builds be a cause for concern, or is it fine?

Thank you for your time

SNP WGBS SNPEff RNA-Seq • 3.8k views

ADD COMMENT • link updated 2.8 years ago by Ram 44k • written 10.1 years ago by Dataminer ★ 2.8k

3

Just to clarify: SnpEff does not "force" you to use GRCh37.75 at all. I provide pre-built databases for RefSeq (hg19), ENSEMBL (GRCh37.*) and KnownGenes (hg19kg). You can use whichever reference genome you prefer.

Although some genes and transcripts differ from hg19 to GRCh, the reference sequence is the same in all three cases. So it's perfectly OK to align to hg19 and annotate with GRCh.

ADD REPLY • link updated 2.8 years ago by Ram 44k • written 10.0 years ago by Pablo ★ 1.9k

1

Hi Pablo,

Thanks for pointing me to the hg19 database for SNPEff. I never meant to demean what SNPEff does; it is a wonderful annotation tool.

ADD REPLY • link updated 2.8 years ago by Ram 44k • written 10.0 years ago by Dataminer ★ 2.8k

Ram · Answer 1 · 2014-11-09

2

10.0 years ago

Devon Ryan 104k

Given how much bisulfite treatment can degrade DNA quality, I'd generally be hesitant to then try to use it for variant calling. If you do, I would recommend that you filter fairly stringently. You'll also need to screen out apparent C/G sites that are called as T/A, since bisulfite conversion produces exactly those changes. In general, you might look into BisSNP for handling the WGBS data.
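As a rough illustration of that screening step, here is a minimal Python sketch that drops biallelic SNVs whose REF>ALT change is C>T or G>A, the two substitutions bisulfite conversion can mimic. The function names are made up for this example, and the filter is deliberately blunt: it also throws away genuine C>T and G>A SNPs, which is part of why a bisulfite-aware caller is the better route.

```python
# Substitutions that bisulfite conversion can mimic: C>T on one strand,
# which shows up as G>A when the read derives from the opposite strand.
BISULFITE_LIKE = {("C", "T"), ("G", "A")}


def is_bisulfite_ambiguous(ref, alt):
    """Return True if a single-base REF>ALT change could be a conversion artifact."""
    return (ref.upper(), alt.upper()) in BISULFITE_LIKE


def screen_vcf_lines(lines):
    """Yield VCF lines, dropping biallelic SNVs that look like bisulfite conversion.

    Header lines (starting with '#') pass through unchanged. Multi-allelic
    records (ALT such as 'T,G') are not matched and also pass through.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alt = fields[3], fields[4]
        if is_bisulfite_ambiguous(ref, alt):
            continue
        yield line
```

To use it on a file, pass an open file handle as `lines` and write the yielded lines to a new VCF.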

For SNPEff, just keep in mind that UCSC and Ensembl use different chromosome names (e.g. "chr1" versus "1"). Perhaps SNPEff knows to convert things, which would be good. You just need to ensure that the resulting VCF files use the same chromosome names.
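A quick way to check this is to collect the chromosome names each VCF actually uses and, if needed, map UCSC-style names to Ensembl-style ones. The sketch below is only an illustration: the helper names are hypothetical, and it maps names for the primary chromosomes only, leaving unplaced contigs and anything unrecognised untouched.

```python
# Primary chromosome labels in Ensembl style (no "chr" prefix).
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y"}


def ucsc_to_ensembl(chrom):
    """Map a UCSC-style primary chromosome name to Ensembl style.

    This renames only; it does not check that the underlying sequences match.
    Names that are not recognised (e.g. random or unplaced contigs) are
    returned unchanged so that they can be inspected by hand.
    """
    if chrom == "chrM":
        return "MT"
    if chrom.startswith("chr") and chrom[3:] in PRIMARY:
        return chrom[3:]
    return chrom


def chrom_names(vcf_lines):
    """Collect the set of chromosome names used by the records of a VCF."""
    return {
        line.split("\t", 1)[0]
        for line in vcf_lines
        if line.strip() and not line.startswith("#")
    }
```

Comparing `chrom_names(...)` for the two files shows immediately whether one uses "chr1" and the other "1".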

ADD COMMENT • link 10.0 years ago by Devon Ryan 104k

0

Hi Devon,

I have variant calls from both RNA-Seq and WGBS. What I was thinking is that, since the WGBS data has a depth of 30x, I can take the overlap between the calls that have a PASS tag in both the RNA-Seq and WGBS methods. This should give me high-confidence SNP calls for my data. Or am I completely wrong in taking the overlap? I am not very experienced with WGBS SNP calling, so a little help and guidance would be deeply appreciated.

ADD REPLY • link updated 2.8 years ago by Ram 44k • written 10.0 years ago by Dataminer ★ 2.8k

2

The only concern is decreasing the false-positive rate on the WGBS dataset to a reasonable level before doing the overlap. If you overlap noise with noise, you get noise out.
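To make "filter first, then overlap" concrete, here is a minimal Python sketch, assuming plain-text VCFs with the standard column order and matching chromosome names in both files. The function names are hypothetical. It keeps only PASS records from each call set and intersects them on chromosome, position, REF and ALT, so that two different alleles at the same position do not count as agreement.

```python
def pass_sites(vcf_lines):
    """Return the set of (chrom, pos, ref, alt) for records with FILTER == PASS."""
    sites = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        f = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt, flt = f[0], int(f[1]), f[3], f[4], f[6]
        if flt == "PASS":
            sites.add((chrom, pos, ref.upper(), alt.upper()))
    return sites


def overlap(rna_lines, wgbs_lines):
    """Intersect the PASS calls of two VCFs on site and alleles."""
    return pass_sites(rna_lines) & pass_sites(wgbs_lines)
```

Any extra stringency (depth, quality, strand filters) would be applied to the WGBS calls before this step; the intersection itself does nothing to remove errors that both call sets share.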

ADD REPLY • link updated 2.8 years ago by Ram 44k • written 10.0 years ago by Devon Ryan 104k