Evaluate the sensitivity and specificity of VCF files
5.1 years ago
jackycsie

Hi,

I have already finished variant calling on NA12878.

My steps are as follows:

  1. bwa mem
  2. SortSamSpark
  3. MarkDuplicatesSpark
  4. BaseRecalibratorSpark
  5. ApplyBQSRSpark
  6. HaplotypeCaller

Now I have a VCF file, but I don't know how to judge its correctness.

My reference data is:

  1. dbsnp_138.b37.vcf
  2. Mills_and_1000G_gold_standard.indels.b37.vcf
  3. 1000G_phase1.indels.b37.vcf

Thank you, Jacky.
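For reference, the six steps above roughly correspond to the command lines below. This is a minimal sketch that only builds and prints the commands; it assumes the three VCFs listed above were used as `--known-sites` for BQSR (their usual role), and every intermediate file name, the reference FASTA name and the read-group string are hypothetical placeholders. Note that these known-sites files are not a truth set for benchmarking.

```python
import shlex
from typing import List, Optional, Tuple

# Each step is (argv, file that stdout should be redirected to, or None).
Step = Tuple[List[str], Optional[str]]


def build_pipeline(ref: str, fq1: str, fq2: str, known_sites: List[str],
                   read_group: str = r"@RG\tID:NA12878\tSM:NA12878\tPL:ILLUMINA") -> List[Step]:
    """Return the six commands in the order listed in the question.

    Commands are only built here, not run. All intermediate file names
    (aligned.sam, sorted.bam, ...) are hypothetical placeholders.
    """
    steps: List[Step] = []
    # 1. bwa mem writes SAM to stdout; assumes `bwa index` was already run on ref.
    steps.append((["bwa", "mem", "-R", read_group, ref, fq1, fq2], "aligned.sam"))
    # 2. Coordinate-sort the alignments.
    steps.append((["gatk", "SortSamSpark", "-I", "aligned.sam", "-O", "sorted.bam"], None))
    # 3. Mark duplicate reads.
    steps.append((["gatk", "MarkDuplicatesSpark", "-I", "sorted.bam", "-O", "dedup.bam"], None))
    # 4. Build the recalibration table; every known-sites VCF gets its own flag.
    bqsr = ["gatk", "BaseRecalibratorSpark", "-I", "dedup.bam", "-R", ref]
    for vcf in known_sites:
        bqsr += ["--known-sites", vcf]
    bqsr += ["-O", "recal.table"]
    steps.append((bqsr, None))
    # 5. Apply the recalibration to the base qualities.
    steps.append((["gatk", "ApplyBQSRSpark", "-I", "dedup.bam", "-R", ref,
                   "--bqsr-recal-file", "recal.table", "-O", "recal.bam"], None))
    # 6. Call variants.
    steps.append((["gatk", "HaplotypeCaller", "-R", ref, "-I", "recal.bam",
                   "-O", "NA12878.vcf.gz"], None))
    return steps


if __name__ == "__main__":
    known = ["dbsnp_138.b37.vcf",
             "Mills_and_1000G_gold_standard.indels.b37.vcf",
             "1000G_phase1.indels.b37.vcf"]
    # "human_g1k_v37.fasta" and the FASTQ names are hypothetical.
    for argv, out in build_pipeline("human_g1k_v37.fasta", "R1.fastq.gz", "R2.fastq.gz", known):
        line = " ".join(shlex.quote(a) for a in argv)
        print(line + (f" > {out}" if out else ""))
```

Keeping the commands as argument lists (rather than one long string) makes it easy to pass them to `subprocess.run` later without shell-quoting problems.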


Hi,

I found this URL:

https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/release/NA12878_HG001/NISTv3.3.2/GRCh37/

Is this the right one, or does it only provide chromosome 1?

Thanks, Jacky.
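One way to answer the chromosome-1 question yourself is to tally the intervals in the high-confidence BED file that comes with the truth VCF in a GIAB release directory. A minimal sketch, assuming you have downloaded that BED locally; the file name in the usage line is a hypothetical placeholder.

```python
import gzip
from typing import Dict, Iterable


def bed_coverage(lines: Iterable[str]) -> Dict[str, int]:
    """Sum interval lengths per chromosome.

    BED coordinates are 0-based and half-open, so each interval spans
    end - start bases. Header/track/browser lines are skipped.
    """
    totals: Dict[str, int] = {}  # dicts keep insertion (file) order
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        totals[chrom] = totals.get(chrom, 0) + int(end) - int(start)
    return totals


def bed_coverage_from_file(path: str) -> Dict[str, int]:
    """Open a plain or gzip-compressed BED file and summarise it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return bed_coverage(handle)


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python bed_cov.py giab_highconf.bed
    if len(sys.argv) > 1:
        for chrom, bp in bed_coverage_from_file(sys.argv[1]).items():
            print(chrom, bp)
```

If the output lists chromosomes 1 through 22 and X, the release covers the whole genome (within its high-confidence regions), not just chromosome 1. On b37 the chromosome names are `1`, `2`, ... rather than `chr1`, `chr2`, ..., which must match your own VCF.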


I think this is better suited for the GATK forums. There's this post that might be a good starting point: https://gatkforums.broadinstitute.org/gatk/discussion/6308/evaluating-the-quality-of-a-variant-callset
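One callset-level sanity check commonly used when no truth set is available is the SNV transition/transversion (Ti/Tv) ratio. For human whole-genome germline calls, roughly 2.0-2.1 is the commonly quoted expectation; treat that as a rough guide, not a hard threshold. A minimal sketch that counts biallelic-style SNV alleles from VCF lines, assuming records with FILTER `PASS` or `.` are the ones you want to keep (raw HaplotypeCaller output usually has `.`).

```python
import gzip
from typing import Iterable, Tuple

# Purine<->purine (A/G) and pyrimidine<->pyrimidine (C/T) changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
BASES = set("ACGT")


def titv_counts(vcf_lines: Iterable[str]) -> Tuple[int, int]:
    """Count transitions and transversions over single-base REF/ALT pairs."""
    ti = tv = 0
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        ref, alts, filt = fields[3].upper(), fields[4].upper(), fields[6]
        if filt not in ("PASS", "."):
            continue  # skip records that failed a filter
        for alt in alts.split(","):
            # Indels, MNPs and spanning-deletion alleles ("*") are ignored.
            if len(ref) == 1 and len(alt) == 1 and ref in BASES and alt in BASES and ref != alt:
                if (ref, alt) in TRANSITIONS:
                    ti += 1
                else:
                    tv += 1
    return ti, tv


def titv_ratio(vcf_lines: Iterable[str]) -> float:
    ti, tv = titv_counts(vcf_lines)
    return ti / tv if tv else float("nan")


def titv_from_file(path: str) -> float:
    """Compute Ti/Tv from a plain or gzip/bgzip-compressed VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return titv_ratio(handle)
```

This is no substitute for a comparison against a truth set, but a ratio far from the expected range is a quick hint that something went wrong upstream.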


Although chr1 is one of the biggest chromosomes, you've already done the hardest part, which is calling variants on NA12878, so I'd definitely go for whole-genome numbers rather than settling for statistics on a relatively small subset.

4.9 years ago

You've chosen a reference sample to apply your variant calling pipeline to. Well done, because you're almost there: just get a set of high-confidence NA12878 variants, such as the Genome in a Bottle (GIAB) ones, compare your calls against them with a tool such as RTG vcfeval (the one recommended by the GIAB project), and you'll be done.
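A minimal sketch of that comparison, under a few assumptions: both VCFs are bgzip-compressed and tabix-indexed, the reference has been converted to RTG's SDF format with `rtg format -o <sdf_dir> <fasta>`, and all file names are hypothetical. It also shows how the headline numbers are derived from the match counts. Note that variant-calling benchmarks usually report precision rather than specificity, because true negatives (every non-variant position in the genome) are not well defined and would swamp the metric. vcfeval counts true positives on both sides: matched truth variants feed sensitivity, matched calls feed precision.

```python
from dataclasses import dataclass
from typing import List


def vcfeval_command(truth_vcf: str, calls_vcf: str, ref_sdf: str,
                    highconf_bed: str, out_dir: str) -> List[str]:
    """Build an `rtg vcfeval` command line (not executed here)."""
    return ["rtg", "vcfeval",
            "-b", truth_vcf,      # baseline: GIAB high-confidence calls
            "-c", calls_vcf,      # your HaplotypeCaller output
            "-t", ref_sdf,        # reference in RTG's SDF format
            "-e", highconf_bed,   # evaluate only inside high-confidence regions
            "-o", out_dir]


def _ratio(num: float, den: float) -> float:
    return num / den if den else float("nan")


@dataclass
class BenchmarkCounts:
    tp_baseline: int  # truth variants matched by a call
    tp_call: int      # calls matched to a truth variant
    fp: int           # calls with no truth match
    fn: int           # truth variants with no matching call

    @property
    def sensitivity(self) -> float:
        # Recall: fraction of truth variants that were recovered.
        return _ratio(self.tp_baseline, self.tp_baseline + self.fn)

    @property
    def precision(self) -> float:
        # Fraction of your calls that match the truth set.
        return _ratio(self.tp_call, self.tp_call + self.fp)

    @property
    def f_measure(self) -> float:
        # Harmonic mean of precision and sensitivity.
        p, s = self.precision, self.sensitivity
        return _ratio(2 * p * s, p + s)


if __name__ == "__main__":
    # Hypothetical file names.
    print(" ".join(vcfeval_command("giab_truth.vcf.gz", "NA12878.vcf.gz",
                                   "ref.sdf", "giab_highconf.bed", "vcfeval_out")))
    # Toy counts for illustration only, not real results.
    c = BenchmarkCounts(tp_baseline=90, tp_call=90, fp=10, fn=10)
    print(f"sensitivity={c.sensitivity:.3f} precision={c.precision:.3f} F={c.f_measure:.3f}")
```

In practice you would take the counts from the summary vcfeval writes to its output directory, which already reports precision, sensitivity and F-measure; the class above just makes the definitions explicit.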
