5.1 years ago
jackycsie
Hi,
I have finished variant calling on NA12878.
My steps are as follows:
- bwa mem
- SortSamSpark
- MarkDuplicatesSpark
- BaseRecalibratorSpark
- ApplyBQSRSpark
- HaplotypeCaller
Now I have a VCF file, but I don't know how to assess its correctness.
My reference data is:
- dbsnp_138.b37.vcf
- Mills_and_1000G_gold_standard.indels.b37.vcf
- 1000G_phase1.indels.b37.vcf
Thank you, jacky.
Hi,
I found this URL:
https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/release/NA12878_HG001/NISTv3.3.2/GRCh37/
Is this the right truth set to compare against, or does it only provide chromosome 1?
Thanks, Jacky.
I think this is better suited for the GATK forums. There's this post that might be a good starting point: https://gatkforums.broadinstitute.org/gatk/discussion/6308/evaluating-the-quality-of-a-variant-callset
Although chr1 is one of the largest chromosomes, you have already done the most expensive part, which is calling variants on NA12878, so I'd definitely evaluate against the entire genome rather than settle for statistics from a relatively small subset.
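To make the idea of "evaluating against a truth set" concrete, here is a minimal Python sketch that compares two VCFs by exact site and allele match and reports precision and recall. It is only an illustration under simplifying assumptions: the function names (`load_variants`, `concordance`) and the toy records are hypothetical, it does not normalize variant representation, and it does not restrict the comparison to high-confidence regions. Dedicated benchmarking tools such as hap.py or RTG vcfeval handle those cases and are the better choice for real numbers.

```python
import io


def load_variants(handle):
    """Collect (chrom, pos, ref, alt) keys from the lines of a VCF.

    Assumption: exact-match keys are good enough for a rough check.
    Real comparisons need left-alignment and haplotype-aware matching.
    """
    variants = set()
    for line in handle:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        # Multi-allelic records list several ALT alleles separated by commas.
        for alt in alts.split(","):
            if alt in (".", "*", "<NON_REF>"):
                continue
            variants.add((chrom, int(pos), ref, alt))
    return variants


def concordance(calls, truth):
    """Return TP/FP/FN counts plus precision and recall for two variant sets."""
    tp = len(calls & truth)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # in the truth set but not called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Toy records (made up for illustration, not real NA12878 data).
CALLS_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\t.\tA\tG\t50\tPASS\t.\n"
    "1\t200\t.\tC\tT\t50\tPASS\t.\n"
    "2\t300\t.\tG\tA\t50\tPASS\t.\n"
)
TRUTH_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\t.\tA\tG\t50\tPASS\t.\n"
    "1\t200\t.\tC\tT\t50\tPASS\t.\n"
    "1\t250\t.\tT\tC\t50\tPASS\t.\n"
)

calls = load_variants(io.StringIO(CALLS_VCF))
truth = load_variants(io.StringIO(TRUTH_VCF))
result = concordance(calls, truth)
print(result)
```

For real files, the same functions can be fed from `open("calls.vcf")` or `gzip.open("truth.vcf.gz", "rt")` (both file names are placeholders). Keep in mind that a call outside the truth set's high-confidence regions would be counted as a false positive by this sketch, so the numbers are only meaningful after restricting both files to those regions.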