Dear Biostarians,
Using GATK, I successfully created a VCF file. Now I have to validate it. GATK itself has a command for this:
gatk ValidateVariants \
-R ref.fasta \
-V input.vcf \
--dbsnp dbsnp.vcf
For "--dbsnp", which dbSNP file should I use? I am confused between the latest GCF_000001405.39.gz and 00-All.vcf.gz. There are also many other human VCF files, for example in the archive folder and the GATK folder, which confuses me further about which one to use here.
Their links: https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.39.gz or
https://ftp.ncbi.nih.gov/snp/organisms/human_9606_b150_GRCh37p13/VCF/00-All.vcf.gz
My data is processed with GRCh38 reference genome.
If there are any other ways to validate a VCF, please let me know.
Thanks in advance
Thanks, Genomax, for your reply. I will download the file you recommended and get back to you soon :)
Hello Genomax, I ran the validation but got errors about incompatible contigs. I am attaching my command and terminal output here.
Command I used:
gatk ValidateVariants \
-R /media/lab/Lab/GRCh38genome/Homo_sapiens.GRCh38.dna.toplevelfiltered.fa \
-V /media/lab/Lab10TB/0VCF/02h1WGS/05dbsnp/rawsnpsbwa.vcf.gz \
--dbsnp GCF_000001405.39.gz
Terminal output:
Based on the name, it appears that you are using the toplevel data file from Ensembl. This file contains haplotypes etc. and is generally not needed for normal data analysis. That must be one of the reasons for the error. The other is that the chromosome designations in the RefSeq SNP file may not match what you have, which is likely if you are using the toplevel file. See: Why is human genome FASTA file on GENCODE much smaller than that on ENSEMBL?
I extracted only chromosomes 1-22, X, Y and MT from the toplevel reference genome, so haplotypes are not a problem here. Only the chromosome names differ; for example, chromosome 1 is named NC_000001.11 in the RefSeq dbSNP file. How can I make this file compatible, or is there another way to validate a VCF file against dbSNP data?
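For reference, the renaming itself could be sketched roughly as below. This is only an illustration and not a tested pipeline: the function name refseq_to_ensembl is made up, it assumes that only the primary chromosomes (1-22, X, Y, MT) are wanted, and it relies on the RefSeq convention that NC_000001 to NC_000022 are the autosomes, NC_000023 is X, NC_000024 is Y and NC_012920 is the mitochondrion. The version suffix (such as .11) is ignored, and records on any other contig are dropped.

```shell
# Hypothetical helper: read an uncompressed VCF on stdin and rewrite RefSeq
# chromosome accessions (e.g. NC_000001.11) to Ensembl-style names (1..22, X, Y, MT).
# Assumption: records and ##contig lines for any other sequence are discarded.
refseq_to_ensembl() {
  awk 'BEGIN { FS = OFS = "\t" }
  function to_ensembl(acc,    base, n) {
    base = acc
    sub(/\.[0-9]+$/, "", base)                 # strip the version suffix (.11, .12, ...)
    if (base == "NC_012920") return "MT"       # mitochondrial genome
    if (base !~ /^NC_0000[0-9][0-9]$/) return ""
    n = substr(base, 4) + 0                    # "000001" -> 1
    if (n >= 1 && n <= 22) return n
    if (n == 23) return "X"
    if (n == 24) return "Y"
    return ""
  }
  # Rewrite the ID in ##contig header lines, dropping contigs we do not keep.
  /^##contig=<ID=/ {
    id = $0
    sub(/^##contig=<ID=/, "", id)
    sub(/[,>].*$/, "", id)
    name = to_ensembl(id)
    if (name != "") { sub(/ID=[^,>]+/, "ID=" name); print }
    next
  }
  # All other header lines pass through unchanged.
  /^#/ { print; next }
  # Data lines: rename column 1, or skip the record if the contig is not kept.
  { name = to_ensembl($1); if (name != "") { $1 = name; print } }'
}
```

It would be used along the lines of zcat GCF_000001405.39.gz | refseq_to_ensembl | bgzip > dbsnp.renamed.vcf.gz (the output name is just an example), and the result would still need indexing, for example with tabix -p vcf, before GATK can read it. Streaming the whole dbSNP file this way is slow. If bcftools is installed, bcftools annotate --rename-chrs with a two-column old-name/new-name file should do the same renaming.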
Changing chromosome names in either file is going to be a big task, so perhaps you could use the Ensembl-provided VCF files: http://ftp.ensembl.org/pub/current_variation/vcf/homo_sapiens/ . They may have matching chromosome names. Be sure to check the README included in this directory to see if these files will work.
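Before running the validation again, it may be worth checking directly whether the names match. The sketch below is one way to do that, assuming an uncompressed FASTA and an uncompressed VCF; the function name contigs_missing_from_ref is invented for this example. It lists every contig name that occurs in the VCF but not in the reference, so empty output suggests the names are compatible.

```shell
# Hypothetical helper: print contig names that occur in a VCF but not in a FASTA.
# Usage: contigs_missing_from_ref ref.fa variants.vcf   (both uncompressed)
contigs_missing_from_ref() {
  ref_names=$(mktemp) && vcf_names=$(mktemp) || return 1
  # FASTA: the contig name is the first word after ">" on each header line.
  awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | LC_ALL=C sort -u > "$ref_names"
  # VCF: the contig name is column 1 of every non-header line.
  awk -F '\t' '!/^#/ { print $1 }' "$2" | LC_ALL=C sort -u > "$vcf_names"
  # comm -13 keeps only the lines unique to the second (VCF) list.
  LC_ALL=C comm -13 "$ref_names" "$vcf_names"
  rm -f "$ref_names" "$vcf_names"
}
```

For a compressed VCF, the file can be decompressed to a temporary copy first. Reading every record of a dbSNP-sized file takes a while, so this is meant as a one-off sanity check rather than something to run routinely.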