3.5 years ago
storm1907
Hello, I have to work with VCF files where no IDs are present, such as:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT ivf-15.F
chr1 69270 . A G 55.4 PASS CSQ=|FAIL|0.01|0.00|0.00|0.00|-10|26|-28|-25|||LOW|OR4F5|ENSG00000186092|ENST00000335137|protein_coding|1/1||216|60|S|tcA/tcG|,|FAIL|0.01|0.00|0.00|0.00|-10|26|-28|-25|||LOW|OR4F5|ENSG00000186092|ENST00000641515|protein_coding|3/3||303|81|S|tcA/tcG| GT:GQ:DP:AD:VAF:PL 1/1:55:18:0,18:1:55,65,0
chr1 69511 . A G 61.4 PASS CSQ=|FAIL|0.00|0.00|0.02|0.00|26|32|26|34|||MODERATE|OR4F5|ENSG00000186092|ENST00000335137|protein_coding|1/1||457|141|T/A|Aca/Gca|,|FAIL|0.00|0.00|0.02|0.00|26|32|26|34|||MODERATE|OR4F5|ENSG00000186092|ENST00000641515|protein_coding|3/3||544|162|T/A|Aca/Gca| GT:GQ:DP:AD:VAF:PL 1/1:59:69:0,69:1:61,62,0
Is there a way to correct such files without repeating the variant calling step?
Thank you!
If you're looking to get dbSNP rsIDs in the ID field, you could use any of a number of variant annotation tools, including bcftools annotate plus a dbSNP VCF file, to do this.
OK, I found this https://www.ncbi.nlm.nih.gov/variation/docs/human_variation_vcf/ but got confused: which is the right VCF file?
There is no "right" file. Which one do you need? If you don't care about CLINVAR, go for the file without clinical assertions. If you care about CLINVAR annotations, go for the other VCF.
I recommend you get 00-All.vcf.gz, as it seems to be the most comprehensive file at first glance.
OK, I ran
and the ID field is still empty.
The VCF file looks like this:
What could be the problem?
Is the reference genome version the same for your VCF file and 00-All.vcf.gz? If the two files are based on different reference builds, the coordinates may not match, and no IDs will be transferred.
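One rough way to check the build question is to compare contig lengths in the two VCF headers. The snippet below is only a sketch: it assumes both headers carry `##contig=<ID=...,length=...>` lines (not every VCF does; if they are missing, look at the `##reference` line instead), and the function names and the two example headers are made up for illustration. The chr1 lengths used are those of GRCh38 and GRCh37.

```python
import re


def contig_lengths(header_lines):
    """Map contig ID -> length from '##contig=<ID=...,length=...>' header lines."""
    lengths = {}
    for line in header_lines:
        if not line.startswith("##contig="):
            continue
        id_match = re.search(r"[<,]ID=([^,>]+)", line)
        len_match = re.search(r"[<,]length=(\d+)", line)
        if id_match and len_match:
            lengths[id_match.group(1)] = int(len_match.group(1))
    return lengths


def strip_chr(name):
    """Drop a leading 'chr' so 'chr1' and '1' can be compared."""
    return name[3:] if name.startswith("chr") else name


def length_mismatches(header_a, header_b):
    """Return contigs present in both headers whose lengths differ."""
    a = {strip_chr(k): v for k, v in contig_lengths(header_a).items()}
    b = {strip_chr(k): v for k, v in contig_lengths(header_b).items()}
    return {c: (a[c], b[c]) for c in a.keys() & b.keys() if a[c] != b[c]}


# Illustrative headers only: chr1 length in GRCh38 vs. GRCh37.
sample_header = ["##fileformat=VCFv4.2", "##contig=<ID=chr1,length=248956422>"]
dbsnp_header = ["##fileformat=VCFv4.2", "##contig=<ID=1,length=249250621>"]

print(length_mismatches(sample_header, dbsnp_header))
```

Any contig reported here points to a build mismatch; an empty result does not prove the builds are identical, it only means the headers gave no evidence against it.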
You can also try SnpSift annotate: http://pcingola.github.io/SnpEff/ss_annotate/
It has a good example, which can be helpful if you can't get bcftools annotate to work for you.
Thank you, snpEff seems to be working!
Make sure that the tool you end up using (bcftools/snpEff) compares loci in the way you want it to. Some tools default to comparing only CHROM and POS, while others compare all of CHROM, POS, REF and ALT. You can customize what bcftools compares, so make sure your tool and command reflect your requirement.
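To make the difference concrete, here is a small pure-Python sketch of the two matching strategies. It is not how bcftools or SnpSift are implemented; `annotate_ids` is a hypothetical helper, the two query sites are taken from the question, and the dbSNP-like entries and their IDs are invented placeholders, not real rsIDs.

```python
def annotate_ids(records, known, use_alleles=True):
    """Fill the ID field of records from a list of known variants.

    records and known are lists of (chrom, pos, id, ref, alt) tuples.
    With use_alleles=True the match key is CHROM+POS+REF+ALT,
    otherwise only CHROM+POS.
    """
    def key(chrom, pos, ref, alt):
        return (chrom, pos, ref, alt) if use_alleles else (chrom, pos)

    lookup = {}
    for chrom, pos, known_id, ref, alt in known:
        # Keep the first ID seen for a given key.
        lookup.setdefault(key(chrom, pos, ref, alt), known_id)

    annotated = []
    for chrom, pos, current_id, ref, alt in records:
        new_id = lookup.get(key(chrom, pos, ref, alt), current_id)
        annotated.append((chrom, pos, new_id, ref, alt))
    return annotated


# Sites from the question, with the ID field still empty (".").
records = [("chr1", 69270, ".", "A", "G"), ("chr1", 69511, ".", "A", "G")]

# Placeholder database entries; the second one has a different ALT allele.
known = [
    ("chr1", 69270, "rs_example_1", "A", "G"),
    ("chr1", 69511, "rs_example_2", "A", "C"),
]

strict = annotate_ids(records, known, use_alleles=True)
loose = annotate_ids(records, known, use_alleles=False)
print([r[2] for r in strict])
print([r[2] for r in loose])
```

With allele-aware matching the second site stays unannotated because the ALT alleles differ; with position-only matching it picks up an ID that belongs to a different variant at the same position. Which behaviour is right depends on what you need the IDs for.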
In addition to what prasundutta says, ensure the contig names match: chr1, chr2, ... are not compatible with 1, 2, ....
A good way to test would be to manually find a site that is in both VCFs and run annotate just for that site using
Replace the chr2:1234567 with the known site and ensure the output you see has the right rsID from the dbSNP file.
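For the contig naming check, a quick sketch like the following can tell you whether the two files use the same style before you run any annotation. It only reads the CHROM column of plain-text VCF lines; the function names and the example lines are made up for illustration (the first data line is shortened from the question, the second is an invented dbSNP-style record with a placeholder ID).

```python
def contig_names(vcf_lines):
    """Collect the distinct CHROM values from the data lines of a VCF."""
    names = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        names.add(line.split(None, 1)[0])
    return names


def naming_style(names):
    """Classify contig names as 'chr-prefixed', 'unprefixed', 'mixed' or 'empty'."""
    if not names:
        return "empty"
    prefixed = sum(1 for name in names if name.startswith("chr"))
    if prefixed == len(names):
        return "chr-prefixed"
    if prefixed == 0:
        return "unprefixed"
    return "mixed"


# Illustrative lines only.
sample_lines = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t69270\t.\tA\tG\t55.4\tPASS\t.",
]
dbsnp_lines = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t69270\trs_example_1\tA\tG\t.\t.\t.",
]

sample_style = naming_style(contig_names(sample_lines))
dbsnp_style = naming_style(contig_names(dbsnp_lines))
print(sample_style, dbsnp_style, "match" if sample_style == dbsnp_style else "MISMATCH")
```

If the styles differ, one of the files needs its contigs renamed before annotation can match anything.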