Hi,
I have some VCF files that doesn't contain any assembly information in the header. Is there a tool or algorithm to detect the assembly?
Thank you!
Hi,
I have some VCF files that doesn't contain any assembly information in the header. Is there a tool or algorithm to detect the assembly?
Thank you!
Maybe someone else has a clever idea, but I'm pretty sure unless there is some metadata in the vcf header, it's impossible to tell the assembler.
I wrote a program vcfverifier that takes in a --fasta and --vcf argument and tries to verify that the REF column of the VCF matches what is in the FASTA file
Possibly it could be sped up but it takes 23 seconds to match the ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz from 1000 genomes to hs37d5.fa
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
this is a tricky one. the VCF does contain the REF allele, so you can check the reference genome FASTA and see if it matches
Yeah, but I guess that would be hard to automatize as each VCF contain different variants. What do you think?
a script could probably be made to check many of the REF column values against a given FASTA file in bulk
That's true. I wish there was a solid tool in bcftools or something. Thank you!
there might be some code in bcftools that can kind of do this. the
bcftools csq
command outputs an error message related to this for example...https://github.com/samtools/bcftools/issues/869 .... not sure if there is a simpler way to make it do that checkThank you for the idea. This may work but have no idea how to utilize it. VCF content is unpredictable. One filter may not work on the other one I think.
I wrote a program that tries to do this by hand https://github.com/cmdcolin/vcfverifier
Do you mean genome assembly (like gr37/gr38) or the assembler used (SPAdes)?
I meant assembly (grch37, grch38 etc.)
Oh right - well the easy way is just to pull out some SNP rsids and look up the position on dbSNP or something, that will quickly tell you what build it's on.
Yeah that makes sense. It won't cover all cases but hopefully, cover enough of them. Thank you!