14 months ago
a5864557
It shouldn't be too hard to create one, but if one already exists, that's even better. It needs to be automatable and not web-based (assume the header contains no relevant information).
I don't think you can get the genome build unless it's written in the VCF header. You might be able to guess it from the contig IDs.
First, I would never guess in an analysis. If there is no code or documentation available for a file, I would not use it. But if you are absolutely forced to, you could get the positions of common variants (for example, dbSNP entries with high allele frequency) for hg18, hg19, and hg38, and then intersect each of these sets with your VCF. The correct build should show the best overlap with the VCF at these sites.
@MatthewP ATpoint It is definitely possible to determine the build deterministically by cross-referencing against dbSNP (for example), as ATpoint mentioned. If you check a large number of sites against both b37 and b38 and get 1000 matches on b38 and none on b37, you can assign the build to b38. Unless I'm missing something?
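For what it's worth, the overlap idea described above can be sketched in a few lines of plain Python. This is only a rough illustration, not a tested tool: the function names (`read_sites`, `guess_build`) and the `min_hits` / `min_ratio` thresholds are made up for this example, and it assumes you have already prepared one set of common-variant sites per build (e.g. exported from dbSNP). It matches on chromosome, position, and REF allele, and ignores a leading `chr` so both contig naming styles compare equal.

```python
import gzip
from typing import Dict, Iterable, Optional, Set, Tuple

Site = Tuple[str, int, str]  # (chromosome, 1-based position, REF allele)


def normalise_chrom(chrom: str) -> str:
    """Strip a leading 'chr' so that 'chr1' and '1' compare equal."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def open_vcf(path: str):
    """Open a plain-text or gzip-compressed VCF for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def read_sites(lines: Iterable[str]) -> Set[Site]:
    """Collect (chrom, pos, ref) from VCF data lines, skipping header lines."""
    sites: Set[Site] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # not a well-formed VCF data line
        sites.add((normalise_chrom(fields[0]), int(fields[1]), fields[3].upper()))
    return sites


def guess_build(
    query: Set[Site],
    references: Dict[str, Set[Site]],
    min_hits: int = 100,      # assumed threshold, tune for your data
    min_ratio: float = 10.0,  # best build must beat the runner-up by this factor
) -> Tuple[Optional[str], Dict[str, int]]:
    """Return (best build or None if ambiguous, hit counts per build)."""
    scores = {build: len(query & ref) for build, ref in references.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_build, best = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    if best < min_hits or best < min_ratio * max(runner_up, 1):
        return None, scores  # too few hits, or no clear winner
    return best_build, scores


# Toy data only; real reference sets would hold many thousands of sites.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t.\t.\t.",
    "chr1\t200\t.\tC\tT\t.\t.\t.",
    "2\t300\t.\tG\tA\t.\t.\t.",
]
toy_refs = {
    "b37": {("1", 100, "A"), ("1", 200, "C"), ("2", 300, "G")},
    "b38": {("1", 150, "A")},
}
build, scores = guess_build(read_sites(toy_vcf), toy_refs, min_hits=3, min_ratio=2.0)
print(build, scores)
```

On real data you would build each reference set with something like `read_sites(open_vcf(path))`, one per-build file of common variants (the file paths are up to you). Returning `None` when the winner is not clear-cut is deliberate: a small or unusual VCF may not overlap enough sites to support a call either way.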