I am having difficulty finding the exact assembly version (i.e., the patch version) of GRCh38 used by major databases.
For instance, gnomAD just says "GRCh38". The only information on the version for v3.1 comes from here, which says it "uses an updated version of Variant Effect Predictor (VEP) based on the most recent Gencode v35". When I click that link, I find that Gencode v35 is based on GRCh38.p13... which is great, except it doesn't tell me whether the GenBank (GCA_000001405.28) or RefSeq (GCF_000001405.39) assembly was used. This matters because those two assemblies are not identical for this patch (unlike early patches of GRCh38). Additionally, the files I downloaded were v3.1.2, where no assembly information beyond "GRCh38" is given.
Then I looked up the latest update to the 1000 Genomes Project (i.e., Byrska-Bishop et al., 2022), and a quick scan turned up nothing about the patch version. Should I simply assume the original 2013 release was used?
The fact that the patch version matters for looking up variants at an exact position makes me wonder whether I am approaching this the wrong way:
1) Are people generally not using positions to look up variant information, and using rsIDs instead? What about rare variants from whole-genome sequencing, which don't have an rsID?
2) Is there a quick way or tool to figure out the patch version?
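For question 2, one place to start is the file itself: the VCF specification defines `##reference` and `##contig` meta-information lines (the latter with optional fields such as `length`, `assembly` and `md5`), and some pipelines fill them in. The sketch below is a hypothetical helper, not an existing tool. It collects those lines and separates primary chromosomes from every other contig (alt, fix or novel-patch scaffolds, decoys), assuming bare or `chr`-prefixed chromosome names and ignoring commas inside quoted values. Since GRCh38 patches do not alter the primary chromosome sequences, contig lengths alone will not distinguish one patch from another; extra scaffold names are at most a hint, and many files declare nothing beyond "GRCh38".

```python
import re

# Assumed naming: primary chromosomes 1-22, X, Y, M/MT, with or without "chr".
PRIMARY = re.compile(r"^(chr)?([1-9]|1[0-9]|2[0-2]|X|Y|M|MT)$")


def parse_meta_fields(body):
    """Split the inside of <...> from a ##contig line into a dict.
    Simplified: does not handle commas inside quoted values."""
    fields = {}
    for item in body.split(","):
        key, _, value = item.partition("=")
        fields[key.strip()] = value.strip().strip('"')
    return fields


def summarize_vcf_header(lines):
    """Collect ##reference and ##contig information from VCF header lines."""
    summary = {"reference": None, "primary": {}, "other": {}}
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith("##"):
            break  # "#CHROM" (or a record) ends the meta-information section
        if line.startswith("##reference="):
            summary["reference"] = line[len("##reference="):]
        elif line.startswith("##contig=<") and line.endswith(">"):
            fields = parse_meta_fields(line[len("##contig=<"):-1])
            contig = fields.get("ID")
            if contig is None:
                continue
            bucket = "primary" if PRIMARY.match(contig) else "other"
            summary[bucket][contig] = fields
    return summary


if __name__ == "__main__":
    # Made-up header for illustration; "chr1_EXAMPLE_fix" is hypothetical.
    header = [
        "##fileformat=VCFv4.2",
        "##reference=GRCh38",
        "##contig=<ID=chr1,length=248956422,assembly=GRCh38>",
        "##contig=<ID=chr1_EXAMPLE_fix,length=1000>",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    summary = summarize_vcf_header(header)
    print(summary["reference"])       # GRCh38
    print(sorted(summary["other"]))   # ['chr1_EXAMPLE_fix']
```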
For my work, being able to extract variants by position and map them exactly onto the chromosome is important. However, I am trying to incorporate multiple databases that used different patches, so I am also wondering:
3) Is there a reliable way to remove patches and convert VCF files to the original 2013 version of GRCh38? Is this considered bad practice?
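For question 3, here is a minimal sketch of one partial step. It assumes, as the GRC describes its patch releases, that patches are delivered as additional scaffolds and leave primary chromosome coordinates unchanged. Under that assumption it keeps the header and only those records on chromosomes 1–22, X, Y and M/MT, and drops `##contig` declarations for everything else. `keep_primary` is a hypothetical helper, not an existing tool. It performs no liftover, does not rewrite the `##reference` line, and cannot recover variants that were called only on patch scaffolds.

```python
import re

# Assumed naming: primary chromosomes 1-22, X, Y, M/MT, with or without "chr".
PRIMARY = re.compile(r"^(chr)?([1-9]|1[0-9]|2[0-2]|X|Y|M|MT)$")


def contig_id(meta_line):
    """Return the ID= value of a ##contig line (simplified, no quoting rules)."""
    m = re.search(r"[<,]ID=([^,>]+)", meta_line)
    return m.group(1) if m else None


def keep_primary(lines):
    """Yield VCF lines, dropping records and ##contig lines that are not on a
    primary chromosome. All other header lines pass through unchanged."""
    for line in lines:
        if line.startswith("##contig="):
            cid = contig_id(line)
            # Keep contig lines whose ID cannot be parsed rather than guess.
            if cid is None or PRIMARY.match(cid):
                yield line
        elif line.startswith("#"):
            yield line  # other meta lines and the #CHROM header line
        else:
            chrom = line.split("\t", 1)[0]  # CHROM is the first column
            if PRIMARY.match(chrom):
                yield line
```

For a bgzip-compressed `.vcf.gz`, the input lines could come from `gzip.open(path, "rt")`; the filtered output would need to be recompressed with `bgzip` before it can be indexed.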
I'd really appreciate any feedback.
Thank you!
GenoMax, how does this work, formally?
Did you check the link above?