Hello,
I have a VCF file of mutations that was generated using the GATK variant calling workflow, with the hs37d5 assembly as the reference.
The problem is that all GATK reference resources use the b37 assembly, and if I simply use them, my script fails, because for some variants (less than 0.1%) there is a mismatch between the b37 and hs37d5 reference genomes.
So my idea was to simply remap the variants of the VCF file to b37. I planned on using something like CrossMap, but no chain files are available for my reference assemblies.
Does anyone have an idea how I can remap the variants from my hs37d5 VCF file to the b37 assembly without the use of chain files, or any other suggestions?
I would greatly appreciate them!
Cheers
Don't you just have to rename the chromosomes (if needed) and discard the chromosomes that are not present in the other reference?
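A minimal sketch of that rename-and-discard step, assuming an uncompressed VCF read line by line. The function name `filter_contigs` and its `keep` and `rename` parameters are made up for illustration; which contigs to keep depends on the target FASTA and has to be supplied by you.

```python
def filter_contigs(lines, keep, rename=None):
    """Yield VCF lines restricted to the contigs in `keep`, optionally renamed.

    `keep` is a set of contig names as they appear in the input VCF;
    `rename` is an optional {old_name: new_name} mapping (hypothetical helper).
    """
    rename = rename or {}
    prefix = "##contig=<ID="
    for line in lines:
        if line.startswith(prefix):
            # Header contig line, e.g. ##contig=<ID=1,length=249250621>
            contig = line[len(prefix):].split(",")[0].rstrip(">\n")
            if contig not in keep:
                continue
            new_name = rename.get(contig, contig)
            yield line.replace("ID=" + contig, "ID=" + new_name, 1)
        elif line.startswith("#"):
            # All other header lines pass through unchanged.
            yield line
        else:
            # Data line: CHROM is the first tab-separated column.
            chrom, sep, rest = line.partition("\t")
            if chrom in keep:
                yield rename.get(chrom, chrom) + sep + rest


# Toy input: one shared contig ("1") and one contig to drop ("hs37d5").
toy_vcf = [
    "##fileformat=VCFv4.2\n",
    "##contig=<ID=1,length=10>\n",
    "##contig=<ID=hs37d5,length=5>\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "1\t2\t.\tC\tT\n",
    "hs37d5\t3\t.\tA\tG\n",
]
filtered = list(filter_contigs(toy_vcf, keep={"1"}))
```

This only handles contig names; it does nothing about positions where the two sequences themselves differ.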
Unfortunately not... Very rarely, they also differ in the nucleotide sequence. But because I am working with WGS data, these events do occur and cause my script to crash, because the "REF" in my VCF file does not match the "REF" of my provided genome assembly.
My idea was to use a simple Python script to manually change the REF nucleotides where a mismatch occurs, but it feels kinda wrong to manually change nucleotides...
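Before changing anything, it may help to list exactly which records disagree with the target reference. The sketch below only reports mismatches and does not rewrite REF; the function names `read_fasta` and `find_ref_mismatches` are made up for illustration, and the toy FASTA and VCF stand in for the real b37 sequence and hs37d5 calls. Loading a whole genome into a dict like this is memory-hungry, so for real data an indexed FASTA reader would be the more practical choice.

```python
import io


def read_fasta(handle):
    """Load a FASTA stream into {contig_name: sequence} (toy-sized input only)."""
    seqs, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # Contig name is the first word after '>'.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def find_ref_mismatches(vcf_handle, reference):
    """Return (chrom, pos, vcf_ref, fasta_ref) for records whose REF disagrees.

    fasta_ref is None when the contig is absent from `reference`.
    """
    mismatches = []
    for line in vcf_handle:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        seq = reference.get(chrom)
        if seq is None:
            mismatches.append((chrom, pos, ref, None))
            continue
        # VCF POS is 1-based; Python slices are 0-based.
        found = seq[pos - 1:pos - 1 + len(ref)]
        if found.upper() != ref.upper():
            mismatches.append((chrom, pos, ref, found))
    return mismatches


# Toy data (not real genome sequence).
toy_fasta = ">1 toy contig\nACGTACGT\nAC\n"
toy_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "1\t2\t.\tC\tT\n"
    "1\t5\t.\tG\tA\n"
    "1\t9\t.\tAC\tA\n"
    "hs37d5\t10\t.\tA\tG\n"
)
reference = read_fasta(io.StringIO(toy_fasta))
mismatches = find_ref_mismatches(io.StringIO(toy_vcf), reference)
```

With the list of mismatching records in hand, you can decide whether to drop them, flag them, or inspect them by hand, instead of silently overwriting REF.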
https://cloud.google.com/life-sciences/docs/resources/public-datasets/reference-genomes
Could you elaborate on what you mean? I also thought that they share the same sequence for the autosomes, but there are definitely some positions where they differ (at least in the versions that I am using). I also found this on the GATK page:
For b37:
I am in a similar predicament and am trying to make my own liftover from hs37d5 to b37. I'm wondering if you had any luck resolving your issue. Here are the steps I'm planning to follow: UCSC Liftover Instructions