4.4 years ago
igor
I have genome sequencing data in VCF format that I would like to annotate with VEP. The variants were called against the hs37d5 reference genome, and I can't find VEP cache files for this assembly. How can I annotate these variants using VEP?
Look for hg19. I am 100% sure you will find the cache under that name.
Thanks. hg19 is actually not equivalent to hs37d5. See https://googlegenomics.readthedocs.io/en/latest/use_cases/discover_public_data/reference_genomes.html
It is the same coordinate system with additional decoy sequences, so you are good to go with the hg19 VEP annotations. The decoys are intended to catch false alignments, for example when somatic cells carry viral integrations, which is not uncommon. It is the same reference genome, though.
That doesn't appear to be completely true either. For instance, when I open the BAM file in IGV using hg19 as the reference, the mitochondrial genome is completely misaligned. It is properly aligned when I use b37+decoy.
Edit: Chromosome Y and 3 are also different (https://gatk.broadinstitute.org/hc/en-us/articles/360035890711?id=23390#comparison)
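One quick way to see which contig naming and which mitochondrial sequence a VCF was called against is to read the `##contig` lines in its header. Below is a minimal sketch, assuming the header actually carries `##contig=<ID=...,length=...>` lines (not every caller writes them). The function names `parse_contigs` and `describe_mito` are hypothetical helpers, not part of VEP or any library. The two mitochondrial lengths are the well-known ones: 16571 bp for hg19 chrM and 16569 bp for the rCRS sequence used in b37.

```python
import re

# Known mitochondrial contig lengths:
# hg19 (UCSC) chrM is 16571 bp; b37/GRCh37 MT is the rCRS sequence, 16569 bp.
MITO_LENGTHS = {
    16571: "hg19-style chrM (NC_001807)",
    16569: "rCRS (NC_012920), as used by b37",
}

CONTIG_RE = re.compile(r"^##contig=<(.*)>\s*$")


def parse_contigs(header_lines):
    """Return {contig_id: length} from VCF '##contig' header lines.

    Simplification: fields are split on commas, so quoted values that
    contain commas are not handled; ID and length are unaffected by that.
    """
    contigs = {}
    for line in header_lines:
        match = CONTIG_RE.match(line)
        if not match:
            continue
        fields = dict(
            kv.split("=", 1) for kv in match.group(1).split(",") if "=" in kv
        )
        if "ID" in fields and "length" in fields:
            contigs[fields["ID"]] = int(fields["length"])
    return contigs


def describe_mito(contigs):
    """Report the mitochondrial contig name and which sequence its length suggests."""
    for name in ("MT", "chrM", "M", "chrMT"):
        if name in contigs:
            return name, MITO_LENGTHS.get(contigs[name], "unrecognised length")
    return None, "no mitochondrial contig found"
```

To use it, pass in the header lines of the VCF (every line starting with `##`). A contig named `MT` with length 16569 points to b37-style data, while `chrM` with length 16571 points to hg19, which would explain the mitochondrial misalignment seen in IGV.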
I always thought they were fully identical, at least for the major chromosomes, hmm... That comparison is based on checksums, so I do not know and cannot contribute any further, sorry.
Hi Igor, were you able to find a solution?