Dear list,
I have used CrossMap to lift over VCF files from GRCh37 to GRCh38 coordinates. Here is the command:
python CrossMap-0.2.8/bin/CrossMap.py vcf GRCh37_to_GRCh38.chain.gz sample1520.vcf.gz Homo_sapiens.GRCh38.dna.primary_assembly.fa sample1520.N.grch38.vcf
I noticed that for some of my indel calls that carry an END coordinate in the VCF INFO field, that coordinate is not lifted over, which causes problems in some downstream analyses (see the example variant below).
2 240692368 . AG A . PASS AC=1;AF=0.5;AN=2;DEL=.;END=241631786;HOMLEN=0;SVLEN=-1;set=indel GT:AD 0/1:20,6
Is there a way to lift over that field as well? Is there any lift-over tool that takes care of this? I tried the UCSC liftOver tool, but the same issue occurs.
Any help is appreciated. Thanks!
Fernanda
Are you using the chain file from UCSC or your own? Also, have you checked whether the region you're trying to convert is represented in the alignment used to generate your chain file, and that it isn't a region that is misassembled in 37 relative to 38 (i.e., perhaps this region is most accurately characterized on a patch scaffold and not on the chromosome in 37)?
I have downloaded the chain file appropriate for my case directly from the CrossMap website. I see your point, but I don't think that is the problem, since the problem is observed in all indel calls where an END field is present in the INFO field. The END field is never converted, only the variant position, which makes me think CrossMap does not process that information?
If I compare the output with the GRCh37 VCF file used as input, the END fields remain the same, while the variant positions are successfully converted.
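For anyone who wants to confirm the same behaviour on their own files, here is a minimal sketch of that comparison. It only uses the Python standard library; the function names `end_values` and `end_unchanged` are made up for illustration, and it assumes plain-text, tab-separated VCF lines with INFO in the 8th column.

```python
import re
from collections import Counter

# Match END=<digits> only as a whole INFO key (so CIEND=... is not picked up).
END_RE = re.compile(r"(?:^|;)END=(\d+)")

def end_values(vcf_lines):
    """Count the END values found in the INFO column of the data lines."""
    ends = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        match = END_RE.search(fields[7])
        if match:
            ends[int(match.group(1))] += 1
    return ends

def end_unchanged(before_lines, after_lines):
    """True if every END value in the lifted file also occurs in the input."""
    before = end_values(before_lines)
    after = end_values(after_lines)
    # Counter subtraction keeps only END values that are new in the output.
    return bool(after) and not (after - before)
```

If `end_unchanged` returns True while the POS column differs between the two files, the END values were most likely carried over untouched.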
Sorry, I completely misread the question. You're trying to lift over a coordinate that's in the INFO field, not in the POS field (i.e., the second column)? As far as I know, these tools don't look at the INFO field at all, since coordinates aren't usually stored there. However, you could extract the END values into a BED file, convert the BED file, and then re-insert the converted coordinates into the VCF.
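A rough sketch of that BED round trip, in case it helps. This is not a feature of CrossMap itself: the function names (`tag_and_extract`, `reinsert_end`) and the `rec<N>` keys are hypothetical, and it rests on a few assumptions that are worth checking on your data. It assumes the ID column can be overwritten with a unique key, that the lift-over tool leaves the ID column and the BED name column unchanged, and that input is plain-text, tab-separated VCF. VCF END is 1-based and inclusive while BED is 0-based and half-open, so each END becomes the 1-bp interval (END-1, END).

```python
import re

# Match END=<digits> only as a whole INFO key (so CIEND=... is not touched).
END_RE = re.compile(r"(^|;)END=(\d+)")

def _norm(chrom):
    """Compare chromosome names without a 'chr' prefix (2 vs chr2)."""
    return chrom[3:] if chrom.startswith("chr") else chrom

def tag_and_extract(vcf_lines):
    """Step 1 (before lift-over): key every record and export END as BED."""
    tagged, bed = [], []
    count = 0
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            tagged.append(line)
            continue
        fields = line.split("\t")
        count += 1
        key = f"rec{count}"
        fields[2] = key  # assumption: the original ID column is not needed
        match = END_RE.search(fields[7])
        if match:
            end = int(match.group(2))
            bed.append(f"{fields[0]}\t{end - 1}\t{end}\t{key}")
        tagged.append("\t".join(fields))
    return tagged, bed

def reinsert_end(lifted_vcf_lines, lifted_bed_lines):
    """Step 2 (after lift-over): write the lifted END back into INFO."""
    lifted = {}
    for line in lifted_bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4:
            lifted[fields[3]] = (fields[0], int(fields[2]))
    out = []
    for line in lifted_vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            out.append(line)
            continue
        fields = line.split("\t")
        hit = lifted.get(fields[2])
        # Only update when the lifted END is on the same chromosome and
        # not before POS; otherwise the record is left as it was.
        if hit and _norm(hit[0]) == _norm(fields[0]) and hit[1] >= int(fields[1]):
            new_end = hit[1]
            fields[7] = END_RE.sub(
                lambda m: f"{m.group(1)}END={new_end}", fields[7], count=1
            )
        out.append("\t".join(fields))
    return out
```

Between the two steps, the tagged VCF would go through the same `CrossMap.py vcf` command as in the question, and the BED file through `CrossMap.py bed` with the same chain file. Records whose END fails to map, or maps to another chromosome or before the new POS, keep their old END here, so those are worth reviewing by hand.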
Thank you for your response, Brice! Yes, for indels called by Pindel, the stop coordinate of the variant is added to the INFO field. Because that coordinate is not lifted over, it was causing issues when running the converted VCF through tools like GATK's VariantEval.
What you suggested sounds like a feasible solution. I will try that.
Thank you!
Great! I'll go ahead and copy/paste my comment below so it can be accepted as an answer.