6 months ago
Omics data mining
Hello everyone
I am having trouble lifting over a VCF file from hg19 to hg38 with GATK, because the file uses 'I' and 'D' alleles to represent insertions and deletions.
This is the command I used for the liftover:
gatk LiftoverVcf -I SNP_GRCh37.vcf -O Liftover_with_Indels/lifted_over.vcf -C hg19ToHg38.over.chain.gz -WMC true -R genome.fa --REJECT Liftover_with_Indels/rejeceted_variants.vcf --RECOVER_SWAPPED_REF_ALT True
Even after converting the VCF file to VCF 4.2 with vcftools, I still get this error:
htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 200: Insertions/Deletions are not supported when reading 3.x VCF's. Please convert your file to VCF4 using VCFTools, available at http://vcftools.sourceforge.net/index.html, for input source: file:///SNP_GRCh37.vcf
at htsjdk.variant.vcf.AbstractVCFCodec.generateException(AbstractVCFCodec.java:887)
at htsjdk.variant.vcf.AbstractVCFCodec.checkAllele(AbstractVCFCodec.java:674)
at htsjdk.variant.vcf.AbstractVCFCodec.parseAlleles(AbstractVCFCodec.java:640)
at htsjdk.variant.vcf.AbstractVCFCodec.parseVCFLine(AbstractVCFCodec.java:443)
at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:384)
at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:328)
at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:48)
at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:70)
at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:37)
at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.readNextRecord(TribbleIndexedFeatureReader.java:377)
at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.next(TribbleIndexedFeatureReader.java:356)
at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.next(TribbleIndexedFeatureReader.java:317)
at picard.vcf.LiftoverVcf.doWork(LiftoverVcf.java:411)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:280)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:37)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
at org.broadinstitute.hellbender.Main.main(Main.java:306)
Any suggestions on how to convert the 'I' and 'D' alleles into a format compatible with VCF 4.2 would be greatly appreciated. I've been struggling with this problem for a few days now.
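Judging from the stack trace, the exception is raised in htsjdk's AbstractVCFCodec.checkAllele while parsing the alleles, so changing the ##fileformat header alone is unlikely to help while the 'I'/'D' alleles are still there. To see how many records are affected, a minimal Python sketch like the one below could list them with their file line numbers (counted the same way as the "line number 200" in the error). The function name find_v3_indel_lines is hypothetical, and it assumes an uncompressed, tab-delimited VCF.

```python
import sys

# First characters that htsjdk appears to treat as VCF 3.x indel alleles
V3_INDEL_PREFIXES = ("I", "D")


def find_v3_indel_lines(lines):
    """Return (declared_fileformat, hits) for an iterable of VCF text lines.

    Each hit is (line_number, chrom, pos, ref, alt); line numbers are 1-based
    and include header lines. Assumes an uncompressed, tab-delimited VCF.
    """
    fileformat = None
    hits = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("##fileformat="):
            fileformat = line.strip().split("=", 1)[1]
            continue
        if line.startswith("#") or not line.strip():
            continue  # other header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], fields[1], fields[3], fields[4]
        alleles = [ref] + alt.split(",")
        if any(a.startswith(V3_INDEL_PREFIXES) for a in alleles):
            hits.append((lineno, chrom, pos, ref, alt))
    return fileformat, hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python find_v3_indels.py SNP_GRCh37.vcf
    with open(sys.argv[1]) as vcf:
        fmt, hits = find_v3_indel_lines(vcf)
    print(f"declared format: {fmt}; records with I/D alleles: {len(hits)}")
    for lineno, chrom, pos, ref, alt in hits[:10]:
        print(f"line {lineno}: {chrom}:{pos} REF={ref} ALT={alt}")
```

If the declared format is VCFv4.2 but the hit list is not empty, the problem lies in the records themselves rather than in the header.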
I think this is a lost cause. From the GATK documentation:
So, as far as I understand, the alleles must consist of A, C, G and T bases. Unless you can find a way to restore the actual REF and ALT sequences, you would be better off re-calling the BAM with modern tools.
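Until the BAM can be re-called, one possible stopgap is to separate the records: keep the ones whose alleles are plain bases for LiftoverVcf and set aside the 'I'/'D' records, whose sequences cannot be recovered from the VCF alone. The Python sketch below is only an illustration under stated assumptions: the function names are hypothetical, the input is an uncompressed VCF, and the check is deliberately conservative (symbolic alleles such as <DEL> and '*' are also set aside).

```python
import re
import sys

# Bases-only alleles (A, C, G, T, N, any case); anything else is set aside
BASES = re.compile(r"^[ACGTN]+$", re.IGNORECASE)


def is_plain_ref(allele):
    return bool(BASES.match(allele))


def is_plain_alt(allele):
    # "." marks a missing ALT allele and is harmless for liftover
    return allele == "." or bool(BASES.match(allele))


def split_vcf(lines):
    """Split VCF lines into (liftable, unresolved) lists of lines.

    Header lines are copied to both lists so each one is a complete VCF.
    A record is liftable only if REF and every ALT allele are plain bases.
    """
    liftable, unresolved = [], []
    for line in lines:
        if line.startswith("#"):
            liftable.append(line)
            unresolved.append(line)
            continue
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        ok = is_plain_ref(ref) and all(is_plain_alt(a) for a in alts)
        (liftable if ok else unresolved).append(line)
    return liftable, unresolved


def count_records(lines):
    return sum(1 for l in lines if not l.startswith("#"))


if __name__ == "__main__" and len(sys.argv) == 4:
    # usage: python split_vcf.py in.vcf liftable.vcf unresolved.vcf
    with open(sys.argv[1]) as src:
        liftable, unresolved = split_vcf(src)
    with open(sys.argv[2], "w") as out:
        out.writelines(liftable)
    with open(sys.argv[3], "w") as out:
        out.writelines(unresolved)
    print(f"{count_records(liftable)} records kept, "
          f"{count_records(unresolved)} set aside")
```

The liftable file could then be passed to the same gatk LiftoverVcf command as before; the set-aside records would still need re-calling.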
Pierre Lindenbaum is correct. Also note that the GATK option --RECOVER_SWAPPED_REF_ALT True does not work with indels. In general, if your VCF includes indels, avoid tools such as GATK/LiftoverVcf or CrossMap/VCF, as explained here.
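To check whether this advice applies to a particular file, a rough Python sketch such as the following counts records in which at least one bases-only ALT allele differs in length from REF. The function name is hypothetical, it assumes an uncompressed VCF, and it ignores symbolic (<...>), breakend, '*' and '.' ALT alleles, as well as the legacy 'I'/'D' notation.

```python
import sys

BASES = set("ACGTN")


def count_indel_records(lines):
    """Count VCF records where some bases-only ALT allele has a different
    length from REF (a rough indel test). Same-length substitutions, including
    multi-base ones, are not counted; non-base ALT alleles are skipped.
    """
    n = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        plain = [a for a in alts if a and set(a.upper()) <= BASES]
        if any(len(a) != len(ref) for a in plain):
            n += 1
    return n


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as vcf:
        print(f"records with indels: {count_indel_records(vcf)}")
```

A non-zero count would suggest the file contains indels and falls under the recommendation above.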