I'm running an analysis where I use Mutect2 to look for reads that code for a particular pathogenic variant. The variant is given in a VCF file (here: myvcf.vcf.gz
).
gatk Mutect2 \
-R "hg38.fa" \
--alleles "myvcf.vcf.gz" \
-L "myvcf.vcf.gz" \
-I "input.bam" \
-O "output.vcf"
For "myvcf.vcf.gz" I initially used a VCF file which I generated with a different analysis, this worked well (tags omitted for brevity):
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 312092_13-Ctrl30_G01_S16_R1_001.fastq.gz
chr20 4699818 . G A 222.391 . DP=1064;VDB=0.670108;SGB=-0.693147;RPBZ=-0.800274;MQBZ=-1.6155;MQSBZ=-4.79102;BQBZ=9.03611;SCBZ=0.192787;FS=0;MQ0F=0;AC=1;AN=2;DP4=214,222,177,203;MQ=59 GT:PL:AD 0/1:255,0,190:436,380
Now I want to look for a new variant. I haven't got a VCF file, so I simply copied the information on this variant from Ensembl and generated the following:
##fileformat=VCFv4.3
##fileDate=20240306
##source=Ensembl
##reference=GRCh38
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
20 4699752 rs74315403 G A . . . GT
However, Mutect2 gave me the error below. It seems like my custom VCF lacks genotype/sample data.
So my question is: How do I create a correctly formatted VCF file for the purpose of this analysis?
org.broadinstitute.hellbender.exceptions.GATKException: Error initializing feature reader for path /add/results/2024-03-05_ref_VCFs_for_mutations/D178N.vcf.gz
at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:436)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:377)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:319)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:291)
at org.broadinstitute.hellbender.engine.FeatureManager.addToFeatureSources(FeatureManager.java:245)
at org.broadinstitute.hellbender.engine.FeatureManager.initializeFeatureSources(FeatureManager.java:208)
at org.broadinstitute.hellbender.engine.FeatureManager.<init>(FeatureManager.java:155)
at org.broadinstitute.hellbender.engine.GATKTool.initializeFeatures(GATKTool.java:497)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:726)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.onStartup(AssemblyRegionWalker.java:79)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:147)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
at org.broadinstitute.hellbender.Main.main(Main.java:306)
Caused by: htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Your input file has a malformed header: The FORMAT field was provided but there is no genotype/sample data, for input source: /add/results/2024-03-05_ref_VCFs_for_mutations/D178N.vcf.gz
at htsjdk.tribble.TabixFeatureReader.readHeader(TabixFeatureReader.java:97)
at htsjdk.tribble.TabixFeatureReader.<init>(TabixFeatureReader.java:82)
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:117)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:433)
... 15 more