Problem with VCF format when running GATK FastaAlternateReferenceMaker
Asked 4.1 years ago by pablo

Hello,

I'm trying to run GATK FastaAlternateReferenceMaker to build an alternate-reference FASTA file from my reference and my VCF file.

I run:

gatk FastaAlternateReferenceMaker -R my_reference.fa -O my_output.fasta -V my_file.vcf

I get this error message:

Using GATK jar /opt/apps/gcc-8.1.0/gatk-4.1.0.0/gatk-package-4.1.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /opt/apps/gcc-8.1.0/gatk-4.1.0.0/gatk-package-4.1.0.0-local.jar FastaAlternateReferenceMaker -R my_reference.fa -O my_output.fasta -V my_file.vcf
14:15:23.712 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/apps/gcc-8.1.0/gatk-4.1.0.0/gatk-package-4.1.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
14:15:30.426 INFO  FastaAlternateReferenceMaker - ------------------------------------------------------------
14:15:30.427 INFO  FastaAlternateReferenceMaker - The Genome Analysis Toolkit (GATK) v4.1.0.0
14:15:30.427 INFO  FastaAlternateReferenceMaker - For support and documentation go to https://software.broadinstitute.org/gatk/
14:15:30.430 INFO  FastaAlternateReferenceMaker - Executing as *** on Linux v3.10.0-1127.18.2.el7.x86_64 amd64
14:15:30.430 INFO  FastaAlternateReferenceMaker - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_45-b14
14:15:30.430 INFO  FastaAlternateReferenceMaker - Start Date/Time: November 3, 2020 2:15:23 PM CET
14:15:30.430 INFO  FastaAlternateReferenceMaker - ------------------------------------------------------------
14:15:30.430 INFO  FastaAlternateReferenceMaker - ------------------------------------------------------------
14:15:30.431 INFO  FastaAlternateReferenceMaker - HTSJDK Version: 2.18.2
14:15:30.432 INFO  FastaAlternateReferenceMaker - Picard Version: 2.18.25
14:15:30.432 INFO  FastaAlternateReferenceMaker - HTSJDK Defaults.COMPRESSION_LEVEL : 2
14:15:30.432 INFO  FastaAlternateReferenceMaker - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
14:15:30.432 INFO  FastaAlternateReferenceMaker - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
14:15:30.432 INFO  FastaAlternateReferenceMaker - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
14:15:30.432 INFO  FastaAlternateReferenceMaker - Deflater: IntelDeflater
14:15:30.432 INFO  FastaAlternateReferenceMaker - Inflater: IntelInflater
14:15:30.433 INFO  FastaAlternateReferenceMaker - GCS max retries/reopens: 20
14:15:30.433 INFO  FastaAlternateReferenceMaker - Requester pays: disabled
14:15:30.433 INFO  FastaAlternateReferenceMaker - Initializing engine
14:15:30.853 INFO  FeatureManager - Using codec VCFCodec to read file file:///my_file.vcf
14:15:30.859 INFO  FastaAlternateReferenceMaker - Shutting down engine
[November 3, 2020 2:15:30 PM CET] org.broadinstitute.hellbender.tools.walkers.fasta.FastaAlternateReferenceMaker done. Elapsed time: 0.12 minutes.
Runtime.totalMemory()=623378432
org.broadinstitute.hellbender.exceptions.GATKException: Error initializing feature reader for path my_file.vcf
        at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:353)
        at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:305)
        at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:256)
        at org.broadinstitute.hellbender.engine.FeatureManager.addToFeatureSources(FeatureManager.java:234)
        at org.broadinstitute.hellbender.engine.FeatureManager.initializeFeatureSources(FeatureManager.java:208)
        at org.broadinstitute.hellbender.engine.FeatureManager.<init>(FeatureManager.java:155)
        at org.broadinstitute.hellbender.engine.GATKTool.initializeFeatures(GATKTool.java:417)
        at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:638)
        at org.broadinstitute.hellbender.engine.ReferenceWalker.onStartup(ReferenceWalker.java:36)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:136)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
        at org.broadinstitute.hellbender.Main.main(Main.java:291)
Caused by: htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Your input file has a malformed header: The FORMAT field was provided but there is no genotype/sample data, for input source: my_file.vcf
        at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:263)
        at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:102)
        at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:127)
        at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:120)
        at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:350)
        ... 14 more
Caused by: htsjdk.tribble.TribbleException$InvalidHeader: Your input file has a malformed header: The FORMAT field was provided but there is no genotype/sample data
        at htsjdk.variant.vcf.AbstractVCFCodec.parseHeaderFromLines(AbstractVCFCodec.java:185)
        at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:111)
        at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:79)
        at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:37)
        at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:261)
        ... 18 more

Here are the header and first records of my VCF file, which look fine to me:

##fileformat=VCFv4.2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT
Super-Scaffold_100001   672149  .       C       T       .       PASS    DP=210;AF=0.51  GT:HQ
Super-Scaffold_100001   862122  .       A       T       .       PASS    DP=305;AF=0.5   GT:HQ
Super-Scaffold_100001   931168  .       C       A       .       PASS    DP=127;AF=0.5   GT:HQ
Super-Scaffold_100001   967240  .       C       T       .       PASS    DP=127;AF=0.5   GT:HQ

Any idea why I get "Your input file has a malformed header: The FORMAT field was provided but there is no genotype/sample data"?

Best,

Tags: vcf, gatk, phasing

I think the main problem is here:

Error initializing feature reader for path my_file.vcf

Is that vcf file in the right location?


Yes, the vcf file is in the right location. Even if I use the full path, it does not work.


The FORMAT field was provided but there is no genotype/sample data, for input source: my_file.vcf

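There is a FORMAT column, but no associated genotype/sample data in your VCF: the #CHROM line ends at FORMAT with no sample name after it, and each record has GT:HQ in the FORMAT column with no per-sample values following it. htsjdk rejects a FORMAT column that has no sample columns, so either remove the FORMAT column or add sample columns with genotype values.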
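A minimal Python sketch of the "remove the FORMAT column" option, assuming a plain (uncompressed), tab-delimited VCF. It checks whether the #CHROM line has a FORMAT column but no sample columns (the condition htsjdk complains about) and, if so, cuts every header and record line down to the eight fixed columns. Dropping the column should be enough here because, as far as I know, FastaAlternateReferenceMaker substitutes the ALT alleles by default and only reads genotypes when a sample is requested (e.g. with --use-iupac-sample). The function names and the output file name are hypothetical.

```python
FIXED_COLUMNS = 8  # CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO


def format_without_samples(chrom_line: str) -> bool:
    """True if a '#CHROM' header line has a FORMAT column but no sample columns.

    This is the situation that makes htsjdk raise
    'The FORMAT field was provided but there is no genotype/sample data'.
    """
    columns = chrom_line.rstrip("\r\n").split("\t")
    return len(columns) == FIXED_COLUMNS + 1 and columns[FIXED_COLUMNS] == "FORMAT"


def drop_empty_format(lines):
    """Yield VCF lines, removing the FORMAT column when the file has no samples."""
    cut = False
    for line in lines:
        if line.startswith("##"):
            # Meta-information lines (including ##FORMAT definitions) are kept unchanged.
            yield line
            continue
        if line.startswith("#"):
            # Decide once, from the #CHROM line, whether records need trimming.
            cut = format_without_samples(line)
        if cut:
            fields = line.rstrip("\r\n").split("\t")
            yield "\t".join(fields[:FIXED_COLUMNS]) + "\n"
        else:
            yield line


def fix_vcf_file(src_path: str, dst_path: str) -> None:
    """Write a copy of src_path with the sample-less FORMAT column removed."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(drop_empty_format(src))
```

For example, fix_vcf_file("my_file.vcf", "my_file.noformat.vcf"), then rerun the same gatk command with -V my_file.noformat.vcf. For a file like this one, whose ## lines contain no tabs, cut -f1-8 my_file.vcf should produce the same result, since cut passes lines without a delimiter through unchanged.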

