Indexing a FASTA file for HaplotypeCaller
8 months ago
Mojtaba • 0

I want to use HaplotypeCaller to realign indels after some preprocessing steps. HaplotypeCaller asks for an indexed FASTA reference (<ref.fasta.fai>) and suggests creating it with samtools faidx. When I run samtools faidx, it finishes very quickly and produces a .fai file, but when I pass that file to HaplotypeCaller I get a new error. Can anyone help me?

    (base) mojtaba@Mojtaba:~/Desktop/BAM$ java -jar /home/mojtaba/Desktop/BAM/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar HaplotypeCaller -R GRCh38_latest_genomic.fna -I marked_duplicates.sam -O realigned.vcf.gz -bamout completelyprocessedSAM.bam
15:50:37.491 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/mojtaba/Desktop/BAM/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
15:50:37.690 INFO  HaplotypeCaller - ------------------------------------------------------------
15:50:37.693 INFO  HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.5.0.0
15:50:37.693 INFO  HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
15:50:37.694 INFO  HaplotypeCaller - Executing as mojtaba@Mojtaba on Linux v6.5.0-21-generic amd64
15:50:37.694 INFO  HaplotypeCaller - Java runtime: OpenJDK 64-Bit Server VM v17.0.10+7-Ubuntu-122.04.1
15:50:37.694 INFO  HaplotypeCaller - Start Date/Time: March 7, 2024 at 3:50:37 PM IRST
15:50:37.694 INFO  HaplotypeCaller - ------------------------------------------------------------
15:50:37.694 INFO  HaplotypeCaller - ------------------------------------------------------------
15:50:37.696 INFO  HaplotypeCaller - HTSJDK Version: 4.1.0
15:50:37.696 INFO  HaplotypeCaller - Picard Version: 3.1.1
15:50:37.696 INFO  HaplotypeCaller - Built for Spark Version: 3.5.0
15:50:37.697 INFO  HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:50:37.697 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:50:37.697 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:50:37.698 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:50:37.698 INFO  HaplotypeCaller - Deflater: IntelDeflater
15:50:37.698 INFO  HaplotypeCaller - Inflater: IntelInflater
15:50:37.699 INFO  HaplotypeCaller - GCS max retries/reopens: 20
15:50:37.699 INFO  HaplotypeCaller - Requester pays: disabled
15:50:37.700 INFO  HaplotypeCaller - Initializing engine
15:50:37.704 INFO  HaplotypeCaller - Shutting down engine
[March 7, 2024 at 3:50:37 PM IRST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=91226112
***********************************************************************

A USER ERROR has occurred: Fasta index file file:///home/mojtaba/Desktop/BAM/GRCh38_latest_genomic.fna.fai for reference file:///home/mojtaba/Desktop/BAM/GRCh38_latest_genomic.fna does not exist. Please see https://gatk.broadinstitute.org/hc/articles/360035531652-FASTA-Reference-genome-format for help creating it.

***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.

    mojtaba@Mojtaba:~/Desktop/BAM$ samtools faidx GRCh38_latest_genomic.fna
    mojtaba@Mojtaba:~/Desktop/BAM$ java -jar /home/mojtaba/Desktop/BAM/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar HaplotypeCaller -R GRCh38_latest_genomic.fasta.fai -I marked_duplicates.sam -O realigned.vcf.gz -bamout completelyprocessedSAM.bam

15:54:54.231 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/mojtaba/Desktop/BAM/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
15:54:54.458 INFO  HaplotypeCaller - ------------------------------------------------------------
15:54:54.461 INFO  HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.5.0.0
15:54:54.462 INFO  HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
15:54:54.462 INFO  HaplotypeCaller - Executing as mojtaba@Mojtaba on Linux v6.5.0-21-generic amd64
15:54:54.462 INFO  HaplotypeCaller - Java runtime: OpenJDK 64-Bit Server VM v17.0.10+7-Ubuntu-122.04.1
15:54:54.463 INFO  HaplotypeCaller - Start Date/Time: March 7, 2024 at 3:54:54 PM IRST
15:54:54.463 INFO  HaplotypeCaller - ------------------------------------------------------------
15:54:54.463 INFO  HaplotypeCaller - ------------------------------------------------------------
15:54:54.465 INFO  HaplotypeCaller - HTSJDK Version: 4.1.0
15:54:54.465 INFO  HaplotypeCaller - Picard Version: 3.1.1
15:54:54.465 INFO  HaplotypeCaller - Built for Spark Version: 3.5.0
15:54:54.466 INFO  HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:54:54.466 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:54:54.467 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:54:54.467 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:54:54.468 INFO  HaplotypeCaller - Deflater: IntelDeflater
15:54:54.468 INFO  HaplotypeCaller - Inflater: IntelInflater
15:54:54.468 INFO  HaplotypeCaller - GCS max retries/reopens: 20
15:54:54.468 INFO  HaplotypeCaller - Requester pays: disabled
15:54:54.469 INFO  HaplotypeCaller - Initializing engine
15:54:54.474 INFO  HaplotypeCaller - Shutting down engine
[March 7, 2024 at 3:54:54 PM IRST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=101711872
java.lang.IllegalArgumentException: File is not a supported reference file type: /home/mojtaba/Desktop/BAM/GRCh38_latest_genomic.fasta.fai
    at htsjdk.samtools.reference.ReferenceSequenceFileFactory.lambda$getFastaExtension$0(ReferenceSequenceFileFactory.java:253)
    at java.base/java.util.Optional.orElseGet(Optional.java:364)
    at htsjdk.samtools.reference.ReferenceSequenceFileFactory.getFastaExtension(ReferenceSequenceFileFactory.java:253)
    at htsjdk.samtools.reference.ReferenceSequenceFileFactory.getDefaultDictionaryForReferenceSequence(ReferenceSequenceFileFactory.java:223)
    at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.checkFastaPath(CachingIndexedFastaSequenceFile.java:184)
    at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.<init>(CachingIndexedFastaSequenceFile.java:147)
    at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.<init>(CachingIndexedFastaSequenceFile.java:129)
    at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.<init>(CachingIndexedFastaSequenceFile.java:114)
    at org.broadinstitute.hellbender.engine.ReferenceFileSource.<init>(ReferenceFileSource.java:35)
    at org.broadinstitute.hellbender.engine.ReferenceDataSource.of(ReferenceDataSource.java:27)
    at org.broadinstitute.hellbender.engine.GATKTool.initializeReference(GATKTool.java:439)
    at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:722)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.onStartup(AssemblyRegionWalker.java:79)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:147)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
    at org.broadinstitute.hellbender.Main.main(Main.java:306)

A USER ERROR has occurred: Fasta index file file:///home/mojtaba/Desktop/BAM/GRCh38_latest_genomic.fna.fai for reference file:///home/mojtaba/Desktop/BAM/GRCh38_latest_genomic.fna does

Please post the output of:

    ls -lah /home/mojtaba/Desktop/BAM/GRCh38_latest_genomic.*
8 months ago
Arton ▴ 20

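You should pass the reference FASTA itself to "-R" (here "-R GRCh38_latest_genomic.fna", the file you ran samtools faidx on) instead of "-R GRCh38_latest_genomic.fasta.fai". The "-R" parameter takes the reference file, not its index. GATK automatically looks for the index file (<reference>.fai) next to the reference.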
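
As a minimal sketch of that rule, the shell function below checks a path before it is handed to "-R": it rejects anything ending in .fai and confirms that the index sits next to the FASTA. Note that `check_reference` is a hypothetical helper written for illustration; it is not part of GATK or samtools.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not a GATK or samtools command):
# verifies that the path meant for -R is the FASTA and that <fasta>.fai exists.
check_reference() {
    ref="$1"
    case "$ref" in
        *.fai)
            # The index itself must never be passed to -R.
            echo "ERROR: $ref is an index; pass the FASTA itself to -R" >&2
            return 1
            ;;
    esac
    if [ ! -f "$ref" ]; then
        echo "ERROR: reference $ref not found" >&2
        return 1
    fi
    if [ ! -f "$ref.fai" ]; then
        # samtools faidx writes the index as <fasta>.fai next to the input.
        echo "ERROR: $ref.fai missing; run: samtools faidx $ref" >&2
        return 1
    fi
    echo "OK: use -R $ref"
}
```

For the files in this thread, `check_reference GRCh38_latest_genomic.fna` should pass once `samtools faidx GRCh38_latest_genomic.fna` has been run, while `check_reference GRCh38_latest_genomic.fasta.fai` is rejected. GATK also expects a sequence dictionary (.dict) for the reference; the help article linked in the error message covers creating both files.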


The GATK log shows that the OP first ran:

    java -jar /home/mojtaba/Desktop/BAM/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar HaplotypeCaller -R GRCh38_latest_genomic.fna -I marked_duplicates.sam -O realigned.vcf.gz -bamout completelyprocessedSAM.bam


This is what he did next:

$ samtools faidx GRCh38_latest_genomic.fna

$ java -jar /home/mojtaba/Desktop/BAM/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar HaplotypeCaller -R GRCh38_latest_genomic.fasta.fai -I marked_duplicates.sam -O realigned.vcf.gz -bamout completelyprocessedSAM.bam


Thank you. It's working.
