Hello
I need help with the base recalibration step in variant analysis. I downloaded the FASTA reference file for the recalibration step, and there seems to be an error with the reference file. I am not sure what the error below means.
I would also like advice on the preprocessing steps. After marking duplicates, should we also perform the indel realignment step and then do the base recalibration? Some NGS pipeline figures do not mention the indel realignment step. Can that step be skipped?
Command for base recalibration:
gatk BaseRecalibrator \
-I my_reads.bam \
-R reference.fasta \
--known-sites sites_of_variation.vcf \
--known-sites another/optional/setOfSitesToMask.vcf \
-O recal_data.table
Command which I used to run the recalibration step:
gatk BaseRecalibrator \
-I my_sorted.bam \
-R GCF_000001405.26_GRCh38_genomic.fna \
--known-sites Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
--known-sites 1000G_phase1.snps.high_confidence.hg38.vcf.gz \
--known-sites Homo_sapiens_assembly38.dbsnp138.vcf \
-O NG-01_1_S1_dedup_bwa_BSQR.table
Error:
Using GATK jar /scicore/soft/apps/GATK/4.2.0.0-foss-2018b-Java-1.8/gatk-package-4.2.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /scicore/soft/apps/GATK/4.2.0.0-foss-2018b-Java-1.8/gatk-package-4.2.0.0-local.jar BaseRecalibrator -I my_sorted.bam -R Homo_sapiens_assembly38.fasta --known-sites Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --known-sites 1000G_phase1.snps.high_confidence.hg38.vcf.gz --known-sites Homo_sapiens_assembly38.dbsnp138.vcf -O NG-01_1_S1_dedup_bwa_BSQR.table
15:50:12.217 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/scicore/soft/apps/GATK/4.2.0.0-foss-2018b-Java-1.8/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jul 29, 2021 3:50:12 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
15:50:12.383 INFO BaseRecalibrator - ------------------------------------------------------------
15:50:12.384 INFO BaseRecalibrator - The Genome Analysis Toolkit (GATK) v4.2.0.0
15:50:12.384 INFO BaseRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
15:50:12.384 INFO BaseRecalibrator - Executing as thirun0000@login20.cluster.bc2.ch on Linux v3.10.0-1127.el7.x86_64 amd64
15:50:12.384 INFO BaseRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_212-b03
15:50:12.384 INFO BaseRecalibrator - Start Date/Time: July 29, 2021 3:50:12 PM CEST
15:50:12.384 INFO BaseRecalibrator - ------------------------------------------------------------
15:50:12.384 INFO BaseRecalibrator - ------------------------------------------------------------
15:50:12.385 INFO BaseRecalibrator - HTSJDK Version: 2.24.0
15:50:12.385 INFO BaseRecalibrator - Picard Version: 2.25.0
15:50:12.385 INFO BaseRecalibrator - Built for Spark Version: 2.4.5
15:50:12.385 INFO BaseRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:50:12.385 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:50:12.385 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:50:12.385 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:50:12.386 INFO BaseRecalibrator - Deflater: IntelDeflater
15:50:12.386 INFO BaseRecalibrator - Inflater: IntelInflater
15:50:12.386 INFO BaseRecalibrator - GCS max retries/reopens: 20
15:50:12.386 INFO BaseRecalibrator - Requester pays: disabled
15:50:12.386 INFO BaseRecalibrator - Initializing engine
15:50:12.393 INFO BaseRecalibrator - Shutting down engine
[July 29, 2021 3:50:12 PM CEST] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=162267136
***********************************************************************
A USER ERROR has occurred: Fasta index file file:///scicore/home/cichon/thirun0000/bowtie/Homo_sapiens_assembly38.fasta.fai for reference file:///scicore/home/cichon/thirun0000/bowtie/Homo_sapiens_assembly38.fasta does not exist. Please see http://gatkforums.broadinstitute.org/discussion/1601/how-can-i-prepare-a-fasta-file-to-use-as-reference for help creating it.
***********************************************************************
Thanks
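The error above says that GATK cannot find the `.fai` index next to the reference FASTA. A minimal sketch of how that index is usually created is below. It assumes `samtools` is installed and on your PATH (on a cluster you may need to load a module first), and that `REF` points at the reference named in the error; the `fai_path` helper is only an illustrative name, not part of any tool.

```shell
# Assumption: REF is the reference FASTA named in the GATK error message.
REF="${REF:-Homo_sapiens_assembly38.fasta}"

# Hypothetical helper: the index path GATK looks for is the full
# reference file name with ".fai" appended.
fai_path() {
    printf '%s.fai\n' "$1"
}

# Build the index only if the reference exists, the index is missing,
# and samtools is available (assumption: samtools is on PATH).
if [ -f "$REF" ] && [ ! -f "$(fai_path "$REF")" ] && command -v samtools >/dev/null 2>&1; then
    samtools faidx "$REF"
fi
```

After this runs, `Homo_sapiens_assembly38.fasta.fai` should sit in the same directory as the reference, and the BaseRecalibrator command can be rerun with the same `-R` argument.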
I tried again with an indexed FASTA file. Now I get another error:
Any idea?
Thanks
You may need to rename GCF_000001405.26_GRCh38_genomic.fna to GCF_000001405.26_GRCh38_genomic.fa and reindex. While fna files are multi-fasta, GATK may not like that.
Is there a problem with Java? I tried renaming the file and still got the same error.
I figured out that I have to create a FASTA reference sequence dictionary file. However, the GATK-Launch command does not work for creating the dictionary file.
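For what it's worth, the log above shows GATK 4.2.0.0, where the launcher script is called `gatk`; as far as I know, `gatk-launch` was the name used in the GATK 4 beta releases, which may be why it does not work here. A minimal sketch of creating the dictionary with the `gatk` launcher follows. It assumes `gatk` is on your PATH and that the reference is an uncompressed FASTA; the `dict_path` helper is only an illustrative name.

```shell
# Assumption: REF is the same uncompressed reference FASTA passed to -R.
REF="${REF:-Homo_sapiens_assembly38.fasta}"

# Hypothetical helper: the dictionary sits next to the reference, with the
# FASTA extension replaced by ".dict".
dict_path() {
    printf '%s.dict\n' "${1%.*}"
}

# Create the dictionary only if the reference exists, the dictionary is
# missing, and the gatk launcher is available (assumption: gatk is on PATH).
if [ -f "$REF" ] && [ ! -f "$(dict_path "$REF")" ] && command -v gatk >/dev/null 2>&1; then
    gatk CreateSequenceDictionary -R "$REF" -O "$(dict_path "$REF")"
fi
```

With both the `.fai` index and the `.dict` file in the same directory as the reference, BaseRecalibrator should get past the reference checks.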