Reference file for base recalibration
3.3 years ago
priya.bmg ▴ 60

Hello

I need help with the base recalibration step in variant analysis. I downloaded the FASTA reference file for the recalibration step, and there seems to be an error in the reference file. I am not sure what the error below means.

I also need advice on the preprocessing steps. After marking duplicates, should we also perform the indel realignment step before base recalibration? Some NGS pipeline figures do not mention the indel realignment step. Can it be skipped?

Command for base recalibration:

 gatk BaseRecalibrator \
   -I my_reads.bam \
   -R reference.fasta \
   --known-sites sites_of_variation.vcf \
   --known-sites another/optional/setOfSitesToMask.vcf \
   -O recal_data.table

Command which I used to run the recalibration step:

 gatk BaseRecalibrator \
   -I my_sorted.bam \
   -R GCF_000001405.26_GRCh38_genomic.fna \
   --known-sites Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
   --known-sites 1000G_phase1.snps.high_confidence.hg38.vcf.gz \
   --known-sites Homo_sapiens_assembly38.dbsnp138.vcf \
   -O NG-01_1_S1_dedup_bwa_BSQR.table

Error:

Using GATK jar /scicore/soft/apps/GATK/4.2.0.0-foss-2018b-Java-1.8/gatk-package-4.2.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /scicore/soft/apps/GATK/4.2.0.0-foss-2018b-Java-1.8/gatk-package-4.2.0.0-local.jar BaseRecalibrator -I my_sorted.bam -R Homo_sapiens_assembly38.fasta --known-sites Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --known-sites   1000G_phase1.snps.high_confidence.hg38.vcf.gz --known-sites Homo_sapiens_assembly38.dbsnp138.vcf -O NG-01_1_S1_dedup_bwa_BSQR.table
15:50:12.217 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/scicore/soft/apps/GATK/4.2.0.0-foss-2018b-Java-1.8/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jul 29, 2021 3:50:12 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
15:50:12.383 INFO  BaseRecalibrator - ------------------------------------------------------------
15:50:12.384 INFO  BaseRecalibrator - The Genome Analysis Toolkit (GATK) v4.2.0.0
15:50:12.384 INFO  BaseRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
15:50:12.384 INFO  BaseRecalibrator - Executing as thirun0000@login20.cluster.bc2.ch on Linux v3.10.0-1127.el7.x86_64 amd64
15:50:12.384 INFO  BaseRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_212-b03
15:50:12.384 INFO  BaseRecalibrator - Start Date/Time: July 29, 2021 3:50:12 PM CEST
15:50:12.384 INFO  BaseRecalibrator - ------------------------------------------------------------
15:50:12.384 INFO  BaseRecalibrator - ------------------------------------------------------------
15:50:12.385 INFO  BaseRecalibrator - HTSJDK Version: 2.24.0
15:50:12.385 INFO  BaseRecalibrator - Picard Version: 2.25.0
15:50:12.385 INFO  BaseRecalibrator - Built for Spark Version: 2.4.5
15:50:12.385 INFO  BaseRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:50:12.385 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:50:12.385 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:50:12.385 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:50:12.386 INFO  BaseRecalibrator - Deflater: IntelDeflater
15:50:12.386 INFO  BaseRecalibrator - Inflater: IntelInflater
15:50:12.386 INFO  BaseRecalibrator - GCS max retries/reopens: 20
15:50:12.386 INFO  BaseRecalibrator - Requester pays: disabled
15:50:12.386 INFO  BaseRecalibrator - Initializing engine
15:50:12.393 INFO  BaseRecalibrator - Shutting down engine
[July 29, 2021 3:50:12 PM CEST] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=162267136
***********************************************************************

A USER ERROR has occurred: Fasta index file file:///scicore/home/cichon/thirun0000/bowtie/Homo_sapiens_assembly38.fasta.fai for reference file:///scicore/home/cichon/thirun0000/bowtie/Homo_sapiens_assembly38.fasta does not exist. Please see http://gatkforums.broadinstitute.org/discussion/1601/how-can-i-prepare-a-fasta-file-to-use-as-reference for help creating it.

***********************************************************************

Thanks

base NGS recalibration preprocessing GATK • 2.0k views
3.3 years ago
GenoMax 147k

You need to create an index file for the FASTA reference. The error you posted even provides a link on how to do this: http://gatkforums.broadinstitute.org/discussion/1601/how-can-i-prepare-a-fasta-file-to-use-as-reference

The link in the error message no longer works, so try this one: https://gatk.broadinstitute.org/hc/en-us/articles/360035531652-FASTA-Reference-genome-format

TL;DR: you need to run this: samtools faidx ref.fasta
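
GATK expects two companion files next to an uncompressed FASTA reference: the .fai index and a .dict sequence dictionary. A minimal sketch of a pre-flight check follows, assuming a POSIX shell and the usual layout (ref.fasta.fai and ref.dict beside ref.fasta). The function name check_reference is made up for illustration, and it only reports what is missing rather than creating anything.

```shell
#!/bin/sh
# Hypothetical helper: report which companion files are missing for a FASTA reference.
# Assumes ref.fasta.fai and ref.dict sit in the same directory as ref.fasta.
check_reference() {
    ref=$1
    missing=0
    if [ ! -f "$ref" ]; then
        echo "missing reference: $ref"
        return 1
    fi
    # Index written by: samtools faidx ref.fasta
    if [ ! -f "$ref.fai" ]; then
        echo "missing index: $ref.fai (create with: samtools faidx $ref)"
        missing=1
    fi
    # Dictionary name: last extension replaced by .dict
    dict="${ref%.*}.dict"
    if [ ! -f "$dict" ]; then
        echo "missing dictionary: $dict (create with: gatk CreateSequenceDictionary -R $ref)"
        missing=1
    fi
    return $missing
}
```

Calling check_reference Homo_sapiens_assembly38.fasta before BaseRecalibrator would list any missing file together with the command that creates it.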


I tried with the indexed FASTA file. Now I get a different error:

java.lang.IllegalArgumentException: File is not a supported reference file type: /scicore/home/cichon/thirun0000/bowtie/GCF_000001405.26_GRCh38_genomic.fna.fai

Any ideas?

Thanks


You may need to rename GCF_000001405.26_GRCh38_genomic.fna to GCF_000001405.26_GRCh38_genomic.fa and reindex. Although .fna files are ordinary multi-FASTA files, GATK may not like the extension.


Is there a problem with Java? I tried renaming the file and still got the same error:

java.lang.IllegalArgumentException: File is not a supported reference file type: /scicore/home/cichon/thirun0000/bowtie/GCF_000001405.26_GRCh38_genomic.fa.fai
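
Both error messages name a path ending in .fai, which suggests that the index file itself was passed to -R. That is a guess from the message, not something confirmed in this thread. -R takes the FASTA file, and GATK looks for the .fai next to it. A small sketch of a guard, using the hypothetical helper name reference_arg:

```shell
# Hypothetical helper: if an index path was given by mistake,
# print the FASTA path it belongs to; otherwise print the path unchanged.
reference_arg() {
    case $1 in
        *.fai) printf '%s\n' "${1%.fai}" ;;
        *)     printf '%s\n' "$1" ;;
    esac
}
```

For example, reference_arg GCF_000001405.26_GRCh38_genomic.fa.fai prints the .fa path, which is the value -R expects.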

I figured out that I have to create a FASTA reference sequence dictionary file. However, the gatk-launch command for creating the dictionary does not work:

 gatk-launch -R GCF_000001405.26_GRCh38_genomic.fa
-bash: gatk-launch: command not found
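
gatk-launch was the wrapper name in the GATK4 pre-releases. In released GATK 4.x the wrapper is called gatk, which is also what the BaseRecalibrator log above shows for 4.2.0.0. The dictionary is written by the CreateSequenceDictionary tool. A minimal sketch, assuming gatk is on the PATH; the function name make_dict and the DRY_RUN switch are illustrative and not part of GATK:

```shell
# Sketch: build the sequence dictionary with the gatk wrapper (GATK 4.x).
# make_dict and DRY_RUN are hypothetical names used only for this example.
make_dict() {
    ref=$1
    out="${ref%.*}.dict"   # ref.fa -> ref.dict, next to the reference
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Only print the command that would be run
        echo "gatk CreateSequenceDictionary -R $ref -O $out"
    else
        gatk CreateSequenceDictionary -R "$ref" -O "$out"
    fi
}
```

Running DRY_RUN=1 make_dict GCF_000001405.26_GRCh38_genomic.fa prints the command without executing it, which is a cheap way to confirm the output name before running it for real.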