Can't call subsampled bam file with GATK Haplotypecaller with --disable-tool-default-read-filters
0
0
Entering edit mode
22 months ago
berndmann ▴ 10

I want to simulate variant calling of an ultra-low-coverage >0.005x bam file. I subsampled reads from the (HG02024) sample of the 1KG phase 3 dataset.

My code in R to do so is the following (bam and reference are just path extensions, file is the inital bam file):

cov_rate <- 11282/28533515
#subsampling with samtools 
system(str_glue("samtools view  -s {cov_rate} {file} -o {bam}HG02024_cov_{cov_rate}x.bam"))
#sort the bam file
system(str_glue("samtools sort {bam}HG02024_cov_{cov_rate}x.bam -o {bam}HG02024_cov_{cov_rate}x_sort.bam"))
#index the bam file
system(str_glue("samtools index {bam}HG02024_cov_{cov_rate}x_sort.bam"))
#extract chromosome 1-22 from bam file, everything else is not relevant in the further analysis 
system(str_c('samtools view  -b ',bam,'HG02024_cov_',cov_rate,'x_sort.bam  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 > ',bam,'HG02024_cov_',cov_rate,'x_sort_chr1_22.bam'))

call = str_glue(' python3 $HOME/gatk/gatk --java-options "-Xmx28g" HaplotypeCaller --disable-tool-default-read-filters --native-pair-hmm-threads 24 -R  {reference}/GRCh37/references_hs37d5_hs37d5_chr1_22.fa -I HG02024_cov_{cov_rate}x_sort_chr1_22.bam -O {str_replace(i, ".bam", ".vcf.gz")} -ERC GVCF')
system(call)

But then the HaploTypeCaller delivers this error message when I add "--disable-tool-default-read-filters" which is necessary for the project:

java.lang.NullPointerException
    at org.broadinstitute.hellbender.utils.locusiterator.ReadStateManager.readStartsAtCurrentPosition(ReadStateManager.java:117)
    at org.broadinstitute.hellbender.utils.locusiterator.ReadStateManager.collectPendingReads(ReadStateManager.java:144)
    at org.broadinstitute.hellbender.utils.locusiterator.LocusIteratorByState.lazyLoadNextAlignmentContext(LocusIteratorByState.java:287)
    at org.broadinstitute.hellbender.utils.locusiterator.LocusIteratorByState.hasNext(LocusIteratorByState.java:224)
    at org.broadinstitute.hellbender.utils.locusiterator.IntervalAlignmentContextIterator.advanceAlignmentContext(IntervalAlignmentContextIterator.java:104)
    at org.broadinstitute.hellbender.utils.locusiterator.IntervalAlignmentContextIterator.advanceAlignmentContextToCurrentInterval(IntervalAlignmentContextIterator.java:99)
    at org.broadinstitute.hellbender.utils.locusiterator.IntervalAlignmentContextIterator.next(IntervalAlignmentContextIterator.java:69)
    at org.broadinstitute.hellbender.utils.locusiterator.IntervalAlignmentContextIterator.next(IntervalAlignmentContextIterator.java:21)
    at org.broadinstitute.hellbender.engine.AssemblyRegionIterator.loadNextAssemblyRegion(AssemblyRegionIterator.java:123)
    at org.broadinstitute.hellbender.engine.AssemblyRegionIterator.next(AssemblyRegionIterator.java:115)
    at org.broadinstitute.hellbender.engine.AssemblyRegionIterator.next(AssemblyRegionIterator.java:35)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:192)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:173)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1095)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)

The NullPointerException first led me to the conclusion that due to the removal of some non-relevant contigs from the bam file I might need to edit the header file as well. Then I learned due to some other posts on biostar that editing the header of a SAM/BAM file seems to be generally a bad idea. And indeed gave me an corrupted bam file after tried. Then I tried to edit the reference fasta file. Sadly the error from HaploTypeCaller persists.

Does anyone know where this error comes from and how to solve it?

ultra-low-coverage GATK HaplotypeCaller • 971 views
ADD COMMENT
0
Entering edit mode

are you using R as a workflow manager ?

ADD REPLY
0
Entering edit mode

https://github.com/broadinstitute/gatk/issues/4665#issuecomment-382367333

but I wouldn't expect tools to run and/or produce meaningful results if all read filters are completely disabled.

ADD REPLY
0
Entering edit mode

@Pierre Lindenbaum: Thanks for your input! Yes, R is my workflow manager indeed. I worked with this low-cov calling pipeline before and followed a guide for ultra-low-coverage calling, though I can't find it anymore. This guide recommended to use this kind of calling strategy to preserve as many reads as possible. Could you extend the argument why this is really such a "big hammer"?

ADD REPLY
0
Entering edit mode

Could you extend the argument why this is really such a "big hammer"?

Makefile, nextflow, snakemake, etc... you're re-inventing the wheel without the good parts (parallelism, caching... )

ADD REPLY

Login before adding your answer.

Traffic: 3740 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6