GATK Error at CountCovariates Tool: Bad Input - Could Not Find Any Usable Data in Input BAM
12.4 years ago
cable82 • 0

Hi,

I am running an exome analysis using BWA, samtools, Picard, and GATK. When I reached the GATK CountCovariates tool, I received an error: "Bad input: Could not find any usable data in the input BAM file(s)."

The BAM file I used as input for CountCovariates was generated with samtools from the BWA SAM files. The steps were:

1. Merged my BAM files with Picard.
2. Ran AddOrReplaceReadGroups.
3. Sorted and indexed with samtools.
4. Ran MarkDuplicates to create a dedup.bam file, then indexed dedup.bam with samtools.
5. Ran RealignerTargetCreator.
6. Ran IndelRealigner to create realigned.bam.
7. Used realigned.bam as the input for CountCovariates.

Any idea what is going on?

Script:

    java -Xmx5g -jar /Users/Cable/Bioinformatics/Applications/GenomeAnalysisTK-1.6-11-g3b2fab9/GenomeAnalysisTK.jar \
        -R /Users/Cable/Bioinformatics/GATKbundle1.5/ucsc.hg19.fasta \
        -knownSites /Users/Cable/Bioinformatics/GATKbundle1.5/hg19/dbsnp_135.hg19.vcf \
        -I /Users/Cable/Bioinformatics/exomepipelinefiles/LD-04400.realignedretry.bam \
        -T CountCovariates \
        -cov ReadGroupCovariate \
        -cov QualityScoreCovariate \
        -cov CycleCovariate \
        -cov DinucCovariate \
        -recalFile /Users/Cable/Bioinformatics/exomepipelinefiles/LD-04400.recal_data.csv

Output:

    INFO 20:19:33,071 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 20:19:33,073 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.6-11-g3b2fab9, Compiled 2012/06/05 21:00:10
    INFO 20:19:33,074 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 20:19:33,074 HelpFormatter - Please view our documentation at <http://www.broadinstitute.org/gsa/wiki>
    INFO 20:19:33,074 HelpFormatter - For support, please view our support site at <http://getsatisfaction.com/gsa>
    INFO 20:19:33,075 HelpFormatter - Program Args: -R /Users/Cable/Bioinformatics/GATKbundle1.5/ucsc.hg19.fasta     -knownSites /Users/Cable/Bioinformatics/GATKbundle1.5/hg19/dbsnp_135.hg19.vcf -I /Users/Cable/Bioinformatics/exomepipelinefiles/LD-04400.realignedretry.bam -T CountCovariates -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -recalFile /Users/Cable/Bioinformatics/exomepipelinefiles/LD-04400.recal_data.csv
    INFO 20:19:33,075 HelpFormatter - Date/Time: 2012/07/11 20:19:33
    INFO 20:19:33,075 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 20:19:33,075 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 20:19:33,088 RodBindingArgumentTypeDescriptor - Dynamically determined type of /Users/Cable/Bioinformatics/GATKbundle1.5/hg19/dbsnp_135.hg19.vcf to be VCF
    INFO 20:19:33,114 GenomeAnalysisEngine - Strictness is SILENT
    INFO 20:19:33,296 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 20:19:33,341 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.04
    INFO 20:19:33,358 RMDTrackBuilder - Loading Tribble index from disk for file /Users/Cable/Bioinformatics/GATKbundle1.5/hg19/dbsnp_135.hg19.vcf
    INFO 20:19:34,908 CountCovariatesWalker - The covariates being used here:
    INFO 20:19:34,908 CountCovariatesWalker - ReadGroupCovariate
    INFO 20:19:34,908 CountCovariatesWalker - QualityScoreCovariate
    INFO 20:19:34,908 CountCovariatesWalker - CycleCovariate
    INFO 20:19:34,909 CountCovariatesWalker - DinucCovariate
    INFO 20:19:35,219 CountCovariatesWalker - Writing raw recalibration data...
    INFO 20:19:36,952 GATKRunReport - Uploaded run statistics report to AWS S3
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version 1.6-11-g3b2fab9):
    ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
    ##### ERROR Please do not post this error to the GATK forum
    ##### ERROR
    ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
    ##### ERROR Visit our wiki for extensive documentation <http://www.broadinstitute.org/gsa/wiki>
    ##### ERROR Visit our forum to view answers to commonly asked questions <http://getsatisfaction.com/gsa>
    ##### ERROR
    ##### ERROR MESSAGE: Bad input: Could not find any usable data in the input BAM file(s).

My SAM file (header and first read) is as follows:

    @HD VN:1.0 GO:none SO:coordinate
    @SQ SN:chrM LN:16571
    @SQ SN:chr1 LN:249250621
    @SQ SN:chr2 LN:243199373
    @SQ SN:chr3 LN:198022430
    @SQ SN:chr4 LN:191154276
    @SQ SN:chr5 LN:180915260
    @SQ SN:chr6 LN:171115067
    @SQ SN:chr7 LN:159138663
    @SQ SN:chr8 LN:146364022
    @SQ SN:chr9 LN:141213431
    @SQ SN:chr10 LN:135534747
    @SQ SN:chr11 LN:135006516
    @SQ SN:chr12 LN:133851895
    @SQ SN:chr13 LN:115169878
    @SQ SN:chr14 LN:107349540
    @SQ SN:chr15 LN:102531392
    @SQ SN:chr16 LN:90354753
    @SQ SN:chr17 LN:81195210
    @SQ SN:chr18 LN:78077248
    @SQ SN:chr19 LN:59128983
    @SQ SN:chr20 LN:63025520
    @SQ SN:chr21 LN:48129895
    @SQ SN:chr22 LN:51304566
    @SQ SN:chrX LN:155270560
    @SQ SN:chrY LN:59373566
    @SQ SN:chr1_gl000191_random LN:106433
    @SQ SN:chr1_gl000192_random LN:547496
    @SQ SN:chr4_ctg9_hap1 LN:590426
    @SQ SN:chr4_gl000193_random LN:189789
    @SQ SN:chr4_gl000194_random LN:191469
    @SQ SN:chr6_apd_hap1 LN:4622290
    @SQ SN:chr6_cox_hap2 LN:4795371
    @SQ SN:chr6_dbb_hap3 LN:4610396
    @SQ SN:chr6_mann_hap4 LN:4683263
    @SQ SN:chr6_mcf_hap5 LN:4833398
    @SQ SN:chr6_qbl_hap6 LN:4611984
    @SQ SN:chr6_ssto_hap7 LN:4928567
    @SQ SN:chr7_gl000195_random LN:182896
    @SQ SN:chr8_gl000196_random LN:38914
    @SQ SN:chr8_gl000197_random LN:37175
    @SQ SN:chr9_gl000198_random LN:90085
    @SQ SN:chr9_gl000199_random LN:169874
    @SQ SN:chr9_gl000200_random LN:187035
    @SQ SN:chr9_gl000201_random LN:36148
    @SQ SN:chr11_gl000202_random LN:40103
    @SQ SN:chr17_ctg5_hap1 LN:1680828
    @SQ SN:chr17_gl000203_random LN:37498
    @SQ SN:chr17_gl000204_random LN:81310
    @SQ SN:chr17_gl000205_random LN:174588
    @SQ SN:chr17_gl000206_random LN:41001
    @SQ SN:chr18_gl000207_random LN:4262
    @SQ SN:chr19_gl000208_random LN:92689
    @SQ SN:chr19_gl000209_random LN:159169
    @SQ SN:chr21_gl000210_random LN:27682
    @SQ SN:chrUn_gl000211 LN:166566
    @SQ SN:chrUn_gl000212 LN:186858
    @SQ SN:chrUn_gl000213 LN:164239
    @SQ SN:chrUn_gl000214 LN:137718
    @SQ SN:chrUn_gl000215 LN:172545
    @SQ SN:chrUn_gl000216 LN:172294
    @SQ SN:chrUn_gl000217 LN:172149
    @SQ SN:chrUn_gl000218 LN:161147
    @SQ SN:chrUn_gl000219 LN:179198
    @SQ SN:chrUn_gl000220 LN:161802
    @SQ SN:chrUn_gl000221 LN:155397
    @SQ SN:chrUn_gl000222 LN:186861
    @SQ SN:chrUn_gl000223 LN:180455
    @SQ SN:chrUn_gl000224 LN:179693
    @SQ SN:chrUn_gl000225 LN:211173
    @SQ SN:chrUn_gl000226 LN:15008
    @SQ SN:chrUn_gl000227 LN:128374
    @SQ SN:chrUn_gl000228 LN:129120
    @SQ SN:chrUn_gl000229 LN:19913
    @SQ SN:chrUn_gl000230 LN:43691
    @SQ SN:chrUn_gl000231 LN:27386
    @SQ SN:chrUn_gl000232 LN:40652
    @SQ SN:chrUn_gl000233 LN:45941
    @SQ SN:chrUn_gl000234 LN:40531
    @SQ SN:chrUn_gl000235 LN:34474
    @SQ SN:chrUn_gl000236 LN:41934
    @SQ SN:chrUn_gl000237 LN:45867
    @SQ SN:chrUn_gl000238 LN:39939
    @SQ SN:chrUn_gl000239 LN:33824
    @SQ SN:chrUn_gl000240 LN:41933
    @SQ SN:chrUn_gl000241 LN:42152
    @SQ SN:chrUn_gl000242 LN:43523
    @SQ SN:chrUn_gl000243 LN:43341
    @SQ SN:chrUn_gl000244 LN:39929
    @SQ SN:chrUn_gl000245 LN:36651
    @SQ SN:chrUn_gl000246 LN:38154
    @SQ SN:chrUn_gl000247 LN:36422
    @SQ SN:chrUn_gl000248 LN:39786
    @SQ SN:chrUn_gl000249 LN:38502
    @RG ID:vanishing PL:illumina PU:matter LB:white SM:VWM_04400
    @PG ID:bwa PN:bwa VN:0.6.2-r126
    @PG ID:GATK IndelRealigner VN:1.6-11-g3b2fab9 CL:knownAlleles=[(RodBinding name=knownAlleles source=/Users/Cable/Bioinformatics/GATKbundle1.5/hg19/1000G_phase1.indels.hg19.vcf)] targetIntervals=/Users/Cable/Bioinformatics/GATKbundle1.5/output.intervals LODThresholdForCleaning=5.0 consensusDeterminationModel=USE_READS entropyThreshold=0.15 maxReadsInMemory=150000 maxIsizeForMovement=3000 maxPositionalMoveAllowed=200 maxConsensuses=30 maxReadsForConsensuses=120 maxReadsForRealignment=20000 noOriginalAlignmentTags=false nWayOut=null generate_nWayOut_md5s=false check_early=false noPGTag=false keepPGTags=false indelsFileForDebugging=null statisticsFileForDebugging=null SNPsFileForDebugging=null
    HWI-H173:16109L7ACXX:5:1101:10000:100567 77 * 0 0 * * 0 0 TAGTTTCTTTTTCATTCCTGCTCCCTGCCTTAACTCCTCCTCCCACTGCCCCTGATCCCABCCDDDFEFHGHHIJJJJIECIJIJIFHIIJIIJIIJJJJJJIIJJJAHH8BFGHIIIJI RG:Z:vanishing

I'm kind of stuck here and don't really know why I am getting this error. I can't find this error in the forum. Any help would be much appreciated.

Thanks,

Nathan


How big are your BAM files? Are they empty?


The BAM file is 1.34 GB, and it is not empty.


Please edit this question to make it readable by indenting lines of code and output with 4 spaces.


I indented the code and output.


Is it a BAM or a SAM file?


I meant a SAM file; my apologies. I ran IndelRealigner before the CountCovariates walker, and it worked without error.

12.4 years ago

Hi,

GATK expects a binary BAM file (not human-readable). What you posted is in the text SAM format...

Could you confirm that your file

/Users/Cable/Bioinformatics/exomepipelinefiles/LD-04400.realignedretry.bam

is in BAM format?

Fred
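Fred's question can be checked without opening the file in a viewer. As a minimal Python sketch (the function name `looks_like_bam` is hypothetical, not part of GATK or samtools), this relies on the fact that BAM is BGZF-compressed, i.e. a series of gzip members, and that its decompressed stream begins with the magic bytes "BAM" followed by 0x01. A plain-text SAM file fails the check because it is not gzip data at all.

```python
import gzip
import sys


def looks_like_bam(path: str) -> bool:
    """Return True if `path` decompresses to a stream starting with b"BAM\\x01".

    BGZF (the BAM container) is valid multi-member gzip, so the standard
    gzip module can read the first decompressed bytes.
    """
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(4) == b"BAM\x01"
    except (OSError, EOFError):
        # Not gzip at all (e.g. plain SAM text raises gzip.BadGzipFile,
        # a subclass of OSError) or a truncated gzip stream.
        return False


if __name__ == "__main__":
    # Example: python looks_like_bam.py LD-04400.realignedretry.bam
    for p in sys.argv[1:]:
        print(p, "BAM" if looks_like_bam(p) else "not BAM")
```

This only inspects the first four bytes, so it confirms the container format, not that the file is complete or that its reads are mapped.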


It is in BAM format. I posted the SAM file (I meant to say SAM in my post) to show my headers and read groups. I ran IndelRealigner before CountCovariates, and it worked fine.

12.4 years ago

In case you haven't figured out why yet, your problem is probably in your data. At the end of the output you show, you have the following line:

    HWI-H173:16109L7ACXX:5:1101:10000:100567 77 * 0 0 * * 0 0

followed by some other fields that are not important here. The fields after the FLAG value (77) indicate which chromosome your read aligns to, where, and so on. If all your reads show up as stars and zeroes in those fields, it would indicate that none of your reads aligned to the reference, hence the "no usable data" complaint (i.e. everything is unmapped).
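The FLAG value in that record tells the same story: 77 = 1 + 4 + 8 + 64, meaning a paired read, first in its pair, with both the read and its mate unmapped. A minimal Python sketch that decodes a FLAG into bit names; the names follow the ones samtools uses for the SAM specification's FLAG bits, and the helper `decode_flag` is hypothetical.

```python
from typing import List

# (bit, name) pairs for the SAM FLAG field, using samtools-style names.
SAM_FLAG_BITS = [
    (0x1, "PAIRED"),
    (0x2, "PROPER_PAIR"),
    (0x4, "UNMAP"),
    (0x8, "MUNMAP"),
    (0x10, "REVERSE"),
    (0x20, "MREVERSE"),
    (0x40, "READ1"),
    (0x80, "READ2"),
    (0x100, "SECONDARY"),
    (0x200, "QCFAIL"),
    (0x400, "DUP"),
    (0x800, "SUPPLEMENTARY"),
]


def decode_flag(flag: int) -> List[str]:
    """Return the names of all bits set in a SAM FLAG value, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


if __name__ == "__main__":
    # FLAG of the read shown in the question.
    print(decode_flag(77))  # ['PAIRED', 'UNMAP', 'MUNMAP', 'READ1']
```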

I'm not sure how to fix it though...


Thanks for the comment. I figured out that the BWA indexing step had not worked, hence the "no usable data" error, just as you said about the stars and zeroes.
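Because a 1.34 GB BAM can be non-empty and still contain no mapped reads, it can help to count mapped records right after alignment instead of waiting for CountCovariates to fail. A minimal Python sketch, assuming plain tab-separated SAM text such as the output of `samtools view` (the function `count_mapped` is hypothetical):

```python
from typing import Iterable, Tuple


def count_mapped(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Count (mapped, unmapped) alignment records in SAM text.

    Header lines (starting with '@') and blank lines are skipped. A record
    counts as unmapped when bit 0x4 of its FLAG (second column) is set.
    """
    mapped = unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        # Only QNAME and FLAG are needed; leave the rest of the record unsplit.
        flag = int(line.split("\t", 2)[1])
        if flag & 0x4:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped
```

Passing `sys.stdin` while piping `samtools view` output into the script would give the counts for a whole BAM; a result of zero mapped reads points back at the indexing or alignment step rather than at GATK. `samtools flagstat` reports a similar summary without any extra code.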
