Entering edit mode
8.2 years ago
sarthak
•
0
I am trying to run variant calling for input BAM file as follows:
java -Xmx5120m -jar /data/spark/tools/GenomeAnalysisTK.jar -T HaplotypeCaller -nct 4 -R /data/spark/ref/ucsc.hg19.fasta -I /data/sarthaksharma/region0-3.bam -o region0.vcf -stand_call_conf 30.0 -stand_emit_conf 30.0 -L /data/sarthaksharma/bed0.intervals --no_cmdline_in_header --disable_auto_index_creation_and_locking_when_reading_rods
I am getting the following error:
##### ERROR
##### ERROR MESSAGE: Badly formed genome loc: Contig 'chrY 150797 151037' does not match any contig in the GATK sequence dictionary derived from the reference; are you sure you are using the correct reference fasta file?
##### ERROR ------------------------------------------------------------------------------------------
I tried to look in the answers here: Snp Calling Error By Gatk Unifiedgenotyper Program but it didn't work.
When I change the format of the .intervals
file to .bed
, it ran but gave an empty vcf file.
java -Xmx5120m -jar /data/spark/tools/GenomeAnalysisTK.jar -T HaplotypeCaller -nct 4 -R /data/spark/ref/ucsc.hg19.fasta -I /data/sarthaksharma/region0-3.bam -o region0.vcf -stand_call_conf 30.0 -stand_emit_conf 30.0 -L /data/sarthaksharma/bed0.bed --no_cmdline_in_header --disable_auto_index_creation_and_locking_when_reading_rods
The log for this command:
Total compute time in PairHMM computeLikelihoods() : 0.0
INFO 10:39:24,505 HaplotypeCaller - Ran local assembly on 0 active regions
INFO 10:39:24,532 ProgressMeter - done 1.76e+06 2.0 s 1.0 s 100.0% 2.0 s 0.0 s
INFO 10:39:24,534 ProgressMeter - Total runtime 2.03 secs, 0.03 min, 0.00 hours
INFO 10:39:24,535 MicroScheduler - 488 reads were filtered out during the traversal out of approximately 507 total reads (96.25%)
INFO 10:39:24,536 MicroScheduler - -> 0 reads (0.00% of total) failing DuplicateReadFilter
INFO 10:39:24,537 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO 10:39:24,538 MicroScheduler - -> 488 reads (96.25% of total) failing HCMappingQualityFilter
INFO 10:39:24,538 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 10:39:24,539 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
INFO 10:39:24,539 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
INFO 10:39:24,540 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter
INFO 10:39:26,141 GATKRunReport - Uploaded run statistics report to AWS S3
Could someone please tell any possible cause of this?
EDIT
This is the snippet of the .bed
file (tab separated):
chrY 150797 151037
chrY 155397 155496
chrY 157258 157446
chrY 158123 158363
chrY 249337 249469
chrY 251487 251707
chrY 252086 252146
The Reference sequence used to generate the BAM is not the same one you are using. Check with
to know which Reference was used to generate BAM.
Can you tell me what the result is of running without the
-L
argument? How is your bed file separated?The bed file is tab separated...I have als added the snippet in the question above. And when I run without the -L argument, it ran and printed the following three entries in the
vcf
file:So variantcalling works and there is a problem with the interval file... You are sure that it's tab separated and not mixed with spaces or something?
Did you notice in the log
488 reads were filtered out during the traversal out of approximately 507 total reads (96.25%)
? So that explains why you only get three variants.With the
-L
argument, would those 3 variants be filtered out? Essentially, are those 3 variants included in the bed file intervals?I have checked it is a tab separated file. Here is how I am making it:
Would it do filtering if the intervals file is not in correct format.
Besides the problem with the interval file I would worry more about the bam file, given that only 19 reads are considered good enough by HaplotypeCaller.
Could you run
sed 's/ \+ /\t/g' bed0.bed > newbed0.bed
and check if that solves your problem?Thanks...but this too didn't help
Strange. You ran HaplotypeCaller with
-L newbed0.bed
right? The error was the same?Can you show me the output of
and
(as already suggested by venu)
When I am running with the newbed0.bed, I get the same log as I have shown in the question, no error but the vcf file is empty. Here is the snippet of the grep (not complete result, there are lot of lines)
And for:
Right, so, connecting the evidence:
-L
argument are in the interval(s) specified by the bed fileThank a lot for your help....I will check the bam file