VCF files from GATK contain no data lines
8.6 years ago

Hi,

I tried using the GATK HaplotypeCaller tool to find variants on chromosome 21 of the human genome, with the following command:

java -jar GenomeAnalysisTK.jar -R chr21/chr21.fa -T HaplotypeCaller -I chr21/alignments/human38chr21.sorted.bam -o chr21/variants/answerold.raw.snps.indels.vcf

as given at the following URL: https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php

I got a VCF file, but it contains only the header and no data lines. What is wrong?
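
One quick way to confirm that the VCF really holds only header lines is to count the lines that do not start with `#`. The sketch below does this with awk. The function name `count_vcf_records` is made up for illustration and is not part of GATK; the path in the usage comment is the output path from the command above.

```shell
# Count VCF data lines, i.e. non-empty lines that do not start with '#'.
# count_vcf_records is a hypothetical helper name, not a GATK tool.
count_vcf_records() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

# Usage (path taken from the command in the question):
# count_vcf_records chr21/variants/answerold.raw.snps.indels.vcf
```

A result of 0 means the file contains header lines only.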

gatk haplotypecaller SNP indel • 4.0k views

Can you post the stack trace you get when you run this command?

INFO  10:43:04,724 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  10:43:04,734 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.5-0-g36282e4, Compiled 2015/11/25 04:03:56 
INFO  10:43:04,734 HelpFormatter - Copyright (c) 2010 The Broad Institute 
INFO  10:43:04,734 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
INFO  10:43:04,747 HelpFormatter - Program Args: -R chr21/chr21.fa -T HaplotypeCaller -I chr21/alignments/human38chr21new.sorted.bam -o chr21/humanoutput.raw.snps.indels.vcf 
INFO  10:43:04,763 HelpFormatter - Executing as aditya@aditya-VirtualBox on Linux 3.19.0-56-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_45-internal-b14. 
INFO  10:43:04,775 HelpFormatter - Date/Time: 2016/04/06 10:43:04 
INFO  10:43:04,775 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  10:43:04,776 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  10:43:05,009 GenomeAnalysisEngine - Strictness is SILENT 
INFO  10:43:05,377 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 500 
INFO  10:43:05,410 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  10:43:05,571 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.16 
INFO  10:43:05,647 HCMappingQualityFilter - Filtering out reads with MAPQ < 20 
INFO  10:43:06,050 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files 
INFO  10:43:06,448 GenomeAnalysisEngine - Done preparing for traversal 
INFO  10:43:06,451 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
INFO  10:43:06,458 ProgressMeter -                 |      processed |    time |         per 1M |           |   total | remaining 
INFO  10:43:06,458 ProgressMeter -        Location | active regions | elapsed | active regions | completed | runtime |   runtime 
INFO  10:43:06,469 HaplotypeCaller - Disabling physical phasing, which is supported only for reference-model confidence output 
INFO  10:43:06,618 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values. 
WARN  10:43:06,621 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples. 
INFO  10:43:06,622 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values. 
INFO  10:43:06,879 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units 
Using AVX accelerated implementation of PairHMM
INFO  10:43:11,839 VectorLoglessPairHMM - libVectorLoglessPairHMM unpacked successfully from GATK jar file 
INFO  10:43:11,842 VectorLoglessPairHMM - Using vectorized implementation of PairHMM 
INFO  10:43:36,488 ProgressMeter -  chr21:10690236              0.0    30.0 s           49.7 w       22.9%     2.2 m     101.0 s 
INFO  10:44:06,490 ProgressMeter -  chr21:17044140              0.0    60.0 s           99.3 w       36.5%     2.7 m     104.0 s 
INFO  10:44:36,494 ProgressMeter -  chr21:22543743              0.0    90.0 s          148.9 w       48.3%     3.1 m      96.0 s 
INFO  10:45:06,496 ProgressMeter -  chr21:27653512              0.0   120.0 s          198.5 w       59.2%     3.4 m      82.0 s 
INFO  10:45:36,497 ProgressMeter -  chr21:33975834              0.0     2.5 m          248.1 w       72.7%     3.4 m      56.0 s 
INFO  10:46:06,499 ProgressMeter -  chr21:40228258              0.0     3.0 m          297.7 w       86.1%     3.5 m      29.0 s 
INFO  10:46:36,130 VectorLoglessPairHMM - Time spent in setup for JNI call : 0.06849923200000001 
INFO  10:46:36,130 PairHMM - Total compute time in PairHMM computeLikelihoods() : 1.20202936 
INFO  10:46:36,131 HaplotypeCaller - Ran local assembly on 0 active regions 
INFO  10:46:36,133 ProgressMeter -            done      4.6709983E7     3.5 m            4.0 s      100.0%     3.5 m       0.0 s 
INFO  10:46:36,135 ProgressMeter - Total runtime 209.68 secs, 3.49 min, 0.06 hours 
INFO  10:46:36,135 MicroScheduler - 169827 reads were filtered out during the traversal out of approximately 962428 total reads (17.65%) 
INFO  10:46:36,135 MicroScheduler -   -> 0 reads (0.00% of total) failing BadCigarFilter 
INFO  10:46:36,136 MicroScheduler -   -> 0 reads (0.00% of total) failing DuplicateReadFilter 
INFO  10:46:36,137 MicroScheduler -   -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter 
INFO  10:46:36,137 MicroScheduler -   -> 169827 reads (17.65% of total) failing HCMappingQualityFilter 
INFO  10:46:36,138 MicroScheduler -   -> 0 reads (0.00% of total) failing MalformedReadFilter 
INFO  10:46:36,138 MicroScheduler -   -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter 
INFO  10:46:36,139 MicroScheduler -   -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter 
INFO  10:46:36,150 MicroScheduler -   -> 0 reads (0.00% of total) failing UnmappedReadFilter 
INFO  10:46:37,534 GATKRunReport - Uploaded run statistics report to AWS S3 

I have posted the stack trace in two replies because of the character limit on the reply text area.
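
One line in the log above stands out: `HaplotypeCaller - Ran local assembly on 0 active regions`. As far as I understand it, HaplotypeCaller only emits variant records from regions it has assembled, so zero active regions is consistent with a header-only VCF. Below is a minimal sketch for pulling that line and the read-filter summary out of a saved log. The file name `gatk.log` and both function names are assumptions for illustration; the patterns are copied from the log lines posted above.

```shell
# Print the log lines that summarize assembly and read filtering.
# summarize_gatk_log is a hypothetical helper name.
summarize_gatk_log() {
    grep -E 'Ran local assembly on [0-9]+ active regions|reads were filtered out|failing HCMappingQualityFilter' "$1"
}

# Print only the number of active regions that were assembled.
# active_region_count is a hypothetical helper name.
active_region_count() {
    sed -n 's/.*Ran local assembly on \([0-9][0-9]*\) active regions.*/\1/p' "$1"
}

# Usage, assuming the log was saved to a file named gatk.log:
# summarize_gatk_log gatk.log
# active_region_count gatk.log
```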


I've also had issues with some of the GATK tools outputting what were basically empty vcf files. I found that running the base recalibrator before I ran haplotype caller changed my outputs significantly. I wonder if you're not running into something similar.


Where do I get the known sites parameter? I can't find it.
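
For context, in GATK 3.x base recalibration is a two-step process, and the known-sites input is a VCF of previously characterized variants (for example dbSNP), which the Broad Institute distributes in the GATK resource bundle. The sketch below only prints the two commands rather than running them. `dbsnp.vcf`, `recal_data.table`, `recalibrated.bam`, and the function name are placeholders chosen for illustration; whether recalibration explains the empty output in this thread is not established.

```shell
# Print (do not run) the two GATK 3.x commands for base quality recalibration.
# print_bqsr_commands is a hypothetical helper name. Arguments:
#   $1 reference FASTA, $2 input BAM, $3 known-sites VCF
print_bqsr_commands() {
    printf 'java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R %s -I %s -knownSites %s -o recal_data.table\n' "$1" "$2" "$3"
    printf 'java -jar GenomeAnalysisTK.jar -T PrintReads -R %s -I %s -BQSR recal_data.table -o recalibrated.bam\n' "$1" "$2"
}

# dbsnp.vcf is a placeholder for whichever known-sites file you download.
print_bqsr_commands chr21/chr21.fa chr21/alignments/human38chr21.sorted.bam dbsnp.vcf
```

HaplotypeCaller would then be run on the recalibrated BAM instead of the original one.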


approximately 962428 total reads

So you have fewer than 1 million reads for all of chr21?
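
To put that read count in perspective, a rough mean depth can be estimated as reads × read length / region length. The sketch below uses the numbers from the log (962428 total reads, 169827 filtered out, 4.6709983E7 positions traversed). The read length of 100 bp is an assumption, since the thread never states it, and `estimate_depth` is a made-up helper name.

```shell
# Rough mean depth = reads * read_length / region_length.
# estimate_depth is a hypothetical helper name.
estimate_depth() {
    awk -v reads="$1" -v len="$2" -v region="$3" \
        'BEGIN { printf "%.2f\n", reads * len / region }'
}

# Reads that passed filtering in the log: 962428 - 169827 = 792601.
# 100 is an ASSUMED read length; 46709983 is the position count the
# ProgressMeter reports as done (4.6709983E7).
estimate_depth 792601 100 46709983
```

With these inputs the estimate comes out at about 1.7x, which would be shallow coverage; the true figure depends on the actual read length.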


Have you tried the GATK forum? This seems GATK-specific.


What is the size of your BAM file?

Also try looking inside the BAM file with:

samtools tview chr21/chr21.fa chr21/alignments/human38chr21.sorted.bam

Then let us know what you see.
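
Since `samtools tview` is interactive, a non-interactive summary may be easier to paste into a reply. The sketch below reports the file size, the `samtools flagstat` totals, and the number of alignments with MAPQ of at least 20, which is the threshold the HCMappingQualityFilter line in the log mentions. `bam_summary` is a made-up helper name, and the function assumes samtools is on the PATH.

```shell
# Non-interactive summary of a BAM file.
# bam_summary is a hypothetical helper name; it needs samtools on the PATH.
bam_summary() {
    ls -lh "$1"               # file size on disk
    samtools flagstat "$1"    # totals for mapped, paired, and duplicate reads
    # Alignments with MAPQ >= 20 (-q skips alignments below the given MAPQ)
    printf 'reads with MAPQ >= 20: %s\n' "$(samtools view -c -q 20 "$1")"
}

# Usage (path taken from the command in the question):
# bam_summary chr21/alignments/human38chr21.sorted.bam
```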


Are you running this locally or on a cluster?
