8.5 years ago
epigene
I wanted to run GATK VariantsToTable on a VCF file, but my command finished within 10 seconds and output a file containing only the header. Something must be wrong, but I couldn't see where the issue comes from. Has anyone run into the same issue and knows how to solve it?
I could only find one post describing a similar issue, but it has no direct answer:
http://gatkforums.broadinstitute.org/gatk/discussion/5260/variantstotable-for-multi-sample-vcfs
Here is the command line and running log info.
java -jar ~/localbin/GenomeAnalysisTK.jar -R ~/genomes/hg38.fa -T VariantsToTable -V test.hg38.WGS.sort.vcf.gz -F CHROM -F POS -F ID -F REF -F ALT -o test.hg38.WGS.sort.vcf_table.txt
INFO 22:47:28,527 HelpFormatter - --------------------------------------------------------------------------------
INFO 22:47:28,531 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.3-0-g37228af, Compiled 2014/10/24 01:07:22
INFO 22:47:28,531 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 22:47:28,531 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 22:47:28,537 HelpFormatter - Program Args: -R ~/genomes/hg38.fa -T VariantsToTable -V test.hg38.WGS.sort.vcf.gz -F CHROM -F POS -F ID -F REF -F ALT -o test.hg38.WGS.sort.vcf_table.txt
INFO 22:47:28,543 HelpFormatter - Executing as xx on Linux 2.6.18-164.6.1.el5 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0-b147.
INFO 22:47:28,543 HelpFormatter - Date/Time: 2016/06/16 22:47:28
INFO 22:47:28,544 HelpFormatter - --------------------------------------------------------------------------------
INFO 22:47:28,544 HelpFormatter - --------------------------------------------------------------------------------
INFO 22:47:32,236 GenomeAnalysisEngine - Strictness is SILENT
INFO 22:47:35,623 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
WARN 22:47:37,344 IndexDictionaryUtils - Track variant doesn't have a sequence dictionary built in, skipping dictionary validation
INFO 22:47:37,450 GenomeAnalysisEngine - Preparing for traversal
INFO 22:47:37,480 GenomeAnalysisEngine - Done preparing for traversal
INFO 22:47:37,481 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 22:47:37,482 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 22:47:37,483 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
INFO 22:47:38,168 ProgressMeter - done 0.0 0.0 s 7.9 d 100.0% 0.0 s 0.0 s
INFO 22:47:38,169 ProgressMeter - Total runtime 0.69 secs, 0.01 min, 0.00 hours
INFO 22:47:39,534 GATKRunReport - Uploaded run statistics report to AWS S3
Does this still happen if you pass an uncompressed VCF file through?
Haven't tried that, but I have a huge VCF file, so unzipping and re-zipping it will be a headache. Actually, I ran the same command on another zipped VCF file and it ran through, so I'm trying to troubleshoot what went wrong with this particular VCF file. So you can run it with a zipped VCF without a problem, for sure?
You can, but there have been reports of instability. Decompressing won't take that long either, even if the file is really big. Look into pigz (a Unix utility), which uses multiple threads for compression and, to a lesser extent, for decompression.
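A minimal sketch of the decompression step suggested above, for anyone who wants to try it. The function name `decompress_vcf` is made up for illustration. It assumes the input name ends in `.gz` and writes a plain-text copy next to it, leaving the original untouched. It uses pigz when it is installed and falls back to plain gzip otherwise.

```shell
# Hypothetical helper: decompress a gzipped VCF to a plain-text copy,
# keeping the original .gz file in place.
decompress_vcf() {
    in="$1"
    out="${in%.gz}"   # strip the trailing .gz to get the output name
    if command -v pigz >/dev/null 2>&1; then
        pigz -dc "$in" > "$out"    # pigz, if available
    else
        gzip -dc "$in" > "$out"    # fallback: standard gzip
    fi
    printf '%s\n' "$out"           # print the path of the decompressed file
}

# Usage (file name taken from the question):
#   decompress_vcf test.hg38.WGS.sort.vcf.gz
# then pass the resulting test.hg38.WGS.sort.vcf to -V in the GATK command.
```

Keep in mind that the decompressed copy of a whole-genome VCF can be many times larger than the `.gz` file, so check free disk space first.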
I see. Will look it up, thanks.
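Since the log reports 0 sites processed, two quick checks on the VCF itself may help narrow things down before re-running GATK. This is only a sketch: the function names are hypothetical, and the idea that the contig names might not match the reference (for example `1` versus `chr1`) is an assumption, not something confirmed in this thread.

```shell
# Hypothetical helper: count data records (non-header, non-empty lines)
# in a gzipped VCF. A result of 0 means the file holds only a header.
count_vcf_records() {
    gzip -dc "$1" | awk '!/^#/ && NF { n++ } END { print n + 0 }'
}

# Hypothetical helper: list the distinct contig names used by the records,
# in order of first appearance.
vcf_contigs() {
    gzip -dc "$1" | awk -F '\t' '!/^#/ && NF && !seen[$1]++ { print $1 }'
}

# Usage (file names taken from the question; the .fai path is an assumption):
#   count_vcf_records test.hg38.WGS.sort.vcf.gz
#   vcf_contigs test.hg38.WGS.sort.vcf.gz
#   cut -f1 ~/genomes/hg38.fa.fai    # contig names in the reference index
```

If the record count is non-zero but the contig names differ from those in the reference index, that mismatch would be worth ruling out first.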