Error in VQSR first step
1
0
Entering edit mode
21 months ago
Yoosef ▴ 60

Hello. After inserting following codes for VQSR first step:

java -jar gatk-package-4.3.0.0-local.jar VariantRecalibrator -O /home/yousef/Desktop/haplotype_hg38_SNP_Recal.vcf --resource:hapmap,known=false,training=true,truth=true,prior=15.0 '/media/yousef/EEFC0BDBFC0B9CC9/Sequencing/Next_generation_sequencing/Index_files/VQSR_data/resources_broad_hg38_v0_hapmap_3.3.hg38.vcf' --resource:omni,known=false,training=true,truth=false,prior=12.0 '/media/yousef/EEFC0BDBFC0B9CC9/Sequencing/Next_generation_sequencing/Index_files/VQSR_data/resources_broad_hg38_v0_1000G_omni2.5.hg38.vcf' --resource:1000G,known=false,training=true,truth=false,prior=10.0 '/media/yousef/EEFC0BDBFC0B9CC9/Sequencing/Next_generation_sequencing/Index_files/VQSR_data/resources_broad_hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf' --resource:dbsnp,known=true,training=false,truth=false,prior=2.0 '/media/yousef/EEFC0BDBFC0B9CC9/Sequencing/Next_generation_sequencing/Index_files/VQSR_data/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf' --tranches-file  /home/yousef/Desktop/Tranches.txt -an QD -an SOR -an MQ -an FS -an SOR -an ReadPosRankSum -an MQRankSum -V /home/yousef/Desktop/haplotype.hg38.vcf --max-gaussians 4 -R /media/yousef/EEFC0BDBFC0B9CC9/Sequencing/Next_generation_sequencing/Index_files/hg38_FASTA/hg38.fa --rscript-file /home/yousef/Desktop/Rscript.R

I received the following error:

org.broadinstitute.hellbender.exceptions.GATKException: Error initializing feature reader for path /media/yousef/EEFC0BDBFC0B9CC9/Sequencing/Next_generation_sequencing/Index_files/VQSR_data/resources_broad_hg38_v0_hapmap_3.3.hg38.vcf
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:436)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:377)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:319)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:291)
    at org.broadinstitute.hellbender.engine.FeatureManager.addToFeatureSources(FeatureManager.java:245)
    at org.broadinstitute.hellbender.engine.FeatureManager.initializeFeatureSources(FeatureManager.java:208)
    at org.broadinstitute.hellbender.engine.FeatureManager.<init>(FeatureManager.java:155)
    at org.broadinstitute.hellbender.engine.VariantWalkerBase.initializeFeatures(VariantWalkerBase.java:66)
    at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:726)
    at org.broadinstitute.hellbender.engine.MultiVariantWalker.onStartup(MultiVariantWalker.java:49)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file, for input source: /media/yousef/EEFC0BDBFC0B9CC9/Sequencing/Next_generation_sequencing/Index_files/VQSR_data/resources_broad_hg38_v0_hapmap_3.3.hg38.vcf
    at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:264)
    at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:103)
    at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:128)
    at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:121)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:433)
    ... 15 more
Caused by: htsjdk.tribble.TribbleException$InvalidHeader: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file
    at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:115)
    at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:79)
    at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:37)
    at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:262)
    ... 19 more

How should I fix it?

NGS WES VQSR variant • 2.2k views
ADD COMMENT
1
Entering edit mode

could you show us the header of your vcf file. It seems that it's malformed.

ADD REPLY
0
Entering edit mode
##fileformat=VCFv4.2
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##th-waterman-read-to-haplotype-mismatch-penalty -15 --smith-waterman-read-to-haplotype-gap-open-penalty -30 --smith-waterman-read-to-haplotype-gap-extend-penalty -5 --flow-assembly-collapse-hmer-size 0 --flow-assembly-collapse-partial-mode false --flow-filter-alleles false --flow-filter-alleles-qual-threshold 30.0 --flow-filter-alleles-sor-threshold 3.0 --flow-filter-lone-alleles false --flow-filter-alleles-debug-graphs false --min-assembly-region-size 50 --max-assembly-region-size 300 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --force-active false --assembly-region-padding 100 --padding-around-indels 75 --padding-around-snps 20 --padding-around-strs 75 --max-extension-into-assembly-region-padding-legacy 25 --max-reads-per-alignment-start 50 --enable-legacy-assembly-region-trimming false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --max-variants-per-shard 0 --lenient false --add-output-sam-program-record true --add-output-vcf-
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=ExcessHet,Number=1,Type=Float,Description="Phred-scaled p-value for exact test of excess heterozygosity">
##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">
##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##INFO=<ID=SOR,Number=1,Type=Float,Description="Symmetric Odds Ratio of 2x2 contingency table to detect strand bias">
##contig=<ID=chr1,length=248956422>
##contig=<ID=chr10,length=133797422>
##contig=<ID=chr11,length=135086622>
##contig=<ID=chr11_KI270721v1_random,length=100316

In advance:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  yousef
chr1    16565468    .   T   A   87.64   .   AC=1;AF=0.500;AN=2;BaseQRankSum=1.559;DP=9;ExcessHet=0.0000;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=9.74;ReadPosRankSum=-0.480;SOR=0.283   GT:AD:DP:GQ:PL  0/1:6,3:9:95:95,0,212
chr1    16574912    .   G   C   70.32   .   AC=2;AF=1.00;AN=2;DP=2;ExcessHet=0.0000;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;QD=25.36;SOR=0.693    GT:AD:DP:GQ:PL  1/1:0,2:2:6:82,6,0
chr1    16576232    .   G   C   72.64   .   AC=1;AF=0.500;AN=2;BaseQRankSum=-0.734;DP=16;ExcessHet=0.0000;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=4.54;ReadPosRankSum=0.729;SOR=0.693  GT:AD:DP:GQ:PL  0/1:12,4:16:80:80,0,445
chr1    16576255    .   T   C   86.64   .   AC=1;AF=0.500;AN=2;BaseQRankSum=-1.866;DP=16;ExcessHet=0.0000;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=5.41;ReadPosRankSum=0.061;SOR=0.693  GT:AD:DP:GQ:PL  0/1:12,4:16:94:94,0,431
ADD REPLY
0
Entering edit mode

I checked one of my resource files that the error was related to it. As you can see below, it seems the file is corrupted. I downloaded it again from Broad institute repository, but it is the same. File name: resources_broad_hg38_v0_1000G_omni2.5.hg38.vcf.vcf

?‹??     ÿ? BC? §#]]?·r}ϯXØ/ à+7¿I ?@þ֍,9¶b_#??£ÝÑjàÝÙõì¬|•_Ÿêéi’Ã"\§í?ÛÒœ"›,?‹§ŠÅO?}·½Ý¼»ßß­?—?ùÍ?ûLý˧Ÿ~óâ囯¼ü÷?_]¾zýfõÃë—¿®^¼Z©a?¾ýì«ÍãÕ~ûpØÞï.?y~{ØìwëÃæb}{»¹Ý\\Ý?í??—?Ã'ÿq.èíúúùããúã÷뇇íîæ\Ì›÷›‹»é/.¶»©?ô??ô_??ú»Ÿ^ýp±?Ñ?ÛGúÁ±ÍÛ[ú¿ÝÕýîqûxØP£ÛÝñ·Wï·??w›Ãúz}X·½¸~z8où«§‡ÛíÕñ?Žòׇ‹ÇõÝæâáþq{ìÂ?ÛÃû‹?î÷›‹o©•ýššùéê~¿iEo¯Õp.û?‚Ò¯ÕpñöáâþÝÅzwñÛîþ?õôzsËñº×(Þuá?E÷[wXëûÍ»Wçø?7ï6ûÍîjsñvMƒG?÷êÙÅóy?¯7Û›Ýæú8Åš´çpÒ ÇIòë?¿þæ(ùÛ/?{õt÷v³¿|öÙ›?›Ëonïׇó¶hf®?^OL%àÍ,@M?~:ì™?’„û?ýå?ÞŽjýrûî°¹þy½ß®w‡ÇËOÖ¤s?IÙV‡c/:¿¡¡yx:¬Æ5uù?ÿ{±ß¬¯WoŸÞÑ8¬?·ÿ·¹Ü=Q??Þßï6«÷÷wÔ‘7Ï_}õüǯ¦_¾;J?G
ÿ°¾}œ?›^Ý>]o^œÿéü£Õãæ°Ú?Q“ÿýêÅëWåÏï6û?úÎËç/_R?§É _ÿþ4þÇåçïŸîn6»ÏÕo7Ÿç¿?ÿp½[ݨßV?LxönýxX_ìﯿ I?{¶»ß}µ!ùwÛ?­ºíՏëÝõýÝO›Íõå;êÙæâš4„??-ªÝÍ4P_üºúéù÷?¼üºüÝfu¸_½Û¯¯Žcüšó¿»ºÿ°Ù¯o6—£É!ýùýòõ7ߌÿþvýðúa³ûaCsqøxi‡gÃÅÃf´?Ôý—÷7“¸§ÇÍëý–>}û_Oë[Zʛǹ‡›wë§ÛÃ?¤”å¯þ¦.hȶ×G³³z$?¹:ì6—?½xùõ«7?OÔ¹w§  Ü=Ý­?ïÇ){¼TÓÔÝìïŸ?Voo×W¿­ni`N3½¹ÞÞì7G]˜ÿ{Ò½úO~ÎíNÊùæÇ?_¾??Ãý?«¬?«Ñ?­žvãÚû熴j}7Îöqý–FmóO?‡í?™(šú?{·¹»ß\=¾_ï¯Ç&§ŸßÞߌJ±ºÝ|ØÜ^¾xõÍëñŽó1jí±ßï7·?§Ÿ˜4ûò_œt`4Ð;²—§¿¸x¼Ú“2=û|xf2ÚØ!?CôN={¼ßÓÚxöáêÝ¿]Ü??.ï÷7ÏÞîï×$„TçðtØ<£ïnžÝ¬?¿=ÛÞÓÿ=½}|FÛÐ/û-}÷Oô¿?¯^¯¾ûúùWdeþ?œŒøæqu¿»ýø×ðÓš\Ým?iG¢…Fn´f+RûÕïOãZ<ŽÓ'd4Ʊ<Ú›/”
Ö¸¡}Is¼§çhr2zÞdŽ¦?²|íž”E}÷‹ŒÿŽ´ããß~Ùlwô»›‹¯ÚÞnßî·OwGY´½?¶7GiWï÷ê³ÛÍîæðþRÛ˜œ·ZF»ææî-
¯?øïuù½VÉ8þü÷fþ½JQ'çœð{[~?he   ðç¿wù÷Q9?µ$ßçß?Òj—‚ðûï’±ô{ó翏ù÷ãÚ‰Þø?ÿ}Ê¿7Ñ$?Tøóß“C’?&¤ N˜R?àh?{? «?tpf?†H•9VÖxkt? e’‡0X?”?(³<¨””ŠR—ò4§ÁÐÀZA‹Tžæh´?Ö*á÷yšã`?}°$?O³‹^?¯?­Ðy–½¥”?´BçI¶>?)EAKužcR  ?­?&à?e?ø¶ƒ$|ï¯Y|Ð:X%üüû,Þ;/Míê?_è@zã?¨Õþè«”%í?É^dx`p£?#­??yã:øˆÂ?ƒ{?ý€¡ÕÀÐv ¥?ÑŠ£µVhÛº3ê¤?‚¾e¸ét=ˆ«sF[ŽV¢ÅÔ3Úñ®{E??„wôÍ‘?–ö¿Õ·/Éá¥?îÀIBçí?'÷ëƒnàzHA2ÍîÔzG_“ö’?Mó§óµb‡Á?ʞѼmC;¯?Ñ|©ÂEIef¸ækŤÁIÚ>k\oÖ†ÁHæ[åiKiSâ´Íx­¹Îj¥”ä„?þÔÿÎrMVþþSûôÛ?^YÑFçöùz7QIÛAóæM?]Â?ï¬ø £hë2ž/yrVŒôñn†w–ägK¾ËŒïl2‘¤J.®
óªwLûT$?\rnÂÜ?¾öt?¢8 3Þðŧ”vN²¸³É5=›9?Ét?<×~rE¤½ªÀ¹ò*ò¸ƒtBÊøÎâñN\|?ÏÕ×’ö£×Ù°¢J?nž«¯?È—?Ô·à;{N2AX¼?ŏþLr´=¡mÇî“9?¾ý¿w?Ü?¤9YåɁEa¶‚Ñ\£0S`dr?ý*0WÁ¬¤–?¦ëo#œ°—?ÜPpã‘?„©
¦hóCa¾þ8‹Î€rg??w²??:=¡(Uš‘\Á‚??g‚O VÚº—ä?€ÍÙº›Ú[pL¬ª•ÙJ¾b¥ª“ƒD??X<ÓJpéØZ)µÑ vY}¶PQ³`ë‰Ó⡧´¦Ït?þ¶Ê,¤?Zm?b??ŽÕõ´?¶f«O3´·À@ïÏÖ?:”¾þº€?fëÃy7A˜©ZKZdU
.Ö?UÓžö³?Z‘˘a®¶±Zd[2lHõâ‘H¸ÒÚÙR =qC=(Ir?Kkµ¢˜?œ:§Îvpñ¨[pgë *~^½ÈG5ÓÕÆ™Z?è€?«4ÓÃËÕ
ADD REPLY
1
Entering edit mode

It looks like a binary file. Can you try bcftools view resources_broad_hg38_v0_1000G_omni2.5.hg38.vcf.vcf | head?

ADD REPLY
0
Entering edit mode

Here is the result

bcftools view  '/media/yousef/Games/resources_broad_hg38_v0_1000G_omni2.5.hg38.vcf.vcf' | head
##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=NOT_POLY_IN_1000G,Description="Alternate allele count = 0">
##FILTER=<ID=badAssayMapping,Description="The mapping information for the SNP assay is internally inconsistent in the chip metadata">
##FILTER=<ID=dup,Description="Duplicate assay at same position with worse Gentrain Score">
##FILTER=<ID=id10,Description="Within 10 bp of an known indel">
##FILTER=<ID=id20,Description="Within 20 bp of an known indel">
##FILTER=<ID=id5,Description="Within 5 bp of an known indel">
##FILTER=<ID=id50,Description="Within 50 bp of an known indel">
##FILTER=<ID=refN,Description="Reference base is N. Assay is designed for 2 alt alleles">
ADD REPLY
2
Entering edit mode
21 months ago
Ram 44k

Use bcftools to save the file as a vcf.gz file and then use that file instead of this one.

ADD COMMENT
0
Entering edit mode

Can you please explain more? Which function should I use? What is the problem with this file now?

ADD REPLY
0
Entering edit mode

Please Google these things - read the bcftools view manual. BCF is binary VCF format and given that you have a BCF file named as .vcf.gz and the error says ...malformatted..., it probably means that the tool expects VCF format and you provided a BCF file.

ADD REPLY
0
Entering edit mode

Thanks a lot. do I need to make a new index file after conversion too?

ADD REPLY
0
Entering edit mode

Yes, of course - indexes are file specific, not content specific.

ADD REPLY
0
Entering edit mode

After converting the file and creating an index for it, the problem solved. Thanks again.

ADD REPLY
0
Entering edit mode

I've moved my comment to an answer. Please accept it to resolve the post.

ADD REPLY

Login before adding your answer.

Traffic: 2515 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6