Hi,
I wanted to look at the frequency of mutations within specific intervals (defined by BED files). I therefore used SnpSift annotate and SnpSift filter for annotation and filtering. After repeating this for ~500 files I wanted to view them as a bundle, so I used vcf-concat to merge them into a single VCF file. But now, when I try to visualise this big bundle with IGV, I get this error:
Error loading features for interval: 16:17478977-17479244 htsjdk.tribble.TribbleException: Line 69: there aren't enough columns for line 16 17479124 . CAT C . alleleBias;QD;ALTC;VAFC;QUAL;GTQ SOMATIC;BRF=0.36;FR=0.2500;HP=2;HapScore=1;MGOF=11;MMLQ=28;MQ=55.96;NF=6;NR=2;PP=14;QD=4.125;SC=CATAAACACACATACACACAC;SbPval=0.29;Source=Platypus;TC=65;TCF=30;TCR=35;TR=8;WE=17479136;WS=17479114;HE=1 (we expected 9 tokens, and saw 8 ), for input source: /Users/morova/Google Drive/TuncProjectStep2-3/Yunus_HPC/AR_250widen/AR_250widen_final_output_from_all_batches/EOPC-DE/DO10806/final.f601cf2f-081f-484d-ab0e-21a8ec8d3770.indel_dkfz.vcf.gz
I don't have any idea why this happens or what is wrong with my VCF files.
Does anyone have an idea about this problem?
Help would be greatly appreciated.
Best regards,
Tunc.
EDIT: I added a sample of the VCF that I am trying to visualise.
Also, the error now reads:
Line 69: there aren't enough columns for line 1 164840924 . TGTGGGGAG T . alleleBias;QD;VAF;QUAL;ALTT;GTQ;GTQFRT SOMATIC;BRF=0.17;FR=0.2499;HP=2;HapScore=1;MGOF=1;MMLQ=100;MQ=60.0;NF=0;NR=0;PP=5;QD=0;SC=CCTGGGAACCTGTGGGGAGCC;SbPval=1.0;Source=Platypus;TC=64;TCF=24;TCR=40;TR=0;WE=164840940;WS=164840914;HE=1 (we expected 9 tokens, and saw 8 ), for input source: /Users/morova/Google Drive/TuncProjectStep2-3/Yunus_HPC/AR_250widen/AR_250widen_final_output_from_all_batches/EOPC-DE/DO10806/final.f601cf2f-081f-484d-ab0e-21a8ec8d3770.indel_dkfz.vcf
The VCF in question:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CONTROL TUMOR
1 164840924 . TGTGGGGAG T . alleleBias;QD;VAF;QUAL;ALTT;GTQ;GTQFRT SOMATIC;BRF=0.17;FR=0.2499;HP=2;HapScore=1;MGOF=1;MMLQ=100;MQ=60.0;NF=0;NR=0;PP=5;QD=0;SC=CCTGGGAACCTGTGGGGAGCC;SbPval=1.0;Source=Platypus;TC=64;TCF=24;TCR=40;TR=0;WE=164840940;WS=164840914;HE=1
1 243862215 . CA C . badReads;QD;ALTC;VAFC;QUAL;GTQ SOMATIC;BRF=0.25;FR=0.2499;HP=15;HapScore=2;MGOF=2;MMLQ=5;MQ=60.0;NF=6;NR=7;PP=28;QD=2.38461538462;SC=GAGGCTGGACCAAAAAAAAAA;SbPval=0.33;Source=Platypus;TC=56;TCF=21;TCR=35;TR=13;WE=243862239;WS=243862205;HE=1
2 20327814 . CA C . QD;ALTC;VAFC;QUAL;GTQ SOMATIC;BRF=0.34;FR=0.2502;HP=17;HapScore=4;MGOF=13;MMLQ=20;MQ=57.24;NF=0;NR=7;PP=12;QD=2.14285714286;SC=ACTCTGTTTCCAAAAAAAAAA;SbPval=1.0;Source=Platypus;TC=50;TCF=15;TCR=35;TR=7;WE=20327824;WS=20327804;HE=1
2 227076637 . TTATA T . QUAL;ALTT;GTQ SOMATIC;BRF=0.27;FR=0.2512;HP=1;HapScore=2;MGOF=0;MMLQ=27;MQ=60.0;NF=0;NR=1;PP=19;QD=30.0;SC=TATGTGTATATTATATATATA;SbPval=0.44;Source=Platypus;TC=22;TCF=10;TCR=12;TR=1;WE=227076649;WS=227076627;HE=1
Edit2: I understood why the problem happened. The FORMAT column information was missing from the VCFs: the header still lists FORMAT, CONTROL and TUMOR, but the data lines stop after INFO.
SnpSift omitted that information during the annotation and filtering steps. Does anyone know how I can recover it?
Best,
Tunc.
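One possible way to recover the missing columns, sketched here under the assumption that the original (pre-SnpSift) VCFs are still available and still contain the FORMAT, CONTROL and TUMOR columns: match each filtered record back to its original by CHROM, POS, REF and ALT, and copy the genotype columns over. The function name `restore_genotype_columns` is hypothetical, and the sketch assumes tab-separated lines with at most one record per key.

```python
def restore_genotype_columns(original_lines, filtered_lines):
    """Append FORMAT and sample columns from the original VCF to filtered records.

    Records are matched on (CHROM, POS, REF, ALT). Filtered records with no
    match in the original are passed through unchanged.
    """
    # Build a lookup of genotype columns (FORMAT onwards) from the original file.
    genotype_by_key = {}
    for line in original_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 8:
            key = (fields[0], fields[1], fields[3], fields[4])
            genotype_by_key[key] = fields[8:]

    restored = []
    for line in filtered_lines:
        stripped = line.rstrip("\n")
        # Header and blank lines are kept as they are.
        if not stripped or stripped.startswith("#"):
            restored.append(stripped)
            continue
        fields = stripped.split("\t")
        # Only touch records that stop exactly after the INFO column.
        if len(fields) == 8:
            key = (fields[0], fields[1], fields[3], fields[4])
            if key in genotype_by_key:
                fields.extend(genotype_by_key[key])
        restored.append("\t".join(fields))
    return restored
```

To use it on files, read both VCFs into lists of lines (for example with `gzip.open(path, "rt")` for `.vcf.gz` inputs), call the function, and write the returned lines joined by newlines to a new file before loading that into IGV.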
Look up that line in your file (with e.g. grep) and have a look at how it differs from other lines. Alternatively/additionally, try to delete the line (with e.g. grep -v) and see if there are more problematic lines.

Thank you for the effort! I somehow solved the problem. But now I have another error (I posted it right below the previous one), which keeps coming up even if I delete the line that causes it.
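Since the error keeps reappearing after single lines are deleted, it may be quicker to list every affected line at once rather than grepping for them one by one. Below is a minimal sketch that reports all data lines with fewer tab-separated columns than the #CHROM header line. The function name `find_short_lines` is hypothetical, and the comparison against the header's column count is an assumption about what a well-formed file should look like here, not a reproduction of IGV's own check.

```python
def find_short_lines(lines):
    """Return (line_number, columns_seen, columns_expected) for short data lines.

    The expected count is taken from the #CHROM header line; data lines that
    appear before any header are not checked.
    """
    expected = None
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and ## meta-information lines.
        if not line or line.startswith("##"):
            continue
        if line.startswith("#CHROM"):
            expected = len(line.split("\t"))
            continue
        seen = len(line.split("\t"))
        if expected is not None and seen < expected:
            problems.append((number, seen, expected))
    return problems
```

For a plain-text VCF this could be called as `find_short_lines(open(path))`, or with `gzip.open(path, "rt")` for a compressed one. If every data line is reported, as with the sample posted above, the columns are missing throughout the file and deleting individual lines will not help.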