VCF file cannot be indexed, and sorting it also fails
8 months ago
hgrsce • 0

I tried to index a VCF file, but indexing failed with the following output:

[ti_index_core] the chromosome blocks not continuous at line 7086, is the file sorted? [pos 4397121]

I then tried sorting the file, which produced the following output:

Writing to /tmp/bcftools-sort.a8aIlN
[W::bcf_hdr_check_sanity] GL should be declared as Number=G
[W::vcf_parse_info] INFO 'ISTP' is not defined in the header, assuming Type=String
Error encountered while parsing the input at chr1:4397121
Cleaning

I checked the line reported as the problem, and it does look somewhat off: there is no tab following the "." after the position in the VCF file. I would appreciate some guidance on this issue.

Thank you

bcftools • 861 views

How did you get that VCF file? What is the origin of the VCF?

What about your previous questions, such as "Issues with bcftools"?


I deleted the empty files and concatenated the remaining ones into the VCF file I am currently having issues with. I concatenated the files for one set of samples, then merged the resulting file with another sample's concatenated file.


What do you see with grep -C 4 4397121 <yourvcf>? I suspect you concatenated headers into the VCF.
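One way to test that suspicion without reading the whole file by eye is sketched below. It assumes an uncompressed, tab-delimited VCF, and `check_vcf_body` is a hypothetical helper name, not part of bcftools. It reports header lines that turn up after the first data record, and records whose field count differs from the #CHROM line (which would also catch a missing tab).

```shell
# Hypothetical helper: flag header lines that appear after the first data
# record, and records whose tab-separated field count differs from #CHROM.
check_vcf_body() {
  awk -F'\t' '
    /^#CHROM/ && !ncol { ncol = NF }   # expected column count, from the first #CHROM line
    /^#/ {                             # any header line
      if (in_body) printf "line %d: header line inside the body\n", NR
      next
    }
    { in_body = 1 }                    # first data record has been seen
    ncol && NF != ncol { printf "line %d: %d fields, expected %d\n", NR, NF, ncol }
  ' "$1"
}
```

Call it as `check_vcf_body yourfile.vcf`; no output means neither problem was found.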

ADD REPLY
0
Entering edit mode

This is the output:

chrX    150711997   .   G   <INS:ME:ALU>    .   hDP;lc;s25  TSD=AAGATGGTGATAACTG;ASSESS=5;INTERNAL=XR_001714229.1,INTRONIC;SVTYPE=ALU;SVLEN=281;MEINFO=AluY,0,281,+;DIFF=0.38:c4t,c20t,g51a,t89c,t144c,g196a,t244c,g269c;LP=5;RP=4;RA=0.322;PRIOR=false;SR=12    GT:GL:DP:AD ./.:-0,-0,-0:0:0    0/0:-0,-2.41,-48:4:0    ./.:-0,-0,-0:0:0    1/1:-20.6,-1.81,-0.6:3:2    ./.:-0,-0,-0:0:0    1/1:-134,-7.83,-0:13:13 ./.:-0,-0,-0:0:0    ./.:-0,-0,-0:0:0    ./.:-0,-0,-0:0:0
chrX    151108204   .   T   <INS:ME:ALU>    .   lc;s25  TSD=AAATTAATGTT;ASSESS=5;INTERNAL=XM_016944625.2,INTRONIC;SVTYPE=ALU;SVLEN=64;MEINFO=AluYb,217,281,+;DIFF=0.23:c112g,c138t,c142t,g149a,c150t,g152c,g153a,c154t,c236t,i252gcagtcc,c248g,a252g;LP=2;RP=4;RA=-1;PRIOR=false;SR=13  GT:GL:DP:AD ./.:-0,-0,-0:1:0    0/0:-0,-3.61,-72:6:0    ./.:-0,-0,-0:1:0    ./.:-0,-0,-0:1:0    ./.:-0,-0,-0:0:0    0/1:-34,-5.42,-67.9:9:3 ./.:-0,-0,-0:0:0    ./.:-0,-0,-0:0:0    ./.:-0,-0,-0:0:0
chrX    151346337   .   T   <INS:ME:ALU>    .   s25 TSD=CTCTTCCAGGCATCTTT;ASSESS=5;INTERNAL=XM_016947590.1,INTRONIC;SVTYPE=ALU;SVLEN=275;MEINFO=AluYb8,6,281,-;DIFF=0.98:c57t,c64t,c98a,i127aa,t144c,g211a,c236t,c248g,i252gcagtcc,g253a,a252g;LP=5;RP=7;RA=-0.485;PRIOR=false;SR=26    GT:GL:DP:AD ./.:-0,-0,-0:1:0    0/0:-0.07,-4.21,-64.5:7:0   ./.:-0,-0,-0:1:0    ./.:-0,-0,-0:1:0    1/1:-29.8,-1.81,-0:3:3  0/1:-151.8,-13.85,-70:23:16 ./.:-0,-0,-0:0:0    ./.:-0,-0,-0:0:0    ./.:-0,-0,-0:0:0
chrX    151357243   .   T   <INS:ME:ALU>    .   hDP;s25 TSD=AAAACTGCCACGTT;ASSESS=5;INTERNAL=XM_016947601.2,PROMOTER;SVTYPE=ALU;SVLEN=281;MEINFO=AluYb,0,281,+;DIFF=0.79:c157a,g199a,c206t,c248g,a252g;LP=7;RP=4;RA=0.807;PRIOR=false;SR=14 GT:GL:DP:AD ./.:-0,-0,-0:0:0    0/0:-0,-1.81,-36:3:0    ./.:-0,-0,-0:0:0    ./.:-0,-0,-0:0:0    0/0:-1.2,-1.2,-1.2:2:1  0/1:-94,-7.83,-42:13:9  ./.:-0,-0,-0:0:0    ./.:-0,-0,-0:0:0    ./.:-0,-0,-0:0:0
chr1    4397121 .   A   <INS:ME:LINE1>  .   hDP;s25 TSD=ACATTCTTTTA;ASSESS=5;INTERNAL=XR_001717477.2,5_UTR;SVTYPE=LINE1;SVLEN=-1;MEINFO=L1Ambig,-1,6019,-;DIFF=0.02:n1-5433,g5538c,n5586-6019;LP=1;RP=10;RA=-3.322;ISTP=5558;PRIOR=false;SR=29  GT:GL:DP:AD 0/0:-0,-1.81,-36:3:0    0/0:-0.02,-2.41,-39.4:4:0   0/0:-0,-1.2,-24:2:0 ./.:-0,-0,-0:1:0    1/1:-22,-1.2,-0:2:2 0/1:-157,-12.64,-72:21:14   ./.:-0,-0,-0:0:0    ./.:-0,-0,-0:0:0    ./.:-0,-0,-0:0:0
chr1    12903758    .   G   <INS:ME:LINE1>  .   hDP;lc;s25  TSD=GAAAAGAAGGGGGAAGGG;ASSESS=5;INTERNAL=XM_016936493.2,INTRONIC;SVTYPE=LINE1;SVLEN=-1;MEINFO=L1Ambig,-1,6019,+;DIFF=0.02:n1-5037,c5070t,n5139-5161,n5330-6019;LP=7;RP=2;RA=1.807;ISTP=5296;PRIOR=false;SR=22   GT:GL:DP:AD ./.:-0,-0,-0:1:0    0/0:-0,-1.2,-24:2:0 ./.:-0,-0,-0:1:0    ./.:-0,-0,-0:0:0    1/1:-19.6,-1.2,-0:2:2   0/1:-163,-12.04,-42.1:20:16 ./.:-0,-0,-0:0:0    ./.:-0,-0,-0:0:0    ./.:-0,-0,-0:0:0
chr1    24191837    .   G   <INS:ME:LINE1>  .   s25 TSD=GAAAGAGGCCCAGGAG;ASSESS=5;INTERNAL=NM_001038649.1,TERMINATOR;SVTYPE=LINE1;SVLEN=6005;MEINFO=L1Ambig,13,6018,-;DIFF=0.05:n1-12,i73g,a129g,a140g,c143t,t149g,t153c,c155t,g167a,c197t,g247c,t254a,g256a,c284t,a301g,n309-6019;LP=19;RP=4;RA=2.248;ISTP=0;PRIOR=false;SR=40 GT:GL:DP:AD 0/0:-0,-1.81,-36:3:0    0/0:-0,-4.21,-79.6:7:0  ./.:-0,-0,-0:1:0    0/1:-17,-3.01,-28.8:5:2 1/1:-68,-3.61,-0:6:6    0/1:-261.8,-17.46,-48:29:25 ./.:-0,-0,-0:1:1    ./.:-0,-0,-0:0:0    ./.:-0,-0,-0:0:0
chr1    27695769    .   T   <INS:ME:LINE1>  .   lc;s25  TSD=TTTTTT;ASSESS=5;INTERNAL=XM_009452214.3,INTRONIC;SVTYPE=LINE1;SVLEN=336;MEINFO=L1Ambig,5683,6019,-;DIFF=0.05:n1-5682,a5712g,n5852,t5877a,a5931g,t5998g;LP=10;RP=10;RA=0;ISTP=0;PRIOR=false;SR=22    GT:GL:DP:AD ./.:-0,-0,-0:0:0    0/0:-0,-5.42,-108:9:0   0/0:-0,-1.2,-24:2:0 0/0:-0.6,-1.81,-17.4:3:1    ./.:-0,-0,-0:1:1    0/1:-261.3,-17.46,-48:29:25 ./.:-0,-0,-0:0:0    ./.:-0,-0,-0:0:0    ./.:-0,-0,-0:1:1
chr1    28365528    .   T   <INS:ME:LINE1>  .   s25 TSD=dTAGGTGAAGCTAAAGATG;ASSESS=5;INTERNAL=null,null;SVTYPE=LINE1;SVLEN=786;MEINFO=L1Ambig,5232,6018,-;DIFF=0.05:n1-5231,g5268a,a5277g,t5316a,g5319a,t5325c,a5349g,c5391t,c5413a,g5420a,c5434a,g5465a,a5487g,n5494-6019;LP=20;RP=2;RA=3.322;ISTP=0;PRIOR=false;SR=27 GT:GL:DP:AD ./.:-0,-0,-0:0:0    0/0:-0,-3.61,-66.6:6:0  ./.:-0,-0,-0:1:0    1/1:-38.6,-3.01,-0.6:5:4    1/1:-32.6,-2.41,-0.6:4:3    1/1:-304,-17.46,-0:29:29    ./.:-0,-0,-0:0:0    ./.:-0,-0,-0:0:0    ./.:-0,-0,-0:0:0

It all looks fairly standard except for the record at position 4397121.
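In the context lines above, the chrX records are followed directly by chr1 records. If chr1 already had a block earlier in the file, that would match the "chromosome blocks not continuous" message from the indexer. A rough check for this, assuming an uncompressed tab-delimited VCF (`find_block_break` is a made-up name for illustration):

```shell
# Hypothetical helper: report the first record where a contig reappears after
# its block has ended, or where a position goes backwards within a block.
find_block_break() {
  awk -F'\t' '
    /^#/ { next }                                  # skip header lines
    $1 != prev {                                   # a new contig block starts
      if ($1 in seen) {
        printf "line %d: %s reappears after %s\n", NR, $1, prev
        bad = 1; exit
      }
      seen[$1] = 1; prev = $1; pos = $2 + 0; next
    }
    $2 + 0 < pos {                                 # same contig, smaller position
      printf "line %d: %s position %d follows %d\n", NR, $1, $2, pos
      bad = 1; exit
    }
    { pos = $2 + 0 }
    END { exit bad + 0 }                           # non-zero status if a break was found
  ' "$1"
}
```

The line number printed is the physical line in the file (header included), so it can be compared with the line number in the indexer's message.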

8 months ago
LChart 4.6k

Based on your output, the file is not properly sorted. Rather than concatenating with cat, use bcftools or another approach that ensures the output VCF is sorted. As a last resort you can use LC_COLLATE=C sort -k1,1 -k2,2n, though this sorts contigs lexicographically, so you may also need to re-sort the header.
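A minimal sketch of that last-resort sort, assuming an uncompressed VCF whose header lines all start with "#". The function name `sort_vcf_plain` and its two arguments are placeholders. The header is copied through unchanged and only the records are sorted, so header lines are not shuffled in with the data. LC_ALL=C is used here in place of LC_COLLATE=C; it overrides LC_COLLATE as well.

```shell
# Hypothetical helper: sort VCF records by contig (byte order), then
# numerically by position, keeping the header block at the top.
sort_vcf_plain() {
  src=$1
  dst=$2
  {
    grep '^#' "$src"                       # header lines, in their original order
    grep -v '^#' "$src" |
      LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n
  } > "$dst"
}
```

For example, `sort_vcf_plain merged.vcf merged.sorted.vcf`. Contigs come out in byte order (chr1, chr10, chr11, ..., chr2, ..., chrX). This does nothing for a record that is itself malformed, so a parsing error like the one at chr1:4397121 may still need fixing separately.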

TL;DR: you merged these files improperly; try again with bcftools concat.
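As a rough outline of what that could look like, assuming the input files share the same samples in the same order. All file names are placeholders, `concat_sort_index` is a hypothetical wrapper, and the BCFTOOLS variable is only there so the commands can be previewed with a dry run.

```shell
# Hypothetical wrapper: concatenate VCFs with the same samples, sort the
# result, write it bgzip-compressed, and build a tabix index.
BCFTOOLS=${BCFTOOLS:-bcftools}     # set BCFTOOLS=echo to print the commands instead

concat_sort_index() {
  out=$1; shift                                   # remaining arguments: input VCFs
  tmp=${out%.vcf.gz}.unsorted.vcf.gz              # intermediate, removed at the end
  "$BCFTOOLS" concat -O z -o "$tmp" "$@" &&       # join the records under one header
  "$BCFTOOLS" sort -O z -o "$out" "$tmp" &&       # coordinate-sort the records
  "$BCFTOOLS" index -t "$out" &&                  # .tbi index
  rm -f "$tmp"
}
```

Usage would be along the lines of `concat_sort_index setA.vcf.gz part1.vcf part2.vcf`. Files with different samples are then combined with bcftools merge, which expects bgzip-compressed, indexed inputs, for example `bcftools merge -O z -o merged.vcf.gz setA.vcf.gz sampleB.vcf.gz`.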
