10.7 years ago
mad.cichlids
Hi, I was trying to index my VCF file. I first sorted it and then compressed it with bgzip before indexing, as described in this post: Tabix -p vcf ERROR. However, a similar error message still shows up. Could you give some suggestions?
cat z.vcf | vcf-sort > out.vcf
bgzip out.vcf
tabix -p vcf out.vcf.gz
[ti_index_core] the file out of order at line 19
Here are the first 19 lines:
head -19 out.vcf
##fileformat=VCFv4.1
##fileDate=2014-02-26 22:00:03
##source=VCF_popgen.pl
##reference=file:Genome/AGTA02_WGS.fasta
##contig=N/A
##phasing=none
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Number of Alleles in Population">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="FGX Consensus Genotype (threshold model)">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Sample Read Depth">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality (SAMtools bayesian framework)">
##FORMAT=<ID=EC,Number=.,Type=String,Description="Alternate Allele Counts in Sample">
##FORMAT=<ID=SG,Number=.,Type=String,Description="SAMtools Consensus Genotype (diploid model)">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 100_sequence_1_pileup.txt 105_sequence_1_pileup.txt 10_sequence_1_pileup.txt 110_sequence_1_pileup.txt 112_sequence_1_pileup.txt 114_sequence_1_pileup.txt 118_sequence_1_pileup.txt 120_sequence_1_pileup.txt 122_sequence_1_pileup.txt 126_sequence_1_pileup.txt 130_sequence_1_pileup.txt 13_sequence_1_pileup.txt 147_sequence_1_pileup.txt 153_sequence_1_pileup.txt 154_sequence_1_pileup.txt 158_sequence_1_pileup.txt 15_sequence_1_pileup.txt 164_sequence_1_pileup.txt 168_sequence_1_pileup.txt 16_sequence_1_pileup.txt 171_sequence_1_pileup.txt 174_sequence_1_pileup.txt 179_sequence_1_pileup.txt 183_sequence_1_pileup.txt 188_sequence_1_pileup.txt 198_sequence_1_pileup.txt 1_sequence_1_pileup.txt 202_sequence_1_pileup.txt 203_sequence_1_pileup.txt 206_sequence_1_pileup.txt 208_sequence_1_pileup.txt 212_sequence_1_pileup.txt 214_sequence_1_pileup.txt 216_sequence_1_pileup.txt 218_sequence_1_pileup.txt 219_sequence_1_pileup.txt 21_sequence_1_pileup.txt 220_sequence_1_pileup.txt 22_sequence_1_pileup.txt 30_sequence_1_pileup.txt 32_sequence_1_pileup.txt 37_sequence_1_pileup.txt 38_sequence_1_pileup.txt 3_sequence_1_pileup.txt 44_sequence_1_pileup.txt 45_sequence_1_pileup.txt 49_sequence_1_pileup.txt 4_sequence_1_pileup.txt 51_sequence_1_pileup.txt 53_sequence_1_pileup.txt 57_sequence_1_pileup.txt 61_sequence_1_pileup.txt 66_sequence_1_pileup.txt 67_sequence_1_pileup.txt 69_sequence_1_pileup.txt 74_sequence_1_pileup.txt 86_sequence_1_pileup.txt 87_sequence_1_pileup.txt 90_sequence_1_pileup.txt 93_sequence_1_pileup.txt 95_sequence_1_pileup.txt 98_sequence_1_pileup.txt 9_sequence_1_pileup.txt dam01_sequence_1_pileup.txt dam02_sequence_1_pileup.txt dam03_sequence_1_pileup.txt dam04_sequence_1_pileup.txt FGXCONTROL_sequence_1_pileup.txt mbsire_sequence_1_pileup.txt mzdam_sequence_1_pileup.txt sire01_sequence_1_pileup.txt sire02_sequence_1_pileup.txt sire03_sequence_1_pileup.txt w118_sequence_1_pileup.txt w119_sequence_1_pileup.txt w120_sequence_1_pileup.txt w121_sequence_1_pileup.txt w174_sequence_1_pileup.txt w175_sequence_1_pileup.txt w176_sequence_1_pileup.txt w177_sequence_1_pileup.txt
gi|393925858|gb|AGTA02071966.1| 0000000739 . G A 121.20 PASS NS=74:AN=2:DP=8448 GT:DP:GQ:EC:SG 0/1:262:144:116:R 1:32:93:32:A 0/1:87:42:72:R .:0:0:0:. .:0:0:0:. 0/1:222:167:113:R 0/1:93:128:55:R 1:77:186:77:A 0/1:207:144:124:R 1:56:42:52:A 0/1:310:104:203:R 0/1:84:29:17:R 1:153:225:153:A 1:57:149:56:A 0/1:81:127:44:R 0/1:425:110:162:R 0/1:71:117:29:R .:0:0:0:. 0/1:66:75:53:R 0/1:130:28:103:R 1:101:193:100:A 0:32:123:0:G 1:68:180:68:A 0/1:76:0:66:A 1:30:87:30:A 0/1:72:95:54:R .:0:0:0:. 1:28:81:28:A 1:40:117:40:A 1:15:42:15:A 1:30:87:30:A 0/1:98:129:53:R 0/1:59:131:36:R 1:93:147:90:A 1:82:189:82:A 0/1:62:28:53:R 1:121:216:121:A 1:136:225:136:A 1:131:225:131:A 0/1:79:37:66:R 0/1:82:119:34:R 0/1:105:98:75:R 1:67:179:67:A 0/1:223:160:116:R 0/1:125:126:81:R 1:147:122:136:A 0/1:30:53:25:R 0/1:176:97:151:A 0/1:167:112:109:R 0/1:145:13:119:A 0/1:76:130:38:R 1:104:206:104:A 0/1:172:129:109:R 1:104:199:104:A 1:45:132:45:A 1:35:102:35:A 1:109:211:109:A 1:53:157:53:A 1:118:220:118:A 0/1:265:166:133:R 1:67:179:67:A 0/1:65:103:48:R 0/1:130:24:109:A 0/1:285:101:195:R 0/1:208:19:162:A 0/1:295:126:189:R 0/1:288:48:221:A .:0:0:0:. 1:141:225:141:A 0/1:166:141:99:R 0/1:213:115:137:R 1:132:225:132:A 1:126:225:126:A 1:21:60:21:A 0/1:24:123:15:R 0/1:120:129:59:R .:9:24:0:A 1:10:27:10:A .:0:0:0:. 1:19:54:19:A 0:15:72:0:G
gi|393925858|gb|AGTA02071966.1| 0000000781 . G A 120.61 PASS NS=74:AN=2:DP=8484 GT:DP:GQ:EC:SG 0/1:264:49:148:R 0:32:123:0:G 0/1:86:105:14:G .:0:0:0:. .:0:0:0:. 0:222:255:0:G 0/1:93:3:38:R 0:78:255:0:G 0/1:209:4:84:G 0:56:128:0:G 0/1:313:23:108:G 0/1:85:31:68:G 0:153:255:0:G 0:57:199:0:G 0/1:82:6:38:R 0/1:426:7:263:R 0/1:71:25:42:R .:0:0:0:. 0/1:66:63:13:G 0/1:131:110:27:G 0:101:255:0:G 1:33:58:33:A 0:69:235:0:G 0/1:76:84:11:G 0:30:117:0:G 0/1:72:33:18:G .:0:0:0:. 0:28:111:0:G 0:41:150:0:G 0:15:72:0:G 0:30:117:0:G 0/1:98:5:45:R 0/1:59:4:23:R 0:93:253:0:G 0:84:255:0:G 0/1:62:86:9:G 0:122:255:0:G 0:136:255:0:G 0:131:255:0:G 0/1:80:97:13:G 0/1:84:29:48:R 0/1:105:42:30:G 0:66:226:0:G 0/1:224:37:107:R 0/1:126:10:46:G 0:147:255:0:G 0/1:30:45:5:G 0/1:178:239:25:G 0/1:167:33:58:G 0/1:146:160:26:G 0/1:76:14:38:R 0:106:255:0:G 0/1:172:12:63:G 0:104:255:0:G 0:45:162:0:G 0:35:132:0:G 0:110:255:0:G 0:53:187:0:G 0:118:255:0:G 0/1:262:24:131:R 0:67:229:0:G 0/1:66:24:17:G 0/1:130:161:20:G 0/1:286:82:89:G 0/1:210:159:46:G 0/1:296:39:107:G 0/1:288:154:68:G .:0:0:0:. 0:143:255:0:G 0/1:168:7:68:R 0/1:214:29:77:G 0:132:255:0:G 0:126:255:0:G 0:21:90:0:G 0/1:24:10:9:R 0/1:122:28:63:R .:9:54:0:G 0:10:57:0:G .:0:0:0:. 0:19:84:0:G 1:15:42:15:A
Show us lines 18, 19 and 20.
Thanks, I did not have a good way to display these lines. I am trying to use head -19 out.vcf, but the display is really messy.
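One way to make the requested lines readable is to print only their first few columns, since the per-sample genotype fields are what make the output so wide. The following is a minimal Python sketch of that idea; the function name `preview_lines` and the default of five columns are assumptions for illustration, not part of any tool mentioned in this thread.

```python
def preview_lines(lines, first, last, max_fields=5):
    """Return lines first..last (1-based, inclusive), cut to max_fields columns.

    `lines` is any iterable of text lines, for example an open file handle.
    Columns are split on whitespace, so this is only meant for eyeballing
    CHROM/POS/ID/REF/ALT, not for rewriting the file.
    """
    preview = []
    for number, line in enumerate(lines, start=1):
        if number > last:
            break  # stop reading once we are past the requested range
        if number >= first:
            fields = line.rstrip("\n").split()[:max_fields]
            preview.append(f"{number}\t" + "\t".join(fields))
    return preview


# Example usage (assumes out.vcf is the uncompressed file from the question):
# with open("out.vcf") as handle:
#     for row in preview_lines(handle, 18, 20):
#         print(row)
```

Each returned row starts with the line number, so it can be matched directly against the line number in the tabix error message.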
My bet is that the leading zeros on your positions are screwing up vcf-sort. Try "sort -k1,1 -k2,2n your.vcf > your.sorted.vcf"
Thank you, but that still did not do the trick. It now gives a new error message, "[ti_index_core] the file out of order at line 13". I really appreciate your input though.
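Note that running plain sort over the whole file sorts the "#" header lines together with the records, so the header order is no longer guaranteed. That may be related to the new error appearing at an earlier line, although the thread does not confirm it. A minimal Python sketch of a header-preserving sort follows, assuming the file fits in memory; the function name `sort_vcf_lines` is hypothetical, and contigs are ordered as plain strings, which is an assumption rather than what vcf-sort does.

```python
def sort_vcf_lines(lines):
    """Return header lines in their original order, then records sorted
    by (CHROM, integer POS).

    POS is converted with int(), so a zero-padded value such as
    0000000739 is compared as the number 739.
    """
    lines = list(lines)  # allow any iterable, including a file handle
    header = [line for line in lines if line.startswith("#")]
    records = [line for line in lines
               if not line.startswith("#") and line.strip()]

    def sort_key(line):
        chrom, pos = line.split(None, 2)[:2]
        return (chrom, int(pos))

    return header + sorted(records, key=sort_key)


# Example usage (file names are placeholders):
# with open("your.vcf") as src, open("your.sorted.vcf", "w") as dst:
#     dst.writelines(sort_vcf_lines(src))
```

This only changes the order of the records. The zero-padded positions are written back unchanged, so it does not by itself remove the leading zeros.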
This "should" only strip the leading zeros off your positions. Give it a shot? It is worth testing on a couple hundred lines first.
perl -lane '$_ =~ s/^0+// ; print $_' your.vcf > stripped.leading.zeros.vcf
Thank you so much! Here is what I did:
It seems that the zeros are still there.
OK, I copied your example and fixed my code: perl -lane 'if($_ =~ /^#/){print; next}else{$F[1] =~ s/^0+//; print join "\t", @F}' your.vcf > your.zero.stripped.vcf
This indeed WORKED! Thanks, man!
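For anyone who would rather not use Perl, the same fix can be sketched in Python: pass header lines through untouched and strip leading zeros from the second column (POS) of every record. This is an illustrative equivalent, not the code used above. The function name `strip_pos_zeros` is hypothetical, and it differs from the one-liner in two ways: it splits on tabs rather than on any whitespace, and it keeps a single "0" if a position consists only of zeros.

```python
def strip_pos_zeros(line):
    """Strip leading zeros from the POS column of one VCF line.

    Header lines (starting with '#') are returned unchanged.
    """
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) > 1:
        # Keep a single "0" if the field was made up of zeros only.
        fields[1] = fields[1].lstrip("0") or "0"
    return "\t".join(fields) + "\n"


# Example usage (file names are placeholders):
# with open("your.vcf") as src, open("your.zero.stripped.vcf", "w") as dst:
#     for line in src:
#         dst.write(strip_pos_zeros(line))
```

After stripping, the file would still need to be compressed with bgzip and indexed with tabix -p vcf, as in the original question.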