Thanks very much for providing the VCF that you're using. For others, it's
This VCF is corrupt and does not conform to the VCF specification. It has the following issues:
- whitespace in 'INFO' column
- contig '2' not defined in header
- 'FORMAT/GL' should be declared as Number=G
- 'FORMAT/PP' not defined in header
- 'FORMAT/BD' not defined in header
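As a rough cross-check of the first two issues, the sketch below counts data lines with a space in the 'INFO' column and lists contigs that are used but not declared in the header. The function names are hypothetical and only standard awk is assumed; it is not a substitute for a proper validator.

```shell
# Hypothetical helper: count data lines whose INFO column (column 8)
# contains a space. Reads an uncompressed VCF on stdin; for a .vcf.gz,
# pipe it through zcat first.
count_info_whitespace() {
    awk -F'\t' '!/^#/ && $8 ~ / / { n++ } END { print n + 0 }'
}

# Hypothetical helper: list contig names used in column 1 that have no
# matching ##contig=<ID=...> line in the header (each name printed once).
undefined_contigs() {
    awk -F'\t' '
        /^##contig=<ID=/ {
            id = $0
            sub(/^##contig=<ID=/, "", id)   # drop everything before the ID
            sub(/[,>].*$/, "", id)          # drop everything after the ID
            declared[id] = 1
            next
        }
        /^#/ { next }                       # skip remaining header lines
        !($1 in declared) && !($1 in seen) { seen[$1] = 1; print $1 }
    '
}
```

Possible usage, with a placeholder file name: zcat input.vcf.gz | count_info_whitespace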
I was able to fix the VCF with the commands below. Unfortunately, the 'FORMAT' field is a complete mess, so I made an 'executive' decision to strip every 'FORMAT' tag except 'FORMAT/GT'. This loses some info, but leaves you with a validated VCF for anything else that you may want to do.
1, remove whitespace from column 8 ('INFO'), then zip via bgzip
zcat GEUVADIS.chr2.PH1PH2_465.IMPFRQFILT_BIALLELIC_PH.annotv2.genotypes.vcf.gz |\
sed 's/ damaging/_damaging/g' |\
bgzip > test.vcf.gz ;
2, tab-index (fixes contig '2' issue)
tabix -p vcf test.vcf.gz ;
3, BCFtools to remove all 'FORMAT' tags except GT
bcftools annotate -x 'FORMAT' --force test.vcf.gz -Oz > test.fixed.vcf.gz ;
This will initially show the warnings relating to 'FORMAT', but the use of --force allows us to skip these warnings. Also, by removing the problematic 'FORMAT' tags, we avoid the subsequent segmentation fault that occurs.
4, BCFtools to remove 'ID' field
bcftools annotate -x ID test.fixed.vcf.gz -Oz > test.fixed.noID.vcf.gz ;
5, SnpSift
java -jar SnpSift.jar annotate dbSnp144.vcf test.fixed.noID.vcf.gz
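The sed call in step 1 targets one specific phrase (' damaging'). If other values in the 'INFO' column also contain spaces, a more general variant might restrict the replacement to column 8 only. This is a minimal sketch under that assumption; the function name is hypothetical and the underscore replacement simply mirrors the sed command above.

```shell
# Hypothetical helper: replace spaces in the INFO column (column 8) with
# underscores, leaving header lines and all other columns untouched.
# Reads an uncompressed VCF on stdin and writes the result to stdout.
fix_info_whitespace() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { print; next }               # pass header lines through as-is
         { gsub(/ /, "_", $8); print }      # edit only the INFO column'
}

# Possible usage (file names are placeholders):
#   zcat input.vcf.gz | fix_info_whitespace | bgzip > test.vcf.gz
```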
Kevin
Hi, please show:
the command:
I am getting no error. The input file used is the .vcf file for chr2.
The output file is:
As you can see, the output file has the rsID added, but it still has the previous ID from the .vcf file (snp_2_10133). What I want is to remove the ID from the .vcf file and replace it with just the rsID.
I see - thanks. I am not sure that SnpSift annotate can do this, but you can first remove the original ID via bcftools, as follows:
There is no variant at that position, by the way - all entries are just 0|0
(?)I actually just added one part of the input file. 8,-0.48:0.150:.:. 0|0:-0.48,-0.48,-0.48:0.200:.:. 0|0:-0.48,-0.48,-0.48:0.350:.:. 0|0:-0.48,-0.48,-0.48:0.150:.:. 0|0:-0.48,-0.48,-0.48:0.400:.:. 0|0:-0.48,-0.48,-0.48:0.300:.:. 0|0:-0.48,-0.48,-0.48:0.050:.:. 0|0:-0.48,-0.48,-0.48:0.150:.:. 0|0:-0.48,-0.48,-0.48:0.250:.:. 0|0:-0.48,-0.48,-0.48:0.100:.:. 0|0:-0.48,-0.48,-0.48:0.150:.:. 0|0:-0.36,-0.41,-0.76:0.250:.:. 0|0:-0.48,-0.48,-0.48:0.300:.:. 0|0:-0.48,-0.48,-0.48:0.300:.:. 0|0:-0.48,-0.48,-0.48:0.250:.:. 0|0:-0.48,-0.48,-0.48:0.050:.:. 0|0:-0.48,-0.48,-0.48:0.400:.:. 0|0:-0.19,-0.46,-2.06:0.100:.:. 0|0:-0.48,-0.48,-0.48:0.200:.:. 0|0:-0.48,-0.48,-0.48:0.300:.:. 0|0:-0.48,-0.48,-0.48:0.100:.:. 0|0:-0.48,-0.48,-0.48:0.450:.:. 0|0:-0.48,-0.48,-0.48:0.450:.:. 0|0:-0.48,-0.48,-0.48:0.250:.:. 0|0:-0.48,-0.48,-0.48:0.200:.:. 0|0:-0.48,-0.48,-0.48:0.300:.:. 0|0:-0.48,-0.48,-0.48:0.200:.:. 0|0:-0.48,-0.48,-0.48:0.250:.:. 0|0:-0.48,-0.48,-0.48:0.200:.:. 0|0:-0.48,-0.48,-0.48:0.100:.:. 0|0:-0.48,-0.48,-0.48:0.300:.:. 0|0:-0.35,-0.41,-0.78:0.200:.:. 0|0:-0.48,-0.48,-0.48:0.050:.:. 0|0:-0.48,-0.48,-0.48:0.400:.:. 0|0:-0.48,-0.48,-0.48:0.450:.:. 0|1:-2.82,-0.45,-0.19:1.000:.:. 0|0:-0.48,-0.48,-0.48:0.300:.:. 0|0:-0.48,-0.48,-0.48:0.350:.:. 0|0:-0.48,-0.48,-0.48:0.200:.:. 0|0:-0.48,-0.48,-0.48:0.150:.:. 0|0:-0.48,-0.48,-0.48:0.200:.:. 0|0:-0.48,-0.48,-0.48:0.300:.:. 0|0:-0.48,-0.48,-0.48:0.200:.:. 1|0:-2.38,-0.43,-0.20:1.100:.:. 0|0:-0.48,-0.48,-0.48:0.400:.:. 0|0:-0.02,-1.46,-5.00:0.000:.:. 0|0:-0.00,-4.10,-5.00:0.000:.:. 0|0:-0.00,-2.30,-5.00:0.000:.:. 0|1:-2.69,-0.44,-0.20:1.000:.:. 0|0:-0.00,-3.02,-5.00:0.000:.:. 0|0:-0.00,-3.32,-5.00:0.000:.:. 0|0:-0.05,-0.97,-4.22:0.000:.:. 0|0:-0.00,-2.98,-5.00:0.000:.:. 0|0:-0.00,-2.15,-5.00:0.000:.:. 0|0:-0.48,-0.48,-0.48:0.050:.:. 0|0:-0.477139,-0.477113,-0.477113:0.150:.:. 0|0:-0.00,-2.68,-5.00:0.000:.:. 0|0:-0.48,-0.48,-0.48:0.300:.:. 0|0:-0.48,-0.48,-0.48:0.300:.:. 0|0:-0.48,-0.48,-0.48:0.300:.:. 
0|0:-0.48,-0.48,-0.48:0.200:.:. 0|0:-0.48,-0.48,-0.48:0.200:.:. 0|0:-0.48,-0.48,-0.48:0.050:.:. 0|0:-0.48,-0.48,-0.48:0.350:.:. 0|0:-0.48,-0.48,-0.48:0.300:.:. 0|0:-0.48,-0.48,-0.48:0.350:.:. 0|0:-0.48,-0.48,-0.48:0.200:.:. 0|0:-0.48,-0.48,-0.48:0.200:.:. 0|0:-0.48,-0.48,-0.4: It has some variants.
I used the above command as you suggested. The commands:
Hi again, have you tried to index the VCF as suggested?
Actually, you may need this sequence of commands:
Hi, I used the above command, but I am getting an error. The error:
Please paste the entire header of the file file_chr2.vcf
hint:
bcftools view -h file_chr2.vcf
I used the command:
This is the output:
the command:
the output:
[W::bcf_hdr_check_sanity] GL should be declared as Number=G
[W::vcf_parse] Contig '2' is not defined in the header. (Quick workaround: index the file with tabix.)
##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##INFO=<ID=LDAF,Number=1,Type=Float,Description="MLE Allele Frequency Accounting for LD">
##INFO=<ID=AVGPOST,Number=1,Type=Float,Description="Average posterior probability from MaCH/Thunder">
##INFO=<ID=RSQ,Number=1,Type=Float,Description="Genotype imputation quality from MaCH/Thunder">
##INFO=<ID=ERATE,Number=1,Type=Float,Description="Per-marker Mutation rate from MaCH/Thunder">
##INFO=<ID=THETA,Number=1,Type=Float,Description="Per-marker Transition rate from MaCH/Thunder">
##INFO=<ID=CIEND,Number=2,Type=Integer,Description="Confidence interval around END for imprecise variants">
##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="Confidence interval around POS for imprecise variants">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
[W::vcf_parse_format] FORMAT 'PP' is not defined in the header, assuming Type=String
[W::vcf_parse_format] FORMAT 'BD' is not defined in the header, assuming Type=String
Encountered error, cannot proceed. Please check the error output above.
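The 'not defined in the header' warnings can be cross-checked without bcftools by comparing the tags used in the FORMAT column (column 9) against the ##FORMAT lines in the header. The sketch below is one way to do that with standard awk; the function name is hypothetical, and it only checks that a tag is declared, not that its Number or Type is correct.

```shell
# Hypothetical helper: print each FORMAT tag that appears in column 9 of a
# data line but has no ##FORMAT=<ID=...> declaration in the header.
# Reads an uncompressed VCF on stdin; each tag is reported once.
undeclared_format_tags() {
    awk -F'\t' '
        /^##FORMAT=<ID=/ {
            id = $0
            sub(/^##FORMAT=<ID=/, "", id)   # drop everything before the ID
            sub(/[,>].*$/, "", id)          # drop everything after the ID
            declared[id] = 1
            next
        }
        /^#/ { next }                       # skip remaining header lines
        NF >= 9 {
            n = split($9, tags, ":")
            for (i = 1; i <= n; i++)
                if (!(tags[i] in declared) && !(tags[i] in seen)) {
                    seen[tags[i]] = 1
                    print tags[i]
                }
        }
    '
}
```

Possible usage, with the file name from this thread: undeclared_format_tags < file_chr2.vcf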
Can you please make your file available somewhere so that I can download it and try it myself?
the command: bgzip file_chr2.vcf ; tabix -p vcf file_chr2.vcf.gz ;
the output: unrecognized preset file_chr2.vcf
Sure. This is the link to download the .vcf file for chr2:
curl -o $SCRATCH/GEUVADIS/vcf_files/orig_vcfs/GEUVADIS.chr2.genotype.vcf.gz https://www.ebi.ac.uk/arrayexpress/files/E-GEUV-1/GEUVADIS.chr2.PH1PH2_465.IMPFRQFILT_BIALLELIC_PH.annotv2.genotypes.vcf.gz