Hi all,
I've just written a tool that adds one or more extra columns to a VCF file. The header now looks like this:
(...)
#CHROM POS ID REF ALT QUAL FILTER INFO MY_COL1 MY_COL2 FORMAT NA00001 NA00002 NA00003
(...)
Is there anything in the VCF spec saying that another column can't be added?
I ask because when I run VCFtools on the file, it says:
vcftools --vcf file.vcf
(...)
Scanning file.vcf ...
Ninth Header entry should be FORMAT: MY_COL1
Currently scanning CHROM: 19
Currently scanning CHROM: 20
Currently scanning CHROM: X
BEDTools now supports VCF, and it lets you tack on as many columns as you want. That said, if you need specific functionality from VCFtools, this won't help at all.
Instead of inserting new columns, which will break most tools, you should put your custom information in the INFO column. That is what the field is designed for. With Perl, it is very easy to extract a key-value pair from it, e.g.:
perl -ane 'print "MYKEY=$1\n" if $F[7]=~/MYKEY=([^;]+)/'
Furthermore, VCF is used not only for SNPs, but also for INDELs and SVs. People from several major sequencing centers joined the discussion that shaped this format. In my opinion, it is quite stable now. Small details may change in the future, but not the number of columns.
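For a file that already has the extra columns, here is a minimal Python sketch of folding them back into INFO (column 8) as KEY=value pairs. It is only an illustration, not part of any existing tool: the function name `fold_extra_columns` is made up, and it assumes the file is tab-delimited, that the extra columns sit right after INFO as in the header above, and that their values contain no semicolons, equals signs, or whitespace.

```python
def fold_extra_columns(lines, n_extra):
    """Move the n_extra columns that follow INFO into INFO as KEY=value pairs.

    Hypothetical helper: assumes tab-delimited input and extra columns
    placed directly after the eighth (INFO) column.
    """
    names = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Meta-information lines pass through unchanged.
            yield line
        elif line.startswith("#CHROM"):
            cols = line.split("\t")
            names = cols[8:8 + n_extra]
            # Declare each new INFO key before the column header line.
            for name in names:
                yield ('##INFO=<ID=%s,Number=1,Type=String,'
                       'Description="Folded from extra column %s">'
                       % (name, name))
            yield "\t".join(cols[:8] + cols[8 + n_extra:])
        else:
            cols = line.split("\t")
            extra = cols[8:8 + n_extra]
            # Skip missing values ("." or empty) instead of writing KEY=.
            pairs = ["%s=%s" % (k, v)
                     for k, v in zip(names, extra) if v not in ("", ".")]
            info = [] if cols[7] in ("", ".") else [cols[7]]
            cols[7] = ";".join(info + pairs) or "."
            yield "\t".join(cols[:8] + cols[8 + n_extra:])
```

With `n_extra=2`, a record whose INFO is `NS=3` and whose MY_COL1 is `foo` comes out with INFO `NS=3;MY_COL1=foo`, and FORMAT is back in the ninth column where parsers expect it.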
written 13.9 years ago by lh3 (updated 6.2 years ago by Ram)
What kind of information do you want to add? I don't think the VCF specification allows adding new columns, but since the format is still in an early phase of development, you could contact the authors and propose the feature to them.
However, VCF files should be used only to describe the SNPs and their genotypes, and any other kind of information should go somewhere else... for example, if you have statistics associated with a SNP, you should consider a flat file or a database.
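As a rough sketch of the flat-file route, the statistics could live in a separate tab-separated table and be looked up by position while reading the unmodified VCF. The function names `load_stats` and `annotate` are hypothetical, as are the CHROM and POS column names in the side table; keying on (CHROM, POS) alone assumes at most one record per position.

```python
import csv


def load_stats(lines):
    """Read a tab-separated side table into a dict keyed by (chrom, pos).

    Assumes the table has a header row containing CHROM and POS columns.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return {(row["CHROM"], int(row["POS"])): row for row in reader}


def annotate(vcf_lines, stats):
    """Yield (chrom, pos, stats_row_or_None) for each VCF record."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and the column header
        chrom, pos = line.split("\t", 2)[:2]
        yield chrom, int(pos), stats.get((chrom, int(pos)))
```

The VCF itself stays untouched, so other tools can still read it, and the side table can carry as many columns as needed.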
I fixed this problem by creating a new file format :-) http://plindenbaum.blogspot.com/2010/05/first-rule-of-bioinfo-club.html