Adding An Extra Column To A VCF File
14.6 years ago

Hi all, I've just written a tool that adds one or more extra columns to a VCF file. The header now looks like this:

(...)
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    MY_COL1    MY_COL2    FORMAT    NA00001    NA00002    NA00003
(...)

Is there something in the VCF spec saying that another column can't be added? I ask because when I run VCFtools on the file, it reports:

vcftools --vcf file.vcf 
(...)
Scanning file.vcf ... 
Ninth Header entry should be FORMAT: MY_COL1
Currently scanning CHROM: 19
Currently scanning CHROM: 20
Currently scanning CHROM: X
vcf format next-gen sequencing • 9.0k views

I fixed this problem by creating a new file format :-) http://plindenbaum.blogspot.com/2010/05/first-rule-of-bioinfo-club.html

14.4 years ago

BEDTools now supports VCF and you can tack on any number of columns you want. That said, if you are looking for specific functionality within VCFTools, then this isn't helpful at all.

13.9 years ago
lh3 33k

Instead of inserting new columns, which will screw up most tools, you should add your custom information to the INFO column (the eighth column). This is what that field is designed for. With perl, it is very easy to extract a key-value pair from it, e.g.:

perl -ane 'print "MYKEY=$1\n" if $F[7]=~/MYKEY=([^;]+)/'

Furthermore, VCF is not only used for SNPs, but also for INDELs and SVs. People from several major sequencing centers joined the discussion to design this format. In my opinion, it is quite stable now. Small details may change in the future, but not the number of columns.
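
To illustrate the INFO-based approach, here is a minimal Python sketch that appends a custom key-value pair to the INFO field and reads it back. The function names `add_info` and `parse_info`, the key `MYKEY`, and the sample record are made up for illustration. The sketch assumes tab-delimited data lines and does no validation or escaping of values.

```python
def add_info(line, key, value):
    """Append KEY=VALUE to the INFO field (8th column) of a VCF data line."""
    fields = line.rstrip("\n").split("\t")
    info = fields[7]
    # An empty INFO field is written as "." in VCF.
    if info in (".", ""):
        fields[7] = f"{key}={value}"
    else:
        fields[7] = f"{info};{key}={value}"
    return "\t".join(fields)


def parse_info(line):
    """Return the INFO field as a dict; flag entries without '=' map to True."""
    info = line.rstrip("\n").split("\t")[7]
    if info == ".":
        return {}
    result = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        result[key] = value if sep else True
    return result


# Hypothetical record with three samples; MYKEY is an illustrative key.
record = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14\tGT\t0|0\t1|0\t1/1"
annotated = add_info(record, "MYKEY", "foo")
print(parse_info(annotated)["MYKEY"])
```

The number of columns is unchanged, so tools that expect FORMAT in the ninth column still find it there. To keep the file self-describing, a new key would normally also get a matching `##INFO=<ID=MYKEY,...>` line in the meta-information header.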

14.6 years ago

What kind of information do you want to add? I don't think the VCF specification allows adding new columns, but since the format is still in an early phase of development, you could contact the authors and propose a new feature.

However, VCF files should be used only to describe SNPs and their genotypes, and any other kind of information should go somewhere else. For example, if you have statistics associated with a SNP, you should consider a flat file or a database.
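
The message reported in the question ("Ninth Header entry should be FORMAT") is consistent with the column layout being fixed: eight mandatory columns, then FORMAT and the samples when genotypes are present. Below is a rough Python sketch of such a header check. The function name `check_header` and the wording of its messages are illustrative and are not taken from VCFtools. It splits on any whitespace so that it also works on the header as pasted in the question, whereas real VCF headers are tab-delimited.

```python
FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def check_header(header_line):
    """Return a list of problems found in a '#CHROM ...' header line."""
    cols = header_line.split()
    problems = []
    # The first eight columns are fixed in name and order.
    if cols[:8] != FIXED_COLUMNS:
        problems.append("first eight columns do not match the fixed VCF columns")
    # If anything follows INFO, the ninth column has to be FORMAT.
    if len(cols) > 8 and cols[8] != "FORMAT":
        problems.append(f"ninth header entry should be FORMAT, found {cols[8]}")
    return problems


header = ("#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    "
          "MY_COL1    MY_COL2    FORMAT    NA00001    NA00002    NA00003")
for problem in check_header(header):
    print(problem)
```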
