Dear all,
I have a problem regarding the manipulation of multiple VCF files (containing the same variants and referring to the same sample): I need to merge their INFO fields.
The context.
Say I have the following VCF file (headers not included):
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
chr13 32903685 . C T 7555.77 PASS . GT:AD:DP:GQ:PL 0/1:219,340:569:99:7584,0,4763
Now, I create two copies of the same VCF file and annotate each of them with a different annotation source. So, the first one becomes:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
chr13 32903685 . C T 7555.77 PASS CustomOne=1 GT:AD:DP:GQ:PL 0/1:219,340:569:99:7584,0,4763
while the second one becomes:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
chr13 32903685 . C T 7555.77 PASS CustomTwo=2 GT:AD:DP:GQ:PL 0/1:219,340:569:99:7584,0,4763
I would like now to merge the aforementioned copies, so as to obtain:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
chr13 32903685 . C T 7555.77 PASS CustomOne=1;CustomTwo=2 GT:AD:DP:GQ:PL 0/1:219,340:569:99:7584,0,4763
Basically, the result I would like to achieve keeps the same #CHROM, POS, REF, ALT, QUAL, FILTER, FORMAT and sample columns, and merges the contents of the INFO column found in each copy.
The solution I tried.
I tried several options, without success:
- bcftools merge, but this is meant to merge different samples, while I am working with the same sample;
- bcftools concat, but this simply concatenates the VCF files instead of merging their records;
- SnpSift annotate, but this does not accept more than two files, meaning that I cannot use it if the number of copies to be merged is greater than two.
My question!
Can you suggest how to proceed?
Thank you for your help.
Yeah, sorry, I gave a wrong example. I am going to re-edit the question with two different INFO fields. So, does this command accept multiple files too?
Maybe there is a more elegant solution, but pipes should work:
zcat custom1.vcf.gz | bcftools annotate -a custom2.vcf.gz -c INFO/CustomTwo - | bcftools annotate -a custom3.vcf.gz -c INFO/CustomThree -
This is the solution I applied at first, but it does not scale, since it opens a new annotation process for every additional copy (N-1 processes for N copies). Isn't there a tool that does this operation for me, without launching several annotation processes?
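In case no dedicated tool turns up, here is a minimal sketch of a single-process alternative. It is not a bcftools or SnpSift feature: `merge_info` is a hypothetical shell function that runs one awk process, reads all N copies in lockstep, prints every `##` header line once (so the `##INFO` definition from each copy is kept), and joins the INFO columns with `;`. It assumes the copies are uncompressed and contain exactly the same records in the same order, which should hold if each copy was annotated from the same original file.

```shell
# merge_info: hypothetical helper (not part of bcftools or SnpSift).
# Merges the INFO column of several *uncompressed* VCF copies of the same
# sample in a single awk process, streaming one record at a time.
# Assumes every copy holds the same records in the same order.
merge_info() {
  awk '
  BEGIN {
    FS = OFS = "\t"
    nfiles = ARGC - 1
    # 1) Read each header up to its #CHROM line; print each "##" line once,
    #    so the ##INFO definitions of every copy end up in the output.
    for (i = 1; i <= nfiles; i++) {
      while ((getline ln < ARGV[i]) > 0) {
        if (ln ~ /^#CHROM/) { if (i == 1) header = ln; break }
        if (!(ln in seen)) { seen[ln] = 1; print ln }
      }
    }
    print header
    # 2) Read one record from every copy at a time and join the INFO fields.
    while ((getline ln < ARGV[1]) > 0) {
      n = split(ln, f, "\t")
      merged = f[8]
      for (i = 2; i <= nfiles; i++) {
        if ((getline other < ARGV[i]) <= 0) {
          print "merge_info: " ARGV[i] " has fewer records" > "/dev/stderr"
          exit 1
        }
        split(other, g, "\t")
        # Sanity check: CHROM, POS, REF and ALT must match across copies.
        if (g[1] != f[1] || g[2] != f[2] || g[4] != f[4] || g[5] != f[5]) {
          print "merge_info: records differ at " f[1] ":" f[2] > "/dev/stderr"
          exit 1
        }
        # Skip the "." placeholder used for an empty INFO column.
        if (g[8] != ".") merged = (merged == "." ? g[8] : merged ";" g[8])
      }
      # Rebuild the line from the first copy, with the merged INFO column.
      f[8] = merged
      out = f[1]
      for (j = 2; j <= n; j++) out = out OFS f[j]
      print out
    }
    # The other copies must not contain extra records either.
    for (i = 2; i <= nfiles; i++)
      if ((getline other < ARGV[i]) > 0) {
        print "merge_info: " ARGV[i] " has extra records" > "/dev/stderr"
        exit 1
      }
  }' "$@"
}
```

With bgzipped copies, bash process substitution can decompress them on the fly, e.g. `merge_info <(zcat custom1.vcf.gz) <(zcat custom2.vcf.gz) <(zcat custom3.vcf.gz) > merged.vcf`. Each `zcat` is still its own process, but only one process does the merging. Reading the copies in lockstep keeps memory use constant regardless of file size; the trade-off is that records are not matched by position, so the function stops with an error if the copies diverge. INFO keys present in more than one copy are not deduplicated.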