I found a way to do this with the fpfilter output, although not with the fpfilter output file directly. I selected the chr, pos and filter columns from the tab-delimited file, bgzipped the result and indexed it. I then ran vcf-annotate so that the FILTER column of the original VCF file receives the flags from the FILTER column of the tab-delimited file. The steps are below. It is not perfect, since the description I pass looks wrong to me, but it works for now.
awk -v OFS='\t' '{print $1, $2, $8}' S_313_T_mutect_snvs.fpfilter > S_313_T_mutect_snvs.fpfilter_tab
Output
#CHROM POS FILTER
chr1 1267350 PASS
chr1 1289256 Strandedness
chr1 1424646 Strandedness
chr1 1886772 Strandedness
chr1 3044509 Strandedness
chr1 3424505 VarFrac
chr1 5927342 Strandedness
chr1 5948588 Strandedness
chr1 7847477 PASS
Now I compress the above file and index it:
bgzip S_313_T_mutect_snvs.fpfilter_tab
tabix -s 1 -b 2 -e 2 S_313_T_mutect_snvs.fpfilter_tab.gz
Now I run vcf-annotate on the original VCF file.
Command:
cat /scratch/GT/vdas/pietro/exome_seq/results/mutect/exonic_call/mutect_S_313soma_t_ex_flt.vcf \
| /scratch/GT/softwares/vcftools_0.1.12b/bin/vcf-annotate -a S_313_T_mutect_snvs.fpfilter_tab.gz \
-d key=INFO,ID=ANN,Number=1,Type=String,Description='FP filter annotation' \
-c CHROM,POS,FILTER > S_313_T_mutect_snvs.fpfilter.vcf
Output:
##INFO=<ID=ANN,Number=1,Type=String,Description="FP filter annotation">
##source_20150127.1=vcf-annotate(r731) -a S_313_T_mutect_snvs.fpfilter_tab.gz -d key=INFO,ID=ANN,Number=1,Type=String,Description=FP filter annotation -c CHROM,POS,FILTER
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT T_S7998 N_S8980
chr1 1267350 . C A . PASS SOMATIC;VT=SNP GT:AD:BQ:DP:FA:SS 0/1:66,4:34:70:0.057:2 0:31,0:.:31:0.00:0
chr1 1289256 . C A . Strandedness SOMATIC;VT=SNP GT:AD:BQ:DP:FA:SS 0/1:81,4:34:85:0.047:2 0:82,0:.:82:0.00:0
chr1 1424646 . C A . Strandedness SOMATIC;VT=SNP GT:AD:BQ:DP:FA:SS 0/1:37,4:34:41:0.098:2 0:20,0:.:20:0.00:0
chr1 1886772 . C A . Strandedness SOMATIC;VT=SNP GT:AD:BQ:DP:FA:SS 0/1:39,3:33:42:0.071:2 0:32,0:.:32:0.00:0
chr1 3044509 . G T . Strandedness SOMATIC;VT=SNP GT:AD:BQ:DP:FA:SS 0/1:57,6:36:63:0.095:2 0:73,0:.:73:0.00:0
chr1 3424505 . G T . VarFrac SOMATIC;VT=SNP GT:AD:BQ:DP:FA:SS 0/1:91,4:35:95:0.042:2 0:108,0:.:108:0.00:0
chr1 5927342 . C A . Strandedness SOMATIC;VT=SNP GT:AD:BQ:DP:FA:SS 0/1:81,4:35:85:0.047:2 0:77,0:.:78:0.00:0
chr1 5948588 . C A . Strandedness SOMATIC;VT=SNP GT:AD:BQ:DP:FA:SS 0/1:60,4:35:64:0.063:2 0:64,0:.:64:0.00:0
chr1 7847477 . G A . PASS SOMATIC;VT=SNP GT:AD:BQ:DP:FA:SS 0/1:60,4:37:64:0.063:2 0:166,0:.:166:0.00:0
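However, I am not convinced that the vcf-annotate command I used is correct. The -d (description) option I provide defines an INFO field named ANN, yet the values from the tab-delimited file are written to the FILTER column of the original VCF, presumably because -c names FILTER as the target of the third column. The ANN header line is added but no record carries an ANN value. Still, it works.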
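If the goal is to have the flag under INFO as ANN rather than in FILTER, the vcf-annotate usage text suggests naming the target column INFO/ANN in -c (that is, -c CHROM,POS,INFO/ANN), which would match the -d definition; I have not tested that variant here. As a tool-independent alternative, the join can be sketched in plain awk. This is only a minimal sketch, not what vcf-annotate does internally. It assumes an uncompressed three-column tab file (CHROM, POS, FILTER) and an uncompressed VCF, and the function name annotate_info_ann is made up for illustration.

```shell
# Hypothetical helper (name is illustrative): add ANN=<flag> to the INFO
# column of every VCF record whose CHROM/POS appears in the tab file.
# Usage: annotate_info_ann filters.tab input.vcf > output.vcf
annotate_info_ann() {
    awk -F'\t' -v OFS='\t' '
        # First file: CHROM, POS, FILTER table; header lines start with "#"
        NR == FNR { if ($0 !~ /^#/) ann[$1 FS $2] = $3; next }
        # Second file: pass VCF meta lines through unchanged
        /^##/ { print; next }
        # Declare the new INFO key just before the column header line
        /^#CHROM/ {
            print "##INFO=<ID=ANN,Number=1,Type=String,Description=\"FP filter annotation\">"
            print
            next
        }
        # Records: append ANN to INFO (column 8) when the site is in the table
        {
            key = $1 FS $2
            if (key in ann) $8 = ($8 == "." ? "" : $8 ";") "ANN=" ann[key]
            print
        }
    ' "$1" "$2"
}
```

With this, the FILTER column of the VCF is left as it is and a matching record would end with an INFO value such as SOMATIC;VT=SNP;ANN=Strandedness. The tab file has to be non-empty and uncompressed; a bgzipped copy can be unpacked with gzip -dc first.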
I am having trouble with this annotation. I did not need it earlier, but now I am trying to annotate the output of fpfilter back onto the original VCF file, and it is not being written properly.
My fpfilter out looks like this
I used the commands below to zip and annotate it
The original VCF file looks like the following. I am omitting the first 124 header lines of the VCF.
I'm using the command
But this does not give me what I want. The -c handler should list the columns of the annotation file, which is in bgzipped format, but the value is not written to the desired column. I am confused. Am I making a mistake in the tabix command? Since the fpfilter output has no TO column, the tabix columns should be fine: begin and end both use the POS column of the fpfilter output. So where am I going wrong? I want to write the FILTER column of the fpfilter output (for example "PASS") into the INFO field of the original VCF under a new ID "ANN". How should I modify the command? I have tried different ways, to no avail. Any help would be appreciated. Maybe it is naive, but I have not been able to figure it out.