How to extract a specific field from a VCF?
3.0 years ago

Hello,

I would like to extract information from my VCF file. I have no problem with the INFO or annotation fields:

java -jar SnpSift.jar extractFields -s "," -e "EMPTY" test.vcf "CHROM" "POS" "REF" "ALT" "DP" "ANN[*].HGVS_P" > test-fields.xls

However, I haven't found how to extract the VAF value from the last field, "GT:DP:AD:RO:QR:AO:QA:GL:VAF".

I couldn't find a name for this field. How can I do that?

Thank you for your advice.

Here is a line of my VCF file:

(I want to extract "1" from VAF)

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  unknown

NC_045512.2 210 .   G   T   16381.3 PASS    AB=0    ABP=0   AC=2    AF=1    AN=2    AO=487  CIGAR=1X    DP=487  DPB=487 DPRA=0  EPP=4.29892 EPPR=0  GTI=0   LEN=1   MEANALT=1   MQM=60  MQMR=0  NS=1    NUMALT=1    ODDS=679.731    PAIRED=0    PAIREDR=0   PAO=0   PQA=0   PQR=0   PRO=0   QA=18260    QR=0    RO=0    RPL=242 RPP=3.05043 RPPR=0  RPR=245 RUN=1   SAF=250 SAP=3.76385 SAR=237 SRF=0   SRP=0   SRR=0   TYPE=snp    ANN=T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|GU280_gp01|protein_coding||c.-56G>T|||||56|   T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009725297.1|protein_coding||c.-56G>T|||||56|WARNING_TRANSCRIPT_NO_STOP_CODON   T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009742608.1|protein_coding||c.-56G>T|||||56|WARNING_TRANSCRIPT_NO_STOP_CODON   T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|GU280_gp01.2|protein_coding||c.-56G>T|||||56| T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009725298.1|protein_coding||c.-596G>T|||||596|WARNING_TRANSCRIPT_NO_START_CODON    T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009742609.1|protein_coding||c.-596G>T|||||596|WARNING_TRANSCRIPT_NO_START_CODON    T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009725299.1|protein_coding||c.-2510G>T|||||2510|WARNING_TRANSCRIPT_NO_START_CODON  T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009742610.1|protein_coding||c.-2510G>T|||||2510|WARNING_TRANSCRIPT_NO_START_CODON  T|intergenic_region|MODIFIER|CHR_START-ORF1ab|CHR_START-GU280_gp01|intergenic_region|CHR_START-GU280_gp01|||n.210G>T||||||  GT:DP:AD:RO:QR:AO:QA:GL:VAF 1/1:487:0   487:0:0:487:18260:-1642.66  -146.602    0:1
3.0 years ago
bcftools query -f '%CHROM %POS %REF %ALT[ %VAF]\n' in.vcf
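If bcftools is not at hand, the same per-sample lookup can be sketched with plain awk. This is only a rough illustration, not what bcftools does internally: it assumes a tab-separated VCF with a single sample in column 10, and the function name `extract_format_field` and the shortened `demo.vcf` record are made up for the example.

```shell
# Hypothetical helper: print CHROM, POS, REF, ALT and one FORMAT subfield
# (VAF by default) for the first sample of a tab-separated VCF.
extract_format_field() {
    # $1 = VCF path, $2 = FORMAT key to look up (defaults to VAF)
    awk -F'\t' -v OFS='\t' -v key="${2:-VAF}" '
        /^#/ { next }                        # skip all header lines
        {
            n = split($9, keys, ":")         # FORMAT column, e.g. GT:DP:...:VAF
            split($10, vals, ":")            # first sample column
            value = "EMPTY"                  # placeholder when the key is absent
            for (i = 1; i <= n; i++)
                if (keys[i] == key && (i in vals)) value = vals[i]
            print $1, $2, $4, $5, value
        }
    ' "$1"
}

# Made-up, shortened record modelled on the line in the question
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tunknown\n' > demo.vcf
printf 'NC_045512.2\t210\t.\tG\tT\t16381.3\tPASS\tDP=487\tGT:DP:VAF\t1/1:487:1\n' >> demo.vcf

extract_format_field demo.vcf VAF
```

The position of VAF is looked up in the FORMAT column on every line rather than hard-coded, since the order of FORMAT keys can differ between records.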
3.0 years ago
brunobsouzaa ▴ 840

You can use vcftools for that. It would be something like the following:

vcftools --vcf "${i}.vcf" --extract-FORMAT-info VAF --out "${i}"

Whilst this may work, it's generally not recommended to use vcftools, as it hasn't been under active development for a long time and may contain bugs. It's better to use bcftools.


That's very nice. I guess I can now combine "test-fields.xls" with the vcftools output to get a single final file, but I don't know how to do that. Maybe with sed?


You can use awk for that task
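Something along these lines might work as a starting point. It is a sketch under assumptions: both tables are tab-separated with a header row, both start with CHROM and POS columns, and the per-site table carries the VAF value in its third column. The function name `join_on_chrom_pos` and the demo file names are invented for the example, so adjust them to your real files.

```shell
# Hypothetical helper: append a VAF column to a fields table by matching CHROM and POS.
join_on_chrom_pos() {
    # $1 = per-site table (CHROM, POS, VAF value), $2 = fields table to extend
    awk -F'\t' -v OFS='\t' '
        NR == FNR { vaf[$1 FS $2] = $3; next }   # first file: store value keyed by CHROM+POS
        FNR == 1  { print $0, "VAF"; next }      # second file: extend its header line
        {
            k = $1 FS $2
            print $0, (k in vaf ? vaf[k] : "EMPTY")   # EMPTY when no matching site
        }
    ' "$1" "$2"
}

# Made-up demo inputs standing in for the two real tables
printf 'CHROM\tPOS\tunknown\nNC_045512.2\t210\t1\n' > demo.VAF.FORMAT
printf 'CHROM\tPOS\tREF\tALT\tDP\nNC_045512.2\t210\tG\tT\t487\n' > demo-fields.tsv

join_on_chrom_pos demo.VAF.FORMAT demo-fields.tsv > demo-merged.tsv
cat demo-merged.tsv
```

Matching on CHROM and POS rather than pasting the files side by side keeps the rows aligned even if one table is missing a site or is sorted differently.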
