3.0 years ago
curiousbiologist
Hello,
I would like to extract information from my VCF file. INFO and annotation fields are no problem:
java -jar SnpSift.jar extractFields -s "," -e "EMPTY" test.vcf "CHROM" "POS" "REF" "ALT" "DP" "ANN[*].HGVS_P" > test-fields.xls
However, I haven't found how to extract the VAF value from the last field, "GT:DP:AD:RO:QR:AO:QA:GL:VAF".
I couldn't find the name for this field. How can I do that?
Thank you for your advice.
Here is a line of my VCF file:
(I want to extract "1" from VAF)
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT unknown
NC_045512.2 210 . G T 16381.3 PASS AB=0 ABP=0 AC=2 AF=1 AN=2 AO=487 CIGAR=1X DP=487 DPB=487 DPRA=0 EPP=4.29892 EPPR=0 GTI=0 LEN=1 MEANALT=1 MQM=60 MQMR=0 NS=1 NUMALT=1 ODDS=679.731 PAIRED=0 PAIREDR=0 PAO=0 PQA=0 PQR=0 PRO=0 QA=18260 QR=0 RO=0 RPL=242 RPP=3.05043 RPPR=0 RPR=245 RUN=1 SAF=250 SAP=3.76385 SAR=237 SRF=0 SRP=0 SRR=0 TYPE=snp ANN=T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|GU280_gp01|protein_coding||c.-56G>T|||||56| T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009725297.1|protein_coding||c.-56G>T|||||56|WARNING_TRANSCRIPT_NO_STOP_CODON T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009742608.1|protein_coding||c.-56G>T|||||56|WARNING_TRANSCRIPT_NO_STOP_CODON T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|GU280_gp01.2|protein_coding||c.-56G>T|||||56| T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009725298.1|protein_coding||c.-596G>T|||||596|WARNING_TRANSCRIPT_NO_START_CODON T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009742609.1|protein_coding||c.-596G>T|||||596|WARNING_TRANSCRIPT_NO_START_CODON T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009725299.1|protein_coding||c.-2510G>T|||||2510|WARNING_TRANSCRIPT_NO_START_CODON T|upstream_gene_variant|MODIFIER|ORF1ab|GU280_gp01|transcript|YP_009742610.1|protein_coding||c.-2510G>T|||||2510|WARNING_TRANSCRIPT_NO_START_CODON T|intergenic_region|MODIFIER|CHR_START-ORF1ab|CHR_START-GU280_gp01|intergenic_region|CHR_START-GU280_gp01|||n.210G>T|||||| GT:DP:AD:RO:QR:AO:QA:GL:VAF 1/1:487:0 487:0:0:487:18260:-1642.66 -146.602 0:1
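For reference, VAF is not an INFO tag but a per-sample value: the FORMAT column lists the keys, and the sample column (here named "unknown") lists the values in the same order. The following is only a dependency-free sketch of that lookup with awk, not a SnpSift feature. It assumes the real file is tab-separated and uses an abridged copy of the record above; the file names demo.vcf and demo-vaf.tsv are made up for illustration.

```shell
# Abridged copy of the record from the post (INFO shortened to DP=487;
# tab-separated columns and comma-separated AD/GL values are assumed).
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tunknown\n' > demo.vcf
printf 'NC_045512.2\t210\t.\tG\tT\t16381.3\tPASS\tDP=487\tGT:DP:AD:RO:QR:AO:QA:GL:VAF\t1/1:487:0,487:0:0:487:18260:-1642.66,-146.602,0:1\n' >> demo.vcf

# For each record, find which FORMAT key is VAF and print the matching
# entry of the first sample column (column 10).
awk -F'\t' -v OFS='\t' '
    /^#/ { next }                        # skip header lines
    {
        n = split($9, keys, ":")         # FORMAT keys
        split($10, vals, ":")            # values of the first sample
        vaf = "EMPTY"                    # placeholder if VAF is absent
        for (i = 1; i <= n; i++)
            if (keys[i] == "VAF") vaf = vals[i]
        print $1, $2, $4, $5, vaf        # CHROM POS REF ALT VAF
    }
' demo.vcf > demo-vaf.tsv
```

The SnpSift documentation also describes genotype expressions of the form GEN[0].GT for extractFields, so a "GEN[0].VAF" argument may be worth trying in the original command; that has not been checked against this file.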
Whilst this may work, it's generally not recommended to use vcftools, as it hasn't been under active development for a long time and may contain bugs. Better to use bcftools.
That's very nice. I guess I can now put together "test-fields.xls" and the output from vcftools to get a nice final file, but I don't know how to do that. Maybe with sed?
You can use awk for that task.
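A minimal sketch of such an awk merge, assuming both tables are tab-separated, share CHROM and POS as their first two columns, and the fields table starts with a header line. The file names and contents below are hypothetical stand-ins, not the actual SnpSift or vcftools output.

```shell
# Hypothetical stand-ins for the two tables (tab-separated, keyed on CHROM and POS).
printf 'CHROM\tPOS\tREF\tALT\tDP\n'      > demo-fields.tsv
printf 'NC_045512.2\t210\tG\tT\t487\n'  >> demo-fields.tsv
printf 'NC_045512.2\t210\t1\n'           > demo-vaf-only.tsv

awk -F'\t' -v OFS='\t' '
    NR == FNR { vaf[$1 FS $2] = $3; next }   # first file: remember VAF per CHROM+POS
    FNR == 1  { print $0, "VAF"; next }      # second file: extend its header line
    {
        key = $1 FS $2
        print $0, ((key in vaf) ? vaf[key] : "EMPTY")   # append VAF, or a placeholder
    }
' demo-vaf-only.tsv demo-fields.tsv > demo-merged.tsv
```

Keying on CHROM and POS alone is enough for one record per position; if several records share a position, REF and ALT would need to be part of the key as well.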