Hi all,
I have a VCF file with an INFO column like this:
##fileformat=VCFv4.3
##fileDate=20180421
##source=PLINKv2.00
##filedate=20180410
##contig=<ID=10,length=135524727>
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|VARIANT_CLASS|SYMBOL_SOURCE|HGNC_ID|CANONICAL|TSL|APPRIS|CCDS|ENSP|SWISSPROT|TREMBL|UNIPARC|REFSEQ_MATCH|SOURCE|GENE_PHENO|SIFT|PolyPhen|DOMAINS|miRNA|AF|AFR_AF|AMR_AF|EAS_AF|EUR_AF|SAS_AF|AA_AF|EA_AF|gnomAD_AF|gnomAD_AFR_AF|gnomAD_AMR_AF|gnomAD_ASJ_AF|gnomAD_EAS_AF|gnomAD_FIN_AF|gnomAD_NFE_AF|gnomAD_OTH_AF|gnomAD_SAS_AF|MAX_AF|MAX_AF_POPS|CLIN_SIG|SOMATIC|PHENO|PUBMED|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE">
#CHROM POS ID REF ALT
I would like to extract only the allele frequency (AF) values from this column. However, this is difficult for me because all the annotations are packed together into a single column. Is there a way to do this? Thank you.
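One possible way to do this directly on the VCF is to read the sub-field order from the "Format:" part of the CSQ header line and then split each CSQ entry on "|". Below is a minimal Python sketch, not a definitive solution. It assumes a standard tab-separated VCF body with INFO as the 8th column, and that multiple CSQ entries (one per transcript) are separated by commas. The function names are made up for illustration.

```python
def csq_field_names(header_lines):
    """Return the CSQ sub-field names listed after 'Format: ' in the header."""
    for line in header_lines:
        if line.startswith("##INFO=") and "ID=CSQ," in line:
            # The field list runs from 'Format: ' to the closing double quote.
            fmt = line.split("Format: ", 1)[1].split('"', 1)[0]
            return [name.strip() for name in fmt.split("|")]
    raise ValueError("no CSQ definition found in the header")


def csq_values(info, names, key="AF"):
    """Return the value of `key` for each CSQ entry in one INFO string."""
    idx = names.index(key)
    for item in info.split(";"):
        if item.startswith("CSQ="):
            entries = item[len("CSQ="):].split(",")
            return [entry.split("|")[idx] for entry in entries]
    return []  # this record has no CSQ annotation


def af_per_variant(lines, key="AF"):
    """Yield (chrom, pos, ref, alt, values) for every record in a VCF."""
    header, names = [], None
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            header.append(line)
            continue
        if names is None:
            names = csq_field_names(header)
        cols = line.split("\t")
        # Assumption: INFO is the 8th column (index 7), as in a standard VCF.
        yield cols[0], cols[1], cols[3], cols[4], csq_values(cols[7], names, key)
```

To use it, open the annotated file (for example with `open("annotated.vcf")`, a placeholder name) and loop over `af_per_variant(handle)`. Empty strings in the result mean that VEP reported no AF for that transcript entry. Passing `key="gnomAD_AF"` would pull a different sub-field in the same way.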
OP wants the information contained in the VEP INFO/CSQ field, not INFO/AF.
Sorry, I misunderstood the question; ignore my answer. I'm not sure, then. Maybe generate the VEP output in tab-delimited format, so that the annotations are no longer packed into one column, and then extract the AF column.
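If the VEP output is regenerated as tab-delimited text, pulling out the AF column could look roughly like the Python sketch below. It assumes that lines starting with "##" are comments, that the single column-header line starts with one "#", and that an "AF" column is present (which depends on the options VEP was run with). The function name is made up for illustration.

```python
def af_from_tab(lines, column="AF"):
    """Yield (first column, AF) pairs from tab-delimited annotation output."""
    header = None
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and '##' comment lines
        if line.startswith("#"):
            # Column-header line: drop the leading '#' and keep the names.
            header = line[1:].split("\t")
            continue
        if header is None:
            raise ValueError("data line found before the column header")
        row = dict(zip(header, line.split("\t")))
        # Raises KeyError if the requested column is not in the file.
        yield row[header[0]], row[column]
```

Each yielded pair holds the value of the first column (the variant identifier) and the value of the requested column, exactly as written in the file.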