Hello,
I have seen a few answers to this, but none seem to do what I would like.
I have a VEP-annotated VCF file with Ensembl headers like this:
```
##VEP="v107" time="2022-09-12 19:16:50" cache="/home/c.c21087028/.vep/homo_sapiens/107_GRCh38" ensembl-io=107.a473894 ensembl-funcgen=107.0fbd7d5 ensembl=107.5f39899 ensembl-variation=107.db634f2 1000genomes="phase3" COSMIC="95" ClinVar="202201" HGMD-PUBLIC="20204" assembly="GRCh38.p13" dbSNP="154" gencode="GENCODE 41" genebuild="2014-07" gnomADe="r2.1.1" gnomADg="v3.1.2" polyphen="2.2.2" regbuild="1.0" sift="sift5.2.2"
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|VARIANT_CLASS|SYMBOL_SOURCE|HGNC_ID|CANONICAL|MANE_SELECT|MANE_PLUS_CLINICAL|TSL|APPRIS|CCDS|ENSP|SWISSPROT|TREMBL|UNIPARC|UNIPROT_ISOFORM|GENE_PHENO|SIFT|PolyPhen|DOMAINS|miRNA|HGVS_OFFSET|AF|AFR_AF|AMR_AF|EAS_AF|EUR_AF|SAS_AF|gnomADe_AF|gnomADe_AFR_AF|gnomADe_AMR_AF|gnomADe_ASJ_AF|gnomADe_EAS_AF|gnomADe_FIN_AF|gnomADe_NFE_AF|gnomADe_OTH_AF|gnomADe_SAS_AF|gnomADg_AF|gnomADg_AFR_AF|gnomADg_AMI_AF|gnomADg_AMR_AF|gnomADg_ASJ_AF|gnomADg_EAS_AF|gnomADg_FIN_AF|gnomADg_MID_AF|gnomADg_NFE_AF|gnomADg_OTH_AF|gnomADg_SAS_AF|MAX_AF|MAX_AF_POPS|CLIN_SIG|SOMATIC|PHENO|PUBMED|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|TRANSCRIPTION_FACTORS|CADD_PHRED|CADD_RAW">
##CADD_PHRED=PHRED-like scaled CADD score
##CADD_RAW=Raw CADD score
```
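The `Format:` part of the CSQ description is what names each pipe-separated subfield, in order. As a minimal sketch, assuming an uncompressed VCF (for `.vcf.gz`, decompress first) and the VEP-style `Format: A|B|C` wording shown above, the column names can be read straight from that header line with standard tools. `csq_header` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print a tab-separated header line built from the CSQ "Format:" list.
# Assumes an uncompressed VCF whose CSQ description ends in 'Format: A|B|C">'.
csq_header() {
  # Fixed leading columns, in the same order as -f '%CHROM\t%POS\t%REF\t%ALT\t%CSQ\n'
  printf 'CHROM\tPOS\tREF\tALT\t'
  # Read header lines only (stop at #CHROM), keep the text after "Format: ",
  # then turn the pipe separators into tabs
  sed -n -e '/^#CHROM/q' -e 's/^##INFO=<ID=CSQ,.*Format: \(.*\)">$/\1/p' "$1" |
    tr '|' '\t'
}
```

Usage would be something like `csq_header input.vcf > output.tsv`, followed by appending the split-vep rows with `>>`.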
Is it possible to split the CSQ annotations in the INFO field into separate columns, and to use the field names from the header above as the column names?
I have tried this:
```
echo -e "CHROM\tPOS\tREF\tALT\t$(bcftools +split-vep -l input.vcf | cut -f 2 | tr '\n' '\t' | sed 's/\t$//')" > output.tsv
bcftools +split-vep -f '%CHROM\t%POS\t%REF\t%ALT\t%CSQ\n' -d -A tab input.vcf >> output.tsv
```
but it does not add the header row, and some of the fields listed above are missing from the output.
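With `-d -A tab`, split-vep is asked for one row per consequence with every CSQ subfield in its own column. To check what the CSQ values themselves contain, independently of bcftools, a plain awk split is one option. This is a swapped-in technique, not how `+split-vep` works internally: a minimal sketch assuming an uncompressed VCF with a `CSQ=` key in INFO, where `split_csq` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch (plain awk, not bcftools): split the CSQ key of an uncompressed VCF
# into tab-separated columns, one row per comma-separated consequence.
split_csq() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                          # skip header lines
    {
      n = split($8, info, ";")             # INFO is column 8
      csq = ""
      for (i = 1; i <= n; i++)
        if (info[i] ~ /^CSQ=/) { csq = substr(info[i], 5); break }
      if (csq == "") next                  # site has no CSQ annotation
      m = split(csq, entries, ",")         # one entry per allele/transcript
      for (j = 1; j <= m; j++) {
        k = split(entries[j], f, "|")      # single-character separator, matched literally
        line = $1 OFS $2 OFS $4 OFS $5     # CHROM, POS, REF, ALT
        for (x = 1; x <= k; x++) line = line OFS f[x]
        print line
      }
    }' "$1"
}
```

Because awk keeps empty fields when splitting on `|`, comparing `split_csq input.vcf | head` with the bcftools output may help show whether a column is empty in the annotation itself or is being dropped somewhere else.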
Thanks, I hope this makes sense. Amy
That worked great thanks!
I thought I'd also add another answer I found that worked too:
This one didn't add the header row, but I might add it as a second step with another bash command.
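Adding the header as a second step only requires writing the header line before the existing rows. A minimal sketch, where `add_header` is a hypothetical helper and the file names are placeholders:

```shell
#!/usr/bin/env bash
# Sketch: prepend a header line to an existing headerless TSV.
# Writes to a new file so the input is not truncated while it is being read.
add_header() {
  local header=$1 in=$2 out=$3
  { printf '%s\n' "$header"; cat "$in"; } > "$out"
}
```

For example, with the header line from the `bcftools +split-vep -l` pipeline in the question stored in `$header`: `add_header "$header" output.tsv output_with_header.tsv`. Writing to a separate file matters, because redirecting into `output.tsv` while `cat` reads it would empty the file first.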
Thanks!! Amy
If anyone wants to keep the FORMAT column and also split it into columns, you can use: