I have an annotated VCF file and want to extract the gene name for each variant. How can I do this? This is the field I am interested in:
ANN=T|intron_variant|MODIFIER|Plekhg1|ENSMUSG00000040624|transcript|ENSMUST00000120274|protein_coding|1/16|c.-169+10295G>T||||||,
T|intron_variant|MODIFIER|Plekhg1|ENSMUSG00000040624|transcript|ENSMUST00000144543|retained_intron|1/7|n.163+10295G>T||||||,
T|intron_variant|MODIFIER|Plekhg1|ENSMUSG00000040624|transcript|ENSMUST00000137111|retained_intron|1/7|n.343+9828G>T||||||
I want to extract "Plekhg1" from the above entry.
I split the entry across several lines to make it easier to read; I am not sure whether all of it is supposed to be on a single line in the file.
Is the gene name always in the 4th field (separator |)?
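
To make the question concrete, here is a minimal Python sketch of the extraction being asked about. It rests on assumptions rather than confirmed behaviour: that the INFO column separates its entries with ";", that the ANN entry separates annotations with "," and subfields with "|", and that the gene name is the 4th subfield (the very point in question). The function name gene_names_from_info is made up for illustration. The subfield order is normally listed in the description of the ANN line in the VCF header, so that is probably the place to confirm the position.

```python
def gene_names_from_info(info):
    """Return the distinct gene names found in the ANN entry of a VCF INFO string.

    Hypothetical helper: assumes the gene name is the 4th '|'-separated
    subfield of every comma-separated annotation.
    """
    genes = []
    for entry in info.split(";"):
        if not entry.startswith("ANN="):
            continue
        # Each comma-separated item describes one transcript/effect.
        for annotation in entry[len("ANN="):].split(","):
            fields = annotation.split("|")
            # Index 3 is the 4th subfield (assumed to be the gene name).
            if len(fields) > 3 and fields[3] and fields[3] not in genes:
                genes.append(fields[3])
    return genes


# The first two annotations from the entry above, joined on a single line.
info = (
    "ANN=T|intron_variant|MODIFIER|Plekhg1|ENSMUSG00000040624|transcript|"
    "ENSMUST00000120274|protein_coding|1/16|c.-169+10295G>T||||||,"
    "T|intron_variant|MODIFIER|Plekhg1|ENSMUSG00000040624|transcript|"
    "ENSMUST00000144543|retained_intron|1/7|n.163+10295G>T||||||"
)

print(gene_names_from_info(info))  # prints ['Plekhg1']
```

Duplicates are dropped because the same gene is repeated once per transcript; a variant overlapping several genes would return more than one name.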