Here are some example data lines from my VCF file:
chr10 4450587 . T C 2757.97 PASS ANN=C|downstream_gene_va
riant|MODIFIER|Armt1|ENSMUSG00000061759|transcript|ENSMUST00000143037.6|processe
d_transcript||n.*175T>C|||||175|,C|intron_variant|MODIFIER|Armt1|ENSMUSG00000061
759|transcript|ENSMUST00000095893.9|protein_coding|3/4|c.393-60T>C||||||,C|intro
n_variant|MODIFIER|Armt1|ENSMUSG00000061759|transcript|ENSMUST00000118544.6|prot
ein_coding|3/3|c.393-60T>C||||||,C|intron_variant|MODIFIER|Armt1|ENSMUSG00000061
759|transcript|ENSMUST00000152294.1|nonsense_mediated_decay|2/3|c.148-60T>C|||||
| GT:AD:DP:GQ:PL 0/1:18,12:30:99:334,0,510
chr10 5034864 . T G 58.16 PASS ANN=G|downstream_gene_va
riant|MODIFIER|Gm25694|ENSMUSG00000093189|transcript|ENSMUST00000175448.1|ribozy
me||n.*1311A>C|||||1311|,G|intron_variant|MODIFIER|Syne1|ENSMUSG00000096054|tran
script|ENSMUST00000095899.3|protein_coding|13/16|c.2244+69A>C|||||| GT:AD:DP
:GQ:PL 0/1:4,1:5:12:12,0,121
chr10 5231940 . G A 4507.45 PASS ANN=A|intragenic_variant
|MODIFIER|Syne1|ENSMUSG00000096054|gene_variant|ENSMUSG00000096054|||n.5231940C>
T|||||| GT:AD:DP:GQ:PL 0/1:3,4:7:63:86,0,63
chr10 5248017 . A G 4754.39 PASS ANN=G|intragenic_variant
|MODIFIER|Syne1|ENSMUSG00000096054|gene_variant|ENSMUSG00000096054|||n.5248017T>
C|||||| GT:AD:DP:GQ:PL 0/1:16,7:23:99:159,0,502
chr10 5248019 . A G 6149.69 PASS ANN=G|intragenic_variant
|MODIFIER|Syne1|ENSMUSG00000096054|gene_variant|ENSMUSG00000096054|||n.5248019T>
C|||||| GT:AD:DP:GQ:PL 0/1:17,8:25:99:191,0,507
chr10 5298584 . A G 57.56 PASS ANN=G|intragenic_variant
|MODIFIER|Syne1|ENSMUSG00000096054|gene_variant|ENSMUSG00000096054|||n.5298584T>
C|||||| GT:AD:DP:GQ:PL 0/1:2,3:5:56:96,0,56
chr10 6525873 . C A 151.79 PASS ANN=A|intergenic_region|
MODIFIER|Rgs17-Gm10945|ENSMUSG00000019775-ENSMUSG00000078488|intergenic_region|E
NSMUSG00000019775-ENSMUSG00000078488|||n.6525873C>A|||||| GT:AD:DP:GQ:PL
0/1:6,1:7:14:14,0,192
As you can see, four different genes are annotated in the file: Armt1, Gm25694, Syne1 and Rgs17. However, four variants have been called in the gene Syne1, and they are false-positive calls in my case. I need to remove the Syne1 variants from the VCF file, so I expect only Armt1, Gm25694 and Rgs17 to be left in the final table.
In other words, here is the final VCF that I expect:
chr10 4450587 . T C 2757.97 PASS ANN=C|downstream_gene_va
riant|MODIFIER|Armt1|ENSMUSG00000061759|transcript|ENSMUST00000143037.6|processe
d_transcript||n.*175T>C|||||175|,C|intron_variant|MODIFIER|Armt1|ENSMUSG00000061
759|transcript|ENSMUST00000095893.9|protein_coding|3/4|c.393-60T>C||||||,C|intro
n_variant|MODIFIER|Armt1|ENSMUSG00000061759|transcript|ENSMUST00000118544.6|prot
ein_coding|3/3|c.393-60T>C||||||,C|intron_variant|MODIFIER|Armt1|ENSMUSG00000061
759|transcript|ENSMUST00000152294.1|nonsense_mediated_decay|2/3|c.148-60T>C|||||
| GT:AD:DP:GQ:PL 0/1:18,12:30:99:334,0,510
chr10 5034864 . T G 58.16 PASS ANN=G|downstream_gene_va
riant|MODIFIER|Gm25694|ENSMUSG00000093189|transcript|ENSMUST00000175448.1|ribozy
me||n.*1311A>C|||||1311|,G|intron_variant|MODIFIER|Syne1|ENSMUSG00000096054|tran
script|ENSMUST00000095899.3|protein_coding|13/16|c.2244+69A>C|||||| GT:AD:DP
:GQ:PL 0/1:4,1:5:12:12,0,121
chr10 6525873 . C A 151.79 PASS ANN=A|intergenic_region|
MODIFIER|Rgs17-Gm10945|ENSMUSG00000019775-ENSMUSG00000078488|intergenic_region|E
NSMUSG00000019775-ENSMUSG00000078488|||n.6525873C>A|||||| GT:AD:DP:GQ:PL
0/1:6,1:7:14:14,0,192
I do not know how to write a script to remove genes that have multiple variants. I looked at vcftools and snpEff, but I did not find a tool that can do this. Can anyone provide a script or suggest a tool that can do this? Thanks.
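To make the request concrete, here is a minimal Python sketch of the filtering I have in mind. It is only an illustration under some assumptions: the real file is tab-delimited (the lines pasted above lost their tabs), INFO is the 8th column, and the gene name is the 4th `|`-separated subfield of each ANN entry, as in snpEff's ANN format. The function names `ann_genes` and `filter_vcf` and the script name `drop_genes.py` are made up for this example. A record is dropped only when every gene it is annotated with is in the exclusion list, so the chr10:5034864 record (Gm25694 plus Syne1) is kept, as in the expected output.

```python
import sys


def ann_genes(info):
    """Return the set of gene names found in the ANN entry of an INFO string.

    Assumes the snpEff ANN layout, where the gene name is the 4th
    '|'-separated subfield of each comma-separated annotation.
    """
    genes = set()
    for item in info.split(";"):
        if item.startswith("ANN="):
            for ann in item[len("ANN="):].split(","):
                fields = ann.split("|")
                if len(fields) > 3 and fields[3]:
                    genes.add(fields[3])
    return genes


def filter_vcf(lines, excluded):
    """Yield header lines and every record that is not annotated
    exclusively with genes from `excluded`."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep all header lines untouched
            continue
        cols = line.rstrip("\n").split("\t")
        genes = ann_genes(cols[7]) if len(cols) > 7 else set()
        # Drop the record only if it has annotations and all of them
        # point at excluded genes; records without ANN are kept.
        if genes and genes <= excluded:
            continue
        yield line


# Only runs when at least one gene name is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.stdout.writelines(filter_vcf(sys.stdin, set(sys.argv[1:])))
```

It would be run as something like `python drop_genes.py Syne1 < input.vcf > filtered.vcf`, with more gene names added as extra arguments.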