I obtained my VCF file by running SnpEff after GATK. Every entry in the ID column is a period (.), and I am unsure whether a gene name belongs there. When I use the Ensembl Variant Effect Predictor, multiple genes are reported per variant.
For example, there is only one hit on Chromosome A, at position 251395. When I run my data through the Variant Effect Predictor, three genes appear for this hit. I am confused about what this means for my data, as none of these genes appear in my SnpEff gene text file for the same sample.
VCF File:
Ensembl Variant Effect Predictor:
SnpEff gene text file:
The definitions of the annotations are located here. An upstream or downstream variant indicates that the variant is 5' or 3' of those candidates, respectively. These variants can have implications for expression, but they do not change the amino acid sequence of either transcript. The `stop_gained` is more obvious and has more significant implications.

Regarding the data missing between tools: can you check the VCF file for the affected samples and see whether the `stop_gained` variant exists there? As for the upstream/downstream variants, my guess is that the default distance cutoffs differ among tools.
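One way to do that check is to read the SnpEff `ANN` entries in the INFO column directly. The snippet below is a minimal sketch, not a replacement for a proper VCF parser. It assumes the file was annotated with SnpEff's `ANN` field, where each comma-separated entry is pipe-delimited with the effect, impact, and gene name in the second, third, and fourth positions. The function names, the example record, and the gene names `GENE1` and `GENE2` are made up for illustration.

```python
def parse_ann(info):
    """Yield (effect, gene_name, impact) tuples from the ANN entry of a VCF INFO string."""
    for field in info.split(";"):
        if not field.startswith("ANN="):
            continue
        # One comma-separated entry per allele/transcript annotation.
        for entry in field[len("ANN="):].split(","):
            parts = entry.split("|")
            if len(parts) < 4:
                continue
            # Combined effects are joined with '&', e.g. "stop_gained&splice_region_variant".
            for effect in parts[1].split("&"):
                yield effect, parts[3], parts[2]


def find_effect(vcf_lines, chrom, pos, effect="stop_gained"):
    """Return all annotations of the given effect at chrom:pos."""
    hits = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        if cols[0] == chrom and int(cols[1]) == pos:
            hits.extend(a for a in parse_ann(cols[7]) if a[0] == effect)
    return hits


# Hypothetical record: the position is taken from the question, everything else is invented.
example_record = "\t".join([
    "A", "251395", ".", "C", "T", "50", "PASS",
    "AC=1;ANN=T|stop_gained|HIGH|GENE1|GENE1|transcript|T1|protein_coding|2/5|"
    "c.10C>T|p.Gln4*|10/100|10/90|4/30||,"
    "T|upstream_gene_variant|MODIFIER|GENE2|GENE2|transcript|T2|protein_coding||"
    "c.-100C>T|||||100|",
])

hits = find_effect([example_record], "A", 251395)
print(hits)
```

For a real file, pass an open file handle instead of the one-element list, for example `find_effect(open("sample.ann.vcf"), "A", 251395)`, where the file name is a placeholder. An empty result would suggest that SnpEff did not assign that effect at the position in this sample.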