I am studying the effect of transposable elements in Saccharomyces cerevisiae populations. The summary I get from the Variant Effect Predictor (VEP) is as follows:
Category Count
Variants processed 414
Overlapped genes 1553
Overlapped transcripts 1553
Overlapped regulatory features -
What is the difference between overlapped genes and overlapped transcripts? If a transposon overlaps a gene, I might get no transcript at all or a different transcript, depending on where the transposon is located. Also, I know from the literature that transposons can disrupt regulatory elements. Does your database have regulatory annotations for yeast?
For all consequences predicted (values in %):
Upstream gene variant 49
Downstream gene variant 42
Intergenic variant 4
Transcript ablation 3
Coding sequence variant 1
Feature elongation 1
3' UTR variant 1
According to the summary above, 1553 genes overlap a transposon. How can so many genes be affected, given the consequence breakdown above?
And finally, the consequences on the protein sequence (values in %):
Stop codon lost 16
Coding sequence variant 84
In the previous table, coding sequence variants represent 1%. How is it possible that here the same consequence is 84%?
I hope you can help me out. Thank you in advance for your time.
I am still trying to understand the results. S. cerevisiae has ~6000 genes and, according to the summary statistics, 1553 genes overlap a transposon sequence. That is approximately 25% of genes being affected by transposons. How can so many genes be affected if most of the impact is in the upstream/downstream region of a gene? And how can so many genes be affected by just 414 variants?
I really hope you can help me out. Thank you in advance.
The effect on the gene is an upstream/downstream gene variant, i.e. there is a gene within 5 kb of the variant. This means that (49 + 42)% = 91% of the 1553 genes listed as being affected, i.e. about 1413 genes, have a variant within the 5 kb upstream or downstream of them. That is all that it means.
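To make the arithmetic in this answer explicit, here is a minimal sketch that takes the answer's reading at face value (the percentages are applied to the 1553 overlapped genes). The variable names are made up for illustration; the numbers are the ones quoted in the thread.

```python
# Values quoted in the thread (VEP summary and consequence percentages).
upstream_pct = 49          # "Upstream gene variant"
downstream_pct = 42        # "Downstream gene variant"
overlapped_genes = 1553    # "Overlapped genes"
variants_processed = 414   # "Variants processed"

# Share of affected genes that are only flanked (within 5 kb) by a variant,
# following the interpretation given in the answer above.
flanking_fraction = (upstream_pct + downstream_pct) / 100
flanking_genes = int(flanking_fraction * overlapped_genes)

# Average number of genes reported per processed variant.
genes_per_variant = overlapped_genes / variants_processed

print(flanking_genes)               # 1413
print(round(genes_per_variant, 2))  # 3.75
```

On average each variant is reported against fewer than four genes, which is consistent with a 5 kb window on either side of a variant reaching more than one gene.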
Additionally, my input VCF file contains 779 annotated variants, but your summary table reports only 414 variants processed. How does this work? I thought these two values were supposed to be the same.
It's possible that some of them failed. To find out more you'd need to send your list to helpdesk@ensembl.org.
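Before emailing, it may help to confirm how many records the file really contains. The sketch below is only an assumption about what is worth checking: it counts the data lines and the distinct (chromosome, position, REF, ALT) sites in a VCF-like stream. The function name `count_vcf_records` and the toy records are hypothetical, and this does not reproduce VEP's own parsing or explain why it reported 414.

```python
import io

def count_vcf_records(handle):
    """Return (data_lines, distinct_sites) for a tab-separated VCF-like stream."""
    total = 0
    sites = set()
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and header lines ("##meta" and "#CHROM").
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # too few columns to identify a variant; not counted
        total += 1
        chrom, pos, _id, ref, alt = fields[:5]
        sites.add((chrom, pos, ref, alt))
    return total, len(sites)

# Toy input (hypothetical records, not taken from the poster's file).
toy_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "I\t100\t.\tA\t<DEL>\t.\t.\tSVTYPE=DEL;END=400\n"
    "I\t100\t.\tA\t<DEL>\t.\t.\tSVTYPE=DEL;END=400\n"
    "II\t250\t.\tG\t<INS>\t.\t.\tSVTYPE=INS\n"
)
total, distinct = count_vcf_records(toy_vcf)
print(total, distinct)  # 3 2
```

For a real file, pass an open file handle instead of the `io.StringIO` object. If the two counts differ, or the total is not the number you expected, that is useful detail to include in the message to the helpdesk.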
Thank you, I will send an email. Separately, I was looking at the position of a gene affected by a deletion in a transposable element (TE), both in your genome browser and in IGV. How can I see my deletion in your genome browser? I can see the coding sequence variant, and I noticed that TEs are labelled in blue, but I assume these are the TEs annotated on the reference genome from the Saccharomyces Genome Database (SGD).