Hi all, I need some help with VEP analysis. I am hoping to understand which biological pathways have been impacted by these mutations.
I have been given processed data and asked to visualize and interpret it. The VEP files are output from Ensembl VEP. Here is the data I have:
- VEP files for the variants. These have metadata (such as the genome used for annotation) and use custom gene IDs, with columns for location, allele, feature, feature type, and consequence.
- The functional annotation file of the organisms used for annotation, with the custom gene IDs, UniProt IDs, EC numbers, and various GO IDs.
- Lists of high-impact mutations for each of the variants. These have the custom IDs and say either EFFECT or NONE.
My current proposal for analysis is:
- Map the custom IDs from the VEP files to the corresponding data in the functional annotation file.
  When merging, I observed a many-to-many relationship. There appear to be duplicates in the annotation file where only the length value differs, and in the variant file the duplicated gene IDs have different 'Feature' values. For this analysis I aggregated the variant file and kept one representative row for each gene. In the annotation file I kept the longest entry for each gene.
- Filter the merged list based on the high-impact list.
- Perform GO analysis.
- Perform pathway enrichment analysis with ReactomePA.
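
To make the proposal concrete, here is a minimal Python sketch of the steps above on toy data. Every field name (`gene_id`, `length`, `feature`, `go`) and every record is hypothetical and only stands in for my real columns. It keeps the longest annotation entry per gene, collapses the variant rows to one per gene while retaining all features and consequences as sets rather than discarding them, filters on EFFECT, and then runs a plain hypergeometric over-representation test for one GO ID. That test is only an illustration of the general idea, with no multiple-testing correction, and is not meant to reproduce what any particular enrichment package does.

```python
from math import comb

# Hypothetical toy records; real column names and values will differ.
annotation = [
    {"gene_id": "G1", "length": 900, "uniprot": "P11111", "go": ["GO:0006412"]},
    {"gene_id": "G1", "length": 1200, "uniprot": "P11111", "go": ["GO:0006412"]},
    {"gene_id": "G2", "length": 800, "uniprot": "P22222", "go": ["GO:0006096"]},
    {"gene_id": "G3", "length": 500, "uniprot": "P33333", "go": ["GO:0006096"]},
]
vep = [
    {"gene_id": "G1", "feature": "G1.t1", "consequence": "stop_gained"},
    {"gene_id": "G1", "feature": "G1.t2", "consequence": "missense_variant"},
    {"gene_id": "G2", "feature": "G2.t1", "consequence": "synonymous_variant"},
]
high_impact = {"G1": "EFFECT", "G2": "NONE"}


def longest_per_gene(rows):
    """Keep the annotation entry with the largest length for each gene ID."""
    best = {}
    for row in rows:
        current = best.get(row["gene_id"])
        if current is None or row["length"] > current["length"]:
            best[row["gene_id"]] = row
    return best


def collapse_variants(rows):
    """One entry per gene, keeping every feature and consequence seen."""
    out = {}
    for row in rows:
        entry = out.setdefault(
            row["gene_id"], {"features": set(), "consequences": set()}
        )
        entry["features"].add(row["feature"])
        entry["consequences"].add(row["consequence"])
    return out


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


annot = longest_per_gene(annotation)
variants = collapse_variants(vep)

# Join on the custom gene ID; genes without an annotation entry are dropped.
merged = {g: {**annot[g], **v} for g, v in variants.items() if g in annot}

# Keep only genes flagged EFFECT in the high-impact list.
selected = {g: r for g, r in merged.items() if high_impact.get(g) == "EFFECT"}

# Over-representation of one GO ID, using all annotated genes as background.
term = "GO:0006412"
N = len(annot)
K = sum(term in r["go"] for r in annot.values())
n = len(selected)
k = sum(term in r["go"] for r in selected.values())
p_value = hypergeom_pvalue(k, n, K, N)
```

Here the background is every gene in the annotation file, which is itself an assumption; restricting it to genes that carry at least one variant would change `N` and `K` and therefore the p-values.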
I was wondering whether these assumptions and the further analysis I have proposed are suitable. Are there any other tools, methods, or considerations I should take into account for the analysis? Many thanks in advance for your guidance.