2.8 years ago
Maxime
Hello,
Is there a way to find out which transcripts were used to model the AlphaFold structures? My goal is to start from VCF files and end up mapping variants onto AF structures, but if I use the -canon option (canonical transcripts only) in SnpEff, I lose almost half of the information. Since SnpEff accepts a list of transcripts to use, I would like to find the transcripts that correspond to the structures.
Thank you,
Maxime
AlphaFold structures seem to be based on UniProt reference proteomes, so starting from the UniProt IDs may be the way to work out the corresponding Ensembl transcripts.
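As a rough illustration of getting those IDs, AlphaFold DB downloads are named after the UniProt accession (for example AF-P04637-F1-model_v2.pdb), so the accessions can be pulled out of the file names. This is only a sketch: the function name `uniprot_accessions` is made up for the example, and the pattern assumes the standard AF-<accession>-F<fragment>-model_v<version> naming.

```python
import os
import re

# Assumed AlphaFold DB naming: AF-<UniProt accession>-F<fragment>-model_v<version>.<ext>
AF_NAME = re.compile(r"^AF-([A-Z0-9]+)-F\d+-model_v\d+\.")

def uniprot_accessions(paths):
    """Return the sorted, de-duplicated UniProt accessions found in AlphaFold file names."""
    found = set()
    for path in paths:
        match = AF_NAME.match(os.path.basename(path))
        if match:  # silently skip files that do not follow the AlphaFold naming
            found.add(match.group(1))
    return sorted(found)

if __name__ == "__main__":
    files = ["AF-P04637-F1-model_v2.pdb", "AF-Q8W3K0-F1-model_v2.cif.gz", "notes.txt"]
    for accession in uniprot_accessions(files):
        print(accession)
```

Large proteins are split into several fragments (F1, F2, ...), which is why the accessions are de-duplicated.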
Hi Maxime,
If you have a list of AlphaFold structure predictions, you should also have their corresponding UniProt IDs. You can use this list of UniProt IDs as input to BioMart to retrieve the matching Ensembl transcript IDs (there is a tutorial on Ensembl if you are not familiar with BioMart).
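Once the BioMart results are exported as a tab-separated file, turning them into a transcript list could look roughly like the sketch below. The column headers ("Transcript stable ID", "UniProtKB/Swiss-Prot ID") are assumptions about how the export is labelled, so adjust them to match your file; the function names and the IDs in the demo are placeholders, not real identifiers. The output is one transcript ID per line, which I believe is the format SnpEff expects for its transcript list option (-onlyTr), but please check the SnpEff documentation.

```python
import csv
import io

def transcripts_for(accessions, biomart_tsv,
                    id_col="UniProtKB/Swiss-Prot ID",   # assumed export header
                    tx_col="Transcript stable ID"):     # assumed export header
    """Map each wanted UniProt accession to the Ensembl transcripts listed for it."""
    wanted = set(accessions)
    mapping = {}
    for row in csv.DictReader(biomart_tsv, delimiter="\t"):
        accession, transcript = row[id_col], row[tx_col]
        if accession in wanted and transcript:
            mapping.setdefault(accession, []).append(transcript)
    return mapping

def transcript_list(mapping):
    """Render the unique transcript IDs as text, one ID per line."""
    unique = sorted({tx for txs in mapping.values() for tx in txs})
    return "".join(tx + "\n" for tx in unique)

if __name__ == "__main__":
    # Placeholder rows standing in for a real BioMart export.
    export = io.StringIO(
        "Transcript stable ID\tUniProtKB/Swiss-Prot ID\n"
        "TX1\tACC1\n"
        "TX2\tACC1\n"
        "TX3\tACC2\n"
    )
    print(transcript_list(transcripts_for(["ACC1"], export)), end="")
```

Note that one UniProt accession can map to several transcripts, so the list may still need filtering if you want exactly one transcript per structure.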
Please note that in the current release (Ensembl 105 / Ensembl Genomes 52), you can view AlphaFold structure predictions for Arabidopsis thaliana only. In Ensembl 106 / Ensembl Genomes 53, you will be able to view AlphaFold structure predictions for human, mouse, zebrafish, maize and soybean (you can find more information about what’s coming in the new Ensembl release here).
Do let me know if this is what you were looking for. Alternatively, you could visit the AlphaFold database Frequently Asked Questions or contact the AlphaFold team directly if you have questions specifically about their database.
All the best,
Louisse