I am looking for a way to do the following:
1) Reliably find a protein structure (e.g. a PDB file or pre-computed AlphaFold results) that is associated with a particular gene/transcript isoform. I found a way to do this, somewhat, for human genes using BioMart, but I'd like to be able to do this for 'any species' (reason: I make tools, and I want to allow people to use my tool on any species of interest).
2) Map genome coordinates onto that protein structure (the 3D position is relevant, but I guess just knowing the index into the 1D amino acid chain gets you most of the way there?). I feel like this is something variant annotation tools do, but is there a small, purpose-built tool that does this without full-fledged 'variant annotation'? My current approach just reads the GFF, walks through the CDS features three bases at a time, and increments the amino acid count for each codon, but I have a feeling this is not the most reliable way of doing things.
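For reference, the "GFF math" described in 2) can be sketched roughly as below. This is only a minimal illustration under stated assumptions, not a replacement for a real annotation tool: the function name `genome_to_aa_index` is made up for this example, coordinates are 1-based inclusive as in GFF, the CDS is assumed to be complete (first segment starts in phase 0), and things like ribosomal slippage or annotated exceptions are ignored.

```python
from typing import List, Optional, Tuple


def genome_to_aa_index(
    pos: int, cds: List[Tuple[int, int]], strand: str
) -> Optional[Tuple[int, int]]:
    """Map a genomic position to (amino acid index, position within codon).

    Hypothetical helper for illustration only.
    pos    : 1-based genomic coordinate
    cds    : (start, end) pairs, 1-based inclusive, as in GFF CDS rows
    strand : "+" or "-"
    Returns a 1-based amino acid index and a 0-based codon position,
    or None if pos does not fall inside any CDS segment.
    Assumes a complete CDS whose first segment starts in phase 0.
    """
    # Walk the segments in transcription order: ascending on "+",
    # descending on "-".
    ordered = sorted(cds, reverse=(strand == "-"))
    offset = 0  # number of coding bases before the current segment
    for start, end in ordered:
        if start <= pos <= end:
            # Distance from the 5' end of this segment.
            within = pos - start if strand == "+" else end - pos
            cds_offset = offset + within
            return cds_offset // 3 + 1, cds_offset % 3
        offset += end - start + 1
    return None


# Example with two made-up CDS segments (21 coding bases in total).
segments = [(101, 110), (201, 211)]
print(genome_to_aa_index(201, segments, "+"))  # codon split across the intron
print(genome_to_aa_index(150, segments, "+"))  # intronic position
```

The main fragility is in the assumptions: a partial CDS (non-zero phase on the first segment) or a transcript with several overlapping CDS annotations would need extra handling.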
Footnote: the gene-to-PDB-structure BioMart query I found is below. It is useful for now, but I would be interested in finding a similar thing for other species: http://useast.ensembl.org/biomart/martview/643c564ac8b632a4791ea866fb79f8e5?VIRTUALSCHEMANAME=default&ATTRIBUTES=hsapiens_gene_ensembl.default.feature_page.ensembl_gene_id|hsapiens_gene_ensembl.default.feature_page.ensembl_gene_id_version|hsapiens_gene_ensembl.default.feature_page.ensembl_transcript_id|hsapiens_gene_ensembl.default.feature_page.ensembl_transcript_id_version|hsapiens_gene_ensembl.default.feature_page.pdb&FILTERS=&VISIBLEPANEL=attributepanel
Regarding
For any species of interest, maybe https://www.uniprot.org/uniprotkb?query=(database:AlphaFoldDB) is what you are looking for
I would love to take advantage of the results in AlphaFold DB. However, is there a way to connect the data shown there to genomic coordinates?
When possible, entries in UniProt are cross-referenced with the NCBI GenBank database (see this: link). So there must be a way to recover the genomic coordinates.
It is interesting to see that cross-reference to GenBank. I can indeed see from this that there is a "genomic DNA" cross-reference that goes here: https://www.ncbi.nlm.nih.gov/protein/ONL99085.1 which in turn has another reference, to CM007647.1, which holds the Zea mays chromosome coordinates. I will have to check how many UniProt IDs have this, but I like that the coordinates and "joins" in the GenBank file format explicitly map between genomic and protein sequences, instead of relying on potentially sketchy GFF math based on assumptions. It will be a hop, skip, and a jump to get the full pipeline together from all this info, but it is a good lead. Thanks!
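To make the "joins" concrete: a CDS feature in a GenBank record carries a location such as `join(101..110,201..211)`, optionally wrapped in `complement(...)` for the reverse strand. The sketch below pulls the segments out of such a string. It is a simplified, hypothetical helper (`parse_genbank_location` is not a library function), the example location is made up rather than taken from ONL99085.1, and it deliberately ignores single-base positions, per-segment `complement(...)` inside a join, and remote references to other accessions. For real records, a full parser such as Biopython's `SeqIO` with the "genbank" format is probably the safer route.

```python
import re
from typing import List, Tuple


def parse_genbank_location(location: str) -> Tuple[str, List[Tuple[int, int]]]:
    """Extract strand and (start, end) segments from a GenBank location.

    Hypothetical helper for illustration only. Handles plain ranges,
    join(...) and a single outer complement(...). Partial-end markers
    ("<" and ">") are accepted and dropped.
    """
    # An outer complement(...) means the whole feature is on the minus strand.
    strand = "-" if location.strip().startswith("complement(") else "+"
    # Each segment looks like "start..end", possibly with "<" or ">" markers.
    pairs = re.findall(r"<?(\d+)\.\.>?(\d+)", location)
    segments = [(int(start), int(end)) for start, end in pairs]
    return strand, segments


# Made-up example location, not taken from a real record.
strand, segments = parse_genbank_location("complement(join(101..110,201..211))")
print(strand, segments)
```

The resulting 1-based inclusive segments are in the same coordinate convention as GFF CDS rows, so they can feed the same kind of codon-counting logic without guessing at exon structure.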