Dear all,
I am working on a set of genes. I want to get the UniProt ID of each of them so that I can draw their structures from the AlphaFold website. Each of my genes appears to be registered in UniProt twice; for instance, one of my genes is shown here. Even though both entries are identical, only one of them has a predicted AlphaFold structure! For each gene, I want to get the UniProt ID that has an AlphaFold structure. How can I do it? I am working with a long list of genes, so I can't do it manually.
Have you tried approaching this from the other direction? AlphaFold's website hosts organism-specific predicted structures here: https://alphafold.ebi.ac.uk/download. I am guessing the species you are working with is Leishmania infantum, which is also listed there. If I understand this correctly, downloading and unpacking the compressed file gives you a PDB file and an mmCIF file per UniProt ID of the species, i.e., every UniProt ID that has an AlphaFold structure will appear there. If you only want the IDs, you can use a simple "grep" command to extract them from the file names.
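If you would rather do this in a script than with grep, here is a minimal Python sketch of the same idea. It assumes the files in the download are named like AF-<UniProt ID>-F1-model_v4.pdb.gz (please check this against what you actually get), and the function names are just illustrative, not part of any AlphaFold tool.

```python
import re
import tarfile

# Assumed file-name pattern: AF-<UniProt accession>-F<fragment>-model_v<version>
# followed by .pdb or .cif, optionally gzipped. Verify against your download.
AF_NAME = re.compile(r"^AF-([A-Z0-9]+)-F\d+-model_v\d+\.(?:pdb|cif)(?:\.gz)?$")


def uniprot_ids_from_filenames(filenames):
    """Return the sorted, unique UniProt accessions found in AlphaFold-style names."""
    ids = set()
    for name in filenames:
        base = name.rsplit("/", 1)[-1]  # drop any directory prefix
        match = AF_NAME.match(base)
        if match:
            ids.add(match.group(1))
    return sorted(ids)


def uniprot_ids_from_archive(archive_path):
    """List the accessions in a proteome tar archive without unpacking it."""
    with tarfile.open(archive_path) as archive:
        return uniprot_ids_from_filenames(archive.getnames())
```

Calling uniprot_ids_from_archive on the downloaded tar file should give you one ID per protein, even though each protein has both a PDB and an mmCIF file.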
Thanks for your reply. You gave me a great clue. Actually, I have an alternative ID for each gene, and I want to look up the genes using the IDs I have at hand. However, the PDB and CIF files don't contain my IDs; they only show the UniProt ID. What I can do is extract the protein sequences from the .pdb files and build a FASTA file whose headers are the UniProt IDs of the proteins in AlphaFold. Then, by BLASTing the FASTA file containing my gene IDs against that FASTA file, I can find the mapping.
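For the sequence-extraction step, something like the following might work. It is only a sketch: it assumes standard fixed-column PDB ATOM records, takes one residue per CA atom, and writes unknown residue names as X. The function names are hypothetical.

```python
import gzip

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequence_from_pdb_lines(lines):
    """Build a one-letter sequence from the CA atoms of PDB ATOM records."""
    residues = []
    seen = set()
    for line in lines:
        # Columns 13-16 hold the atom name, 18-20 the residue name.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], line[22:27])  # chain ID, residue number + insertion code
            if key in seen:  # skip alternate locations of the same residue
                continue
            seen.add(key)
            residues.append(THREE_TO_ONE.get(line[17:20], "X"))
    return "".join(residues)


def fasta_record(uniprot_id, sequence, width=60):
    """Format one FASTA record with the UniProt ID as the header."""
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + uniprot_id + "\n" + "\n".join(chunks) + "\n"


def sequence_from_pdb_file(path):
    """Read a plain or gzipped PDB file and return its sequence."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sequence_from_pdb_lines(handle)
```

Writing fasta_record(uniprot_id, sequence_from_pdb_file(path)) for every PDB file into one file gives the FASTA that my own sequences can then be BLASTed against.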