I'm trying to retrieve identical-protein information for a list of protein accession numbers. My goal is to take protein IDs, get the list of identical proteins for each one, and use the nucleotide coordinates from that list to download the upstream and downstream regions of a given gene.
I was hoping to use the Biopython Entrez module.
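To make the last step of that goal concrete, here is a minimal sketch of the coordinate handling, not a definitive implementation. It assumes each identical-protein record can be reduced to a nucleotide accession, 1-based inclusive start/stop coordinates, and a "+"/"-" strand. The `Hit` layout, the helper names, the 500 bp default flank, and the placeholder e-mail address are all hypothetical. The download uses the `seq_start`, `seq_stop`, and `strand` parameters of `Entrez.efetch`.

```python
from typing import NamedTuple, Tuple

Window = Tuple[int, int]  # 1-based, inclusive (start, stop)


class Hit(NamedTuple):
    """One identical-protein record (hypothetical layout, for illustration)."""
    nuc_acc: str  # nucleotide accession carrying the gene
    start: int    # 1-based, inclusive
    stop: int     # 1-based, inclusive
    strand: str   # "+" or "-"


def flank_windows(hit: Hit, flank: int = 500) -> Tuple[Window, Window]:
    """Return (upstream, downstream) windows around the gene.

    Upstream and downstream are relative to the gene's own strand, so the
    two windows swap on the minus strand. The left window is clipped at
    position 1; the right window is not clipped because the sequence
    length is not known here.
    """
    lo, hi = sorted((hit.start, hit.stop))
    left = (max(1, lo - flank), lo - 1)
    right = (hi + 1, hi + flank)
    return (left, right) if hit.strand == "+" else (right, left)


def fetch_region(hit: Hit, window: Window) -> str:
    """Download one window as FASTA text (needs network access and Biopython)."""
    if window[0] > window[1]:
        raise ValueError("empty window: the gene sits at the start of the sequence")
    from Bio import Entrez  # imported lazily so the helpers above work offline

    Entrez.email = "you@example.org"  # placeholder; NCBI asks for a contact address
    handle = Entrez.efetch(
        db="nuccore",
        id=hit.nuc_acc,
        rettype="fasta",
        retmode="text",
        seq_start=window[0],
        seq_stop=window[1],
        strand=1 if hit.strand == "+" else 2,  # 2 requests the minus strand
    )
    try:
        return handle.read()
    finally:
        handle.close()
```

For example, `flank_windows(Hit("ACC0001", 1000, 1300, "-"))` (a made-up accession) puts the upstream window to the right of the gene, because the gene is read from the minus strand.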
I've used the code below:
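from Bio import Entrez

Entrez.email = "you@example.org"  # placeholder; NCBI asks for a contact address

# Post the protein IDs to the history server, then link from that history entry
epost_1 = Entrez.read(Entrez.epost("protein", id=",".join(Prot_ID_list)))
webenv = epost_1["WebEnv"]
query_key = epost_1["QueryKey"]
prot_link = Entrez.read(Entrez.elink(dbfrom="protein", db="protein", LinkName="protein_protein_identical", webenv=webenv, query_key=query_key))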
However, this doesn't work. It just returns the list of protein IDs that I originally posted.
I've also tried using eutils on Ubuntu:
epost -db protein -id Prot_ID_List | elink -related -name protein_protein_identical | efetch -format text
but it just gives the error "QueryKey value not found in fetch input". I tried variations on this, to no avail. I am probably doing something very wrong with eutils, but I am not sure what.
Any help is greatly appreciated.
I am also aware of the question "How to Elink identical proteins from protein id?". However, I need to make thousands of requests, and that approach seems neither succinct nor fast enough at that scale.
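To illustrate the scale concern, the sketch below shows one way thousands of IDs could be grouped into batches with a pause between requests. It is only a sketch under stated assumptions: `batches`, `run_batched`, the batch size of 200, and the `request` callable are hypothetical names and values, and `request` stands in for whatever E-utilities call ends up working. The 0.34 s default pause reflects NCBI's limit of three requests per second without an API key.

```python
import time
from typing import Callable, Iterator, List, Sequence


def batches(ids: Sequence[str], size: int = 200) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` IDs."""
    for i in range(0, len(ids), size):
        yield list(ids[i:i + size])


def run_batched(
    ids: Sequence[str],
    request: Callable[[List[str]], str],
    size: int = 200,
    pause: float = 0.34,
) -> List[str]:
    """Call `request` once per chunk, sleeping `pause` seconds after each call.

    `request` is a hypothetical callable that takes a list of IDs and
    returns the raw response text for that chunk.
    """
    results = []
    for chunk in batches(ids, size):
        results.append(request(chunk))
        time.sleep(pause)
    return results
```

With a batch size of 200, 10,000 protein IDs would need 50 requests instead of 10,000 single-ID calls.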