Dear all,
I’m relatively new to harvesting data from NCBI databases, and I have been struggling for some time with the following task. I am trying to retrieve gene names for a list of protein accession IDs (stored in a text file). For example, I want to retrieve the gene name of “AAR23114.1”. On the NCBI page for this ID (https://www.ncbi.nlm.nih.gov/protein/AAR23114.1), the gene name appears on the second line under “CDS”: /gene="cyp6a2".
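The `/gene` qualifier mentioned above can be pulled out of a GenPept (`rettype=gp`) flat-text record by scanning its feature table. Here is a minimal Python sketch. It assumes the standard GenBank feature-table layout, where feature keys start in column 6 and qualifiers are indented further. `cds_gene_name` is a hypothetical helper, not part of any NCBI tool.

```python
import re

# An indented qualifier line of the form   /gene="..."
GENE_QUALIFIER = re.compile(r'\s+/gene="([^"]*)"')


def cds_gene_name(record_text):
    """Return the /gene value inside the first CDS feature of a GenPept record, or None."""
    in_cds = False
    for line in record_text.splitlines():
        if line[:1].strip():
            # Column 1 is non-blank: a new section such as FEATURES or ORIGIN.
            in_cds = False
        elif line[:5] == "     " and line[5:6].strip():
            # Feature keys (source, Protein, CDS, ...) start in column 6.
            in_cds = line[5:21].strip() == "CDS"
        elif in_cds:
            match = GENE_QUALIFIER.match(line)
            if match:
                return match.group(1)
    return None
```

Restricting the search to the CDS feature matches where the gene name is shown on the record page. Records without a `/gene` qualifier in their CDS return `None`.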
I have a list of more than 1000 accession IDs and want to retrieve the corresponding gene names for all of them. Of course, I have tried to find the answer myself:
- BioMart does not work for ‘regular’ NCBI gene sequences.
- I have tried to download gene information in bulk using the Batch Entrez facilities, but unfortunately the gene name is not included for every record in the downloadable files (e.g. summary or feature table), even though it is shown on the individual record pages. Furthermore, the layout of the information is not standardized across records.
I am trying to get this done with efetch, but without any success so far. Is there a way to retrieve these gene names based on (protein) accession IDs?
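Here is a minimal sketch of batching the efetch calls. It uses only the Python standard library and the public E-utilities `efetch.fcgi` endpoint with `db=protein`, `rettype=gp` and `retmode=text`. The helper names are hypothetical. Two settings are assumptions: the batch size of 200 keeps each request small, and the 0.4 s pause keeps the script under NCBI's limit of three requests per second without an API key.

```python
import time
import urllib.parse
import urllib.request

# Public E-utilities efetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def read_accessions(lines):
    """Return accessions in input order, skipping blank lines and duplicates."""
    seen = {}
    for line in lines:
        acc = line.strip()
        if acc:
            seen.setdefault(acc, None)  # dicts keep insertion order
    return list(seen)


def batches(items, size=200):
    """Yield consecutive slices of at most `size` items (200 is an assumed batch size)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def efetch_params(accessions, email):
    """Query parameters asking efetch for GenPept flat text for a batch of protein IDs."""
    return {
        "db": "protein",
        "id": ",".join(accessions),
        "rettype": "gp",
        "retmode": "text",
        "email": email,
    }


def fetch_genpept(accessions, email):
    """POST one batch to efetch and return the GenPept text it sends back."""
    body = urllib.parse.urlencode(efetch_params(accessions, email)).encode()
    with urllib.request.urlopen(EFETCH, data=body) as resp:  # data= makes this a POST
        return resp.read().decode()


def fetch_all(lines, email, size=200):
    """Fetch every accession in batches, pausing between requests."""
    chunks = []
    for batch in batches(read_accessions(lines), size):
        chunks.append(fetch_genpept(batch, email))
        time.sleep(0.4)  # stays under 3 requests/second without an API key
    return "".join(chunks)
```

For example: `with open("accessions.txt") as fh: genpept = fetch_all(fh, "you@example.org")`. Both the filename and the address are placeholders. Sending the IDs in a POST body rather than in the URL avoids overly long query strings for large batches.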
Thanks in advance!
Example?
Yes: "For example, I want to retrieve the gene name of “AAR23114.1”. On the NCBI page for this ID (https://www.ncbi.nlm.nih.gov/protein/AAR23114.1), the gene name appears on the second line under “CDS”: /gene="cyp6a2"."
Yes, this is your first example. I was looking for one where the gene name is only available in the download: "(e.g. summary or feature table -> although it is available at the individual pages!)".
I have not found a case where it is only available in the download; the problem is that it is often missing from the download. The information is available on the individual record page (see the previous example) but not in the downloaded summary (Send to > File > Summary / gene feature, or any other format).
Ideally, I would like to download a list with every protein accession linked to its gene name. Could this be done through efetch, for example?
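Once the GenPept text for a batch has been downloaded, a two-column accession/gene table can be built by splitting the stream on the `//` lines that end each record. This sketch makes two assumptions. First, the accession is taken as the first token of the `VERSION` line. Second, as a simplification, it uses the first `/gene` qualifier in each record, which is normally the same value as in the CDS feature. Records without a gene name get an empty gene column. The function names are hypothetical.

```python
import csv
import io
import re


def split_records(genpept_text):
    """Split concatenated GenPept text on the '//' terminator lines."""
    records, current = [], []
    for line in genpept_text.splitlines():
        if line.strip() == "//":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return records


def accession_and_gene(record):
    """Return (accession.version, gene) for one record; gene is '' if absent."""
    version = re.search(r"^VERSION\s+(\S+)", record, re.MULTILINE)
    # Simplification: first /gene qualifier anywhere in the record.
    gene = re.search(r'^\s+/gene="([^"]*)"', record, re.MULTILINE)
    return (version.group(1) if version else None,
            gene.group(1) if gene else "")


def to_tsv(genpept_text):
    """Build a tab-separated accession/gene table from multi-record GenPept text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["accession", "gene"])
    for record in split_records(genpept_text):
        accession, gene = accession_and_gene(record)
        if accession:
            writer.writerow([accession, gene])
    return out.getvalue()
```

Keeping records with an empty gene column, instead of dropping them, makes it easy to see which accessions still need a manual check.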
Pierre: you're a (bio)star! Thanks a lot!