Question

Finding Entrez IDs for RefSeq IDs

1

10 months ago

Pegasus ▴ 120

Hi all,

I have a list of bacterial RefSeq IDs corresponding to protein sequences (e.g., WP_007430823.1, WP_019686959.1, etc.). I need to retrieve the corresponding Entrez IDs for these RefSeq IDs in order to continue the downstream RNA-seq analysis (GO enrichment analysis).

Here are some of these IDs: WP_007430823.1 WP_019686959.1 WP_016819208.1 WP_017428688.1 WP_017425888.1 WP_017426648.1 WP_013371668.1 WP_014280599.1 WP_010349214.1 WP_015737018.1 WP_016819923.1 WP_010347247.1 WP_017427551.1 WP_013373716.1 WP_019687916.1 WP_016822762.1 WP_013373542.1 WP_014278585.1 WP_010347544.1 WP_014278774.1 WP_016822628.1 WP_019638366.1

I followed the post [Convert RefseqID to EntrezID] and could not find any match using any of the following databases and tools:

NCBI Batch Entrez tool

https://david.ncifcrf.gov/conversion.jsp

Uniprot (https://www.uniprot.org/id-mapping/)

https://useast.ensembl.org/Homo_sapiens/Tools/IDMapper

https://biit.cs.ut.ee/gprofiler/gost

Could someone guide me on the most effective way to obtain the Entrez IDs for these RefSeq IDs, considering the issues I'm facing with the NCBI Batch Entrez tool? Are there alternative methods or tools that might be more suitable for this task?

Any help or suggestions would be greatly appreciated. Thank you!

RNA-SEQ • 968 views

1

I think you can try using R. Check out this rentrez tutorial: https://cran.r-project.org/web/packages/rentrez/vignettes/rentrez_tutorial.html. Hope it helps.

ADD REPLY • link 10 months ago by Researcher ▴ 30

score 2 · Answer 1 · 2024-01-08

2

10 months ago

GenoMax 147k

You can use EntrezDirect.

$ esearch -db protein -query WP_007430823 | esummary | xtract -pattern DocumentSummary -element Id
494672882
$ esearch -db protein -query WP_013373542.1 | esummary | xtract -pattern DocumentSummary -element Id
503138881

These are multi-species records so they don't point to a single organism.
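For a longer list, the same lookup can be wrapped in a loop. This is only a minimal sketch, assuming EDirect is on the PATH and the accessions sit one per line in a file; `refseq_to_uid` is an illustrative helper name, not part of EDirect. The `< /dev/null` is a common workaround that keeps esearch from consuming the rest of the input file inside the loop.

```shell
# Print "accession<TAB>uid" for every accession listed one per line in file $1.
# The uid is the protein database Id, as in the one-liners above; "NA" means
# the query returned nothing.
refseq_to_uid() {
    while IFS= read -r acc; do
        # Skip blank lines in the input file.
        [ -z "$acc" ] && continue
        # Redirect stdin so esearch does not swallow the remaining accessions.
        uid=$(esearch -db protein -query "$acc" < /dev/null \
              | esummary \
              | xtract -pattern DocumentSummary -element Id)
        printf '%s\t%s\n' "$acc" "${uid:-NA}"
    done < "$1"
}
```

It could be called as `refseq_to_uid refseq_ids.txt > refseq_to_uid.tsv`, where both file names are hypothetical. For very long lists, a pause between queries or an NCBI API key may be needed to stay within NCBI's request limits.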

0

Thank you GenoMax, it worked perfectly. I couldn't find the reply button, so I used "add comment" instead.

However, I am still facing the same issue. Although I got the Entrez IDs, they were not recognized by any GO enrichment analysis tool.

As you mentioned, these IDs belong to multiple species, so they don't map to a single organism, and this is likely the reason I cannot continue the downstream analysis with them.

Examples of these IDs: 494672882 518516752 515233603 515998105 515995305 515996065 503137007 504046605 498035058 506217243 515235573 498033091 515996968 503139055 518517709 515241838 503138881 504044591

Any suggestion or recommendation would be very helpful.

Thank you

ADD REPLY • link 10 months ago by Pegasus ▴ 120

1

How did you end up with these accessions in the first place? Is this metatranscriptome data?

ADD REPLY • link 10 months ago by GenoMax 147k

0

No, it's a new bacterial isolate > SPAdes > NCBI Prokaryotic Genome Annotation Pipeline. In the assembly FASTA file, I get customized (unidentified) IDs (locus_tag = gene_id = the customized IDs). The corresponding RefSeq IDs were extracted from the GTF file and converted to Entrez IDs following the steps in your post. So it's genomic data from a single bacterium. Nevertheless, I agree with what you mentioned before, that "These are multi-species records", because the NCBI team has confirmed that annotating a non-model bacterial genome may yield RefSeq IDs that match a group of bacterial genomes. Because none of the gene enrichment tools could recognize either the customized locus_tags or the RefSeq / Entrez IDs, the RNA-seq analysis for this bacterial strain could not proceed.
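For readers following along, the GTF extraction step described above might look roughly like the sketch below. It assumes the CDS lines carry `locus_tag "..."` and `protein_id "..."` attributes in column 9, which is how NCBI GTF exports are typically laid out; `gtf_locus_to_protein` is an illustrative name, and the attribute keys should be checked against the actual file.

```shell
# Print "locus_tag<TAB>protein_id" for every CDS line of the GTF file in $1.
# Assumes attributes in column 9 are written as: key "value"; key "value";
gtf_locus_to_protein() {
    awk -F'\t' '$3 == "CDS" {
        lt = ""; pid = ""
        # Split the attribute column and inspect each key/value pair in turn.
        n = split($9, attrs, ";")
        for (i = 1; i <= n; i++) {
            a = attrs[i]
            sub(/^ +/, "", a); sub(/ +$/, "", a)
            if (a ~ /^locus_tag "/) {
                gsub(/^locus_tag "|"$/, "", a); lt = a
            } else if (a ~ /^protein_id "/) {
                gsub(/^protein_id "|"$/, "", a); pid = a
            }
        }
        # Only report lines where both identifiers are present.
        if (lt != "" && pid != "") print lt "\t" pid
    }' "$1" | sort -u
}
```

Running `gtf_locus_to_protein genomic.gtf > locus_to_refseq.tsv` (hypothetical file names) would give a two-column table linking each customized locus_tag to its WP_ accession.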

ADD REPLY • link 10 months ago by Pegasus ▴ 120

1

Surely you know which organism you are working with, so you could use one of the related genomes to "liftover" annotation where possible?

ADD REPLY • link 10 months ago by GenoMax 147k

0

Do you mean annotating the assembly using another (closely related) reference genome? But then how can I match the locus_tags of my genes to the ones generated in the new annotation file? They will have different IDs.

ADD REPLY • link 10 months ago by Pegasus ▴ 120