Hi, I have a list of genes with RefSeq accession IDs that I want to convert to Entrez Gene IDs, which can then be used for Gene Ontology enrichment and pathway analysis in tools like DAVID and g:Profiler. These IDs belong to a bacterial species that is not supported by Ensembl or g:Profiler.
I followed the post:
Bioinformatics: Converting Protein Refseq ID to Entrez Gene Accession
but I am still not able to convert these IDs, because they come from a different organism/species. These RefSeq IDs were extracted from the reference.genome.gtf file (downloaded from NCBI).
Examples of these RefSeq protein accessions are below:
WP_007431075.1 WP_010344636.1 WP_017427837.1 WP_014278738.1 WP_010344656.1 WP_019688556.1 WP_016819793.1 WP_007724645.1 WP_016821111.1 NA WP_010347944.1 WP_016819622.1 NA
Could you please suggest any website, tool, or R package?
Thank you.
WP_* accession numbers refer to multiple genomes. See: https://www.ncbi.nlm.nih.gov/refseq/about/nonredundantproteins/
The best you could do is to get the IPG IDs.
Thank you GenoMax. Since the efetch function is not supported on the HPC I am working on, I replaced it with the command below:
It worked well, as shown below:
Which number represents the IPG ID?
Can we modify it to run automatically over a list of 4000 IDs in a CSV file and produce a list of their corresponding IPG IDs as output.csv?
Should I then convert the IPG IDs into Entrez IDs so that I can proceed to gene ontology/pathway analysis? If yes, what tool do you recommend?
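For the batch question, here is a minimal Python sketch rather than a tested pipeline. It assumes the machine can reach NCBI over HTTPS, that E-utilities `efetch` with `db=protein` and `rettype=ipg` returns a tab-separated table, and that in that table the `Id` column is the IPG ID and the `Protein` column holds the accession. The function names, the batch size of 100, and the assumption that the accessions sit in the first column of the input CSV are illustrative and not taken from this thread.

```python
import csv
import io
import time
import urllib.parse
import urllib.request

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def read_accessions(path):
    """Read accessions from the first CSV column, skipping blanks and NA."""
    with open(path, newline="") as handle:
        ids = [row[0].strip() for row in csv.reader(handle) if row]
    return [acc for acc in ids if acc and acc.upper() != "NA"]


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_ipg_table(accessions):
    """POST one batch to efetch and return the raw text (assumed IPG table)."""
    params = urllib.parse.urlencode({
        "db": "protein",
        "id": ",".join(accessions),
        "rettype": "ipg",
        "retmode": "text",
    }).encode("utf-8")
    with urllib.request.urlopen(EFETCH_URL, data=params) as response:
        return response.read().decode("utf-8")


def parse_ipg_table(text, wanted):
    """Map each wanted accession to the value in the 'Id' column of its row."""
    wanted = set(wanted)
    mapping = {}
    id_col, protein_col = 0, 6  # assumed defaults, overridden by any header row
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue
        if "Id" in row and "Protein" in row:  # header line (may repeat)
            id_col, protein_col = row.index("Id"), row.index("Protein")
            continue
        if len(row) <= max(id_col, protein_col):
            continue
        if row[protein_col] in wanted:
            mapping.setdefault(row[protein_col], row[id_col])
    return mapping


def run(input_csv, output_csv, batch_size=100):
    """Look up all accessions in batches and write accession,ipg_id rows."""
    accessions = read_accessions(input_csv)
    mapping = {}
    for batch in chunks(accessions, batch_size):
        mapping.update(parse_ipg_table(fetch_ipg_table(batch), batch))
        time.sleep(0.4)  # stay under NCBI's limit of 3 requests/second without an API key
    with open(output_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["refseq_accession", "ipg_id"])
        for acc in accessions:
            writer.writerow([acc, mapping.get(acc, "")])
```

Calling `run("input.csv", "output.csv")` (both file names are placeholders) would write one row per accession, leaving `ipg_id` empty where no match came back. If the input CSV has a header row, its first cell is treated as an accession and simply gets an empty `ipg_id`.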