5.4 years ago
ARich
Dear Biostar user,
I have a question regarding diamond output. I ran diamond blastp on my contigs against the NR database, then used diamond view to convert the results to m8 format. In this m8 file, the subject seqid looks like "WP_129184883.1", although I was expecting a GenBank ID of the form gi|...|. Can someone explain why I get RefSeq protein IDs, and how I can convert them to GenBank IDs?
Thank you in advance! Best, AR
Did you try searching on the forum (or on Google) for "convert accession to genbank id"?
Yes, I tried the R package "rentrez". It gives me the taxid, but I am looking for a way to change the diamond blastp output to the geneID (gi|..|) instead of the RefSeq protein ID (WP_129184883.1).
Thanks
NCBI is deprecating gi numbers, so you should probably go with the accession, which you already have. If you still wish to get them, you can use eutils (or reutils in R). Searching for your protein gives the gi in the Id field: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=WP_129184883[Accn]
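For anyone who wants to script that lookup, here is a minimal Python sketch of the same esearch request. It uses only the standard library. The function names (build_esearch_url, parse_ids, fetch_ids) are made up for illustration, and it assumes the response is the usual esearch XML with the identifiers under IdList/Id. Since gi numbers are being phased out, the Id returned for a given record may not be a classic gi, so treat the result with care.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Same endpoint as the URL in the post above.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(accession):
    """Build an esearch URL that looks up one protein accession."""
    query = urllib.parse.urlencode(
        {"db": "protein", "term": f"{accession}[Accn]"}
    )
    return f"{ESEARCH}?{query}"


def parse_ids(xml_text):
    """Return the text of every <Id> element inside <IdList>."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("./IdList/Id")]


def fetch_ids(accession):
    """Query NCBI (needs network access) and return the Id values found."""
    with urllib.request.urlopen(build_esearch_url(accession), timeout=30) as resp:
        return parse_ids(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Accession taken from the question; the output depends on NCBI's response.
    print(fetch_ids("WP_129184883"))
```

If you loop this over many accessions from an m8 file, pause between requests, because NCBI limits how fast you can query E-utilities.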
You can also fetch and parse ASN.1 format records, which will have gi information for all entries that have a gi entry.
What do you want to do from this point onwards? As @Ram already indicated, gi identifiers are deprecated for external use (by people like us).