Diamond blastp output
5.4 years ago
ARich ▴ 130

Dear Biostar user,

I have a question about DIAMOND output. I ran diamond blastp on my contigs against the NR database and then used diamond view to convert the result to m8 format. In this m8 file, the subject seqid looks like "WP_129184883.1", although I was expecting a GenBank GI identifier (gi|...|). Can someone explain why I get RefSeq protein IDs and how I can convert them to GenBank IDs?

Thank you in advance! Best, AR
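Whatever source the accession-to-GI table comes from (for example an E-utilities lookup as discussed in the replies), swapping the identifiers in the m8 file is a table lookup on the subject column. A minimal base-R sketch, assuming headerless tabular output in the default BLAST-style column order (qseqid, sseqid, pident, ...), so that sseqid is column 2; relabel_m8_subjects and the file names are hypothetical:

```r
# Hypothetical helper (not part of DIAMOND): swap the subject ID column of
# tabular (m8) output using a lookup table.
relabel_m8_subjects <- function(m8, lookup) {
  # m8:     data.frame of the tabular output; column 2 holds sseqid
  # lookup: named character vector, names = accession, values = new ID
  ids <- as.character(m8[[2]])
  new_ids <- unname(lookup[ids])
  # Keep the original accession wherever no mapping is known
  unmapped <- is.na(new_ids)
  new_ids[unmapped] <- ids[unmapped]
  m8[[2]] <- new_ids
  m8
}

# Example use (file names are placeholders):
# m8 <- read.delim("diamond_matches.m8", header = FALSE, quote = "",
#                  stringsAsFactors = FALSE)
# write.table(relabel_m8_subjects(m8, lookup), "diamond_matches_gi.m8",
#             sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
```

Unmapped accessions are left in place rather than dropped, so no hits disappear from the table if a lookup fails.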

sequence assembly

Did you try searching on the forum (or on Google) for "convert accession to genbank id"?


Yes, I tried the R package "rentrez":

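library(rentrez)
# Look up the protein record by accession (version suffix dropped)
search <- entrez_search(db="protein", term="WP_129184883[Accn]")
# Find linked nucleotide records; wrapping in () also prints the result
(links <- entrez_link(dbfrom="protein", db="nuccore", id=search$ids))
links$links$protein_nuccore_wp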

This gives me the taxid, but I am looking for a way to change the diamond blastp output so that it reports GI numbers (gi|...|) instead of RefSeq protein IDs (WP_129184883.1).

Thanks
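If, as the reply below points out, the Id returned by an [Accn] search of db="protein" is the GI, then search$ids in the rentrez call above already holds it, and the same query can be repeated over every subject accession. A minimal sketch, assuming one query per accession; accessions_to_uids and search_protein are hypothetical names, and the search function is passed in so the network call can be swapped out:

```r
# Hypothetical helper: map protein accessions (e.g. DIAMOND sseqids) to the
# Entrez UIDs returned by an [Accn] search of db = "protein".
# search_fun takes a query term and returns a character vector of UIDs.
accessions_to_uids <- function(accessions, search_fun) {
  vapply(accessions, function(acc) {
    # Drop the version suffix (".1"), matching the [Accn] query above
    term <- paste0(sub("\\.[0-9]+$", "", acc), "[Accn]")
    ids <- search_fun(term)
    if (length(ids) == 0) NA_character_ else as.character(ids[[1]])
  }, character(1))  # result is named by accession, usable as a lookup table
}

# Example use with rentrez (requires network access; NCBI asks clients to
# stay at about 3 requests per second without an API key):
# library(rentrez)
# search_protein <- function(term) {
#   Sys.sleep(0.4)
#   entrez_search(db = "protein", term = term)$ids
# }
# lookup <- accessions_to_uids(c("WP_129184883.1"), search_protein)
```

Accessions that return no hit map to NA, so they can be spotted and handled separately.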


NCBI is deprecating gi numbers, so you should probably go with the accession, which you already have. If you still wish to get them, you can use eutils (or reutils in R). Searching for your protein gives the gi in the Id field:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=WP_129184883[Accn]

You can also fetch and parse records in ASN.1 format, which include the gi for every entry that has one.
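The esearch request above can also be scripted without extra packages. A minimal base-R sketch, assuming the default XML response of esearch, in which each UID appears in an <Id> element inside <IdList>; esearch_url and extract_esearch_ids are hypothetical names, and the regex shortcut is only reasonable because that element is flat:

```r
# Build the esearch URL from the reply for one protein accession;
# reserved = TRUE percent-encodes the square brackets of [Accn]
esearch_url <- function(accession) {
  term <- utils::URLencode(paste0(accession, "[Accn]"), reserved = TRUE)
  paste0("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
         "?db=protein&term=", term)
}

# Pull the value of every <Id> element out of an esearch XML response.
# For anything more involved than this flat element, use a real XML parser.
extract_esearch_ids <- function(xml_lines) {
  xml <- paste(xml_lines, collapse = "\n")
  hits <- regmatches(xml, gregexpr("<Id>[0-9]+</Id>", xml))[[1]]
  gsub("</?Id>", "", hits)
}

# Example use (requires network access):
# ids <- extract_esearch_ids(readLines(esearch_url("WP_129184883")))
```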


What do you want to do from this point onwards? As @Ram already indicated, gi identifiers are deprecated for external use (by people like us).
