Question

How to relate prokka-annotated prokaryote identifiers to the actual organism identity (name)?

17 months ago

glendich • 0

Taking over a project from someone, I got a list of identifiers for expression data annotated with prokka, as shown below:

PNJECBGM_02289  gnl|Prokka|PNJECBGM_42
PNJECBGM_02290  gnl|Prokka|PNJECBGM_42
BKKAOALG_00637  gnl|Prokka|BKKAOALG_9

and entries that show the locus tag, CDS and product, for example:

ID=BPFNMOJC_00555_gene;Name=hisZ_2;gene=hisZ_2;locus_tag=BPFNMOJC_00555

Does anyone have any suggestions on how I can find out the names of the prokaryotes that these protein/gene identifiers belong to? I tried UniProt, GenBank, esearch from NCBI and the Pfam database, and got nothing in return. I looked at the gbk files as well, but they are not helpful at all:

VERSION
KEYWORDS    .
SOURCE      Genus species
  ORGANISM  Genus species
            Unclassified.
COMMENT     Annotated using prokka 1.14.6 from https://github.com/tseemann/prokka.
FEATURES             Location/Qualifiers
     source          1..24892
                     /organism="Genus species"
                     /mol_type="genomic DNA"
                     /strain="strain"
     gene            52..885
                     /locus_tag="BPFNMOJC_00001"
     mRNA            52..885

transcriptomics prokka MAG • 1.6k views

I got a list of identifiers for expression data annotated with prokka

How can the expression data be annotated with prokka? You probably mean that the result of the assembly was annotated with prokka and then expression analysis was done using that annotation? What types of annotation files do you have? You have this tagged with MAG, so is this metatranscriptomic data?

ADD REPLY • link 17 months ago by GenoMax 151k

Yes, you are right, this is metatranscriptomics data :) I have gbk files for each MAG, but none of them tells me what the names of the prokaryotes are.

ADD REPLY • link 17 months ago by glendich • 0

score 2 · Accepted Answer · 2023-12-27

17 months ago

Mensur Dlakic ★ 29k

PNJECBGM_02289

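These are arbitrary identifiers that prokka assigns to the ORFs it finds and, later, to genes. Whoever ran the annotation did not supply informative genus/species/strain names (hence the placeholder "Genus species" in the gbk file) or a locus tag prefix, so prokka generated the prefix itself.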

In short, those names do not relate to anything meaningful in the databases.
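The prefix is still useful for bookkeeping. If, as the identifiers in the question suggest, each annotated MAG received its own prefix, then grouping by prefix at least tells you which annotation an identifier came from, even though it says nothing about taxonomy. A minimal Python sketch, assuming the `PREFIX_NNNNN` tag layout seen in the question; the helper name `split_locus_tag` is made up for illustration:

```python
from collections import defaultdict

def split_locus_tag(tag):
    """Split a tag such as 'PNJECBGM_02289' into (prefix, feature number)."""
    # rpartition splits on the LAST underscore, so a prefix that itself
    # contains underscores would still be kept intact.
    prefix, _, number = tag.rpartition("_")
    return prefix, int(number)

# Identifiers copied from the question.
tags = ["PNJECBGM_02289", "PNJECBGM_02290", "BKKAOALG_00637"]

# Collect feature numbers per prefix (assumed here to mean: per annotated MAG).
by_prefix = defaultdict(list)
for tag in tags:
    prefix, number = split_locus_tag(tag)
    by_prefix[prefix].append(number)

for prefix, numbers in by_prefix.items():
    print(prefix, numbers)
```

Matching each prefix against the `/locus_tag` values in the gbk files should then tell you which file, and therefore which MAG, a given expression identifier belongs to.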

Prokka generally runs very fast and will give you the same results. I'd suggest rerunning it and checking the TSV files in particular; those have nice long names (if there's a hit in the database, though you will still get many 'hypothetical protein' entries). Torsten himself suggested Bakta (https://github.com/oschwengers/bakta). As Prokka hasn't been updated in a while, you might get more hits with biologically relevant names from the newer Bakta database.
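For the TSV route, here is a rough Python sketch of turning such a file into a locus tag lookup. It assumes a tab-separated file with a header row containing `locus_tag`, `gene` and `product` columns, which is what I would expect from prokka 1.14.x, but check your own file. The function name `load_products` and the sample rows are illustrative only, not real prokka output:

```python
import csv
import io

def load_products(handle):
    """Map locus_tag -> (gene, product) from a tab-separated file with a header row."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["locus_tag"]: (row.get("gene", ""), row.get("product", ""))
        for row in reader
    }

# Illustrative stand-in for a prokka .tsv file; the rows are made up for the demo.
sample = io.StringIO(
    "locus_tag\tftype\tlength_bp\tgene\tEC_number\tCOG\tproduct\n"
    "BPFNMOJC_00001\tCDS\t834\t\t\t\thypothetical protein\n"
    "BPFNMOJC_00555\tCDS\t1170\thisZ_2\t\t\tATP phosphoribosyltransferase regulatory subunit\n"
)

products = load_products(sample)
print(products["BPFNMOJC_00555"])
```

With a real file you would pass `open("your_mag.tsv", newline="")` instead of the in-memory sample (the file name is a placeholder). Note that this gives gene and product names only; it does not name the organism.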

ADD REPLY • link 17 months ago by Philipp Bayer 8.8k

I think the problem is that the OP has some existing results and can't figure them out, rather than having a file that needs to be annotated.

Just occurred to me: taking several predicted proteins and blasting them against the NR database might reveal which organism the file came from.
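One way to prepare those queries is sketched below in Python: read the predicted proteins from a FASTA file and keep a few of the longest ones. Picking the longest is my own choice here, on the assumption that longer proteins tend to give less ambiguous hits; a random pick would also do. The names `read_fasta` and `pick_longest` are made up, and the demo sequences are placeholders, not real proteins:

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def pick_longest(records, n):
    """Return the n records with the longest sequences, longest first."""
    return sorted(records, key=lambda rec: len(rec[1]), reverse=True)[:n]

# Placeholder input; with real data, open the protein FASTA for one MAG instead.
demo = io.StringIO(
    ">BPFNMOJC_00001 hypothetical protein\nMKT\nAYIAK\n"
    ">BPFNMOJC_00002 hypothetical protein\nMSL\n"
    ">BPFNMOJC_00003 hypothetical protein\nMADEEKLPPGW\n"
)

records = list(read_fasta(demo))
queries = pick_longest(records, 2)
for header, seq in queries:
    print(f">{header}\n{seq}")
```

Prokka writes the predicted protein sequences to a `.faa` file, so that is the natural input. The printed records can be pasted into a blastp search against nr; if most of the top hits for one MAG point to the same genus, that is a reasonable first guess at its identity, not a proper taxonomic classification.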

ADD REPLY • link 17 months ago by Mensur Dlakic ★ 29k