How to annotate bacterial genomes when my proteins of interest are annotated with different names in the reference
2.3 years ago
Neel

Hi, I have around 300 genomes. The 50 proteins I am interested in are annotated under different names in the reference genome and in the .faa files of the other genomes, which are Prokka output. I want to annotate my 50 proteins of interest with their real names. How can I do that? Please help me if anyone knows anything about this.

Thank you!


This may be as "simple" as a gene name conversion problem, but several details are unclear.

For your 50 genes of interest, what information do you have? Gene names, gene accessions, sequences, coordinates? This detail is important for understanding what you are working with.

For your 300 genomes, do you only have Prokka annotations that you generated yourself, or do you also have publicly available annotations and sequence/GenBank genome assemblies?

Is it correct to say that you are looking for homologs of your 50 genes of interest in these 300 genomes? If so, UniRef may also be a good option: https://www.uniprot.org/help/uniref


Thank you for your time. Yes, of course I know the gene names, gene accessions, and sequences. For the 300 genomes, I have only the Prokka annotations and the GenBank genome assembly files; I don't have publicly available annotation files.

I ran blastp against the proteomes of all 300 genomes, using as queries these 50 proteins, which I took from TCDB (Transporter Classification Database). I found a few sequences with 100% identity whose Prokka annotation uses a different name, not the name given in UniProt/TCDB. For example, CzrA is a transporter named Cadmium, cobalt and zinc/H(+)-K(+) antiporter, but Prokka annotated it as Multidrug Transporter A (MdtA).
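As a minimal sketch of the parsing side of such a search, assuming the blastp runs used BLAST+ `-outfmt 6` (the default 12 columns) and that `sseqid` is the Prokka locus tag, the hits could be reduced to the best-scoring query per Prokka protein. The identity and e-value cutoffs and the example IDs are placeholders chosen for illustration, not recommended values.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6)
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(handle, min_pident=90.0, max_evalue=1e-10):
    """Return {sseqid: (qseqid, pident, bitscore)}, keeping for each subject
    (Prokka protein) the highest-bitscore query that passes both cutoffs.
    The cutoffs are placeholders, not recommended values."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):  # skip blanks and -outfmt 7 comments
            continue
        rec = dict(zip(OUTFMT6_COLUMNS, row))
        pident = float(rec["pident"])
        bitscore = float(rec["bitscore"])
        if pident < min_pident or float(rec["evalue"]) > max_evalue:
            continue
        subject = rec["sseqid"]
        if subject not in best or bitscore > best[subject][2]:
            best[subject] = (rec["qseqid"], pident, bitscore)
    return best


if __name__ == "__main__":
    # Hypothetical rows; in practice pass open("hits.tsv") instead
    demo = io.StringIO(
        "CzrA_query\tG1_01234\t100.000\t300\t0\t0\t1\t300\t1\t300\t1e-180\t600\n"
        "MdtA_query\tG1_01234\t45.000\t280\t150\t3\t5\t290\t3\t285\t1e-40\t150\n"
    )
    for subject, (query, pident, _) in best_hits(demo).items():
        print(subject, query, pident)
```

Keeping only the best query per subject avoids one Prokka protein collecting several database names; how strict the cutoffs should be is a judgment call for your data.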


i want to annotate my 50 protein of interest with their real name

What is a "real" name? If I may be pedantic for a moment, there is no such thing as a "real" or absolutely true gene name. What is wrong with the names these proteins currently have?

what i have seen that there are few sequence which show 100 % identity but their name is annotated with different name by prokka

This is not surprising; many genes have several different gene symbols, many of which are synonyms.

I've done a fair amount of bacterial genome annotation. Please also keep in mind that most gene name and gene function assignments are best guesses made by software and, occasionally, by people. You can't know whether your gene of interest is more likely CzrA or MdtA without doing actual experiments.


Actually, I have seen many genes annotated with the same name, such as MdtA, although their names in CARD/TCDB/UniProt are different, and, as I mentioned earlier, their percent identity is 100%. I want to add their other name alongside the annotated name so that I can analyze them, but the problem is that there are around 300 genome files, so I can't add the names manually.
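Since the goal is to keep the Prokka name and add the other name next to it, one option is to rewrite the protein FASTA headers in bulk. This is a minimal sketch assuming Prokka-style `.faa` headers (`>LOCUS_TAG product`) and a hypothetical dictionary from locus tag to the alternative name; the `[alt_name=...]` suffix and the directory names are arbitrary conventions for illustration, not a Prokka or NCBI format.

```python
from pathlib import Path


def add_alt_names_to_faa(lines, alt_names, tag="alt_name"):
    """Yield FASTA lines unchanged, except that headers whose ID (first word
    after '>') is a key of alt_names get ' [tag=name]' appended."""
    for line in lines:
        if not line.startswith(">"):
            yield line  # sequence lines pass through untouched
            continue
        header = line.rstrip("\n")
        parts = header[1:].split(maxsplit=1)
        seq_id = parts[0] if parts else ""
        if seq_id in alt_names:
            header = f"{header} [{tag}={alt_names[seq_id]}]"
        yield header + "\n"


def rewrite_faa_dir(in_dir, out_dir, alt_names):
    """Apply add_alt_names_to_faa to every *.faa file in in_dir (hypothetical
    layout: one Prokka .faa per genome) and write copies to out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for faa in sorted(Path(in_dir).glob("*.faa")):
        with open(faa) as src, open(out_dir / faa.name, "w") as dst:
            dst.writelines(add_alt_names_to_faa(src, alt_names))
```

For example, `rewrite_faa_dir("prokka_faa", "prokka_faa_renamed", {"G1_00001": "CzrA"})` would write renamed copies and leave the original files untouched (directory names and locus tag are hypothetical).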

What should I do? I really need your help to resolve this problem. If anyone knows anything, please help me.

Thank you!


I'm still not clear on just what you want to achieve, but it sounds a lot like you want to search the 300 genomes for proteins that are homologous to your 50 proteins of interest, so perhaps something in the following will help:

  1. perform blastp against a database built from the proteins of the 300 genomes, then parse the results for your homologous proteins
    • see rBlast, BioPerl, and Biopython for options to parse BLAST results
  2. from the list of homologous protein hits, build a conversion table/dictionary that connects the Prokka name to the TCDB name
  3. use your conversion table to programmatically modify your 300 genome annotations
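Steps 2 and 3 could look roughly like the following sketch, which works on Prokka's GFF3 output. It assumes the BLAST hits have already been reduced to a mapping from Prokka locus tag to query ID, that you keep a separate mapping from query ID to the TCDB (or UniProt/CARD) name, and that locus tags are unique across the 300 genomes. It records the database name in the GFF3 reserved `Alias` attribute so that the Prokka `product` stays as it is; all function, file, and directory names are hypothetical.

```python
from pathlib import Path

# Characters that must be percent-encoded inside GFF3 attribute values
_GFF3_ESCAPES = {"%": "%25", ";": "%3B", "=": "%3D", "&": "%26", ",": "%2C",
                 "\t": "%09", "\n": "%0A", "\r": "%0D"}


def gff3_escape(value):
    return "".join(_GFF3_ESCAPES.get(ch, ch) for ch in value)


def build_conversion_table(hits, query_names):
    """Step 2: {locus_tag: query_id} + {query_id: db_name} -> {locus_tag: db_name}."""
    return {locus: query_names[q] for locus, q in hits.items() if q in query_names}


def add_alias_to_gff(lines, table):
    """Step 3: yield GFF3 lines, adding Alias=<db name> to CDS features whose
    locus_tag (or ID) is in table. Everything from ##FASTA on is passed through."""
    in_fasta = False
    for line in lines:
        if line.startswith("##FASTA"):
            in_fasta = True
        cols = line.rstrip("\n").split("\t")
        if in_fasta or line.startswith("#") or len(cols) != 9 or cols[2] != "CDS":
            yield line
            continue
        parts = [p for p in cols[8].split(";") if p]
        attrs = dict(p.split("=", 1) for p in parts if "=" in p)
        name = table.get(attrs.get("locus_tag", attrs.get("ID", "")))
        if name is None:
            yield line
            continue
        alias = gff3_escape(name)
        for i, part in enumerate(parts):
            if part.startswith("Alias="):
                parts[i] = part + "," + alias  # Alias may hold several values
                break
        else:
            parts.append("Alias=" + alias)
        cols[8] = ";".join(parts)
        yield "\t".join(cols) + "\n"


def annotate_gff_dir(in_dir, out_dir, table):
    """Apply add_alias_to_gff to every *.gff file in in_dir, writing to out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for gff in sorted(Path(in_dir).glob("*.gff")):
        with open(gff) as src, open(out_dir / gff.name, "w") as dst:
            dst.writelines(add_alias_to_gff(src, table))
```

Escaping matters here because a name such as "Cadmium, cobalt and zinc/H(+)-K(+) antiporter" contains a comma, which GFF3 would otherwise read as a separator between multiple Alias values. If several genomes share the same locus-tag prefix, build and apply one table per genome instead of a single shared table.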