Hi all!
I have a problem with defining pseudogenes in a bacterial genome. I defined a pseudogene as an additional copy of a gene in the genome.
Because my genome is bacterial, it has no introns, so I treated every repeated annotation of the same gene as an extra copy, i.e. a pseudogene. I have 2000 unique genes and 454 that are repeated at least once. Going this way, I found around 1000 pseudogenes. Compared with related species this number is huge, which is why I am suspicious of my results.
So my questions are:
*Which of the defined pseudogenes represent the real gene and are functional? How can I find them?
*This may be a stupid question, but is it possible to have two identically annotated genes separated by a few nucleotides in a bacterial genome (one next to the other, with a break)? If yes, is it one gene, or a gene and its pseudogene? Example below:
gene A_1-AGTCTATGTA-gene A_2
Many thanks for any suggestion.
Best, Agata
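To make the counting in the question concrete, here is a minimal sketch of how repeated annotations could be tallied and how neighbouring same-name pairs (like the `gene A_1` / `gene A_2` example) could be flagged. It assumes the annotation tool marks repeated gene names with a numeric suffix such as `_1`, `_2`, as in the example above. The feature tuples, the `max_gap` threshold, and the helper names are all hypothetical, and a hit only marks a candidate for closer inspection; it does not say whether a copy is functional.

```python
import re
from collections import defaultdict


def base_name(gene):
    """Strip a trailing numeric copy suffix: 'geneA_2' -> 'geneA'."""
    return re.sub(r"_\d+$", "", gene)


def copy_counts(features):
    """Count how many annotated features share each base gene name."""
    counts = defaultdict(int)
    for _contig, _start, _end, _strand, gene in features:
        counts[base_name(gene)] += 1
    return dict(counts)


def adjacent_same_name(features, max_gap=50):
    """Find neighbouring features with the same base name on the same
    contig and strand, separated by at most max_gap nucleotides."""
    pairs = []
    ordered = sorted(features, key=lambda f: (f[0], f[1]))
    for left, right in zip(ordered, ordered[1:]):
        same_place = left[0] == right[0] and left[3] == right[3]
        same_gene = base_name(left[4]) == base_name(right[4])
        gap = right[1] - left[2] - 1  # bases strictly between the two features
        if same_place and same_gene and 0 <= gap <= max_gap:
            pairs.append((left[4], right[4], gap))
    return pairs


# Made-up features: (contig, start, end, strand, gene), 1-based inclusive.
features = [
    ("contig1", 100, 400, "+", "geneA_1"),
    ("contig1", 411, 900, "+", "geneA_2"),
    ("contig2", 50, 800, "-", "geneA_3"),
    ("contig2", 1000, 1600, "+", "geneB"),
]

print(copy_counts(features))         # {'geneA': 3, 'geneB': 1}
print(adjacent_same_name(features))  # [('geneA_1', 'geneA_2', 10)]
```

A pair reported by `adjacent_same_name` could be two real tandem copies, or one gene that the annotator split in two (for example around a frameshift), so each pair would still need to be checked by eye or by alignment against a reference protein.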
I would disagree with your definition of a pseudogene. I think a pseudogene is a gene copy that has been duplicated and also deactivated through mutation. It may be lost in some future generation, or conserved for structural reasons, but it should not generate proteins. Generating a protein would promote it to a full gene. So look for deleted start codons, damaged regulatory elements, and evolutionary conservation.
Otherwise, you're looking at duplications that may well be functional. Perhaps that bacterium benefits from doubled expression of that protein, so it has two copies in succession. That is not a pseudogene at all.
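As a rough first pass at the sequence-level part of that advice, something like the following could flag copies whose coding sequence looks broken. It is only a sketch under stated assumptions: `cds_problems` is a made-up helper, the input is assumed to be the CDS nucleotide sequence read 5' to 3' on the coding strand, and the start and stop codon sets are the common ones for bacterial translation table 11. The internal stop and frame-length checks are extra heuristics, not something stated in the answer, and nothing here addresses regulatory elements, expression, or conservation.

```python
# Assumed codon sets for bacteria (translation table 11, common starts only).
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def cds_problems(seq):
    """Return a list of reasons why a CDS looks disrupted (empty if none)."""
    seq = seq.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length not a multiple of 3")
    # Split into complete codons, ignoring any trailing partial codon.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if not codons or codons[0] not in START_CODONS:
        problems.append("no recognised start codon")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no terminal stop codon")
    if any(codon in STOP_CODONS for codon in codons[1:-1]):
        problems.append("internal stop codon")
    return problems


# Toy sequences, not real genes.
print(cds_problems("ATGAAATAA"))     # []
print(cds_problems("ATGTAAAAATAA"))  # ['internal stop codon']
print(cds_problems("AAGAAATAA"))     # ['no recognised start codon']
```

An empty list does not prove a copy is functional, and a non-empty one can also come from an annotation or assembly error, so this only helps prioritise which copies to look at more closely.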
IMO any ORF that is never transcribed into mRNA can be described as a pseudogene. It doesn't have to exist as multiple copies or anything.
So, when I have one gene that occurs 10 times in the genome on different contigs, can they all be functional genes?
Absolutely.
Yes, but if real, I would guess it is a transposase or something similar. Did you try to annotate the duplicated genes?
Yes, I annotated them with Prokka.
Is this a genome that you have assembled yourself? Could these be assembly artifacts?
I don't think these are assembly artefacts. For de novo assembly I used SPAdes, and for artefact removal I used blastn against a genus-specific nt database to select the contigs of interest.
Is the genome a closed single circle? If not, then you don't have a complete genome/assembly. It is still subject to further refinement.