Question

How To Recognize A Pseudogene With A Frameshift?

Cindy · 14.4 years ago

In the NCBI database, the gene GSU1034 from the complete genome of Geobacter sulfurreducens PCA is considered a pseudogene and has been excluded from the .ffn file. The description for this gene is as follows:

methyl-accepting chemotaxis protein; this gene contains a frame shift which is not the result of sequencing error;identified by similarity to OMNI:NTL03PA04634

I'm confused about why the sequence is called a pseudogene when it looks perfect: it has a start codon and a stop codon, and the number of nucleotides between them is divisible by three. Also, what does OMNI:NTL03PA04634 refer to? And how was the frameshift detected?

Furthermore, another strain of Geobacter sulfurreducens, called KN400, was sequenced last year. Assembly and annotation have been done, but pseudogene detection has not been done yet. There's a gene in this genome, KN400_1013, that is also annotated as a methyl-accepting chemotaxis sensory transducer. KN400_1013 is the homolog of GSU1034, with several mutations. I don't know whether the annotation for KN400_1013 is reliable. Is it the mutations that make the gene functional? Or might it later be considered a pseudogene, as GSU1034 was?

genomics bacteria • 5.2k views

Last updated 14.4 years ago by Casey Bergman

Just wondering why you set this question as community wiki? We don't usually do that unless the question merits a wide-ranging, long discussion with no real "right answer".

Comment by Neilfws · 14.4 years ago

Ram · Answer 1 · 2011-03-17

It shouldn't be a surprise that the mutant's ORF length is divisible by 3: every ORF length is divisible by 3, otherwise it's not an ORF! That's exactly how a frameshift mutation works: a single insertion or deletion changes the amino-acid sequence downstream of it and moves the stop codon to another position.

As an example, consider the following ORF of length 9 (excluding the stop codon *) in the wild type, and a mutant with a single extra A inserted into the third codon. Of course, we also have to consider the surrounding transcript:

WT transcript:
AUG GUG AAG UAG UUU AGC
M   V   K   *   F   S
Mutant transcript (extra A inserted in codon 3, AAG -> AAAG):
AUG GUG AAA GUA GUU UAG C
M   V   K   V   V   *   -

So both ORFs are perfectly divisible by 3. The same happens in reverse with a deletion (just swap the WT and mutant sequences); the only differences are the ORF length and the amino-acid sequence.
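
To make the toy example above reproducible, here is a minimal Python sketch that translates from the first AUG to the first in-frame stop codon and reports the ORF length. The codon table is the standard genetic code; the function name orf_from_first_aug is made up for illustration, and real annotation pipelines use proper gene finders rather than "first AUG".

```python
from itertools import product

# Standard genetic code: codons enumerated with bases in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def orf_from_first_aug(rna: str) -> tuple[str, int]:
    """Translate from the first AUG up to the first in-frame stop codon.

    Returns (protein, ORF length in nt excluding the stop codon).
    Hypothetical helper for this toy example only.
    """
    start = rna.find("AUG")
    if start < 0:
        raise ValueError("no AUG start codon")
    protein = []
    for i in range(start, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            return "".join(protein), i - start
        protein.append(aa)
    raise ValueError("no in-frame stop codon")


wt = "AUGGUGAAGUAGUUUAGC"
mutant = wt[:7] + "A" + wt[7:]      # one extra A inside codon 3 (AAG -> AAAG)
back_to_wt = mutant[:7] + mutant[8:]  # deleting it again restores the WT

for name, rna in [("WT", wt), ("mutant", mutant)]:
    protein, length = orf_from_first_aug(rna)
    print(name, protein, length, length % 3)
# WT MVK 9 0
# mutant MVKVV 15 0
```

Both ORF lengths are multiples of 3 even though the mutant carries a 1-nt insertion; the insertion only changes where the stop codon falls and what is translated after it.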

Have a look at pseudogenes.org; they have made genome-wide pseudogene predictions for prokaryotes.

There is also a large file there with all of their prokaryotic pseudogene predictions.

My personal opinion on pseudogenes (and I expect some controversial comments): the notion of a pseudogene is misleading, overrated at least in prokaryotes, and should be avoided, because it implies that the gene evidently has no function. This is often based only on predictions that are not experimentally verified, or are hardly verifiable (how does one prove that something has no function?). I believe this picture is going to change with upcoming HTS technologies such as RNA-seq, and that many annotated 'pseudogenes' will turn out to be transcribed and even expressed in bacteria. Whether or not the remaining protein or peptide is functional depends on the character of the mutation. (I cannot prove this yet, to be sure.)

So try to analyze the data without a preconceived assumption (or prejudice), and also include the pseudogene calls from another source.

Ram · Answer 2 · 2011-03-17

I can see why this is confusing. Indeed, the nucleotide sequence does appear to encode an ORF, with triplets between start and stop codons.

I'm assuming that, when compared with similar genes from other species, the length difference is 3 nt (or a multiple of 3); that is, a codon (or codons) has been inserted or deleted. I'm not sure about OMNI (some database, I assume), but NTL03PA04634 refers to a gene from Pseudomonas aeruginosa (the PA is a clue). Information about that gene is here. You might try aligning it with the PCA gene (or protein).
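
One rough way to see how a frameshift can be spotted by similarity to a homolog: translate the gene in all three forward frames and check which frame matches each stretch of the homolog's protein. If the 5' part matches in one frame and the 3' part in another, that switch is the frameshift signal. This is only a toy sketch under stated assumptions: exact k-mer matching with k=5, and a made-up homolog protein and gene; real annotation relies on proper alignment tools (e.g. BLASTX-style translated searches), not this.

```python
from itertools import product

# Standard genetic code: codons enumerated with bases in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(rna: str, frame: int) -> str:
    """Translate one forward frame (offset 0, 1 or 2), reading through stops ('*')."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(frame, len(rna) - 2, 3))


def frame_hits(rna: str, homolog: str, k: int = 5) -> list[tuple[int, int, int]]:
    """For each k-mer of the homolog protein, list (homolog_pos, frame, nt_pos)
    for every forward frame whose translation contains it (first occurrence only)."""
    translations = {frame: translate(rna, frame) for frame in range(3)}
    hits = []
    for pos in range(len(homolog) - k + 1):
        kmer = homolog[pos:pos + k]
        for frame, protein in translations.items():
            idx = protein.find(kmer)
            if idx >= 0:
                hits.append((pos, frame, frame + 3 * idx))
    return hits


# Hypothetical toy data: back-translate a made-up homolog protein using one
# fixed codon per amino acid, then insert one extra base to simulate a frameshift.
HOMOLOG = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
BACK = {}
for codon, aa in CODON_TABLE.items():
    BACK.setdefault(aa, codon)
intact = "".join(BACK[aa] for aa in HOMOLOG)
shifted = intact[:48] + "G" + intact[48:]  # 1-nt insertion after codon 16

for pos, frame, nt in frame_hits(shifted, HOMOLOG):
    print(f"homolog aa {pos:2d}-{pos + 4:2d}  frame {frame}  nt {nt}")
```

In the printout, k-mers from the start of the homolog are found in frame 0 and k-mers from its end in frame 1, which is the pattern a 1-nt insertion produces. An indel of exactly 3 nt would instead keep every match in the same frame.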

Why a pseudogene? It may be that the insertion/deletion is in a part of the sequence known to be functionally important. Or there may be other indicators, such as unusual GC content or lack of transcript detection. Try searching PubMed to find the tools that people use to identify pseudogenes.
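
As a sketch of the "unusual GC content" indicator: compute each gene's GC fraction and flag genes that sit far from the mean of the set. The z-score cutoff of 2.0 and the function names are assumptions for illustration, not an established pseudogene criterion; on its own, a GC outlier is at best a weak hint.

```python
import statistics


def gc_content(seq: str) -> float:
    """Fraction of G or C bases in a DNA/RNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc_outliers(genes: dict[str, str], z_cutoff: float = 2.0) -> list[str]:
    """Names of genes whose GC content is more than z_cutoff population
    standard deviations away from the mean GC content of all genes given.
    The default cutoff is an arbitrary, hypothetical choice."""
    values = {name: gc_content(seq) for name, seq in genes.items()}
    gcs = list(values.values())
    mean = statistics.mean(gcs)
    sd = statistics.pstdev(gcs)
    if sd == 0:
        return []
    return [name for name, gc in values.items() if abs(gc - mean) / sd > z_cutoff]


# Toy input: five genes at 50% GC and one at 100% GC.
toy = {f"g{i}": "ATGC" for i in range(1, 6)}
toy["odd"] = "GGCC"
print(gc_outliers(toy))  # ['odd']
```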

In terms of reliability, it's good to consult as many sources as possible: NCBI is not an authoritative resource for microbial genomics. Here's GSU1034 at IMG; they also call it a pseudogene. At another database, PATRIC, they do not, but note that this source has less detail and looks less recent. Often, it's best to consult the specialist database associated with the genome project, where such a thing exists.

Casey Bergman · Answer 3 · 2011-03-18

You could double-check this using the Psi-Phi method of Lerat and Ochman. There is no implementation of this method available on the web, but the paper says one is available on request.
