Gene prediction software
17 months ago

Hi, I need to predict pseudogenes in the assembled genome of a catfish. For this, I need to predict the genes in the genome and build a parent protein set to search for similarity in the intergenic regions. A processed pseudogene may be predicted as a gene during this step. Which gene prediction software avoids reporting pseudogenes in its results? Thanks in advance.

pseudogene gene prediction masking • 1.2k views
17 months ago
Darked89 4.7k

How complete/polished is your catfish genome assembly?

Also: do you have some good quality RNA-Seq?

With a draft genome it is hard to tell whether gene X is missing, say, its first or last exon(s) because of a faulty duplication of a genomic region (a pseudogene) or because the exons are simply missing from the assembly. The same goes for a frameshift or stop codon: it may have been introduced by a sequencing error, or it may be a real inactivating mutation in a paralogue.
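To make the frameshift/stop codon point concrete, here is a minimal Python sketch that flags the two disabling features mentioned above in a predicted CDS. The function name `disabling_features` and the feature labels are hypothetical, and the standard stop codons are assumed. Note that it only reports what is in the sequence; it cannot tell a sequencing error from a real inactivating mutation.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code assumed


def disabling_features(cds):
    """Return a list of features suggesting a disrupted coding sequence."""
    cds = cds.upper()
    features = []

    # A CDS whose length is not a multiple of 3 hints at a frameshift
    # (or at a truncated gene model).
    if len(cds) % 3 != 0:
        features.append("length_not_multiple_of_3")

    # Split into complete codons, dropping any trailing partial codon.
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]

    # The last codon is expected to be a stop, so only earlier ones count.
    internal_stops = [i for i, c in enumerate(codons[:-1]) if c in STOP_CODONS]
    if internal_stops:
        features.append("internal_stop_at_codon_%d" % internal_stops[0])

    return features
```

A gene model that returns a non-empty list here is a candidate for closer inspection, ideally against the raw reads or RNA-Seq if you have it.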

You may get processed pseudogenes where introns will be missing.
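One way to screen for these, sketched in Python under the assumption that you already have exon counts for each predicted locus and for its best-matching parent gene (for example, extracted from your own GFF files). The function and argument names are hypothetical. A single-exon locus whose parent has several exons is only a candidate: it could also be a genuine intronless paralogue.

```python
def processed_pseudogene_candidates(hit_exon_counts, hit_to_parent, parent_exon_counts):
    """Return locus IDs that are intronless while their parent gene is multi-exon.

    hit_exon_counts:    dict of locus ID -> number of exons in the locus
    hit_to_parent:      dict of locus ID -> ID of the best-matching parent gene
    parent_exon_counts: dict of parent gene ID -> number of exons in the parent
    """
    candidates = []
    for hit, n_exons in hit_exon_counts.items():
        parent = hit_to_parent.get(hit)
        if parent is None:
            continue  # no parent assigned, nothing to compare against
        # Intronless copy of a spliced parent: typical of retrotransposition.
        if n_exons == 1 and parent_exon_counts.get(parent, 0) > 1:
            candidates.append(hit)
    return sorted(candidates)
```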

Check the Ensembl annotation rules: https://grch37.ensembl.org/info/genome/genebuild/automatic_coding.html


Thanks for the answer. It is a draft genome of the walking catfish Clarias magur, covering 94 percent of the estimated genome size. Scaffolding of the assembly and several rounds of iteration resulted in 3484 scaffolds. The primary assembly unit does not have any assembled chromosomes or linkage groups.


Looks like there are five Clarias genomes:

The most complete seems to be Clarias gariepinus with ca 42k proteins.

Before looking for pseudogenes in Clarias magur, I would try to find out whether the contigs from your assembly can be ordered using the C. gariepinus chromosomes, and map all 42k proteins to your species using e.g. miniprot.

Just check that C. gariepinus is not tetraploid, since then things get more complicated.

Last but not least: to get a general feel for the quality of your genome assembly and annotation, you could select the 10 largest contigs, align them with the other Clarias genomes, map proteins, then look at the results in a genome browser and compute stats.
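For picking the largest contigs and getting basic stats, a short Python sketch like the one below may be enough. It assumes a plain FASTA file; the function names are hypothetical and it uses only the standard library.

```python
def read_fasta_lengths(lines):
    """Map each sequence name (first word of the header) to its length."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def largest(lengths_by_name, k=10):
    """Return the k longest sequences as (name, length) pairs, longest first."""
    ranked = sorted(lengths_by_name.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]
```

Usage would be something like `lengths = read_fasta_lengths(open("assembly.fa"))` (the file name is a placeholder), then `largest(lengths)` and `n50(lengths.values())`.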

17 months ago

Hi, I don't know if there are tools designed to predict pseudogenes, but if I understand your post correctly, you could predict all the ORFs (using Artemis or ORF finder), then align the identified ORFs against each other (using BLAST or something similar) to see the sequence similarities between them and find some of the pseudogenes. I hope this is helpful.
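To illustrate what the ORF step does, here is a minimal six-frame ORF scan in Python. It is a sketch of the general idea, not how Artemis or ORF finder are implemented; the function names and the `min_codons` threshold are assumptions. Keep in mind that a plain ORF scan on genomic DNA ignores introns, so on a eukaryotic genome it mostly finds single-exon stretches.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code assumed


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """Return (strand, start, end, orf) tuples for ATG...stop ORFs in six frames.

    Coordinates are 0-based, end-exclusive, and relative to the scanned strand
    (for "-" they refer to the reverse complement). min_codons counts the stop.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if (end - start) // 3 >= min_codons:
                        orfs.append((strand, start, end, s[start:end]))
                    start = None
    return orfs
```

The resulting ORF sequences could then be written to FASTA and compared all-against-all with BLAST, as suggested above.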
