Which software is suitable for protein prediction from whole eukaryotic genomes?
5.1 years ago
carina2817 ▴ 20

Hello,

I want to try programs for orthology prediction (OrthoMCL, for example). These programs work with proteomes, but I have some genomes of eukaryotic organisms for which I don't have proteomic data. Could you please recommend some software to predict protein sequences from whole genomes?

protein prediction eukaryotic genome • 1.5k views
ADD COMMENT
5.1 years ago
Juke34 8.9k

If you have the structural annotation (GFF, GTF) of these genomes, you can extract the proteomes from it. Otherwise you will have to perform the annotation yourself, and that is a different story: depending on the species and the data available, it can be quite complex. Here is a list of gene prediction tools.

ADD COMMENT

Hi, thank you for your answer. I am trying to understand how to get proteomes from GFF files (I found these files for some of my species). I read a post where someone asked how to do this (https://bioinformatics.stackexchange.com/questions/6865/can-a-gff-file-be-converted-to-a-fasta-file) and someone recommended gffread (http://ccb.jhu.edu/software/stringtie/gff.shtml#gffread_ex). That page gives this example:

gffread -w transcripts.fa -g /path/to/genome.fa transcripts.gtf

I think it is supposed to extract the sequences of all transcripts in the GFF file, using the genome as the reference. If I understand correctly, I should get the sequences of the features named "protein", "transcript" or "CDS" in the GFF file. But when I used grep to look for these words ("protein", "transcript", "CDS" and "gene") I got no results; looking at my GFF files, the only feature type I see is "region". So I guess these files won't be useful, right? Then I have to produce the annotation with Augustus (or similar)...

ADD REPLY

You have extracted the transcripts using the -w option. You need to use the -y option for proteins.
Could you show a few lines of your transcripts.gtf file and a few lines of what you get as output?

ADD REPLY

The gff file looks like this:

##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
#!genome-build ASM263302v1
#!genome-build-accession NCBI_Assembly:GCA_002633025.1
##sequence-region NMRB01000001.1 1 1576180
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=416868
NMRB01000001.1 Genbank region 1 1576180 . + . ID=id0;Dbxref=taxon:416868;collected-by=Miyuki Kanda;collection-date=2013-08-01;country=Japan: Okayama%2C Ushimado;dev-stage=adult;gbkey=Src;identified-by=Tadashi Akiyama;mol_type=genomic DNA;strain=Ushimado;tissue-type=Whole animal
##sequence-region NMRB01000002.1 1 1458336
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=416868
NMRB01000002.1 Genbank region 1 1458336 . + . ID=id1;Dbxref=taxon:416868;collected-by=Miyuki Kanda;collection-date=2013-08-01;country=Japan: Okayama%2C Ushimado;dev-stage=adult;gbkey=Src;identified-by=Tadashi Akiyama;mol_type=genomic DNA;strain=Ushimado;tissue-type=Whole animal

gffread -y proteins.fa -g ./GCA_002633025.1_ASM263302v1_genomic_Notospermus_geniculatus.fna GCA_002633025.1_ASM263302v1_genomic_Notospermus_geniculatus.gff

gffread -w transcripts.fa -g ./GCA_002633025.1_ASM263302v1_genomic_Notospermus_geniculatus.fna GCA_002633025.1_ASM263302v1_genomic_Notospermus_geniculatus.gff

proteins.fa and transcripts.fa are empty, so I guess there is no annotation for proteins in the file...
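
A quick way to confirm that a gffread output really holds no sequences is to count its FASTA header lines. This is a minimal sketch; the function name `fasta_count` is made up for illustration and is not part of gffread.

```shell
# Hypothetical helper: count FASTA records, i.e. header lines starting with ">".
fasta_count() {
    # grep -c prints 0 but returns a non-zero status when nothing matches;
    # "|| true" keeps that from being treated as an error.
    grep -c '^>' "$1" || true
}

# Example usage (file names taken from the commands above):
#   fasta_count proteins.fa
#   fasta_count transcripts.fa
```

A count of 0 for both files is consistent with an annotation file that has no transcript or CDS features to extract.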

ADD REPLY

I think you are in the same situation as in this post: "IGB won't open .gff file".
The GFF file does not contain any gene prediction features (gene, mRNA, exon, CDS, UTRs, etc.), only sequence/region descriptions.

To quickly check which feature types are present in your file (column 3), you can do:
awk '{if($0 !~ /^#/) print $3}' GCA_002633025.1_ASM263302v1_genomic_Notospermus_geniculatus.gff | sort -u

If you don't find a proper GFF/GTF annotation file, I'm afraid you will have to perform the annotation yourself.
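
The feature-type check can be extended to count each type and to test for CDS features before running gffread. This is only a sketch: the function names `gff_feature_counts` and `gff_has_cds` are hypothetical, and it assumes a tab-separated GFF/GTF file with nine columns.

```shell
# Hypothetical helpers; the names are illustrative, not part of any tool.

# Print each feature type (column 3) with its count,
# skipping comment lines and lines with fewer than 9 tab-separated fields.
gff_feature_counts() {
    awk -F '\t' '!/^#/ && NF >= 9 { n[$3]++ }
                 END { for (t in n) print t "\t" n[t] }' "$1" | sort
}

# Exit status 0 only if the file contains at least one CDS feature,
# which gffread -y needs in order to write protein translations.
gff_has_cds() {
    awk -F '\t' '!/^#/ && $3 == "CDS" { found = 1; exit }
                 END { exit !found }' "$1"
}

# Example usage (annotation.gff and genome.fa are placeholder file names):
#   gff_feature_counts annotation.gff
#   gff_has_cds annotation.gff && gffread -y proteins.fa -g genome.fa annotation.gff
```

For a file like the one shown in this thread, the count would list only "region", and the CDS test would fail, so gffread would not be run.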

ADD REPLY
5.1 years ago
gb ★ 2.2k

Here is a list of options:

https://en.wikipedia.org/wiki/List_of_gene_prediction_software

I have used Augustus before, and it was easy, depending on the organism.

ADD COMMENT

Yes, Augustus is a good choice if an HMM model already exists for a species that is not too diverged from the one you want to annotate.

ADD REPLY
