Question

Proteomes: measured or predicted from genomes/transcriptomes?

4 months ago

dec986 ▴ 380

I'm looking at a genome/proteome from a plant pathogen: https://protists.ensembl.org/Pythium_ultimum/Info/Index

but the proteome lacks a specific protein that I expected to find. We suspect this is an error of some kind.

Looking at the proteome, every protein looks like:

>PYU1_T012888 pep supercontig:pug:scf1117875581722:70717:72645:1 gene:PYU1_G012861 transcript:PYU1_T012888 gene_biotype:protein_coding transcript_biotype:protein_coding description:Uncharacterized protein  [Source:UniProtKB/TrEMBL;Acc:K3X6N9]

which tells me that this is a computed/estimated proteome. So there is no information about splicing, and none of the proteins are experimentally confirmed.
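To make the header easier to read, here is a minimal Python sketch that splits it into fields. The function name `parse_ensembl_header` is made up for illustration, and the field layout (ID, sequence type, `key:value` tokens, then a free-text `description:`) is only inferred from the single header shown above, so other Ensembl releases may differ. The fields are genomic coordinates plus gene/transcript model IDs, which is consistent with a sequence translated from an annotated gene model.

```python
def parse_ensembl_header(header: str) -> dict:
    """Split an Ensembl-style FASTA header into a dict of fields.

    Assumed layout (inferred from one example, not from a format spec):
    >ID SEQTYPE key:value key:value ... description:free text
    """
    text = header.lstrip(">").strip()
    # The description is free text and may contain spaces and colons,
    # so cut it off before splitting the rest on whitespace.
    main, _, description = text.partition(" description:")
    tokens = main.split()
    fields = {
        "id": tokens[0],
        "seq_type": tokens[1],
        "description": description.strip(),
    }
    for token in tokens[2:]:
        # Split on the first colon only: the location value contains more colons.
        key, _, value = token.partition(":")
        fields[key] = value
    return fields


header = (
    ">PYU1_T012888 pep supercontig:pug:scf1117875581722:70717:72645:1 "
    "gene:PYU1_G012861 transcript:PYU1_T012888 gene_biotype:protein_coding "
    "transcript_biotype:protein_coding description:Uncharacterized protein  "
    "[Source:UniProtKB/TrEMBL;Acc:K3X6N9]"
)
fields = parse_ensembl_header(header)
print(fields["gene"], fields["supercontig"])
```

The `supercontig` value holds the assembly name, scaffold, start, end and strand, so each protein can be traced back to a location on the assembly.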

I'm very surprised that this particular protein is absent from the proteome of this oomycete, and I suspect that it is absent because the proteome is of low quality.

Does this sound reasonable to anyone else?

Also, does a metric exist that can quantify proteome quality?

genome proteome • 274 views

updated 4 months ago by Mensur Dlakic ★ 28k • written 4 months ago by dec986 ▴ 380

score 3 · Accepted Answer · 2024-07-23

which tells me that this is a computed/estimated proteome

Most proteins out there are predicted, especially if not derived from model species.

Also, does a metric exist that can quantify proteome quality?

It is tied to genome quality. If a sequenced genome is complete, one should expect a complete proteome as well. Genome completeness can be estimated from the expected genome size (if there are known relatives or genome mapping has been done) or from the presence of single-copy marker genes. For the latter, I suggest looking into BUSCO.
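As a rough illustration of what a marker-based completeness score looks like, the sketch below parses a BUSCO-style one-line summary into numbers. The summary string is made up, the helper name `parse_busco_summary` is hypothetical, and the line format is assumed from BUSCO v5-style short summaries, so it may differ in other versions.

```python
import re

# Assumed invocation for a predicted proteome (file and output names are
# placeholders; pick the lineage dataset that fits your organism):
#   busco -i proteins.fa -m proteins -l stramenopiles_odb10 -o busco_out


def parse_busco_summary(line: str) -> dict:
    """Extract percentages (C, S, D, F, M) and marker count n from a summary line."""
    scores = {key: float(value) for key, value in re.findall(r"([A-Z]):([\d.]+)%", line)}
    total = re.search(r"n:(\d+)", line)
    if total:
        scores["n"] = int(total.group(1))
    return scores


# Made-up example line, not a real result for any genome.
example = "C:95.0%[S:93.0%,D:2.0%],F:1.0%,M:4.0%,n:100"
scores = parse_busco_summary(example)
print(f"complete: {scores['C']}%, fragmented: {scores['F']}%, missing: {scores['M']}%")
```

Here C is complete (S single-copy plus D duplicated), F is fragmented and M is missing. A high M value in proteins mode would point to an incomplete assembly or poor gene predictions.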

There could be many reasons for any single protein missing: 1) incomplete genome; 2) nonsense codons that result in truncations; 3) poor gene predictions; 4) sequence divergence such that a protein can't be identified even though it is there. A way around problems 2-3, and possibly 4, is to run TBLASTN with a protein sequence against a complete genome. If at least a partial match is generated, then it becomes a matter of digging deeper or doing manual annotations.
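The TBLASTN check can be sketched in Python as follows. The file names, the E-value cutoff and the sample hit rows are all hypothetical; the only things taken from BLAST+ itself are the `tblastn` options used (`-query`, `-subject`, `-evalue`, `-outfmt 6`) and the 12 default columns of tabular output. This is one possible way to organize the search, not a definitive pipeline.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def tblastn_command(query_faa: str, genome_fna: str, evalue: float = 1e-5) -> list:
    """Build a tblastn call: protein query vs. translated genome, tabular output."""
    return ["tblastn", "-query", query_faa, "-subject", genome_fna,
            "-evalue", str(evalue), "-outfmt", "6"]


def parse_hits(tabular_text: str, max_evalue: float = 1e-5) -> list:
    """Parse tabular hits, keep those under the cutoff, best bitscore first."""
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(COLUMNS, line.split("\t")))
        for key in ("length", "qstart", "qend", "sstart", "send"):
            row[key] = int(row[key])
        for key in ("pident", "evalue", "bitscore"):
            row[key] = float(row[key])
        # In tblastn output, sstart > send means the hit is on the minus strand.
        row["strand"] = "-" if row["sstart"] > row["send"] else "+"
        if row["evalue"] <= max_evalue:
            hits.append(row)
    return sorted(hits, key=lambda hit: hit["bitscore"], reverse=True)


# Made-up rows for illustration only (not real search results).
sample = (
    "my_protein\tscaffold_A\t62.5\t240\t85\t5\t10\t249\t72000\t71281\t2e-40\t155\n"
    "my_protein\tscaffold_B\t30.0\t40\t28\t0\t100\t139\t500\t619\t0.5\t28.1\n"
)
hits = parse_hits(sample)
print(tblastn_command("my_protein.faa", "genome.fna"))
for hit in hits:
    print(hit["sseqid"], hit["sstart"], hit["send"], hit["strand"], hit["evalue"])
```

To run the search itself, the command list can be passed to `subprocess.run(..., capture_output=True, text=True)` and its `stdout` handed to `parse_hits`. A surviving hit with no gene model at those coordinates suggests a missed or truncated prediction rather than a truly absent gene.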