but the proteome is missing a specific protein that I would have expected to find. We suspect this is some kind of error.
Looking at the proteome, every protein header looks like this:
>PYU1_T012888 pep supercontig:pug:scf1117875581722:70717:72645:1 gene:PYU1_G012861 transcript:PYU1_T012888 gene_biotype:protein_coding transcript_biotype:protein_coding description:Uncharacterized protein [Source:UniProtKB/TrEMBL;Acc:K3X6N9]
which tells me that this is a computed (predicted) proteome. So there is no information about splicing, and none of the proteins are experimentally confirmed.
I'm very surprised that this particular protein is absent from the proteome of this fungus, and I suspect it is missing because the proteome is of low quality.
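To see what each field of that header encodes, here is a minimal Python sketch. It assumes the Ensembl-style peptide header layout shown above (ID, sequence type, a `coord_system:assembly:seq_region:start:end:strand` location, `key:value` pairs, then a free-text description); the function name `parse_ensembl_pep_header` is hypothetical. Parsing it makes explicit that each entry is tied to a gene model at given supercontig coordinates, which is consistent with the proteome being derived from gene predictions.

```python
def parse_ensembl_pep_header(header: str) -> dict:
    """Split an Ensembl-style peptide FASTA header into named fields.

    Assumed layout: '>ID type location key:value ... description:free text'.
    """
    line = header.lstrip(">").strip()
    # The description may contain spaces and colons, so split it off first.
    head, _, description = line.partition(" description:")
    tokens = head.split()
    fields = {"id": tokens[0], "seq_type": tokens[1]}
    # Location string: coord_system:assembly:seq_region:start:end:strand
    coord_system, assembly, region, start, end, strand = tokens[2].split(":")
    fields.update(
        coord_system=coord_system,
        assembly=assembly,
        seq_region=region,
        start=int(start),
        end=int(end),
        strand=int(strand),
    )
    # Remaining tokens are simple key:value pairs (gene, transcript, biotypes).
    for token in tokens[3:]:
        key, _, value = token.partition(":")
        fields[key] = value
    fields["description"] = description
    return fields
```

Applied to the header above, this yields `gene` = `PYU1_G012861`, a location of 70717 to 72645 on the forward strand of `scf1117875581722`, and the description `Uncharacterized protein [...]`.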
Does this sound reasonable to anyone else?
Also, does a metric exist that can quantify proteome quality?
which tells me that this is a computed/estimated proteome
Most proteins in public databases are predicted rather than experimentally verified, especially those from non-model species.
Also, does a metric exist that can quantify proteome quality?
It is tied to genome quality. If a sequenced genome is complete, one should expect a complete proteome as well. Genome completeness can be estimated from the expected genome size (if there are known relatives or a genome mapping has been done) or from the presence of conserved single-copy gene markers. For the latter, I suggest you look up BUSCO.
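As a rough illustration of how a marker-based completeness score is summarised, here is a minimal Python sketch. It does not reproduce BUSCO's searches; it assumes you already have a call per marker (complete single-copy, complete duplicated, fragmented, or missing) and only tallies them in BUSCO-like `C:[S,D],F,M,n` notation. It also includes the cruder size-based proxy mentioned above. Both function names and the toy data are hypothetical. Note that BUSCO can also be run directly on a protein set, which scores the proteome itself rather than the assembly.

```python
from collections import Counter

VALID = {"S", "D", "F", "M"}  # single-copy, duplicated, fragmented, missing


def busco_style_summary(statuses: dict[str, str]) -> str:
    """Summarise per-marker calls in BUSCO-like notation.

    `statuses` maps each marker ID to 'S', 'D', 'F' or 'M'.
    """
    n = len(statuses)
    if n == 0:
        raise ValueError("no markers given")
    counts = Counter(statuses.values())
    unknown = set(counts) - VALID
    if unknown:
        raise ValueError(f"unexpected status codes: {sorted(unknown)}")
    pct = {k: 100.0 * counts[k] / n for k in "SDFM"}
    complete = pct["S"] + pct["D"]  # complete = single-copy + duplicated
    return (
        f"C:{complete:.1f}%[S:{pct['S']:.1f}%,D:{pct['D']:.1f}%],"
        f"F:{pct['F']:.1f}%,M:{pct['M']:.1f}%,n:{n}"
    )


def size_ratio(assembly_length: int, expected_genome_size: int) -> float:
    """Assembly length as a fraction of the expected genome size (crude proxy)."""
    return assembly_length / expected_genome_size
```

For a toy set of 10 markers with 7 single-copy, 1 duplicated, 1 fragmented and 1 missing, `busco_style_summary` returns `C:80.0%[S:70.0%,D:10.0%],F:10.0%,M:10.0%,n:10`. A high "M" fraction would support the suspicion that absent proteins reflect an incomplete assembly or annotation.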
There could be many reasons for any single protein to be missing: 1) an incomplete genome; 2) nonsense codons that result in truncations; 3) poor gene predictions; 4) sequence divergence such that a protein can't be identified even though it is there. A way around problems 2 and 3, and possibly 4, is to search the protein sequence against a complete genome with TBLASTN. If this produces at least a partial match, it becomes a matter of digging deeper or doing manual annotation.
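A minimal Python sketch of the TBLASTN check, assuming BLAST+ is installed and that you have the protein of interest and the genome assembly as FASTA files (the file names and helper functions are hypothetical). It builds a `tblastn` call with tabular output (`-outfmt 6`), parses the default 12 columns, and reports how much of the query is covered by hits. Several partial HSPs at nearby genome coordinates, or a hit on the minus strand, may point to the gene-model or truncation problems listed above, but that needs manual inspection to confirm.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def tblastn_command(protein_fasta: str, genome_fasta: str, evalue: float = 1e-5) -> list[str]:
    """Build a tblastn call searching a protein against a genome FASTA."""
    return [
        "tblastn", "-query", protein_fasta, "-subject", genome_fasta,
        "-outfmt", "6", "-evalue", str(evalue),
    ]


def parse_outfmt6(text: str) -> list[dict]:
    """Parse tabular BLAST output into a list of hit dictionaries."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        hit = dict(zip(OUTFMT6_COLUMNS, line.split("\t")))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        for key in ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"):
            hit[key] = int(hit[key])
        # Genome (subject) coordinates run backwards for minus-strand matches.
        hit["strand"] = "+" if hit["send"] >= hit["sstart"] else "-"
        hits.append(hit)
    return hits


def query_coverage(hits: list[dict], query_length: int) -> float:
    """Fraction of query residues covered by at least one HSP."""
    covered = set()
    for hit in hits:
        covered.update(range(hit["qstart"], hit["qend"] + 1))
    return len(covered) / query_length
```

In practice you would pass `tblastn_command("my_protein.fa", "genome.fa")` to `subprocess.run(..., capture_output=True, text=True, check=True)` and feed its `stdout` to `parse_outfmt6`. Using `-subject` avoids building a BLAST database for a one-off search; for repeated searches, `makeblastdb` plus `-db` is the usual choice.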