I am blasting some predicted peptides against the UniProt database, and many of the hits are "Uncharacterized protein". How is this designation chosen? That is, what level of evidence is required for a peptide sequence to be added to the database and given this designation rather than being excluded?
I don't see it described on the UniProt website. I tried to find the publication where this term is explained, but there are a crazy number of pubs going back to 1997 (and I can't access that one): https://www.uniprot.org/help/publications
Thanks!
Probably not a lot.
If you look at the history of one such entry (https://www.uniprot.org/uniprotkb/Q9H425/history) you will see that it was originally added via TrEMBL. After a number of years it appears to have been seen in a mass spectrometry paper (https://rest.uniprot.org/unisave/Q9H425?format=txt&versions=26). It has kept that designation since then.
You should use the reviewed Swiss-Prot part of UniProt or, better yet, a specific proteome if possible.
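If you already have BLAST results against the full UniProt FASTA, one rough way to separate reviewed from unreviewed hits is to look at the subject ID prefix: UniProt FASTA headers start with `sp|` for Swiss-Prot and `tr|` for TrEMBL. Below is a minimal sketch assuming your subject IDs keep that prefix; the function names are made up for illustration, and the `tr|` ID in the example is a placeholder, not a real accession.

```python
def uniprot_section(subject_id: str) -> str:
    """Classify a UniProt-style ID such as 'sp|P69905|HBA_HUMAN'.

    Returns 'Swiss-Prot' for an 'sp' prefix, 'TrEMBL' for a 'tr' prefix,
    and 'unknown' for anything else.
    """
    # Take the text before the first '|' and drop a leading '>' if the
    # ID was copied straight from a FASTA header line.
    prefix = subject_id.strip().lstrip(">").split("|", 1)[0].lower()
    return {"sp": "Swiss-Prot", "tr": "TrEMBL"}.get(prefix, "unknown")


def split_hits(subject_ids):
    """Split a list of subject IDs into (reviewed, unreviewed) lists."""
    reviewed = [s for s in subject_ids if uniprot_section(s) == "Swiss-Prot"]
    unreviewed = [s for s in subject_ids if uniprot_section(s) == "TrEMBL"]
    return reviewed, unreviewed


# Illustrative input: the 'tr|' entry is a made-up placeholder ID.
hits = [
    "sp|P69905|HBA_HUMAN",
    "tr|PLACEHOLDER1|PLACEHOLDER1_HUMAN",
    "sp|P68871|HBB_HUMAN",
]
reviewed, unreviewed = split_hits(hits)
print(len(reviewed), "reviewed;", len(unreviewed), "unreviewed")
```

This only tells you which section of UniProt a hit comes from; it says nothing about how well supported the annotation is, and reviewed entries can still be named "Uncharacterized protein".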