On December 18, 2019
The UniProt Proteomes portal currently offers protein sequence sets obtained by translating completely sequenced genomes. Published genomes from NCBI Genome are brought into UniProt if they satisfy the following criteria:
- The genome is annotated and a set of coding sequences is available.
- The number of predicted coding sequences falls within a statistically expected range, based on the published proteomes of neighbouring species.
We will change these criteria to publish all proteomes that can be derived from NCBI genomes, except those whose assemblies are considered to be of low quality. To decide which proteomes to bring into UniProtKB, we use a subset of the reasons RefSeq applies to exclude a genome assembly, and we will state the reason(s) whenever a proteome is excluded from UniProtKB. We will also provide two metrics to help users assess the quality of a proteome:
- A score obtained with the BUSCO software.
- A score that compares the number of coding sequences with the number expected from neighbouring species.
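The exact formula behind the coding-sequence-count score is not given here. As a rough illustration only, the sketch below assumes that "within the expected range" means within k standard deviations of the mean count in neighbouring proteomes. The function name `count_score`, the default threshold `k = 2.0` and the z-score approach are all hypothetical and should not be read as UniProt's published method.

```python
import math
from statistics import mean, stdev


def count_score(protein_count: int, neighbour_counts: list[int], k: float = 2.0) -> tuple[float, bool]:
    """Return (z-score, within_range) for a proteome's protein count.

    Hypothetical scoring: the expected range is assumed to be the mean
    +/- k sample standard deviations of the protein counts observed in
    proteomes from neighbouring species.
    """
    if len(neighbour_counts) < 2:
        raise ValueError("need at least two neighbouring proteomes")
    mu = mean(neighbour_counts)
    sigma = stdev(neighbour_counts)
    if sigma == 0:
        # All neighbours have the same count: any deviation is infinitely far out.
        z = 0.0 if protein_count == mu else math.copysign(math.inf, protein_count - mu)
    else:
        z = (protein_count - mu) / sigma
    return z, abs(z) <= k
```

With neighbour counts of 4100, 4300, 4200, 4250 and 4150, `count_score(4200, neighbours)` returns `(0.0, True)`, while a proteome with only 2000 predicted proteins falls far outside the assumed range.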
The "Complete proteome" keyword will be removed from all UniProtKB entries. Individual proteomes can be retrieved from the UniProt website by their unique proteome identifier, e.g. UP000005640.
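For scripted access, a proteome identifier can be checked and turned into a web address before it is used. The sketch below assumes that identifiers follow the pattern of the example above ("UP" followed by nine digits) and that proteome pages live under `https://www.uniprot.org/proteomes/`; both the pattern and the URL scheme are assumptions for illustration, and `proteome_url` is a hypothetical helper name.

```python
import re

# Assumed identifier format: "UP" followed by nine digits, as in UP000005640.
PROTEOME_ID = re.compile(r"UP[0-9]{9}")


def proteome_url(proteome_id: str) -> str:
    """Build a UniProt website URL for a proteome identifier.

    The URL pattern is an assumption; check the UniProt documentation
    for the current scheme before relying on it.
    """
    if not PROTEOME_ID.fullmatch(proteome_id):
        raise ValueError(f"not a proteome identifier: {proteome_id!r}")
    return f"https://www.uniprot.org/proteomes/{proteome_id}"
```

Calling `proteome_url("UP000005640")` yields `https://www.uniprot.org/proteomes/UP000005640`, while a UniProtKB accession such as `P12345` is rejected with a `ValueError`.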
Other planned changes for UniProt can be found at https://www.uniprot.org/changes. Please don't hesitate to contact the UniProt helpdesk if you have any questions.