In short, this is due to the NON-redundant nature of the BLAST database.
See the explanation appended below for more technical details. Regards,
NCBI User Services
The entry count given in this file
ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/relnotes.txt
UniProtKB/Swiss-Prot: 561,911 entries
closely matches what is available from the NCBI Protein database:
https://www.ncbi.nlm.nih.gov/protein?term=%22swissprot%22%5BFilter%5D
Items: 1 to 20 of 561499
So the BLAST database description is essentially correct, even though
BLAST reports a different number:
Title: Non-redundant UniProtKB/SwissProt sequences
Molecule Type: Protein
Update date: 2020/04/09
Number of sequences: 473509
This count is taken after identical sequences are collapsed into
identical protein groups (IPG). Each collapsed group contains a
control-A character in its defline and holds 2 or more sequences:
$ gunzip -c db/FASTA/swissprot.gz | grep ">" | grep -c $'\01'
38217
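The `grep -c` count above gives only the number of collapsed groups, not how many sequences they hold. A minimal sketch of counting both, assuming (as described above) that the members of one group are separated by a control-A character in the defline. The function name `count_defline_members` and the sample records are made up for illustration:

```shell
# Hypothetical helper: reads FASTA on stdin and prints three numbers:
# total deflines, total member sequences, deflines with 2+ members.
# Assumes members within one defline are separated by control-A (\001).
count_defline_members() {
    grep '^>' | awk -F '\001' '
        { entries++; members += NF; if (NF > 1) collapsed++ }
        END { print entries + 0, members + 0, collapsed + 0 }'
}

# Made-up sample: one collapsed group of two, one singleton.
printf '>A2CBJ0.1 title one\001Q7V8U8.1 title two\nMKV\n>P00001.1 title three\nMAA\n' |
    count_defline_members
```

Against the real file this would presumably be run as `gunzip -c db/FASTA/swissprot.gz | count_defline_members`; the second number minus the first is then the number of sequences absorbed by collapsing.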
For example, swissprot has this entry:
$ curl 'ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz' | gunzip -c | grep "Q7V8U8"
>sp|Q7V8U8|YIDD_PROMM Putative membrane protein insertion efficiency factor OS=Prochlorococcus marinus (strain MIT 9313) OX=74547 GN=PMT_0228 PE=3 SV=1
It is part of a non-redundant set in the BLAST database:
$ gunzip -c db/FASTA/swissprot.gz | grep ">" | grep Q7V8U8
>A2CBJ0.1 RecName: Full=Putative membrane protein insertion efficiency factor [Prochlorococcus marinus str. MIT 9303]Q7V8U8.1 RecName: Full=Putative membrane protein insertion efficiency factor [Prochlorococcus marinus str. MIT 9313]
So two swissprot sequences are collapsed into a single entry in this
case. Some groups have quite a few sequences collapsed into them, so
the number of sequences absorbed by collapsing is much larger than
38K, which accounts for the difference.
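The arithmetic behind that statement can be sketched as follows. It assumes the whole gap between the two counts comes from collapsing, and it mixes numbers from slightly different snapshots, so the result is only an estimate. A group of k sequences removes k - 1 entries, so the sequences inside collapsed groups number roughly the removed entries plus the group count:

```shell
# Counts quoted above; they come from slightly different snapshots,
# so the result is an estimate only.
total=561499      # NCBI Protein entries matching the swissprot filter
blast=473509      # sequences reported for the BLAST database
groups=38217      # deflines containing control-A (collapsed groups)

removed=$((total - blast))          # entries absorbed by collapsing
in_groups=$((removed + groups))     # all sequences inside collapsed groups
avg=$(awk -v m="$in_groups" -v g="$groups" 'BEGIN { printf "%.2f", m / g }')

echo "removed=$removed in_groups=$in_groups avg_per_group=$avg"
# prints: removed=87990 in_groups=126207 avg_per_group=3.30
```

An average above 3 sequences per collapsed group is consistent with the remark that some groups hold many more than two.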
You may have heard that non-redundancy is a fairly flexible term, and this case is a good example of it.
The UniProt help page on redundancy in their databases (https://www.uniprot.org/help/redundancy) says the following regarding Swiss-Prot:
Inspecting entries Q7V8U8 and A2CBJ0 shows that their sequences are identical but belong to different strains of the same species. So, in the NCBI Swiss-Prot database these two entries are collapsed into one, which seems more sensible to me.
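One way to repeat that inspection locally, sketched under the assumption that each entry has been saved to its own single-record FASTA file. The helper name `same_sequence` is hypothetical, and the records below are short placeholders, not the real Q7V8U8 and A2CBJ0 sequences:

```shell
# Hypothetical helper: succeeds if two single-record FASTA files carry
# the same residues, ignoring header lines and line wrapping.
same_sequence() {
    a=$(grep -v '^>' "$1" | tr -d ' \r\n')
    b=$(grep -v '^>' "$2" | tr -d ' \r\n')
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Made-up placeholder records with different headers and wrapping.
tmp=$(mktemp -d)
printf '>entry_one strain X\nMKVLA\nGGT\n' > "$tmp/one.fasta"
printf '>entry_two strain Y\nMKVLAGGT\n' > "$tmp/two.fasta"

if same_sequence "$tmp/one.fasta" "$tmp/two.fasta"; then
    echo "identical"
else
    echo "different"
fi
rm -rf "$tmp"
```

Comparing the residues directly, rather than the headers, matters here because the two entries differ only in their annotation (strain MIT 9313 versus MIT 9303).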
NCBI support is always supportive and responsive.