10 months ago
cdsouthan
UniParc (https://www.ebi.ac.uk/uniparc/) is at 632,168,010 sequences and NCBI NR at 648,450,839.
Does anyone know if the diff between these includes useful sequences in one but not t'other?
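One way to check this empirically is to compare the two sets by sequence content rather than by accession, because the databases use different identifiers. UniParc already keeps one entry per unique sequence, so matching on a hash of the sequence string is a natural key. Below is a minimal sketch, assuming both sets are available as local FASTA text. The function names (`read_fasta`, `index_by_checksum`, `diff_sequence_sets`) are hypothetical, and the choice of MD5 over the uppercased sequence is an assumption of this sketch. At the ~600M-sequence scale an in-memory dict will not fit, so a real run would need a disk-backed approach, such as sorting checksum files and merging them.

```python
import hashlib
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def index_by_checksum(lines: Iterable[str]) -> Dict[str, str]:
    """Map the MD5 of each uppercased sequence to the first header seen with it."""
    index: Dict[str, str] = {}
    for header, seq in read_fasta(lines):
        digest = hashlib.md5(seq.upper().encode("ascii")).hexdigest()
        index.setdefault(digest, header)  # keep the first header for duplicates
    return index


def diff_sequence_sets(lines_a: Iterable[str], lines_b: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return headers of sequences found only in A and only in B."""
    a = index_by_checksum(lines_a)
    b = index_by_checksum(lines_b)
    only_a = sorted(a[d] for d in a.keys() - b.keys())
    only_b = sorted(b[d] for d in b.keys() - a.keys())
    return only_a, only_b


# Toy usage with made-up identifiers (not real UniParc or NR entries).
uniparc_like = ">UPI_A\nMKT\n>UPI_B\nMAAA\n".splitlines()
nr_like = ">WP_A\nmkt\n>WP_B\nMGGG\n".splitlines()
only_uniparc, only_nr = diff_sequence_sets(uniparc_like, nr_like)
print(only_uniparc, only_nr)
```

Here the shared `MKT` sequence matches despite the case difference, leaving one sequence unique to each side. Deciding whether those leftovers are "useful" would still need a look at their sources.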
"Useful" is probably relative to the user. The differences may simply be due to the timing and processes the two databases use. Given time, missing sequences may make it into either one (unless a source is not being looked at at all). UniParc lists its sources, but NCBI does not, as far as I can see.
Ta for the reply. The NCBI result page says: "All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF excluding environmental samples from WGS projects"
Patent proteins are not specified (as they are for UniParc), but I guess they are also in there. It would be good to know where the extra ~16M sequences in NR (648,450,839 − 632,168,010 = 16,282,829) originated. As ever, we have to live with the transatlantic two-stop-shops without knowing precisely what is behind the counter...