Does The First Letter Of A Uniprot Accession Number Have A Meaning?
13.4 years ago
Luispedro ▴ 60

Accession numbers are strings of the form Q3TET3 or P47753. I was wondering whether the first letter has any meaning.

13.4 years ago
Lyco ★ 2.3k

Larry is correct: the very old numbers were P..., Q..., O..., followed by digits only. Later on, they allowed letters in place of some of the digits. The next wave of accession numbers started with A..., B..., C... and so on. From the first letter, you can roughly estimate how old the accession number is. Unfortunately, the UniProt people have begun to assign 'new' accession numbers to old entries (keeping the original number as a secondary accession number).
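To make the "first letter" distinction concrete, here is a minimal sketch that sorts an accession into the O/P/Q series or the newer A-N/R-Z series. It assumes the accession pattern published in the UniProt accession-number documentation; the function name `accession_style` and the labels it returns are hypothetical, and, as said above, the prefix only gives a rough hint of age.

```python
import re

# Accession pattern as published in the UniProt accession-number documentation
# (an assumption here; re-check the current manual before relying on it).
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def accession_style(acc: str) -> str:
    """Return a rough, hypothetical label for the accession series of `acc`.

    The prefix only hints at the age of the accession; it says nothing
    about the protein itself.
    """
    acc = acc.strip().upper()
    if ACCESSION_RE.fullmatch(acc) is None:
        return "invalid"
    if acc[0] in "OPQ":
        # The oldest series: P first, then Q, then O.
        return "O/P/Q series"
    # Newer series: 6 characters, with a 10-character form added later.
    return f"A-N/R-Z series, {len(acc)} characters"


for acc in ["P47753", "Q3TET3", "D6VTK4", "A0A022YWF9", "X12345"]:
    print(acc, "->", accession_style(acc))
```

Note that an accession which merely looks old (e.g. a P... number) may still have been reassigned, so the label is a heuristic, not a date.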

13.4 years ago

To add to @Lyco's answer: they are not simply assigning 'new' accession numbers to old entries. That only happens when entries are merged or split, and the original numbers are indeed kept as secondary accession numbers.

Entries can have more than one accession number. This can be due to two distinct mechanisms:

a) When two or more entries are merged, the accession numbers from all entries are kept. The first accession number is referred to as the ‘Primary (citable) accession number’, while the others are referred to as ‘Secondary accession numbers’. These are listed in alphanumerical order.

b) If an existing entry is split into two or more entries (‘demerged’), new ‘primary’ accession numbers are attributed to all the split entries while all original accession numbers are retained as ‘secondary’ accession numbers.

(from: http://www.uniprot.org/manual/accession_numbers)

Also be aware that you "should always use the primary accession number of an entry in any citation and link since it is the only unique stable identifier for an entry."
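Because of the merge/demerge rules quoted above, an accession you stored years ago may now be a secondary accession of one or more entries. A minimal sketch, assuming the UniProtKB flat-file convention that accessions sit on lines starting with `AC   `, each terminated by `;`, with the primary accession listed first. The function names and the sample AC lines are hypothetical, chosen to mirror the P06842 demerge discussed in this thread rather than copied from the live entries.

```python
from collections import defaultdict


def parse_accessions(entry_text: str) -> tuple[str, list[str]]:
    """Return (primary, secondaries) from the AC lines of one flat-file entry."""
    accessions: list[str] = []
    for line in entry_text.splitlines():
        if line.startswith("AC   "):
            # Each accession is terminated by ';'; an entry may have several AC lines.
            accessions.extend(a.strip() for a in line[5:].split(";") if a.strip())
    if not accessions:
        raise ValueError("no AC line found")
    # The first accession is the primary (citable) one; the rest are secondary.
    return accessions[0], accessions[1:]


def secondary_to_primaries(entries: list[str]) -> dict[str, set[str]]:
    """Map each secondary accession to every primary that retains it.

    After a demerge, one old accession can point to several new entries.
    """
    index: defaultdict[str, set[str]] = defaultdict(set)
    for text in entries:
        primary, secondaries = parse_accessions(text)
        for acc in secondaries:
            index[acc].add(primary)
    return dict(index)


# Illustrative AC lines, not copied from the live entries.
entries = ["AC   D6VTK4; P06842;\n", "AC   P0CI39; P06842;\n"]
print(parse_accessions(entries[0]))  # ('D6VTK4', ['P06842'])
print(secondary_to_primaries(entries))
```

Returning a set of primaries rather than a single one is deliberate: a demerged accession no longer identifies one entry, which is exactly why only the primary accession should be cited.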

(Oh, and the P actually meant "protein", but then they ran out of P's.)


@chris, new accession numbers only for splits and mergers is the theory. In real life, these things happen quite a lot. Just have a look at how many yeast proteins have accession numbers starting with C, D, E... although they have been in the database since the genome was completed. Take, e.g., STE2_YEAST, which is now http://www.uniprot.org/uniprot/D6VTK4 but used to be http://www.uniprot.org/uniprot/P06842.txt?version=8 or even http://www.uniprot.org/uniprot/P06842.txt?version=1, when they were still using dollar signs to separate the species name.


Well, if you trace it back, http://www.uniprot.org/uniprot/P06842?version=* says that P06842 was "Demerged into D6VTK4 and P0CI39.", so that really seems to be a split. These things might just occur more often than you think.


I would not call this one a split, as the sequences and species ID for P06842 and D6VTK4 are identical. What happened here is the following: there used to be one Swiss-Prot entry for budding yeast STE2 with the accession number P06842. Then somebody sequenced another strain of budding yeast, and the STE2 sequence happens to be identical. Rather than doing the logical thing (giving the new sequence a new accession number), the UniProt philosophy argues that up to this point, the old entry represented both strains, and now they have to 'de-merge' them to create separate entries.

13.4 years ago

The first entries in the database, of the form P#####, began with P for protein. Then, I believe, Q was added as a prefix, followed by O. Other designations, such as Q3TET3, came later.

13.4 years ago
Rm 8.3k

Find the info here: http://www.uniprot.org/manual/accession_numbers


I had seen that, but either I'm being thick or it doesn't answer my question.
