I recommend the Unified Informatic Identifier (UINI) scheme. I designed it to:
- Allow universal tracking of samples and organisms through the wet lab, to the products of in silico analysis
- Allow independent teams of scientists to merge and share data with databanks
- Provide for identification of relatively large numbers of items
- Provide for efficient computation and transcoding by machines
- Assure low cost of adoption
- Assure low cost of administration
- Remain viable as an identifier scheme for a period of decades
- Eliminate idiosyncratic human affordances which will soon become immaterial due to automation
The UINI scheme is free. It requires no centralized administration. However, it is not widely used, and would surely benefit from peer review, public ridicule, or whatever.
There is currently no standard barcode format for UINIs. Perhaps you can suggest one, or offer feedback to the community if you adopt UINIs in your LIMS.
The UINI scheme is more suited to computers than humans. I love humans, but they usually dislike manual data entry; they do it slowly and inaccurately. A barcode scanner is a better solution. We have already seen the advent of robotic wet labs and robotic sample libraries. For these reasons, and more, the UINI scheme favors automated processing over manual human processing. If you are in charge of an enterprise, it will probably be cheaper to deploy barcode scanners in your wet labs than to hire legions of computer programmers to cope with quirky, inward-looking identifier schemes. Barcode scanners that emulate computer keyboards are now available and require little in the way of custom software integration. Depending on your specific requirements, barcodes can sometimes be printed on standard printers with free or inexpensive software.
The UINI scheme does NOT encode data within the identifier itself. A UINI is a sequence of numbers and letters. For reasons too numerous to list here, encoding data within identifiers is generally a terrible idea. Encoding data within identifiers is suitable for insular groups of humans who are working manually, so the practice refuses to die and is widely misapplied everywhere else. The UINI scheme avoids this practice, because it would constitute malpractice in the broader contexts to which the UINI scheme pertains.
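As a rough illustration of keeping data out of the identifier, the sketch below stores sample metadata in a lookup table keyed by an opaque identifier. It uses Python's `uuid.uuid4()` purely as a stand-in; that is an assumption for illustration and not the UINI format, which is defined in the specification. The `SampleRecord` fields and the `register_sample` helper are likewise hypothetical.

```python
import uuid
from dataclasses import dataclass


@dataclass
class SampleRecord:
    # Hypothetical metadata fields; they live in the record, never in the identifier.
    species: str
    tissue: str
    collected: str  # ISO 8601 date


# Maps opaque identifiers to their metadata records.
registry = {}


def register_sample(record):
    """Assign an opaque identifier (a random UUID as a stand-in, not a real UINI)."""
    identifier = str(uuid.uuid4()).upper()
    registry[identifier] = record
    return identifier


sample_id = register_sample(SampleRecord("Mus musculus", "liver", "2012-12-01"))

# Correcting the metadata does not change the identifier, so printed
# barcodes and cross-references held elsewhere stay valid.
registry[sample_id].tissue = "kidney"
```

Had the tissue been encoded in the identifier, the correction on the last line would have forced a new identifier and a relabelled sample.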
If you operate a LIMS, pipeline, or databank, you can standardize on the UINI scheme to refer to all classes of data. Whatever scheme you adopt, though, there will be issues related to historical data and identifier mapping. These issues are not unique to the UINI scheme. If you've been operating for any length of time, you may already have identifier mapping solutions in place.
You can download the UINI specification online as a PDF. The first public UINI specification, dated December 2012, was 36 pages long and was identified by 59D25EAD-3C7C-4871-B9B3-76D559F0DC22. The hyperlink takes you to an index, which should be updated in the future to include revisions to the UINI specification. If the hyperlink does not work, you can use your computer's software clipboard to copy and paste the document UUID into a search engine. The specification was written for people who construct information systems, so it may or may not be comprehensible to other readers.
If you identify a problem with the UINI scheme, please advertise your complaint so that the problem can be remedied or others will know to avoid it.
In a sufficiently narrow and well-regulated context, you may be able to get away with printing only the last several digits of a UINI on your sample, which could be sufficient to distinguish samples within that context. However, I strongly recommend a barcode-based solution that encodes the entire UINI.
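If you do print only a suffix, it is worth checking how many trailing characters are actually needed to keep the items in that context distinct. Here is a minimal sketch of such a check; the function name is hypothetical and the example strings are arbitrary placeholders, not real UINIs.

```python
def shortest_unique_suffix_length(identifiers):
    """Return the smallest n such that the last n characters of every
    identifier are distinct within this collection."""
    ids = list(identifiers)
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate identifiers cannot be distinguished by any suffix")
    longest = max((len(i) for i in ids), default=0)
    for n in range(1, longest + 1):
        # Collect the last n characters of each identifier and look for collisions.
        suffixes = {i[-n:] for i in ids}
        if len(suffixes) == len(ids):
            return n
    return longest  # only reached for an empty collection


# Placeholder identifiers for one batch of samples (not real UINIs).
batch = ["A1B2C3D4", "A1B2C3E4", "A1B2F3D4"]
n = shortest_unique_suffix_length(batch)
short_labels = [i[-n:] for i in batch]
print(n, short_labels)  # 4 ['C3D4', 'C3E4', 'F3D4']
```

The result only holds for the batch you checked. Adding a sample later can introduce a collision, which is one more reason to prefer a barcode carrying the full identifier.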
Thanks to barcode scanners, hyperlinks, and software clipboards, nobody should have to manually type a UINI, so that issue should be moot.
I wouldn't normally have responded to this two-year-old discussion thread, but this question re-appeared in the Biostar feed about seven days ago, so perhaps the question is still relevant. I also don't like to engage in self-promotion, but the UINI scheme is free, and I don't want to see another quirky identifier scheme proliferate. I tried to invent an identifier scheme that will be broadly workable.
Identification of scientific samples in LIMS was a specific concern when I conceived the UINI, because I was interested in being able to trace genetic and proteomic information back to the specific sample or individual animal from which it was obtained. Hopefully the UINI scheme will prove helpful to you or your colleagues.
Thanks indeed for your answer Jeremy, very interesting :)
What's your take on a "human readable, extensible, and scalable" label/string? What did they actually look like on your system?
I mean it can be as simple as experiment.condition.sample.individual.run or whatever your workflow entails. The point is to avoid painting yourself into a corner, so that if you need to run a sample twice or divide a sample into two conditions, you are still able to produce a barcode within your system.
In our system I believe it was experiment.condition.plate.flat.run
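A minimal sketch of the dotted-label idea, assuming the field names from the experiment.condition.sample.individual.run example; the `Label` class, its `rerun` and `split` helpers, and the sample values are all hypothetical and not taken from any particular LIMS.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Label:
    experiment: str
    condition: str
    sample: str
    individual: str
    run: int = 1

    def __str__(self):
        # Join the fields in a fixed order to form the printable label.
        return ".".join([self.experiment, self.condition, self.sample,
                         self.individual, str(self.run)])

    def rerun(self):
        """Same sample run again: only the run counter changes."""
        return replace(self, run=self.run + 1)

    def split(self, conditions):
        """Divide one sample across several conditions, restarting the run count."""
        return [replace(self, condition=c, run=1) for c in conditions]


first = Label("exp7", "control", "s12", "mouse3")
print(first)          # exp7.control.s12.mouse3.1
print(first.rerun())  # exp7.control.s12.mouse3.2
print([str(label) for label in first.split(["heat", "cold"])])
```

Because reruns and splits are derived from an existing label, either case still yields a string that can be turned into a barcode.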