For the purposes of updating some tools and making them more widely available, I'd like to know if formal rules have ever been defined for what characters are allowed for naming genomic annotations.
For instance, I have seen some dimers glued together with two colons (geneA::geneB) in annotation tables. Are the allowed characters a strict subset of ASCII? Do researchers in China or France, say, use extended or other character sets to name things?
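Until I find a specification, one rough way to see what actually shows up in existing tables is to count any characters outside printable ASCII in a name column and to split `::`-joined names into their parts. The helper `survey_names` and the example names below are hypothetical. This is only an empirical check and does not encode any official naming rule.

```python
from collections import Counter


def survey_names(names):
    """Count characters outside printable ASCII and split '::'-joined names.

    Hypothetical helper for illustration only; it does not implement
    any official annotation naming specification.
    """
    non_ascii = Counter()
    fused = []
    for name in names:
        for ch in name:
            # Printable ASCII is code points 0x20 through 0x7E.
            if not (0x20 <= ord(ch) <= 0x7E):
                non_ascii[ch] += 1
        # Treat '::' as the separator between the partners of a dimer.
        if "::" in name:
            fused.append(tuple(name.split("::")))
    return non_ascii, fused


counts, fused = survey_names(["geneA::geneB", "TP53", "Gène1"])
print(dict(counts))  # {'è': 1}
print(fused)         # [('geneA', 'geneB')]
```

Running this over the name column of a real annotation table would show whether any non-ASCII characters are in use in practice, though it says nothing about what is formally permitted.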
I'm not really sure what's out there. I'd be grateful for any pointers to specification documents, if any exist.
The first link is more about what is named than how it is named, but the second link gets a bit closer to the formal definitions I am after. Thanks for that!