As the SAM/BAM spec says:
Note that tags starting with 'X', 'Y' and 'Z' or tags containing lowercase letters in either position are reserved for local use and will not be formally defined in any future version of this specification.
These optional tags are used by all sorts of aligners and downstream programs. Some of them are so prevalent (like XM) that they are just as well known as the official tags.
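The rule quoted above is mechanical enough to check in code. Here is a minimal Python sketch, assuming the usual two-character tag shape (a letter followed by a letter or digit); the function name `is_local_use_tag` is made up for illustration and is not part of any library.

```python
import re

# Assumed tag shape: a letter followed by a letter or digit.
TAG_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]")


def is_local_use_tag(tag: str) -> bool:
    """Return True if the tag falls in the range reserved for local use.

    Hypothetical helper: a tag is local-use if it starts with X, Y or Z,
    or contains a lowercase letter in either position.
    """
    if not TAG_PATTERN.fullmatch(tag):
        raise ValueError(f"not a valid two-character SAM tag: {tag!r}")
    starts_with_xyz = tag[0] in "XYZ"
    has_lowercase = any(ch.islower() for ch in tag)
    return starts_with_xyz or has_lowercase


if __name__ == "__main__":
    for candidate in ("XM", "NM", "xo", "Xo"):
        print(candidate, is_local_use_tag(candidate))
```

Note that this only says whether a tag is in the reserved range, not whether some other program already uses it.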
After some interesting discussion here, I am thinking it would be pretty neat to have an "optical duplicate" tag, and/or a PCR duplicate and biological duplicate tag, to differentiate between the three. Currently the duplicate bit of the FLAG field (0x400, i.e. binary 010000000000) is being used for duplicates, but it doesn't differentiate between the three.
So before I modify Anna's script (from the above thread) to tag reads rather than delete them, I'm wondering if there is a list of known or common user tags out there that I can check against, so that I choose a new one rather than an existing one. Probably I would choose XO (optical), XP (PCR), XB (Biological) -- but one or all might already be taken! :)
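For what "tag reads rather than delete them" could look like, here is a rough Python sketch working on plain-text SAM lines. It is not Anna's script. The function name `tag_duplicate`, the lowercase tag `xd` and the value strings are all hypothetical choices for illustration, and deciding which kind of duplicate a read is happens elsewhere (the caller passes it in).

```python
DUPLICATE_FLAG = 0x400  # FLAG bit that marks a read as a PCR or optical duplicate


def tag_duplicate(sam_line: str, kind: str, tag: str = "xd") -> str:
    """Append tag:Z:kind to a duplicate-flagged alignment line.

    Header lines and reads without the duplicate bit are returned unchanged.
    The tag name 'xd' is a made-up local-use tag, not an established one.
    """
    if sam_line.startswith("@"):
        return sam_line  # header line, nothing to tag
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if not flag & DUPLICATE_FLAG:
        return sam_line
    # Keep the 11 mandatory fields, drop any earlier copy of our tag, then append it.
    kept = fields[:11] + [f for f in fields[11:] if not f.startswith(tag + ":")]
    kept.append(f"{tag}:Z:{kind}")
    newline = "\n" if sam_line.endswith("\n") else ""
    return "\t".join(kept) + newline


if __name__ == "__main__":
    read = "r1\t1024\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0"
    print(tag_duplicate(read, "optical"))
```

The read stays in the file with its duplicate bit set, so tools that only look at the FLAG field behave exactly as before; the extra tag just carries the finer distinction.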
There's definitely no authoritative source for the custom tags. If you really want to make sure you're not using a tag anyone else is then you should be pretty safe with lower case tags. I almost never see those.
BTW, I think bwa uses XO for something (no clue if it's bwa mem or bwa aln).
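Since there is no authoritative list, one practical check is to look at which tags already appear in the files your own pipeline produces. A small Python sketch, assuming plain-text SAM input (for BAM you would pipe through something like `samtools view` first); `count_tags` is a hypothetical helper name.

```python
from collections import Counter


def count_tags(sam_lines):
    """Count how often each optional tag appears across alignment lines.

    Hypothetical helper: header lines (starting with '@') and blank lines
    are skipped; optional fields are everything after the 11 mandatory columns.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        for field in line.rstrip("\n").split("\t")[11:]:
            # Optional fields look like TAG:TYPE:VALUE, e.g. NM:i:1
            if len(field) >= 5 and field[2] == ":":
                counts[field[:2]] += 1
    return counts


if __name__ == "__main__":
    lines = [
        "@HD\tVN:1.6",
        "r1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\tXO:i:0\tNM:i:1",
    ]
    in_use = count_tags(lines)
    for candidate in ("XO", "XP", "XB"):
        print(candidate, "already present" if candidate in in_use else "not seen")
```

This only tells you about the files you scan. A tag that is absent here can still be used by an aligner you haven't run yet.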
Tags starting with X, Y, and Z are fair game. If you want to write software that is stable, robust, compatible, and future-proof... do not use those tags. Do not generate or parse them (by default). If you do, you will end up with brittle software that is version-specific and cannot be switched to an alternative program.
Internally, feel free to use any XYZ tag for anything you want. That's the whole point - to allow internal custom use without changing the API. Anyone who requires a custom tag on a standard format, for externally-accessible software... is doing it wrong. If it's really that crucial, they need to talk to the standards committee and make it a standard tag.
Making observations into de-facto-official standards destroys standards.
That makes a lot of sense - particularly, as you say, I can't control who else wants to use the same tags I use. A new mapper might come out that uses all the tags I use, and now we're incompatible. I suppose, being the author of BBMap, you know these issues better than anyone.
Having said that, I always saw the tagging system as a way to improve upon the standard, rather than to only be used internally. I guess it all comes down to the fact that there is no authoritative source for tags, or description of what they are and what they should be used for. Perhaps if there was, the standard could be extended reliably.
Personally, I really wish there was an "explain sam flags" for tags, even if it wasn't authoritative.
BBMap has various custom tags, but I don't use them as interfaces. They display internal state, rather than sending information to the next process in the pipeline. It takes a huge amount of effort to ensure your software is compliant with "popular" tags (and the general case is impossible, since they can conflict or be insufficiently specified); ensuring compliance with official tags is already difficult enough!
It's a valid use to develop internal pipelines that use "sam" files which require specific unofficial fields that are created by your internal software. But, it is bad practice to publish and promote such things externally, as it fragments the standard.