Question

Is there an authoritative source for optional BAM tags?

1

8.7 years ago

John 13k

As the SAM/BAM spec says:

Note that tags starting with 'X', 'Y' and 'Z' or tags containing lowercase letters in either position are reserved for local use and will not be formally defined in any future version of this specification.
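The rule quoted above is mechanical enough to check in code. This is a minimal sketch, not anything from the spec's tooling: the function name `is_local_use_tag` is hypothetical, and it assumes the two-character tag shape `[A-Za-z][A-Za-z0-9]` given in the SAM spec.

```python
import re

# Tag shape as given in the SAM spec: one letter, then a letter or digit.
TAG_RE = re.compile(r"[A-Za-z][A-Za-z0-9]")


def is_local_use_tag(tag: str) -> bool:
    """Return True if `tag` falls in the range reserved for local use.

    Hypothetical helper: local-use means the tag starts with X, Y or Z,
    or contains a lowercase letter in either position.
    """
    if not TAG_RE.fullmatch(tag):
        raise ValueError(f"not a valid two-character SAM tag: {tag!r}")
    starts_with_xyz = tag[0] in "XYZ"
    has_lowercase = any(ch.islower() for ch in tag)
    return starts_with_xyz or has_lowercase


if __name__ == "__main__":
    for tag in ("NM", "XO", "xo", "Nm"):
        print(tag, is_local_use_tag(tag))
```

A tag for which this returns `False` sits in the space the spec may define formally, so it is not a safe choice for a private annotation.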

These optional tags are used by all sorts of aligners and downstream programs. Some of them are so prevalent (like XM) that they are just as well known as the official tags.

After some interesting discussion here, I am thinking it would be pretty neat to have an "optical duplicate" tag, and/or a PCR duplicate and biological duplicate tag, to differentiate between the three. Currently the duplicate bit of the FLAG field (0x400, i.e. binary 010000000000) is used for duplicates, but it doesn't differentiate between the three.
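For reference, testing that single duplicate bit on a plain-text SAM record looks roughly like this. It is a sketch that assumes tab-separated SAM text with FLAG in the second column; the helper name `is_duplicate` and the example record are made up for illustration.

```python
DUPLICATE_FLAG = 0x400  # 1024; written in binary this is 010000000000


def is_duplicate(sam_line: str) -> bool:
    """Return True if the alignment line has the duplicate bit set in FLAG."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # FLAG is the second mandatory column
    return bool(flag & DUPLICATE_FLAG)


if __name__ == "__main__":
    # Made-up record: FLAG 1040 = 1024 (duplicate) + 16 (reverse strand).
    record = "r1\t1040\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"
    print(is_duplicate(record))
```

The bit only says "duplicate"; nothing in FLAG records whether the duplicate was optical, PCR, or biological, which is why a separate tag would be needed.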

So before I modify Anna's script (from the above thread) to tag reads rather than delete them, I'm wondering if there is a list of known or common user tags out there that I can check against, so that I choose a new one rather than an existing one. Probably I would choose XO (optical), XP (PCR), XB (Biological) -- but one or all might already be taken! :)
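Tagging instead of deleting could be sketched as below. This is not Anna's script: `add_tag` is a hypothetical helper, the `XP:i:1` encoding is only one possible choice, and it assumes plain-text SAM where optional fields start at column 12 in `TAG:TYPE:VALUE` form. It refuses to add a tag that the record already carries, which is exactly the clash being asked about.

```python
def add_tag(sam_line: str, tag: str, value_type: str, value: str) -> str:
    """Append TAG:TYPE:VALUE to a SAM record, refusing to reuse a tag.

    Hypothetical helper for illustration; raises ValueError on a clash.
    """
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("expected at least 11 mandatory SAM columns")
    # Optional fields follow the 11 mandatory columns.
    existing = {field.split(":", 1)[0] for field in fields[11:]}
    if tag in existing:
        raise ValueError(f"tag {tag} is already present on this record")
    fields.append(f"{tag}:{value_type}:{value}")
    return "\t".join(fields)


if __name__ == "__main__":
    # Made-up record that already carries an XO tag from some other tool.
    record = "r1\t1024\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tXO:i:0"
    print(add_tag(record, "XP", "i", "1"))
    try:
        add_tag(record, "XO", "i", "1")
    except ValueError as err:
        print("clash:", err)
```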

BAM Duplicates MarkDuplicates Picard • 2.7k views

ADD COMMENT • link updated 6.2 years ago by Ram 44k • written 8.7 years ago by John 13k

2

There's definitely no authoritative source for the custom tags. If you really want to make sure you're not using a tag that anyone else uses, then you should be pretty safe with lowercase tags. I almost never see those.

BTW, I think bwa uses XO for something (no clue if it's bwa mem or bwa aln).
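In the absence of a registry, one practical check is to survey which tags your own files already contain before picking a new one. A rough sketch, assuming plain-text SAM lines (for example piped from `samtools view -h`); the function name `count_tags` and the example records are hypothetical:

```python
from collections import Counter
from typing import Iterable


def count_tags(sam_lines: Iterable[str]) -> Counter:
    """Count how often each optional tag occurs across SAM records."""
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, carries no alignment tags
        fields = line.rstrip("\n").split("\t")
        # Optional fields follow the 11 mandatory columns as TAG:TYPE:VALUE.
        for field in fields[11:]:
            counts[field.split(":", 1)[0]] += 1
    return counts


if __name__ == "__main__":
    # Made-up input for illustration.
    lines = [
        "@HD\tVN:1.6\tSO:coordinate",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\tXO:i:0",
        "r2\t16\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:1",
    ]
    for tag, n in sorted(count_tags(lines).items()):
        print(tag, n)
```

This only tells you what the tools in your own pipeline emit; it says nothing about tools you have not run.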

ADD REPLY • link updated 6.2 years ago by Ram 44k • written 8.7 years ago by Devon Ryan 104k

1

Tags starting with X, Y, and Z are fair game. If you want to write software that is stable, robust, compatible, and future-proof... do not use those tags. Do not generate or parse them (by default). If you do, you will end up with brittle software that is version-specific and cannot be switched to an alternative program.

Internally, feel free to use any XYZ tag for anything you want. That's the whole point - to allow internal custom use without changing the API. Anyone who requires a custom tag on a standard format, for externally-accessible software... is doing it wrong. If it's really that crucial, they need to talk to the standards committee and make it a standard tag.

Making observations into de-facto-official standards destroys standards.

ADD REPLY • link updated 6.2 years ago by Ram 44k • written 8.7 years ago by Brian Bushnell 20k

0

That makes a lot of sense - particularly, as you say, because I can't control who else wants to use the same tags I use. A new mapper might come out that uses all the tags I use, and now we're incompatible. I suppose that, being the author of BBMap, you know these issues better than anyone.

Having said that, I always saw the tagging system as a way to improve upon the standard, rather than to only be used internally. I guess it all comes down to the fact that there is no authoritative source for tags, or description of what they are and what they should be used for. Perhaps if there was, the standard could be extended reliably.

Personally, I really wish there was an "explain sam flags" for tags, even if it wasn't authoritative.

ADD REPLY • link updated 6.2 years ago by Ram 44k • written 8.7 years ago by John 13k

1

BBMap has various custom tags, but I don't use them as interfaces. They display internal state, rather than sending information to the next process in the pipeline. It takes a huge amount of effort to ensure your software is compliant with "popular" tags (and the general case is impossible, since they can conflict or be insufficiently specified); ensuring compliance with official tags is already difficult enough!

It's a valid use to develop internal pipelines that use "sam" files which require specific unofficial fields that are created by your internal software. But, it is bad practice to publish and promote such things externally, as it fragments the standard.

ADD REPLY • link updated 6.2 years ago by Ram 44k • written 8.7 years ago by Brian Bushnell 20k

Ram · Answer 1 · 2016-02-28

1

8.7 years ago

John 13k

I just read this, and I don't quite know how I missed it so many times before, but:

You can freely add new tags, and if a new tag may be of general interest, you can email samtools-devel@lists.sourceforge.net to add the new tag to the specification. Note that tags starting with ‘X’, ‘Y’ and ‘Z’ or tags containing lowercase letters in either position are reserved for local use and will not be formally defined in any future version of this specification.

So maybe my question, as Brian points out, is wrong on principle. Maybe the question should be "why aren't popular mapping tools ensuring their X/Y/Z tags are put into the SAM spec!" :)

I suppose because it would have to be reserved retroactively...?

ADD COMMENT • link updated 6.2 years ago by Ram 44k • written 8.7 years ago by John 13k

1

You can also just make a PR on the hts-specs repo on GitHub. It has largely the same effect, and I get the feeling that most of the people on the samtools-devel list follow the repo.

I've seen a few requests to get tags added over the last year or two, but for the most part tags end up being really particular to a specific tool or workflow so it's hard to argue that they're general enough to get added to the spec.

ADD REPLY • link 8.7 years ago by Devon Ryan 104k