I'm running the data processing pipeline qscript provided by the GATK team to process data according to their best practices. Looking at the files it generates, I've noticed that it adds some tags to the SAM file that are non-standard (at least I can't find an explanation for them in the SAM format specification) and that are not in the X/Y/Z "namespace" that the specification reserves for user-defined tags. Google has not turned up anything so far, so I'm asking here to see if anyone can tell me what the "BD" and "BI" tags are. Understanding this would hopefully help me understand why they hold identical values in my test files and, further down the road, whether this is a problem or not. Below is an example of what my SAM files look like:
30PPJAAXX090125:1:48:836:440#0 147 chr1 97289 0 76M = 97204 -161 TTGTTGAGTTATTTATGTATATAATTTCATGCAATCTTCATGTTATGGGGATGTTCTAATCCACTGTGACTCTGTC &'"&(()""'"1"&)"&")''")'&""""'"))'"'&"")(""'"(&&&"'""""'")"&&)-'("((0'0''""& BD:Z:NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN PG:Z:0 RG:Z:exampleBAM.bam BI:Z:NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN SM:Z:exampleBAM.bam BQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ MQ:i:0
My guess is that these tags are related to the base quality recalibration or the indel realignment, but I'm hoping to confirm this.
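For anyone who wants to check the same thing in their own files, here is a minimal sketch of how the optional tags on a SAM record could be pulled out and the "BD" and "BI" values compared. It assumes tab-delimited fields with the 11 mandatory columns first, as the SAM specification describes. The function name `parse_optional_tags` and the shortened record are made up for illustration and are not my real data or anything from GATK.

```python
def parse_optional_tags(sam_line):
    """Return a dict mapping each optional tag to its value.

    Assumes a tab-delimited SAM alignment line: 11 mandatory fields
    followed by optional fields of the form TAG:TYPE:VALUE.
    """
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        # Split on the first two colons only; Z-type values may contain colons.
        tag, type_code, value = field.split(":", 2)
        # Convert integer-typed tags; keep every other type as a string.
        tags[tag] = int(value) if type_code == "i" else value
    return tags


# Hypothetical, shortened record modelled on the example above.
record = "\t".join([
    "read1", "147", "chr1", "97289", "0", "8M", "=", "97204", "-93",
    "TTGTTGAG", "IIIIIIII",
    "BD:Z:NNNNNNNN", "PG:Z:0", "RG:Z:exampleBAM.bam",
    "BI:Z:NNNNNNNN", "MQ:i:0",
])

tags = parse_optional_tags(record)
print(sorted(tags))               # tag names present on the record
print(tags["BD"] == tags["BI"])   # whether BD and BI hold identical values
```

Running this over every line of a file that does not start with "@" would show whether BD and BI are identical on all reads or only on some.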