Hello all :)
I created an unmapped BAM (uBAM) with all my metadata in it, including an RGID in the format:
@RG ID:HWI-ST552.C5WGPACXX.1.AGTTCC SM:B6.BKS-Leprdb/J LB:Mm08.WGBS PL:illumina CN:Essen DS:Obese
I then used bwa-meth to map my trimmed reads, which seems to have inserted its own RGID for every read:
HWI-ST552:160:C2J56ACXX:2:1211:16219:92907 99 chr1 3000373 44 101M = 3000544 272 TATGTTTTTTAGTTAGTTTGGTTAAGGGTTTATTTATTTTGTTGATTTTTTTAAAGAATTAGTTATTAGTTTGGTTGATTTTTTGAATATTTTTTTTTGTT @?@DFFFFHHHHFGHGIIJJJEHEHGIJEFHHHIJIIIJJGHH@<FGHHIGICCAA=EEECEHFHHEA@CDFFDA?BDCDDDDD?@CCEDDEEDDDDD<BC NM:i:1 MD:Z:68T32 AS:i:98 XS:i:93 RG:Z:44_Mm01_WEAd_C2_WGBS_E_1_L002__trimmed YC:Z:CT YD:Z:f
But now, surprisingly, when I use Picard's MergeBamAlignment, even though the only attribute I asked it to keep was ATTRIBUTES_TO_RETAIN=XS, I end up with a file that has the RGID from the unmapped BAM in the header and the RGID from bwa-meth on every read. I also seem to have lost the YC and YD tags that bwa-meth adds. I'm not sure how important these tags are (or what they are used for).
HWI-ST552:160:C2J56ACXX:2:1211:16219:92907 99 chr1 3000373 44 101M = 3000544 272 TATGTTTTTTAGTTAGTTTGGTTAAGGGTTTATTTATTTTGTTGATTTTTTTAAAGAATTAGTTATTAGTTTGGTTGATTTTTTGAATATTTTTTTTTGTT @?@DFFFFHHHHFGHGIIJJJEHEHGIJEFHHHIJIIIJJGHH@<FGHHIGICCAA=EEECEHFHHEA@CDFFDA?BDCDDDDD?@CCEDDEEDDDDD<BC MC:Z:101M MD:Z:68T32 PG:Z:bwa-meth RG:Z:44_Mm01_WEAd_C2_WGBS_E_1_L002__trimmed NM:i:1 MQ:i:60 UQ:i:31 AS:i:98 XS:i:93
I now can't merge reads, mark duplicates, etc. until this is fixed, because validation fails with:
ERROR: Record 1, Read name HWI-ST552:217:C5WGPACXX:1:1209:10866:87788, RG ID on SAMRecord not found in header: 44_Mm08_WEAd_Db2_WGBS_E_1_L001__trimmed
WARNING: Read name HWI-ST552:217:C5WGPACXX:1:1209:10866:87788, A record is missing a read group
At the moment none of my files have been merged, so one option would be to set the RGID of each read to the RGID in the header. What is the easiest way to do this? Also, is this a bug that I should report on Picard's GitHub, or is this a feature?
I still have the original bwa-meth mapped bams, and the uBAM, and the merged BAMs, so whatever is easiest to move forward I can do :)
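To make the "set the RGID of each read to the RGID in the header" option concrete, here is a rough sketch of what I have in mind. It works on SAM text rather than on the BAM directly, and it assumes each file has exactly one @RG line in its header (true for my per-lane files, but an assumption in general). The function names header_rg_id, set_rg and fix_sam are just ones I made up for illustration. Picard's AddOrReplaceReadGroups or samtools addreplacerg may well be the more standard way to do this; I haven't checked how they interact with MergeBamAlignment output.

```python
def header_rg_id(header_lines):
    """Return the ID of the single @RG header line.

    Raises ValueError unless exactly one @RG ID is present, because with
    several read groups we could not know which one a read belongs to.
    """
    ids = []
    for line in header_lines:
        if line.startswith("@RG"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("ID:"):
                    ids.append(field[3:])
    if len(ids) != 1:
        raise ValueError(f"expected exactly one @RG ID, found {len(ids)}")
    return ids[0]


def set_rg(record, rg_id):
    """Replace (or add) the RG:Z tag on one tab-separated SAM record."""
    fields = record.rstrip("\n").split("\t")
    # Columns 1-11 are the mandatory SAM fields; optional tags start at 12.
    tags = [t for t in fields[11:] if not t.startswith("RG:Z:")]
    tags.append("RG:Z:" + rg_id)
    return "\t".join(fields[:11] + tags)


def fix_sam(lines):
    """Yield SAM lines (no trailing newline) with every record's RG tag
    set to the read group ID declared in the header."""
    header = []
    rg_id = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            # Header lines pass through unchanged.
            header.append(line)
            yield line
        else:
            if rg_id is None:
                # First alignment record: the header is complete by now.
                rg_id = header_rg_id(header)
            yield set_rg(line, rg_id)
```

The idea would be to feed it the output of `samtools view -h merged.bam`, write each yielded line back out with a newline, and pipe the result into `samtools view -b -o fixed.bam -`. All other tags are left untouched, so this would not bring back the missing YC and YD tags.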