Hello, I have a set of BAM files that were generated using the DRAGEN Somatic analysis pipeline. The read groups in the BAM headers are formatted as follows:
@RG ID:index.1 LB:UnknownLibrary PU:1 SM:sampleID
@RG ID:index.2 LB:UnknownLibrary PU:2 SM:sampleID
@RG ID:index.3 LB:UnknownLibrary PU:3 SM:sampleID
@RG ID:index.4 LB:UnknownLibrary PU:4 SM:sampleID
@RG ID:index.5 LB:UnknownLibrary PU:5 SM:sampleID
@RG ID:index.6 LB:UnknownLibrary PU:6 SM:sampleID
@RG ID:index.7 LB:UnknownLibrary PU:7 SM:sampleID
@RG ID:index.8 LB:UnknownLibrary PU:8 SM:sampleID
I want to update every PU field to include the flow cell ID, keeping the lane number and leaving all other fields unchanged, like this:
@RG ID:index.1 LB:UnknownLibrary PU:XXXXXXX.1 SM:sampleID
@RG ID:index.2 LB:UnknownLibrary PU:XXXXXXX.2 SM:sampleID
@RG ID:index.3 LB:UnknownLibrary PU:XXXXXXX.3 SM:sampleID
@RG ID:index.4 LB:UnknownLibrary PU:XXXXXXX.4 SM:sampleID
@RG ID:index.5 LB:UnknownLibrary PU:XXXXXXX.5 SM:sampleID
@RG ID:index.6 LB:UnknownLibrary PU:XXXXXXX.6 SM:sampleID
@RG ID:index.7 LB:UnknownLibrary PU:XXXXXXX.7 SM:sampleID
@RG ID:index.8 LB:UnknownLibrary PU:XXXXXXX.8 SM:sampleID
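To make the intended transformation concrete, here is a minimal Python sketch that works on the header text only. It assumes the header is available as plain, tab-separated SAM text (for example, as printed by `samtools view -H`). The function name `add_flowcell_to_pu` and the `flowcell_id` parameter are hypothetical names chosen for illustration, and `XXXXXXX` stands in for the real flow cell ID.

```python
def add_flowcell_to_pu(header_text: str, flowcell_id: str) -> str:
    """Prefix the PU value of every @RG line with '<flowcell_id>.'.

    Assumes tab-separated SAM header text. All other header lines and
    all other @RG fields are passed through unchanged.
    """
    prefix = f"{flowcell_id}."
    out = []
    for line in header_text.splitlines(keepends=True):
        if line.startswith("@RG\t"):
            body = line.rstrip("\r\n")
            ending = line[len(body):]  # preserve the original line ending
            fields = []
            for field in body.split("\t"):
                if field.startswith("PU:"):
                    value = field[3:]
                    # Skip values that already carry the prefix, so the
                    # function can be run twice without doubling it.
                    if not value.startswith(prefix):
                        field = f"PU:{prefix}{value}"
                fields.append(field)
            line = "\t".join(fields) + ending
        out.append(line)
    return "".join(out)
```

This only rewrites the header text. The new header would still have to be written back into each BAM, for which `samtools reheader` looks like a candidate, though that step is not verified here against DRAGEN output.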
I am struggling to find a quick way to do this with samtools or Picard, but I would be happy to be proven wrong. Feel free to point me to existing answers for this specific problem; I have not found one yet.
TIA for the help.
Is this on-board DRAGEN, BaseSpace, or ICA?