Modify read groups in BAM file
4 days ago
Alex

Hello, I have a set of BAM files that were generated using the DRAGEN Somatic analysis pipeline. The BAM file read groups are formatted as below:

@RG ID:index.1  LB:UnknownLibrary   PU:1    SM:sampleID
@RG ID:index.2  LB:UnknownLibrary   PU:2    SM:sampleID
@RG ID:index.3  LB:UnknownLibrary   PU:3    SM:sampleID
@RG ID:index.4  LB:UnknownLibrary   PU:4    SM:sampleID
@RG ID:index.5  LB:UnknownLibrary   PU:5    SM:sampleID
@RG ID:index.6  LB:UnknownLibrary   PU:6    SM:sampleID
@RG ID:index.7  LB:UnknownLibrary   PU:7    SM:sampleID
@RG ID:index.8  LB:UnknownLibrary   PU:8    SM:sampleID

I want to update every PU field to include the flow cell ID, keeping the other fields and the lane number, like this:

@RG ID:index.1  LB:UnknownLibrary   PU:XXXXXXX.1    SM:sampleID
@RG ID:index.2  LB:UnknownLibrary   PU:XXXXXXX.2    SM:sampleID
@RG ID:index.3  LB:UnknownLibrary   PU:XXXXXXX.3    SM:sampleID
@RG ID:index.4  LB:UnknownLibrary   PU:XXXXXXX.4    SM:sampleID
@RG ID:index.5  LB:UnknownLibrary   PU:XXXXXXX.5    SM:sampleID
@RG ID:index.6  LB:UnknownLibrary   PU:XXXXXXX.6    SM:sampleID
@RG ID:index.7  LB:UnknownLibrary   PU:XXXXXXX.7    SM:sampleID
@RG ID:index.8  LB:UnknownLibrary   PU:XXXXXXX.8    SM:sampleID

I am struggling to find a quick way to do this with samtools or Picard, but I would be happy to be proven wrong. Feel free to point me to existing answers for this specific problem; I have not found one yet.

Thanks in advance for the help.

samtools BAM picard DRAGEN

Is this on-board DRAGEN, or DRAGEN on BaseSpace or ICA?

4 days ago

I agree that read group handling in these tools is surprisingly convoluted, both in syntax and in efficiency.

It would take me many tries to get the syntax right, and I expect it would take "forever" to run and might need multiple passes.

What I would do instead is bypass the whole thing and rewrite the header manually, like so:

# extract the header only
samtools view -H alignment.bam  > head.sam

# extract the alignment records (without the header)
samtools view alignment.bam > body.sam

Now edit head.sam so the PU fields contain what you want, then rejoin the two SAM files into a BAM:

cat head.sam body.sam | samtools view -b > out.bam
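The manual edit of head.sam can also be scripted. Below is a minimal Python sketch, assuming (as in the DRAGEN headers in the question) that each existing PU value is only the lane number; the helper name `add_flowcell_to_pu` is hypothetical, and `XXXXXXX` stands in for the real flow cell ID.

```python
import re

# Hypothetical helper. Assumes the current PU value is only the lane number,
# e.g. "PU:3", followed by a tab or the end of the line.
_PU_LANE = re.compile(r"(?<=\tPU:)(\d+)(?=\t|$)")


def add_flowcell_to_pu(header_text: str, flowcell: str) -> str:
    """Rewrite PU:<lane> as PU:<flowcell>.<lane> on every @RG line.

    All other header lines (@HD, @SQ, @PG, ...) pass through unchanged.
    """
    out = []
    for line in header_text.splitlines(keepends=True):
        if line.startswith("@RG\t"):
            # Replace only the digits; the "\tPU:" prefix is kept by the lookbehind.
            line = _PU_LANE.sub(lambda m: f"{flowcell}.{m.group(1)}", line)
        out.append(line)
    return "".join(out)
```

For example, read head.sam, pass its text and your flow cell ID to `add_flowcell_to_pu`, and write the result back before running the `cat ... | samtools view -b` step. Because a PU value that already contains the flow cell is no longer all digits, running it twice leaves the header unchanged.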

If automation is needed, I would write a Python program that runs the samtools commands above as subprocesses :-)
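A sketch of that automation, under stated assumptions: `samtools` is on `PATH`, PU holds only the lane number, and the function names and the `XXXXXXX` flow cell placeholder are hypothetical. It runs the same samtools commands through Python's `subprocess` module, but streams the records into the writer instead of saving body.sam to disk.

```python
import re
import shutil
import subprocess

# Hypothetical PU rewrite: on @RG lines, PU:<lane> -> PU:<flowcell>.<lane>.
_RG_PU = re.compile(r"^(@RG\t.*?\tPU:)(\d+)(?=\t|$)", re.MULTILINE)


def add_flowcell_to_pu(header_text: str, flowcell: str) -> str:
    return _RG_PU.sub(lambda m: f"{m.group(1)}{flowcell}.{m.group(2)}", header_text)


def rewrite_bam(in_bam: str, out_bam: str, flowcell: str) -> None:
    """Extract the header, edit the PU fields, and rejoin it with the records."""
    # Equivalent of: samtools view -H in.bam > head.sam
    header = subprocess.run(
        ["samtools", "view", "-H", in_bam],
        check=True, capture_output=True, text=True,
    ).stdout
    new_header = add_flowcell_to_pu(header, flowcell)

    # Equivalent of: cat head.sam body.sam | samtools view -b > out.bam,
    # with the records streamed from `samtools view in.bam` instead of body.sam.
    with subprocess.Popen(["samtools", "view", in_bam],
                          stdout=subprocess.PIPE) as body, \
         subprocess.Popen(["samtools", "view", "-b", "-o", out_bam, "-"],
                          stdin=subprocess.PIPE) as writer:
        writer.stdin.write(new_header.encode())
        shutil.copyfileobj(body.stdout, writer.stdin)
        writer.stdin.close()  # signal EOF so the writer can finish

    # Both processes have been waited on when the with-block exits.
    if body.returncode != 0 or writer.returncode != 0:
        raise RuntimeError(f"samtools failed while rewriting {in_bam}")
```

For example, `rewrite_bam("alignment.bam", "out.bam", "XXXXXXX")`; calling it in a loop over your BAM files (each with its own flow cell ID) handles a batch.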

Thanks Istvan, this seems like a reasonable solution that could be scripted across large batches of files. I'll give it a shot. I was really hoping I was missing something obvious in samtools or Picard, but it appears not. Separately, I've asked Illumina why their pipeline doesn't include this information by default.
