bbduk.sh trimming to BAM output file
6 days ago
bge • 0

Hi,

I want to use bbduk.sh to trim reads in a uBAM file and write the trimmed reads to a new uBAM file. It appears that the reads in the output file are not trimmed.

One set of bbduk.sh parameters that I tried is

literal=polyA k=12 mink=11 ktrim=r trimclip=t

I see trimmed reads when the output file format is fastq, but not when it is BAM or SAM.

Perhaps I am missing something?

Thank you!

Brent

bbduk.sh • 348 views
6 days ago
GenoMax 150k

Using uBAM input and writing uBAM output indeed does not work directly, probably because it is an edge case that @Brian likely did not test or code for.

The following seems to work for me for a single-end read file: convert uBAM to fastq, trim using bbduk, then write the result out as uBAM.

$ reformat.sh -Xmx3g in=test.bam out=stdout.fq int=f | bbduk.sh -Xmx3g in=stdin.fq out=stdout.fq int=f forcetrimleft=10 | reformat.sh -Xmx3g in=stdin.fq out=trimmed.bam int=f

Hi,

I appreciate the confirmation; I was worried that I had made a mistake.

Anyway, the input uBAM file has a tag with the barcode+umi sequences required by STARsolo. I don't see a straightforward way to preserve this information with a conversion to fastq.

Do you know whether this program is maintained?

Thank you!

Brent


BBMap is actively maintained, but what you have is an edge case. You can try writing to Brian Bushnell (his email can be found in the software's in-line help).

Why do you need to trim the data? STARsolo may be able to handle it as is.

3 hours ago

Hi Brent, at one time, and to some extent, BBDuk did trimming of sam/bam files. However, it's a very fiddly process because the CIGAR strings and MD tags have to be regenerated, and some other tags may end up becoming incorrect as a result of the trimming operation, so I don't really recommend it (although for ubam it wouldn't matter). Generically, one can put metadata in a fastq header and then move it back to a bam field later.
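A minimal sketch of that header round trip, for illustration only. The two helper functions are hypothetical (they are not part of BBTools or samtools) and they assume unaligned, single-end records passed as SAM text, for example from `samtools view`. The first copies every optional tag into the fastq header, separated by tabs; the second rebuilds an unaligned SAM record from that header after trimming.

```shell
# Hypothetical helpers (not part of BBTools or samtools).
# Both assume unaligned, single-end reads passed as SAM *text*.

# SAM records -> FASTQ, appending all optional tags to the header line.
sam_to_tagged_fastq() {
  awk -F'\t' '
    /^@/ { next }                                    # skip SAM header lines
    {
      hdr = "@" $1
      for (i = 12; i <= NF; i++) hdr = hdr "\t" $i   # optional tags start at column 12
      print hdr; print $10; print "+"; print $11     # name+tags, bases, separator, qualities
    }'
}

# FASTQ with tab-separated tags in the header -> unaligned SAM records.
tagged_fastq_to_sam() {
  awk -F'\t' '
    NR % 4 == 1 {                                    # header line: read name, then tags
      name = substr($1, 2); tags = ""
      for (i = 2; i <= NF; i++) tags = tags "\t" $i
    }
    NR % 4 == 2 { seq = $0 }                         # bases
    NR % 4 == 0 {                                    # qualities: emit the record (flag 4 = unmapped)
      printf "%s\t4\t*\t0\t0\t*\t*\t0\t0\t%s\t%s%s\n", name, seq, $0, tags
    }'
}
```

The trimming step would sit between the two functions, reading fastq on stdin and writing fastq on stdout. This only works if the trimmer passes the header line through unchanged, tabs included, which is worth checking on a few reads first. Recent samtools releases also appear to cover the same round trip with `samtools fastq -T` and `samtools import -T`; check the manual for your version before relying on that.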

What's happening here is that BBDuk is trimming the read successfully, but then outputting the original untrimmed SamLine anyway instead of regenerating it, which is a bug (if you output as fastq you'll see that the reads actually got trimmed in that case). This is easy to fix in the case of ubam, and I will fix it in my next release. The change is less likely to work universally for mapped bam, so I'll probably prevent that.
