predeus · 18 months ago
Hi all,
I am using bbduk.sh and was wondering if there's an efficient way to process multiple sets of reads with it. E.g., if read 1 is split across 4 separate files and read 2 across another 4, typical mappers like bowtie2 or STAR support a comma-separated syntax for listing them.
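For reference, the comma-separated lists those mappers accept can be built from an array instead of typed by hand. A minimal sketch (file, index, and sample names below are placeholders):

```shell
# Join an array of lane files into the comma-separated list
# that bowtie2/STAR expect. Names are made up for illustration.
r1=(sample_L00{1..4}_R1.fastq.gz)
r2=(sample_L00{1..4}_R2.fastq.gz)
r1_list=$(IFS=,; echo "${r1[*]}")   # join on the first char of IFS
r2_list=$(IFS=,; echo "${r2[*]}")
# Illustrative invocations (index names are placeholders):
#   bowtie2 -x idx -1 "$r1_list" -2 "$r2_list" -S out.sam
#   STAR --genomeDir idx --readFilesIn "$r1_list" "$r2_list"
echo "$r1_list"
```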
Concatenating the files first seems like a waste of I/O, which is already under heavy stress when we process many samples.
Have you tried process substitution or a named pipe? That said, the most efficient way to process the data may still be to start 4 jobs in parallel.
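To illustrate the named-pipe idea: the writer streams the concatenation through a FIFO, so no merged copy ever touches the disk. The sketch below is self-contained (tiny fake lane files, `wc` as a stand-in reader); the bbduk.sh line is illustrative only and untested here:

```shell
# Self-contained named-pipe sketch: two fake lane files, one FIFO.
tmp=$(mktemp -d) && cd "$tmp"
printf '@r1\nACGT\n+\nIIII\n' > lane1_R1.fq
printf '@r2\nTGCA\n+\nIIII\n' > lane2_R1.fq
mkfifo r1.pipe
cat lane1_R1.fq lane2_R1.fq > r1.pipe &  # writer: streams, no merged file on disk
# A real run would read the pipe like a regular file, e.g. (hypothetical):
#   bbduk.sh in=r1.pipe out=trimmed_R1.fq ref=adapters.fa
wc -l < r1.pipe                          # stand-in reader: total lines from both lanes
wait
```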
Thank you for the suggestions! Process substitution fails and I'm not sure why - something in its I/O block chokes on stdin, I think? The errors don't make much sense. I haven't tried a named pipe yet.
BBDuk is already extremely efficient, so even processing the files sequentially is actually OK - what takes longer is concatenating the sequences afterwards (and when the files are large, which is often, this causes hundreds of GB of unnecessary, redundant I/O load). For now I have the "extensive" solution, but I'll post here if I find something efficient and sleek.
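The run-N-jobs-in-parallel route mentioned above can be sketched like this; `trim_pair` is a hypothetical stand-in for the real per-pair bbduk.sh call, so the skeleton stays runnable:

```shell
# One background job per lane pair, then wait for all of them.
trim_pair() {
  # a real version would run something like (hypothetical flags):
  #   bbduk.sh in="${1}_R1.fq.gz" in2="${1}_R2.fq.gz" ref=adapters.fa ...
  printf 'done %s\n' "$1"
}
for lane in L001 L002 L003 L004; do
  trim_pair "$lane" &   # launch jobs concurrently
done
wait                    # block until every lane finishes
```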
Merging the BAMs at a point further down the workflow would likely be the most efficient way, since samtools can do it multi-threaded.
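A minimal sketch of that downstream merge; file names and the thread count are placeholders, and the `samtools merge` call itself is shown but not run here (`-@` is samtools' flag for additional threads):

```shell
# Merge per-lane BAMs in one multi-threaded pass downstream.
threads=8
inputs=(clean_L00{1..4}.bam)   # placeholder per-lane outputs
# Actual call (requires samtools; not executed in this sketch):
#   samtools merge -@ "$threads" merged.bam "${inputs[@]}"
printf 'would merge %d BAMs with %d threads\n' "${#inputs[@]}" "$threads"
```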