How to do BBmap batch processing correctly?
8.9 years ago
elvissober ▴ 20

Here is my bash code:

cd /home/user/Documents/bbmap
./bbmap.sh ref=ref.fa  # index the FASTA reference and write the index to disk
for f in fastaone.fasta fastatwo.fasta fastathree.fasta
do
    # map each file against the index; the output name is derived from the input name
    ./bbmap.sh in="$f" out="${f%.fasta}.sam"
done

Can I also run it from Python, like this?

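import subprocess

bbmap_dir = "/home/user/Documents/bbmap"
# Index the FASTA reference once and write the index to disk
subprocess.run(["./bbmap.sh", "ref=ref.fa"], cwd=bbmap_dir, check=True)
# Map each input file against that index; output names are derived from the inputs
for fasta in ["fastaone.fasta", "fastatwo.fasta", "fastathree.fasta"]:
    sam = fasta.replace(".fasta", ".sam")
    subprocess.run(["./bbmap.sh", f"in={fasta}", f"out={sam}"], cwd=bbmap_dir, check=True)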

Thank you

python sequencing wgs sequence software-error • 5.3k views

Just to clarify, are you processing the same set of reads against multiple different references, or multiple sets of reads against multiple references?

BBMap has a "nodisk" flag, which will build an index in memory and not write anything to disk. You can use it like this:

bbmap.sh in=reads1.fq out=mapped1.sam ref=ref1.fa nodisk
bbmap.sh in=reads2.fq out=mapped2.sam ref=ref2.fa nodisk
bbmap.sh in=reads3.fq out=mapped3.sam ref=ref3.fa nodisk

...etc. Because no index is written to disk, these can be run sequentially, or at the same time in the same directory, with no risk of collisions. The only reason to write an index to disk is if you will be using it repeatedly, like this:

bbmap.sh ref=ref.fa
(wait for that to finish)
bbmap.sh in=reads1.fq out=mapped1.sam
bbmap.sh in=reads2.fq out=mapped2.sam
bbmap.sh in=reads3.fq out=mapped3.sam

Here, the first process writes an index to disk, and the next 3 load that index for mapping.
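For the first case (a different reference for each read set), the "nodisk" commands could also be generated and launched from Python instead of being typed out. This is only a minimal sketch: it assumes bbmap.sh is on the PATH, and the helper names (nodisk_command, run_all) and file names are made up for illustration. Since each process builds its own index in memory, max_workers should probably stay small.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def nodisk_command(reads, ref, out):
    """Build the argv for one BBMap run that keeps its index in memory."""
    return ["bbmap.sh", f"in={reads}", f"out={out}", f"ref={ref}", "nodisk"]


def run_all(commands, max_workers=2, runner=subprocess.run):
    """Run each command, at most max_workers at a time.

    runner is injectable so the function can be tried without BBMap installed.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every job and re-raises the first exception, if any
        return list(pool.map(lambda cmd: runner(cmd, check=True), commands))


# Hypothetical (reads, reference, output) triples
jobs = [
    ("reads1.fq", "ref1.fa", "mapped1.sam"),
    ("reads2.fq", "ref2.fa", "mapped2.sam"),
    ("reads3.fq", "ref3.fa", "mapped3.sam"),
]
commands = [nodisk_command(*job) for job in jobs]

if __name__ == "__main__":
    for cmd in commands:
        print(" ".join(cmd))
    # run_all(commands)  # uncomment to actually launch BBMap
```

Because nothing is written to disk, the jobs do not interfere with each other even when they share a directory.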


multiple sets of reads against one reference, thx


OK, so just do it like this:

bbmap.sh ref=ref.fa
(wait for that to finish)
bbmap.sh in=reads1.fq out=mapped1.sam
bbmap.sh in=reads2.fq out=mapped2.sam
bbmap.sh in=reads3.fq out=mapped3.sam

Is there a more elegant, programmatic way to do this without retyping the file names, for example with a for loop in shell or Bash? Thank you.


Write a for loop (or similar) in bash, Perl, Python, etc. Put the names of the input files in a file, parse it, and feed the resulting list to the loop. You can also list a sample name next to each input file and pass it to the loop to build the output name. The details depend on the language you choose and on how you want to parse and pass the data, but there are many ways to do it.
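As one possible way to do this in Python, here is a minimal sketch. It assumes a tab-separated sheet with a sample name and a reads file on each line, that bbmap.sh is on the PATH, and that the index has already been written by "bbmap.sh ref=ref.fa" in the working directory. The sheet layout, the helper names, and the sample names are all made up for illustration.

```python
import csv
import io
import subprocess


def read_sample_sheet(handle):
    """Yield (sample_name, reads_path) from a two-column, tab-separated sheet."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        yield row[0].strip(), row[1].strip()


def mapping_commands(samples):
    """Build one bbmap.sh argv per sample; the output is named after the sample."""
    return [
        ["bbmap.sh", f"in={reads}", f"out={sample}.sam"]
        for sample, reads in samples
    ]


def run_commands(commands, runner=subprocess.run):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in commands:
        runner(cmd, check=True)


# Hypothetical sheet content; in practice this would be open("samples.tsv")
example_sheet = io.StringIO(
    "# sample\treads\n"
    "sampleA\treads1.fq\n"
    "sampleB\treads2.fq\n"
)
commands = mapping_commands(read_sample_sheet(example_sheet))

if __name__ == "__main__":
    for cmd in commands:
        print(" ".join(cmd))
    # run_commands(commands)  # uncomment once the index has been built
```

Printing the commands first makes it easy to check the generated names before anything is run.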


There is another way:

bbwrap.sh ref=ref.fa in=reads1.fq,reads2.fq,reads3.fq out=mapped1.sam,mapped2.sam,mapped3.sam

That will sequentially map the sets of reads and produce the different output files. You still need all the names, though, so I don't think it's any more convenient. The purpose of bbwrap is mainly to avoid reloading the index every time, in case the index is really huge.
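If typing out the names is the main annoyance, the comma-separated lists could be assembled by a small script. The sketch below only builds the bbwrap.sh command line in the form shown above. The helper name bbwrap_command is made up, and naming each output after its input (reads1.fq to reads1.sam) is an assumption, not something bbwrap requires.

```python
from pathlib import Path


def bbwrap_command(ref, read_files):
    """Build one bbwrap.sh argv with comma-separated in= and out= lists."""
    reads = [Path(p) for p in read_files]
    if not reads:
        raise ValueError("no read files given")
    # Assumed naming scheme: swap the last extension for .sam
    outs = [p.with_suffix(".sam") for p in reads]
    return [
        "bbwrap.sh",
        f"ref={ref}",
        "in=" + ",".join(str(p) for p in reads),
        "out=" + ",".join(str(p) for p in outs),
    ]


# Hypothetical file names; sorted(Path(".").glob("*.fq")) would collect them from a directory
command = bbwrap_command("ref.fa", ["reads1.fq", "reads2.fq", "reads3.fq"])

if __name__ == "__main__":
    print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) once the names look right.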
