8.9 years ago
elvissober
How to do BBMap batch processing correctly?
Here is my bash code:
cd /home/user/Documents/bbmap
./bbmap.sh ref=ref.fa # indexes fasta format file and sets reference
filelist="fastaone.fasta, fastatwo.fasta, fastathree.fasta"
for list in "$filelist"
do ./bbmap.sh
May I also run it from within Python, like this?
from subprocess import *
call('cd /home/user/Documents/bbmap')
call('./bbmap.sh ref=ref.fa # indexes fasta format file and sets reference
filelist="fastaone.fasta, fastatwo.fasta, fastathree.fasta"
for list in "$filelist"
do ./bbmap.sh ')
Thank you
Just to clarify, are you processing the same set of reads against multiple different references, or multiple sets of reads against multiple references?
BBMap has a "nodisk" flag, which will build an index in memory and not write anything to disk. You can use it like this:
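A minimal sketch of a `nodisk` run, assembled in Python so the command can be built per reference. `in=`, `ref=`, `out=` and `nodisk` are BBMap parameters; the helper names `nodisk_command` and `run_nodisk` and all file names are illustrative assumptions, and `bbmap.sh` is assumed to be on the `PATH`.

```python
import subprocess


def nodisk_command(reads, ref, out):
    """Build a BBMap call that indexes `ref` in memory and maps `reads` to it."""
    return ["bbmap.sh", f"in={reads}", f"ref={ref}", "nodisk", f"out={out}"]


def run_nodisk(reads, ref, out):
    # check=True raises CalledProcessError if bbmap.sh exits with an error.
    subprocess.run(nodisk_command(reads, ref, out), check=True)


# Hypothetical file names: one read set mapped against two references.
for ref, out in [("refA.fa", "mappedA.sam"), ("refB.fa", "mappedB.sam")]:
    print(" ".join(nodisk_command("reads.fq", ref, out)))
```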
...etc. Because no index is written to disk, these can be run sequentially, or at the same time in the same directory, with no risk of collisions. The only reason to write an index to disk is if you will be using it repeatedly, like this:
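A sketch of that reuse pattern as a list of commands, assuming three hypothetical read files and a helper named `indexed_commands` (both illustrative). The first command only passes `ref=`, and the mapping commands omit it so they pick up the index already on disk.

```python
from pathlib import Path


def indexed_commands(ref, reads_files):
    """Return one indexing command followed by one mapping command per read set."""
    commands = [["bbmap.sh", f"ref={ref}"]]  # writes the index to disk once
    for reads in reads_files:
        out = Path(reads).with_suffix(".sam").name  # reads1.fq -> reads1.sam
        # No ref= here: the mapping runs load the index written above.
        commands.append(["bbmap.sh", f"in={reads}", f"out={out}"])
    return commands


for cmd in indexed_commands("ref.fa", ["reads1.fq", "reads2.fq", "reads3.fq"]):
    print(" ".join(cmd))
```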
Here, the first process writes an index to disk, and the next 3 load that index for mapping.
multiple sets of reads against one reference, thx
OK, so just do it like this:
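For the Python attempt in the question, a minimal sketch of the same steps with `subprocess`. A separate `call('cd ...')` does not change the directory for later calls, so the directory is passed as `cwd=` instead, and the shell `for` loop becomes a Python loop. The function name `map_all`, the `runner` parameter, and the `.sam` output names are illustrative assumptions; the paths and file names come from the question.

```python
import subprocess
from pathlib import Path


def map_all(reads_files, ref="ref.fa", bbmap_dir="/home/user/Documents/bbmap",
            runner=subprocess.run):
    """Index `ref` once, then map every read set against it, in order."""
    # cwd= replaces the separate `cd` call; check=True stops on the first failure.
    runner(["./bbmap.sh", f"ref={ref}"], cwd=bbmap_dir, check=True)
    for reads in reads_files:
        out = Path(reads).with_suffix(".sam").name  # fastaone.fasta -> fastaone.sam
        runner(["./bbmap.sh", f"in={reads}", f"out={out}"], cwd=bbmap_dir, check=True)
```

Calling `map_all(["fastaone.fasta", "fastatwo.fasta", "fastathree.fasta"])` would start four BBMap processes one after another: one to index, three to map.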
Is there a more elegant, programmatic way to do this, without rewriting file names and using for loops in Shell or Bash? Thank you.
Write a for loop (or similar) in bash, Perl, Python, etc. Put the names of the input files in a file, parse it, and feed the resulting list to the loop. You can also put a sample name next to each input file and pass it to the loop to generate the output name. The details depend on which language you write the script in and how you want to parse and pass the data; there are many ways to do it.
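One way this could look in Python, as a sketch: a tab-separated sample sheet with a sample name and a reads file per line, turned into one mapping command per sample. The sheet layout, the function names `read_sample_sheet` and `mapping_commands`, and the sample names are assumptions for illustration; the commands omit `ref=` on the assumption that the index was already written to disk.

```python
import csv
import io


def read_sample_sheet(handle):
    """Yield (sample_name, reads_file) pairs from a tab-separated file."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        sample, reads = row[0].strip(), row[1].strip()
        yield sample, reads


def mapping_commands(handle):
    """One BBMap mapping command per sample, with the output named after the sample."""
    return [["bbmap.sh", f"in={reads}", f"out={sample}.sam"]
            for sample, reads in read_sample_sheet(handle)]


# In practice the handle would come from open("samples.tsv"); a string stands in here.
sheet = io.StringIO("#sample\treads\ncontrol\tfastaone.fasta\ntreated\tfastatwo.fasta\n")
for cmd in mapping_commands(sheet):
    print(" ".join(cmd))
```

Each command list can then be passed to `subprocess.run(cmd, check=True)` inside the loop.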
There is another way:
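A hedged sketch of a single `bbwrap.sh` call built in Python. It assumes `bbwrap.sh` takes comma-separated lists for `in=` and `out=` together with `ref=`, which is how I read its usage text; the helper name `bbwrap_command` and the `.sam` output names are illustrative, and the input names are taken from the question.

```python
from pathlib import Path


def bbwrap_command(ref, reads_files):
    """Build a single bbwrap.sh call covering every read set."""
    outs = [Path(r).with_suffix(".sam").name for r in reads_files]
    return [
        "bbwrap.sh",
        f"ref={ref}",
        "in=" + ",".join(reads_files),   # comma-separated, no spaces
        "out=" + ",".join(outs),         # same order as the inputs
    ]


print(" ".join(bbwrap_command("ref.fa", ["fastaone.fasta", "fastatwo.fasta", "fastathree.fasta"])))
```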
That will sequentially map the sets of reads and produce the different output files. You still need all the names, though, so I don't think it's any more convenient. The purpose of bbwrap is mainly to avoid reloading the index every time, in case the index is really huge.