I need to develop a script that loops through multiple reference genomes, runs the alignment software against each genome (one at a time), extracts and saves the results (only the alignment %) from the SAM file, and deletes the SAM file after execution.
My input would be paired data.
bbsplit is meant to be used with multiple genomes at the same time. You could put a loop around that to deal with multiple samples:
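A minimal sketch of such a loop, assuming a hypothetical layout where the genomes sit in `genomes/*.fa` and each sample has `reads/<sample>_R1.fastq.gz` plus a matching `_R2` file. The function names and the output file naming are made up for illustration; only `in1=`, `in2=`, `ref=` and `refstats=` are bbsplit.sh options.

```shell
#!/usr/bin/env bash
# Sketch: one bbsplit.sh run per paired-end sample, each against all genomes.
# Assumed (hypothetical) layout: genomes/*.fa and reads/<sample>_R1.fastq.gz
# with a matching reads/<sample>_R2.fastq.gz.

# Join all arguments with commas, the form bbsplit.sh expects for ref=.
join_by_comma() {
    local IFS=,
    echo "$*"
}

# Derive the sample name from an R1 file path.
sample_name() {
    basename "$1" _R1.fastq.gz
}

# Run every sample found in reads_dir against every genome in genome_dir.
run_all_samples() {
    local genome_dir=$1 reads_dir=$2 out_dir=$3
    local refs r1 r2 sample
    refs=$(join_by_comma "$genome_dir"/*.fa)
    mkdir -p "$out_dir"
    for r1 in "$reads_dir"/*_R1.fastq.gz; do
        sample=$(sample_name "$r1")
        r2="$reads_dir/${sample}_R2.fastq.gz"
        # No out*/basename= directive is given, so no read files are requested;
        # only the per-reference statistics file is written.
        bbsplit.sh in1="$r1" in2="$r2" ref="$refs" \
            refstats="$out_dir/${sample}.refstats.txt"
    done
}
```

With that layout, calling `run_all_samples genomes reads stats` would leave one `stats/<sample>.refstats.txt` per sample.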
Do you mean I can process more than one paired-read sample at a time?
Not in one run. It is one sample at a time against multiple genomes. Putting more than one sample into a single run would not make logical sense. If you have access to a compute cluster, you could start many such jobs in parallel.
Interesting. But I would like to extract alignment stats from the BBTools output. How do I do that without producing a .sam file (is that possible)?
By default bbsplit.sh will produce fastq files (it can do SAM/BAM files, but that is not recommended if you need to visualize the data). If you do not provide an out* directive then you will only get the stats.
If I do one sample at a time, then is it really necessary to run a loop around multiple samples?
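For the stats-only case, the per-reference percentages could be pulled out of the file written by `refstats=` along these lines. This is a sketch under an assumption about the file layout: tab-separated, header lines starting with `#`, reference name in column 1 and the percentage of unambiguously assigned reads in column 2. Check the header of your own file before relying on it. The function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print "<reference><TAB><percent>" for each reference in a refstats file.
# Assumption: tab-separated, header line(s) start with '#', column 1 is the
# reference name and column 2 the percentage of unambiguously assigned reads.

refstats_percent() {
    # Skip header lines, then print the first two columns of every data row.
    awk -F '\t' '!/^#/ && NF >= 2 { printf "%s\t%s\n", $1, $2 }' "$1"
}
```

Usage would be something like `refstats_percent sample1.refstats.txt > sample1.percent.tsv`.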
Does this loop run through the reference genomes one at a time or the samples 1 at a time? It looks like the latter.
The loop iterates over the samples.
bbsplit.sh will align each read to each of the reference genomes and will output reads to a file for the reference they best match. So, not exactly what you wanted, but likely still useful.
Ada: Since you changed the content of the original post significantly, the above set of comments now appears to be off-topic.
in bash, using parallel:
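A rough sketch of what that could look like for the one-genome-at-a-time workflow in the question. The thread does not say which aligner is used, so bbmap.sh is an assumption here, as are samtools for the mapped percentage, the function names, and the placeholder paths `genomes/*.fa`, `r1.fq.gz` and `r2.fq.gz`.

```shell
#!/usr/bin/env bash
# Sketch: one alignment per reference genome, keeping only the mapped %.
# Assumptions: bbmap.sh, samtools and GNU parallel are on PATH; the paths
# genomes/*.fa, r1.fq.gz and r2.fq.gz are placeholders.

# Pull the percentage out of the "mapped (...)" line of samtools flagstat.
mapped_percent() {
    awk '/ mapped \(/ { sub(/.*\(/, ""); sub(/%.*/, ""); print; exit }'
}

# Align one genome, record "<genome><TAB><mapped %>", then delete the SAM file.
align_one() {
    local ref=$1 r1=$2 r2=$3
    local name sam
    name=$(basename "$ref" .fa)
    sam="${name}.sam"
    # nodisk=t builds the index in memory, so parallel jobs do not share ref/.
    bbmap.sh ref="$ref" in="$r1" in2="$r2" out="$sam" nodisk=t &&
        printf '%s\t%s\n' "$name" "$(samtools flagstat "$sam" | mapped_percent)" \
            > "${name}.mapped_pct.txt"
    rm -f "$sam"
}

# GNU parallel starts a fresh shell per job, so export the functions first:
#   export -f mapped_percent align_one
#   parallel -j 4 align_one {} r1.fq.gz r2.fq.gz ::: genomes/*.fa
```

Each job holds its own index in memory, so the `-j` value probably needs to be matched to the available RAM rather than just the core count.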