using GNU parallel for bwa mem and samtools
1
0
Entering edit mode
5.3 years ago
little_more ▴ 70

Hi all, My (paired-end) sequencing data is comprised of several samples:

fq/S01_R1_fq.gz
fq/S01_R2_fq.gz
fq/S02_R1_fq.gz
fq/S02_R2_fq.gz

I need to align the reads to reference genome and to convert the output to BAM format. I'd like to use parallel for that but I don't know how to do it. That's what I got for now:

parallel "bwa mem \
-t 10 \
-M hg1kv37.fa {1} {2} \
-v 1 -R "@RG\tID:{1}\tSM:{1}\tPL:ILLUMINA\tLB:{1}" \
| samtools view -Sb - > {=1 s:fq/:aln/:;s:R1_fq\.gz:alignment.bam:; =}" ::: fq/*R1.fq.gz ::: fq/*R2.fq.gz

I tried without quotes:

parallel: Warning: Input is read from the terminal. You either know what you
parallel: Warning: are doing (in which case: YOU ARE AWESOME!) or you forgot
parallel: Warning: ::: or :::: or -a or to pipe data into parallel. If so
parallel: Warning: consider going through the tutorial: man parallel_tutorial
parallel: Warning: Press CTRL-D to exit.

So I added quotes but now I keep getting the following error:

/usr/bin/sh: aln/S01_alignment.bam: No such file or directory
[E::bwa_set_rg] no ID within the read group line

I guess samtools is trying to find the output file for some reason. Any suggestions?

samtools bwa parallel • 4.1k views
ADD COMMENT
1
Entering edit mode

Another option would be to use snakemake. This can handle all the parallel things for.

An introduction that might be useful for you, is this tutorial I wrote some time ago:

ADD REPLY
0
Entering edit mode
ADD REPLY
0
Entering edit mode

With quotes, you have nested double quotes, so your shell is seeing this as one block:

"bwa mem \
-t 10 \
-M hg1kv37.fa {1} {2} \
-v 1 -R "

And this as another block:

" \
| samtools view -Sb - > {=1 s:fq/:aln/:;s:R1_fq\.gz:alignment.bam:; =}"

I would create a bash script which takes R1 and R2 as arguments, performs the file name manipulations, and then maps with bwa. You can then use this script with GNU Parallel.

ADD REPLY
0
Entering edit mode

thank you all for your answers! @h.mon, I tried various combinations of quotes, including combinations of different types of quotes but it didn't help. @finswimmer, thanks for the link, I'll check it :)

ADD REPLY
3
Entering edit mode
5.3 years ago
ole.tange ★ 4.5k

It makes it easier to read and to test if you create a function and call that.

You can then test your function on a single file set before handing it to GNU Parallel:

bwatosam() {
  id="$1"

  bwa mem -t 10 -M -R "@RG\tID:{}\tSM:{}\tPL:ILLUMINA\tLB:{}" hg1kv37.fa "$id"_R1.fq.gz "$id"_R2.fq.gz |
    samtools view -o "$id".bam
}
export -f bwatosam

# --plus enables {%...} to remove a substring    
ls *_R1.fq.gz |
  parallel --plus bwatosam {%_R1.fq.gz}

(Are you coming for the anniversary party or hosting your own? https://www.gnu.org/software/parallel/10-years-anniversary.html )

ADD COMMENT
0
Entering edit mode

Thank you so much for your solution! it seems to be exactly what I need but I get an error saying:

/usr/bin/sh: line 1: mem: command not found

I think it might be caused by the fact that I use a variable for calling bwa:

bwa=/work/gr-fe/lawless/tool/bwa-0.7.17/bwa

and then

$bwa mem ...

Do you know how can I handle it?

I also tried adding alias to bwa to my ~/.bashrc file so now I can call bwa from the terminal. But if I try to call bwa from within the script I get the error:

/usr/bin/sh: line 1: bwa: command not found

Any help appreciated!

ADD REPLY
0
Entering edit mode

I seem to overcome the problem with bwa: I just added a path to ~/.bashrc file.

Now I have another issue:

[E::bwa_idx_load_from_disk] fail to locate the index files

although the dir contains index files. What should I do?

ADD REPLY
0
Entering edit mode

Just put the path to your index file within the function

bwa mem -t 10 -M -R "@RG\tID:{}\tSM:{}\tPL:ILLUMINA\tLB:{}" path/to/hg19/
ADD REPLY

Login before adding your answer.

Traffic: 1759 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6