Hi all, my paired-end sequencing data consists of several samples:
fq/S01_R1_fq.gz
fq/S01_R2_fq.gz
fq/S02_R1_fq.gz
fq/S02_R2_fq.gz
I need to align the reads to a reference genome and convert the output to BAM format. I'd like to use GNU Parallel for that, but I don't know how to do it. This is what I have so far:
parallel "bwa mem \
-t 10 \
-M hg1kv37.fa {1} {2} \
-v 1 -R "@RG\tID:{1}\tSM:{1}\tPL:ILLUMINA\tLB:{1}" \
| samtools view -Sb - > {=1 s:fq/:aln/:;s:R1_fq\.gz:alignment.bam:; =}" ::: fq/*R1.fq.gz ::: fq/*R2.fq.gz
First I tried without quotes and got:
parallel: Warning: Input is read from the terminal. You either know what you
parallel: Warning: are doing (in which case: YOU ARE AWESOME!) or you forgot
parallel: Warning: ::: or :::: or -a or to pipe data into parallel. If so
parallel: Warning: consider going through the tutorial: man parallel_tutorial
parallel: Warning: Press CTRL-D to exit.
So I added quotes but now I keep getting the following error:
/usr/bin/sh: aln/S01_alignment.bam: No such file or directory
[E::bwa_set_rg] no ID within the read group line
I guess samtools is trying to find the output file for some reason. Any suggestions?
Another option would be to use Snakemake, which can handle all the parallelization for you.
An introduction that might be useful for you is this tutorial I wrote some time ago:
see BWA mem on multiple samples
With quotes, you have nested double quotes, so your shell is seeing this as one block:
And this as another block:
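As a rough illustration of the quoting problem (not the original poster's exact command), the sketch below prints the arguments a command actually receives. The helper name show_args, the sample ID S01 and the file name ref.fa are made up for the example. With nested double quotes, the inner pair closes and reopens the outer string, so the read-group text ends up unquoted and the shell removes the backslash from \t. That would be consistent with bwa's "no ID within the read group line" message, since the tab escape before ID: never reaches bwa.

```shell
#!/usr/bin/env bash
# Print each argument on its own line, wrapped in <>, to expose word boundaries.
show_args() { printf '<%s>\n' "$@"; }

# Nested double quotes: the inner pair ends and restarts the outer string,
# so @RG\tID:S01 is unquoted and the shell strips the backslash.
nested=$(show_args "bwa mem -R "@RG\tID:S01" ref.fa")

# Single quotes outside, double quotes inside: the text reaches the
# command unchanged, including the inner quotes and the \t.
single=$(show_args 'bwa mem -R "@RG\tID:S01" ref.fa')

printf '%s\n' "$nested"
printf '%s\n' "$single"
```

The first line printed is <bwa mem -R @RGtID:S01 ref.fa> and the second is <bwa mem -R "@RG\tID:S01" ref.fa>, so wrapping the whole parallel command in single quotes is one way to keep the read-group string intact.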
I would create a bash script which takes R1 and R2 as arguments, performs the file name manipulations, and then maps with bwa. You can then use this script with GNU Parallel.
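A minimal sketch of such a wrapper script might look like the following. It assumes the file naming from the question (fq/S01_R1_fq.gz, fq/S01_R2_fq.gz) and the reference hg1kv37.fa. The script name align_pair.sh and the function names are hypothetical. It also creates the aln directory first, on the assumption that the "No such file or directory" error came from the shell redirect into a directory that did not exist yet.

```shell
#!/usr/bin/env bash
# align_pair.sh (hypothetical name): align one read pair and write a BAM file.
# Assumes inputs named like fq/S01_R1_fq.gz and fq/S01_R2_fq.gz.
set -euo pipefail

# Derive the sample name (e.g. S01) from the R1 path.
sample_name() {
    local base
    base=$(basename "$1")
    printf '%s\n' "${base%_R1_fq.gz}"
}

# Build the output path (e.g. aln/S01_alignment.bam) for a sample.
bam_path() {
    printf 'aln/%s_alignment.bam\n' "$1"
}

align_pair() {
    local r1=$1 r2=$2 sample out
    sample=$(sample_name "$r1")
    out=$(bam_path "$sample")
    # Writing the output fails if the directory is missing, so create it.
    mkdir -p "$(dirname "$out")"
    # Single quotes keep \t literal; bwa mem turns it into a tab itself.
    bwa mem -t 10 -M -v 1 \
        -R '@RG\tID:'"$sample"'\tSM:'"$sample"'\tPL:ILLUMINA\tLB:'"$sample" \
        hg1kv37.fa "$r1" "$r2" \
        | samtools view -b -o "$out" -
}

# Run only when called with exactly two arguments: R1 and R2.
if [ "$#" -eq 2 ]; then
    align_pair "$1" "$2"
fi
```

Saved as align_pair.sh and made executable, it could be driven with something like: parallel --link ./align_pair.sh ::: fq/*_R1_fq.gz ::: fq/*_R2_fq.gz. The --link option pairs the two input lists one-to-one; without it, GNU Parallel runs every R1 against every R2. Since each job already uses 10 bwa threads, limiting the number of simultaneous jobs with -j may be sensible.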
Thank you all for your answers! @h.mon, I tried various combinations of quotes, including mixing different types of quotes, but it didn't help. @finswimmer, thanks for the link, I'll check it :)