Parallel for bwa mem - problem with -R argument for ID and SM
3
0
Entering edit mode
6.7 years ago
Korsocius ▴ 260

Dear all,

I need help with the Read group ID and SM. I have parallel syntax to alignment via bwa mem.

ls *R1_001.fastq | parallel 'bwa mem -k 19 -A 1 -B 4 -O 6 -L 5 -R '@RG\tID:'{.}'\tSM:'{.}'\tLB:'Trusight_custom_amplicon_CARR'\tPL:'ILLUMINA'\tPI:150' $REFERENCE {} {= s/_R1_001/_R2_001/ =} > {= s/_R1_001.fastq/.sam/ =}'

For the -R argument I would like to get name of sample.. I tried save the name like variable and echo it and so on, but still time doesn't work. Could you help me please to get name of sample to ID and SM ?

Thank you.

parallel alignment bwa • 3.1k views
ADD COMMENT
3
Entering edit mode
6.7 years ago

second answer, use a Makefile: the same way than in my previous answer C: Is it possible to run variant calling software in parallel, are there any Shell

to run 16 parallel jobs invoke with

make -j 16

ADD COMMENT
0
Entering edit mode

Make file looks like the best option for all these process :-), i tried it without quotes, but I received error log : No input for ID.....

ADD REPLY
0
Entering edit mode
6.7 years ago

(not tested) try to remove those single quotes ?

'@RG\tID:'{.}'\tSM:'{.}'\tLB:'Trusight_custom_amplicon_CARR'\tPL:'ILLUMINA'\tPI:150'

to

'@RG\tID:{.}\tSM:{.}\tLB:Trusight_custom_amplicon_CARR\tPL:ILLUMINA\tPI:150'
ADD COMMENT
0
Entering edit mode
5.7 years ago
ATpoint 85k

For parallel jobs it is convenient to wrap the actual job script into a function and use parallel to parallelize this function:

## Want to do parallel alignments:
function BWAMEM {

  BASENAME=$1
  IDX=$2
  bwa mem (options...) -R '@RG\tID:'${BASENAME}'_ID\tSM:'${BASENAME}'_SM\tPL:Illumina' $IDX ${BASENAME}.R1_001.fastq | \
  samtools view -o ${BASENAME}_unsorted.bam

}; export -f BWAMEM

## Call the function within parallel using awk to extract the basename of the files without the ".R1_001.fastq":
ls *.R1_001.fastq | awk -F ".R1_001.fastq" '{print $1}' | parallel "BWAMEM {} /path/to/bwa_index 2> {}.log"

Using 2> {}.log will print all stderr messages from within the function to one log file per processed fastq file. Say you have a file test.R1_001.fastq, you'll get as output test_unsorted.bam and test.log. Using this approach you do not have to bother yourself with quotes within the parallel function. You can do basically everything you want inside the BWAMEM function and parallel only executes it. Try squeezing an awk command with all its single-and double quotes into parallel, it will be...fun. Or simply write it into a wrapper function to avoid the issues ;-)

ADD COMMENT

Login before adding your answer.

Traffic: 2565 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6