bwa mem: Passing a variable to read group
2
7
Entering edit mode
7.1 years ago
firestar ★ 1.6k

I would like to add read group info (-R) during the mapping/alignment stage as part of my variant calling gatk pipeline.

I am doing something like this:

bwa mem \
-M \
-t 8 \
-v 3 \
-R <(sh a-illumina-read-group.sh $1) \
"$path_dr_bwaindex_genome" \
$1 $2

where the read group string is generated automatically from the fastq file by the shell script a-illumina-read-group.sh. It produces a string like:

'@RG\tID:ST-E00215_230_HJ3FMALXX_2\tSM:ST-E00215_230_HJ3FMALXX_2_ATCACG\tLB:ATCACG\tPL:ILLUMINA'

but, when I run bwa, it fails with this error: [E::bwa_set_rg] the read group line is not started with @RG

I have tried excluding the single quotes ('') around read group info, but that didn't change anything. I also tried variations in how the variable is passed.

-R=$(sh a-illumina-read-group.sh $1) \

-R=$(echo $(sh a-illumina-read-group.sh $1)) \

Just as a test, I tried:

rg='@RG\tID:ST-E00215_230_HJ3FMALXX_2\tSM:ST-E00215_230_HJ3FMALXX_2_ATCACG\tLB:ATCACG\tPL:ILLUMINA'

-R <(echo $rg) \

But, they all produce the same error. I would appreciate any solutions to this issue.

I understand that I can add read groups later on using PicardTools AddOrReplaceReadGroup, but I thought it might be convenient doing it here in one step.

Thanks.

bwa bwa-mem alignment variant calling gatk • 21k views
ADD COMMENT
0
Entering edit mode

rmf, I had a similar idea and met with the same result. Did you ever find a solution?

ADD REPLY
1
Entering edit mode

I have added an answer.

ADD REPLY
7
Entering edit mode
6.6 years ago
firestar ★ 1.6k

What worked for me was to read the read group information from the fastq file during the mapping run. My read name in the fastq file looks like this: @ST-E00274:188:H3JWNCCXY:4:1101:5142:1221 1:N:0:NTTGTA.

The bash file looks like this:

#!/bin/bash

header=$(zcat $1 | head -n 1)
id=$(echo $header | head -n 1 | cut -f 1-4 -d":" | sed 's/@//' | sed 's/:/_/g')
sm=$(echo $header | head -n 1 | grep -Eo "[ATGCN]+$")
echo "Read Group @RG\tID:$id\tSM:$id"_"$sm\tLB:$id"_"$sm\tPL:ILLUMINA"

bwa mem \
-M \
-t 8 \
-v 3 \
-R $(echo "@RG\tID:$id\tSM:$id"_"$sm\tLB:$id"_"$sm\tPL:ILLUMINA") \
"$path_bwaindex_genome" \
$1 $2 | samblaster -M | samtools fixmate - - | samtools sort -O bam -o "mapped-bwa.bam"

And the bash file is run as

bwa-mapper.sh read_1.fq.gz read_2.fq.gz

You can remove this part (| samblaster -M | samtools fixmate - - | samtools sort -O bam -o) and replace with > if you don't need it. samblaster marks duplicates like Picard. I don't remember what fixmate does. The sort part sorts the SAM file and generates an output BAM.

ADD COMMENT
1
Entering edit mode

Thanks, rmf ! It worked for me.

ADD REPLY
0
Entering edit mode

Hi there, thanks for providing your bash script. However, I tried to use the batch script and it printed this as below:

gzip: SRR34567_1.fastq: not in gzip format Read Group @RG\tID:\tSM:_\tLB:_\tPL:ILLUMINA [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::process] read 1600000 sequences (240000000 bp)... [M::process] read 1600000 sequences (240000000 bp)...

Nothing printed to the RG when I checked the outputted SAM file using samtools view -H sample.bam | grep '@RG'

So I wasn't sure what went wrong. Thank you.

ADD REPLY
0
Entering edit mode

Well.. Your error message says your input fastq file is not in gzip format. In that case you have to modify the script to accept non-gzipped fastq files. Change zcat to cat. That's as far as I can see.

ADD REPLY
0
Entering edit mode

Thank you so much for your kind reply. I have modified the zcat to cat to accept non-zipped fastq file and it did print out the read group; however, it will not run the bwa mem.

./test.sh SRR5038390_1.fastq SRR5038390_2.fastq

Read Group @RG\tID:SRR5038390.1 1 length=150\tSM:SRR5038390.1 1 length=150_\tLB:SRR5038390.1 1 length=150_\tPL:ILLUMINA

Usage: bwa mem [options] <idxbase> <in1.fq> [in2.fq]

So I wasn't sure what went wrong. Thank you so much.

My script as below test.sh), basically I just changed the last part:

bwa mem \

-M \

-t 8 \

-R $(echo "@RG\tID:$id\tSM:$id"_"$sm\tLB:$id"_"$sm\tPL:ILLUMINA") \

/home/index/hg19/hg19bwaidx \

$1 $2 |samtools sort -O bam -o "mapped-bwa.bam"

ADD REPLY
0
Entering edit mode

My header looks different:

> zcat OZBenth2_R1.fq.gz | head -n 1
@HWI-ST945_0069:2:1101:1483:1976#GNNNNN/1
> echo $(echo $header | head -n 1 | cut -f 1-4 -d":" | sed 's/@//' | sed 's/:/_/g')
HWI-ST945_0069_2_1101_1483
> echo $(echo $header | head -n 1 | grep -Eo "[ATGCN]+$")

Is there a way to make it compatible for all different Illumina headers. Did anyone use https://github.com/ekg/bamaddrg?

ADD REPLY
0
Entering edit mode

Once you figure out what the different parts of your header mean, you can split them apart using regular expressions.

ADD REPLY
2
Entering edit mode
7.1 years ago
<(sh a-illumina-read-group.sh $1)

returns a temporary filename. just try echo <(sh a-illumina-read-group.sh $1) and see...

you want (anti-quote)

`sh a-illumina-read-group.sh $1`
ADD COMMENT
0
Entering edit mode

-R echo <(sh a-illumina-read-group.sh $1) didn't work. Got same error. With the anti-quote, I am not sure exactly what you meant. But, I tried -R echo <(sh a-illumina-read-group.sh $1) and -R <(sh a-illumina-read-group.sh $1). Both don't work. I get the same error as before plus this new error: /var/spool/slurmd/job11433254/slurm_script: line 52: @RG\tID:ST-E00215_230_HJ3FMALXX_2\tSM:ST-E00215_230_HJ3FMALXX_2_ATCACG\tLB:ATCACG\tPL:ILLUMINA: command not found. Sorry about the weird formatting. The backticks inside backticks don't seem to work.

ADD REPLY
0
Entering edit mode

i just wanted to show you that

 echo <(sh a-illumina-read-group.sh $1)

will print a tmp filename !

ADD REPLY
0
Entering edit mode

Ahan. Yea. sure. That works. The variable prints the correct content as expected. It's just that bwa doesn't accept it.

ADD REPLY
0
Entering edit mode

Hmmm.. Actually not.

This works

echo $(sh a-illumina-read-group.sh $1)

@RG\tID:ST-E00215_230_HJ3FMALXX_2\tSM:ST-E00215_230_HJ3FMALXX_2_ATCACG\tLB:ATCACG\tPL:ILLUMINA

This doesn't

echo <(sh a-illumina-read-group.sh $1)

/dev/fd/62

ADD REPLY

Login before adding your answer.

Traffic: 1853 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6