Running BWA mem in a for loop
1
2
Entering edit mode
4.9 years ago
beausoleilmo ▴ 600

I have a small script that will paste the command which is going to run the bwa mem for all my forward and reverse read for a lot of individuals.

The thing is that I would want to remove the echo command so that I can run directly the command inside terminal or within a script for a HPC.

cd ${PATHTOWGSSEQUENCESTRIM}
for R1 in *R1_001_val_*.fq.gz 
  do
    name=${R1%_R1_001_val_*.fq.gz}
    R2=${R1%R1_001_val_*.fq.gz}R2_001_val_*.fq.gz
    echo "bwa mem -t 15 -R '@RG\tID:"$name'\tPL:Illumina\tLB:'$name'\tSM:'$name"'" ${REFGENO}/referencegenome.fa.gz $R1 $R2 '>' ${OUTALIGN}/${name}.sam '2>' ${OUTALIGN}/${name}.BWAmem.log >> ${HPCSCRIPTS}/bwa.align.pbs
done

This is the command that basically needs to be changed to make it work directly. Since there is a variable name in the -R flag, I don't know how to make it work

    echo "bwa mem -t 15 -R '@RG\tID:"$name'\tPL:Illumina\tLB:'$name'\tSM:'$name"'" ${REFGENO}/referencegenome.fa.gz $R1 $R2 '>' ${OUTALIGN}/${name}.sam '2>' ${OUTALIGN}/${name}.BWAmem.log >> ${HPCSCRIPTS}/bwa.align.pbs

I've tried to create another variable to change the naming system in the -R flag, but it's not working...

READGRHEADERLINE=$(echo "'@RG\tID:"$name'\tPL:Illumina\tLB:'$name'\tSM:'$name"'")                
name=${R1%_R1_001_val_*.fq.gz}
echo $READGRHEADERLINE

bwa mem -t 15 -R $READGRHEADERLINE ${REFGENO}/referencegenome.fa.gz $R1 $R2 > ${OUTALIGN}/${name}.sam 2> ${OUTALIGN}/${name}.BWAmem.log >> ${HPCSCRIPTS}/bwa.align.pbs

How would I be able to run the bwa mem command in the for loop and take into account all the file names properly?

BWA alignment WGS • 5.4k views
ADD COMMENT
2
Entering edit mode

I think you're doing it wrong: you should use a workflow manager like snakemake of nextflow.

ADD REPLY
0
Entering edit mode

So true! You're right that this would be better. I just never tried snakemake, but I understand that it'd be better. https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/bwa/mem.html

But in the meantime (before I'm able to use snakemake), is there a way to use the for loop?

ADD REPLY
1
Entering edit mode

But in the meantime

But in the meantime try to use make. e.g: A: Parallel for bwa mem - problem with -R argument for ID and SM

ADD REPLY
0
Entering edit mode

OK! You got me, I'll do it ;).

ADD REPLY
0
Entering edit mode

I haven't tested it but you could simply capture the output. The shell should deal with all the variable expansion (assuming the command as echo was outputting it was correct in the first place), so then you just run the string.

e.g.

...
  command=$(echo "bwa mem -t 15 -R '@RG\tID:"$name'\tPL:Illumina\tLB:'$name'\tSM:'$name"'" ${REFGENO}/referencegenome.fa.gz $RR2 '>' ${OUTALIGN}/${name}.sam '2>' ${OUTALIGN}/${name}.BWAmem.log >> ${HPCSCRIPTS}/bwa.align.pbs)
  command
done

Alternatively, it should just be a case of fixing your quotes (you're using a mix of hard and soft quoting which is probably making life difficult). Switch the whole string to soft quotes and as long as there's nothing too strange going on, it should expand the variables correctly (again I didn't test this though).

ADD REPLY
2
Entering edit mode
4.9 years ago
Gio12 ▴ 220

You can do something like

for i in $(ls -1 *R1.fq.gz | sed 's/\.R1.fq.gz//'); do bwa mem -t 16 /path/to/reference/genome/reference.genome.fa.gz $i\.R1.fq.gz $i\.R2.fq.gz | samtools view -Sb - | samtools sort -@ 16 -o $i\.bam - ; done

Hope this helps

ADD COMMENT

Login before adding your answer.

Traffic: 1779 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6