Processing multiple BAM files in a directory with strelka2
4.9 years ago
bioguy24 ▴ 230

If I manually pass the three BAM files below to strelka2, I get the desired result: three individual genome files in results/variants.

xxx_00.bam
yyy_01.bam
zzz_02.bam

${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py \
  --bam xxx_00.bam \
  --bam yyy_01.bam \
  --bam zzz_02.bam \
  --referenceFasta <fasta> \
  --callRegions <.bed.gz> \
  --runDir <dir>
# execute strelka2
${dir}/runWorkflow.py -m local -j 20

However, if I use a loop like the one below, each run overwrites the previous one and I only get 1 genome file in results/variants.

for bam_file in *.bam ; do
  ${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py \
    --bam ${bam_file} \
    --referenceFasta ${fasta} \
    --runDir ${dir}
  ${dir}/runWorkflow.py -m local -j 20
done

I also tried the following, which prints the command in the correct format but does not execute it.

printf -- "${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py \\\\\n%s\n\t\t--referenceFasta ${fasta} \\\\\n\t\t--callRegions ${bed} \\\\\n\t\t--exome \\\\\n\t\t--runDir ${dir}\n" \
"$(for bamfile in *.bam; do printf -- "\t\t--bam %s \\\\\n"    "${bamfile}"; done)"
   ${dir}/runWorkflow.py -m local -j 20


${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py \
  --bam xxx_00.bam \
  --bam yyy_01.bam \
  --bam zzz_02.bam \
  --referenceFasta <fasta> \
  --callRegions <bed> \
  --exome \
  --runDir <dir>
${dir}/runWorkflow.py -m local -j 20

So my question is: what is the best way to process multiple BAM files with strelka2 using a loop? Thank you :).

strelka2 • 2.9k views
ADD COMMENT

Your loop should be around the --bam argument, not the entire command. I'm sure there are better ways to do this, but you can use a bash loop to generate the command and then xargs that command to a sub-shell.

echo "${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py $(for f in *.bam; do echo -n "--bam ${f} "; done;) --referenceFasta ${fasta} --runDir ${dir} ${dir}/runWorkflow.py -m local -j 20" | xargs sh

Try running just the echo first and check the generated command. If it looks fine, add the | xargs sh part.
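The idea of looping only over the --bam argument can also be written without echo and xargs. The sketch below is one possible way, not the poster's method: it collects the flags in the shell's positional parameters (a bash array would do the same job) and prints the resulting command for inspection. The three empty files and every path are placeholders made up for the demo.

```shell
# Demo setup: empty placeholder files stand in for real BAMs (assumption).
workdir="$(mktemp -d)"
touch "$workdir/xxx_00.bam" "$workdir/yyy_01.bam" "$workdir/zzz_02.bam"
cd "$workdir" || exit 1

# Placeholder paths (assumptions; substitute real locations).
path_to_strelka="/path/to/strelka"
fasta="/path/to/reference.fa"
dir="/path/to/run_dir"

# Collect one "--bam <file>" pair per BAM file in the positional parameters.
set --
for f in *.bam; do
  set -- "$@" --bam "$f"
done

# Print the command for inspection instead of running it.
configure_cmd="${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py $* --referenceFasta ${fasta} --runDir ${dir}"
echo "$configure_cmd"

# To run it for real, expand "$@" so each argument stays a separate word:
# "${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py" "$@" \
#   --referenceFasta "$fasta" --runDir "$dir"
```

Because the script is invoked directly in the commented-out form, its own shebang line selects the interpreter and nothing has to be piped into another program.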

ADD REPLY

Things look good until I add the | xargs sh; then I get:

/usr/local/bin/strelka-2.9.10.centos6_x86_64/bin/configureStrelkaGermlineWorkflow.py: line 27: syntax error near unexpected token `('
/usr/local/bin/strelka-2.9.10.centos6_x86_64/bin/configureStrelkaGermlineWorkflow.py: line 27: `if sys.version_info >= (3,0):'

which I know are python version errors, but I am not sure how to fix them as I do not know python. Thank you :).

ADD REPLY

Can you show me the output of the echo command? If it looks exactly like your first command, this error should not happen (assuming your first command ran fine)

ADD REPLY

Here is the output: strelka2 v2.9.10.centos6_x86_64, with

python --version
Python 2.7.5

echo "${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py $(for f in duplicate; do echo "--bam ${bam} "; done;) --referenceFasta $genome --callRegions $bed --exome --runDir     ${abra_dir} ${abra_dir}/runWorkflow.py -m local -j 20"

${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py --bam xxx_00.bam 
--bam yyy_01.bam 
--bam zzz_02.bam  --referenceFasta ${fasta} --callRegions ${bed} --exome --runDir ${dir} ${dir}/runWorkflow.py -m local -j 20

${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py line 23: $'\nThis script configures the strelka germline small variant calling workflow\n': command not found
import: unable to open X server `' @ error/import.c/ImportImageCommand/369.
${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py line 27: syntax error near unexpected token `('
${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py line 27: `if sys.version_info >= (3,0):'

Thank you :).

ADD REPLY

I apologize for this. There's a bug in my echo within the loop that introduces newline characters and breaks the command. I've fixed it now by adding a -n, which should take care of it.

The output looks different from your first command, but it would take a trained eye to see the difference. The command my code generates has no backslashes escaping the newline characters, whereas your command did have backslashes.
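What the -n changes can be reproduced without strelka. The sketch below uses made-up file names and printf as a portable stand-in for echo -n; it only shows that a plain echo inside the loop spreads the generated text over several lines, while the newline-free version keeps it on one.

```shell
# Plain echo ends every piece with a newline, so the text spans several lines.
with_newlines="$(for f in a.bam b.bam c.bam; do echo "--bam ${f} "; done)"

# printf without "\n" (the portable equivalent of echo -n) keeps one line.
one_line="$(for f in a.bam b.bam c.bam; do printf '%s %s ' --bam "$f"; done)"

# Count the lines in each result: three for the first, one for the second.
printf '%s\n' "$with_newlines" | wc -l
printf '%s\n' "$one_line" | wc -l
```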

ADD REPLY

I added the echo -n and get the same error as above when the | xargs sh is added. Without it, the echo -n output looks good. Thank you for your help :).

ADD REPLY

Where did you add the echo -n? Can you once again copy-paste the command you used and its output please?

ADD REPLY

The command with the typo in the bam extension:

echo -n "${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py $(for f in duplicate.bam; do echo "--bam ${f} "; done;) --referenceFasta $genome --callRegions $bed --exome --runDir     ${abra_dir} ${abra_dir}/runWorkflow.py -m local -j 20" | xargs sh

${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py line 23: $'\nThis script configures the strelka germline small variant calling workflow\n': command not found
import: unable to open X server `' @ error/import.c/ImportImageCommand/369.
${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py line 27: syntax error near unexpected token `('
${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py line 27: `if sys.version_info >= (3,0):'

Thank you :).

ADD REPLY

It's added in the wrong place. I said "in my echo within the loop" and "I've fixed it now and added a -n that should take care of it".

Please use my command as-is.

ADD REPLY

I apologize for misreading the post and appreciate the help. The same bam file seems to be used throughout the echo output, and the xargs sh gives the error below. Thank you very much :).

echo "${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py $(for f in *dups.bam; do echo -n "--bam ${bam} "; done;) --referenceFasta ${fasta} --callRegions ${bed} --exome --runDir ${dir} ${dir}/runWorkflow.py -m local -j 20"

${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py --bam zzz_02.dups.bam --bam zzz_02.dups.bam --bam zzz_02.dups.bam  --referenceFasta ${fasta} --callRegions ${bed} --exome --runDir ${dir} ${dir}/runWorkflow.py -m local -j 20

echo "${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py $(for f in *dups.bam; do echo -n "--bam ${bam} "; done;) --referenceFasta ${fasta} --callRegions ${bed} --exome --runDir ${dir} ${dir}/runWorkflow.py -m local -j 20" | xargs sh

Using ${f} within the loop echoes all the bam files, but the | xargs sh gives the same error.

echo "${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py $(for f in *rdups.bam; do echo -n "--bam ${f} "; done;) --referenceFasta $genome --runDir ${abra_dir} ${abra_dir}/runWorkflow.py -m local -j 20"

${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py --bam xxx_00.dups.bam --bam yyy_01.dups.bam --bam zzz_02.dups.bam  --referenceFasta ${fasta} --runDir ${dir} ${dir}/runWorkflow.py -m local -j 20

echo "${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py $(for f in *rdups.bam; do echo -n "--bam ${f} "; done;) --referenceFasta $genome --runDir ${abra_dir} ${abra_dir}/runWorkflow.py -m local -j 20" | xargs sh

${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py: line 23: $'\nThis script configures the strelka germline small variant calling workflow\n': command not found
import: unable to open X server `' @ error/import.c/ImportImageCommand/369.
${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py: line 27: syntax error near unexpected token `('
${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py: line 27: `if sys.version_info >= (3,0):'

${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py: line 23: $'\nThis script configures the strelka germline small variant calling workflow\n': command not found
import: unable to open X server `' @ error/import.c/ImportImageCommand/369.
${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py: line 27: syntax error near unexpected token `('
${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py: line 27: `if sys.version_info >= (3,0):'
ADD REPLY

OMG I am such an idiot. Of course it's $f and not $bam. I copy-pasted the dummy code I used here without testing it properly.

It would seem that even an executable .py file fails if executed as sh exec.py. Try using xargs python instead of xargs sh. That worked in my trial.

➜ ./conf.py 1 2 3 # The script conf.py basically prints sys.argv separated by commas.
Hello, 1, 2, 3

➜ echo "conf.py $(for bam in *.bam; do echo -n "$bam "; done;)" # Try with bam files
conf.py a.bam b.bam c.bam

➜ echo "conf.py $(for bam in *.bam; do echo -n "$bam "; done;)" | xargs sh # Try with xargs sh
conf.py: line 3: import: command not found
conf.py: line 5: syntax error near unexpected token `'Hello, ''
conf.py: line 5: `print('Hello, ' + ', '. join(sys.argv[1:]));'

➜ echo "conf.py $(for bam in *.bam; do echo -n "$bam "; done;)" | xargs python # Try with xargs python
Hello, a.bam, b.bam, c.bam

➜ sh conf.py 1 2 3 # Try a plain `sh conf.py`
conf.py: line 3: import: command not found
conf.py: line 5: syntax error near unexpected token `'Hello, ''
conf.py: line 5: `print('Hello, ' + ', '. join(sys.argv[1:]));'
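Why sh fails here can be seen by asking xargs to print the command line instead of running it. xargs appends the words it reads to the program named after it, so xargs sh ends up running sh conf.py ..., where sh reads the Python file as shell code and the script's shebang line plays no part. A minimal sketch that reuses the conf.py name from the trial above and puts echo in front of the interpreter:

```shell
# The text that the echo/loop combination generates (file names are examples).
generated="conf.py a.bam b.bam c.bam"

# "xargs echo <program>" prints the command line xargs would otherwise execute.
as_sh="$(echo "$generated" | xargs echo sh)"
as_python="$(echo "$generated" | xargs echo python)"

echo "$as_sh"        # sh conf.py a.bam b.bam c.bam
echo "$as_python"    # python conf.py a.bam b.bam c.bam
```

In both cases the first word of the generated text becomes a script argument for the interpreter, which is why only the python variant can make sense of a .py file.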
ADD REPLY

The -m and -j options are listed on the GitHub site but are not recognized. Thank you :).

echo "${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py $(for f in *dups.bam; do echo -n "--bam ${f} "; done;) --referenceFasta ${fasta} --callRegions ${bed} --exome --runDir ${dir} ${dir}/runWorkflow.py -m local -j 20" | xargs python
Usage: configureStrelkaGermlineWorkflow.py [options]

configureStrelkaGermlineWorkflow.py: error: no such option: -m
ADD REPLY

I changed the shebang line in the python executable to #!/usr/bin/env python to match the other .py files on my system, and I get the error below.

python --version on centos 7
Python 2.7.5

echo "${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py $(for f in *dups.bam; do echo -n "--bam ${f} "; done;) --referenceFasta ${fasta} --callRegions ${bed} --exome --runDir ${dir} ${dir}/runWorkflow.py -m local -j 20" | python


File "<stdin>", line 1
${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py --bam xxx_00.bam --bam yyy_01.bam --bam zzz_02.bam  --referenceFasta ${fasta} --callRegions ${bed} --exome --runDir ${dir} ${dir}/runWorkflow.py -m local -j 20
                                                                                                                                                                                            ^
SyntaxError: invalid syntax
     .... | xargs pyhthon

        error: no such option: -m  

     .... | sh

        error: no such option: -m    

     .... | xargs sh

       ${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py: line 23: $'\nThis script configures the strelka germline small variant calling workflow\n': command not found
       import: unable to open X server `' @ error/import.c/ImportImageCommand/369.
       ${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py: line 27: syntax error near unexpected token `('
       ${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py: line 27: `if sys.version_info >= (3,0):'

I also tried

echo "${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py $"(for f in *dups.bam; do echo -n "--bam ${f} "; done;)" --referenceFasta ${fasta} --callRegions ${bed} --exome --runDir ${dir} ${dir}/runWorkflow.py -m local -j 20" | xargs sh

-bash: syntax error near unexpected token `('

Thank you :).

ADD REPLY

Don't change the shebang line; xargs it to the right python version instead (xargs /path/to/python2.7).

Also, your command says you misspelled python as pyhthon. Please check for typos.

The second command (runWorkflow.py) should be separated from the first with a semicolon or a &&, not a white space. Plus, the xargs must be used only for the first command.

If you are trying to write a pipeline with multiple commands, only one of which needs the repeated --bam arguments, please generate the commands using a separate command and then execute them later. Writing a one liner for this will only serve to confuse.
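The "generate first, execute later" advice could look roughly like the sketch below. It is an illustration under assumptions, not a tested strelka pipeline: all paths and file names are placeholders, and the generated script is only printed, never run. The two commands are joined with && so that runWorkflow.py starts only if configuration succeeded.

```shell
# Placeholder paths (assumptions; substitute real locations).
path_to_strelka="/path/to/strelka"
fasta="/path/to/reference.fa"
dir="/path/to/run_dir"

# Step 1: build the repeated --bam arguments (example file names; plain
# string building like this assumes names without spaces).
bam_args=""
for f in xxx_00.bam yyy_01.bam zzz_02.bam; do
  bam_args="${bam_args}--bam ${f} "
done

# Step 2: write both commands to a script file, separated by &&.
script="$(mktemp)"
cat > "$script" <<EOF
${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py ${bam_args}--referenceFasta ${fasta} --runDir ${dir} &&
${dir}/runWorkflow.py -m local -j 20
EOF

# Step 3: review the generated commands; run them later with: sh "$script"
cat "$script"
```

Keeping the generation and execution steps apart means the -m and -j options can only ever reach runWorkflow.py, not the configure script.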

ADD REPLY

I want to understand the rationale behind this analysis. I have 2-3 samples per patient. Should I run each sample independently, in groups according to the patient, or in a single run with all my samples? I couldn't find information about this. If anyone can help me, I will be very thankful.

ADD REPLY
