4.9 years ago
bioguy24
If I manually tell strelka2 to use the three bam files below, I get the desired result of 3 individual genome files in results/variants.
xxx_00.bam
yyy_01.bam
zzz_02.bam
${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py \
--bam xxx_00.bam \
--bam yyy_01.bam \
--bam zzz_02.bam \
--referenceFasta <fasta> \
--callRegions <.bed.gz> \
--runDir <dir>
# execute strelka2
${dir}/runWorkflow.py -m local -j 20
However, if I try to use a loop like the one below, each file overwrites the other and I only get 1 genome file in results/variants.
for bam_file in *.bam ; do
${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py \
--bam ${bam_file} \
--referenceFasta ${fasta} \
--runDir ${dir}
${dir}/runWorkflow.py -m local -j 20
done
I also tried the below, which prints the command in the correct format but does not execute it.
printf -- "${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py \\\\\n%s\n\t\t--referenceFasta ${fasta} \\\\\n\t\t--callRegions ${bed} \\\\\n\t\t--exome \\\\\n\t\t--runDir ${dir}\n" \
"$(for bamfile in *.bam; do printf -- "\t\t--bam %s \\\\\n" "${bamfile}"; done)"
${dir}/runWorkflow.py -m local -j 20
${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py \
--bam xxx_00.bam \
--bam yyy_01.bam \
--bam zzz_02.bam \
--referenceFasta <fasta> \
--callRegions <bed> \
--exome \
--runDir <dir>
${dir}/runWorkflow.py -m local -j 20
So, my question is: what is the best way to process multiple bam files with strelka2 using a loop? Thank you :).
Your loop should be around the --bam argument, not the entire command. I'm sure there are better ways to do this, but you can use a bash loop to generate the command and then xargs that command to a sub-shell.
Try running just the echo first and check the generated command. If it looks fine, add the | xargs sh part.
Things look good until the | xargs sh, then I get:
which I know are Python version errors, but I am not sure how to fix them as I do not know Python. Thank you :).
Can you show me the output of the echo command? If it looks exactly like your first command, this error should not happen (assuming your first command ran fine).
Here is the output: strelka2 v2.9.10.centos6_x86_64, with
Thank you :).
I apologize for this - there's a bug in my echo within the loop that's introducing new line characters and breaking the command. I've fixed it now and added a -n that should take care of it.
It looks different from your first command, but it would take a trained eye to see the difference. The command my code generates has no back-slashes that escape new line characters, whereas your command did have back-slashes.
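The code block from this exchange is not preserved here, so the following is only a sketch of the idea being described, not the answerer's actual command: loop over the --bam arguments alone and emit the whole command on a single line, so that no newline characters or line-continuation back-slashes end up inside it. The function name build_configure_cmd and every path in the example call are hypothetical, and the sketch assumes file names without whitespace.

```shell
# Sketch: build the configure command with a loop around --bam only.
# build_configure_cmd is a hypothetical helper, not part of strelka2.
build_configure_cmd() {
    # $1 = path to configureStrelkaGermlineWorkflow.py
    # $2 = reference FASTA, $3 = run directory, remaining args = bam files
    configure=$1
    fasta=$2
    run_dir=$3
    shift 3
    cmd=$configure
    for f in "$@"; do
        # append each bam on the same line; no newline is introduced
        cmd="$cmd --bam $f"
    done
    cmd="$cmd --referenceFasta $fasta --runDir $run_dir"
    # print the finished command as exactly one line
    printf '%s\n' "$cmd"
}

# Inspect the generated command first; nothing is executed here.
# All paths and file names below are placeholders.
build_configure_cmd /path/to/strelka/bin/configureStrelkaGermlineWorkflow.py \
    ref.fa strelka_run xxx_00.bam yyy_01.bam zzz_02.bam
```

Printing the command before running it mirrors the advice above to check the echo output first; only once the single line looks right would it be handed to an interpreter.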
I added the echo -n and get the same error as above when the | xargs sh is added. Without it, the echo -n output looks good. Thank you for your help :).
Where did you add the echo -n? Can you once again copy-paste the command you used and its output please?
The command with the typo in the bam extension:
Thank you :).
It's added in the wrong place. I said "in my echo within the loop" and "I've fixed it now and added a -n that should take care of it".
Please use my command as-is.
I apologize for misreading the post and appreciate the help. The same bam file seems to be used in the echo statement, and the xargs sh gives the below error. Thank you very much :).
Using ${f} within the loop echoes all the bam files, but the | xargs sh has the same error.
OMG I am such an idiot. Of course it's $f and not $bam. I copy-pasted the dummy code I used here without testing it properly.
It would seem that even an executable .py file fails if executed as sh exec.py. Try using xargs python instead of xargs sh. That worked in my trial.
The -m and -j are listed as options on the GitHub site but are not recognized. Thank you :).
I changed the shebang line in the Python executable to #!/usr/bin/env python to match the other .py files on my system and get the below error.
I also tried
Thank you :).
Don't change the shebang line; xargs it to the right Python version (xargs /path/to/python2.7).
Also, your command says you misspelled python as pyhthon. Please check for typos.
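As a rough illustration of piping a generated command line to an explicit interpreter through xargs, here is a minimal sketch. The helper name run_with is hypothetical, and the demonstration uses echo as a stand-in for the interpreter so that nothing is actually executed; for a real run the first argument would be the interpreter path, such as the /path/to/python2.7 placeholder mentioned above.

```shell
# Sketch: hand a generated command line to an explicit interpreter via xargs.
# run_with is a hypothetical helper, not part of strelka2.
run_with() {
    # $1 = interpreter to call (e.g. /path/to/python2.7)
    # $2 = script name followed by its arguments, as one line of text
    interpreter=$1
    cmdline=$2
    # xargs splits the line on whitespace and appends the words
    # to the interpreter invocation
    printf '%s\n' "$cmdline" | xargs "$interpreter"
}

# Dry run: echo stands in for the interpreter, so the command is only printed.
run_with echo "configureStrelkaGermlineWorkflow.py --bam xxx_00.bam --bam yyy_01.bam"
```

Because the interpreter is named explicitly, the script's shebang line does not come into play and can be left untouched.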
The second command (runWorkflow.py) should be separated from the first with a semicolon or a &&, not a white space. Plus, the xargs must be used only for the first command.
If you are trying to write a pipeline with multiple commands, only one of which needs the repeated --bam arguments, please generate the commands using a separate command and then execute them later. Writing a one-liner for this will only serve to confuse.
I want to understand the rationale behind this analysis. I have 2-3 samples per patient. Should I run each sample independently, in groups according to the patient, or in a single run with all my samples? I couldn't find information about this. If anyone can help me I will be very thankful.