How to run MLST with multiple fastq files
3.1 years ago
Kumar ▴ 170

Hi, I am trying to run a bash script for MLST at CGE (https://cge.cbs.dtu.dk/services/MLST/). I have fastq files and have downloaded the MLST program. When I run the program on a single sample (one R1 and R2 pair), it generates the results, but the batch run in my script fails. Please see the script and the error message below.

#!/usr/bin/env bash
if [ -z $1 ] ; then
    echo "Hint: $0 Input_Directory_Containing_.fastq.gz Speciesname"
    exit 1
fi
inputdir=$1
species=$2
export MLST_DB=/home/CGE_server/mlst/mlst_db

cd ~/HOME_MLST
for files in "$inputdir"/*r1.fastq.gz ; do
    echo "Working on $files and ${files/r1.fastq.gz/r2.fastq.gz} for species $species"
    #cp $files ${files/r1.fastq.gz/r2.fastq.gz} .
    bfile=$(basename $files)
    mkdir "${bfile%%r1.fastq.gz}"
    sudo docker run --rm -it -v $MLST_DB:/database -v $(pwd):/workdir mlst -i $bfile ${bfile/r1.fastq.gz/r2.fastq.gz} -o ${bfile%%r1.fastq.gz} -s $species -x
    echo "Finished working on $files"
done

ERROR:

$bash mlst.sh ecoli
Working on ecoli/*r1.fastq.gz and ecoli/*r2.fastq.gz for species 
usage: mlst.py [-h] -i INFILE [INFILE ...] [-o OUTDIR] -s SPECIES
           [-p DATABASE] [-t TMP_DIR] [-mp METHOD_PATH] [-x] [-q]
           [-matrix]
mlst.py: error: argument -s/--species: expected one argument
Finished working on ecoli/*r1.fastq.gz
3.1 years ago
cfos4698 ★ 1.1k

Your script isn't doing what you intend it to do, or, at least, you aren't running it in a way that makes it work as intended. Your script expects the first command line arg to be the inputdir, and the second command line arg to be the species name. However, you're running it this way:

bash mlst.sh ecoli

So, 'ecoli' is treated as the inputdir, and no species name is provided. Hence, the second command-line argument ($2) is empty, the 'species' variable is empty, and because $species is unquoted, '-s $species' reaches mlst.py as a bare -s with no value. Accordingly, the following error occurs: 'error: argument -s/--species: expected one argument'.
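For anyone new to bash, the sketch below mimics the two invocations with throwaway functions (show_args and count_args are hypothetical names, and ecoli_reads is a made-up directory name). It shows that $2 is simply empty when only one argument is given, and that an empty, unquoted variable disappears from a command line entirely, which is why mlst.py sees -s followed by nothing.

```shell
#!/usr/bin/env bash
# How bash fills $1 and $2, and what an empty, unquoted $species does
# to a command line. show_args and count_args are hypothetical helpers.

show_args() {
    local inputdir=$1   # first word after the script name
    local species=$2    # second word; empty when it is not supplied
    echo "inputdir='${inputdir}' species='${species}'"
}

count_args() {
    echo "$#"           # number of words the "program" actually received
}

show_args ecoli               # like: bash mlst.sh ecoli        -> inputdir='ecoli' species=''
show_args ecoli_reads ecoli   # like: bash mlst.sh ecoli_reads ecoli

species=""
count_args -s $species        # unquoted empty variable vanishes: prints 1
count_args -s "$species"      # quoted: prints 2 (the second word is empty)
```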

Try the following:

#!/usr/bin/env bash
[ "$#" -ne 2 ] && echo "Hint: $0 Input_Directory_Containing_.fastq.gz Speciesname" && exit 1

inputdir=$1
species=$2
export MLST_DB=/home/CGE_server/mlst/mlst_db

cd ~/HOME_MLST

# Expects read pairs named <sample>.r1.fastq.gz and <sample>.r2.fastq.gz
for i in "${inputdir}"/*.r1.fastq.gz ; do
    R1=${i}
    R2=${i/.r1.fastq.gz/.r2.fastq.gz}   # mate file: swap .r1 for .r2
    echo $R1
    echo $R2
    outname=${R1%.r1.fastq.gz}          # output directory: R1 path without the suffix
    mkdir ${outname}

    echo "Working on ${R1} and ${R2} for species $species"

    sudo docker run --rm -it -v $MLST_DB:/database -v $(pwd):/workdir mlst -i ${R1} ${R2} -o ${outname} -s ${species} -x
    echo "Finished working on ${outname}"
done

Note: I can't run the program myself, so I'm assuming this will work.
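The loop above relies on files following the <sample>.r1.fastq.gz pattern. The sketch below (show_names is a hypothetical helper, and sample.r1.fastq.gz a made-up name) applies the same expansions to that pattern and to the underscore style that shows up later in the thread, Lm03_S7_L001_r1.fastq.gz. With the underscore style, neither expansion changes the name and the loop's glob does not select the file; the question's own pattern, *r1.fastq.gz without the dot, would have matched it.

```shell
#!/usr/bin/env bash
# Apply the answer's parameter expansions to one file name and report the results.
show_names() {
    local R1=$1 R2 outname match
    R2=${R1/.r1.fastq.gz/.r2.fastq.gz}   # replace the first ".r1.fastq.gz" with ".r2.fastq.gz"
    outname=${R1%.r1.fastq.gz}           # drop a trailing ".r1.fastq.gz"
    case $R1 in
        *.r1.fastq.gz) match=yes ;;      # would the loop's glob select this file?
        *) match=no ;;
    esac
    echo "R2=$R2 outname=$outname glob_match=$match"
}

show_names sample.r1.fastq.gz         # R2=sample.r2.fastq.gz outname=sample glob_match=yes
show_names Lm03_S7_L001_r1.fastq.gz   # nothing is stripped or swapped, glob_match=no
```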


Hi, thank you for your suggestions.

I ran the script, but it shows the error below.

 $bash mlst.sh ecoli

error message:

Hint: mlst.sh Input_Directory_Containing_.fastq.gz Speciesname

Did you try thinking about what to do after seeing the hint and seeing the long answer I wrote? I specifically said "Your script expects the first command line arg to be the inputdir, and the second command line arg to be the species name." You only provided one command line argument, the species name, and it was in the wrong position. What's missing? What order should you put the arguments in?


Of course I read your comments; otherwise, how would I have known what the error was? Since the script expects the first command-line argument to be the inputdir and the second to be the species name, I am giving mlst.sh as the first argument and ecoli as the second.

Anyway, I changed my script to use the current directory (pwd) as inputdir and ran it with the same command ($bash mlst.sh ecoli), and it ran:

inputdir=$(pwd)
species=$1


You ran bash mlst.sh species.

mlst.sh is a file, not a directory.

Correct:

bash mlst.sh inputdir species
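A check that the first argument really is a directory would catch this mix-up with a clear message instead of a confusing mlst.py error. A minimal sketch, assuming the Input_Directory Speciesname order is kept; check_args is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Validate "Input_Directory Speciesname" before doing any work.
check_args() {
    if [ "$#" -ne 2 ]; then
        echo "Hint: mlst.sh Input_Directory_Containing_.fastq.gz Speciesname" >&2
        return 1
    fi
    if [ ! -d "$1" ]; then
        echo "Error: '$1' is not a directory (did you pass the script name or the species first?)" >&2
        return 1
    fi
    return 0
}
```

In mlst.sh, call check_args "$@" || exit 1 before the loop. Then bash mlst.sh ecoli stops at the hint, while bash mlst.sh /home/MLST/ ecoli passes the check.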


Thank you for letting me know. I tried it that way too, but it shows the following error. I am in the working directory and run the command (bash mlst.sh /home/MLST/ ecoli). Sorry, I am new to bash scripting.

/home/MLST//*.r1.fastq.gz
/home/MLST//*.r2.fastq.gz
mkdir: cannot create directory ‘/home/MLST//Lm03_S7_L001_r1.fastq.gz’: File exists
mkdir: cannot create directory ‘/home/MLST//Lm03_S7_L001_r2.fastq.gz’: File exists
mkdir: cannot create directory ‘/home/MLST//mlst_full_directoryfastq.sh’: File exists
mkdir: cannot create directory ‘/home/MLST//mlst.sh’: File exists
Working on /home/MLST//*.r1.fastq.gz and /home/bmaurice/LM_MLST//*.r2.fastq.gz for species ecoli
usage: mlst.py [-h] -i INFILE [INFILE ...] [-o OUTDIR] -s SPECIES
           [-p DATABASE] [-t TMP_DIR] [-mp METHOD_PATH] [-x] [-q]
           [-matrix]
 mlst.py: error: unrecognized arguments: /home/MLST//Lm03_S7_L001_r2.fastq.gz /home/MLST//mlst_full_directoryfastq.sh /home/MLST//mlst.sh
 Finished working on /home/MLST//*
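Reading the log above: the files are named like Lm03_S7_L001_r1.fastq.gz (underscore before r1), so the pattern *.r1.fastq.gz matches nothing, and bash leaves an unmatched glob as literal text. R1 therefore becomes /home/MLST//*.r1.fastq.gz, ${R1%.r1.fastq.gz} turns that into /home/MLST//*, and because ${outname} is unquoted in mkdir and after -o, it is glob-expanded into every file in the directory. That accounts for the four "File exists" errors and the extra arguments mlst.py rejects. The sketch below is one possible fix, not a tested replacement. It assumes pairs named <sample>_r1.fastq.gz / <sample>_r2.fastq.gz, turns on nullglob so an unmatched pattern produces no iterations, quotes every expansion, mounts the input directory itself as /workdir, and, like the scripts above, assumes the mlst image resolves relative paths against /workdir. list_pairs and run_mlst_batch are hypothetical names.

```shell
#!/usr/bin/env bash
# Print "R1<TAB>R2" for every complete <sample>_r1/_r2 pair in a directory.
list_pairs() {
    local dir=$1 r1 r2 had_nullglob=no
    shopt -q nullglob && had_nullglob=yes
    shopt -s nullglob                  # unmatched glob -> no iterations, not literal text
    for r1 in "$dir"/*_r1.fastq.gz; do
        r2=${r1%_r1.fastq.gz}_r2.fastq.gz
        if [ -f "$r2" ]; then
            printf '%s\t%s\n' "$r1" "$r2"
        else
            echo "Skipping $r1: no matching $r2" >&2
        fi
    done
    [ "$had_nullglob" = yes ] || shopt -u nullglob   # restore the caller's setting
}

# Run MLST once per pair. Assumes MLST_DB is exported, as in the scripts above.
run_mlst_batch() {
    local inputdir species r1 r2 sample
    if [ "$#" -ne 2 ] || [ ! -d "$1" ]; then
        echo "Usage: run_mlst_batch Input_Directory_Containing_.fastq.gz Speciesname" >&2
        return 1
    fi
    inputdir=$(cd "$1" && pwd) || return 1   # docker -v needs an absolute host path
    species=$2
    while IFS=$'\t' read -r r1 r2; do
        sample=$(basename "$r1" _r1.fastq.gz)   # e.g. Lm03_S7_L001
        mkdir -p "$inputdir/$sample"
        echo "Working on $r1 and $r2 for species $species"
        # No -it: a batch loop does not need an interactive TTY.
        # </dev/null keeps docker from reading the pair list that feeds this loop.
        sudo docker run --rm -v "$MLST_DB":/database -v "$inputdir":/workdir mlst \
            -i "$(basename "$r1")" "$(basename "$r2")" -o "$sample" -s "$species" -x </dev/null
        echo "Finished working on $sample"
    done < <(list_pairs "$inputdir")
}
```

To use it, paste both functions into mlst.sh after the export MLST_DB=... line, end the script with run_mlst_batch "$1" "$2" (the cd ~/HOME_MLST step is no longer needed), and run bash mlst.sh /home/MLST/ ecoli. Each sample's results land in a subdirectory of the input directory named after the sample.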
