Looping over a list of trimmed FASTA files to run the SPAdes assembler
6.0 years ago
m.al_amiri ▴ 30

Hi everyone, I have a list of trimmed FASTA files and need to run the SPAdes assembler on each of them. My file names look like s1_2, s1_3, s1_4, ... Each sample was trimmed with Trimmomatic into a directory of the same name. My question is: how can I define a list and then run SPAdes on all of them in a loop?

loop loop a list Spades Assembly sequence • 5.6k views

Not sure what the input for SPAdes needs to be, but with find . -type f -name "s1_*" you will be able to get the list of all your files
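
As a rough illustration of what that find call returns, the sketch below builds a small throwaway directory tree and lists the matching files. The layout and file names are assumptions made for the demo, not the poster's real data.

```shell
# Demo layout (hypothetical): two sample directories with one trimmed file each.
demo=$(mktemp -d)
mkdir -p "$demo/s1_2" "$demo/s1_3"
touch "$demo/s1_2/s1_2_R1_paired.fastq.gz" "$demo/s1_3/s1_3_R1_paired.fastq.gz"

# -type f keeps regular files only, so the s1_* directories themselves are skipped.
# -name takes a shell glob pattern, quoted so the shell does not expand it first.
found=$(cd "$demo" && find . -type f -name "s1_*" | sort)
printf '%s\n' "$found"
```

Each line of the output is a path relative to the directory where find was started, which is what a loop would then iterate over.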

Thanks. Can you write it for my 3 samples, s1_2, s1_3, and s1_4?

I mean listing the files and then looping SPAdes over them.

OK, you lost me :/

You only have three input files? I assume you mean fastq rather than fasta, correct? Do you want to run SPAdes on all samples together, or once per sample/file?

Sorry, all my files are paired and unpaired fastq.gz files, and I have 95 samples, so I need to make a list and run SPAdes on all of them.

Right, and is the find command I provided in the first comment not giving what you want or expect? Simply run it in your top folder and it will report all files matching the glob pattern given to the -name option.

Thank you for still helping me. This did not work:

for file in $(find . -type f -name "s1_*");
 do
 spades.py -1 *_R1_paired.fastq.gz -2 *_R2_paired.fastq.gz -s *_R1_unpaired.fastq.gz -s *_R2_unpaired.fastq.gz -m 30 -o assembly -careful 
done

Question about the loop - will the output folder ( -o assembly) be overwritten each time it gets a new sample?

Probably. You'll need to check spades manual or its source code to be sure.
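
Whatever SPAdes does with an existing output folder, giving each sample its own -o directory sidesteps the question. The following is a dry-run sketch only: it prints the commands instead of running them, and both the input file names and the assembly_<sample> naming scheme are assumptions made for illustration.

```shell
# Dry run: print one spades.py command per sample instead of executing it.
# Input file names and the assembly_<sample> scheme are hypothetical.
count=0
for sample in s1_2 s1_3 s1_4; do
    outdir="assembly_${sample}"   # one output folder per sample, so nothing is shared
    cmd="spades.py -1 ${sample}_R1_paired.fastq.gz -2 ${sample}_R2_paired.fastq.gz -m 30 -o ${outdir} --careful"
    echo "$cmd"
    count=$((count + 1))
done
```

Once the printed commands look right, the echo can be replaced by the real call.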

There are countless posts with bash loop questions, and countless solutions with either bash for loops or GNU parallel. Please read some of them and try to implement a solution, then ask a more detailed question if you get stuck.

How to run Spades For Nextseq data

Bash loop for files in several directories

For loop script

How to run a set or batch of genome assemblies at once in one go?

I have a directory named genome, and in it I have 90 directories, one per sample. I ran Trimmomatic on all of them and now need to loop over them, but it does not work. I used this command but nothing happened.

for FILE in (find . -type f -name "s1_*");
do
spades.py -1 *_R1_paired.fastq.gz -2 *_R2_paired.fastq.gz -s *_R1_unpaired.fastq.gz -s *_R2_unpaired.fastq.gz -m 30 -o assembly –careful
done

That's because the syntax is wrong in several ways. Go and take a look at how find works, and how to use commands in for loops (hint: $(find ...)).

Second hint, your loop declares the variable FILE but you then never use it, so it's not really any wonder the loop doesn't work.

Don't blindly copy and paste commands, attempt to understand what they do. This is important, because one time you may copy a command without thinking and erase your data, or maybe worse.
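
One way to read the two hints together is sketched below as a dry run: the find call is wrapped in $(...) so its output feeds the loop, and the loop variable is actually used to build each command. The demo directory, the file names, and the per-directory assembly output path are assumptions; the spades.py command is only printed, not run.

```shell
# Hypothetical demo layout: two sample directories with R1/R2 paired files.
demo=$(mktemp -d)
for s in s1_2 s1_3; do
    mkdir -p "$demo/$s"
    touch "$demo/$s/${s}_R1_paired.fastq.gz" "$demo/$s/${s}_R2_paired.fastq.gz"
done

samples=""
# $(...) runs find and substitutes its output into the loop's word list.
# The unquoted substitution is word-split, so this assumes paths without spaces.
for FILE in $(find "$demo" -type f -name "*_R1_paired.fastq.gz" | sort); do
    # Derive the mate file and the sample directory from the loop variable.
    r2="${FILE%_R1_paired.fastq.gz}_R2_paired.fastq.gz"
    dir=$(dirname "$FILE")
    samples="$samples $(basename "$dir")"
    echo spades.py -1 "$FILE" -2 "$r2" -o "$dir/assembly" --careful
done
```

Looping over the R1 files only, and deriving everything else from each one, gives exactly one iteration per sample.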

Two things:

  1. Use the code formatting to present your posts better
  2. The hyphen you're using as part of -careful seems to be a non-ASCII character, probably from a copy-paste out of a PDF, a Word document, or a website. Type your commands directly in the terminal, and avoid copy-paste until you've gained considerable experience at spotting non-ASCII characters by eye.
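
Rather than relying on eyeballing, a command string can be checked for such characters mechanically. A minimal sketch, assuming a helper named has_non_ascii (hypothetical, defined here) and two example command strings made up for the demo:

```shell
# "bad" contains an en dash (U+2013, bytes E2 80 93) before "careful";
# "good" uses two plain ASCII hyphens.
bad=$(printf 'spades.py -o assembly \342\200\223careful')
good='spades.py -o assembly --careful'

# In the C locale, [^ -~] matches any byte outside printable ASCII
# (space through tilde), so tabs and control characters are flagged too.
has_non_ascii() {
    printf '%s' "$1" | LC_ALL=C grep -q '[^ -~]'
}

if has_non_ascii "$bad"; then echo "non-ASCII character found in: $bad"; fi
if ! has_non_ascii "$good"; then echo "clean: $good"; fi
```

The same grep pattern can be pointed at a script file to find the offending lines before running it.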
6.0 years ago
m.al_amiri ▴ 30

Hi, I finally did it. First I made a list in the main directory, where all the trimmed sequence directories are. I used the following command:

ls $search_path > list

Then I ran this command:

cat list | while read line;
do
cd $line
spades.py -1 *_R1_001_paired.fastq.gz -2 *_R2_001_paired.fastq.gz -s *_R1_001_unpaired.fastq.gz -s *_R2_001_unpaired.fastq.gz -m 30 -o assembly --careful
cd ../
done

Good job figuring it out! You can now work on making this better. For example, the file list doesn't need to exist, you can just:

ls ${search_path} | while read line;
do
# run the commands
done
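
Going one step further, the ls call can be dropped as well by letting the shell glob the sample directories directly. This is a sketch under assumptions: search_path points at a throwaway demo directory here, and the echo stands in for the real spades.py call.

```shell
search_path=$(mktemp -d)          # stand-in for the real top-level directory
mkdir -p "$search_path/s1_2" "$search_path/s1_3"

visited=""
# The trailing slash makes the glob match directories only.
for dir in "$search_path"/*/; do
    [ -d "$dir" ] || continue     # skip the literal pattern if nothing matched
    sample=$(basename "$dir")
    visited="$visited $sample"
    # The subshell keeps the cd local, so no "cd ../" is needed afterwards.
    (
        cd "$dir" || exit 1
        echo "would run spades.py in: $sample"   # replace with the real command
    )
done
```

Because the directory change happens inside the subshell, a failed cd cannot leave the loop running in the wrong place.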

In your case, you should have specified your problem in the following fashion:


I have 95 sample directories, each of which has files named in the format

s1_1_R1_001.paired.fastq.gz, s1_2_R1_001.paired.fastq.gz, s1_3_R1_001.paired.fastq.gz, ..., 
s2_1_R1_001.paired.fastq.gz, s2_2_R1_001.paired.fastq.gz, s2_3_R1_001.paired.fastq.gz, ..., 
s2_1_R1_001.unpaired.fastq.gz, s2_2_R1_001.unpaired.fastq.gz, s2_3_R1_001.unpaired.fastq.gz, ..., 
s2_1_R2_001.unpaired.fastq.gz, s2_2_R2_001.unpaired.fastq.gz, s2_3_R2_001.unpaired.fastq.gz, ...,

How can I run spades.py per directory passing in all fastq files in the appropriate parameters like so:

spades.py -1 *_R1_001_paired.fastq.gz -2 *_R2_001_paired.fastq.gz -s *_R1_001_unpaired.fastq.gz -s *_R2_001_unpaired.fastq.gz -m 30 -o assembly --careful

In the time you'd take to explain your problem in this fashion, you'd often figure out the solution yourself. That's the advantage of a well-written post :-)
