Automating Flash Sequence Joins
8.6 years ago
espop23 ▴ 60

Hello,

I have around 100 FASTQ files of forward and reverse reads that I want to join with FLASH. I would like to write a script that goes through a folder and joins each pair of reads for me.

Does anyone know if this is possible?

Best

flash bash script automate • 3.5k views

You mean like cat *.fastq > onebig.fastq ?


Doesn't that just create one big file? I want to use the program FLASH. Do you think putting all the reads in one file and then running FLASH on it is the way to go?


Not necessarily. You could write a for loop and go through the file set (I guess 50 pairs). If you have access to a cluster you could submit all 50 jobs at the same time.


What is FLASH? Could you link to it?


FLASH is a read joiner (like BBMerge from BBMap).

8.6 years ago
c.v.oflynn ▴ 100

A for loop would do the trick, assuming the paired reads are named like reads1_1.fq and reads1_2.fq:

for f in *_1.fq; do i="${f%_1.fq}"; flash "${i}_1.fq" "${i}_2.fq"; done

You may want to post an additional version using the R1/R2 nomenclature, since those are the default file names from Illumina pipelines.
@espop23, add or remove FLASH options as needed.
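
For Illumina-style names, a variant of the loop might look like the sketch below. It assumes the files end in `_R1_001.fastq` and `_R2_001.fastq`; `R1_SUFFIX`, `R2_SUFFIX` and `sample_prefix` are illustrative names, so adjust them to your data. It also uses FLASH's `-o` option so that each pair gets its own output prefix.

```shell
#!/usr/bin/env bash
# Assumed suffixes; change them to match your own file names.
R1_SUFFIX="_R1_001.fastq"
R2_SUFFIX="_R2_001.fastq"

# Print the sample prefix for a forward-read file name.
sample_prefix() {
    printf '%s\n' "${1%"$R1_SUFFIX"}"
}

for fwd in *"$R1_SUFFIX"; do
    [ -e "$fwd" ] || continue            # nothing matched: the glob stays literal
    prefix=$(sample_prefix "$fwd")
    rev="${prefix}${R2_SUFFIX}"
    if [ ! -e "$rev" ]; then
        echo "skipping $fwd: no mate file $rev" >&2
        continue
    fi
    flash -o "$prefix" "$fwd" "$rev"     # -o sets the output file prefix
done
```

Pairs whose reverse file is missing are reported on stderr and skipped rather than passed to FLASH.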


The files are named like this: NG-7284_49811102_lib40117_2432_1_1.fastq and NG-7284_49811102_lib40117_2432_1_2.fastq. Does this change anything?


Give the following a try. Note that a plain for loop like this runs the pairs one after another, so on a single machine the 50 jobs are not all started at the same time.

for i in $(ls *_1.fastq | cut -f 1-5 -d "_"); do flash ${i}_1.fastq ${i}_2.fastq; done

This will work only if all file names follow the nomenclature you posted above.

ls *_1.fastq | grep -Poh ".*_1" | sed 's/1$//'

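will work if the number of underscores is not always the same. Note that the resulting prefix keeps its trailing underscore, so append 1.fastq and 2.fastq to it rather than _1.fastq and _2.fastq.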
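
The same prefix can also be derived with shell parameter expansion alone, which avoids parsing `ls` output and does not depend on the underscore count. A minimal sketch, assuming forward reads end in `_1.fastq` and reverse reads in `_2.fastq` (`strip_mate` is an illustrative helper name):

```shell
# Remove the trailing "_1.fastq" from a forward-read file name,
# however many underscores the rest of the name contains.
strip_mate() {
    printf '%s\n' "${1%_1.fastq}"
}

for fwd in *_1.fastq; do
    [ -e "$fwd" ] || continue     # skip the literal pattern if nothing matches
    i=$(strip_mate "$fwd")
    echo "$i"                     # the pair is then "${i}_1.fastq" and "${i}_2.fastq"
done
```

Here the prefix has no trailing underscore, so the mate names are built as `${i}_1.fastq` and `${i}_2.fastq`.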


Is there a way to create a folder for each pair in the script, or some other way to separate the output? It seems the output files are being overwritten each time.


I can only speculate that you are not running the command correctly. Can you post the exact command you are running?
We assumed you already knew how to run FLASH. You may need to send the output to a new file for each pair, something like the following (check the correct syntax; I am only illustrating the idea with an output redirect).

for i in $(ls *_1.fastq | cut -f 1-5 -d "_"); do flash "${i}_1.fastq" "${i}_2.fastq" > "${i}_merged.fastq"; done

I ran the exact line you last wrote, and got a merged fastq for each input file. There is only one out.extendedFrags.fastq - is this as expected? (I am new to FLASH, apologies for confusion)


I only got one output when I ran it on 12 files with the nomenclature I wrote previously. I am not sure what is wrong.
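
A likely explanation is that FLASH names its output files with the prefix "out" unless told otherwise, so every run writes out.extendedFrags.fastq again and only the last pair's result survives. The shell redirect probably captured only FLASH's log messages, not the merged reads. A minimal sketch that gives each pair its own folder and prefix, using FLASH's `-d` (output directory) and `-o` (output prefix) options; `OUT_ROOT`, `pair_outdir` and the log file name are illustrative choices, and the `_1.fastq`/`_2.fastq` suffixes are taken from the file names posted above:

```shell
#!/usr/bin/env bash
OUT_ROOT="flash_out"    # hypothetical name for the top-level output folder

# Print the output folder for a given pair prefix.
pair_outdir() {
    printf '%s/%s\n' "$OUT_ROOT" "$1"
}

for fwd in *_1.fastq; do
    [ -e "$fwd" ] || continue            # nothing matched: the glob stays literal
    prefix="${fwd%_1.fastq}"
    outdir=$(pair_outdir "$prefix")
    mkdir -p "$outdir"
    # -d: output directory, -o: output file prefix (default is "out")
    flash -d "$outdir" -o "$prefix" "${prefix}_1.fastq" "${prefix}_2.fastq" \
        > "$outdir/$prefix.flash.log" 2>&1
done
```

With this layout the merged reads for a pair should end up in flash_out/<prefix>/<prefix>.extendedFrags.fastq, with the run's messages saved next to them.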
