Question

How to assemble multiple paired-end files?

4.3 years ago

A_heath ▴ 170

Hi all,

I downloaded multiple paired-end reads from the SRA (NCBI) and I want to assemble two paired-end read files with each other. All the files look like this:

SRRXXX_1.fastq

SRRXXX_2.fastq

SRRYYY_1.fastq

SRRYYY_2.fastq

SRRWWW_1.fastq

SRRWWW_2.fastq

First, I am using Megahit for this. Is that appropriate for what I want to do?

Then, I tried this:

for file in *.fastq;

do

f=$(basename $file)

megahit -1 "$file"_1.fastq -2 "$file"_2.fastq -o "$file"_assembled.fasta/;

done;

But it didn't work because the resulting file names are wrong, which makes sense: _1.fastq or _2.fastq gets appended to the full original file name (for example, SRRXXX_1.fastq_1.fastq). Unfortunately, I don't know how to proceed differently.

How could I assemble both paired-end read files at the same time and do this for all files?

Thank you for your help, it will be greatly appreciated!

paired-end Assembly • 2.0k views

updated 4.3 years ago by GenoMax 147k • written 4.3 years ago by A_heath ▴ 170

I want to assemble two paired-end read files with each other.

I don't know exactly what you mean by that. If you are looking to merge R1/R2 reads because they overlap (e.g. the insert size is smaller than the sequenced length), then you need a program like bbmerge.sh from the BBMap suite, or FLASH. If you actually want to assemble the sequences into contigs, then megahit would be appropriate.

4.3 years ago by GenoMax 147k

score 6 · Accepted Answer · 2020-08-04

4.3 years ago

Assa Yeroslaviz ★ 1.9k

This should work:

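for file in *_1.fastq
do
    # Strip the trailing _1.fastq to get the sample prefix (e.g. SRRXXX)
    f="${file%_1.fastq}"
    # Name the output directory after the sample, not the full R1 file name
    megahit -1 "${f}_1.fastq" -2 "${f}_2.fastq" -o "${f}_assembled"
done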
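
The loop above assumes every _1.fastq file has a matching _2.fastq next to it. Before starting long assemblies, it may help to do a dry run that only prints the commands. The sketch below is one way to do that; the function name pair_commands and the _assembled output suffix are illustrative choices, not anything megahit requires, and it assumes the SRRXXX_1.fastq / SRRXXX_2.fastq naming from the question.

```shell
#!/bin/sh
# Dry-run sketch: print one megahit command per sample instead of running it.
# Assumption: mates are named <sample>_1.fastq and <sample>_2.fastq.
# pair_commands is a hypothetical helper name used only for this example.

pair_commands() {
    dir="${1:-.}"
    for r1 in "$dir"/*_1.fastq; do
        [ -e "$r1" ] || continue           # glob matched nothing
        sample="${r1%_1.fastq}"            # strip only the trailing _1.fastq
        r2="${sample}_2.fastq"
        if [ ! -e "$r2" ]; then
            echo "skip: no mate for $r1" >&2
            continue
        fi
        # Printed, not executed; remove "echo" to actually run the assembly.
        echo megahit -1 "$r1" -2 "$r2" -o "${sample}_assembled"
    done
}

pair_commands "${1:-.}"
```

Run it in the directory holding the reads (or pass the directory as the first argument) and check that each printed line pairs the right files. Samples with a missing mate are reported on stderr and skipped.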