Question

How to run SPAdes on NextSeq data

9.8 years ago

jeccy.J ▴ 60

Hi guys,

I am trying to assemble a NextSeq data set with SPAdes. I have 4 forward and 4 reverse read files for each bacterial strain. To run the assembly I wrote a simple shell script, but SPAdes fails when it reads the input. Can anyone help me find the error in this script?

for R1 in *R1*.fastq.gz
do
    echo $R1
    R2=`echo $R1 | sed 's/_R1_/_R2_/'`
    bname=`echo $R1 | sed 's/_R1_.\+//'`
    echo $R2
    python spades.py --pe1-1 $R1 --pe1-2 $R2 --pe1-1 $R1 --pe1-2 $R2 --pe1-1 $R1 --pe1-2 $R2 --pe1-1 $R1 --pe1-2 $R2 -o $bname
done

== Error ==  file /home/jc/bio-tool/SPAdes-3.1.0-Linux/bin/1-31-18019401_S1_L003_R2_001.fastq.gz was specified at least twice

script Assembly • 4.4k views

ADD COMMENT • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by jeccy.J ▴ 60

You are specifying the very same read set four times. Please change

python spades.py --pe1-1 $R1 --pe1-2 $R2 --pe1-1 $R1 --pe1-2 $R2 --pe1-1 $R1 --pe1-2 $R2 --pe1-1 $R1 --pe1-2 $R2 -o $bname

to

python spades.py --pe1-1 $R1 --pe1-2 $R2 -o $bname

ADD REPLY • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by piet ★ 1.9k

Hi Piet,

Wouldn't that take only the first two files as input, rather than all 4 forward and 4 reverse read files? Or am I missing something?

ADD REPLY • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by HG ★ 1.2k
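On the question of getting all four lanes into one assembly: the loop in the original script runs SPAdes once per R1 file, so each lane is assembled on its own. A minimal sketch of one alternative is below. It groups the files by sample and repeats --pe1-1/--pe1-2 once per lane, each time with a different file pair. It assumes Illumina-style names of the form <sample>_L00N_R1_001.fastq.gz with no spaces, and it assumes that your SPAdes version accepts several file pairs for the same library (the manual describes this for libraries split over several files, but please check it for 3.1.0). The function names list_samples and spades_command are made up for this example. The script only prints the commands, so nothing is run until you are happy with them.

```shell
#!/bin/sh
# Sketch: pass every lane of a sample to one SPAdes run as library pe1.
# Assumes names like <sample>_L00N_R1_001.fastq.gz (no spaces in file names).

# Print each sample prefix (the part before _L00N_R1_) once.
list_samples() {
    for f in *_L[0-9][0-9][0-9]_R1_*.fastq.gz; do
        [ -e "$f" ] || continue                    # glob matched nothing
        printf '%s\n' "${f%_L[0-9][0-9][0-9]_R1_*}"
    done | sort -u
}

# Print the SPAdes command for one sample: one --pe1-1/--pe1-2 pair per lane.
spades_command() {
    cmd="python spades.py"
    for r1 in "$1"_L[0-9][0-9][0-9]_R1_*.fastq.gz; do
        [ -e "$r1" ] || continue
        r2=$(printf '%s\n' "$r1" | sed 's/_R1_/_R2_/')
        cmd="$cmd --pe1-1 $r1 --pe1-2 $r2"
    done
    printf '%s -o %s\n' "$cmd" "$1"
}

# Dry run: print one command per sample; pipe the output to sh to execute it.
list_samples | while read -r sample; do
    spades_command "$sample"
done
```

Because every lane pair appears exactly once, this should avoid the "specified at least twice" error, which comes from listing the same file repeatedly.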

You can run your script in debugging mode if you want to see what it really does. Store your script in a file named myfirstloop.sh and run it with options -v (verbose) and -x (expand commands):

bash -v -x myfirstloop.sh

ADD REPLY • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by piet ★ 1.9k

Ram · Answer 1 · 2015-02-14

9.8 years ago

Asaf 10k

I can't help you with this error, but I want to raise another issue. In my NextSeq data the reverse reads contain a lot of poly-G sequences (1-10% of the reads). You might want to check for this before assembling. I found nothing on this issue online, so I would be glad if you could confirm that you ran into it too (FastQC is an easy way to find out).

ADD COMMENT • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by Asaf 10k
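If you want a quick number without opening a FastQC report, something like the following rough sketch may help. It counts reads whose sequence ends in a run of G's. The function name polyg_fraction and the default run length of 20 are arbitrary choices for this example, not anything from the thread, and it assumes plain 4-line FASTQ records in a gzipped file.

```shell
#!/bin/sh
# Sketch: report how many reads end in at least min_run consecutive G bases.
# Usage: polyg_fraction reads.fastq.gz [min_run]   (min_run defaults to 20)
polyg_fraction() {
    gzip -dc "$1" | awk -v n="${2:-20}" '
        NR % 4 == 2 {                  # sequence line of each 4-line FASTQ record
            total++
            # match() sets RLENGTH to the length of the trailing G run
            if (match($0, /G+$/) && RLENGTH >= n) polyg++
        }
        END {
            if (total) printf "%d of %d reads (%.2f%%)\n", polyg, total, 100 * polyg / total
        }'
}

# Check every reverse-read file in the current directory.
for f in *_R2_*.fastq.gz; do
    [ -e "$f" ] || continue            # glob matched nothing
    printf '%s\t' "$f"
    polyg_fraction "$f" 20
done
```

This only looks at the 3' end of each read, so poly-G stretches inside a read are not counted.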

We have found aggressive quality-trimming of NextSeq data (to Q15) to be very useful before running SPAdes (at least in single-cell mode), as the quality is low and the quality scores are very inaccurate. I have not noticed whether the quality scores correlate with poly-G, but the base composition of the reads is highly skewed as well, especially toward the tail.

ADD REPLY • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by Brian Bushnell 20k

I am having a similar problem, but after assembly with SPAdes I found the poly-G sequences mostly on short contigs, so I hope I can filter the contigs using some length cutoff, although I suspect that is not the best approach.

ADD REPLY • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by HG ★ 1.2k
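For the length cutoff mentioned above, a minimal sketch with awk could look like this. The function name filter_contigs and the cutoff of 500 in the usage line are only illustrative, and the choice of cutoff is up to you. Sequences are written on a single line, so any line wrapping in the input is lost.

```shell
#!/bin/sh
# Sketch: keep only FASTA records whose sequence is at least min_len bases long.
# Usage: filter_contigs contigs.fasta 500 > contigs.min500.fasta
filter_contigs() {
    awk -v min="$2" '
        # Print the stored record if it is long enough.
        function emit() {
            if (header != "" && length(seq) >= min) printf "%s\n%s\n", header, seq
        }
        /^>/ { emit(); header = $0; seq = ""; next }   # new record: emit the previous one
             { seq = seq $0 }                          # join wrapped sequence lines
        END  { emit() }
    ' "$1"
}
```

Note that this removes short contigs regardless of their content, so it does not address poly-G stretches on longer contigs or in the reads themselves.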