How to run SPAdes on NextSeq data
9.8 years ago
jeccy.J ▴ 60

Hi guys,

I am trying to assemble a NextSeq data set with SPAdes. I have 4 forward and 4 reverse read files for each bacterial strain. To run the assembly I wrote a simple shell script, but SPAdes fails when it reads the input. Can anyone help me find the error in this script?

for R1 in *R1*.fastq.gz
do
    echo $R1
    R2=`echo $R1 | sed 's/_R1_/_R2_/'`
    bname=`echo $R1 | sed 's/_R1_.\+//'`
    echo $R2
    python spades.py --pe1-1 $R1 --pe1-2 $R2 --pe1-1 $R1 --pe1-2 $R2 --pe1-1 $R1 --pe1-2 $R2 --pe1-1 $R1 --pe1-2 $R2 -o $bname
done

== Error ==  file /home/jc/bio-tool/SPAdes-3.1.0-Linux/bin/1-31-18019401_S1_L003_R2_001.fastq.gz was specified at least twice

You are specifying the very same read set four times. Please change

python spades.py --pe1-1 $R1 --pe1-2 $R2 --pe1-1 $R1 --pe1-2 $R2 --pe1-1 $R1 --pe1-2 $R2 --pe1-1 $R1 --pe1-2 $R2 -o $bname

to

python spades.py --pe1-1 $R1 --pe1-2 $R2 -o $bname

Hi Piet,

That would take only the first two files as input, not all 4 forward and 4 reverse read files, wouldn't it? Or am I missing something?


You can run your script in debugging mode if you want to see what it really does. Store your script in a file named myfirstloop.sh and run it with the options -v (print each script line as it is read) and -x (print each command after expansion):

bash -v -x myfirstloop.sh
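
Along the same lines, here is a rough dry-run sketch that only prints the SPAdes command for each sample instead of running it. It assumes the four files per direction are lanes named like <sample>_L00N_R1_001.fastq.gz (guessed from the file name in the error message), and that SPAdes accepts repeated --pe1-1/--pe1-2 options for one library split across several files, which is worth checking in the manual for your version. The function name build_cmd is made up for this example.

```shell
#!/usr/bin/env bash
# Dry run: print one SPAdes command per sample, attaching every lane's
# R1/R2 pair to the same paired-end library (pe1).
# Assumed naming: <sample>_L00N_R1_001.fastq.gz / <sample>_L00N_R2_001.fastq.gz

build_cmd() {   # hypothetical helper, not part of SPAdes
    local sample=$1
    local args=() r1 r2
    for r1 in "${sample}"_L00*_R1_*.fastq.gz; do
        r2=${r1/_R1_/_R2_}          # matching reverse-read file
        [ -e "$r2" ] || { echo "missing mate for $r1" >&2; return 1; }
        args+=(--pe1-1 "$r1" --pe1-2 "$r2")
    done
    # echo instead of executing, so the command can be inspected first
    echo python spades.py "${args[@]}" -o "$sample"
}

# Sample name = everything before the lane field, e.g. 1-31-18019401_S1
for sample in $(printf '%s\n' *_R1_*.fastq.gz | sed -n 's/_L00[0-9]_R1_.*//p' | sort -u); do
    build_cmd "$sample"
done
```

If the printed commands look right, removing the echo in build_cmd would run them for real.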
9.8 years ago
Asaf 10k

I can't help you with this error, but I want to raise another issue. In my NextSeq data the reverse reads contain a lot of poly-G sequences (1-10% of the reads). You might want to check for this before assembling. I found nothing about this issue online, so I would be happy if you could confirm that you ran into it too (FastQC is an easy way to find out).
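
For a quick number without FastQC, something like the sketch below might do. It counts reads that end in a run of G's, assuming plain 4-line FASTQ records and taking "poly-G" to mean at least 20 trailing G's; that threshold and the function name polyg_fraction are arbitrary choices for illustration.

```shell
#!/usr/bin/env bash
# Report how many reads in a gzipped FASTQ end in a run of at least MIN_G G's.
# MIN_G defaults to 20, an arbitrary illustrative threshold.

polyg_fraction() {   # hypothetical helper name
    local fastq_gz=$1 min_g=${2:-20}
    gzip -dc "$fastq_gz" | awk -v n="$min_g" '
        NR % 4 == 2 {                       # sequence line of each 4-line record
            total++
            # match() sets RLENGTH to the length of the trailing G run
            if (match($0, /G+$/) && RLENGTH >= n) polyg++
        }
        END {
            if (total == 0) { print "0 reads"; exit }
            printf "%d of %d reads (%.2f%%) end in >=%d G\n", polyg, total, 100 * polyg / total, n
        }'
}
```

It could be called on a reverse-read file, for example polyg_fraction 1-31-18019401_S1_L003_R2_001.fastq.gz, and compared against the matching R1 file.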


We have found aggressive quality-trimming of NextSeq data (to Q15) to be very useful prior to running SPAdes (at least in single-cell mode), as the quality is low and the quality scores are very inaccurate. I have not noticed whether the quality scores correlate with poly-G, but the base composition of the reads is highly skewed as well, especially toward the tail.


I am having a similar problem, but after assembling with SPAdes I found the poly-G sequences mostly on short contigs, so I hope I can filter out all contigs below some cutoff length. I am not sure that is a good idea, though.
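
A length filter like that could be sketched as follows. The 500 bp default and the function name filter_contigs are placeholders, not recommendations, and the filter drops every short contig whether or not it contains poly-G.

```shell
#!/usr/bin/env bash
# Keep only FASTA records whose sequence is at least MIN_LEN bases long.
# Sequences are written unwrapped (one line per contig).

filter_contigs() {   # hypothetical helper name
    local fasta=$1 min_len=${2:-500}   # 500 is an arbitrary example cutoff
    awk -v min="$min_len" '
        # print the buffered record if it is long enough
        function emit() { if (header != "" && length(seq) >= min) print header "\n" seq }
        /^>/ { emit(); header = $0; seq = ""; next }   # new record starts
        { seq = seq $0 }                               # join wrapped sequence lines
        END { emit() }' "$fasta"
}
```

Usage would be along the lines of filter_contigs contigs.fasta 1000 > contigs.min1000.fasta, where both file names are examples.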
