How to give merged reads as input to SPAdes
5.9 years ago

I have the merged paired-end reads and the remaining unmerged paired-end reads in two different files. How can I give these as input to SPAdes for assembly?

assembly spades merged de novo • 5.0k views

Have you checked the SPAdes manual? This information is in the input section.


I think there is an option like --merged, but I don't remember exactly. As said, you should check the manual; you will probably find it there, because you can give literally any kind of data to SPAdes.


OK, thank you, sir. I will check.


So after giving "--merged name_of_merged_file", can I give the rest of my paired reads, which are not merged, as "-1 read1.fq -2 read2.fq"?

5.9 years ago
Joseph Hughes ★ 3.0k

It is unclear whether your reads are interlaced or joined. If you have joined the reads together because they overlap, then these new merged reads can be specified as single-end reads along with the other paired-end reads:

spades.py -1 read1.fq -2 read2.fq -s merged.fq -o spades_test

However, if you mean that you have one set of reads in the interlaced format and another set in the paired-end format, pass the interlaced file with --12:

spades.py -1 read1.fq -2 read2.fq --12 merged.fq -o spades_test

You can read the SPAdes definition of interlaced here: http://spades.bioinf.spbau.ru/release3.10.1/manual.html#sec3.2
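
The two cases above can be wrapped in a small helper that only prints the command, so it can be checked before starting a long assembly. This is a minimal sketch: the function name build_spades_cmd and the mode labels "joined" and "interlaced" are made up for illustration, while -1, -2, -s, --12 and -o are the options used in the commands above. Newer SPAdes releases also document a --merged option, as mentioned in the comments; it is left out here because the commands above do not use it, so check spades.py --help for your version.

```shell
# Hypothetical helper: print (do not run) a spades.py command for the two cases above.
# Arguments: mode, forward reads, reverse reads, extra file, output directory.
#   mode "joined"     -> extra file holds overlap-merged reads, passed as single-end (-s)
#   mode "interlaced" -> extra file holds interleaved pairs, passed with --12
build_spades_cmd() {
    case "$1" in
        joined)     echo "spades.py -1 $2 -2 $3 -s $4 -o $5" ;;
        interlaced) echo "spades.py -1 $2 -2 $3 --12 $4 -o $5" ;;
        *)          echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Print the command for the joined case; copy the printed line to run it.
build_spades_cmd joined read1.fq read2.fq merged.fq spades_test
```

Printing rather than executing keeps the sketch safe to run on a machine without SPAdes installed.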


Thank you, but I am still confused. I am actually working with the tool BBSplit. I want to split my reads according to the reference genome they map to. When I run BBSplit on paired-end reads, the pairs that map to a particular reference genome are written together to one file, and the reads that did not map are written as separate output. For example:

Command:

bbsplit.sh in1=reads1.fq in2=reads2.fq ref=ecoli.fa,salmonella.fa basename=out_%.fq outu1=clean1.fq outu2=clean2.fq

Output: out_ecoli.fq, out_salmonella.fq, clean1.fq and clean2.fq, where out_ecoli and out_salmonella are the paired-end reads that mapped to the E. coli and Salmonella reference genomes, and clean1 and clean2 are the forward and reverse reads that did not map to any reference genome.

The BBSplit manual says: "BBSplit is a tool that bins reads by mapping to multiple references simultaneously, using BBMap. The reads go to the bin of the reference they map to best. There are also disambiguation options, such that reads that map to multiple references can be binned with all of them, none of them, one of them, or put in a special "ambiguous" file for each of them. Paired reads will always be kept together."

So are the reads in out_ecoli and out_salmonella paired-end reads that are interlaced, or are they joined because they overlap?


The format of the sequences and the identifiers in out_ecoli and out_salmonella should enable you to determine that.
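
One way to inspect the identifiers is to test whether consecutive records carry the same read name. The sketch below is an assumption-laden example, not part of BBTools or SPAdes: the function name is_interleaved is hypothetical, it assumes plain 4-line FASTQ records, and it assumes mates share a name once a trailing /1 or /2 and anything after the first space are dropped.

```shell
# Hypothetical check: succeed if every consecutive pair of records has the same
# read name (after dropping a trailing /1 or /2 and anything after the first space).
# Assumes uncompressed 4-line FASTQ records; only the first 4000 lines are inspected.
is_interleaved() {
    head -n 4000 "$1" | awk '
        NR % 4 == 1 {
            name = $1                 # identifier up to the first whitespace
            sub(/\/[12]$/, "", name)  # drop a /1 or /2 suffix if present
            n++
            if (n % 2 == 1) { first = name }      # first mate of a candidate pair
            else if (name != first) { bad = 1 }   # second record has a different name
        }
        END { exit (bad || n == 0 || n % 2 == 1) ? 1 : 0 }
    '
}
```

For example, `is_interleaved out_ecoli.fq && echo "looks interleaved"`. Reads that were joined because they overlap each have a unique name, so the check fails for such a file.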


manjumoorthy95: You obtained interleaved reads in the out_ecoli and out_salmonella files because of the way you specified the output in your bbsplit.sh command. You can easily de-interleave the reads with reformat.sh in=out_ecoli.fq out1=ecoli_R1.fq out2=ecoli_R2.fq.

You could also run your original bbsplit.sh command like this to get the R1/R2 reads as separate files (the # in basename is replaced by 1 and 2):

bbsplit.sh in1=reads1.fq in2=reads2.fq ref=ecoli.fa,salmonella.fa basename=out_%_#.fq outu1=clean1.fq outu2=clean2.fq
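
If the interleaved per-reference files are kept as they are, they could also be passed to SPAdes directly with --12, the interlaced option from the answer above. The following is only a dry-run sketch: the function name print_bin_assemblies and the output directory names spades_<ref> are made up for this example, and the input names follow the out_<ref>.fq pattern from the original bbsplit.sh command.

```shell
# Dry run: print one SPAdes command per BBSplit bin (nothing is executed).
# Assumes interleaved per-reference files named out_<ref>.fq, as in the thread;
# the output directory names spades_<ref> are hypothetical.
print_bin_assemblies() {
    for ref in "$@"; do
        echo "spades.py --12 out_${ref}.fq -o spades_${ref}"
    done
}

# One line per reference bin; copy the printed lines to run them.
print_bin_assemblies ecoli salmonella
```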
