Variable Length Reads In De Novo Assembly
14.5 years ago
Pauln ▴ 60

Hi, are there any de novo assemblers that will accept variable-length reads in a single input file?

I have RNA-seq transcriptome data and want to remove the adaptors (or parts of them) found in each read. The only way I can think of doing this is to trim the adaptors and write the trimmed reads back to a file. The problem is that the reads in this file will then have a range of lengths. I was hoping to avoid having to split this file into separate files, each containing reads of the same length. Hopefully this makes sense; apologies for not explaining more clearly.
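To make the trimming step concrete, here is a minimal sketch (not from the thread) of 3' adapter removal on FASTQ records. It assumes exact matching only (no mismatches or indels), well-formed 4-line FASTQ, and a placeholder adapter string `ADAPTER` that you would replace with the one from your own library prep. The helper names (`find_adapter_start`, `read_fastq`, `trim_records`, `format_fastq`) are hypothetical. Note that the quality string has to be cut to the same length as the sequence, which is exactly why the output ends up with variable read lengths.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)

# ASSUMPTION: placeholder adapter; replace with the adapter from your own library prep.
ADAPTER = "AGATCGGAAGAGC"


def find_adapter_start(seq: str, adapter: str = ADAPTER, min_overlap: int = 3) -> int:
    """Return the index where the adapter (or a partial adapter) starts, or len(seq).

    Exact matching only: either the whole adapter occurs inside the read, or the
    read ends with the first k bases of the adapter (k >= min_overlap).
    """
    pos = seq.find(adapter)
    if pos != -1:
        return pos
    # The read may run off its 3' end part-way through the adapter;
    # try the longest possible overlap first.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return len(seq) - k
    return len(seq)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Parse well-formed 4-line FASTQ records (no multi-line sequences)."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def trim_records(records: Iterable[Record], adapter: str = ADAPTER,
                 min_len: int = 30) -> Iterator[Record]:
    """Cut each read and its quality string at the adapter; drop reads shorter than min_len."""
    for header, seq, qual in records:
        cut = find_adapter_start(seq, adapter)
        if cut >= min_len:
            yield header, seq[:cut], qual[:cut]


def format_fastq(records: Iterable[Record]) -> str:
    """Serialize records back to 4-line FASTQ text."""
    return "".join(f"{h}\n{s}\n+\n{q}\n" for h, s, q in records)
```

For example, `sys.stdout.write(format_fastq(trim_records(read_fastq(sys.stdin))))` would stream a FASTQ file through the trimmer. The `min_overlap` value is a trade-off: a 3-base partial match at the end of a read will sometimes occur by chance, so larger values trim fewer genuine adapter fragments but also remove fewer real bases.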
assembly • 4.6k views

Do you mean variably quality-trimmed reads, or reads from different sequencers (Sanger, Illumina, 454)?


Hi @paulN, for this question, as well as any future ones, it would be good to give a bit more detail. What is your starting material? What are you trying to accomplish? What are the limitations and assumptions? You will get much better answers, and they will be more useful to others too. Cheers!

14.5 years ago
Dstan ▴ 160

We used the Newbler assembler (454/Roche) to assemble transcriptome reads from several different platforms (454, Sanger, and Illumina). Reads from these platforms are of different lengths.

Is it a requirement that the reads be in a single input file? The Newbler assembler will accept multiple input files.

14.5 years ago
User 59 13k

Velvet also accepts variable-length reads, but they need to be tagged with their read type (long/short) at assembly time, and thus placed in different files. Why are you so keen to have the reads in the same file?


I was talking to Paul the other day. He has some Illumina reads that look like they still contain adapter sequence. I think he wants to trim the adapter sequences out of the FASTQ file, which would leave some reads shorter than others. This is why reads of different lengths are in the same file (at the moment).

14.5 years ago

mira3 will also permit the assembly of fragments of different lengths. Depending on exactly what you need, they may have to be in different files (e.g. 454 vs. Sanger reads).

13.3 years ago
Ryan Thompson ★ 3.6k

You may have to pool your sequences into separate files based on length. Let's say you started with 100 bp reads and trimmed variable-length overlaps of an adapter sequence off them. You could put all the full-length reads in one file, all the 90-99 bp reads in another, the 80-89 bp reads in another, and so on down to the smallest length you can use (30 bp or so?). Then you can use an assembler that accepts different read lengths in different files.
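The binning scheme described above can be sketched as follows. This is a minimal illustration, assuming 100 bp starting reads, 10 bp bins, a 30 bp minimum, and records already parsed into `(header, sequence, quality)` tuples; the helper names `length_bin`, `bin_reads` and `write_bins` and the output file naming are hypothetical, not part of any assembler's interface.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List, Optional, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def length_bin(length: int, full_length: int = 100, bin_width: int = 10,
               min_len: int = 30) -> Optional[str]:
    """Map a read length to a bin label such as '100bp' or '90-99bp'; None if too short."""
    if length < min_len:
        return None
    if length >= full_length:
        return f"{full_length}bp"  # untrimmed, full-length reads
    start = (length // bin_width) * bin_width
    lo = max(start, min_len)
    hi = min(start + bin_width - 1, full_length - 1)
    return f"{lo}-{hi}bp"


def bin_reads(records: Iterable[Record], **kwargs) -> Dict[str, List[Record]]:
    """Group FASTQ records by length bin, discarding reads below min_len."""
    bins: Dict[str, List[Record]] = defaultdict(list)
    for header, seq, qual in records:
        label = length_bin(len(seq), **kwargs)
        if label is not None:
            bins[label].append((header, seq, qual))
    return dict(bins)


def write_bins(bins: Dict[str, List[Record]], out_dir: str,
               prefix: str = "reads") -> List[Path]:
    """Write one FASTQ file per bin (e.g. reads_90-99bp.fastq) and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for label, recs in sorted(bins.items()):
        path = out / f"{prefix}_{label}.fastq"
        with path.open("w") as fh:
            for header, seq, qual in recs:
                fh.write(f"{header}\n{seq}\n+\n{qual}\n")
        paths.append(path)
    return paths
```

Each resulting file could then be supplied as a separate input to an assembler that accepts multiple files; how each file must be declared (read type, length category) depends on the assembler, so check its own documentation. Narrower bins give more uniform lengths per file at the cost of more files.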
