Question

De novo assembly around one short (<100 bp) known sequence

9.6 years ago

linzdm1187 • 0

Hello all,

In an effort to avoid RACE-PCR, my group is trying to use some sequencing data we have available to discover the sequence of a single gene in a crustacean. We know the sequence of a short 85 bp region of this gene. Using the sequencing data we have (~60 million 50 bp single-end Illumina reads), I performed a standard de novo assembly with Trinity, but as expected (given the single-end reads) the assembly was poor, and no contigs matched our gene of interest. Is there any way to use this known sequence as a partial guide for assembly, in an attempt to resolve more of the sequence around this region (using Trinity or any other assembler)?

Dave

Assembly rna-seq • 2.5k views

updated 2.4 years ago by Ram 44k • written 9.6 years ago by linzdm1187 • 0

Ram · Answer 1 · 2015-04-22

9.6 years ago

thackl ★ 3.0k

Have a look at TASR and Mapsembler. Both are assembly programs that use seed sequences as starting points for assembly.

I've spent a lot of time trying to do this sort of thing with what's out there, and I have to say, none of the programs are particularly great. TASR is very slow, and I never had particularly good results with the original Mapsembler myself (I haven't tried v2). The best I've found, better than those two, has been PRICE. PRICE loops over the entire set of input files for every iterative extension, though, so depending on the amount of data you have, it can also take a good while!
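
To make the idea of seed-based iterative extension concrete, here is a minimal Python sketch. It is not how TASR, Mapsembler, or PRICE are actually implemented; the function names (`revcomp`, `next_base`, `extend_seed`) and the parameters `min_overlap`, `min_support`, and `max_len` are made up for illustration. It assumes error-free reads held in memory as plain strings, and it rescans every read for each added base, which also shows why this kind of approach gets slow on large data sets.

```python
from collections import Counter

# Translation table for complementing DNA (upper-case A/C/G/T/N assumed).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def next_base(contig, reads, min_overlap=20, min_support=2):
    """Vote on the base that follows the current 3' end of the contig.

    A read (or its reverse complement) votes if it contains the last
    `min_overlap` bases of the contig and has at least one base after them.
    Only the first occurrence in each read is checked, and the rest of the
    read is not compared to the contig: this is a simplification.
    """
    suffix = contig[-min_overlap:]
    votes = Counter()
    for read in reads:
        for oriented in (read, revcomp(read)):
            pos = oriented.find(suffix)
            end = pos + len(suffix)
            if pos != -1 and end < len(oriented):
                votes[oriented[end]] += 1
    if not votes:
        return None
    base, count = votes.most_common(1)[0]
    return base if count >= min_support else None


def _extend_right(contig, reads, max_len, **kwargs):
    """Add one base at a time until no read supports a further base."""
    while len(contig) < max_len:
        base = next_base(contig, reads, **kwargs)
        if base is None:
            break
        contig += base
    return contig


def extend_seed(seed, reads, max_len=2000, **kwargs):
    """Grow a seed in both directions; max_len guards against repeat loops."""
    contig = _extend_right(seed, reads, max_len, **kwargs)
    # Extending to the left is the same as extending the reverse
    # complement to the right.
    contig = _extend_right(revcomp(contig), reads, max_len, **kwargs)
    return revcomp(contig)
```

With the 85 bp known region as `seed` and the reads loaded into a list, `extend_seed(seed, reads)` would return the seed plus whatever flanking sequence the reads support. With 50 bp reads, `min_overlap` has to stay well below 50, which limits how well repeats can be resolved.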

updated 2.4 years ago by Ram 44k • written 9.6 years ago by george.ry ★ 1.2k

Ram · Answer 2 · 2015-04-23

At least in my hands, and starting with poor data, assembling a whole transcriptome with Trinity is a bad way to reach the goal you describe.

Try using an old-fashioned assembler such as CAP3, which you can run from web servers such as EGASSEMBLER. It will not try to assemble the whole transcriptome; it will look for overlapping sequences and classify them as contigs or singletons. With some luck, you may end up with a useful contig among them.
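
As a rough illustration of what "overlapping sequences classified as contigs or singletons" means, here is a toy greedy overlap merge in Python. It is only a sketch of the general idea, not CAP3's actual algorithm (CAP3 uses alignments and quality values and tolerates mismatches). The names `overlap` and `greedy_assemble` and the `min_len` parameter are hypothetical, and the code assumes exact, same-strand overlaps on a small, pre-selected set of sequences.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 if the longest exact overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if n > 0 and a[-n:] == b[:n]:
            return n
    return 0


def greedy_assemble(seqs, min_len=20):
    """Repeatedly merge the pair with the longest exact overlap.

    Returns (contigs, singletons): contigs were built from two or more
    input sequences, singletons never overlapped anything.
    """
    # Each pool entry is (sequence, number of inputs merged into it).
    # dict.fromkeys drops exact duplicates while keeping input order.
    pool = [(s, 1) for s in dict.fromkeys(seqs)]
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, (a, _) in enumerate(pool):
            for j, (b, _) in enumerate(pool):
                if i == j:
                    continue
                n = overlap(a, b, min_len)
                if n > best_n:
                    best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break
        (a, count_a), (b, count_b) = pool[best_i], pool[best_j]
        merged = (a + b[best_n:], count_a + count_b)
        pool = [p for k, p in enumerate(pool) if k not in (best_i, best_j)]
        pool.append(merged)
    contigs = [s for s, count in pool if count > 1]
    singletons = [s for s, count in pool if count == 1]
    return contigs, singletons
```

The pairwise search is quadratic in the number of sequences, so something like this is only practical on a small subset of reads, for example those that already resemble the known 85 bp region.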