Question

Refining My Transcriptome Assembly


13.2 years ago

Craig Anderson ▴ 10

Hi,

I'm doing some transcriptomics on a non-model organism (an earthworm) and am having issues with my assembly.

I've got HiSeq RNA-seq data from pooled samples (around 20-25 monophyletic individuals) for each of 3 exposures. I've assembled the transcriptome of a single exposure group using Velvet and Oases, but I've got a massive haul, with an N50 of 82,694 >= 1465 bp.
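The post quotes an N50 and the first reply asks about an L50, and the two are easy to mix up. Here is a minimal Python sketch for computing both directly from the assembled transcripts, assuming a plain FASTA file (sequence lines may be wrapped). `fasta_lengths` and `n50_l50` are hypothetical helpers, not part of Velvet or Oases.

```python
def fasta_lengths(lines):
    """Return the sequence length of each record in a FASTA file.

    `lines` can be an open file handle or any iterable of lines;
    wrapped (multi-line) sequences are summed per record.
    """
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50_l50(lengths):
    """Return (N50, L50) for a collection of contig lengths.

    Contigs are summed from longest to shortest. N50 is the length of the
    contig at which the running total first reaches half of the assembly
    size; L50 is how many contigs it took to get there.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
```

For example, `n50_l50(fasta_lengths(open(path)))`, with `path` pointing at the assembled transcripts, returns the pair. N50 is a length in bp, while L50 is a count of contigs.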

I anticipate that the vast amount of variation within my samples will mean that there are an awful lot of very similar sequences in my data. What software is out there to help me achieve a consensus transcriptome?

I really would appreciate any pointers,

Craig

P.S. There is a draft reference genome for this species, but it's from a genetically distinct alternative lineage (14% divergent according to mitochondrial COII markers and AFLP).

Edit: Because I've pooled so many individuals, I'd like to reduce the number of contigs that appear as separate sequences only because of SNPs, sequencing errors or the like.
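One common way to collapse near-identical contigs is greedy clustering: process contigs longest first and fold each one into an existing representative if it is almost entirely contained in it (tools such as CD-HIT-EST take this approach using alignment-based identity). The Python sketch below swaps in a simpler k-mer containment test, so treat it as an illustration of the idea rather than a replacement for a real clustering tool. `collapse_redundant`, `k=21` and `min_containment=0.9` are hypothetical choices. A single SNP removes at most k of a contig's k-mers, which is why SNP variants still clear the threshold.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def collapse_redundant(contigs, k=21, min_containment=0.9):
    """Greedy, longest-first redundancy reduction (illustrative sketch).

    contigs: dict mapping contig name -> sequence.
    Returns a dict mapping each kept representative -> names it absorbed.
    A contig is absorbed when at least `min_containment` of its k-mers
    (in either orientation) occur in an already-kept representative.
    Comparison against every representative is quadratic, so this is
    only practical for small sets.
    """
    reps = {}       # representative name -> list of absorbed contig names
    rep_kmers = {}  # representative name -> its forward k-mer set
    for name in sorted(contigs, key=lambda n: len(contigs[n]), reverse=True):
        seq = contigs[name].upper()
        fwd = kmers(seq, k)
        rev = kmers(reverse_complement(seq), k)
        absorbed_by = None
        if fwd:  # contigs shorter than k are always kept
            for rep, rk in rep_kmers.items():
                shared = max(len(fwd & rk), len(rev & rk))
                if shared / len(fwd) >= min_containment:
                    absorbed_by = rep
                    break
        if absorbed_by is None:
            reps[name] = []
            rep_kmers[name] = fwd
        else:
            reps[absorbed_by].append(name)
    return reps
```

Longest-first ordering matters: it makes the most complete copy of each transcript the representative, so shorter SNP or error variants get folded into it rather than the other way round.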

I'm aware that I need to redo the assembly to get rid of sequences that velvetg has tried to scaffold with Ns. All parameters other than k-mer length and insert length are at their default values.
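If re-running the assembly is not convenient, a post-processing alternative is to break scaffolds at their runs of N and keep only the N-free pieces. This is a minimal Python sketch. `split_scaffold`, `min_gap` and `min_len` are hypothetical names and thresholds, not Velvet options.

```python
import re


def split_scaffold(seq, min_gap=1, min_len=100):
    """Break a scaffold at runs of N and return the N-free pieces.

    min_gap: shortest run of N (or n) treated as a scaffolding gap;
             shorter runs are left inside the pieces.
    min_len: pieces shorter than this are discarded.
    """
    gap = re.compile("[Nn]{%d,}" % min_gap)
    return [piece for piece in gap.split(seq) if len(piece) >= min_len]
```

With the default `min_gap=1`, any N splits the sequence. Raising it leaves isolated ambiguous bases in place and only cuts at longer gaps.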

Hope that helps!

Tags: transcriptome, rna, assembly

Written 13.2 years ago by Craig Anderson ▴ 10; last updated 13.2 years ago by Anna ▴ 140


Just for clarification: you have an L50 of 14694? That is pretty huge. I don't understand what you are asking. An assembly is a consensus sequence.

Reply by Fabian Bull ★ 1.3k, 13.2 years ago


It is not entirely clear what you are after. Are you asking for advice on achieving a better assembly?

Reply by Istvan Albert 102k, 13.2 years ago

score 2 · Answer 1 · 2011-10-21

Hi Craig,

There are several ways of reducing redundancy.

For example, if you have a draft assembly, you can use the reads mapped to individual contigs to reduce the set of reads that Velvet/Oases has to work with. You'd run one Velvet-Oases assembly per contig, using ONLY the reads that map to that contig. That would also make Oases run with less memory and much faster. Another tip: avoid pooling samples if you can. That worked very well for me, and I also work on worms!
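The first step of the per-contig strategy above is partitioning reads by the contig they map to. Assuming the reads have been aligned to the draft assembly with any aligner that writes SAM, here is a minimal Python sketch that groups read names by reference contig. It keeps only primary alignments, skipping unmapped, secondary and supplementary records via the standard SAM FLAG bits 0x4, 0x100 and 0x800. `reads_by_contig` is a hypothetical helper. Extracting those reads into per-contig files and launching Velvet/Oases on each is not shown.

```python
from collections import defaultdict

UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def reads_by_contig(sam_lines):
    """Group read names by the contig of their primary alignment.

    sam_lines: an open SAM file or any iterable of SAM text lines.
    Both mates of a pair share a read name, so they land in one set.
    """
    groups = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY) or rname == "*":
            continue
        groups[rname].add(qname)
    return dict(groups)
```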

Another approach would be to use software such as Jigsaw:

http://www.cbcb.umd.edu/software/jigsaw/

or any other consensus caller; many EST papers list them.
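To illustrate what a consensus caller does at its simplest, here is a column-wise majority vote over sequences that have already been aligned to equal length, with `-` as the gap character. This is a generic illustration, not Jigsaw's algorithm. `majority_consensus` is a hypothetical helper, and ties go to the character seen first in the column.

```python
from collections import Counter


def majority_consensus(aligned, gap="-"):
    """Majority-vote consensus of equal-length aligned sequences.

    Columns whose most common character is the gap are dropped,
    so the result is an ungapped sequence.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    out = []
    for column in zip(*aligned):
        base, _ = Counter(column).most_common(1)[0]
        if base != gap:
            out.append(base)
    return "".join(out)
```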

Hope this helps.

Anna