Question

Does It Make Sense To Merge Oases Multiple-Kmer Assemblies

2

12.7 years ago

Lhl ▴ 760

Hi All,

I am using Oases for de novo transcriptome assembly. I tried multiple kmer values and was able to choose the best one based on N50, largest transcripts, and so on. However, I am wondering whether it makes sense to merge the assemblies produced with different kmers.

Another question is how to deal with the unused reads: should they also be merged with the multiple transcripts.fa files?

I would be happy if anyone could give me a hint. Thanks in advance.

Regards,

lhl

assembly • 6.2k views

ADD COMMENT • link updated 12.7 years ago by Jeremy Leipzig 22k • written 12.7 years ago by Lhl ▴ 760

score 6 · Answer 1 · 2012-03-01

6

12.7 years ago

Jeremy Leipzig 22k

Oases has its own merge function now.

There is certainly no risk in merging with assemblies generated with a higher kmer, since they are more conservative. Assemblies generated with lower kmers may introduce false contigs. You definitely don't want to actually extend those further, which is why I am somewhat leery of assembling assemblies with programs like CAP3.
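
A rough sketch of what the built-in merge looks like in practice, as far as I recall it from the Oases manual: the per-kmer `transcripts.fa` files are fed back into Velvet as `-long` sequences and Oases is run with `-merge yes`. The helper name `oases_merge_commands`, the `oases_k*/` directory layout, and the output directory name are made up for illustration, and the merge kmer of 27 is only the value I remember the manual suggesting. The script only builds and prints the commands, so check the flags against your installed version before running anything.

```python
import glob
import shlex


def oases_merge_commands(transcript_files, out_dir="MergedAssembly", merge_k=27):
    """Build (but do not run) the three commands for an Oases merge.

    transcript_files: transcripts.fa files from the single-kmer assemblies.
    out_dir, merge_k: hypothetical defaults, adjust to your own setup.
    """
    if not transcript_files:
        raise ValueError("no transcripts.fa files given")
    return [
        # Hash the existing transcripts as long sequences.
        ["velveth", out_dir, str(merge_k), "-long", *transcript_files],
        # Build the graph, keeping read tracking and the long sequences.
        ["velvetg", out_dir, "-read_trkg", "yes", "-conserveLong", "yes"],
        # Let Oases merge the transcripts.
        ["oases", out_dir, "-merge", "yes"],
    ]


if __name__ == "__main__":
    # Assumed layout: one directory per kmer, e.g. oases_k27/transcripts.fa
    files = sorted(glob.glob("oases_k*/transcripts.fa"))
    for cmd in oases_merge_commands(files or ["oases_k27/transcripts.fa"]):
        print(shlex.join(cmd))
```

To merge only the higher kmers, pass just those `transcripts.fa` files to the helper.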

ADD COMMENT • link 12.7 years ago by Jeremy Leipzig 22k

0

Thanks very much, Jeremy. Your suggestion is very helpful. I found that the higher kmers produce far fewer and shorter contigs/transcripts. In my case, I used kmers from 19 to 79. However, from which kmer value should I start merging into the final assembly? Should I start from the kmer (27) that yields the best assembly (in terms of N50, total number of bases in contigs, and number of contigs longer than 500) and include all kmers larger than this value? I am new to NGS assembly. I would be very happy if you could tell me how to find erroneous assemblies (e.g. in a transcriptome assembly).
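
For anyone wanting to compare assemblies on the metrics listed here, a minimal Python sketch follows. It assumes plain FASTA input such as the `transcripts.fa` that Oases writes; the function names and the 500-base cutoff parameter are illustrative, not part of any tool. N50 is taken as the length of the contig at which the running total, counted from the longest contig down, first reaches half of all assembled bases.

```python
def read_fasta_lengths(path):
    """Return the sequence length of every record in a FASTA file."""
    lengths = []
    current = 0
    in_record = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if in_record:
                    lengths.append(current)
                current = 0
                in_record = True
            elif in_record:
                current += len(line)
    if in_record:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def summarize(lengths, min_len=500):
    """Collect the metrics used in this thread for one assembly."""
    return {
        "n_contigs": len(lengths),
        "total_bases": sum(lengths),
        "n50": n50(lengths),
        "longest": max(lengths, default=0),
        "n_over_min_len": sum(1 for x in lengths if x > min_len),
    }
```

Calling `summarize(read_fasta_lengths(path))` once per kmer directory gives one row per assembly to compare side by side.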

ADD REPLY • link 12.7 years ago by Lhl ▴ 760

0

Should I try to improve sequencing depth, read length, or anything else? Kind regards

ADD REPLY • link 12.7 years ago by Lhl ▴ 760

0

Also, when you said you are 'somewhat leery of assembling assemblies with programs like CAP3', do you mean you prefer other assemblers? Or are you simply cautious about using the merging-multiple-assemblies strategy?

ADD REPLY • link 12.7 years ago by Lhl ▴ 760

0

Normally a higher kmer results in smaller assemblies with, hopefully, a few longer contigs; it's those longer ones you are after. Obviously, as your kmer approaches the read length, this will break down.

CAP3 is a greedy assembler. If you have contigs that did not assemble in a de Bruijn assembler, there was probably a reason (i.e. ambiguity). If you throw them into CAP3 and they assemble, you should be cautious: it might be making some risky decisions.

As for the metrics for judging your assemblies, you are really in the best position to compare. I would try BLASTing against similar organisms.

ADD REPLY • link 12.7 years ago by Jeremy Leipzig 22k

0

I see what you mean. Thanks very much, Jeremy.

ADD REPLY • link 12.7 years ago by Lhl ▴ 760

0

Hi Jeremy,

I know this post is old, but I still have a question, and it would be great to have your suggestions.

I have de novo RNA-seq data (Illumina HiSeq) from multiple organs of a single animal. I have tried Velvet/Oases with multiple kmers on the data from each organ, which finally yielded (as the Oases -merge output) a transcripts.fa file for each organ. Now I would like to build a whole reference transcriptome for this animal. How can I merge the transcripts obtained from multiple organs? Or should I break those transcripts.fa files down into kmers and reassemble them? At the moment, which tool is best suited for my purpose? Thank you in advance!

Phuong.

ADD REPLY • link 9.0 years ago by pbigbig ▴ 250

0

This probably deserves its own new biostars question.

ADD REPLY • link 9.0 years ago by Jeremy Leipzig 22k

0

OK, I will post a new one.

Thank you

ADD REPLY • link 9.0 years ago by pbigbig ▴ 250

score 3 · Answer 2 · 2012-03-01

3

12.7 years ago

Erick Cardenas ▴ 30

There are indeed programs that merge assemblies done with different k values. One I know of is Minimus from the AMOS package. See a tutorial here

Since the optimal kmer value is a function of the coverage, I assume that genes with different transcription levels will have different optimal k values. I have not worked with RNA-seq myself, but I have heard that some kind of normalization is recommended to improve the assembly, since most graph assemblers use the median kmer coverage to decide which paths are erroneous. So they will assume that reads with low kmer coverage are bad, even though they could simply come from genes with low expression.
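
To make the coverage point concrete, here is a toy Python sketch that computes the median kmer coverage of individual reads. It is not how Velvet or Oases work internally; the reads, the kmer size of 5, and the function names are all invented for illustration. A read from a highly expressed gene gets a high median coverage, while an equally valid read from a weakly expressed gene looks like a low-coverage path that a coverage cutoff would treat as an error.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping kmers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads, k):
    """Count how often each kmer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def median_kmer_coverage(read, counts, k):
    """Median count of the read's kmers; 0 if the read is shorter than k."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0
    return median(counts[km] for km in read_kmers)


if __name__ == "__main__":
    k = 5
    high = "ACGGTCATTGCA"  # made-up read from a highly expressed gene
    low = "GGATCCTAAGTC"   # made-up read from a weakly expressed gene
    reads = [high] * 50 + [low]
    counts = count_kmers(reads, k)
    print("high:", median_kmer_coverage(high, counts, k))
    print("low:", median_kmer_coverage(low, counts, k))
```

Normalization approaches aim to even out this kind of difference before assembly, so that a single coverage cutoff is less likely to discard weakly expressed transcripts.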

ADD COMMENT • link 12.7 years ago by Erick Cardenas ▴ 30

0

Thanks, Erick. Most of my transcriptome data was normalized; a small proportion is from RNA-seq. Thanks for your suggestion. At the moment I am trying CAP3 to further merge the transcripts produced by Oases, but I will also give Minimus a go.

ADD REPLY • link 12.7 years ago by Lhl ▴ 760