Hi All,
I am using Oases for de novo transcriptome assembly. I tried multiple k-mer values and was able to choose the best one based on N50, largest transcript and so on. However, I am wondering whether it makes sense to merge assemblies produced with different k-mers.
Another question concerns the unused reads: should we also merge them with the multiple transcripts.fa files?
I would be happy if anyone could give me a hint. Thanks in advance.
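For anyone comparing runs the same way, here is a minimal Python sketch of the per-assembly numbers mentioned in this thread (N50, longest transcript, total bases, and number of sequences above a length cutoff), computed from a FASTA file such as an Oases transcripts.fa. The function names are hypothetical helpers, not part of Oases or Velvet. N50 is taken here as the length L such that sequences of length >= L together hold at least half of all assembled bases.

```python
def read_fasta_lengths(lines):
    """Return the length of each sequence in FASTA-formatted lines (multi-line records allowed)."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Smallest length L such that sequences of length >= L cover at least half the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def assembly_stats(lengths, min_len=500):
    """Summary numbers for comparing assemblies built with different k values."""
    return {
        "count": len(lengths),
        "total_bases": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
        f"n_over_{min_len}": sum(1 for x in lengths if x > min_len),
    }
```

Typical use, with a hypothetical per-k output path: `with open("oases_k27/transcripts.fa") as fh: print(assembly_stats(read_fasta_lengths(fh)))`.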
Regards,
lhl
Thanks very much, Jeremy. Your suggestion is very helpful. I found that higher k-mers produce far fewer and shorter contigs/transcripts. In my case, I used k-mers from 19 to 79. However, which k-mer value should I start from when merging into the final assembly? Should I start from the k-mer (27) that yields the best assembly (in terms of N50, total number of bases in contigs, and number of contigs longer than 500 bp) and include all k-mers larger than this value? I am new to NGS assembly. I would be very happy if you could let me know how I can find erroneous assemblies (e.g. in a transcriptome assembly).
Should I try to improve sequencing depth, read length, or anything else? Kind regards
Also, when you said you are 'somewhat leery of assembling assemblies with programs like CAP3', did you mean that you prefer other assemblers, or that you are simply cautious about the strategy of merging multiple assemblies?
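The selection rule proposed in this post (take the best-scoring k, then include every larger k) can be written down as a short Python sketch. It assumes a hypothetical `stats_by_k` dict mapping each k to a dict with keys `n50`, `total_bases` and `n_over_500`, and it ranks assemblies lexicographically by those three numbers in that order. That ordering is an assumption for illustration, not a recommendation made in this thread.

```python
def rank_key(stats):
    """Hypothetical ranking: N50 first, then total bases, then count of contigs > 500 bp."""
    return (stats["n50"], stats["total_bases"], stats["n_over_500"])


def pick_best_k(stats_by_k):
    """Return the k whose assembly has the highest ranking key."""
    return max(stats_by_k, key=lambda k: rank_key(stats_by_k[k]))


def ks_to_merge(stats_by_k):
    """The best-scoring k plus every larger k, in ascending order."""
    best = pick_best_k(stats_by_k)
    return sorted(k for k in stats_by_k if k >= best)
```

Changing the order inside `rank_key` (for example putting total bases first) can change which k is picked, so it is worth checking how sensitive the choice is.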
Normally a higher k-mer results in smaller assemblies with, hopefully, a few longer contigs; it's those longer ones you are after. Obviously, as your k-mer approaches the read length this will break down.
CAP3 is a greedy assembler. If you have contigs that did not assemble in a de Bruijn graph assembler, there was probably a reason (i.e. ambiguity). If you throw them into CAP3 and they assemble, you should be cautious: it might be making some risky decisions.
As for the metrics for judging your assemblies, you are really in the best position to compare them. I would try BLASTing against similar organisms.
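A rough way to see why this breaks down is the k-mer coverage relation given in the Velvet manual, Ck = C * (L - k + 1) / L, where C is the ordinary per-base coverage, L the read length and k the hash length. Each read yields only L - k + 1 k-mers, so k-mer coverage shrinks as k grows and reaches a single k-mer per read at k = L. The Python sketch below only evaluates that relation; any read length or coverage you plug in is a placeholder, not a value from this thread.

```python
def kmers_per_read(read_len, k):
    """Number of k-mers contributed by one read (0 if k exceeds the read length)."""
    return max(read_len - k + 1, 0)


def kmer_coverage(base_coverage, read_len, k):
    """Velvet-manual relation Ck = C * (L - k + 1) / L."""
    return base_coverage * kmers_per_read(read_len, k) / read_len


if __name__ == "__main__":
    # Placeholder read length and coverage, just to show the trend over k = 19..79.
    for k in range(19, 80, 12):
        print(k, kmers_per_read(100, k), round(kmer_coverage(50, 100, k), 1))
```

Lower k-mer coverage means more gaps in the graph, which is consistent with higher k giving fewer assembled bases overall.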
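To make "ambiguity" concrete: in a de Bruijn graph, a repeat or a sequence variant shows up as a node with more than one possible successor, and a de Bruijn assembler will typically end a contig at such a fork unless other evidence resolves it. The toy Python sketch below builds a forward-strand-only graph from short strings and lists the forking nodes. It is a simplified illustration (no reverse complements, no coverage or error handling) and not how Velvet or Oases actually store their graphs.

```python
from collections import defaultdict


def build_debruijn(seqs, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it inside some k-mer."""
    graph = defaultdict(set)
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Nodes with more than one successor: places where the path forward is ambiguous."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)
```

For example, two reads "ACGTA" and "ACGTC" with k = 3 agree up to "GT" and then diverge, so "GT" is reported as a fork. A greedy overlap assembler may simply pick one continuation there, which is the kind of risky decision described above.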
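One simple way to summarise such a BLAST comparison is the fraction of transcripts with at least one significant hit. The Python sketch below reads BLAST+ tabular output (`-outfmt 6`, whose 11th default column is the e-value) and collects the query IDs that pass a cutoff. The 1e-5 threshold is an arbitrary placeholder, and the hit fraction is only one rough signal, not a full assessment of assembly quality.

```python
def transcripts_with_hits(tabular_lines, max_evalue=1e-5):
    """Return the set of query IDs with at least one hit at or below max_evalue.

    Expects the 12 default columns of BLAST+ -outfmt 6:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = set()
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, evalue = fields[0], float(fields[10])
        if evalue <= max_evalue:
            hits.add(qseqid)
    return hits
```

Dividing the size of the returned set by the number of sequences in transcripts.fa gives the hit fraction, which can then be compared across k values.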
I see what you mean, thanks very much, Jeremy.
Hi Jeremy,
I know this post is old, but I still have some questions on which it would be great to have your suggestions.
I have de novo RNA-seq data (Illumina HiSeq) from multiple organs of a single animal. I have run Velvet/Oases with multiple k-mer values on the data from each organ, which finally yielded (as the Oases -merge output) a transcript.fasta file for each organ. Now I would like to build a whole reference transcriptome for this animal. How can I merge the transcripts obtained from the multiple organs? Or should I break those transcripts.fa files down into k-mers and reassemble them? Currently, which tool is best suited to my purpose? Thank you in advance!
Phuong.
This probably deserves its own new Biostars question.
OK, I will post a new one.
Thank you