I am working on some RNA-Seq data from Cynomolgus monkey. I was originally planning to do de novo transcript assembly, but then I realized that the Cyno genome has recently been released, so I can do reference-guided transcript assembly instead. However, I am wondering if there is any compelling reason to do de novo transcriptome assembly as well as or instead of reference-guided.
(By the way, my data is Illumina 2x100 with a 250 bp insert size, in case that makes a difference.)
In the paper for Trinity, which appears to be the current best-of-breed de novo transcriptome assembler, they compare their results to the two best-known reference-guided assemblers, Cufflinks and Scripture. In mouse, ref-guided was better (recovered more full-length genes + isoforms). In pombe, it was worse.
I don't remember why ref-guided was worse for pombe, or if it was even discussed, but I think it relates to the high density / low structural complexity of pombe genes -- Cufflinks and Scripture were designed with vertebrate transcriptomes in mind.
So if you're working in a vertebrate, ref-guided is probably the best option.
No, I wouldn't say there is. Of course, it depends on what you mean by "having a reference". It is still necessary to try de novo assembly for highly variable regions, such as the HLA region in the human genome, or for poorly covered regions. For your case, I would prefer a reference-guided assembly.
Considering the quality of your reference assembly, it may be worth doing both: ref-based alignment of the RNA-seq reads and, at the same time, de novo assembly. The de novo assembly will pick up a number of transcripts that cannot be reliably mapped to the reference because of gaps or misassembled regions, and these will complement the transcripts found in the ref-based set.
I guess none of the monkey genomes has reached a level of completion comparable to the human genome. But even in the best-case scenario (human RNA-Seq plus the human genome), a recent article showed that you can get novel transcripts. I will provide a link later on.
Also you may get stuff which is not in the genome, like viruses infecting your sample.
So the answer would be: do both (reference-guided and de novo assembly), probably always, except perhaps for some tiny genomes that have been sequenced many times.
If you are going to do a reference-guided transcriptome assembly, keep in mind that all the assembly errors in the reference genome will be reflected in your results. So you should look at the assembly statistics for this genome (N50, number of scaffolds, etc.) before making a decision. Otherwise, if you have a well-assembled genome like hg19 or mm9, the guided way is the best option.
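If it helps, here is a minimal sketch of how one might compute those two statistics (number of scaffolds and N50) from a genome FASTA. The function names `read_fasta_lengths` and `n50` are made up for this example, and it assumes an uncompressed FASTA passed in as an iterable of lines; it is not taken from any particular assembly-stats tool.

```python
from typing import Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every sequence in a FASTA given as lines.

    Hypothetical helper: assumes plain (uncompressed) FASTA text.
    """
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    # Toy scaffold lengths: total is 54, and 10 + 9 + 8 = 27 reaches half of it.
    toy = [2, 3, 4, 5, 6, 7, 8, 9, 10]
    print("scaffolds:", len(toy), "N50:", n50(toy))  # scaffolds: 9 N50: 8
```

On a real assembly you would call something like `read_fasta_lengths(open("genome.fa"))`, where `genome.fa` is a placeholder path. A low N50 together with a very large number of scaffolds suggests a fragmented reference, which is the situation where adding a de novo assembly is more likely to pay off.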
Hey there, Ariel, long time no chat :)