I'm working with human samples and trying to identify novel lncRNAs in prostate cancer tumor samples. I'm using reference-based transcriptome assembly with StringTie. I have seen that StringTie also has a de novo mode, but in that mode StringTie still uses the reference genome sequence to guide transcript assembly; it just doesn't use the reference annotation.
Is it really a de novo assembly? The tutorial "De novo transcriptome reconstruction with RNA-Seq" (see the paragraph "De novo transcript reconstruction") says the following:
"Now that we have mapped our reads to the mouse genome with HISAT, we want to determine transcript structures that are represented by the aligned reads. This is called de novo transcriptome reconstruction"
1) De novo basically means assembly without a reference genome, right? Why does StringTie call this de novo mode when it still uses the genome for alignment?
2) For identifying novel lncRNAs, is reference-based assembly or de novo assembly the better choice, and why? I'm working on human samples, for which both a reference annotation and a reference genome are available.
I don't know StringTie, so I can't really answer your question.
As a suggestion, one good approach is to use Trinity to assemble your transcriptome de novo. You can then align the resulting FASTA file of transcripts against the full transcriptome annotation available from GENCODE. Transcripts with no match are potentially new lncRNAs. These should be investigated further to confirm that they are genuine.
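To make the "no match" step concrete, here is a rough Python sketch. It assumes the alignment has already been run and written as a tab-separated table whose first column is the query transcript ID (BLAST's tabular output has this layout, but any aligner that gives you such a table would do). The function names and the example IDs are made up for illustration; this is not part of Trinity or GENCODE.

```python
def fasta_ids(lines):
    """Return transcript IDs (first token of each '>' header), in file order."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                ids.append(header.split()[0])
    return ids


def hit_query_ids(lines):
    """Return the set of query IDs with at least one reported hit.

    Assumption: tab-separated rows, query ID in the first column,
    optional comment lines starting with '#'.
    """
    hits = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        hits.add(line.split("\t")[0])
    return hits


def unmatched_transcripts(fasta_lines, hit_lines):
    """List assembled transcripts that have no hit against the annotation."""
    matched = hit_query_ids(hit_lines)
    return [tid for tid in fasta_ids(fasta_lines) if tid not in matched]


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the assembly FASTA and the hit table
    # (hypothetical IDs; real input would come from open("...") handles).
    fasta = [
        ">tx1 len=300\n", "ACGTACGT\n",
        ">tx2 len=450\n", "GGCCTTAA\n",
        ">tx3 len=210\n", "TTTTAAAA\n",
    ]
    hits = [
        "# query\tsubject\tidentity\n",
        "tx1\tENST_example_A\t99.1\n",
        "tx1\tENST_example_B\t97.4\n",
    ]
    for tid in unmatched_transcripts(fasta, hits):
        print(tid)
```

With real files you would pass open file handles instead of the lists. Note that this only flags transcripts with no hit at all; in practice you would probably also want identity and coverage cut-offs on the hits, and a check for coding potential, before calling anything a candidate lncRNA.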