Hi,
This might be another naive question. I have RNA-Seq data from 2 different types of samples (pain vs no-pain), and I want to identify novel lncRNA/eRNA in pain. I also have DHS/ATAC data that gives me accessible regions in the DNA, some of which would be regulatory.
One way to do this is to use Trinity (using all samples) for de-novo transcript assembly, then use Salmon on this assembled transcriptome to find transcripts that are differentially expressed, and then among those, the transcripts that don't have prior annotation would be 'novel' transcripts. Cufflinks can also be used for this.
Another way would be to just directly run Salmon on the samples, using the hg38 transcriptome, and find differentially expressed transcripts, some of which could be novel.
I want to know whether the first method will find more novel transcripts, and whether it has any merits over the second method in general.
Thank you!
Hi Kristoffer, thank you for your response. The linked Bioconductor page is also very helpful! I had a question about what you said. Here you mention that for de-novo transcript reconstruction, Cufflinks/StringTie are good options, whereas in the linked page you mention Salmon/Kallisto being good options for this work. So, I am slightly confused. If my goal is to find novel enhancer transcripts/lncRNA, then which of the two options would make more sense?
Salmon/Kallisto are very good at quantifying known transcripts, but they only quantify against the transcriptome you give them, so you cannot use them to discover the novel features you are looking for. Instead you would need to use Cufflinks/StringTie - they can both find novel features. Make sure to read the manual pages carefully, as their default cutoffs may filter out very lowly expressed features, and (very) lowly expressed features are exactly what you are looking for.
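As a rough illustration of what relaxing those cutoffs could look like for StringTie, here is a minimal Python sketch that only builds the command line. The flags used (-G reference annotation, -o output GTF, -c minimum read coverage, -f minimum isoform fraction, -m minimum transcript length) are StringTie options, but the values and file names below are placeholders and not recommendations, and the function name is made up for this example. Check `stringtie -h` for the defaults in your version before choosing values.

```python
import shlex


def stringtie_command(bam, ref_gtf, out_gtf,
                      min_cov=0.5, min_iso_frac=0.01, min_len=200):
    """Build (but do not run) a StringTie assembly command with explicit cutoffs.

    The cutoff values are illustrative assumptions only.
    """
    return [
        "stringtie", bam,
        "-G", ref_gtf,            # reference annotation used to guide assembly
        "-o", out_gtf,            # assembled transcripts (GTF)
        "-c", str(min_cov),       # minimum read coverage for a predicted transcript
        "-f", str(min_iso_frac),  # minimum isoform fraction at a locus
        "-m", str(min_len),       # minimum length of a predicted transcript
    ]


# Hypothetical file names, for illustration only.
cmd = stringtie_command("sample1.sorted.bam", "annotation.gtf", "sample1.gtf")
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Writing the cutoffs out explicitly, even when they match the defaults, makes it easier to see later why a lowly expressed transcript was or was not assembled.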
Hi Kristoffer.vittingseerup, I used StringTie and it came out with a list of genes with MSTRG identifiers. I gave up on using StringTie for now because I couldn't find any way to convert these to Ensembl IDs or HGNC symbols. Do you have any advice for that? Maybe I'm a bit OT in this discussion...
Thank you also for your workflow, I will take a look.
When you run StringTie with the --merge option you just have to add the annotation GTF with -G (the same one you used for the guided predictions) and it will add the reference gene names to the merged GTF (naturally only for the known features - the novel ones will not have any).
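If it helps, here is a minimal Python sketch for pulling that mapping out of the merged GTF. It assumes the transcript lines of known features carry `gene_name` and/or `ref_gene_id` attributes next to the MSTRG `gene_id`, so please check a few lines of your own file first. The function name and the example lines (GENE_A, REF_GENE_A, TX_A.1) are invented for illustration and are not real annotation.

```python
import re

# Matches GTF attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def mstrg_to_reference(gtf_lines):
    """Map each gene_id (e.g. MSTRG.1) to the set of reference names on its transcripts.

    Genes whose transcripts carry no reference attribute get an empty set,
    which marks them as candidate novel features.
    """
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is None:
            continue
        names = mapping.setdefault(gene_id, set())
        # Prefer the gene symbol, fall back to the reference gene ID.
        ref = attrs.get("gene_name") or attrs.get("ref_gene_id")
        if ref:
            names.add(ref)
    return mapping


# Made-up lines in the assumed layout of a merged GTF.
example = [
    'chr1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\t'
    'gene_id "MSTRG.1"; transcript_id "TX_A.1"; gene_name "GENE_A"; ref_gene_id "REF_GENE_A";',
    'chr1\tStringTie\texon\t100\t900\t1000\t+\t.\t'
    'gene_id "MSTRG.1"; transcript_id "TX_A.1"; exon_number "1"; gene_name "GENE_A";',
    'chr2\tStringTie\ttranscript\t50\t700\t1000\t-\t.\t'
    'gene_id "MSTRG.2"; transcript_id "MSTRG.2.1";',
]

mapping = mstrg_to_reference(example)
novel = sorted(g for g, names in mapping.items() if not names)
print(mapping)
print(novel)
```

For a real file you would pass an open file handle, e.g. `mstrg_to_reference(open("merged.gtf"))`. The values are sets because one MSTRG gene can end up containing transcripts from more than one reference gene.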
Thank you Kristoffer :)