I am interested in finding non-coding RNAs (lncRNA, eRNA) that are differentially expressed in the disease case. For genes, this is pretty easy with DESeq2. But a collaborator told me that DESeq2 cannot be used as-is for non-coding transcripts.
Is this true? What are some of the things that I should keep in mind while analyzing non-coding transcripts using DESeq2? He had mentioned that since the amount of ncRNA varies from sample to sample, special care has to be taken to normalize for that. The samples were depleted for ribosomal RNA, but he said that there would still be a lot of rRNA in the samples, and this amount differs from one sample to another.
The data are from total-RNAseq.
Did you check if that was actually the case?
Even if this is the case, once you remove the rRNA you should be able to perform normalization as usual. I suggest you use MA-plots to check whether the bulk of genes is centered around y=0 after normalization. That would be in line with the assumption underlying the DESeq2 normalization, which is that the median ratio captures the size relationship (quote from here).
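To make the check above concrete, here is a minimal sketch in Python with NumPy of median-of-ratios size factors and the M/A values you would plot. It is an illustration of the idea, not DESeq2's own code; the function names and the toy count matrix are made up for this example, and the pseudocount of 1 is an arbitrary choice.

```python
import numpy as np

def median_ratio_size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Only genes with a nonzero count in every sample have a usable geometric mean.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Log of the per-gene geometric mean across samples.
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Per sample: median ratio of its counts to the geometric means.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

def ma_values(norm_a, norm_b, pseudocount=1.0):
    """M (log2 ratio) and A (mean log2 expression) for two normalized samples."""
    log_a = np.log2(np.asarray(norm_a, dtype=float) + pseudocount)
    log_b = np.log2(np.asarray(norm_b, dtype=float) + pseudocount)
    return log_a - log_b, 0.5 * (log_a + log_b)

# Toy data (made up): sample 2 is sequenced twice as deep as sample 1,
# except for one gene that is only detected in sample 2.
counts = np.array([[10, 20], [100, 200], [50, 100], [0, 3], [30, 60]])
size_factors = median_ratio_size_factors(counts)
normalized = counts / size_factors
m, a = ma_values(normalized[:, 0], normalized[:, 1])
print("size factors:", size_factors)
print("median M:", np.median(m))
```

If the normalization assumption holds, the median of M should sit close to 0. A cloud that is clearly shifted away from y=0 would suggest that something like leftover rRNA is distorting the size factors.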
Thanks for asking. I haven't done it yet. I think one brute-force method would be to look at the human GTF, make a list of RNA genes based on the annotation, and then get their counts using featureCounts. Is there a simpler way?
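The GTF-based approach could look roughly like the sketch below. It assumes the biotype is stored in a `gene_type` attribute (as in GENCODE files) or a `gene_biotype` attribute (as in Ensembl files). The function name, the example lines, and the set of biotypes to keep are hypothetical, so check which labels your annotation release actually uses. It can only pick up classes that the annotation labels.

```python
import re

# Matches GTF attributes of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

def genes_by_biotype(gtf_lines, wanted_biotypes):
    """Return {gene_id: biotype} for 'gene' records with a wanted biotype."""
    selected = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep gene-level records only
        attrs = dict(ATTR_RE.findall(fields[8]))
        # Assumption: GENCODE uses gene_type, Ensembl uses gene_biotype.
        biotype = attrs.get("gene_type") or attrs.get("gene_biotype")
        if biotype in wanted_biotypes:
            selected[attrs.get("gene_id")] = biotype
    return selected

# Made-up records for illustration; with a real file, pass the open file handle.
example_gtf = [
    "#!genome-build example",
    'chr1\tSRC\tgene\t100\t900\t.\t+\t.\tgene_id "GENE_A"; gene_type "lncRNA"; gene_name "A";',
    'chr1\tSRC\tgene\t2000\t5000\t.\t-\t.\tgene_id "GENE_B"; gene_type "protein_coding";',
    'chr1\tSRC\texon\t100\t300\t.\t+\t.\tgene_id "GENE_A"; gene_type "lncRNA";',
]
rna_genes = genes_by_biotype(example_gtf, {"lncRNA"})
print(rna_genes)
```

The resulting gene IDs can then be used to subset the count matrix produced by featureCounts, or to build a reduced annotation file to count against.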
That's probably the simplest, but perhaps not the best. See earlier discussions: