Question

Can I use DESeq2 for non-coding RNA?

0

5.2 years ago

c_u ▴ 530

I am interested in finding non-coding RNAs (lncRNA, eRNA) that are differentially expressed in the disease case. For genes, this is pretty easy with DESeq2. But a collaborator told me that DESeq2 can't be used as-is for non-coding transcripts.

Is this true? What are some of the things that I should keep in mind while analyzing non-coding transcripts using DESeq2? He had mentioned that since the amount of ncRNA varies from sample to sample, special care has to be taken to normalize for that. The samples were depleted for ribosomal RNA, but he said that there would still be a lot of rRNA in the samples, and this amount differs from one sample to another.

The data are from total RNA-seq.

RNA-Seq DESeq2 noncoding • 2.5k views

updated 5.2 years ago by colin.kern ★ 1.1k • written 5.2 years ago by c_u ▴ 530

2

he said that there would still be a lot of rRNA in the samples, and this amount differs from one sample to another

Did you check if that was actually the case?

5.2 years ago by igor 13k

1

Even if this is the case, once you remove the rRNA you should be able to perform normalization as usual. I suggest using MA-plots to check whether the bulk of genes is centered around y=0 after normalization. That is what you should see if the assumption underlying the DESeq2 normalization holds, namely that the median ratio captures the size relationship (quote from here).
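To make the MA-plot check concrete, here is a minimal Python sketch of median-of-ratios size factors and the M/A values you would plot. It illustrates the idea only and is not DESeq2's own code. The function names `median_ratio_size_factors` and `ma_values`, the pseudocount, and the toy matrix are all assumptions made for this example.

```python
import numpy as np

def median_ratio_size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Use only genes with non-zero counts in every sample, so the log is defined.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Log geometric mean of each gene across samples (the pseudo-reference).
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Per-sample median of the log ratios to that reference.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

def ma_values(norm_counts, group_a, group_b, pseudocount=1.0):
    """Return (A, M): mean log2 expression and log2 fold change per gene."""
    norm_counts = np.asarray(norm_counts, dtype=float)
    log_a = np.log2(norm_counts[:, group_a].mean(axis=1) + pseudocount)
    log_b = np.log2(norm_counts[:, group_b].mean(axis=1) + pseudocount)
    return (log_a + log_b) / 2, log_b - log_a

# Toy data (made up): 4 genes x 4 samples; samples 2 and 3 are sequenced twice as deep.
counts = np.array([
    [10, 12, 20, 24],
    [100, 90, 200, 180],
    [50, 55, 100, 110],
    [5, 5, 10, 10],
])
size_factors = median_ratio_size_factors(counts)
normalized = counts / size_factors
a, m = ma_values(normalized, group_a=[0, 1], group_b=[2, 3])
print("size factors:", np.round(size_factors, 3))
print("median M:", float(np.median(m)))
```

If the median of M sits clearly away from 0 on real data, that hints that the "most genes do not change" assumption is strained for that comparison.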

5.2 years ago by ATpoint 87k

0

Thanks for asking. I haven't done it yet. I think one brute-force method would be to look at the human GTF, make a list of rRNA genes based on the annotation, and then get their counts using featureCounts. Is there a simpler way?
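The brute-force approach described above could look roughly like the Python sketch below: pull rRNA gene IDs out of the GTF by biotype, then compute the share of counted reads that land on them in each sample. It assumes the biotype is stored under `gene_biotype` (Ensembl-style) or `gene_type` (GENCODE-style), and that gene-level counts are already available as a dictionary per sample, for example parsed from a featureCounts table. The set of biotype labels, the function names, and the toy records are assumptions for illustration.

```python
import re

# Biotype labels treated as rRNA here; adjust to your annotation (assumption).
RRNA_BIOTYPES = {"rRNA", "Mt_rRNA", "rRNA_pseudogene"}

def rrna_gene_ids(gtf_lines, biotype_keys=("gene_biotype", "gene_type")):
    """Collect gene_ids of 'gene' records whose biotype is in RRNA_BIOTYPES."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        # Column 9 holds attributes written as: key "value"; key "value"; ...
        attrs = dict(re.findall(r'(\S+) "([^"]*)"', fields[8]))
        biotype = next((attrs[k] for k in biotype_keys if k in attrs), None)
        if biotype in RRNA_BIOTYPES and "gene_id" in attrs:
            ids.add(attrs["gene_id"])
    return ids

def rrna_fraction(counts_by_sample, rrna_ids):
    """Fraction of counted reads per sample that fall on rRNA genes."""
    fractions = {}
    for sample, gene_counts in counts_by_sample.items():
        total = sum(gene_counts.values())
        rrna = sum(n for gene, n in gene_counts.items() if gene in rrna_ids)
        fractions[sample] = rrna / total if total else 0.0
    return fractions

# Toy GTF records and counts (made up for illustration).
def _record(gene_id, biotype):
    attrs = f'gene_id "{gene_id}"; gene_biotype "{biotype}";'
    return "\t".join(["1", "toy", "gene", "100", "500", ".", "+", ".", attrs])

gtf = ["#!genome-build toy", _record("G1", "protein_coding"),
       _record("G2", "lncRNA"), _record("G3", "rRNA")]
counts = {
    "ctrl": {"G1": 700, "G2": 200, "G3": 100},
    "disease": {"G1": 300, "G2": 100, "G3": 600},
}
rrna_ids = rrna_gene_ids(gtf)
fractions = rrna_fraction(counts, rrna_ids)
print(fractions)
```

For a real annotation you would pass an open file handle instead of the toy list, since the function only iterates over lines.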

5.2 years ago by c_u ▴ 530

1

That's probably the simplest, but perhaps not the best. See earlier discussions:

5.2 years ago by igor 13k

1

5.2 years ago

colin.kern ★ 1.1k

He had mentioned that since the amount of ncRNA varies from sample to sample, special care has to be taken to normalize for that. The samples were depleted for ribosomal RNA, but he said that there would still be a lot of rRNA in the samples, and this amount differs from one sample to another.

This is a major concern when using TPM/FPKM expression values, as produced by tools like Cufflinks and StringTie. However, DESeq2 and edgeR use normalization methods that are designed to handle this situation. There is some literature that shows edgeR's normalization method may be better than DESeq2's, and I've even seen papers that normalized their counts with edgeR, exported them, and then identified the DEGs with DESeq2. I don't think there is any problem with using DESeq2's normalization method, though.
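A small worked example may help show why the scaling method matters here. The Python sketch below uses made-up counts in which four genes are unchanged and a fifth row stands in for residual rRNA that differs strongly between two samples. It compares total-count (per-million) scaling, the kind of library-size adjustment that units such as FPKM rely on, with a median-of-ratios scaling in the style of DESeq2's size factors. It is an illustration under those assumptions, not the code of any of the tools mentioned.

```python
import numpy as np

# Toy counts (genes x samples). Rows 0-3 are unchanged genes;
# row 4 stands in for residual rRNA that differs between the samples.
counts = np.array([
    [100, 100],
    [200, 200],
    [50, 50],
    [400, 400],
    [250, 4250],
], dtype=float)

# Total-count scaling: divide by the library size, express per million.
per_million = counts / counts.sum(axis=0) * 1e6
log2fc_total = np.log2(per_million[:4, 1] / per_million[:4, 0])

# Median-of-ratios scaling: per-sample median of log ratios
# to the per-gene geometric mean across samples.
log_counts = np.log(counts)
log_ratios = log_counts - log_counts.mean(axis=1, keepdims=True)
size_factors = np.exp(np.median(log_ratios, axis=0))
normalized = counts / size_factors
log2fc_median = np.log2(normalized[:4, 1] / normalized[:4, 0])

print("log2FC of unchanged genes, total-count scaling:", np.round(log2fc_total, 2))
print("log2FC of unchanged genes, median-of-ratios:  ", np.round(log2fc_median, 2))
```

With total-count scaling the four unchanged genes appear strongly down in the second sample, purely because the rRNA row inflates that library size. The median-based size factors ignore the single outlying row and leave them at a fold change of zero.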


1

There is some literature that shows edgeR's normalization method may be better than DESeq2's,

Can you link references?

5.2 years ago by ATpoint 87k

score 3 · Accepted Answer · 2020-02-21

Apart from the fact that you should expect a much higher level of zero-inflation in the raw counts, I don't think you need to worry about anything else initially. So you will certainly notice many more genes being filtered out because of low counts.
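To get a feel for how much low-count filtering will remove, a sketch like the following can be run on the raw count matrix before any modelling. It is written in Python for illustration. The rule (at least `min_count` reads in at least `min_samples` samples), both thresholds, the function names, and the toy counts are assumptions, not a prescribed DESeq2 setting.

```python
import numpy as np

def keep_by_min_count(counts, min_count=10, min_samples=3):
    """True for genes with >= min_count reads in >= min_samples samples."""
    counts = np.asarray(counts)
    return (counts >= min_count).sum(axis=1) >= min_samples

def zero_fraction(counts):
    """Share of entries in the count matrix that are exactly zero."""
    counts = np.asarray(counts)
    return float((counts == 0).mean())

# Toy counts (made up): 2 protein-coding genes and 3 lncRNAs in 6 samples.
coding = np.array([
    [120, 98, 143, 110, 131, 105],
    [35, 40, 28, 51, 33, 47],
])
lnc = np.array([
    [0, 3, 0, 12, 0, 1],
    [15, 0, 22, 18, 0, 11],
    [0, 0, 0, 2, 0, 0],
])
keep = keep_by_min_count(np.vstack([coding, lnc]))
print("kept genes:", keep.tolist())
print("zero fraction, coding:", zero_fraction(coding))
print("zero fraction, lncRNA:", round(zero_fraction(lnc), 3))
```

Comparing the zero fraction and the number of retained genes between coding and non-coding features gives a quick sense of how sparse the ncRNA counts are in your data.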

The ncRNA profile can differ from sample to sample, but so can the protein-coding profile.

Kevin