lncRNA differential expression analysis
4.6 years ago
seta ★ 1.9k

Dear all,

I'm working on an RNA-seq analysis of case and control samples and have identified some differentially expressed coding genes and long non-coding RNAs (lncRNAs) with edgeR. I would like to do an integrative lncRNA-mRNA analysis. The library size of the cases is small (about 2 million raw counts) compared to the controls (about 60 million raw counts), so I filtered out genes with a CPM value of less than 5 during the edgeR analysis. Given that lncRNAs usually have low expression values, I'm concerned that this CPM threshold may cause some lncRNAs to be missed. Could you please share your thoughts on the analysis?

Thanks a lot

4.6 years ago

The fact is that lncRNAs are indeed lowly expressed, apart from notable exceptions (MALAT1, XIST, TSIX, etc.) under certain conditions. Are you more worried about the difference in library size? None of us can see your data and try out different filters. However, you could start with CPM > 1 as a minimum cut-off and take it from there.
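As a rough sketch of the suggested CPM > 1 starting point, the plain Python below keeps a gene if its CPM exceeds the cut-off in at least a minimum number of samples. It assumes raw counts held in a dict (gene → list of per-sample counts) and computes library sizes as simple column sums, without the normalization factors that edgeR would normally fold in; the function names and the `min_samples` parameter are hypothetical, and `min_samples` would typically be set to the size of the smallest group.

```python
def library_sizes(counts):
    """Column sums of a gene -> per-sample counts mapping (raw library sizes)."""
    n_samples = len(next(iter(counts.values())))
    return [sum(row[j] for row in counts.values()) for j in range(n_samples)]


def filter_by_cpm(counts, cutoff=1.0, min_samples=2):
    """Return genes whose CPM exceeds `cutoff` in at least `min_samples` samples.

    Assumption: CPM is computed from raw column sums, with no normalization factors.
    """
    libs = library_sizes(counts)
    kept = []
    for gene, row in counts.items():
        n_above = sum(c * 1e6 / libs[j] > cutoff for j, c in enumerate(row))
        if n_above >= min_samples:
            kept.append(gene)
    return kept


# Toy data: sample 1 has 2 million reads, sample 2 has 10 million.
counts = {
    "A": [500_000, 5_000_000],
    "B": [1, 30],  # CPM 0.5 in sample 1, 3.0 in sample 2
    "C": [1_499_999, 4_999_970],
}
print(filter_by_cpm(counts, cutoff=1.0, min_samples=2))  # ['A', 'C']
print(filter_by_cpm(counts, cutoff=1.0, min_samples=1))  # ['A', 'B', 'C']
```

Gene B illustrates how a lowly expressed transcript passes or fails depending on `min_samples` when one library is much smaller than the other.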

In my opinion, it is likely that many known lncRNAs merely reflect 'transcriptional noise' (being expressed as a by-product of the expression of nearby protein-coding genes, for example) and, for all intents and purposes, may have no function other than to occupy volume in the nucleus and cytoplasm, where they will be degraded. They are still expressed, though, and using a cut-off of 1 will at least ensure that these are included in your analysis, for better or for worse.

If your approach is ultimately about correlation, then the large difference in library size may not have as large an effect as you think (because correlation metrics will be independent of it). Again, though, we cannot see the data; I would be checking histograms, box-and-whisker plots, summary statistics, etc.
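In the spirit of checking box-and-whisker plots and summary statistics, here is a minimal sketch (hypothetical function name, same dict-of-counts layout as an assumption) that reports, per sample, the library size, the fraction of genes with zero counts, and the five-number summary of log2(CPM + pseudocount). The pseudocount of 1 is an arbitrary illustrative choice, not edgeR's own log-CPM prior. Samples with only about 2 million reads would typically show a larger zero fraction and a shifted lower quartile compared with 60-million-read libraries.

```python
import math
import statistics


def log_cpm_summaries(counts, pseudocount=1.0):
    """Per-sample box-plot statistics of log2(CPM + pseudocount).

    Assumption: counts is a dict of gene -> list of raw per-sample counts,
    and the pseudocount of 1 is an arbitrary illustrative choice.
    """
    n_samples = len(next(iter(counts.values())))
    libs = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    summaries = []
    for j in range(n_samples):
        column = [row[j] for row in counts.values()]
        values = [math.log2(c * 1e6 / libs[j] + pseudocount) for c in column]
        q1, median, q3 = statistics.quantiles(values, n=4)
        summaries.append({
            "library_size": libs[j],
            "zero_fraction": sum(c == 0 for c in column) / len(column),
            "min": min(values),
            "q1": q1,
            "median": median,
            "q3": q3,
            "max": max(values),
        })
    return summaries


counts = {"g1": [0, 12], "g2": [5, 40], "g3": [20, 300], "g4": [75, 648]}
for s in log_cpm_summaries(counts):
    print({k: round(v, 2) if isinstance(v, float) else v for k, v in s.items()})
```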

Kevin


Many thanks, Kevin, for your help, as always. Actually, I'm not concerned about the large difference in library size; my issue is the CPM cut-off for the analysis. As far as I know, genes with a count of less than 10 should usually be removed, which is equivalent to a CPM of 0.5 for a library containing 20 million reads. As I mentioned in the post, the library size of my patient samples is about 2 million reads, so I was forced to set a high CPM cut-off (5) to filter out genes with low counts (less than 10). However, many lncRNAs may then be lost from the analysis, and that is my real problem. Could you please let me know if you have any suggestions?
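The conversion behind these numbers is simply CPM = count × 10^6 / library size. A small sketch (hypothetical helper names) makes the trade-off explicit: the same 10-read minimum is CPM 0.5 at 20 million reads but CPM 5 at 2 million, while a CPM cut-off of 5 demands about 300 reads in a 60-million-read control library. As far as I know, edgeR's `filterByExpr()` applies a related idea, converting its `min.count` (default 10) into a CPM cut-off using the median library size, which is worth keeping in mind when library sizes differ this much.

```python
def count_to_cpm(count, library_size):
    """Counts per million for a single gene in a single library."""
    return count * 1e6 / library_size


def cpm_to_count(cpm, library_size):
    """Read count that corresponds to a given CPM in a library of this size."""
    return cpm * library_size / 1e6


print(count_to_cpm(10, 20_000_000))  # 0.5
print(count_to_cpm(10, 2_000_000))   # 5.0
print(cpm_to_count(1, 2_000_000))    # 2.0
print(cpm_to_count(5, 60_000_000))   # 300.0
```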


Apart from trying different cut-offs, I have no more suggestions.


Thanks. Sorry, are you suggesting a CPM cut-off of 1 for a library size of 2 million reads? Could you also let me know what I should look for when comparing the output of different cut-offs?


I still don't know how you intend to integrate these datasets, which is important to understand, so I am limited in how specifically I can advise. There is no right or wrong here: you can apply the same cut-off to both, or use a different cut-off for each. Then proceed with your analysis, bearing in mind that you can always go back and modify certain parameters. Having many lowly expressed genes in your dataset will affect things like p-value correction, the amount of RAM required, fold-change calculations, PCA, clustering, etc.
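One simple thing to look at when trying different cut-offs is how many genes of each biotype survive, since that drives the number of tests (and hence the p-value correction) and shows how many lncRNAs each threshold costs. The sketch below assumes biotype labels are available from the annotation used for counting; the function name, data layout and toy values are all hypothetical.

```python
from collections import Counter


def sweep_cpm_cutoffs(counts, biotypes, cutoffs, min_samples=2):
    """For each CPM cut-off, count how many genes of each biotype pass the filter."""
    n_samples = len(next(iter(counts.values())))
    libs = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    results = {}
    for cutoff in cutoffs:
        kept = Counter()
        for gene, row in counts.items():
            n_above = sum(c * 1e6 / libs[j] > cutoff for j, c in enumerate(row))
            if n_above >= min_samples:
                kept[biotypes[gene]] += 1
        results[cutoff] = dict(kept)
    return results


# Toy data with library sizes of exactly 1 million, so CPM equals the raw count.
counts = {
    "mRNA1": [999_990, 999_990],
    "lnc1": [3, 3],
    "lnc2": [1, 6],
    "lnc3": [6, 1],
}
biotypes = {"mRNA1": "protein_coding", "lnc1": "lncRNA", "lnc2": "lncRNA", "lnc3": "lncRNA"}
for cutoff, kept in sweep_cpm_cutoffs(counts, biotypes, [0.5, 2, 5]).items():
    print(cutoff, kept)
# 0.5 {'protein_coding': 1, 'lncRNA': 3}
# 2 {'protein_coding': 1, 'lncRNA': 1}
# 5 {'protein_coding': 1}
```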

You just have to make an 'executive' decision with your own project, and then move forward with your analysis. Again, you can always later go back to modify things.


My goal is to do an integrative lncRNA-mRNA analysis to find the lncRNAs and target genes that are related to a given disease, as well as to understand the regulatory roles of those lncRNAs. Yes, Kevin, I usually go ahead with the analysis and then go back and redo it. However, consulting with experienced people like you is always valuable to me.
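One common way to shortlist candidate lncRNA-target pairs (not necessarily the integration intended here) is co-expression: correlate each differentially expressed lncRNA with each mRNA across samples, using log-scale values such as log-CPM. The sketch below uses Pearson correlation with an arbitrary |r| ≥ 0.8 threshold; the function names, threshold and toy values are assumptions. With few samples such correlations are unstable, and testing many pairs raises the same multiple-testing issue as the differential expression step.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def coexpressed_pairs(lnc_expr, mrna_expr, min_abs_r=0.8):
    """Return (lncRNA, mRNA, r) tuples with |r| >= min_abs_r, strongest first.

    Assumption: both inputs map gene -> log-scale expression across the same samples.
    """
    pairs = []
    for lnc, x in lnc_expr.items():
        for gene, y in mrna_expr.items():
            r = pearson(x, y)
            if not math.isnan(r) and abs(r) >= min_abs_r:
                pairs.append((lnc, gene, r))
    return sorted(pairs, key=lambda p: -abs(p[2]))


lnc_expr = {"LNC1": [1, 2, 3, 4]}
mrna_expr = {"G1": [2, 4, 6, 8], "G2": [4, 3, 2, 1], "G3": [1, 3, 2, 1], "G4": [5, 5, 5, 5]}
for lnc, gene, r in coexpressed_pairs(lnc_expr, mrna_expr):
    print(lnc, gene, round(r, 3))
```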


Oh, I know, but how you do that integration is important. Anyway, feel free to ask more questions!
