Hi friends,
I am working with RNASeq data and am completely new to it. I have already performed differential expression analysis using DESeq2 and identified differentially expressed genes.
Now, with the identified differentially expressed genes, I want to perform clustering to identify which genes work together in groups in their biological activities.
Please advise on the following:
1) Which normalization technique is currently the most suitable for RNASeq data? Links or code for performing it would be appreciated.
2) Should a log2 transformation accompany the normalisation, as was the case in microarray analysis?
3) The DESeq2 tutorials describe the vst and rlog transformations. Is it necessary to apply one of them along with other normalization / preprocessing steps?
4) Please provide links to tutorials or code for the most suitable normalisation technique and for any other steps needed before I can apply clustering techniques such as hierarchical clustering to the preprocessed expression values of the differentially expressed genes identified using DESeq2.
5) Is there any difference in differential expression analysis techniques and normalization procedures between single-cell RNASeq (scRNASeq) data and RNASeq data?
6) How can I tell whether an NCBI GEO series is RNASeq or scRNASeq? I mean, is there any distinct text on the series home page of every accession stating that it is RNASeq or scRNASeq?
Thanks in advance.
+1 from me. An essential step is the inter-library normalization done via `normalized=T`. You can read more about why that is needed here.

Thanks a lot friend.
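For anyone wondering what the inter-library normalization mentioned above does conceptually, here is a minimal sketch of median-of-ratios size factors in plain Python. This is an illustration only, not DESeq2's code: the function names `size_factors` and `normalize` and the toy count matrix are made up for this example, and in a real analysis you would let DESeq2 compute the normalized counts for you.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (hypothetical helper, not DESeq2's API).

    counts: list of genes, each a list of raw counts, one entry per sample.
    """
    n_samples = len(counts[0])
    # Keep only genes with non-zero counts in every sample, so that the
    # log of the geometric mean is finite.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has non-zero counts in every sample")
    # Log of the per-gene geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Ratio of each gene's count to its geometric mean, on the log scale.
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        # The median ratio is robust to a minority of strongly changing genes.
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide every count by the size factor of its sample."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Toy data (made up): sample 2 was sequenced about twice as deeply as sample 1.
raw = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 3],  # skipped when estimating factors because of the zero
]
factors = size_factors(raw)
normalized = normalize(raw, factors)
```

In this toy matrix the second size factor comes out twice the first, so after division the first three genes have equal values in both samples. That is the point of the step: differences caused only by sequencing depth and library composition are removed before samples are compared or clustered.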
Thanks for your response. I have read about vst and rlog. What I actually want to know is whether any normalisation techniques better than vst or rlog are available. vst and rlog were probably proposed quite a few years back, so I would like to know whether a more recent, better technique exists.
They will do just fine and are established. They normalize for depth and library composition and stabilize the variance, and rlog additionally applies shrinkage to limit the variability of lowly expressed genes. Do not think too much ;-) (really, this is not meant to be offensive) but use these established techniques, which are recommended by the `DESeq2` author(s), who are among the top guys when it comes to expertise in NGS data analysis, and proceed with your analysis.

Thanks a lot... One more question: is there any difference in differential expression analysis techniques and normalization procedures between single-cell RNASeq (scRNASeq) data and RNASeq data? And how can I tell whether an NCBI GEO series is RNASeq or scRNASeq? I mean, is there any distinct text on the series home page of every accession stating that it is RNASeq or scRNASeq?
I am also editing the set of questions above so that these questions are more visible.
There is definitely a difference, because scRNA-seq data generally have lower coverage and are zero-inflated. There is extensive literature out there comparing different techniques for scRNA-seq; maybe the ZINBWAVE paper is a good start. Whether a dataset is scRNA-seq or not, you can see by looking at the methods description.
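To make the "zero-inflated" point a bit more concrete, here is a small sketch in plain Python that computes the fraction of zero counts per sample (or per cell). The helper name `zero_fraction_per_sample` and the toy matrix are invented for illustration, and no particular cutoff is implied: a high zero fraction is only a hint, and the methods description of the dataset remains the reliable source.

```python
def zero_fraction_per_sample(counts):
    """Fraction of genes with a zero count, per sample (hypothetical helper).

    counts: list of genes, each a list of raw counts, one entry per sample/cell.
    """
    n_genes = len(counts)
    n_samples = len(counts[0])
    return [
        sum(1 for row in counts if row[j] == 0) / n_genes
        for j in range(n_samples)
    ]


# Toy data (made up): column 0 looks like a bulk sample,
# columns 1 and 2 look like sparse single cells.
toy = [
    [12, 0, 0],
    [30, 0, 4],
    [7, 1, 0],
    [15, 0, 0],
]
fractions = zero_fraction_per_sample(toy)
print(fractions)  # [0.0, 0.75, 0.75]
```

Matrices dominated by zeros like the last two columns are one reason why methods built for bulk data may not transfer directly to single-cell data.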
Thanks a lot again for your extensive help. Sorry for asking a silly question again, as I am new to this area. On the NCBI series home page of an accession, is there any distinct text stating whether it is RNASeq or scRNASeq?
Don't worry. I keep asking the folks at our Biostars Slack (biostar.slack.com, the chat for the Biostars community) questions all the time. This is the way to improve yourself :)