Hi friends,
I have downloaded DNA microarray data from NCBI. The data contain both control and affected samples for all genes. I want to perform downstream analyses such as clustering and classification. I know that some preprocessing steps, such as normalization, log2 transformation, and selection of differentially expressed genes, are necessary before clustering or classification.
But I am unsure about the exact order of these preprocessing steps, although I know that normalization is performed before log2 transformation. Please let me know the following:
1) Do normalization and then log2 transformation need to be done before selecting differentially expressed genes, and should that selection be performed on the normalized, log2-transformed data?
2) For RNA-Seq data, I learned that differential expression analysis is done on un-normalized, un-logged count data, because the statistical model is most powerful when applied to un-normalized counts. Can we likewise select differentially expressed genes from microarray data without performing normalization and log2 transformation? Please note that I will use SAM or limma to select differentially expressed genes from the microarray data.
3) Are there any other preprocessing or quality control steps necessary before clustering? If so, please mention their exact order.
Thanks in advance.
See this end-to-end workflow.
Thanks a lot. Now I understand that there are many more preprocessing steps I have to carry out before I can apply limma to find differentially expressed genes, and that limma should be applied only to the final preprocessed data. But can you please tell me why this differs from RNA-Seq? I mean, why should limma be applied to the final preprocessed data in the case of microarrays, whereas in the case of RNA-Seq, DESeq2 should be applied to raw count data without normalization or log2 transformation?
Some packages, like gcrma, take care of normalization and log transformation. You can refer here: http://www.bioconductor.org/packages/release/bioc/html/gcrma.html
Then you can use these values to perform DE analysis using limma.
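To make the order concrete, here is a minimal sketch in Python/NumPy (not R, so it does not call gcrma or limma). It only illustrates the sequence: normalize, then log2-transform, then test on the processed values. Everything in it is an assumption for illustration: the data are simulated, median scaling is a much simpler stand-in for the normalization gcrma performs, and the ordinary Welch t-statistic is a stand-in for limma's moderated t-statistic.

```python
import numpy as np


def median_scale(raw):
    """Rescale each array (column) so that all arrays share one median."""
    col_medians = np.median(raw, axis=0)
    return raw / col_medians * col_medians.mean()


def welch_t(group_a, group_b):
    """Per-gene Welch t-statistic for group_a minus group_b (rows are genes)."""
    diff = group_a.mean(axis=1) - group_b.mean(axis=1)
    se = np.sqrt(
        group_a.var(axis=1, ddof=1) / group_a.shape[1]
        + group_b.var(axis=1, ddof=1) / group_b.shape[1]
    )
    return diff / se


# Simulated raw intensities (hypothetical data): 200 genes,
# 4 control arrays followed by 4 affected arrays.
rng = np.random.default_rng(0)
n_genes, n_per_group = 200, 4
baseline = rng.normal(8.0, 1.5, size=(n_genes, 1))  # per-gene log2 level
log_signal = baseline + rng.normal(0.0, 0.2, size=(n_genes, 2 * n_per_group))
log_signal[:10, n_per_group:] += 2.0  # first 10 genes are up in affected arrays
array_effect = rng.uniform(0.5, 2.0, size=2 * n_per_group)  # per-array scale bias
raw = 2.0 ** log_signal * array_effect

# Step 1: normalize across arrays (on the raw intensity scale).
normalized = median_scale(raw)
# Step 2: log2-transform the normalized intensities.
log_expr = np.log2(normalized)
# Step 3: test each gene on the normalized, log2 values (affected minus control).
t_stats = welch_t(log_expr[:, n_per_group:], log_expr[:, :n_per_group])
# Genes ordered from strongest to weakest evidence of differential expression.
ranked_genes = np.argsort(-np.abs(t_stats))
```

With real data you would replace the simulated matrix with the expression values returned by the preprocessing package and pass those to limma instead of the simple t-statistic used here.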
Thanks a lot for your response and for the package you referred me to.