Hi all, I have a basic question about calculating correlation for gene microarray data. My data has gene names in the rows, and the columns contain gene expression values. There are 2000 genes and 8 samples: 4 healthy samples and 4 disease samples. To calculate the correlation between two genes, should I consider only the 4 disease samples, or should I use all 8 samples at once? I want to use Matlab for the correlation calculation, but I have no idea which samples I should pick for this purpose. Also, if there are more than two categories, such as healthy data 1, healthy data 2, disease data 1 and disease data 2, how should I proceed? Thanks.
It depends on the question you want to address with this. Pearson's correlation tells you how strong the linear relationship is between the expression levels of two genes. In which context are you interested in this? Do you want to know whether two genes are correlated in general, in healthy people only, or in disease 1 only? The computation of any quantity should be motivated by a biological question. Find the question you want to address, then compute the relevant answer.
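As a rough illustration of "choose the samples according to the question", here is a minimal sketch in Python/NumPy (not Matlab). It assumes a genes x samples matrix where columns 0-3 are healthy and columns 4-7 are disease. The data are random toy values, and the function name `gene_correlation` and the `groups` mapping are hypothetical. The same gene-gene Pearson correlation is computed on all samples ("in general") and on each group separately. More categories (healthy 1, healthy 2, disease 1, disease 2) just mean more entries in `groups`.

```python
import numpy as np

def gene_correlation(expr: np.ndarray, sample_idx=None) -> np.ndarray:
    """Pearson correlation between every pair of genes.

    expr: genes x samples matrix (rows = genes, columns = samples).
    sample_idx: column indices to use; None means all samples.
    """
    sub = expr if sample_idx is None else expr[:, sample_idx]
    # np.corrcoef treats each row as a variable, so genes (rows) are
    # correlated with each other across the selected samples (columns).
    return np.corrcoef(sub)

# Toy data (random, illustration only): 5 genes x 8 samples.
rng = np.random.default_rng(0)
expr = rng.normal(size=(5, 8))

# Assumed column layout: 0-3 healthy, 4-7 disease.
# Add more keys here for extra categories.
groups = {"healthy": [0, 1, 2, 3], "disease": [4, 5, 6, 7]}

r_all = gene_correlation(expr)  # correlation "in general"
r_by_group = {name: gene_correlation(expr, idx) for name, idx in groups.items()}

print(r_all.shape)  # (5, 5)
```

In Matlab, `corrcoef` treats columns as variables, so a genes-in-rows matrix would need to be transposed first.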
I want to build a network based on correlations among genes. If I want to find a disease module in this network, should I consider only the diseased samples? And if I want to know how two genes are correlated in general, should I consider both the diseased and the healthy samples?
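One simple way to turn correlations into a network is to keep an edge between two genes when |r| is at least some cutoff. Below is a minimal Python/NumPy sketch that uses only the disease columns, matching the "disease module" case. The cutoff of 0.9, the function name `correlation_network`, the gene names and the toy values are all hypothetical choices for illustration. Other network construction methods exist.

```python
import numpy as np

def correlation_network(expr, gene_names, sample_idx, cutoff=0.9):
    """Return edges (gene_a, gene_b, r) with |r| >= cutoff (hypothetical rule).

    expr: genes x samples matrix; sample_idx: columns to use (e.g. disease only).
    """
    r = np.corrcoef(expr[:, sample_idx])
    edges = []
    n = len(gene_names)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle: each pair once, no self-loops
            if abs(r[i, j]) >= cutoff:
                edges.append((gene_names[i], gene_names[j], float(r[i, j])))
    return edges

# Toy data: 4 genes x 8 samples, columns 0-3 healthy, 4-7 disease (assumed).
genes = ["geneA", "geneB", "geneC", "geneD"]
expr = np.array([
    [5, 1, 4, 2, 1, 2, 3, 4],
    [0, 3, 1, 2, 3, 5, 7, 9],    # disease part is 2 * geneA + 1
    [2, 2, 5, 1, 4, 3, 2, 1],    # disease part decreases as geneA increases
    [1, 0, 3, 3, 1, -1, 1, -1],  # weakly related to the others
])

disease_edges = correlation_network(expr, genes, sample_idx=[4, 5, 6, 7])
for a, b, r in disease_edges:
    print(a, b, round(r, 2))
```

Passing all eight column indices instead gives the "in general" network from the same function.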
Yes, if you only care about correlation in general, you take all data.