Hi
I found @Kevin Blighe's answer on finding Pearson correlation coefficients between a list of genes, and I want to apply that solution to my problem. Basically, I have an Affymetrix gene-level expression matrix (genes in the rows, sample IDs in the columns), and I have annotation data for the microarray experiment, with sample IDs in the rows and descriptive variables in the columns.
example gene expression data
Here is the gene-level expression matrix:
> exprs_mat[1:4, 1:3]
Tarca_001_P1A01 Tarca_003_P1A03 Tarca_004_P1A04
1_at 6.062215 6.125023 5.875502
10_at 3.796484 3.805305 3.450245
100_at 5.849338 6.191562 6.550525
1000_at 3.567779 3.452524 3.316134
And here is the annotation data, which contains the experiment observations:
> head(ano)
SampleID GA Batch Set Train Platform
Tarca_001_P1A01 Tarca_001_P1A01 11.0 1 PRB_HTA 1 HTA20
Tarca_013_P1B01 Tarca_013_P1B01 15.3 1 PRB_HTA 1 HTA20
Tarca_025_P1C01 Tarca_025_P1C01 21.7 1 PRB_HTA 1 HTA20
Tarca_037_P1D01 Tarca_037_P1D01 26.7 1 PRB_HTA 1 HTA20
Tarca_049_P1E01 Tarca_049_P1E01 31.3 1 PRB_HTA 1 HTA20
Tarca_061_P1F01 Tarca_061_P1F01 32.1 1 PRB_HTA 1 HTA20
I want to see how each gene's expression across samples correlates with the GA value of the corresponding samples in the annotation data. How can I get the expected correlation matrix and filter the genes by correlation value? Any ideas on how to do this correctly?
my attempt using limma:
Here is what I tried using limma:
library(limma)
fit <- limma::lmFit(exprs_mat, design = model.matrix( ~ t(ano$GA)))
fit <- eBayes(fit)
topTable(fit, coef=2)
But I got a dimension error, and I need a proper design matrix for this. Can anyone point me to how to do this?
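One likely cause of the dimension error, sketched here under assumptions: t(ano$GA) is a 1 x n matrix, so model.matrix() returns a design with a single row instead of one row per sample, and lmFit() needs the rows of the design to match the columns of the expression matrix. The sample IDs in the two objects also need to line up (in the printed excerpts they do not fully overlap). The data below is simulated stand-in data, not the real exprs_mat or ano, and exprs_sub and ano_sub are hypothetical names.

```r
library(limma)

# Simulated stand-ins for exprs_mat and ano (NOT the real data).
set.seed(1)
samples <- paste0("S", 1:6)
exprs_mat <- matrix(rnorm(20 * 6), nrow = 20,
                    dimnames = list(paste0(1:20, "_at"), samples))
# Annotation rows are deliberately in a different order than the matrix columns.
ano <- data.frame(SampleID = rev(samples),
                  GA = c(32.1, 31.3, 26.7, 21.7, 15.3, 11.0),
                  row.names = rev(samples))

# Keep only samples present in both objects, in the same order.
common <- intersect(colnames(exprs_mat), rownames(ano))
exprs_sub <- exprs_mat[, common, drop = FALSE]
ano_sub <- ano[common, , drop = FALSE]

# One row per sample: an intercept plus GA as a continuous covariate.
design <- model.matrix(~ GA, data = ano_sub)

fit <- lmFit(exprs_sub, design)
fit <- eBayes(fit)

# Genes ranked by evidence for a linear association with GA.
topTable(fit, coef = "GA", number = Inf)
```

Note that this tests a linear trend of expression with GA rather than computing a Pearson correlation directly, so the ranking is related to, but not the same as, a correlation filter.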
desired output:
How can I get a reduced gene expression matrix containing only these highly correlated genes? How can I make the above implementation more efficient in R? Any thoughts?
desired output format:
I intend to filter the genes in the gene expression matrix exprs_mat by correlation value (keeping only highly correlated genes). The resulting sub-expression matrix should have the same data structure as exprs_mat. Any ideas on how to make this happen?
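A minimal sketch of the filtering described above, assuming the goal is the Pearson correlation of each gene's expression with GA across matched samples. The data is simulated, the cutoff of 0.8 is an arbitrary illustrative threshold, and gene_cor, keep and exprs_filtered are hypothetical names.

```r
# Simulated stand-ins for exprs_mat and ano (NOT the real data).
set.seed(42)
samples <- paste0("S", 1:8)
ga <- c(11.0, 15.3, 21.7, 26.7, 31.3, 32.1, 18.0, 24.5)
exprs_mat <- rbind(
  up_at     = 0.1 * ga + rnorm(8, sd = 0.05),      # rises with GA
  down_at   = 8 - 0.1 * ga + rnorm(8, sd = 0.05),  # falls with GA
  noise1_at = rnorm(8),
  noise2_at = rnorm(8)
)
colnames(exprs_mat) <- samples
ano <- data.frame(SampleID = samples, GA = ga, row.names = samples)

# Match samples by ID so expression columns and GA values are aligned.
common <- intersect(colnames(exprs_mat), rownames(ano))

# cor() correlates columns, so transpose to put genes in columns.
# The result is a named vector with one correlation per gene.
gene_cor <- cor(t(exprs_mat[, common, drop = FALSE]), ano[common, "GA"])[, 1]

# Keep genes whose absolute correlation with GA reaches the cutoff.
cutoff <- 0.8  # hypothetical threshold, choose to suit the data
keep <- abs(gene_cor) >= cutoff

# drop = FALSE keeps the result a matrix even if only one gene survives.
exprs_filtered <- exprs_mat[keep, common, drop = FALSE]
```

Because cor() is vectorised over all genes at once, this avoids an explicit loop; exprs_filtered keeps genes in the rows and sample IDs in the columns, like the input matrix.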
Please describe the output you get and how it's different from the expected output.
Also, your code has final_df <- as.data.frame(), which will cause an error no matter what else is going on, because as.data.frame() needs at least one parameter: the dataset to convert to a data.frame.
Thanks for your reply. I intend to filter the genes in the gene expression matrix exprs_mat by correlation value (keeping only highly correlated genes), and the expected sub-expression matrix should have the same data structure as exprs_mat. Do you have any idea how to make this happen? Thank you.
I also found this biostars post useful, but my correlation matrix is not correct. Any thoughts? Thanks.
Take a look at the heatmap(cor(X)) and it should be square, symmetrical. Picking genes is tough because they are all strong on the diagonal. I bet every gene will be correlated to some part of the dataset.
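To illustrate the point about the square, symmetric matrix, here is a small sketch on simulated data, assuming X has genes in the rows and samples in the columns like the expression matrix in the question. Since cor() works on columns, cor(X) would be sample-by-sample and cor(t(X)) gene-by-gene; either way the diagonal is 1, which is why a gene-gene matrix alone does not say which genes track GA.

```r
# Simulated matrix: 5 genes (rows) x 10 samples (columns), NOT real data.
set.seed(7)
X <- matrix(rnorm(5 * 10), nrow = 5,
            dimnames = list(paste0("g", 1:5), paste0("S", 1:10)))

# Gene-by-gene Pearson correlations: transpose so genes are the columns.
gene_gene <- cor(t(X))

dim(gene_gene)          # 5 x 5, one row and column per gene
isSymmetric(gene_gene)  # TRUE: cor(a, b) equals cor(b, a)
diag(gene_gene)         # every gene correlates perfectly with itself

# symm = TRUE tells heatmap() to treat the matrix symmetrically.
heatmap(gene_gene, symm = TRUE)
```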