Hello.
I'm working on some microarray data, and I have a list of deregulated genes for each treatment. Now I want to run a PCA on these genes to see whether they group by treatment.
The problem is that the lists of expression values are not all the same length. For example (values are random):
deregulated_treat1 = c( 2.334, 1.3244, 2.2223, 4.5554, 7.5444, 3.3455 )
deregulated_treat2 = c( 1.674, 1.3534, 3.2253, 4.5554, 7.5444, 8.1115 )
deregulated_treat3 = c( 2.334, 1.3244, 2.2223, 4.5554 )
The first step before running the PCA is to build a matrix containing all treatments and their values. But some genes are not deregulated in every treatment, and I don't know what value to put in their place in the matrix. Should I use 0 or NA?
The matrix I'm thinking of building looks like this:
|Gene Name|Treat1|Treat2|Treat3|
|Gene1 |2.445 |1.533 |7.344 |
|Gene2 |2.445 |0/NA |1.424 |
|Gene3 |2.445 |2.463 |0/NA |
|Gene4 |2.445 |5.533 |0/NA |
Any ideas?
Could I ask what you mean by expression values, and how you retrieve them for your genes in each treatment? If this is microarray data, every gene should have an intensity or normalized expression value in every sample, so there should be no NAs. An expression of 0 is still possible if a gene is not expressed at all. PCA reduces dimensionality by building new variables that are linear combinations of the original ones, so your PCA is probably not being run on the correct values. When you say deregulated, do you mean differentially expressed genes from the microarray? In that case you should be able to obtain the normalized expression of those genes across all your samples and simply plot them. Otherwise I see this as a problem. Others might give more input, but to me it does not make sense.
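To illustrate why a 0/NA placeholder causes trouble, here is a minimal sketch. The matrix `expr`, its gene and treatment names, and the simulated values are all made up for illustration. It shows that `prcomp()` refuses a matrix containing NA, while the same matrix with the measured values kept in every cell works.

```r
set.seed(1)

# Hypothetical normalized expression: 4 genes (rows) x 3 treatments (columns)
expr <- matrix(rnorm(12, mean = 8), nrow = 4,
               dimnames = list(paste0("Gene", 1:4), paste0("Treat", 1:3)))

# Blank out a cell where the gene was not called DE (the 0/NA idea)
with_na <- expr
with_na["Gene2", "Treat2"] <- NA

# prcomp() cannot decompose a matrix with missing values; capture the error text
na_result <- tryCatch(prcomp(t(with_na)),
                      error = function(e) conditionMessage(e))

# With the measured value kept in every cell, PCA runs normally
pca <- prcomp(t(expr))
```

Replacing the NA with 0 would let the call run, but it would treat the gene as unexpressed in that treatment, which is not what "not differentially expressed" means.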
The initial microarray data were expression values, which I normalized with the quantile method. Then I ran Dunnett's test on them to get the most significantly deregulated (differentially expressed) genes. Let's say that for each treatment I got the following DE genes
The above values are the indexes of the genes (rows) from the initial dataset. The next step is to match these indexes with the initial normalized expression values
and finally construct the corresponding matrix
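One way to do the matching step, sketched under assumptions: `expr` is a simulated stand-in for the normalized matrix (genes in rows, samples in columns), and `de_idx` is a hypothetical list of DE row indices per treatment. Taking the union of the indices and subsetting the rows gives a matrix with a measured value in every cell, so no 0/NA filler is needed.

```r
set.seed(42)

# Hypothetical normalized matrix: 10 genes (rows) x 6 samples (2 per treatment)
sample_names <- paste0(rep(c("Treat1", "Treat2", "Treat3"), each = 2),
                       "_rep", 1:2)
expr <- matrix(rnorm(60, mean = 8), nrow = 10,
               dimnames = list(paste0("Gene", 1:10), sample_names))

# Hypothetical row indices of DE genes per treatment (unequal lengths)
de_idx <- list(Treat1 = c(1, 3, 5, 7),
               Treat2 = c(3, 4, 7),
               Treat3 = c(2, 3))

# Union of all DE genes across treatments
de_union <- sort(unique(unlist(de_idx)))

# Subset the normalized matrix: every cell is a measured value
de_expr <- expr[de_union, , drop = FALSE]

# Optional: logical table of which gene was called DE in which treatment
de_calls <- sapply(de_idx, function(idx) de_union %in% idx)
rownames(de_calls) <- rownames(de_expr)
```

The `de_calls` table keeps the "deregulated in which treatment" information separate from the expression values, so it can still be used for annotation later.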
And I want to use these data for PCA.
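A rough sketch of the PCA step itself, using simulated data in place of the real matrix. The name `de_expr`, the three replicates per treatment, and the choice to center and scale each gene are assumptions for illustration. Samples go in rows for `prcomp()`, so the genes-by-samples matrix is transposed first.

```r
set.seed(7)

# Hypothetical DE-gene matrix: 20 genes (rows) x 9 samples (3 per treatment)
treatment <- rep(c("Treat1", "Treat2", "Treat3"), each = 3)
de_expr <- matrix(rnorm(20 * 9, mean = 8), nrow = 20,
                  dimnames = list(paste0("Gene", 1:20),
                                  paste0(treatment, "_rep", 1:3)))

# PCA with samples as observations; center and scale each gene
pca <- prcomp(t(de_expr), center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Scores on the first two components, labelled by treatment
scores <- data.frame(sample = rownames(pca$x),
                     treatment = treatment,
                     PC1 = pca$x[, 1],
                     PC2 = pca$x[, 2])
```

Plotting `PC1` against `PC2` from `scores`, colored by `treatment`, is the usual way to check whether the samples group by treatment.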