I am working with copy-number-variation (CNV) count data, which is not continuous. For example:
a b c
reg1 1 2 1
reg2 1 1 2
reg3 3 3 1
Does anyone know if I can apply traditional PCA to this count table? I think it is possible, as the values are neither categorical nor ordinal, but I am not sure whether PCA is valid for count data.
It depends on how many CNVs you have in your matrix and on your objectives, I would say. If you have only a few CNVs, there is no need to reduce the dimensions further. I would expect the distribution to be more or less similar to gene expression, so you could pick your most variable CNVs and run a PCA on them.
I would do it the way it is done in single-cell analysis, where your reg1, reg2, reg3 are cells and a, b, c are genes. You don't need to log-transform your counts, but you can select a number of variable CNVs (a, b, c, ...) by plotting the standard deviation against the mean. Then scale the new matrix and run a PCA.
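The steps suggested above (pick variable features, scale, run PCA) could be sketched roughly as follows. This is only a minimal illustration under several assumptions: the data are simulated, rows are treated as observations and columns as the features being filtered (transpose your matrix if it is laid out the other way), features are ranked by the ratio of standard deviation to mean as one possible reading of "standard deviation over the mean", and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in data (an assumption, not real CNV calls):
# 50 observations (rows) x 30 features (columns), integer copy numbers 1..5.
counts = rng.integers(1, 6, size=(50, 30)).astype(float)


def select_variable_columns(x, n_top):
    """Return indices of the n_top columns with the highest SD/mean ratio."""
    mean = x.mean(axis=0)
    sd = x.std(axis=0, ddof=1)
    ratio = sd / mean  # copy numbers are >= 1 here, so the mean is never zero
    ranked = np.argsort(ratio)[::-1]
    ranked = ranked[sd[ranked] > 0]  # constant columns cannot be scaled
    return np.sort(ranked[:n_top])


def pca_scores(x, n_components):
    """Scale columns to zero mean and unit variance, then run PCA via SVD."""
    scaled = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)  # fraction of variance per component
    return scores, explained[:n_components]


keep = select_variable_columns(counts, n_top=10)
scores, explained = pca_scores(counts[:, keep], n_components=2)
print("selected columns:", keep)
print("variance explained by PC1, PC2:", explained)
```

In practice, plotting `sd` against `mean` first and choosing the cutoff by eye, as suggested, may be more sensible than a fixed `n_top`.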
Typically a PCA is calculated on a matrix of correlation coefficients so the robustness of a PCA depends on certain assumptions that come with the methodology used - see page 55 on the list of assumptions underlying PCA. Given your data is not continuous and takes discrete states, you'll have to accept that the resulting PCA will not be robust, although this is partially mitigated by having larger datasets with more observations.
That said, this really depends on what you're trying to do with the data. Are these raw counts, or have they been previously normalised or transformed in some way? What are the aims of your analysis?
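To illustrate the remark that PCA is typically calculated from a matrix of correlation coefficients, here is a small sketch on simulated discrete values (an assumption, not real data). It checks that the eigenvalues of the correlation matrix match what you get from an SVD of the standardized data, which is why scaling the columns before PCA amounts to a correlation-based PCA. It does not address how robust the result is for discrete data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated discrete data (an assumption): 100 observations x 4 variables, values 1..5.
x = rng.integers(1, 6, size=(100, 4)).astype(float)

# Route 1: eigen-decomposition of the correlation matrix between columns.
corr = np.corrcoef(x, rowvar=False)
eigvals = np.linalg.eigvalsh(corr)[::-1]  # sort in descending order

# Route 2: SVD of the standardized (zero mean, unit variance) data.
z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
singular_values = np.linalg.svd(z, compute_uv=False)
svd_eigvals = singular_values ** 2 / (x.shape[0] - 1)

print("eigenvalues from correlation matrix:", eigvals)
print("eigenvalues from SVD of scaled data:", svd_eigvals)
```

The two sets of eigenvalues agree up to floating-point error, and they sum to the number of variables because each standardized variable contributes unit variance.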
The CNV values range from 1 to 5 (raw, with no log transformation or normalization), in a matrix of 20,000 x 8.
Thank you!