14 months ago
bioinfo
Hello,
I am doing PCA on my samples using R. I have a data frame where the sample names are the row names and the genes are the column names. I initially tried to do it using:
pca_res <- prcomp(log(tpm+1), scale. = TRUE)
However, that was giving me the following error:
Error in prcomp.default(log(tpm+ 1), scale. = TRUE) :
cannot rescale a constant/zero column to unit variance
To get around this I do the following:
# Identify constant or zero columns
tpm_transposed_cons <- sapply(tpm, function(x) is.atomic(x) && length(unique(x)) == 1)
# Remove constant or zero columns
tpm_transposed_no <- tpm[, !tpm_transposed_cons]
My code works afterwards and my technical replicates look very good. However, I just wanted to ask if it is correct to do this and if I may be skewing my data by removing those values.
Thank you
There may be genes with no counts; I suggest removing these genes before any analysis. You should also check your data to understand what is happening here.
Any constant data columns, whether the value in them is zero or something else, contain no useful information. Some implementations of dimensionality reduction will automatically ignore these columns, but there should be no harm in removing them manually.
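As a rough illustration of the point above, here is a minimal sketch on made-up data. The data frame, gene names and values are all hypothetical; only `prcomp` and the `log(tpm + 1)` transform come from the question. It flags constant columns, separates the all-zero ones from the constant non-zero ones, and checks that dropping them leaves the unscaled PCA variances unchanged.

```r
# Toy TPM table: 4 samples (rows) x 5 genes (columns). Values are invented.
tpm <- data.frame(
  geneA = c(10, 20, 15, 30),
  geneB = c(0, 0, 0, 0),       # never detected in any sample
  geneC = c(5, 5, 5, 5),       # constant but non-zero
  geneD = c(1, 8, 3, 12),
  geneE = c(100, 80, 120, 90),
  row.names = paste0("sample", 1:4)
)

log_tpm <- log(tpm + 1)

# A column is constant when its smallest and largest values coincide.
is_constant <- vapply(log_tpm, function(x) min(x) == max(x), logical(1))

# Of those, which are simply all-zero (gene never detected)?
is_all_zero <- vapply(tpm, function(x) all(x == 0), logical(1))

constant_genes <- names(tpm)[is_constant]
zero_genes     <- names(tpm)[is_all_zero]

# Drop constant columns; drop = FALSE keeps a data frame even if one column is left.
filtered <- log_tpm[, !is_constant, drop = FALSE]

# Scaled PCA now runs, because every remaining column has non-zero variance.
pca_res <- prcomp(filtered, scale. = TRUE)

# Without scaling, PCA accepts constant columns, and they contribute nothing:
# the leading standard deviations match with and without them.
pca_full     <- prcomp(log_tpm, scale. = FALSE)
pca_filtered <- prcomp(filtered, scale. = FALSE)
same_variance <- isTRUE(all.equal(pca_full$sdev[1:3], pca_filtered$sdev))
```

Comparing `constant_genes` with `zero_genes` on real data is a quick way to see whether the removed columns are just undetected genes or something that deserves a closer look.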