Removing constant/zero columns for PCA?
Asked by bioinfo, 14 months ago

Hello,

I am doing PCA on my samples using R. I have a data frame where the sample names are the row names and the genes are the column names. I initially tried to do it using:

pca_res <- prcomp(log(tpm+1), scale. = TRUE)

However, that was giving me the following error:

Error in prcomp.default(log(tpm+ 1), scale. = TRUE) : 
  cannot rescale a constant/zero column to unit variance
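A small toy reproduction may make the cause clearer. With `scale. = TRUE`, prcomp() divides each column by its standard deviation, and a gene that never changes across samples has a standard deviation of exactly 0. The data frame `tpm_toy` and its gene names below are made up purely for illustration:

```r
# Hypothetical toy data: 4 samples (rows) x 3 genes (columns); geneC is zero everywhere
tpm_toy <- data.frame(
  geneA = c(10, 20, 30, 40),
  geneB = c(5, 3, 8, 1),
  geneC = c(0, 0, 0, 0)
)

# Standard deviation of each log-transformed column; geneC comes out as 0
col_sd <- apply(log(tpm_toy + 1), 2, sd)
print(col_sd)

# prcomp() stops because it cannot divide a column by a standard deviation of 0
err <- tryCatch(
  prcomp(log(tpm_toy + 1), scale. = TRUE),
  error = function(e) conditionMessage(e)
)
print(err)
```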

To get around this, I did the following:

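# Identify constant columns (a single unique value, whether zero or not)
tpm_transposed_cons <- sapply(tpm, function(x) is.atomic(x) && length(unique(x)) == 1)

# Remove those columns; drop = FALSE keeps a data frame even if only one column is left
tpm_transposed_no <- tpm[, !tpm_transposed_cons, drop = FALSE]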

My code works afterwards, and my technical replicates look very good. However, I wanted to ask whether this is correct, and whether I might be skewing my data by removing those columns.
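The filter above can also be written in terms of column variance, computed on the log-transformed values that actually go into prcomp(). This is only a sketch: the toy `tpm` below is a hypothetical stand-in for the real data frame, and the tolerance `1e-12` is an arbitrary choice, not a standard value:

```r
# Hypothetical toy stand-in for the real tpm data frame (samples x genes)
tpm <- data.frame(
  geneA = c(10, 20, 30, 40),
  geneB = c(5, 3, 8, 1),
  geneC = c(0, 0, 0, 0),  # all zero
  geneD = c(7, 7, 7, 7)   # constant but non-zero
)

log_tpm <- log(tpm + 1)

# Keep columns whose variance exceeds a small tolerance (assumed value);
# this drops all-zero and constant non-zero columns alike
keep <- vapply(log_tpm, var, numeric(1)) > 1e-12
log_tpm_filtered <- log_tpm[, keep, drop = FALSE]
print(names(log_tpm_filtered))

# Scaling now succeeds because every remaining column has non-zero variance
pca_res <- prcomp(log_tpm_filtered, scale. = TRUE)
print(summary(pca_res))
```

Compared with `length(unique(x)) == 1`, a variance threshold also catches columns whose values differ only by floating-point noise, which would otherwise be inflated enormously by scaling.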

Thank you


There may be genes with no counts; I suggest removing these genes before any analysis. You should check your data to understand what is happening here.
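One quick way to check the data is to count how many genes are zero in every sample before dropping them. A minimal sketch, assuming TPM values are non-negative (so a column sum of 0 means the gene is zero everywhere); the toy `tpm` and gene names are hypothetical:

```r
# Hypothetical toy stand-in for tpm (samples in rows, genes in columns)
tpm <- data.frame(
  geneA = c(10, 20, 30, 40),
  geneB = c(0, 0, 0, 0),
  geneC = c(5, 3, 8, 1),
  geneD = c(0, 0, 0, 0)
)

# With non-negative TPM values, a column sum of 0 means no expression in any sample
gene_totals <- colSums(tpm)
zero_genes <- names(tpm)[gene_totals == 0]
cat(length(zero_genes), "of", ncol(tpm), "genes have no counts\n")
print(zero_genes)

# Keep only genes expressed in at least one sample
tpm_expressed <- tpm[, gene_totals > 0, drop = FALSE]
```

Note that this only removes all-zero genes; a gene with the same non-zero value in every sample would still trip the prcomp() scaling check.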


Any constant data column, whether the value in it is zero or something else, contains no useful information: it has zero variance, so it cannot help distinguish samples, and with `scale. = TRUE` it cannot be rescaled to unit variance. Some implementations of dimensionality reduction will automatically ignore these columns, but there should be no harm in removing them manually.
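This can be checked on simulated data (all names below are hypothetical). With centring but no scaling, prcomp() accepts a constant column, so the PCA with and without it can be compared directly:

```r
set.seed(1)
# Hypothetical toy data: 6 samples x 4 variable genes
variable <- matrix(rexp(24, rate = 0.1), nrow = 6,
                   dimnames = list(paste0("s", 1:6), paste0("gene", 1:4)))
# Same data plus one gene with the same value in every sample
with_const <- cbind(variable, geneConst = 5)

# Centre only (scale. = FALSE) so the constant column is allowed
pca_with <- prcomp(log(with_const + 1), scale. = FALSE)
pca_without <- prcomp(log(variable + 1), scale. = FALSE)

# The first four component SDs match; the extra fifth component is ~0
print(round(pca_with$sdev, 6))
print(round(pca_without$sdev, 6))

# The constant gene has ~0 loading on every component that carries variance
print(round(pca_with$rotation["geneConst", 1:4], 6))
```

With `scale. = TRUE` the constant column cannot be included at all, and because scaling is done column by column, removing it leaves the scaled values of every other gene unchanged.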
