11 weeks ago
Eduardo Oñate
Hello everyone,
I am currently working on RNA-Seq data analysis and encountering an error while attempting to perform Principal Component Analysis (PCA) in R.
My workflow is as follows:
- I have loaded my raw count data from a CSV file and defined batch information from my metadata.
- I am using the ComBat-Seq method to remove batch effects.
- I attempt to conduct PCA on my raw count data using the prcomp() function.
When I run the PCA code, I receive this error message:
Error in prcomp.default(t(Raw_countData), scale. = TRUE): cannot rescale a constant/zero column to unit variance
Traceback:
1. prcomp(t(Raw_countData), scale. = TRUE)
2. prcomp.default(t(Raw_countData), scale. = TRUE)
3. stop("cannot rescale a constant/zero column to unit variance")
Could someone please help me understand how to identify and remove columns with zero variance from my dataset? Any tips or code snippets would be greatly appreciated!
Thank you
This is the code I use to remove the batch effect and plot the PCA:
library(sva) # ComBat_seq() is provided by the sva package
# Remove the batch effect using ComBat-Seq
# Define the batch vector
batch <- samples$bioproject # Replace "bioproject" with the name of the batch column
# Define the group vector (optional)
group <- samples$condition # Replace "condition" if the column has a different name
# Apply ComBat-Seq to remove the batch effect
combat_Raw_count <- ComBat_seq(counts = Raw_countData,
                               batch = batch,
                               group = group) # Use NULL if you do not want to adjust for group
# Prepare data for PCA
pca_before <- prcomp(t(Raw_countData), scale. = TRUE)
pca_before_df <- data.frame(PC1 = pca_before$x[, 1],
                            PC2 = pca_before$x[, 2],
                            Batch = as.factor(batch))
You can use something like
which(colSums(Raw_countData) == 0)
to identify which columns are entirely 0 counts, and then you can remove them.
EDIT: If it is still a problem afterwards, you might have to identify which columns are all identical, but in my experience this is usually caused by all 0s.
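Use rowSums, not colSums. The error complains about columns because the read count matrix is transposed before it is fed into prcomp(), which by default treats columns as the variables. In the original matrix the genes are rows, so the all-zero or constant genes have to be found row-wise.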
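The row-wise filtering described above can be sketched as follows. This is only a minimal illustration, assuming the count matrix has genes as rows and samples as columns, as Raw_countData appears to. The toy matrix `counts` and the names `gene_var`, `keep` and `filtered` are made up for the example. Filtering on per-gene variance rather than on rowSums also catches genes that are constant but not zero, which trigger the same error.

```r
# Toy genes-by-samples count matrix (hypothetical data, for illustration only)
counts <- matrix(
  c( 0, 0, 0,  0,   # gene1: all-zero counts
     5, 5, 5,  5,   # gene2: constant, non-zero counts
     1, 8, 3, 12,   # gene3: variable
    20, 4, 9, 15),  # gene4: variable
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:4))
)

# Genes are rows, so compute the variance of each row
gene_var <- apply(counts, 1, var)

# Keep only genes whose counts actually vary across samples
keep <- gene_var > 0
filtered <- counts[keep, , drop = FALSE]

# Transpose so that samples are rows and genes are columns, then run PCA
pca <- prcomp(t(filtered), scale. = TRUE)
```

With real data, the same two filtering lines would be applied to Raw_countData (and to the ComBat-Seq output, if that is also plotted) before calling prcomp().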
Thank you very much for your response! I'll try it.