Hi!
This is my first time asking a question on this forum. My apologies for any (obvious) mistakes.
Background: Gene expression data in FPKM format 48 hours after stim. Eight samples (4 HD and 4 sick). Columns are samples, rows are genes.
FPKM data was log2 transformed, all values <1 were filtered out, and genes with too many NAs were removed too (at least 3 values).
After this, the values are normalized using Z-score normalization. I use the following steps:
SDs <- apply(x, 1, sd, na.rm = TRUE)
means <- rowMeans(x, na.rm = TRUE)
RNA_log2_FPKM_cleaned <- (x - means) / SDs
Peeking at the data via a histogram results in the following:
These peaks are at -0.707 and 0.707. The values were not all the same before the Z-score normalization (as they come from different genes). Have I done something wrong? Thanks in advance for any help I can get.
I suspect the same: "guess you have rows with just two values not being NA"
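To illustrate why rows with only two non-NA values would give exactly those peaks: for any pair a, b the sample SD is |a - b| / sqrt(2) and each value lies |a - b| / 2 from the mean, so the z-scores are always ±1/sqrt(2) ≈ ±0.707, whatever the gene. A small sketch with made-up numbers (the matrix `m` is hypothetical, not the poster's data):

```r
# Hypothetical toy matrix: rows are genes, columns are samples.
# Every row has exactly two non-NA values.
m <- rbind(
  geneA = c(1.2, NA, 5.9, NA),
  geneB = c(NA, 3.0, NA, 3.1),
  geneC = c(10.5, 2.25, NA, NA)
)

# Same row-wise z-score steps as in the question
SDs   <- apply(m, 1, sd, na.rm = TRUE)
means <- rowMeans(m, na.rm = TRUE)
z     <- (m - means) / SDs

# Absolute value of every non-NA z-score, rounded to 3 decimals
round(abs(z[!is.na(z)]), 3)
```

All six non-NA entries come out as 0.707 even though the three genes have very different values and spreads, which matches the two spikes in the histogram.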
Thank you for your help and info!
The NAs are a result of filtering all values below 1 (all <1 <- NA). I thought the next step would remove all genes with fewer than 3 values, but I will edit it to actually do that.
I will also scale the matrix using the easier method. Thanks again!
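In case it helps, here is one way that filtering step could look, as a sketch only. It assumes `x` is a numeric matrix of log2 FPKM values with genes in rows, and that "enough" means at least 3 non-NA values per gene; the toy `x` and the name `min_values` are my assumptions, so adjust as needed:

```r
# Hypothetical toy matrix: rows are genes, columns are samples
x <- rbind(
  gene1 = c(2.1, 3.4, 0.5, 4.0),
  gene2 = c(0.2, 0.9, 1.5, 2.5),
  gene3 = c(0.1, 0.3, 0.8, 6.0)
)

# Mask values below 1, as described in the question
x[x < 1] <- NA

# Keep only genes with at least `min_values` non-NA values (assumed threshold)
min_values <- 3
keep <- rowSums(!is.na(x)) >= min_values
x_filtered <- x[keep, , drop = FALSE]  # drop = FALSE keeps the matrix shape

rownames(x_filtered)
```

Here only gene1 survives, since gene2 and gene3 are left with two and one values respectively.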
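For anyone reading later: the "easier method" is presumably base R's `scale()`. That is my assumption, as the suggestion itself is not quoted here. `scale()` works on columns, so the matrix has to be transposed before and after to get per-gene z-scores. A minimal sketch with a hypothetical matrix, compared against the manual steps from the question:

```r
# Hypothetical toy matrix: rows are genes, columns are samples
x <- rbind(
  gene1 = c(2.1, 3.4, NA, 4.0),
  gene2 = c(1.2, 1.9, 1.5, 2.5)
)

# scale() centers and scales columns, so transpose to treat genes as columns,
# then transpose back
z_scaled <- t(scale(t(x)))

# Manual row-wise z-scores, as in the question
SDs      <- apply(x, 1, sd, na.rm = TRUE)
means    <- rowMeans(x, na.rm = TRUE)
z_manual <- (x - means) / SDs

# Compare values only, ignoring the extra attributes scale() attaches
all.equal(z_scaled, z_manual, check.attributes = FALSE)
```

Both routes skip NAs when computing the mean and SD, so they should agree; the `scale()` version is just shorter.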