15 months ago
Raheleh
Hi all,
I'm working with cancer RNA-seq data and have recently applied singular value decomposition (SVD) to it. I'd like to extract the genes that contribute to the rank-100 subspace (the top 100 singular vectors) obtained from the SVD.
Could someone please guide me on how to identify these genes or point me in the right direction to achieve this?
Also, is the way I extracted the rank-100 space correct?
Any help or insights would be greatly appreciated.
My data:
df.h[1:6,1:6]
       TCGA.F4.6461 TCGA.A6.6653 TCGA.A6.A56B TCGA.D5.6532 TCGA.AD.6963 TCGA.A6.6138
A1CF     -0.3227907   0.06999482 -2.032670077   0.77075479    0.9943631    0.5945894
A2M       0.3901582  -0.66722812  0.125580720  -1.13952071   -0.7306534    1.1062881
A4GALT    1.2823830  -0.14344452 -0.013006693  -1.64378193   -0.1376719    0.6570290
A4GNT     1.5839032   0.25142095  1.419461278  -0.33856936   -0.3385694   -0.3385694
AAAS     -0.1805468   0.32017313 -0.004897326  -0.05855282    0.2155894   -0.3062923
AACS      1.1443031  -0.19237747 -1.688284404   0.33823336    0.4363605   -0.2813214
This is my script:
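# Transpose so that samples are rows and genes are columns
data_matrix <- t(df.h)
svd_result <- svd(data_matrix)
# Keep the top 100 singular values and their singular vectors
k <- 100
U_100 <- svd_result$u[, 1:k]      # samples x k (left singular vectors)
S_100 <- diag(svd_result$d[1:k])  # k x k diagonal matrix of singular values
V_100 <- svd_result$v[, 1:k]      # genes x k (right singular vectors)
# Reconstruct the rank-100 approximation of the data
data_100_rank <- U_100 %*% S_100 %*% t(V_100)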
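For reference, one possible way to rank genes against such a decomposition is sketched below. With samples in rows, the rows of `v` correspond to genes. In general every gene has a nonzero loading, so "the genes that contribute" becomes a matter of ranking rather than a fixed set. The score used here (squared loadings weighted by squared singular values) is an assumption of this sketch, not an established statistic. It equals the squared norm of each gene's column in the rank-k reconstruction. The toy matrix, `gene_score`, `top_genes`, and `top_comp1` are hypothetical names, and `k` is kept small only because the toy data has 8 samples.

```r
# Toy stand-in for df.h: 50 hypothetical genes x 8 hypothetical samples
set.seed(1)
toy <- matrix(rnorm(50 * 8), nrow = 50,
              dimnames = list(paste0("gene", 1:50), paste0("sample", 1:8)))

k <- 3  # small k for the toy data; the script above uses k = 100

# Same orientation as the script: samples in rows, genes in columns
sv <- svd(t(toy))

# Rows of v correspond to genes (the columns of the transposed matrix)
V_k <- sv$v[, 1:k, drop = FALSE]
rownames(V_k) <- rownames(toy)

# Scale each component's loadings by its singular value, then sum squares
# per gene. This scoring choice is an assumption of the sketch.
gene_score <- rowSums(sweep(V_k, 2, sv$d[1:k], `*`)^2)

# Genes ranked by overall contribution to the rank-k subspace
top_genes <- names(sort(gene_score, decreasing = TRUE))[1:10]

# Genes with the largest absolute loading on one component (here the first)
top_comp1 <- rownames(V_k)[order(abs(V_k[, 1]), decreasing = TRUE)[1:10]]
```

Applied to the real data, `toy` would be replaced by `as.matrix(df.h)` and `k` by 100, which requires at least 100 samples. Whether to rank by the combined score or per component depends on what "contribute" should mean for the analysis.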
Many thanks!