How to identify low- and high-variability rows and columns in a table?
5.2 years ago
star ▴ 350

I have a large table whose rows are genomic coordinates and whose columns are genomic features (see below). I would like to separate rows and columns by their variability. I have tried some basic statistics (code below), but I would like to know whether this is the right approach or whether there is an alternative statistical method that would be more accurate.

DF:

          Feature_A    Feature_B    Feature_C    Feature_D
cord_1          0.9            1          0.8            1
cord_2          0.6          0.1          0.9          0.5
cord_3            0            0            0            0
cord_4          0.1            0            0          0.2

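Code:

library(matrixStats)  # provides rowVars, rowSds, rowIQRs
# Compute every statistic from the original feature matrix, so that
# newly added columns are not fed back into the later calculations.
M <- as.matrix(DF)
DF$skew <- rowSkewness(M)  # not part of matrixStats; load the package that defines it
DF$var <- rowVars(M)
DF$sd <- rowSds(M)
DF$IQR <- rowIQRs(M)
DF$mean <- rowMeans(M)
DF$coef.var <- DF$sd / DF$mean  # NaN when sd and mean are both 0 (e.g. cord_3)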

I would like to keep cord_2 (the more variable row) and ignore cord_1, cord_3 and cord_4 in my output. Given that, which statistic is the best choice?
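To make the comparison concrete, here is a minimal sketch that computes the same statistics on the four example rows above using base R only (`apply` with `var`, `sd` and `IQR` stands in for the matrixStats functions). The names `M`, `stats` and `top` are placeholders chosen for this illustration, and the result only describes this toy table, not real data.

```r
# Example table from the question (rows = coordinates, columns = features)
M <- matrix(
  c(0.9, 1.0, 0.8, 1.0,
    0.6, 0.1, 0.9, 0.5,
    0.0, 0.0, 0.0, 0.0,
    0.1, 0.0, 0.0, 0.2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("cord_", 1:4), paste0("Feature_", LETTERS[1:4]))
)

# Base-R equivalents of rowVars / rowSds / rowIQRs
stats <- data.frame(
  var  = apply(M, 1, var),
  sd   = apply(M, 1, sd),
  IQR  = apply(M, 1, IQR),
  mean = rowMeans(M)
)
stats$coef.var <- stats$sd / stats$mean  # 0 / 0 gives NaN for the all-zero row

# Row that each statistic ranks as most variable (which.max skips NaN)
top <- sapply(stats[c("var", "sd", "IQR", "coef.var")],
              function(x) rownames(M)[which.max(x)])

print(round(stats, 3))
print(top)
```

On this table, variance, standard deviation and IQR all rank cord_2 first, while the coefficient of variation ranks cord_4 first, because dividing by a mean close to zero inflates it, and it is undefined for cord_3. That suggests treating the coefficient of variation with caution when many rows are near zero.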

biostatistics methematics basic_statistics • 1.0k views

Use IQR!
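As a rough sketch of what an IQR-based filter could look like on the example table from the question: the cutoff of 0.2 below is an arbitrary assumption made for this illustration, not a recommended value, and `M`, `row_iqr`, `col_iqr` and `cutoff` are placeholder names. On real data the cutoff would more sensibly come from the distribution of the IQR values themselves, for example an upper quantile.

```r
# Example table from the question (rows = coordinates, columns = features)
M <- matrix(
  c(0.9, 1.0, 0.8, 1.0,
    0.6, 0.1, 0.9, 0.5,
    0.0, 0.0, 0.0, 0.0,
    0.1, 0.0, 0.0, 0.2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("cord_", 1:4), paste0("Feature_", LETTERS[1:4]))
)

# IQR of every row and of every column (base R, default quantile type 7)
row_iqr <- apply(M, 1, IQR)
col_iqr <- apply(M, 2, IQR)

# Hypothetical cutoff, chosen only for this toy example
cutoff <- 0.2

# Keep the rows and columns whose IQR exceeds the cutoff
variable_rows <- names(row_iqr)[row_iqr > cutoff]
variable_cols <- names(col_iqr)[col_iqr > cutoff]

print(variable_rows)
print(variable_cols)
```

With this cutoff only cord_2 is kept among the rows, which matches the output asked for in the question. The same call with margin 2 handles the column side.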
