How to extract the minimum value of all columns in R
2.4 years ago
KABILAN ▴ 130

I am working on mass spectrometry proteomics expression data. For statistical analysis of the data, I have to find the three smallest values in each column of a data frame like the one below:

structure(list(Type = c("knn_vsn", "knn_loess", "knn_rlr", "lls_vsn", 
"lls_loess", "lls_rlr", "svd_vsn", "svd_loess", "svd_rlr"), Group1 = c(0.00318368971435714, 
0.00317086486813191, 0.00317086486813191, 0.00312821095645019, 
0.00311632537571597, 0.00313568333628438, 0.00394831935666465, 
0.00393605637633005, 0.00395599132474446), Group2 = c(0.0056588221783197, 
0.00560933517836751, 0.00560933517836751, 0.00550114679857588, 
0.00548316209864631, 0.00550230673346083, 0.00737865310351839, 
0.0073411154394253, 0.00735748595511963), Group3 = c(0.00418838138878096, 
0.00417201215938804, 0.00417201215938804, 0.00398819978362592, 
0.00397093259462351, 0.00398827962107259, 0.00424157479553304, 
0.00422638750183658, 0.00424175886713471), Group4 = c(0.0039811913527127, 
0.00394649435912413, 0.00394649435912413, 0.00397059873107098, 
0.00393840233766712, 0.00396385071387178, 0.0041077267588457, 
0.00407577176849463, 0.00410191492380459)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -9L), groups = structure(list(
    Type = c("knn_loess", "knn_rlr", "knn_vsn", "lls_loess", 
    "lls_rlr", "lls_vsn", "svd_loess", "svd_rlr", "svd_vsn"), 
    .rows = structure(list(2L, 3L, 1L, 5L, 6L, 4L, 8L, 9L, 7L), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -9L), .drop = TRUE))

And I need output like the following:

structure(list(`Type ` = c("lls_loess", "lls_rlr", "lls_vsn"), 
    Group1 = c(0.00311632537571597, 0.00313568333628438, 0.00312821095645019
    ), ` Type` = c("lls_loess", "lls_rlr", "lls_vsn"), Group2 = c(0.00548316209864631, 
    0.00550230673346083, 0.00550114679857588), `  Type` = c("lls_loess", 
    "lls_rlr", "lls_vsn"), Group3 = c(0.00397093259462351, 0.00398827962107259, 
    0.00398819978362592), `Type  ` = c("lls_loess", "lls_rlr", 
    "lls_vsn"), Group4 = c(0.00393840233766712, 0.00396385071387178, 
    0.00397059873107098)), class = "data.frame", row.names = c(NA, 
-3L))

Please suggest some useful R code for this issue. Thank you in advance.

minimum data-frame value R extraction • 1.5k views

Please take the time to understand the answers given to your previous posts and then adapt them. This is all data.table/tidyverse/base R basics that can be googled and should be learned from tutorials, as data munging is the core discipline of any analyst. Biostars is generally not a code-writing service. It is really in your best interest to learn these things and figure them out yourself, because some users post these types of questions for many years: they never took the time to improve and so remain stuck at a basic level. Try to avoid that. Please take no offense; this is well-intended advice. Google, for example, 'tidyverse/dplyr top values per group'.

Yes, I think you are right. Since I am a beginner, it is a little tough for me to understand the logic. But I will definitely learn. Thank you @ATpoint

It seems the 4th group (Group4) is incorrect in the expected result:

> library(tidyverse)
> df %>% 
+   pivot_longer(-Type) %>% 
+   group_by(name) %>% 
+   arrange(value) %>% 
+   slice(1:3) %>% 
+   ungroup() %>%
+   pivot_wider(names_from = name, values_from = value) %>% 
+   as.data.frame()
       Type      Group1      Group2      Group3      Group4
1 lls_loess 0.003116325 0.005483162 0.003970933 0.003938402
2   lls_vsn 0.003128211 0.005501147 0.003988200          NA
3   lls_rlr 0.003135683 0.005502307 0.003988280          NA
4 knn_loess          NA          NA          NA 0.003946494
5   knn_rlr          NA          NA          NA 0.003946494

Expected output

> df2
      Type       Group1      Type      Group2      Type      Group3    Type        Group4
1 lls_loess 0.003116325 lls_loess 0.005483162 lls_loess 0.003970933 lls_loess 0.003938402
2   lls_rlr 0.003135683   lls_rlr 0.005502307   lls_rlr 0.003988280   lls_rlr 0.003963851
3   lls_vsn 0.003128211   lls_vsn 0.005501147   lls_vsn 0.003988200   lls_vsn 0.003970599
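
As a complement to the pipeline above, here is a minimal sketch of the same per-group selection using `slice_min()` (available in dplyr 1.0.0 and later). To keep it short it uses only the Group1 and Group4 columns from the question, as an ungrouped tibble; the column names `group` and `value` are chosen here for illustration. With `with_ties = FALSE`, exactly three rows are returned per group even though knn_loess and knn_rlr share the same Group4 value.

```r
library(dplyr)
library(tidyr)

# Subset of the question's data (Group1 and Group4 only), ungrouped
df <- tibble(
  Type = c("knn_vsn", "knn_loess", "knn_rlr", "lls_vsn", "lls_loess",
           "lls_rlr", "svd_vsn", "svd_loess", "svd_rlr"),
  Group1 = c(0.00318368971435714, 0.00317086486813191, 0.00317086486813191,
             0.00312821095645019, 0.00311632537571597, 0.00313568333628438,
             0.00394831935666465, 0.00393605637633005, 0.00395599132474446),
  Group4 = c(0.0039811913527127, 0.00394649435912413, 0.00394649435912413,
             0.00397059873107098, 0.00393840233766712, 0.00396385071387178,
             0.0041077267588457, 0.00407577176849463, 0.00410191492380459)
)

top3 <- df %>%
  # one row per (Type, group) pair
  pivot_longer(-Type, names_to = "group", values_to = "value") %>%
  group_by(group) %>%
  # keep the three smallest values per group; ties do not add extra rows
  slice_min(value, n = 3, with_ties = FALSE) %>%
  ungroup()

print(top3)
```

The result stays in long format, so no NA cells appear when the selected Type values differ between groups.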

Thank you cpad0112 for your useful answer. I will check the results for Group4 again.

Duplicate column names cannot (or at least should not) be used, as they affect downstream data processing. In df2, the Type column name is duplicated across 4 columns. The closest you can get is:

> library(tidyverse)
> df %>% 
+   pivot_longer(-Type) %>% 
+   group_by(name) %>% 
+   arrange(value) %>% 
+   slice(1:3) %>%
+   ungroup() %>% 
+   split(.$name) %>% 
+   map(~select(., c(1,3))) %>% 
+   do.call(cbind,.) %>% 
+   rename_with(~ str_remove(., "\\.value"), contains("value"))

  Group1.Type      Group1 Group2.Type      Group2 Group3.Type      Group3 Group4.Type      Group4
1   lls_loess 0.003116325   lls_loess 0.005483162   lls_loess 0.003970933   lls_loess 0.003938402
2     lls_vsn 0.003128211     lls_vsn 0.005501147     lls_vsn 0.003988200   knn_loess 0.003946494
3     lls_rlr 0.003135683     lls_rlr 0.005502307     lls_rlr 0.003988280     knn_rlr 0.003946494
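
For readers who prefer to avoid the tidyverse, the same side-by-side layout can be sketched in base R. This is only an illustration under stated assumptions: `top_n_per_column` is a hypothetical helper name, the data is a plain (ungrouped) data frame holding just the Group1 and Group4 columns from the question, and ties are resolved by original row order because `order()` leaves tied elements in their original sequence.

```r
# Subset of the question's data (Group1 and Group4 only), as a plain data.frame
df <- data.frame(
  Type = c("knn_vsn", "knn_loess", "knn_rlr", "lls_vsn", "lls_loess",
           "lls_rlr", "svd_vsn", "svd_loess", "svd_rlr"),
  Group1 = c(0.00318368971435714, 0.00317086486813191, 0.00317086486813191,
             0.00312821095645019, 0.00311632537571597, 0.00313568333628438,
             0.00394831935666465, 0.00393605637633005, 0.00395599132474446),
  Group4 = c(0.0039811913527127, 0.00394649435912413, 0.00394649435912413,
             0.00397059873107098, 0.00393840233766712, 0.00396385071387178,
             0.0041077267588457, 0.00407577176849463, 0.00410191492380459),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for every value column, pair the n smallest values
# with their labels and bind the pairs side by side.
top_n_per_column <- function(data, id_col = "Type", n = 3) {
  value_cols <- setdiff(names(data), id_col)
  parts <- lapply(value_cols, function(col) {
    idx <- head(order(data[[col]]), n)  # row indices of the n smallest values
    out <- data.frame(data[[id_col]][idx], data[[col]][idx],
                      stringsAsFactors = FALSE)
    names(out) <- c(paste0(col, ".", id_col), col)  # e.g. Group1.Type, Group1
    out
  })
  do.call(cbind, parts)
}

res <- top_n_per_column(df)
print(res)
```

The column names follow the `GroupN.Type` / `GroupN` pattern, so no name is duplicated.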

I discourage this type of data representation that OP asks about as it is redundant. The *.Type columns are all the same so either make it a single unique column or use rownames for it. For large data this unnecessarily inflates size and is more difficult to parse later on.
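
One possible reading of the rownames suggestion, sketched under assumptions: store the values as a numeric matrix with Type as row names, and keep only the names of the three smallest entries per column. The object names `mat` and `smallest3` are illustrative, and only the Group1 and Group4 columns from the question are used to keep the example short.

```r
# Subset of the question's data (Group1 and Group4 only)
df <- data.frame(
  Type = c("knn_vsn", "knn_loess", "knn_rlr", "lls_vsn", "lls_loess",
           "lls_rlr", "svd_vsn", "svd_loess", "svd_rlr"),
  Group1 = c(0.00318368971435714, 0.00317086486813191, 0.00317086486813191,
             0.00312821095645019, 0.00311632537571597, 0.00313568333628438,
             0.00394831935666465, 0.00393605637633005, 0.00395599132474446),
  Group4 = c(0.0039811913527127, 0.00394649435912413, 0.00394649435912413,
             0.00397059873107098, 0.00393840233766712, 0.00396385071387178,
             0.0041077267588457, 0.00407577176849463, 0.00410191492380459),
  stringsAsFactors = FALSE
)

# Numeric matrix with Type stored once, as row names
mat <- as.matrix(df[, c("Group1", "Group4")])
rownames(mat) <- df$Type

# For each column, the row names of its three smallest values
# (a 3 x ncol character matrix)
smallest3 <- apply(mat, 2, function(x) names(sort(x))[1:3])
print(smallest3)

# The values themselves can be looked up by name when needed
print(mat[smallest3[, "Group4"], "Group4"])
```

The labels are stored once and the values are retrieved by name, rather than carrying a separate Type column for every group.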

Did we move to SO style commenting?

Sorry cpad0112, I don't know about that commenting style. I am a beginner in R.
