2.4 years ago
KABILAN
I am working on mass spectrometry proteomics expression data. For the statistical analysis, I need to find the three smallest values in each column of a data frame like the one below:
structure(list(Type = c("knn_vsn", "knn_loess", "knn_rlr", "lls_vsn",
"lls_loess", "lls_rlr", "svd_vsn", "svd_loess", "svd_rlr"), Group1 = c(0.00318368971435714,
0.00317086486813191, 0.00317086486813191, 0.00312821095645019,
0.00311632537571597, 0.00313568333628438, 0.00394831935666465,
0.00393605637633005, 0.00395599132474446), Group2 = c(0.0056588221783197,
0.00560933517836751, 0.00560933517836751, 0.00550114679857588,
0.00548316209864631, 0.00550230673346083, 0.00737865310351839,
0.0073411154394253, 0.00735748595511963), Group3 = c(0.00418838138878096,
0.00417201215938804, 0.00417201215938804, 0.00398819978362592,
0.00397093259462351, 0.00398827962107259, 0.00424157479553304,
0.00422638750183658, 0.00424175886713471), Group4 = c(0.0039811913527127,
0.00394649435912413, 0.00394649435912413, 0.00397059873107098,
0.00393840233766712, 0.00396385071387178, 0.0041077267588457,
0.00407577176849463, 0.00410191492380459)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -9L), groups = structure(list(
Type = c("knn_loess", "knn_rlr", "knn_vsn", "lls_loess",
"lls_rlr", "lls_vsn", "svd_loess", "svd_rlr", "svd_vsn"),
.rows = structure(list(2L, 3L, 1L, 5L, 6L, 4L, 8L, 9L, 7L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -9L), .drop = TRUE))
And I need output like the following:
structure(list(`Type ` = c("lls_loess", "lls_rlr", "lls_vsn"),
Group1 = c(0.00311632537571597, 0.00313568333628438, 0.00312821095645019
), ` Type` = c("lls_loess", "lls_rlr", "lls_vsn"), Group2 = c(0.00548316209864631,
0.00550230673346083, 0.00550114679857588), ` Type` = c("lls_loess",
"lls_rlr", "lls_vsn"), Group3 = c(0.00397093259462351, 0.00398827962107259,
0.00398819978362592), `Type ` = c("lls_loess", "lls_rlr",
"lls_vsn"), Group4 = c(0.00393840233766712, 0.00396385071387178,
0.00397059873107098)), class = "data.frame", row.names = c(NA,
-3L))
Please suggest some useful R code for this issue. Thank you in advance.
Please take the time to understand the answers given in your previous posts and then adapt them. These are all data.table/tidyverse/base R basics that can be googled and should be learned from tutorials, as data munging is the core discipline of any analyst. Biostars is generally not a code-writing service. It is really in your best interest to learn these things and figure them out yourself, because some users post these types of questions for many years: they never took the time to improve and so stay stuck at a basic level. Try to avoid that. Please take no offense, it is really well-intended advice. Google, for example, "tidyverse/dplyr top values per group".
Yes, I think you are right. Since I am a beginner, it is a bit tough for me to understand the logic. But I will definitely learn. Thank you @ATpoint
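It seems the 4th group is incorrect in the expected result: in Group4, knn_loess and knn_rlr (both 0.00394649435912413) are smaller than lls_rlr and lls_vsn.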
Expected output
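As a minimal base R sketch of how the three smallest values per column could be computed (not necessarily the code originally posted here), assuming the data from the question is rebuilt as a plain, ungrouped data frame. The names `df`, `group_cols` and `top3` are illustrative assumptions:

```r
# Data from the question, rebuilt as a plain (ungrouped) data frame.
df <- data.frame(
  Type = c("knn_vsn", "knn_loess", "knn_rlr", "lls_vsn", "lls_loess",
           "lls_rlr", "svd_vsn", "svd_loess", "svd_rlr"),
  Group1 = c(0.00318368971435714, 0.00317086486813191, 0.00317086486813191,
             0.00312821095645019, 0.00311632537571597, 0.00313568333628438,
             0.00394831935666465, 0.00393605637633005, 0.00395599132474446),
  Group2 = c(0.0056588221783197, 0.00560933517836751, 0.00560933517836751,
             0.00550114679857588, 0.00548316209864631, 0.00550230673346083,
             0.00737865310351839, 0.0073411154394253, 0.00735748595511963),
  Group3 = c(0.00418838138878096, 0.00417201215938804, 0.00417201215938804,
             0.00398819978362592, 0.00397093259462351, 0.00398827962107259,
             0.00424157479553304, 0.00422638750183658, 0.00424175886713471),
  Group4 = c(0.0039811913527127, 0.00394649435912413, 0.00394649435912413,
             0.00397059873107098, 0.00393840233766712, 0.00396385071387178,
             0.0041077267588457, 0.00407577176849463, 0.00410191492380459),
  stringsAsFactors = FALSE
)

# Every column except Type holds the values to rank.
group_cols <- setdiff(names(df), "Type")

# For each group column, keep the three rows with the smallest values.
# order() sorts ascending and leaves ties in their original row order.
top3 <- lapply(group_cols, function(g) {
  idx <- order(df[[g]])[1:3]
  data.frame(Type = df$Type[idx], Value = df[[g]][idx],
             stringsAsFactors = FALSE)
})
names(top3) <- group_cols

# Inspect one group, e.g. the one flagged above.
print(top3$Group4)
```

With this data, Group1 to Group3 select the three lls_* rows, while Group4 selects lls_loess, knn_loess and knn_rlr.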
Thank you cpad0112 for your useful comment. I will check the results of Group4 once more.
One cannot (and probably should not) have duplicate column names, as they affect downstream data processing. In df2, the column names are duplicated for 4 columns. The closest you can have is:
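One way to get a wide layout with unique column names is sketched below in base R. This is an illustration under assumptions, not necessarily the code originally posted: the names `df`, `group_cols`, `top3` and `wide`, and the `<Group>.Type` naming scheme, are all hypothetical choices.

```r
# Data from the question, rebuilt as a plain (ungrouped) data frame.
df <- data.frame(
  Type = c("knn_vsn", "knn_loess", "knn_rlr", "lls_vsn", "lls_loess",
           "lls_rlr", "svd_vsn", "svd_loess", "svd_rlr"),
  Group1 = c(0.00318368971435714, 0.00317086486813191, 0.00317086486813191,
             0.00312821095645019, 0.00311632537571597, 0.00313568333628438,
             0.00394831935666465, 0.00393605637633005, 0.00395599132474446),
  Group2 = c(0.0056588221783197, 0.00560933517836751, 0.00560933517836751,
             0.00550114679857588, 0.00548316209864631, 0.00550230673346083,
             0.00737865310351839, 0.0073411154394253, 0.00735748595511963),
  Group3 = c(0.00418838138878096, 0.00417201215938804, 0.00417201215938804,
             0.00398819978362592, 0.00397093259462351, 0.00398827962107259,
             0.00424157479553304, 0.00422638750183658, 0.00424175886713471),
  Group4 = c(0.0039811913527127, 0.00394649435912413, 0.00394649435912413,
             0.00397059873107098, 0.00393840233766712, 0.00396385071387178,
             0.0041077267588457, 0.00407577176849463, 0.00410191492380459),
  stringsAsFactors = FALSE
)

group_cols <- setdiff(names(df), "Type")

# For each group, build a two-column block: "<Group>.Type" and "<Group>".
top3 <- lapply(group_cols, function(g) {
  idx <- order(df[[g]])[1:3]          # rows of the three smallest values
  out <- data.frame(df$Type[idx], df[[g]][idx], stringsAsFactors = FALSE)
  names(out) <- c(paste0(g, ".Type"), g)
  out
})

# Bind the blocks side by side; every column name is unique.
wide <- do.call(cbind, top3)
print(wide)
```

Because each Type column carries its group as a prefix, the result has eight distinct column names and three rows.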
I discourage the type of data representation that the OP asks about, as it is redundant. The *.Type columns are all the same, so either make it a single unique column or use rownames for it. For large data this unnecessarily inflates size and is more difficult to parse later on.
Did we move to SO style commenting?
Sorry cpad0112, I don't know about that commenting style. I am a beginner in R.