Hi,
I have a question regarding the multi column data sorting on long dataframe in R. I have (7938 rows x 7 columns
) in dataframe. First I would like to sort the column "sub_aggregate" and then column "diseases". Once both these columns are sorted I would like to sort the column "Aggregate" which has 27 unique factors repeated for each observation (27 Aggregates x 42 sub_aggregates x 7 diseases = 7938 observations
)and keep the order of the "sub_aggregate" and "diseases" column intact. However, after sorting based on "Aggregate" column, I obtain only (27 rows x 7 columns) instead of (7938 rows x 7 columns). Please assist me with this.
Hers are the steps that I am trying to sort the dataframe:
df_app_MT is the dataframe
df_app_MT <- df_app_MT[order(df_app_MT$sub_aggregate),]
df_app_MT <- df_app_MT[order(df_app_MT$diseases),]
Now, I would like to sort the column "Aggregate" which has 27 unique factors repeated for each observation and hence, I created this as a character list "target" (see below)
target <- c("A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "A15", "A16", "A17", "A18", "A24", "A25", "A26", "A27", "A28", "A29", "A30", "A31", "A32", "A33", "A34", "A35", "A36", "A37", "A38")
df_app_MT <- df_app_MT[match(target, df_app_MT$Aggregate),]
After running this code, I obtain only (27 rows* 7 columns), however, I need complete dataframe (i.e 7938 rows x 7 columns
)
Aggregate has 27 factors
df_app_MT$Aggregate: factor "A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "A15", "A16", "A17", "A18", "A24", "A25", "A26", "A27", "A28", "A29", "A30", "A31", "A32", "A33", "A34", "A35", "A36", "A37", "A38"
diseases has 7 character
df_app_MT$diseases: chr "Group1" "Group2" "Group3" "Group4a" "Group5a" "Group5b" "Leishmaniasis"
sub_aggregate has 42 integers from (1 to 42 sequentially)
df_app_MT$sub_aggregate: int 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
The dataframe subset is provided below along with the input data and expected data:
Input data
dput(head(df_app_MT))
structure(list(Aggregate = structure(c(11L, 11L, 11L, 11L, 11L,
11L), .Label = c("A1", "A15", "A16", "A17", "A18", "A2", "A24",
"A25", "A26", "A27", "A28", "A29", "A3", "A30", "A31", "A32",
"A33", "A34", "A35", "A36", "A37", "A38", "A4", "A5", "A6", "A7",
"A8"), class = "factor"), sub_aggregate = c(1L, 1L, 1L, 1L, 1L,
1L), `%Response` = c(95.2380952380952, 76.1904761904762, 0, 0,
0, 0), diseases = c("Group1", "Group2", "Group3", "Group4a",
"Group5a", "Group5b"), location = c("A28.1", "A28.1", "A28.1",
"A28.1", "A28.1", "A28.1"), Function = c("Interferon", "Interferon",
"Interferon", "Interferon", "Interferon", "Interferon"), Module = c("M10.1",
"M10.1", "M10.1", "M10.1", "M10.1", "M10.1")), row.names = c(NA,
6L), class = c("tbl_df", "tbl", "data.frame"))
dput(tail(df_app_MT))
structure(list(Aggregate = structure(c(27L, 27L, 27L, 27L, 27L,
27L), .Label = c("A1", "A15", "A16", "A17", "A18", "A2", "A24",
"A25", "A26", "A27", "A28", "A29", "A3", "A30", "A31", "A32",
"A33", "A34", "A35", "A36", "A37", "A38", "A4", "A5", "A6", "A7",
"A8"), class = "factor"), sub_aggregate = c(9L, 9L, 9L, 9L, 9L,
9L), `%Response` = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), diseases = c("Group2", "Group3", "Group4a",
"Group5a", "Group5b", "Leishmaniasis"), location = c("A8.9",
"A8.9", "A8.9", "A8.9", "A8.9", "A8.9"), Function = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_
), Module = c(NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_)), row.names = 7933:7938, class = c("tbl_df",
"tbl", "data.frame"))
Expected data
diseases column = Arranged as Group1 to be on top, and Leishmaniasis on bottom
sub_aggregate = Arranged as 1 to 42 for all groups in disease column sequentially (for instance, Group 1: 1 to 42, Group 2: 1 to 42, ......, Leishmaniasis: 1 to 42)
dput(head(df_app))
structure(list(Aggregate = structure(1:6, .Label = c("A1", "A2",
"A3", "A4", "A5", "A6", "A7", "A8", "A15", "A16", "A17", "A18",
"A24", "A25", "A26", "A27", "A28", "A29", "A30", "A31", "A32",
"A33", "A34", "A35", "A36", "A37", "A38"), class = "factor"),
sub_aggregate = c(1L, 1L, 1L, 1L, 1L, 1L), `%Response` = c(55.1724137931034,
53.8461538461538, -15.7894736842105, -10, -33.3333333333333,
16.6666666666667), diseases = c("Group1", "Group1", "Group1",
"Group1", "Group1", "Group1"), location = c("A1.1", "A2.1",
"A3.1", "A4.1", "A5.1", "A6.1"), Function = c("B cells",
"Cytotoxic lymphocytes", "Protein modification", "Cell death",
"B cells", "Cell cycle"), Module = c("M13.27", "M9.1", "M15.44",
"M15.49", "M16.57", "M14.54")), row.names = c("A1.1", "A2.1",
"A3.1", "A4.1", "A5.1", "A6.1"), class = "data.frame")
dput(tail(df_app))
structure(list(Aggregate = structure(22:27, .Label = c("A1",
"A2", "A3", "A4", "A5", "A6", "A7", "A8", "A15", "A16", "A17",
"A18", "A24", "A25", "A26", "A27", "A28", "A29", "A30", "A31",
"A32", "A33", "A34", "A35", "A36", "A37", "A38"), class = "factor"),
sub_aggregate = c(42L, 42L, 42L, 42L, 42L, 42L), `%Response` = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), diseases = c("Leishmaniasis",
"Leishmaniasis", "Leishmaniasis", "Leishmaniasis", "Leishmaniasis",
"Leishmaniasis"), location = c("A33.42", "A34.42", "A35.42",
"A36.42", "A37.42", "A38.42"), Function = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_), Module = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
)), row.names = c("A33.426", "A34.426", "A35.426", "A36.426",
"A37.426", "A38.426"), class = "data.frame")
Thank you,
Toufiq
@caggtaagtat , thank you very much for the suggestion, however, this was not my expected results. After running the code you provided (See below image),
For instance,
Aggregate column
should be ordered as:sub_aggregate
column should be ordered as:disease
column: starting with Group 1 and ending with Leishmaniasis orderOk, sry I don't understand, how you want to sort it. But it might be helpfull to remove the "A" from the column "Aggregate" and transforming the character vector to an numeric one, so that it is not sorted by alphabet anymore
@caggtaagtat, OK, thank you. After removing "A" from the column "Aggregate" and transforming the character vector to an numeric one. Is it again possible to add/append "A" to the column so that it's helpful for plotting at a latter point? Basically I would like to sort the data by expected results as provided in the
screenshot
anddput
above.try this