Question

How to sort multi columns on long dataframe in R

0

Entering edit mode

3.7 years ago

mohammedtoufiq91 ▴ 260

Hi,

I have a question regarding the multi column data sorting on long dataframe in R. I have (7938 rows x 7 columns) in dataframe. First I would like to sort the column "sub_aggregate" and then column "diseases". Once both these columns are sorted I would like to sort the column "Aggregate" which has 27 unique factors repeated for each observation (27 Aggregates x 42 sub_aggregates x 7 diseases = 7938 observations)and keep the order of the "sub_aggregate" and "diseases" column intact. However, after sorting based on "Aggregate" column, I obtain only (27 rows x 7 columns) instead of (7938 rows x 7 columns). Please assist me with this.

Hers are the steps that I am trying to sort the dataframe:

df_app_MT is the dataframe
df_app_MT <- df_app_MT[order(df_app_MT$sub_aggregate),]
df_app_MT <- df_app_MT[order(df_app_MT$diseases),]

Now, I would like to sort the column "Aggregate" which has 27 unique factors repeated for each observation and hence, I created this as a character list "target" (see below)

target <- c("A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "A15", "A16", "A17", "A18", "A24", "A25", "A26", "A27", "A28", "A29", "A30", "A31", "A32", "A33", "A34", "A35", "A36", "A37", "A38")

df_app_MT <- df_app_MT[match(target, df_app_MT$Aggregate),]

After running this code, I obtain only (27 rows* 7 columns), however, I need complete dataframe (i.e 7938 rows x 7 columns)

Aggregate has 27 factors

df_app_MT$Aggregate: factor "A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "A15", "A16", "A17", "A18", "A24", "A25", "A26", "A27", "A28", "A29", "A30", "A31", "A32", "A33", "A34", "A35", "A36", "A37", "A38"

diseases has 7 character

df_app_MT$diseases: chr  "Group1" "Group2" "Group3" "Group4a" "Group5a" "Group5b" "Leishmaniasis"

sub_aggregate has 42 integers from (1 to 42 sequentially)

df_app_MT$sub_aggregate: int 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42

The dataframe subset is provided below along with the input data and expected data:

Input data

dput(head(df_app_MT))
structure(list(Aggregate = structure(c(11L, 11L, 11L, 11L, 11L, 
11L), .Label = c("A1", "A15", "A16", "A17", "A18", "A2", "A24", 
"A25", "A26", "A27", "A28", "A29", "A3", "A30", "A31", "A32", 
"A33", "A34", "A35", "A36", "A37", "A38", "A4", "A5", "A6", "A7", 
"A8"), class = "factor"), sub_aggregate = c(1L, 1L, 1L, 1L, 1L, 
1L), `%Response` = c(95.2380952380952, 76.1904761904762, 0, 0, 
0, 0), diseases = c("Group1", "Group2", "Group3", "Group4a", 
"Group5a", "Group5b"), location = c("A28.1", "A28.1", "A28.1", 
"A28.1", "A28.1", "A28.1"), Function = c("Interferon", "Interferon", 
"Interferon", "Interferon", "Interferon", "Interferon"), Module = c("M10.1", 
"M10.1", "M10.1", "M10.1", "M10.1", "M10.1")), row.names = c(NA, 
6L), class = c("tbl_df", "tbl", "data.frame"))

dput(tail(df_app_MT))
structure(list(Aggregate = structure(c(27L, 27L, 27L, 27L, 27L, 
27L), .Label = c("A1", "A15", "A16", "A17", "A18", "A2", "A24", 
"A25", "A26", "A27", "A28", "A29", "A3", "A30", "A31", "A32", 
"A33", "A34", "A35", "A36", "A37", "A38", "A4", "A5", "A6", "A7", 
"A8"), class = "factor"), sub_aggregate = c(9L, 9L, 9L, 9L, 9L, 
9L), `%Response` = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), diseases = c("Group2", "Group3", "Group4a", 
"Group5a", "Group5b", "Leishmaniasis"), location = c("A8.9", 
"A8.9", "A8.9", "A8.9", "A8.9", "A8.9"), Function = c(NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_
), Module = c(NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_)), row.names = 7933:7938, class = c("tbl_df", 
"tbl", "data.frame"))

Expected data

diseases column = Arranged as Group1 to be on top, and Leishmaniasis on bottom

sub_aggregate = Arranged as 1 to 42 for all groups in disease column sequentially (for instance, Group 1: 1 to 42, Group 2: 1 to 42, ......, Leishmaniasis: 1 to 42)

dput(head(df_app))
structure(list(Aggregate = structure(1:6, .Label = c("A1", "A2", 
"A3", "A4", "A5", "A6", "A7", "A8", "A15", "A16", "A17", "A18", 
"A24", "A25", "A26", "A27", "A28", "A29", "A30", "A31", "A32", 
"A33", "A34", "A35", "A36", "A37", "A38"), class = "factor"), 
    sub_aggregate = c(1L, 1L, 1L, 1L, 1L, 1L), `%Response` = c(55.1724137931034, 
    53.8461538461538, -15.7894736842105, -10, -33.3333333333333, 
    16.6666666666667), diseases = c("Group1", "Group1", "Group1", 
    "Group1", "Group1", "Group1"), location = c("A1.1", "A2.1", 
    "A3.1", "A4.1", "A5.1", "A6.1"), Function = c("B cells", 
    "Cytotoxic lymphocytes", "Protein modification", "Cell death", 
    "B cells", "Cell cycle"), Module = c("M13.27", "M9.1", "M15.44", 
    "M15.49", "M16.57", "M14.54")), row.names = c("A1.1", "A2.1", 
"A3.1", "A4.1", "A5.1", "A6.1"), class = "data.frame")


dput(tail(df_app))
structure(list(Aggregate = structure(22:27, .Label = c("A1", 
"A2", "A3", "A4", "A5", "A6", "A7", "A8", "A15", "A16", "A17", 
"A18", "A24", "A25", "A26", "A27", "A28", "A29", "A30", "A31", 
"A32", "A33", "A34", "A35", "A36", "A37", "A38"), class = "factor"), 
    sub_aggregate = c(42L, 42L, 42L, 42L, 42L, 42L), `%Response` = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), diseases = c("Leishmaniasis", 
    "Leishmaniasis", "Leishmaniasis", "Leishmaniasis", "Leishmaniasis", 
    "Leishmaniasis"), location = c("A33.42", "A34.42", "A35.42", 
    "A36.42", "A37.42", "A38.42"), Function = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_), Module = c(NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_
    )), row.names = c("A33.426", "A34.426", "A35.426", "A36.426", 
"A37.426", "A38.426"), class = "data.frame")

Thank you,

Toufiq

data.frame Order R dplyr plyr • 1.5k views

ADD COMMENT • link updated 3.7 years ago by gglim ▴ 220 • written 3.7 years ago by mohammedtoufiq91 ▴ 260

score 0 · Answer 1 · 2021-04-14

0

Entering edit mode

3.7 years ago

caggtaagtat ★ 1.9k

Hi, you can sort with multiple columns like this:

df_app_MT <- df_app_MT[order(df_app_MT$sub_aggregate,df_app_MT$diseases,df_app_MT$Aggregate),]

ADD COMMENT • link 3.7 years ago by caggtaagtat ★ 1.9k

0

Entering edit mode

@caggtaagtat , thank you very much for the suggestion, however, this was not my expected results. After running the code you provided (See below image),

df_app_MT <- df_app_MT[order(df_app_MT$sub_aggregate,df_app_MT$diseases,df_app_MT$Aggregate),]

For instance,

Aggregate column should be ordered as:

("A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "A15", "A16", "A17", "A18", "A24", "A25", "A26", "A27", "A28", "A29", "A30", "A31", "A32", "A33", "A34", "A35", "A36", "A37", "A38")

sub_aggregate column should be ordered as:

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 ................42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42

disease column: starting with Group 1 and ending with Leishmaniasis order

Group 1 Group 1 Group 1 Group 1 Group 1 Group 1 Group 1 Group 1.....................Leishmaniasis Leishmaniasis Leishmaniasis Leishmaniasis....

ADD REPLY • link 3.7 years ago by mohammedtoufiq91 ▴ 260

1

Entering edit mode

Ok, sry I don't understand, how you want to sort it. But it might be helpfull to remove the "A" from the column "Aggregate" and transforming the character vector to an numeric one, so that it is not sorted by alphabet anymore

ADD REPLY • link 3.7 years ago by caggtaagtat ★ 1.9k

0

Entering edit mode

@caggtaagtat, OK, thank you. After removing "A" from the column "Aggregate" and transforming the character vector to an numeric one. Is it again possible to add/append "A" to the column so that it's helpful for plotting at a latter point? Basically I would like to sort the data by expected results as provided in the screenshot and dput above.

ADD REPLY • link 3.7 years ago by mohammedtoufiq91 ▴ 260

0

Entering edit mode

try this

df_app_MT$Aggregate <- paste0(rep("A",7938),df_app_MT$Aggregate)

ADD REPLY • link 3.7 years ago by gglim ▴ 220