How to create a new column with for and if/else
3.6 years ago

Hi guys,

I'd like to create a new column in a data frame using a for loop and if/else statements. My data frame ds1 has the following 4 columns:

Sample_Name     Sample_Well     Sentrix_ID     Sentrix_Position
pool_women      A01             2,04426E+11    R01C01
213141          B01             2,04426E+11    R02C01
pool_men        C01             2,04426E+11    R03C01
253141          D01             2,04426E+11    R04C01
202196          E01             2,04426E+11    R05C01
200569          F01             2,04426E+11    R06C01
242196          G01             2,04426E+11    R07C01

Now I want to create a new column, Sample_group, containing 0 for samples whose Sample_Name starts with "21" or "20", 1 for samples whose Sample_Name starts with "25" or "24", and 2 for all others (pool_women and pool_men), as follows:

Sample_Name     Sample_Well     Sentrix_ID     Sentrix_Position     Sample_group
pool_women      A01             2,04426E+11    R01C01               2
213141          B01             2,04426E+11    R02C01               0
pool_men        C01             2,04426E+11    R03C01               2
253141          D01             2,04426E+11    R04C01               1
202196          E01             2,04426E+11    R05C01               0
200569          F01             2,04426E+11    R06C01               0
242196          G01             2,04426E+11    R07C01               1

I wrote the following code:

variables <- colnames(ds1[,which(colnames(ds1)=="Sample_Name")])

for(i in variables) {
  if(gsub("(^\\d{2}).*", "\\1", i) == "21" | gsub("(^\\d{2}).*", "\\1", i) == "20") {ds1$Sample_group1 <- 0}
  if(gsub("(^\\d{2}).*", "\\1", i) == "24" | gsub("(^\\d{2}).*", "\\1", i) == "25") {ds1$Sample_group1 <- 1}
  else {ds1$Sample_group1 <- 2}
}

However, the Sample_group column contains only 1 for all samples.

What's wrong with my code?

Thank you!

input:

> df

  Sample_Name
1  pool_women
2      213141
3    pool_men
4      253141
5      202196
6      200569
7      242196

output:

> df$group <- with(df,
+            ifelse(grepl("^20|^21", Sample_Name), 0,
+            ifelse(grepl("^25|^24", Sample_Name), 1, 2)))

> df
  Sample_Name group
1  pool_women     2
2      213141     0
3    pool_men     2
4      253141     1
5      202196     0
6      200569     0
7      242196     1

with dplyr:

library(dplyr)
df %>%
    mutate(group = ifelse(grepl("^20|^21", Sample_Name), 0,
                   ifelse(grepl("^25|^24", Sample_Name), 1, 2)))
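If the list of prefixes grows, nested ifelse() calls get hard to read. An alternative that neither answer uses is a named lookup vector keyed by the first two characters of each name. This is a sketch that assumes every group is defined by a two-character prefix; `lookup` and `prefix` are illustrative names, and the sample names are taken from the question:

```r
# Sample names from the question (other columns omitted)
df <- data.frame(Sample_Name = c("pool_women", "213141", "pool_men", "253141",
                                 "202196", "200569", "242196"),
                 stringsAsFactors = FALSE)

# Map each two-character prefix to its group code
lookup <- c("20" = 0, "21" = 0, "24" = 1, "25" = 1)

prefix <- substr(df$Sample_Name, 1, 2)   # e.g. "21" for "213141", "po" for the pools
df$group <- unname(lookup[prefix])       # prefixes not in lookup become NA
df$group[is.na(df$group)] <- 2           # everything else (the pools) is group 2
df
```

Adding another prefix then means adding one entry to `lookup` instead of another ifelse() level.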
3.6 years ago
ATpoint

Don't use for loops for this: a loop processes the data one row at a time, which gets slow on large data.frames. Vectorised functions such as grep() and ifelse() work on the whole column in a single call.
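For comparison, a loop does work if it assigns one row at a time. As posted, the question's code has three problems. First, ds1[, which(...)] with a single column returns a plain vector, so colnames() gives NULL and the loop body never runs (and colnames(ds1)[...] would only give the string "Sample_Name", not the sample names). Second, ds1$Sample_group1 <- 0 overwrites the entire column on every iteration, so whatever is chosen for the last sample (242196, hence 1) ends up in every row. Third, the second if is not chained with else if, so its else overwrites the 0 set by the first test. A minimal corrected sketch, using the sample names from the question and checking the result against a vectorised ifelse():

```r
# Sample names from the question (other columns omitted)
ds1 <- data.frame(Sample_Name = c("pool_women", "213141", "pool_men", "253141",
                                  "202196", "200569", "242196"),
                  stringsAsFactors = FALSE)

ds1$Sample_group <- NA_real_                   # create the column once
for (i in seq_len(nrow(ds1))) {                # loop over row indices, not names
  prefix <- substr(ds1$Sample_Name[i], 1, 2)   # first two characters of row i
  if (prefix %in% c("20", "21")) {
    ds1$Sample_group[i] <- 0                   # assign to row i only
  } else if (prefix %in% c("24", "25")) {      # chained, so only one branch runs
    ds1$Sample_group[i] <- 1
  } else {
    ds1$Sample_group[i] <- 2
  }
}

# The vectorised equivalent gives the same result without a loop
vectorised <- ifelse(grepl("^20|^21", ds1$Sample_Name), 0,
                     ifelse(grepl("^24|^25", ds1$Sample_Name), 1, 2))
stopifnot(identical(ds1$Sample_group, vectorised))
ds1
```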

#/ Example data:
df <- data.frame(Sample_Name=c("200", "211", 
                               "240", "251", 
                               "345", "456"))

#----------------------------------------------------------
# BASE R:
#----------------------------------------------------------
#/ new column with "2" (so other):
df$Sample_Group <- rep("2", nrow(df))

#/ and now replace 20/21s with 0 and 24/25s with 1:
df$Sample_Group[grep("^20|^21", df$Sample_Name)] <- 0
df$Sample_Group[grep("^24|^25", df$Sample_Name)] <- 1

> df
  Sample_Name Sample_Group
1         200            0
2         211            0
3         240            1
4         251            1
5         345            2
6         456            2

#----------------------------------------------------------
# TIDYVERSE
#----------------------------------------------------------
library(tidyverse)
df %>%
  mutate(Sample_Group =
           case_when(str_detect(Sample_Name, "^20|^21") ~ "0",
                     str_detect(Sample_Name, "^24|^25") ~ "1",
                     TRUE ~ "2"))   # TRUE catches everything else

  Sample_Name Sample_Group
1         200            0
2         211            0
3         240            1
4         251            1
5         345            2
6         456            2

The ^ anchors the pattern to the start of the string, so "^20|^21" means "starts with 20 or starts with 21". The first solution uses only base R functions; the second uses the tidyverse packages.
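A small sketch of why the anchor matters, with made-up names: without ^, a name such as "120" also matches because it contains "20". Base R's startsWith() (available since R 3.3.0) does the same prefix test without a regular expression:

```r
x <- c("200", "120", "211", "pool_men")

grepl("^20|^21", x)   # anchored:   TRUE FALSE TRUE FALSE
grepl("20|21", x)     # unanchored: TRUE TRUE  TRUE FALSE ("120" contains "20")

# Prefix test without regex
startsWith(x, "20") | startsWith(x, "21")   # TRUE FALSE TRUE FALSE
```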

Next time, please provide example data, e.g. via dput(). If you have a data.frame named df, run dput(df) and it will print an ASCII representation of the data that you can paste into your post. That makes it easy for others to copy and paste your data rather than retyping it. Also try to provide a small but representative subset of the data, or dummy data, to keep the post short and readable.
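A quick sketch of that dput() workflow, using three sample names from the question. The round-trip check at the end is only there to show that the printed text rebuilds the same data when someone else pastes it:

```r
df <- data.frame(Sample_Name = c("pool_women", "213141", "253141"),
                 stringsAsFactors = FALSE)

# Print a copy-pasteable representation of df
dput(df)

# Pasting that text back into R rebuilds the same data.frame
txt <- paste(capture.output(dput(df)), collapse = "\n")
rebuilt <- eval(parse(text = txt))
stopifnot(isTRUE(all.equal(rebuilt, df)))
```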
