How to binarize my data frame in R?
2.4 years ago

Hi everybody, I need to transform a data frame into a binarized data frame, for example:

this is my data frame

mydata():
ID  AA  AB  AC  AD
Camp1   0   0   2   9
Camp2   2   1   2   9
Camp3   1   1   2   9
Camp4   1   1   2   9
Camp5   0   9   2   2
Camp6   9   9   0   2
Camp7   9   9   0   2
Camp8   1   9   0   2

this is what I would like to achieve:

ID   AA_0    AA_1    AA_2    AA_9    AB_0    AB_1    AB_9    AC_2    AC_0    AD_9    AD_2
Camp1   1   0   0   0   1   0   0   1   0   1   0
Camp2   0   0   1   0   0   1   0   1   0   1   0
Camp3   0   1   0   0   0   1   0   1   0   1   0
Camp4   0   1   0   0   0   1   0   1   0   1   0
Camp5   1   0   0   0   0   0   1   1   0   0   1
Camp6   0   0   0   1   0   0   1   0   1   0   1
Camp7   0   0   0   1   0   0   1   0   1   0   1
Camp8   0   1   0   0   0   0   1   0   1   0   1

I have tried to use these two commands:

data <- mydata[,-c(1)]
data <- data %>% dplyr::mutate_if(is.character,as.factor)
data <- mltools::one_hot(data.table::as.data.table(data))

the commands run but the file is not binarized!! Does anyone know an alternative to this?

thank you

R dataframe binarize • 1.5k views
# Generate sample data
data <- as.data.frame(matrix(rbinom(100,5,0.3),ncol=5))

# Binarize
data_binary <- apply(data,2,function(x){as.numeric(x>0)})

If you want to speed up the computation you can just do (data > 0) * 1.
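A minimal sketch of that equivalence, assuming the same kind of simulated data as in the answer above. The `set.seed()` call and the names `by_apply` and `vectorized` are added here for illustration only. Note that this thresholds values at zero; it does not create one column per distinct value.

```r
# Simulated data as in the answer above; the seed is an assumption added for reproducibility
set.seed(1)
data <- as.data.frame(matrix(rbinom(100, 5, 0.3), ncol = 5))

# Column-wise version using apply()
by_apply <- apply(data, 2, function(x) as.numeric(x > 0))

# Vectorized version: compare the whole data frame at once, then turn TRUE/FALSE into 1/0
vectorized <- (data > 0) * 1

# Both give a 0/1 matrix with the same values
all(by_apply == vectorized)
```

Both results are numeric matrices rather than data frames, so wrap them in `as.data.frame()` if a data frame is needed.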


I think that is not what the OP wants. It seems that, for every column, they want to create new columns that say how many 0's, 1's, 2's, etc. each row has in the original column.


yes, that's it!
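To make the clarified goal concrete, here is a minimal base R sketch. It assumes the example table from the question has been typed in by hand as `mydata`; the names `long`, `key` and `binary` are hypothetical helpers, not something from the thread. It is one possible way to get one indicator column per column/value pair, not the only one.

```r
# The example data from the question, re-entered by hand (an assumption of this sketch)
mydata <- data.frame(
  ID = paste0("Camp", 1:8),
  AA = c(0, 2, 1, 1, 0, 9, 9, 1),
  AB = c(0, 1, 1, 1, 9, 9, 9, 9),
  AC = c(2, 2, 2, 2, 2, 0, 0, 0),
  AD = c(9, 9, 9, 9, 2, 2, 2, 2)
)

# Stack the value columns into long format: one row per (ID, column, value)
long <- stack(mydata[-1])                        # gives columns `values` and `ind`
long$ID <- rep(mydata$ID, times = ncol(mydata) - 1)

# Build the new column names, e.g. "AA_0", "AB_9"
long$key <- paste(long$ind, long$values, sep = "_")

# Cross-tabulate ID against the new names and convert the table to a data frame
binary <- as.data.frame.matrix(table(long$ID, long$key))
binary
```

Because every ID has exactly one value per original column, each count is 0 or 1, so the table is already binary. The columns come out in alphabetical order, which differs slightly from the order shown in the question.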

2.4 years ago
library(dplyr)
library(tidyr)

df %>%
  pivot_longer(-ID, names_to = "k", values_to = "v") %>%
  mutate(new = paste(k, v, sep = "_")) %>%
  select(ID, new) %>%
  table() %>%
  as.data.frame.matrix()

Full tidy:

df %>%
  pivot_longer(-ID, names_to = "k", values_to = "v") %>%
  mutate(new = paste(k, v, sep = "_")) %>%
  group_by(ID, new) %>%
  summarise(freq = n()) %>%
  ungroup() %>%
  pivot_wider(names_from = new, values_from = freq, values_fill = 0)

with data.table:

library(data.table)

df <- fread("test.txt", sep = "\t", header = TRUE)
df |>
  melt(id.vars = "ID") |>
  dcast(ID ~ variable + value, fun.aggregate = length)

+1 That's a nice example of how to elegantly combine tidyverse and base R (table, as.data.frame.matrix) rather than trying to squeeze everything into a clumsy tidy solution.


The tidyverse solution in this case isn't too bad. It's just 2 lines.

library("tidyr")

df |>
  pivot_longer(!ID) |>
  pivot_wider(names_from=c(name, value), values_fn=length, values_fill=0)

The result.

# A tibble: 8 × 12
  ID     AA_0  AB_0  AC_2  AD_9  AA_2  AB_1  AA_1  AB_9  AD_2  AA_9  AC_0
  <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 Camp1     1     1     1     1     0     0     0     0     0     0     0
2 Camp2     0     0     1     1     1     1     0     0     0     0     0
3 Camp3     0     0     1     1     0     1     1     0     0     0     0
4 Camp4     0     0     1     1     0     1     1     0     0     0     0
5 Camp5     1     0     1     0     0     0     0     1     1     0     0
6 Camp6     0     0     0     0     0     0     0     1     1     1     1
7 Camp7     0     0     0     0     0     0     0     1     1     1     1
8 Camp8     0     0     0     0     0     0     1     1     1     0     1