How to fill with dots empty fields in a dataframe in R when splitting a column by a separator
1
0
Entering edit mode
4.2 years ago
giusdalt95 ▴ 10

Hi everybody! I'm new to Bioinformatics and recently I came across a problem. I have a column ("name") with comma delimited values:

                                          name
name=1,name2=11,name3=111,name4=1111,name5=11111
name=2,name2=22,name4=2222,name5=22222
name=3,name2=33,name3=333,name4=3333,name5=33333
name=4,name2=44,name3=444,name4=4444,name5=44444
name=5,name2=55,name3=555,name4=5555,name5=55555
name=6,name3=666,name4=6666,name5=66666
name=7,name2=77,name3=777,name5=77777

I want to split this column and print values in cells of a dataframe with this tab delimited header: "name name2 name3 name4 name5" as you can see, "header" has 5 columns, whilst "name" has different numbers of comma separated values (5 in the first row, 4 in the second), and when i use the "separate()" function I have a frameshift. How can I fill missing values with a dot or something else? Thank you so much

R • 1.2k views
ADD COMMENT
0
Entering edit mode

Thank you so much, that was literaly what I was looking for

ADD REPLY
0
Entering edit mode

please mark the best answer as accepted, upvote other useful answers and write any thanks into comments and not as an answer

ADD REPLY
4
Entering edit mode
4.2 years ago

The data.

df <- structure(list(name = c("name=1,name2=11,name3=111,name4=1111,name5=11111", 
"name=2,name2=22,name4=2222,name5=22222", "name=3,name2=33,name3=333,name4=3333,name5=33333", 
"name=4,name2=44,name3=444,name4=4444,name5=44444", "name=5,name2=55,name3=555,name4=5555,name5=55555", 
"name=6,name3=666,name4=6666,name5=66666", "name=7,name2=77,name3=777,name5=77777"
)), class = "data.frame", row.names = c(NA, -7L))

> df
                                              name
1 name=1,name2=11,name3=111,name4=1111,name5=11111
2           name=2,name2=22,name4=2222,name5=22222
3 name=3,name2=33,name3=333,name4=3333,name5=33333
4 name=4,name2=44,name3=444,name4=4444,name5=44444
5 name=5,name2=55,name3=555,name4=5555,name5=55555
6          name=6,name3=666,name4=6666,name5=66666
7            name=7,name2=77,name3=777,name5=77777

A tidyverse answer.

library("tidyverse")

df <- df %>%
  rowid_to_column %>%
  separate_rows(name, sep=",") %>%
  separate(name, into=c("name_id", "value"), sep="=", convert=TRUE) %>%
  pivot_wider(names_from=name_id, values_from=value) %>%
  select(!rowid)

> df
# A tibble: 7 x 5
   name name2 name3 name4 name5
  <int> <int> <int> <int> <int>
1     1    11   111  1111 11111
2     2    22    NA  2222 22222
3     3    33   333  3333 33333
4     4    44   444  4444 44444
5     5    55   555  5555 55555
6     6    NA   666  6666 66666
7     7    77   777    NA 77777
ADD COMMENT

Login before adding your answer.

Traffic: 4071 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6