Question

R - find a pattern in rows (dinucleotide:number), turn the pattern into colname and keep the numbers in rows

3.0 years ago

Bianca ▴ 20

Hello, I am new to R and to biostars.

I have a data frame with 2 columns: the first column holds 30 sample names (sample1, sample2, sample3...), and the second column holds 'unique dinucleotide combination:number' pairs separated by spaces (for example CA:13 TG:20 GC:8 TT:13).

I want to grab the different dinucleotide combinations and turn them into column names, with the respective numbers in the rows, while keeping the corresponding samples in the first column. Please see the example input below:

v1           v2
sample1      CA:34 TG:12 CG:10 AA:20 AT:13
sample2      CA:12 TG:19 AT:13 AC:13
sample3      CA:24 TG:14 CG:1 AA:30 AG:20 CC:24

Please see the example of the desired output:

            CA    TG    CG    AA    AT    AC    AG    CC
sample1     34    12    10    20    13    0     0     0
sample2     12    19    0     0     13    13    0     0
sample3     24    14    1     30    0     0     20    24

Also, any tips on good resources for learning regex with dplyr and tidyr would be much appreciated!

Thanks

dataframe tidyr grep r dplyr • 1.3k views


score 2 · Accepted Answer · 2022-07-22

3.0 years ago

Basti ★ 2.1k

Hi, with tidyr you could use:

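library(tidyr)  # separate_rows(), separate(), pivot_wider(); also re-exports %>%

df %>%
  separate_rows(v2, sep = " ") %>%                                   # one row per "XX:n" token
  separate(v2, into = c("v2", "v3"), sep = ":", convert = TRUE) %>%  # split name and count; convert = TRUE makes the counts numeric
  pivot_wider(names_from = v2, values_from = v3)                     # one column per dinucleotide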
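
For anyone who wants to try this without their own data, here is a minimal self-contained sketch. The data frame is rebuilt by hand from the three example rows in the question (sample names follow the sample1, sample2, sample3 pattern described there), and it assumes both columns are plain character vectors. The names `dinuc`, `count` and `wide` are only illustrative, they are not from the original post.

```r
library(tidyr)  # separate_rows(), separate(), pivot_wider(); re-exports %>%

# Example input typed in from the question (assumption: character columns)
df <- data.frame(
  v1 = c("sample1", "sample2", "sample3"),
  v2 = c("CA:34 TG:12 CG:10 AA:20 AT:13",
         "CA:12 TG:19 AT:13 AC:13",
         "CA:24 TG:14 CG:1 AA:30 AG:20 CC:24"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  separate_rows(v2, sep = " ") %>%                                         # one "XX:n" token per row
  separate(v2, into = c("dinuc", "count"), sep = ":", convert = TRUE) %>%  # split into name and integer count
  pivot_wider(names_from = dinuc, values_from = count)                     # dinucleotides become columns

print(wide)
```

Columns appear in order of first occurrence (CA, TG, CG, AA, AT, AC, AG, CC). Any dinucleotide that a sample does not list comes out as NA at this point, for example AC for sample1.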


Hello, thank you for your help! That introduced a lot of NAs. Is there any way to turn them into zeros instead?

ADD REPLY • link 3.0 years ago by Bianca ▴ 20

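library(tidyr)   # separate_rows(), separate(), pivot_wider(); also re-exports %>%
library(tibble)  # column_to_rownames()

df %>%
  separate_rows(v2, sep = " ") %>%
  separate(v2, into = c("v2", "v3"), sep = ":", convert = TRUE) %>%  # convert = TRUE makes the counts numeric
  pivot_wider(names_from = v2, values_from = v3) %>%
  column_to_rownames(var = "v1") %>%  # move the sample names into the row names
  replace(is.na(.), 0)                # dinucleotides absent from a sample become 0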
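
As a possible alternative, the zeros can be filled in during the pivot itself through the `values_fill` argument of `pivot_wider()`, which skips the separate NA replacement step. This is only a sketch: it assumes a tidyr version where `values_fill` accepts a single value (1.1.0 or later, as far as I know), it rebuilds the question's three example rows by hand, and the names `dinuc`, `count` and `counts` are made up for illustration.

```r
library(tidyr)  # separate_rows(), separate(), pivot_wider(); re-exports %>%

# Example input typed in from the question (assumption: character columns)
df <- data.frame(
  v1 = c("sample1", "sample2", "sample3"),
  v2 = c("CA:34 TG:12 CG:10 AA:20 AT:13",
         "CA:12 TG:19 AT:13 AC:13",
         "CA:24 TG:14 CG:1 AA:30 AG:20 CC:24"),
  stringsAsFactors = FALSE
)

counts <- df %>%
  separate_rows(v2, sep = " ") %>%
  separate(v2, into = c("dinuc", "count"), sep = ":", convert = TRUE) %>%
  # values_fill = 0 writes 0 wherever a sample has no entry for a dinucleotide
  pivot_wider(names_from = dinuc, values_from = count, values_fill = 0)

print(counts)
```

Here the sample names stay in the `v1` column instead of becoming row names; piping into `column_to_rownames(var = "v1")` as above would move them. For the example rows, sample1 should end up with 0 for AC, AG and CC, sample2 with AC = 13, and sample3 with AG = 20 and CC = 24.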