R - find a pattern in rows (dinucleotide:number), turn the pattern into colname and keep the numbers in rows
2.4 years ago
Bianca ▴ 20

Hello, I am new to R and to biostars.

I have a data frame with 2 columns: the first contains 30 sample names (sample1, sample2, sample3, ...), and the second contains space-separated entries of the form 'dinucleotide:number', with unique dinucleotide combinations (for example CA:13 TG:20 GC:8 TT:13).

I want to turn the different dinucleotide combinations into column names, with the respective numbers in the rows, while keeping the corresponding samples in the first column. Please see an example of the input below:

v1           v2
sample1      CA:34 TG:12 CG:10 AA:20 AT:13
sample2      CA:12 TG:19 AT:13 AC:13
sample3      CA:24 TG:14 CG:1 AA:30 AG:20 CC:24

Please see an example of the desired output below:

            CA    TG    CG    AA    AT    AC    AG    CC
sample1     34    12    10    20    13    0     0     0
sample2     12    19    0     0     13    13    0     0
sample3     24    14    1     30    0     0     20    24

Also, any tips on good resources for learning regex with dplyr and tidyr would be much appreciated!

Thanks

dataframe tidyr grep r dplyr • 1.1k views
2.4 years ago
Basti ★ 2.0k

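Hi, with tidyr you could use:

library(dplyr)
library(tidyr)

df %>%
  separate_rows(v2, sep = " ") %>%                                   # one row per "XX:n" entry
  separate(v2, into = c("v2", "v3"), sep = ":", convert = TRUE) %>%  # split name and count; keep counts numeric
  pivot_wider(names_from = v2, values_from = v3)                     # one column per dinucleotide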
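
To try the pipeline above without the original data, the example input from the question can be rebuilt by hand. This is only a sketch: the `df` object below is an assumption typed from the three example rows (with the sample names written as sample1, sample2, sample3), not the asker's real 30-sample data, and `convert = TRUE` is an addition so that the counts come out as integers rather than text. The intermediate names `long` and `wide` are purely illustrative.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the asker's data frame, typed from the question's example
df <- data.frame(
  v1 = c("sample1", "sample2", "sample3"),
  v2 = c("CA:34 TG:12 CG:10 AA:20 AT:13",
         "CA:12 TG:19 AT:13 AC:13",
         "CA:24 TG:14 CG:1 AA:30 AG:20 CC:24"),
  stringsAsFactors = FALSE
)

# Step 1: one row per "dinucleotide:number" entry, then split name and count
long <- df %>%
  separate_rows(v2, sep = " ") %>%
  separate(v2, into = c("v2", "v3"), sep = ":", convert = TRUE)

# Step 2: spread the dinucleotide names into columns
wide <- long %>%
  pivot_wider(names_from = v2, values_from = v3)

print(wide)
```

Dinucleotides that a sample does not list have no row in `long`, so `pivot_wider()` leaves those cells as NA in `wide`.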

Hello, thank you for your help! That introduced a lot of NAs. Is there any way to turn them into zeros instead?

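library(dplyr)
library(tidyr)
library(tibble)

df %>%
  separate_rows(v2, sep = " ") %>%
  separate(v2, into = c("v2", "v3"), sep = ":", convert = TRUE) %>%  # convert = TRUE keeps the counts numeric
  pivot_wider(names_from = v2, values_from = v3) %>%
  column_to_rownames(var = "v1") %>%  # sample names become row names
  replace(is.na(.), 0)                # dinucleotides absent from a sample become 0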
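
As a possible alternative to replacing the NAs afterwards, `pivot_wider()` has a `values_fill` argument that fills missing cells while reshaping. The sketch below assumes tidyr 1.1.0 or later (where a single fill value is accepted) and uses a hand-typed `df` based on the question's example rows; the column names `dinuc` and `count` are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical stand-in for the asker's data frame, typed from the question's example
df <- data.frame(
  v1 = c("sample1", "sample2", "sample3"),
  v2 = c("CA:34 TG:12 CG:10 AA:20 AT:13",
         "CA:12 TG:19 AT:13 AC:13",
         "CA:24 TG:14 CG:1 AA:30 AG:20 CC:24"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  separate_rows(v2, sep = " ") %>%
  # convert = TRUE turns the counts into integers instead of character strings
  separate(v2, into = c("dinuc", "count"), sep = ":", convert = TRUE) %>%
  # values_fill = 0 writes 0 wherever a sample has no entry for a dinucleotide
  pivot_wider(names_from = dinuc, values_from = count, values_fill = 0) %>%
  column_to_rownames(var = "v1")

print(wide)
```

Filling during the pivot keeps every count column numeric, so the result can be passed to numeric functions such as `rowSums()` without further conversion.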

Thank you so much!
