Hello, I am new to R and to Biostars.
I have a data frame with 2 columns: the first column contains 30 sample names (sample1, sample2, sample3...), and the second column contains unique 'dinucleotide:number' pairs separated by spaces (for example CA:13 TG:20 GC:8 TT:13).
I want to turn the different dinucleotides into column names, with the respective numbers in the rows, and keep the corresponding samples in the first column. Please see the example of input below:
v1 v2
sample1 CA:34 TG:12 CG:10 AA:20 AT:13
sample2 CA:12 TG:19 AT:13 AC:13
sample3 CA:24 TG:14 CG:1 AA:30 AG:20 CC:24
Please see the example of desired output below:
CA TG CG AA AT AC AG CC
sample1 34 12 10 20 13 0 0 0
sample2 12 19 0 0 13 13 0 0
sample3 24 14 1 30 0 0 20 24
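For anyone landing here, one way this reshaping could be approached is sketched below with tidyr. It is only a sketch under assumptions: the toy data frame `df` is rebuilt from the example input, the columns are assumed to be named v1 and v2 as shown, and the intermediate names `long`, `dinuc` and `count` are made up for illustration. Note that dinucleotides missing from a sample come out as NA by default rather than 0.

```r
library(tidyr)

# Toy input rebuilt from the example (assumed column names: v1, v2)
df <- data.frame(
  v1 = c("sample1", "sample2", "sample3"),
  v2 = c("CA:34 TG:12 CG:10 AA:20 AT:13",
         "CA:12 TG:19 AT:13 AC:13",
         "CA:24 TG:14 CG:1 AA:30 AG:20 CC:24"),
  stringsAsFactors = FALSE
)

# Step 1: split on spaces so each "XX:n" pair gets its own row
long <- separate_rows(df, v2, sep = " ")

# Step 2: split each pair on ":" into a name column and a numeric column
# ("dinuc" and "count" are illustrative names, not from the post)
long <- separate(long, v2, into = c("dinuc", "count"), sep = ":", convert = TRUE)

# Step 3: spread the dinucleotides into columns, one row per sample;
# combinations absent from a sample are filled with NA by default
wide <- pivot_wider(long, names_from = dinuc, values_from = count)

print(wide)
```

The columns should appear in order of first occurrence in the data, which for this toy input matches the header of the desired output.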
Also, any tips on good resources for learning regex to use with dplyr and tidyr would be much appreciated!
Thanks
Hello, thank you for your help! That introduced a lot of NAs. Is there any way that I can turn them into zeros instead?
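A minimal sketch of one way the NAs could be replaced, using base R only. The small `wide` table here is a hypothetical stand-in for the reshaped result, not the real data. If the table was built with tidyr's pivot_wider(), passing `values_fill = 0` at that step should avoid the NAs in the first place.

```r
# Hypothetical wide table with NAs where a sample lacks a dinucleotide
wide <- data.frame(
  v1 = c("sample1", "sample2"),
  CA = c(34L, 12L),
  CG = c(10L, NA),
  AC = c(NA, 13L),
  stringsAsFactors = FALSE
)

# is.na() on a data frame gives a logical matrix marking the NA cells;
# assigning through it overwrites only those cells
wide[is.na(wide)] <- 0L

print(wide)
```

This leaves the sample-name column alone as long as it contains no missing values.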
Thank you so much!