Entering edit mode
2.2 years ago
Assa Yeroslaviz
★
1.9k
I have a huge matrix with intensity values. In the first column some of the identifier have multiple entries, separated by a ;
. I would like to expand these rows and duplicate them into separate rows, as shown in the example below.
this is my input data frame:
Names 100 101 102 103 104
gene1 21.07819176 23.59656334 22.82142258 22.45487976 22.58430672
gene2 22.06554222 22.30852509 22.00048065 19.69557953 22.39509964
gene3;gene13 28.92616272 27.95604324 27.27648544 28.94528198 28.76357269
gene4 26.13991928 27.39280319 26.18173409 26.68411636 26.81592178
gene5 18.65508842 25.58026314 24.83395576 24.77213287 23.68795204
gene6;gene9 23.71098518 24.2517643 23.48701668 24.06730843 24.05034447
gene7 24.33637047 23.93660927 25.7705574 22.52026558 23.99619293
gene8 29.45690346 28.30786705 29.16029167 28.49213028 28.90692902
gene10 22.11131859 22.68044853 21.56625557 23.16993523 22.79750633
gene12 21.12464714 22.55580902 19.1800499 21.61175728 21.85718918
gene11;gene17;gene22 18.44805717 21.8158741 22.70597649 21.2684803 23.24526405
The rows with two or more entries i would like to duplicate and put each of these identifier in a separate row like this:
Names 100 101 102 103 104
gene1 21.07819176 23.59656334 22.82142258 22.45487976 22.58430672
gene2 22.06554222 22.30852509 22.00048065 19.69557953 22.39509964
gene3 28.92616272 27.95604324 27.27648544 28.94528198 28.76357269
gene13 28.92616272 27.95604324 27.27648544 28.94528198 28.76357269
gene4 26.13991928 27.39280319 26.18173409 26.68411636 26.81592178
gene5 18.65508842 25.58026314 24.83395576 24.77213287 23.68795204
gene6 23.71098518 24.2517643 23.48701668 24.06730843 24.05034447
gene9 23.71098518 24.2517643 23.48701668 24.06730843 24.05034447
gene7 24.33637047 23.93660927 25.7705574 22.52026558 23.99619293
gene8 29.45690346 28.30786705 29.16029167 28.49213028 28.90692902
gene10 22.11131859 22.68044853 21.56625557 23.16993523 22.79750633
gene12 21.12464714 22.55580902 19.1800499 21.61175728 21.85718918
gene11 18.44805717 21.8158741 22.70597649 21.2684803 23.24526405
gene17 18.44805717 21.8158741 22.70597649 21.2684803 23.24526405
gene22 18.44805717 21.8158741 22.70597649 21.2684803 23.24526405
Is there a way to do it using the tidyverse
(dplyr
) packages?
thanks Assa
df <- structure(list(Names = c("gene1", "gene2", "gene3;gene13", "gene4",
"gene5", "gene6;gene9", "gene7", "gene8", "gene10", "gene12",
"gene11;gene17;gene22"), X100 = c("21.07819176", "22.06554222",
"28.92616272", "26.13991928", "18.65508842", "23.71098518", "24.33637047",
"29.45690346", "22.11131859", "21.12464714", "18.44805717"),
X101 = c("23.59656334", "22.30852509", "27.95604324", "27.39280319",
"25.58026314", "24.2517643", "23.93660927", "28.30786705",
"22.68044853", "22.55580902", "21.8158741"), X102 = c("22.82142258",
"22.00048065", "27.27648544", "26.18173409", "24.83395576",
"23.48701668", "25.7705574", "29.16029167", "21.56625557",
"19.1800499", "22.70597649"), X103 = c("22.45487976", "19.69557953",
"28.94528198", "26.68411636", "24.77213287", "24.06730843",
"22.52026558", "28.49213028", "23.16993523", "21.61175728",
"21.2684803"), X104 = c("22.58430672", "22.39509964", "28.76357269",
"26.81592178", "23.68795204", "24.05034447", "23.99619293",
"28.90692902", "22.79750633", "21.85718918", "23.24526405"
)), class = "data.frame", row.names = c(NA, -11L))
Thanks, this does the work. a very easy and efficient solution.