I have an excel file which has three columns. In the first column there are drug names (with repetition); In the second column there are cell line names in which these drugs are tested on. The same drug names as well cell line names can be found in multiple rows. In the third column I have the AUC value. The thing is that each drug has been tested on different nr of cell lines. I want to format this file so that i have a final matrix in which :
Drug names should be first column (one row to each drug), cell line names should be on the first row and the cells should be filled with AUC values. In those cells where a drug hasn't been tested on that specific cell line I want "NA" values.
This is in part a rant... I don't understand why the "tidyverse" ecosystem has so many functions that seem to do the same thing. I mean, why is there spread, complete, gather, etc...? I don't use tidyr but I work with ggplot2 a lot and I have the same feeling. With data.table I can do pretty much everything dplyr does without learning a number of new functions. For example:
spread and gather are now pivot_wider and pivot_longer, which is same as data.tabledcast and melt. complete has a different functionality - in data.table it is CJ - see data.table equivalent of tidyr::complete().
So it is 3 functions in tidyverse and 3 functions in data.table.
Thanks for the comment - I don't doubt there is a strong reasoning behind tidyverse since otherwise it wouldn't be so popular - it's probably just me. I think this post summarises my feelings pretty well even if I don't agree with everything in there.
mandatory picture...
Any R suggestions are welcomed as well. I am not necessarily looking to solve this on Excel.
See related StackOverflow post, using R,
tidyr::complete
: