I have two dataframes (df1 and df2) that look like this:
df1:
df2:
My aim is to change the IDs of df2 to geo_accession using df1 as reference:
Example: SYMBOL GSM4187205 ......
But I have encountered two problems:
- df1 has 587 entries (one per sample), while df2 has only 552 samples; 35 samples have been removed. How can I subset df1 so that it contains exactly the same samples as df2?
- I have written the following code:
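library(dplyr)
library(tidyr)
library(tibble)

df2.mod <- df2 %>%
  gather(key = "samples", value = "counts", -SYMBOL) %>%
  mutate(samples = sub("^X", "", samples)) %>%  # strip only the leading "X" that R adds to column names
  inner_join(df1, by = c("samples" = "title")) %>%
  select(SYMBOL, geo_accession, counts) %>%  # drop other columns so spread() gives one row per SYMBOL
  spread(key = "geo_accession", value = "counts") %>%
  column_to_rownames(var = "SYMBOL")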
But it doesn't work because the number of samples and title variables don't match.
Error: <0 rows> (or 0-length row.names)
Thank you for your help,
Can you confirm that the values in the header of dataframe 2 correspond to column 2 of dataframe 1?
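One quick way to check this is sketched below. It assumes the column names of df2 carry a leading "X" that has to be stripped before comparing them to df1$title, and that SYMBOL is the only non-sample column. The toy data frames and their values are hypothetical stand-ins for the real data.

```r
# Toy stand-ins for the real data (hypothetical values).
df1 <- data.frame(
  title = c("1001", "1002", "1003"),
  geo_accession = c("GSM_A", "GSM_B", "GSM_C"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  SYMBOL = c("geneA", "geneB"),
  X1001 = c(5, 0),
  X1003 = c(2, 7),
  stringsAsFactors = FALSE
)

# Sample names in df2: drop SYMBOL and strip the leading "X".
samples <- sub("^X", "", setdiff(colnames(df2), "SYMBOL"))

# Headers of df2 with no matching title in df1.
# An empty result means every header has a match.
unmatched <- setdiff(samples, df1$title)

# Titles in df1 with no column in df2 (the removed samples).
removed <- setdiff(df1$title, samples)
```

If unmatched is not empty, printing a few of its values next to head(df1$title) usually shows the formatting difference (whitespace, prefixes, case) that makes the join come back empty.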
Hi,
I confirm. I manually checked the first 10 against the full dataset. Both files come from the same data: one is the metadata dataset and the other is the gene expression dataset.
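If the headers really do match the titles, both problems could be handled in base R with match(), without reshaping. This is a minimal sketch, not a tested solution for the real data. It assumes SYMBOL is the first column of df2, that the sample columns have a leading "X", and that SYMBOL values are unique. The toy data and the name df1.sub are hypothetical.

```r
# Toy stand-ins for the real data (hypothetical values).
df1 <- data.frame(
  title = c("1001", "1002", "1003"),
  geo_accession = c("GSM_A", "GSM_B", "GSM_C"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  SYMBOL = c("geneA", "geneB"),
  X1003 = c(2, 7),
  X1001 = c(5, 0),
  stringsAsFactors = FALSE
)

# Sample names of df2 without the leading "X" (assumes SYMBOL is column 1).
samples <- sub("^X", "", colnames(df2)[-1])

# Position of each df2 sample in df1; NA would mean a header has no title.
idx <- match(samples, df1$title)
stopifnot(!anyNA(idx))

# Problem 1: metadata restricted to the samples in df2, in the same order.
df1.sub <- df1[idx, ]

# Problem 2: rename the sample columns to geo_accession,
# then move SYMBOL into the row names.
df2.mod <- df2
colnames(df2.mod)[-1] <- df1.sub$geo_accession
rownames(df2.mod) <- df2.mod$SYMBOL
df2.mod$SYMBOL <- NULL
```

Because df1.sub is built from the same index, its rows line up one-to-one with the columns of df2.mod. If SYMBOL contains duplicates, assigning the row names will fail and the duplicates need to be resolved first.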