I have a data frame that looks like this:
Community Pop_Total Median_Age Under_5 5-9 10-14 15-19 20-24
Akutan city NA NA NA NA NA NA 71
Alcan Border NA NA 2 NA NA NA NA
Alcan Border NA NA NA NA NA 2 NA
Alcan Border NA NA NA NA 5 NA NA
Ambler City 224 NA NA NA NA NA NA
Ambler City NA NA NA 17 NA NA NA
Is there a simple way to combine rows that share the same Community across many data columns? I've seen a few scripts that combine duplicates of one variable based on one or two data columns, but I need to do it on a larger scale: I have ~400 rows with duplicates and ~30 columns, each with a long name. Ideally the result would look like:
Community Pop_Total Median_Age Under_5 5-9 10-14 15-19 20-24
Akutan city NA NA NA NA NA NA 71
Alcan Border NA NA 2 NA 5 2 NA
Ambler City 224 NA NA 17 NA NA NA
I tried the following code, but I'm getting an error:
setDT(df)[, lapply(.SD, function(x) ifelse(all(is.na(x)), NA_integer_, sum("x", na.rm = T))),
          by = sample_id]
data2 <- column_to_rownames(data2, 'Community')
Error in sum("x", na.rm = T) : invalid 'type' (character) of argument
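For reference, the error message points at sum("x", na.rm = T): the quoted "x" is a character string, so sum() is asked to add up text rather than the column passed in as x. Below is a minimal sketch of the intended collapse in base R, assuming the grouping column is Community (the code above groups by sample_id, which does not appear in the example table). The helper name sum_or_na and the rebuilt example data are my own, for illustration only.

```r
# Example data rebuilt from the table in the question.
# check.names = FALSE keeps column names such as "5-9" unchanged.
df <- data.frame(
  Community  = c("Akutan city", "Alcan Border", "Alcan Border",
                 "Alcan Border", "Ambler City", "Ambler City"),
  Pop_Total  = c(NA, NA, NA, NA, 224, NA),
  Median_Age = rep(NA_real_, 6),
  Under_5    = c(NA, 2, NA, NA, NA, NA),
  `5-9`      = c(NA, NA, NA, NA, NA, 17),
  `10-14`    = c(NA, NA, NA, 5, NA, NA),
  `15-19`    = c(NA, NA, 2, NA, NA, NA),
  `20-24`    = c(71, NA, NA, NA, NA, NA),
  check.names = FALSE
)

# Hypothetical helper: sum the non-missing values, but return NA
# (rather than 0) when every value in the group is missing.
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

# Collapse all non-key columns to one row per Community.
collapsed <- aggregate(df[-1], by = list(Community = df$Community),
                       FUN = sum_or_na)
print(collapsed)
```

Note the unquoted x inside sum(). Because the helper is applied to every non-key column, this should scale to ~30 columns without naming them individually.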
Hi, after running df %>% group_by(Community) %>% summarise_if(is.numeric, sum, na.rm = TRUE), I got the following output:
Community
Akutan city
Alcan Border
Alcan Border
Alcan Border
Ambler City
Ambler City
Please assist!
Help me help you. Could you use dput on a subset of your data and put the result into your original question, so that I have some example data to work with? Example: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
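A minimal sketch of what that could look like, assuming the data frame is called df (the two-row df below is only a stand-in so the snippet runs on its own):

```r
# Stand-in data; replace this with your real data frame.
df <- data.frame(
  Community = c("Alcan Border", "Alcan Border"),
  Under_5   = c(2, NA)
)

# dput() prints R code that recreates the object exactly, including
# column types. head() limits it to the first rows so the output
# stays small enough to paste into the question.
dput(head(df, 20))
```

Pasting the printed structure(...) call into the question lets anyone rebuild the same data with df <- structure(...).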