I am trying to sum taxa per participant in this partial of a dataset.
In the first column there are repeating taxa, the other columns are participants and counts of the taxa.
When I had only one person, I used the =SUMIF($A$2:$A$15, E2, $B$2:$B$15)
function after removing duplicates in excel which did the trick.
I am trying to avoid having to do it 150 times for each participant. Would anyone have an idea, either in Excel or R?
Taxon A B C D
Bacteroides 193,568 30,950 114,136 390,546
Prevotella 0 364,092 0 0
Bacteroides 352,626 389,938 8,492 213,736
Fusicatenibacter 669,374 402,290 317,594 178,340
Bacteroides 563,284 14,114 832,400 30,958
Bacteroides 0 0 0 78,302
Agathobacter 204,810 117,144 180,104 69,296
Alistipes 0 136,800 0 8,430
Blautia 161,132 261,660 9,092 118,602
Ruminococcus 75,010 0 282 26,368
Bacteroides 50,782 23,066 61,236 37,214
Prevotella 91,560 0 19,676 69,752
Blautia 125,734 0 26,750 93,874
Alistipes 139,518 51,154 87,490 5,266
Agathobacter 4,854 13,830 34,772 1,782
How do you know which duplicate taxon to remove?
I remove all duplicates, then sum counts with the SUMIF function in excel. Works for one participant/column and I am looking for a formula which does it for all columns.