summing taxon counts per participant
2
0
Entering edit mode
2.2 years ago

I am trying to sum taxa per participant in this partial of a dataset. In the first column there are repeating taxa, the other columns are participants and counts of the taxa. When I had only one person, I used the =SUMIF($A$2:$A$15, E2, $B$2:$B$15) function after removing duplicates in excel which did the trick. I am trying to avoid having to do it 150 times for each participant. Would anyone have an idea, either in Excel or R?

Taxon                  A              B          C                 D
 Bacteroides    193,568     30,950     114,136  390,546
 Prevotella     0               364,092    0                  0
 Bacteroides    352,626     389,938 8,492   213,736
 Fusicatenibacter   669,374 402,290 317,594 178,340
 Bacteroides    563,284      14,114 832,400 30,958
 Bacteroides        0                     0           0          78,302
 Agathobacter     204,810       117,144 180,104 69,296
 Alistipes               0                136,800     0            8,430
 Blautia              161,132        261,660      9,092 118,602
 Ruminococcus  75,010             0           282   26,368
 Bacteroides       50,782         23,066    61,236  37,214
 Prevotella    91,560              0             19,676 69,752
 Blautia              125,734              0            26,750  93,874
 Alistipes        139,518         51,154    87,490   5,266
 Agathobacter      4,854          13,830    34,772   1,782
R excel • 1.1k views
ADD COMMENT
0
Entering edit mode

How do you know which duplicate taxon to remove?

ADD REPLY
0
Entering edit mode

I remove all duplicates, then sum counts with the SUMIF function in excel. Works for one participant/column and I am looking for a formula which does it for all columns.

ADD REPLY
1
Entering edit mode
2.2 years ago
zx8754 12k

I think you want to group by Taxon then sum all columns.

Excel, use pivot:

enter image description here

R, see this SO post: - Aggregate / summarize multiple variables per group (e.g. sum, mean)

ADD COMMENT
0
Entering edit mode

Fantastic, thank you so much.

ADD REPLY
1
Entering edit mode
2.2 years ago

With R you can use the function aggregate

aggregate(. ~ Taxon, df, sum) # df is the input data.frame
ADD COMMENT
0
Entering edit mode

Love it, thank you very much.

ADD REPLY

Login before adding your answer.

Traffic: 2796 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6