Hello,
I need to group the rows in a table based on a set of conditions. Here is the table:
And I need to group it based on the following conditions:
- I want to identify the rows that have the same taxonomy.
- If rows have the same taxonomy but one row's column11 value differs from that of the others by more than 10^5, I want to drop that row. For example, in the table above, I would drop row 5 (tomato) because its column11 value is more than five orders of magnitude different from the column11 values of the other tomatoes.
- I want to group the remaining rows by taxonomy and use the mean of column11 as the final column11 value for each group. Here is the desired result:
Here is the code I have so far:
import numpy as np
import pandas as pd

df = pd.read_csv('data.csv', sep='\t', decimal='.')
l = list(df.iloc[:, 0])
q = len(l) - 1
r = [False]
tax = df.taxonomy
m = 0
for k in range(0, q):
    r.append(l[k] == l[k+1])
    print('k')
    print(l[k])
    print('k+1')
    print(l[k+1])
    print(r)
    dif = df.column11[k] - df.column11[k+1]
    if ((r == 'True') AND (dif > 100000)):
        df.drop(k)
The code finds the rows that have the same taxonomy, but the body of the "if" statement never executes. I'd be grateful for any help.
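For reference, here is a minimal sketch of why a condition like the one above probably never fires. It assumes the snippet is representative of what was actually run. `r` is a list, so `r == 'True'` compares a list with a string and is always False, and even a single element such as `r[-1]` is a bool, not the string 'True'. Python's logical operator is also lowercase `and`, the difference `dif` is signed, and `df.drop(k)` returns a new DataFrame instead of changing `df` in place. The helper name `exceeds` and the sample values are made up for illustration.

```python
import pandas as pd

r = [False, True]
list_vs_string = (r == 'True')      # a list never equals a string -> False
bool_vs_string = (r[-1] == 'True')  # a bool never equals the string 'True' -> False
last_is_same = r[-1]                # use the bool itself in the condition


def exceeds(a, b, threshold=1e5):
    """Hypothetical helper: True when a and b differ by more than threshold.

    abs() makes the check independent of which row holds the larger value.
    """
    return abs(a - b) > threshold


# Made-up sample data; the real table is not reproduced here.
df = pd.DataFrame({"taxonomy": ["tomato", "tomato"],
                   "column11": [1.0e3, 5.0e8]})

df.drop(1)                 # result is discarded, df still has 2 rows
rows_after_bare_drop = len(df)

df = df.drop(1)            # result is assigned, df now has 1 row
rows_after_assigned_drop = len(df)
```

With these changes the condition would read something like `if last_is_same and exceeds(a, b):`, and the dropped rows would actually disappear from `df`.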
Thanks!
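
One possible way to do the whole task without a loop is sketched below. It rests on assumptions that may not match the real data: the columns are called `taxonomy` and `column11`, "different from the others" is taken to mean an absolute difference of more than 10^5 from the per-taxonomy median (mirroring the `dif > 100000` check in the code above), and the sample values are invented. The function name `filter_and_average` is hypothetical. Note that with only two rows in a group the median lies midway between them, so both rows would be dropped if they are far enough apart.

```python
import pandas as pd


def filter_and_average(df, key="taxonomy", value="column11", threshold=1e5):
    """Drop rows far from their group's median, then average per group."""
    # Per-row reference value: the median of the row's own taxonomy group.
    ref = df.groupby(key)[value].transform("median")
    # Keep rows whose absolute deviation from that median is within threshold.
    keep = (df[value] - ref).abs() <= threshold
    # Mean of the surviving rows, one output row per taxonomy.
    return df[keep].groupby(key, as_index=False)[value].mean()


# Invented sample data standing in for the real table.
df = pd.DataFrame({
    "taxonomy": ["tomato", "tomato", "tomato", "tomato", "potato", "potato"],
    "column11": [1000.0, 1200.0, 1100.0, 5.0e8, 2000.0, 2200.0],
})

result = filter_and_average(df)
print(result)
```

If "five orders of magnitude" is meant as a ratio instead of an absolute difference, the `keep` line would need to compare `df[value] / ref` against a factor instead.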