Question

Grouping entries in a column based on string dataframe

1

Entering edit mode

2.6 years ago

Paula ▴ 60

Hi!

I have a table with different categories and the respective coverage values:

taxonomy     coverage

A                     1815.928793
ADK             5.488047
ADL                 5.047244
AJ                 71.070325
AL                  119.333441

Now, I want to group all the entries that contain one or more of the letters "A,B,J,K,L" in a new category called "W", and add the coverages. I would use Python. The result should look like this:

taxonomy     coverage
ADK     5.488047
ADL     5.047244
W   2006.332559

Thanks a lot!

python biopython • 1.0k views

ADD COMMENT • link updated 2.6 years ago by Jeremy ▴ 930 • written 2.6 years ago by Paula ▴ 60

0

Entering edit mode

Hello Paula ! It is usually a good practice to provide your attempt to solve the problem :). What have you tried?

ADD REPLY • link 2.6 years ago by iraun 6.2k

0

Entering edit mode

Is D the only taxonomy that needs to be excluded from W or are there other letters that need to be excluded?

ADD REPLY • link 2.6 years ago by Jeremy ▴ 930

score 1 · Answer 1 · 2022-06-03

If you can save your data in a CSV file or convert it to a dataframe, you can use the following script in Python:

#Import needed packages.
import pandas as pd

#Read in CSV file.
df = pd.read_csv('df.csv')
print(df)

#Sum rows that only have A, B, J, K, or L and delete those rows.
n = 0
for row in range(len(df)):
    if ('D' or 'E' or 'F' or 'G' or 'H' or 'I' or 'M' or 'N' or 'O' or 'P' or 'Q' or 'R' or 'S' or 'T' or 'U' or 'V' or 'W' or 'X' or 'Y' or 'Z') not in df.loc[row, 'taxonomy']:
         n += df.loc[row, 'coverage']
         df = df.drop(labels = row, axis = 0)

#Make a new row with W as the taxonomy and the sum from above as the coverage.
df.loc[len(df) + 1, 'coverage'] = n
df.loc[len(df), 'taxonomy'] = 'W'
print(df)