Grouping entries in a column based on string dataframe
1
1
Entering edit mode
2.5 years ago
Paula ▴ 60

Hi!

I have a table with different categories and the respective coverage values:

taxonomy     coverage

A                     1815.928793
ADK             5.488047
ADL                 5.047244
AJ                 71.070325
AL                  119.333441

Now, I want to group all the entries that contain one or more of the letters "A,B,J,K,L" in a new category called "W", and add the coverages. I would use Python. The result should look like this:

taxonomy     coverage
ADK     5.488047
ADL     5.047244
W   2006.332559

Thanks a lot!

python biopython • 1.0k views
ADD COMMENT
0
Entering edit mode

Hello Paula ! It is usually a good practice to provide your attempt to solve the problem :). What have you tried?

ADD REPLY
0
Entering edit mode

Is D the only taxonomy that needs to be excluded from W or are there other letters that need to be excluded?

ADD REPLY
1
Entering edit mode
2.5 years ago
Jeremy ▴ 930

If you can save your data in a CSV file or convert it to a dataframe, you can use the following script in Python:

#Import needed packages.
import pandas as pd

#Read in CSV file.
df = pd.read_csv('df.csv')
print(df)

#Sum rows that only have A, B, J, K, or L and delete those rows.
n = 0
for row in range(len(df)):
    if ('D' or 'E' or 'F' or 'G' or 'H' or 'I' or 'M' or 'N' or 'O' or 'P' or 'Q' or 'R' or 'S' or 'T' or 'U' or 'V' or 'W' or 'X' or 'Y' or 'Z') not in df.loc[row, 'taxonomy']:
         n += df.loc[row, 'coverage']
         df = df.drop(labels = row, axis = 0)

#Make a new row with W as the taxonomy and the sum from above as the coverage.
df.loc[len(df) + 1, 'coverage'] = n
df.loc[len(df), 'taxonomy'] = 'W'
print(df)
ADD COMMENT

Login before adding your answer.

Traffic: 1555 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6