I'm writing a script that deletes every row of a data frame in which any column contains the word "Eukaryota". Note that a cell only needs to contain the word "Eukaryota"; it does not have to be equal to "Eukaryota". The columns of the data frame do not have header names.
I am trying the following command:
import numpy as np
import pandas as pd
df = pd.read_csv('output.emapper.annotations_1_10.txt', sep='\t')
df.drop(df[df.apply(lambda row: 'Eukaryota' in row.to_string(header=False), axis=1)].index, inplace=True)
df.to_csv('sin_euk.csv', sep='\t')
The script runs, but the file "sin_euk.csv" still contains the entries with the word "Eukaryota".
I have also tried the following strategy and did not obtain the desired result:
df = df[~df.isin(['Eukaryota']).any(axis=1)]
Do you know of any other strategies?
Thank you!
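sed can also do the job for you. For example, something like sed '/Eukaryota/d' output.emapper.annotations_1_10.txt > sin_euk.csv deletes every line that contains the word.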
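If you would rather stay in pandas, here is a minimal sketch of a substring-based filter. It assumes the file really has no header row (hence header=None) and that it is fine to read every cell as text. The sample data and the helper name drop_rows_containing are made up for illustration; they are not from your file. Note that isin() tests for exact equality, which is why it does not match cells that merely contain the word.

```python
import io
import pandas as pd

# Hypothetical sample standing in for the tab-separated annotations file.
sample = (
    "g1\tBacteria\tfoo\n"
    "g2\troot,Eukaryota,Fungi\tbar\n"
    "g3\tArchaea\tbaz\n"
)

def drop_rows_containing(df, word):
    """Return a copy of df without the rows where any cell contains word."""
    # Literal substring test per column (regex=False); missing cells count as no match.
    mask = df.apply(
        lambda col: col.astype(str).str.contains(word, regex=False, na=False)
    ).any(axis=1)
    return df[~mask]

# header=None: the first line is data, not column names; dtype=str keeps cells as text.
df = pd.read_csv(io.StringIO(sample), sep="\t", header=None, dtype=str)
filtered = drop_rows_containing(df, "Eukaryota")
print(filtered)
```

With the real file, you would pass the path to pd.read_csv instead of the io.StringIO object and then write the result with filtered.to_csv('sin_euk.csv', sep='\t', header=False, index=False).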