Assign redundancy in pandas dataframe
4.9 years ago
flogin ▴ 280

Hi guys, I have a dataframe like this:

                                                          Host Redundant
Cluster                                                                  
>Cluster 0                                   Diabrotica_barberi       NaN
>Cluster 1                                   Diabrotica_barberi       NaN
>Cluster 2                              Trichogramma_dendrolimi       NaN
>Cluster 3                                      Formica_exsecta       NaN
>Cluster 4                                      Formica_exsecta       NaN
>Cluster 5                                     Nephila_plumipes       NaN
>Cluster 6                             Ceutorhynchus_obstrictus       NaN
>Cluster 7                                   Spalangia_cameroni       NaN
>Cluster 8                                     Diaphorina_citri       NaN
>Cluster 9                                Aleurodicus_dispersus       NaN
>Cluster 10                               Aleurodicus_dispersus       NaN
>Cluster 11                                               Culex       NaN
>Cluster 12                            Chaetophiloscia_elongata       NaN
>Cluster 13                                      Bemisia_tabaci       NaN
>Cluster 14                                 Sogatella_furcifera       NaN
>Cluster 14                              Laodelphax_striatellus       NaN
>Cluster 14                              Laodelphax_striatellus       NaN
>Cluster 14                              Laodelphax_striatellus       NaN
>Cluster 14                              Laodelphax_striatellus       NaN
>Cluster 14                              Laodelphax_striatellus       NaN
>Cluster 14                              Laodelphax_striatellus       NaN
>Cluster 14                              Laodelphax_striatellus       NaN
>Cluster 14                              Laodelphax_striatellus       NaN
>Cluster 14                              Laodelphax_striatellus       NaN
>Cluster 14                              Laodelphax_striatellus       NaN
>Cluster 14                              Laodelphax_striatellus       NaN
>Cluster 14                                 Sogatella_furcifera       NaN
>Cluster 14                              Laodelphax_striatellus       NaN
>Cluster 14                              Laodelphax_striatellus       NaN
>Cluster 14                              Laodelphax_striatellus       NaN
>Cluster 14                                 Sogatella_furcifera       NaN
>Cluster 14                                 Sogatella_furcifera       NaN
>Cluster 15                              Laodelphax_striatellus       NaN
>Cluster 16                                  Chilo_suppressalis       NaN
>Cluster 17                                  Metaphycus_ericeri       NaN
>Cluster 18                                   Cotesia_glomerata       NaN
>Cluster 18                                            Vespidae       NaN
>Cluster 18                                            Vespidae       NaN
>Cluster 19                               Neoceratitis_asiatica       NaN
>Cluster 19                               Neoceratitis_asiatica       NaN
>Cluster 19                               Neoceratitis_asiatica       NaN
>Cluster 20                                Brontispa_longissima       NaN

The index is "Cluster" and there are two columns, "Host" and "Redundant".

I want to identify the clusters in which all hosts are the same, like Cluster 19. Several clusters, such as 14 and 18, contain some repeated hosts, but I want to set Redundant = True only for clusters where every host is identical.

I know that df.drop_duplicates can remove redundant rows, but I only want to set the "Redundant" column to True.

Can anyone explain to me how I can do this?

Thanks,

python pandas dataframe • 1.1k views
4.9 years ago
Jianyu ▴ 580

Do you mean you want to assign True to all clusters with the same hosts? Try this (Remember to store the dataframe first):

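def allSame(hosts):
    # True only if the cluster has several rows and all of them name the same host
    return len(hosts) > 1 and len(set(hosts)) == 1
# object dtype lets the column hold booleans where it currently holds NaN
df['Redundant'] = df['Redundant'].astype(object)
for i in df.index.unique():
    # df.loc[[i], 'Host'] is always a Series, even for a single-row cluster
    df.loc[i, 'Redundant'] = allSame(list(df.loc[[i], 'Host']))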
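If the dataframe is large, the same check can probably be done without a Python loop by grouping on the index. This is only a sketch: the small dataframe is a stand-in built from a few rows of the question, the names n_rows and n_unique are my own, and it assumes that a cluster with a single row should not count as redundant.

```python
import pandas as pd

# Small stand-in for the dataframe in the question (values copied from the post).
df = pd.DataFrame(
    {
        "Host": [
            "Cotesia_glomerata", "Vespidae", "Vespidae",
            "Neoceratitis_asiatica", "Neoceratitis_asiatica", "Neoceratitis_asiatica",
            "Brontispa_longissima",
        ]
    },
    index=pd.Index(
        [">Cluster 18"] * 3 + [">Cluster 19"] * 3 + [">Cluster 20"],
        name="Cluster",
    ),
)

# Per-cluster statistics, broadcast back to one value per original row.
hosts = df.groupby(level="Cluster")["Host"]
n_rows = hosts.transform("size")       # number of rows in the cluster
n_unique = hosts.transform("nunique")  # number of distinct hosts in the cluster

# Assumption: a cluster is redundant only if it has more than one row
# and every row names the same host. Assign by position with to_numpy()
# so the repeated index labels play no role in the assignment.
df["Redundant"] = ((n_rows > 1) & (n_unique == 1)).to_numpy()

print(df)
```

On this sample only the Cluster 19 rows come out True; Cluster 18 (mixed hosts) and Cluster 20 (single row) stay False. If single-row clusters should count as redundant too, dropping the n_rows > 1 condition should be enough.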

Exactly! Thanks!
