Question

How to fillna() in Pandas pivoted tables

7.2 years ago

Eric Lim ★ 2.2k

I don't spend much time using pandas. I spent a couple of hours yesterday searching on Google but couldn't get it done. While I came up with a hacky workaround, I thought I'd ask the experts here for the pandas way.

The following is a contrived example but should show what I want to accomplish.

import pandas as pd

data = [{'gene': 'gene1', 'sj_coord': 'chr1:1-2', 'as_event': 'exon_skipping', 'rep': 'A', 'age': 'E14', 'psj': 1, 'gene_expr': 10},
        {'gene': 'gene1', 'sj_coord': 'chr1:1-2', 'as_event': 'exon_skipping', 'rep': 'B', 'age': 'E14', 'psj': 2, 'gene_expr': 10},
        {'gene': 'gene2', 'sj_coord': 'chr2:10-20', 'as_event': 'exon_inclusion', 'rep': 'A', 'age': 'E16', 'psj': 3, 'gene_expr': 30},
        {'gene': 'gene2', 'sj_coord': 'chr2:10-20', 'as_event': 'exon_inclusion', 'rep': 'B', 'age': 'E16', 'psj': 4, 'gene_expr': 30}]

df = pd.DataFrame(data) \
       .pivot_table(index=['gene', 'sj_coord', 'as_event', 'rep'], columns=['age'], values=['psj', 'gene_expr'])

After pivoting:

                                    gene_expr        psj     
age                                       E14   E16  E14  E16
gene  sj_coord   as_event       rep                          
gene1 chr1:1-2   exon_skipping  A        10.0   NaN  1.0  NaN
                                B        10.0   NaN  2.0  NaN
gene2 chr2:10-20 exon_inclusion A         NaN  30.0  NaN  3.0
                                B         NaN  30.0  NaN  4.0

I'd like to fill the NaN values under gene_expr with the actual data for each gene across the different rep and age values. That information exists, just not in the input data I pivoted from. I wonder how I can do this within the pandas ecosystem.

Thanks!

pandas • 1.6k views

ADD COMMENT • link 7.2 years ago by Eric Lim ★ 2.2k

You want to fill, for example, the first gene_expr NaN (gene = gene1, rep = A, age = E16) with information that is not in data?

"That information exists, just not in the input data I pivoted from"

Where is this information stored?

ADD REPLY • link 7.2 years ago by Bastien Hervé 6.4k

Perhaps it's better to use the contrived example below:

data = [{'gene': 'gene1', 'sj_coord': 'chr1:1-2', 'as_event': 'exon_skipping', 'rep': 'A', 'age': 'E14', 'psj': 1, 'gene_expr': 10},
        {'gene': 'gene1', 'sj_coord': 'chr1:10-20', 'as_event': 'exon_inclusion', 'rep': 'A', 'age': 'E16', 'psj': 2, 'gene_expr': 10}]


df = pd.DataFrame(data) \
       .pivot_table(index=['gene', 'sj_coord', 'as_event', 'rep'], columns=['age'], values=['psj', 'gene_expr'])

                                    gene_expr        psj     
age                                       E14   E16  E14  E16
gene  sj_coord   as_event       rep                          
gene1 chr1:1-2   exon_skipping  A        10.0   NaN  1.0  NaN
      chr1:10-20 exon_inclusion A         NaN  10.0  NaN  2.0

In this case, we identified two distinct splicing events (sj_coord) in the same gene: one found exclusively in E14 and the other exclusively in E16. We have gene_expr for gene1 at both ages, as shown in the pivoted table, but if an event is not identified in an age group, it's not reported in the input data. I would like to fill the values, either from within the pivoted table, or by supplying an additional data structure to it.

Does this make more sense?
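
For illustration, the second option mentioned above (supplying an additional data structure) might be sketched as follows. This is only a sketch under stated assumptions: `expr_long` is a hypothetical lookup table that is not part of the original post, and it assumes gene_expr is defined once per (gene, rep, age).

```python
import pandas as pd

data = [
    {'gene': 'gene1', 'sj_coord': 'chr1:1-2', 'as_event': 'exon_skipping',
     'rep': 'A', 'age': 'E14', 'psj': 1, 'gene_expr': 10},
    {'gene': 'gene1', 'sj_coord': 'chr1:10-20', 'as_event': 'exon_inclusion',
     'rep': 'A', 'age': 'E16', 'psj': 2, 'gene_expr': 10},
]
pivoted = pd.DataFrame(data).pivot_table(
    index=['gene', 'sj_coord', 'as_event', 'rep'],
    columns=['age'],
    values=['psj', 'gene_expr'],
)

# Hypothetical lookup table (an assumption, not from the post):
# one gene_expr value per (gene, rep, age).
expr_long = pd.DataFrame([
    {'gene': 'gene1', 'rep': 'A', 'age': 'E14', 'gene_expr': 10},
    {'gene': 'gene1', 'rep': 'A', 'age': 'E16', 'gene_expr': 10},
])
expr_wide = expr_long.pivot_table(index=['gene', 'rep'], columns='age',
                                  values='gene_expr')

# Look up each pivoted row by its (gene, rep) pair, then restore the
# full row index so that fillna can align row by row.
keys = pivoted.index.droplevel(['sj_coord', 'as_event'])
lookup = expr_wide.reindex(keys)
lookup.index = pivoted.index

# Fill only the gene_expr block; psj keeps its NaN values.
filled_expr = pivoted['gene_expr'].fillna(lookup)
result = pd.concat({'gene_expr': filled_expr, 'psj': pivoted['psj']}, axis=1)
print(result)
```

Rebuilding the frame with pd.concat avoids assigning into the MultiIndex columns in place, which keeps the sketch independent of version-specific assignment behaviour.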

ADD REPLY • link 7.2 years ago by Eric Lim ★ 2.2k

If you know what will be missing before pivoting the DataFrame, why not fill in the DataFrame with the new data and then pivot? What bothers me is that you want to fill the DataFrame after the pivot.

"I would like to fill the values, either from within the pivoted table, or by supplying an additional data structure to it."

What do you have in mind? Do you already have the supplementary data in another DataFrame?
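
The fill-then-pivot route suggested above could look roughly like this. It is a sketch, not a definitive recipe: the names `events`, `samples`, `grid`, and `full` are illustrative, `how='cross'` assumes pandas 1.2 or later, and gene_expr is assumed to be constant per (gene, rep, age). Here the expression values are taken from the observed rows, but a separate expression table could be merged in the same way.

```python
import pandas as pd

data = [
    {'gene': 'gene1', 'sj_coord': 'chr1:1-2', 'as_event': 'exon_skipping',
     'rep': 'A', 'age': 'E14', 'psj': 1, 'gene_expr': 10},
    {'gene': 'gene1', 'sj_coord': 'chr1:10-20', 'as_event': 'exon_inclusion',
     'rep': 'A', 'age': 'E16', 'psj': 2, 'gene_expr': 10},
]
long_df = pd.DataFrame(data)

# Cross every observed event with every observed (rep, age) sample so that
# unobserved event/sample combinations get an explicit row.
events = long_df[['gene', 'sj_coord', 'as_event']].drop_duplicates()
samples = long_df[['rep', 'age']].drop_duplicates()
grid = events.merge(samples, how='cross')

# psj stays NaN where the event was not identified in that sample.
keys = ['gene', 'sj_coord', 'as_event', 'rep', 'age']
full = grid.merge(long_df[keys + ['psj']], how='left', on=keys)

# gene_expr depends only on (gene, rep, age), so attach it on those keys.
expr = long_df[['gene', 'rep', 'age', 'gene_expr']].drop_duplicates()
full = full.merge(expr, how='left', on=['gene', 'rep', 'age'])

pivoted = full.pivot_table(
    index=['gene', 'sj_coord', 'as_event', 'rep'],
    columns=['age'],
    values=['psj', 'gene_expr'],
)
print(pivoted)
```

With several genes, the cross join also pairs each event with samples in which its gene has no expression value; those cells simply remain NaN after the left merge.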

ADD REPLY • link 7.2 years ago by Bastien Hervé 6.4k

"If you know what will be missing before pivoting the DataFrame, why not fill in the DataFrame with the new data and then pivot?"

That's what I ended up doing, which I think was rather unnecessary. I thought there had to be a simple way to fill values after pivoting.

Thank you for your help.
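
For reference, one possible way to fill after pivoting, using only values already present in the pivoted table, is a grouped fill. This is a minimal sketch that assumes gene_expr is constant for a given (gene, rep, age), so any value observed for one splicing event can be reused for the gene's other events; it cannot fill a cell when no event of that gene was observed in that sample.

```python
import pandas as pd

data = [
    {'gene': 'gene1', 'sj_coord': 'chr1:1-2', 'as_event': 'exon_skipping',
     'rep': 'A', 'age': 'E14', 'psj': 1, 'gene_expr': 10},
    {'gene': 'gene1', 'sj_coord': 'chr1:10-20', 'as_event': 'exon_inclusion',
     'rep': 'A', 'age': 'E16', 'psj': 2, 'gene_expr': 10},
]
pivoted = pd.DataFrame(data).pivot_table(
    index=['gene', 'sj_coord', 'as_event', 'rep'],
    columns=['age'],
    values=['psj', 'gene_expr'],
)

expr = pivoted['gene_expr']

# Within each (gene, rep) group, broadcast the first non-NaN value of every
# age column to all rows of the group, then use it to fill the gaps.
group_values = expr.groupby(level=['gene', 'rep']).transform('first')
filled_expr = expr.fillna(group_values)

# Reassemble the table; psj is left untouched, NaN values included.
result = pd.concat({'gene_expr': filled_expr, 'psj': pivoted['psj']}, axis=1)
print(result)
```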

ADD REPLY • link 7.2 years ago by Eric Lim ★ 2.2k