6.6 years ago
Eric Lim
I don't spend much time using pandas. I spent a couple of hours yesterday searching on Google but couldn't get it done. While I came up with a hacky workaround, I thought I'd ask the experts here for the pandas way.
The following is a contrived example but should show what I want to accomplish.
import pandas as pd
data = [{'gene': 'gene1', 'sj_coord': 'chr1:1-2', 'as_event': 'exon_skipping', 'rep': 'A', 'age': 'E14', 'psj': 1, 'gene_expr': 10},
{'gene': 'gene1', 'sj_coord': 'chr1:1-2', 'as_event': 'exon_skipping', 'rep': 'B', 'age': 'E14', 'psj': 2, 'gene_expr': 10},
{'gene': 'gene2', 'sj_coord': 'chr2:10-20', 'as_event': 'exon_inclusion', 'rep': 'A', 'age': 'E16', 'psj': 3, 'gene_expr': 30},
{'gene': 'gene2', 'sj_coord': 'chr2:10-20', 'as_event': 'exon_inclusion', 'rep': 'B', 'age': 'E16', 'psj': 4, 'gene_expr': 30}]
df = pd.DataFrame(data) \
.pivot_table(index=['gene', 'sj_coord', 'as_event', 'rep'], columns=['age'], values=['psj', 'gene_expr'])
After pivoting:
                                       gene_expr         psj
age                                    E14   E16   E14   E16
gene  sj_coord   as_event       rep
gene1 chr1:1-2   exon_skipping  A     10.0   NaN   1.0   NaN
                                B     10.0   NaN   2.0   NaN
gene2 chr2:10-20 exon_inclusion A      NaN  30.0   NaN   3.0
                                B      NaN  30.0   NaN   4.0
I'd like to fill the NaN values under gene_expr with the actual data for each gene across the different rep and age values. That information exists, just not in the input data I pivoted from. I wonder how I can do this within the pandas ecosystem.
Thanks!
You want to fill, for example, the first gene_expr NaN (age = E16, rep = A, gene = gene1) with information you don't have in data? Where do you have this information?
Perhaps it's better to use the contrived example below:
In this case, we identified two distinct splicing events (sj_coord) in the same gene: one found exclusively in E14 and the other in E16. We have gene_expr for gene1 in both ages, as shown in the pivoted table, but if an event is not identified in an age group, it's not reported in the input data. I would like to fill the values, either from within the pivoted table or by supplying an additional data structure to it. Does this make more sense?
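For readers following along, here is a minimal sketch of the "fill from within the pivoted table" idea. The input rows are hypothetical, made up to match the scenario described above (two events in gene1, one seen only in E14 and the other only in E16), and the sketch assumes gene_expr depends only on gene, rep and age, so any non-missing value in the same (gene, rep) group can be reused.

```python
import pandas as pd

# Hypothetical rows modelled on the scenario above; not the poster's real data.
data = [
    {'gene': 'gene1', 'sj_coord': 'chr1:1-2', 'as_event': 'exon_skipping', 'rep': 'A', 'age': 'E14', 'psj': 1, 'gene_expr': 10},
    {'gene': 'gene1', 'sj_coord': 'chr1:1-2', 'as_event': 'exon_skipping', 'rep': 'B', 'age': 'E14', 'psj': 2, 'gene_expr': 10},
    {'gene': 'gene1', 'sj_coord': 'chr1:10-20', 'as_event': 'exon_inclusion', 'rep': 'A', 'age': 'E16', 'psj': 3, 'gene_expr': 30},
    {'gene': 'gene1', 'sj_coord': 'chr1:10-20', 'as_event': 'exon_inclusion', 'rep': 'B', 'age': 'E16', 'psj': 4, 'gene_expr': 30},
]

pivoted = pd.DataFrame(data).pivot_table(
    index=['gene', 'sj_coord', 'as_event', 'rep'],
    columns=['age'],
    values=['psj', 'gene_expr'],
)

# Assumption: gene_expr is a property of (gene, rep, age), not of the event.
# 'first' returns the first non-null value per column within each group, so
# every row of a (gene, rep) group receives the value observed in any of them.
filled_expr = pivoted['gene_expr'].groupby(level=['gene', 'rep']).transform('first')

# Rebuild the two-level columns, leaving psj untouched.
result = pd.concat({'gene_expr': filled_expr, 'psj': pivoted['psj']}, axis=1)
print(result)
```

This only helps when the value is present somewhere in the pivoted table for the same gene and rep; a gene whose expression never appears for a given age stays NaN.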
If you know what will be missing before pivoting the dataframe, why not fill in the dataframe with your new data and then pivot? I'm bothered by the fact that you want to fill the dataframe after the pivot.
What do you have in mind? Do you already have the supplementary data in another dataframe?
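A rough sketch of the fill-then-pivot route, assuming the expression values live in a separate dataframe. Both tables here are hypothetical (the names events and expr and their contents are invented for illustration): events holds one row per observed splicing event, and expr holds gene_expr for every gene, rep and age.

```python
import pandas as pd

# Hypothetical event table: only (event, age) combinations that were observed.
events = pd.DataFrame([
    {'gene': 'gene1', 'sj_coord': 'chr1:1-2', 'as_event': 'exon_skipping', 'rep': 'A', 'age': 'E14', 'psj': 1},
    {'gene': 'gene1', 'sj_coord': 'chr1:1-2', 'as_event': 'exon_skipping', 'rep': 'B', 'age': 'E14', 'psj': 2},
    {'gene': 'gene1', 'sj_coord': 'chr1:10-20', 'as_event': 'exon_inclusion', 'rep': 'A', 'age': 'E16', 'psj': 3},
    {'gene': 'gene1', 'sj_coord': 'chr1:10-20', 'as_event': 'exon_inclusion', 'rep': 'B', 'age': 'E16', 'psj': 4},
])

# Hypothetical supplementary table: expression for every gene/rep/age.
expr = pd.DataFrame([
    {'gene': 'gene1', 'rep': 'A', 'age': 'E14', 'gene_expr': 10},
    {'gene': 'gene1', 'rep': 'B', 'age': 'E14', 'gene_expr': 10},
    {'gene': 'gene1', 'rep': 'A', 'age': 'E16', 'gene_expr': 30},
    {'gene': 'gene1', 'rep': 'B', 'age': 'E16', 'gene_expr': 30},
])

event_keys = ['gene', 'sj_coord', 'as_event', 'rep']

# Pair each (event, rep) with every age for which expression is known.
full = events[event_keys].drop_duplicates().merge(expr, on=['gene', 'rep'], how='left')

# Attach psj where the event was actually observed; it stays NaN elsewhere.
full = full.merge(events, on=event_keys + ['age'], how='left')

pivoted = full.pivot_table(index=event_keys, columns=['age'], values=['psj', 'gene_expr'])
print(pivoted)
```

The two left merges make the long table complete before pivoting, so gene_expr has no gaps in the result while psj remains NaN for ages in which an event was not identified.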
That's what I ended up doing, though it felt rather unnecessary. I thought there had to be a simple way to fill values after pivoting.
Thank you for your help.