Python – Calculate weighted average using a pandas/dataframe

numpypandaspython

I have the following table. I want to calculate a weighted average grouped by each date based on the formula below. I can do this using some standard conventional code, but assuming that this data is in a pandas dataframe, is there any easier way to achieve this rather than through iteration?

Date        ID      wt      value   w_avg
01/01/2012  100     0.50    60      0.791666667
01/01/2012  101     0.75    80
01/01/2012  102     1.00    100
01/02/2012  201     0.50    100     0.722222222
01/02/2012  202     1.00    80

01/01/2012 w_avg = 0.5 * ( 60/ sum(60,80,100)) + .75 * (80/
sum(60,80,100)) + 1.0 * (100/sum(60,80,100))

01/02/2012 w_avg = 0.5 * ( 100/ sum(100,80)) + 1.0 * ( 80/
sum(100,80))

Best Answer

Let's first create the example pandas dataframe:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: index = pd.Index(['01/01/2012','01/01/2012','01/01/2012','01/02/2012','01/02/2012'], name='Date')

In [4]: df = pd.DataFrame({'ID':[100,101,102,201,202],'wt':[.5,.75,1,.5,1],'value':[60,80,100,100,80]},index=index)

Then, the average of 'wt' weighted by 'value' and grouped by the index is obtained as:

In [5]: df.groupby(df.index).apply(lambda x: np.average(x.wt, weights=x.value))
Out[5]: 
Date
01/01/2012    0.791667
01/02/2012    0.722222
dtype: float64

Alternatively, one can also define a function:

In [5]: def grouped_weighted_avg(values, weights, by):
   ...:     return (values * weights).groupby(by).sum() / weights.groupby(by).sum()

In [6]: grouped_weighted_avg(values=df.wt, weights=df.value, by=df.index)
Out[6]: 
Date
01/01/2012    0.791667
01/02/2012    0.722222
dtype: float64