Pandas define your own groupby aggregation functions


The .agg method performs aggregation, as its name suggests. You can pass it the names of built-in aggregation methods, Python aggregation functions, NumPy reduction functions, or a function you define yourself.

The beauty of .agg is that you can compute multiple aggregations at the same time.
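As a rough illustration of the kinds of arguments described above, the sketch below passes a method name, a NumPy reduction and a user-defined function to a single .agg call. The toy frame, its column names (key, val) and the helper value_range are made up for this example and are not part of the data used in this post. Recent pandas versions may emit a FutureWarning when a NumPy function such as np.median is passed directly.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, only for illustration.
toy = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 3, 10]})

def value_range(s):
    """Custom aggregation: spread between the largest and smallest value."""
    return s.max() - s.min()

# Each keyword becomes an output column; the value says how to aggregate.
summary = toy.groupby("key")["val"].agg(
    total="sum",          # name of a built-in aggregation method
    middle=np.median,     # NumPy reduction function
    spread=value_range,   # user-defined function
)
print(summary)
```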

So let’s group this data by Student and compute, for each student, the max score, the min score, a grade (based on a condition) and the difference between the max and min scores:

	Student	Score
0	1	50
1	5	42
2	2	24
3	3	61
4	4	75
5	1	98
6	2	37
7	3	42
8	4	90
9	5	43
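The post does not show how the DataFrame is created, so here is one possible way to build it, assuming it is stored in a variable named df with the columns Student and Score exactly as in the table above.

```python
import pandas as pd

# Values copied row by row from the table above (two scores per student).
df = pd.DataFrame({
    "Student": [1, 5, 2, 3, 4, 1, 2, 3, 4, 5],
    "Score": [50, 42, 24, 61, 75, 98, 37, 42, 90, 43],
})
print(df)
```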

Create a custom function grade that takes a Series of scores as its parameter, computes the mean, and returns grade A if the student's mean score is at least 50; otherwise it returns grade B.

def grade(x):
    mean_score = x.mean()
    return 'A' if mean_score >= 50 else 'B'
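Before using grade inside a groupby, it can help to call it directly on a small Series. The sketch below repeats the function so that it runs on its own, and feeds it the scores of students 1 and 2 from the table plus a made-up boundary case.

```python
import pandas as pd

def grade(x):
    mean_score = x.mean()
    return 'A' if mean_score >= 50 else 'B'

# Student 1 scored 50 and 98: mean 74.0
print(grade(pd.Series([50, 98])))  # A
# Student 2 scored 24 and 37: mean 30.5
print(grade(pd.Series([24, 37])))  # B
# Hypothetical boundary case: a mean of exactly 50 still earns an A
print(grade(pd.Series([50, 50])))  # A
```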

Now pass the names of several aggregation functions, along with our custom function grade and a lambda function that computes the difference between the max and min scores:

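# Group by Student, then apply named aggregations to the Score column.
# reset_index() turns the Student group keys back into a regular column.
df.groupby('Student').Score.agg(max_score = 'max',
                      min_score = 'min',
                      grade = grade,
                      diff_max_min = lambda x: x.max()-x.min()).reset_index()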

Output:

	Student	max_score	min_score	grade	diff_max_min
0	1	98	50	A	48
1	2	37	24	B	13
2	3	61	42	A	19
3	4	90	75	A	15
4	5	43	42	B	1
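The same result can also be written with named aggregation on the grouped DataFrame, where each output column is given as a (column, function) tuple. This is an alternative spelling rather than the form used in this post, and the sketch rebuilds the data under the assumption that it matches the table shown earlier.

```python
import pandas as pd

# Data assumed to match the Student/Score table in this post.
df = pd.DataFrame({
    "Student": [1, 5, 2, 3, 4, 1, 2, 3, 4, 5],
    "Score": [50, 42, 24, 61, 75, 98, 37, 42, 90, 43],
})

def grade(x):
    """Return 'A' when the mean score is at least 50, otherwise 'B'."""
    return "A" if x.mean() >= 50 else "B"

result = (
    df.groupby("Student")
      .agg(
          max_score=("Score", "max"),
          min_score=("Score", "min"),
          grade=("Score", grade),
          diff_max_min=("Score", lambda x: x.max() - x.min()),
      )
      .reset_index()  # make Student a regular column again
)
print(result)
```

The tuple form is handy when different output columns should come from different input columns, which the Series-level call cannot express.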
