In this post, we will learn how to create a list of values for each group in a pandas groupby.

We will first create a dataframe with 4 columns: the first column is the continent, the second is the country, the third is the GDP value in trillions, and the fourth indicates whether the country is a member of the G20 group. The GDP values are fake numbers and don’t represent the countries’ real GDP.

Once the dataframe is created, we will group it by continent and list all the countries in each group.

Create a dataframe

Let’s create a dataframe with all four columns: continent, country, GDP(trillion) and Member_G20.

For the third column, GDP(trillion), I’m using the NumPy randint function to generate a random integer between 1 and 8 for each country.

import pandas as pd
import numpy as np

df = pd.DataFrame({'continent': ['Asia', 'NorthAmerica', 'NorthAmerica', 'Europe', 'Europe',
                                 'Europe', 'Asia', 'Europe', 'Asia'],
                   'country': ['China', 'USA', 'Canada', 'Poland', 'Romania',
                               'Italy', 'India', 'Germany', 'Russia'],
                   # fake GDP values: 9 random integers from 1 to 8 (high=9 is exclusive)
                   'GDP(trillion)': np.random.randint(1, 9, 9),
                   'Member_G20': ['Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes']})
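The randint call produces different GDP values on every run. If you want reproducible fake numbers, one option (not used in the original post) is NumPy’s Generator API. A minimal sketch, assuming NumPy 1.17 or later for np.random.default_rng and an arbitrarily chosen seed of 42; the Member_G20 values follow actual G20 membership, where Poland and Romania are not members:

```python
import numpy as np
import pandas as pd

# Assumption: a fixed, arbitrary seed so the fake GDP values repeat on every run.
rng = np.random.default_rng(42)

df = pd.DataFrame({
    'continent': ['Asia', 'NorthAmerica', 'NorthAmerica', 'Europe', 'Europe',
                  'Europe', 'Asia', 'Europe', 'Asia'],
    'country': ['China', 'USA', 'Canada', 'Poland', 'Romania',
                'Italy', 'India', 'Germany', 'Russia'],
    # integers() treats high=9 as exclusive, like randint: values fall in 1..8
    'GDP(trillion)': rng.integers(1, 9, size=9),
    'Member_G20': ['Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes'],
})

print(df.shape)  # (9, 4)
```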

Pandas groupby and list of values in a column

First, we group by continent and then list all the countries under each continent.

We will use apply with the built-in list function, which builds a list for each group and combines the results into a single Series.

While apply is a very flexible method, its downside is that it can be quite a bit slower than more specific methods.

df.groupby('continent')['country'].apply(list)

Output:

continent
Asia                       [China, India, Russia]
Europe          [Poland, Romania, Italy, Germany]
NorthAmerica                        [USA, Canada]
Name: country, dtype: object
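Because the GDP column plays no part in this grouping, a trimmed frame with only continent and country is enough to reproduce these lists. As a small extension (to_dict is not used in the original post), the Series of lists can be turned into a plain dict keyed by continent:

```python
import pandas as pd

df = pd.DataFrame({
    'continent': ['Asia', 'NorthAmerica', 'NorthAmerica', 'Europe', 'Europe',
                  'Europe', 'Asia', 'Europe', 'Asia'],
    'country': ['China', 'USA', 'Canada', 'Poland', 'Romania',
                'Italy', 'India', 'Germany', 'Russia'],
})

# One list per continent: group keys are sorted by default, and rows keep
# their original order inside each group.
countries = df.groupby('continent')['country'].apply(list)

# A plain dict is often handier downstream than a Series of lists.
as_dict = countries.to_dict()
print(as_dict)
# {'Asia': ['China', 'India', 'Russia'], 'Europe': ['Poland', 'Romania', 'Italy', 'Germany'], 'NorthAmerica': ['USA', 'Canada']}
```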

Let’s explore other ways to list the values.

We can also pass list to the groupby agg function, which aggregates the countries in each group into a list:

df.groupby('continent')['country'].agg(list)
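A quick sketch to check that agg(list) and apply(list) produce the same grouping; comparing through to_dict() is just one convenient way to compare Series whose values are lists:

```python
import pandas as pd

df = pd.DataFrame({
    'continent': ['Asia', 'NorthAmerica', 'NorthAmerica', 'Europe', 'Europe',
                  'Europe', 'Asia', 'Europe', 'Asia'],
    'country': ['China', 'USA', 'Canada', 'Poland', 'Romania',
                'Italy', 'India', 'Germany', 'Russia'],
})

via_apply = df.groupby('continent')['country'].apply(list)
via_agg = df.groupby('continent')['country'].agg(list)

# Both return a Series indexed by continent whose values are Python lists.
print(via_apply.to_dict() == via_agg.to_dict())  # True
```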

Next, if you want to return a dataframe with the list of values as a column, just reset the index as shown below:

df.groupby('continent')['country'].apply(list).reset_index()
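A minimal sketch of what reset_index does here: the continent index becomes an ordinary column, and the Series of lists keeps its name, country, as the column name:

```python
import pandas as pd

df = pd.DataFrame({
    'continent': ['Asia', 'NorthAmerica', 'NorthAmerica', 'Europe', 'Europe',
                  'Europe', 'Asia', 'Europe', 'Asia'],
    'country': ['China', 'USA', 'Canada', 'Poland', 'Romania',
                'Italy', 'India', 'Germany', 'Russia'],
})

out = df.groupby('continent')['country'].apply(list).reset_index()

# The group keys move from the index into a regular 'continent' column.
print(out.columns.tolist())  # ['continent', 'country']
print(out)
```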

To rename this column of lists, pass the new column name as the name parameter of reset_index:

df.groupby('continent')['country'].apply(list).reset_index(name='country_list')

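Output:

      continent                       country_list
0          Asia             [China, India, Russia]
1        Europe  [Poland, Romania, Italy, Germany]
2  NorthAmerica                      [USA, Canada]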
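An equivalent route, shown here as an alternative rather than the post’s own approach, is to rename the Series with rename() before calling reset_index():

```python
import pandas as pd

df = pd.DataFrame({
    'continent': ['Asia', 'NorthAmerica', 'NorthAmerica', 'Europe', 'Europe',
                  'Europe', 'Asia', 'Europe', 'Asia'],
    'country': ['China', 'USA', 'Canada', 'Poland', 'Romania',
                'Italy', 'India', 'Germany', 'Russia'],
})

grouped = df.groupby('continent')['country'].apply(list)

# Option used in the post: name the values column inside reset_index().
a = grouped.reset_index(name='country_list')
# Alternative: rename the Series first, then reset the index.
b = grouped.rename('country_list').reset_index()

print(a.columns.tolist())  # ['continent', 'country_list']
print(a.to_dict(orient='list') == b.to_dict(orient='list'))  # True
```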

Pandas groupby and list of unique values

The list of values may contain duplicates. To keep only the unique values, we can convert each group to a set and back to a list. Note that a set does not preserve the original order of the values.

df.groupby('continent')['country'].agg(lambda x: list(set(x))).reset_index()
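The sample dataframe has no repeated countries, so this sketch uses a small hypothetical frame with duplicates. It compares list(set(x)), whose order is arbitrary, with list(dict.fromkeys(x)), an order-preserving alternative that is not part of the original post:

```python
import pandas as pd

# Hypothetical data: China and Poland each appear twice.
df = pd.DataFrame({
    'continent': ['Asia', 'Asia', 'Asia', 'Europe', 'Europe', 'Europe', 'Europe'],
    'country': ['China', 'India', 'China', 'Poland', 'Romania', 'Poland', 'Italy'],
})

# As in the post: duplicates are removed, but set iteration order is arbitrary.
unordered = df.groupby('continent')['country'].agg(lambda x: list(set(x)))

# dict keys are unique and keep insertion order, so first occurrences stay in order.
ordered = df.groupby('continent')['country'].agg(lambda x: list(dict.fromkeys(x)))

print(ordered.to_dict())
# {'Asia': ['China', 'India'], 'Europe': ['Poland', 'Romania', 'Italy']}
```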

Alternatively, we can pass set, or the string 'unique', directly to the agg function to get the unique values:

df.groupby('continent')['country'].agg(set).reset_index()

or

df.groupby('continent')['country'].agg('unique').reset_index()
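The two alternatives return different container types. A sketch on hypothetical data with duplicates: agg(set) yields Python sets with no guaranteed order, while 'unique' yields an array of the values in order of first appearance (a NumPy array for object-dtype columns):

```python
import pandas as pd

# Hypothetical data: China and Poland each appear twice.
df = pd.DataFrame({
    'continent': ['Asia', 'Asia', 'Asia', 'Europe', 'Europe', 'Europe', 'Europe'],
    'country': ['China', 'India', 'China', 'Poland', 'Romania', 'Poland', 'Italy'],
})

as_sets = df.groupby('continent')['country'].agg(set)
as_unique = df.groupby('continent')['country'].agg('unique')

# Sets compare without regard to order; the unique arrays keep first-seen order.
print(as_sets['Europe'] == {'Poland', 'Romania', 'Italy'})  # True
print(list(as_unique['Europe']))  # ['Poland', 'Romania', 'Italy']
```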

Finally, if we want both the list of values and its unique values in the same dataframe, we can use groupby named aggregation:

df.groupby('continent').agg(country_list=('country',list),
                          country_list_unique=('country','unique')).reset_index()
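A sketch of the named aggregation on hypothetical data with duplicates, so the two columns actually differ. The n_unique column, computed with the built-in 'nunique' aggregation, is an extra added here for illustration:

```python
import pandas as pd

# Hypothetical data: China and Poland each appear twice.
df = pd.DataFrame({
    'continent': ['Asia', 'Asia', 'Asia', 'Europe', 'Europe', 'Europe', 'Europe'],
    'country': ['China', 'India', 'China', 'Poland', 'Romania', 'Poland', 'Italy'],
})

summary = df.groupby('continent').agg(
    country_list=('country', list),             # every value, duplicates included
    country_list_unique=('country', 'unique'),  # distinct values, first-seen order
    n_unique=('country', 'nunique'),            # extra: count of distinct values
).reset_index()

print(summary)
```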
