pandas, python,

Sort Pandas Dataframe and Series

Posted on Jan 28, 2020 · 10 mins read
Share this

Sorting a dataframe by row and column values or by index is easy a task if you know how to do it using the pandas and numpy built-in functions

However sometimes you may find it confusing on how to sort values by two columns, a list of values or reset the index after sorting.

In this post we will learn sorting a dataframe and Series using the following functions

a) sort_values b) sort_index c) Categorical Series d) numpy sort and argsort e) Reindex f) And Sorted() function

Let’s create a dataframe of 11 counties with their CO2 emission and population and a column for the continent they belong to

import numpy as np
import pandas as pd
df = pd.DataFrame({
    'Country':['USA','China','India','Russia','Switzerland','Japan','Sweden','Singapore','South Korea','UK','Australia'],
    'Co2_emission':[5107.393,10877.218,2454.774,1764.866,39.738,1320.776,np.NaN,55.018,673.324,379.150,764.866],
    'Population_million':[329,1433,1366,145,8.5,126,10,5.8,51,67,np.NaN],
    'Continent': ['NA','Asia','Asia','EU','EU','Asia','EU','Asia','Asia','EU','AU']
                 })
  Country Co2_emission Population_million Continent
0 USA 5107.393 329.0 NA
1 China 10877.218 1433.0 Asia
2 India 2454.774 1366.0 Asia
3 Russia 1764.866 145.0 EU
4 Switzerland 39.738 8.5 EU
5 Japan 1320.776 126.0 Asia
6 Sweden NaN 10.0 EU
7 Singapore 55.018 5.8 Asia
8 South Korea 673.324 51.0 Asia
9 UK 379.150 67.0 EU
10 Australia 764.866 NaN AU

Sort dataframe by column Values

Here we are sorting the dataframe by column values i.e. population_in_million

sort_values functions sorts the dataframe by the column values provided as the first argument and setting ascending as True/False will display the results in that order

df.sort_values('Population_million',ascending = False)
  Country Co2_emission Population_million Continent
1 China 10877.218 1433.0 Asia
2 India 2454.774 1366.0 Asia
0 USA 5107.393 329.0 NA
3 Russia 1764.866 145.0 EU
5 Japan 1320.776 126.0 Asia
9 UK 379.150 67.0 EU
8 South Korea 673.324 51.0 Asia
6 Sweden NaN 10.0 EU
4 Switzerland 39.738 8.5 EU
7 Singapore 55.018 5.8 Asia
10 Australia 764.866 NaN AU

Sort Dataframe with NaN on Top or Bottom

The column contains the NaN value and you can choose to display the NaN values either on Top or Bottom row of the sorted dataframe

na_position parameters you can set as first or last to push those values on the top or the bottom of dataframe

df.sort_values('Population_million',ascending = False,na_position = 'first')
  Country Co2_emission Population_million Continent
10 Australia 764.866 NaN AU
1 China 10877.218 1433.0 Asia
2 India 2454.774 1366.0 Asia
0 USA 5107.393 329.0 NA
3 Russia 1764.866 145.0 EU
5 Japan 1320.776 126.0 Asia
9 UK 379.150 67.0 EU
8 South Korea 673.324 51.0 Asia
6 Sweden NaN 10.0 EU
4 Switzerland 39.738 8.5 EU
7 Singapore 55.018 5.8 Asia

Sort dataframe rows by multiple columns

You can also provide the list of columns to be sorted and their order also a list of boolean

Here we are sorting the dataframe first by column CO2_emission and then by Population_million and we want the order of the CO2_emission in ascending and population_million in Descending

df.sort_values(by = ['Co2_emission','Population_million'],ascending = [True,False])
  Country Co2_emission Population_million Continent
4 Switzerland 39.738 8.5 EU
7 Singapore 55.018 5.8 Asia
9 UK 379.150 67.0 EU
8 South Korea 673.324 51.0 Asia
10 Australia 764.866 NaN AU
5 Japan 1320.776 126.0 Asia
3 Russia 1764.866 145.0 EU
2 India 2454.774 1366.0 Asia
0 USA 5107.393 329.0 NA
1 China 10877.218 1433.0 Asia
6 Sweden NaN 10.0 EU

Sort text column by Alphabetical order

if there is a text column and you want to sort it just alphabetically then applying sort_values on that column will do so

df.sort_values(by='Country')
  Country Co2_emission Population_million Continent
10 Australia 764.866 NaN AU
1 China 10877.218 1433.0 Asia
2 India 2454.774 1366.0 Asia
5 Japan 1320.776 126.0 Asia
3 Russia 1764.866 145.0 EU
7 Singapore 55.018 5.8 Asia
8 South Korea 673.324 51.0 Asia
6 Sweden NaN 10.0 EU
4 Switzerland 39.738 8.5 EU
9 UK 379.150 67.0 EU
0 USA 5107.393 329.0 NA

Sort by Custom list or Dictionary using Categorical Series

Here we wanted to sort the dataframe by the continent column but in a particular custom order and not alphabetically

That custom order can be a list or the keys of a dictionary

[ “NA”, “Asia”, “AU”, “EU”])

or

{“NA”: 0, “Asia”: 1, “AU”: 2, “EU”: 3}

Pandas has an excellent feature called Categorical Series using which we can sort the column values in the pre-defined order

First make the Continent column a categorical and specify the ordering to use by passing a list value as the categories argument

df [ 'Continent'] = pd.Categorical(df['Continent'], categories=["NA", "Asia", "AU", "EU"])

Finally, sort the values by Continent column and it will order the rows as per the custom list passed in the categories column

df.sort_values(by='Continent')
Country Co2_emission Population_million Continent
USA 5107.393 329.0 NA
China 10877.218 1433.0 Asia
India 2454.774 1366.0 Asia
Japan 1320.776 126.0 Asia
Singapore 55.018 5.8 Asia
South Korea 673.324 51.0 Asia
Australia 764.866 NaN AU
Russia 1764.866 145.0 EU
Switzerland 39.738 8.5 EU
Sweden NaN 10.0 EU
UK 379.150 67.0 EU

Sort dataframe based on a custom list

It’s not necessary that everytime you use the sort_values function for sorting

Here is a unique case if you have list of countries and want to arrange the dataframe rows based on the elements inside this list

First set the index of the dataframe to Country column and then pass the list as argument to the reindex function and that will Conform DataFrame to new index as per the list

reorderlist = ['China','Russia','Australia','India','Japan','Singapore','South Korea','UK','USA','Sweden','Switzerland']
df.set_index('Country',inplace=True)
df.reindex(reorderlist)
Country Co2_emission Population_million Continent
China 10877.218 1433.0 Asia
Russia 1764.866 145.0 EU
Australia 764.866 NaN AU
India 2454.774 1366.0 Asia
Japan 1320.776 126.0 Asia
Singapore 55.018 5.8 Asia
South Korea 673.324 51.0 Asia
UK 379.150 67.0 EU
USA 5107.393 329.0 NA
Sweden NaN 10.0 EU
Switzerland 39.738 8.5 EU

reset index after sorting dataframe

When you sort the dataframe the index are moved up and down and to re-arrange the index you can use reset_index function and set drop as True to drop the index column which will be inserted as a new column

df.sort_values('Population_million',ascending = False).reset_index(drop=True)
  Country Co2_emission Population_million Continent
0 China 10877.218 1433.0 Asia
1 India 2454.774 1366.0 Asia
2 USA 5107.393 329.0 NA
3 Russia 1764.866 145.0 EU
4 Japan 1320.776 126.0 Asia
5 UK 379.150 67.0 EU
6 South Korea 673.324 51.0 Asia
7 Sweden NaN 10.0 EU
8 Switzerland 39.738 8.5 EU
9 Singapore 55.018 5.8 Asia
10 Australia 764.866 NaN AU

Sort dataframe by date column

sort_values functions let you sort the dataframe by dates also

Let’s create a column with date range

df['Date']=pd.date_range(start='1/12/2019', end='1/22/2019', freq='D')

Sort the column by Date in descending order

df.sort_values('Date',ascending=False,inplace=True)
  Country Co2_emission Population_million Continent Date
10 Australia 764.866 NaN AU 2019-01-22
9 UK 379.150 67.0 EU 2019-01-21
8 South Korea 673.324 51.0 Asia 2019-01-20
7 Singapore 55.018 5.8 Asia 2019-01-19
6 Sweden NaN 10.0 EU 2019-01-18
5 Japan 1320.776 126.0 Asia 2019-01-17
4 Switzerland 39.738 8.5 EU 2019-01-16
3 Russia 1764.866 145.0 EU 2019-01-15
2 India 2454.774 1366.0 Asia 2019-01-14
1 China 10877.218 1433.0 Asia 2019-01-13
0 USA 5107.393 329.0 NA 2019-01-12

Sort dataframe by datetime index using sort_index

You can also sort the df by datetime index using the sort_index function

First set the date column created above as an index and then sort the index

df.set_index('Date').sort_index()

|Date|Country|Co2_emission|Population_million|Continent| |— |— |— |— |— | |2019-01-12|USA|5107.393|329.0|NA| |2019-01-13|China|10877.218|1433.0|Asia| |2019-01-14|India|2454.774|1366.0|Asia| |2019-01-15|Russia|1764.866|145.0|EU| |2019-01-16|Switzerland|39.738|8.5|EU| |2019-01-17|Japan|1320.776|126.0|Asia| |2019-01-18|Sweden|NaN|10.0|EU| |2019-01-19|Singapore|55.018|5.8|Asia| |2019-01-20|South Korea|673.324|51.0|Asia| |2019-01-21|UK|379.150|67.0|EU| |2019-01-22|Australia|764.866|NaN|AU|

Sort dataframe by row values

Here we will see how to sort the dataframe by a specific row values

Let’s create a dataframe of numbers first

df = pd.DataFrame(data={'x':[28,10,90], 'y':[45,58,67],'z':[19,82,37]}, index=['a', 'b', 'c'])
  x y z
a 28 45 19
b 10 58 82
c 90 67 37

Sort row values using sort_values

We are sorting the above dataframe based on the values in row a and by default it is ascending order

df.sort_values(by='a', axis=1)
  z x y
a 19 28 45
b 82 10 58
c 37 90 67

Note: In all the below cases the output will be same as the above table

Sort row values using numpy

We can also use numpy sort and argsort to achieve the same result as above

df.iloc[:, np.argsort(df.loc['a'])]

or

 df.apply(np.sort, axis = 1)

Alternatively, if you want to reverse the order then

df[['x', 'y', 'z']] = np.sort(df)[:, ::-1]

Sort row values using reindex

We can also use reindex and sorted function with a lambda as a key to sort the dataframe based on values in row a

df.reindex(sorted(df.columns, key=lambda x: df[x]['a']), axis=1)

Conclusion:

So we have reached to the end of this blog post and just wanted to highlight few key points that we have learn in this blog post

  1. Sorting of the dataframe by single, multiple column values and arranging the sorted columns in ascending and descending order
  2. How to use sort_values functions and the arguments like na_position, ascending etc.
  3. Sorting columns based on a custom list or dictionary and using Pandas Categorical Series and reindex
  4. How to Sort with date column and datetime index using sort_index
  5. Sorting based on a specific row values by different methods like numpy argsort, sort_values and reindex

if there is any additional function which you know can be used to sort the rows and column values in a dataframe then please leave your comments in the comment section below