pandas, python,

Convert Pandas dataframe to dictionary

Posted on Mar 24, 2020 · 8 mins read
Share this

In my this blog we will discover what are the different ways to convert a Dataframe into a Python Dictionary or Key/Value Pair

There are multiple ways you wanted to see the dataframe into a dictionary

We will explore and cover all the possible ways a data can be exported into a Python dictionary

Let’s create a dataframe first with three columns Name, Age and City and just to keep things simpler we will have 4 rows in this Dataframe

import pandas as pd
df = pd.DataFrame({'Name': ['John', 'Sara','Peter','Cecilia'],
                   'Age': [38, 47,63,28],
                  'City':['Boston', 'Charlotte','London','Memphis']})
df
  Name Age City
0 John 38 Boston
1 Sara 47 Charlotte
2 Peter 63 London
3 Cecilia 28 Memphis

Pandas to_dict() function

A simple function to convert the dataframe to dictionary. So let’s convert the above dataframe to dictionary without passing any parameters

df.to_dict()
{'Name': {0: 'John', 1: 'Sara', 2: 'Peter', 3: 'Cecilia'},
 'Age': {0: 38, 1: 47, 2: 63, 3: 28},
 'City': {0: 'Boston', 1: 'Charlotte', 2: 'London', 3: 'Memphis'}}

It returns the Column header as Key and each row as value and their key as index of the datframe

If you see the Name key it has a dictionary of values where each value has row index as Key i.e. 0 as John, 1 as Sara and so on

Pandas Dataframe to Dictionary by Rows

Let’s change the orient of this dictionary and set it to index

df.to_dict('index')
{0: {'Name': 'John', 'Age': 38, 'City': 'Boston'},
 1: {'Name': 'Sara', 'Age': 47, 'City': 'Charlotte'},
 2: {'Name': 'Peter', 'Age': 63, 'City': 'London'},
 3: {'Name': 'Cecilia', 'Age': 28, 'City': 'Memphis'}}

Now the Dictionary key is the index of the dataframe and values are each row

The first index of dataframe is 0 which is the first key of dictionary and has a dictionary of a row as value and each value inside this dictionary has column header as Key

Dataframe to Dictionary with values as list

Now change the orient to list and see what type of dictionary we get as an output

df.to_dict('list')

{'Name': ['John', 'Sara', 'John', 'Sara'],
 'Sem': ['Sem1', 'Sem1', 'Sem2', 'Sem2'],
 'Subject': ['Mathematics', 'Biology', 'Biology', 'Mathematics'],
 'Grade': ['A', 'B', 'A+', 'B++']}

It returns Column Headers as Key and all the row as list of values

Pandas Dataframe to List of Dictionary

Let’s change the orient to records and check the result

df.to_dict('records')
[{'Name': 'John', 'Age': 38, 'City': 'Boston'},  {'Name': 'Sara', 'Age': 47, 'City': 'Charlotte'},  {'Name': 'Peter', 'Age': 63, 'City': 'London'},  {'Name': 'Cecilia', 'Age': 28, 'City': 'Memphis'}]

it returns the list of dictionary and each dictionary contains the individual rows

Dataframe to Dictionary with one Column as Key

So we are setting the index of dataframe as Name first and then Transpose the Dataframe and convert it into a dictionary with values as list

df.set_index('Name').T.to_dict('list')
{'John': [38, 'Boston'],
 'Sara': [47, 'Charlotte'],
 'Peter': [63, 'London'],
 'Cecilia': [28, 'Memphis']}

It returns Name as Key and the other values Age and City as list

Converting Timestamp data to dictionary

Let’s see how to_dict function works with timestamp data

Let’s create a simple dataframe with date and time values in it

tsmp = Timestamp("20200101")
data = DataFrame({"A": [tsmp, tsmp], "B": [tsmp, tsmp]})
data
  A B
0 2020-01-01 2020-01-01
1 2020-01-01 2020-01-01
data.to_dict(orient="records")
[{'A': Timestamp('2020-01-01 00:00:00'),
  'B': Timestamp('2020-01-01 00:00:00')},
 {'A': Timestamp('2020-01-01 00:00:00'),
  'B': Timestamp('2020-01-01 00:00:00')}]

It returns list of dictionary and each row values is a dictionary having colum label as key and timestamp object as their values

Let’s take another example of dataframe with datetime object and timezone parameter info

data = [
            (datetime(2017, 11, 18, 21, 53, 0, 219225, tzinfo=pytz.utc),),
            (datetime(2017, 11, 18, 22, 6, 30, 61810, tzinfo=pytz.utc),),
        ]
df = DataFrame(list(data), columns=["d"])

result = df.to_dict(orient="records")
result
[{'d': Timestamp('2017-11-18 21:53:00.219225+0000', tz='UTC')},
 {'d': Timestamp('2017-11-18 22:06:30.061810+0000', tz='UTC')}]

It returns the list of dictionary with timezone info

Dataframe to OrderedDict and defaultdict

to_dict() Into parameter:

You can specify the type from the collections.abc.Mapping subclass used for all Mappings in the return value

For example: the into values can be dict, collections.defaultdict, collections.OrderedDict and collections.Counter.

Let’s take a look at these two examples here for OrderedDict and defaultdict

from collections import OrderedDict, defaultdict
test_data = {"A": {"1": 1, "2": 2}, "B": {"1": "1", "2": "2", "3": "3"}}
DataFrame(test_data).to_dict(into=OrderedDict)
OrderedDict([('A',
              OrderedDict([(0, Timestamp('2013-01-01 00:00:00')),
                           (1, Timestamp('2013-01-01 00:00:00'))])),
             ('B',
              OrderedDict([(0, Timestamp('2013-01-01 00:00:00')),
                           (1, Timestamp('2013-01-01 00:00:00'))]))])
DataFrame(test_data).to_dict(into=defaultdict(list))
defaultdict(list,
            {'A': defaultdict(list,
                         {0: Timestamp('2013-01-01 00:00:00'),
                          1: Timestamp('2013-01-01 00:00:00')}),
             'B': defaultdict(list,
                         {0: Timestamp('2013-01-01 00:00:00'),
                          1: Timestamp('2013-01-01 00:00:00')})})
DataFrame(test_data).to_dict(into=dict)

{‘A’: {0: Timestamp(‘2013-01-01 00:00:00’), 1: Timestamp(‘2013-01-01 00:00:00’)}, ‘B’: {0: Timestamp(‘2013-01-01 00:00:00’), 1: Timestamp(‘2013-01-01 00:00:00’)}}

Convert a Pandas Groupby to Dictionary

You can also group the values in a column and create the dictionary. Let’s understand this with the help of this simple example

df = pd.DataFrame({'Serial_No': [0,1,1,1,2,2,3,3,4,5,5,5],'Segment':['A','B','C','C','B',
                                                                     'A','B','A','D','A',
                                                                    'A','C'],'Area':[23,45,64,23,64,65,23,45,23,64,23,64]})
  Serial_No Segment Area
0 0 A 23
1 1 B 45
2 1 C 64
3 1 C 23
4 2 B 64
5 2 A 65
6 3 B 23
7 3 A 45
8 4 D 23
9 5 A 64
10 5 A 23
11 5 C 64

We will group the above dataframe by column Serial_No and all the values in Area column of that group will be displayed as list

df.groupby(['Serial_No'])['Area'].apply(list).to_dict()
{0: [23], 1: [45, 64, 23], 2: [64, 65], 3: [23, 45], 4: [23], 5: [64, 23, 64]}

Convert Dataframe to Nested Dictionary

This is a very interesting example where we will create a nested dictionary from a dataframe

Let’s create a dataframe with four columns Name, Semester, Subject and Grade

df = pd.DataFrame({'Name':['John','Sara','John','Sara'],'Sem':['Sem1','Sem1','Sem2','Sem2'],'Subject':['Mathematics','Biology','Biology','Mathematics'],'Grade':['A','B','A+','B++']})
 df
  Name Sem Subject Grade
0 John Sem1 Mathematics A
1 Sara Sem1 Biology B
2 John Sem2 Biology A+
3 Sara Sem2 Mathematics B++

Now we are interested to build a dictionary out of this dataframe where the key will be Name and the two Semesters (Sem 1 and Sem 2) will be nested dictionary keys and for each Semester we want to display the Grade for each Subject.

For example: John data should be shown as below

{
  "John": {
    "Sem1": {
      "Subject": "Mathematics",
      "Grade": "A"
    },
    "Sem2": {
      "Subject": "Biology",
      "Grade": "A+"
    }
        }
}

As you can see in the following code we are using a Dictionary comprehension along with groupby to achieve this.

We have set the index to Name and Sem which are the Keys of each dictionary and then grouping this data by Name

And iterating this groupy object inside the dictionary comprehension to get the desired dictionary format

ppdict = {n: grp.loc[n].to_dict('index')
 for n, grp in df.set_index(['Name', 'Sem']).groupby(level='Name')}
print (json.dumps(ppdict, indent=2))
{
  "John": {
    "Sem1": {
      "Subject": "Mathematics",
      "Grade": "A"
    },
    "Sem2": {
      "Subject": "Biology",
      "Grade": "A+"
    }
  },
  "Sara": {
    "Sem1": {
      "Subject": "Biology",
      "Grade": "B"
    },
    "Sem2": {
      "Subject": "Mathematics",
      "Grade": "B++"
    }
  }
}

Conclusion

So just to summarize our key learning in this post, here are some of the main points that we touched upon:

  • How to convert a dataframe into a dictionary using to_dict() function
  • Using the oriented parameter to customize the result of our dictionary
  • into parameter can be used to specify the return type as defaultdict, Ordereddict and Counter
  • How a data with timestamp and datetime values can be converted into a dictionary
  • Using groupby to group values in one column and converting the values of another column as list and finally converting it into a dictionary
  • Finally how to create a nested dictionary from your dataframe using groupby and dictionary comprehension