# Pandas Group dataframe by time interval - Hour, Month, Year, Number of days and start grouping at specific time

In this post we will see how to group a time-series dataframe by time intervals such as hours, months, years, or a number of days, and how to use the offset parameter to start the grouping bins at a specific time.

Here are the steps to be followed for grouping by Time intervals:

- Learn about the pandas Grouper and resample APIs
- Create a time-series dataframe
- Group the dataframe using Grouper and resample by year, 2 months, 15 days, and 10 minutes
- Use the offset parameter to start the grouping bins at a specific time

Grouper and resample are used when time-series data has to be grouped by time intervals. The two are similar in nature and give the same result for the examples in this post, although one may be preferred over the other depending on the data at hand.
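To make the comparison concrete, here is a minimal sketch on a hypothetical three-row frame (the name `toy` and its values are made up for illustration and are not the data used later in this post). It bins the same rows by day with both APIs, and then shows one situation where Grouper may be the more convenient choice: it can be listed next to an ordinary column key in the same `groupby` call.

```python
import pandas as pd

# Hypothetical three-row frame, used only for this comparison
toy = pd.DataFrame(
    {"sales": [10, 20, 30], "product": ["A", "B", "A"]},
    index=pd.to_datetime(
        ["2020-01-01 09:00", "2020-01-01 15:00", "2020-01-02 10:00"]
    ),
)

# The same daily bins, written with each API
by_grouper = toy.groupby(pd.Grouper(freq="D"))["sales"].sum()
by_resample = toy.resample("D")["sales"].sum()
print(by_grouper)
print(by_resample)

# Grouper can be combined with a regular column key:
# one total per product per day
per_product = toy.groupby(["product", pd.Grouper(freq="D")])["sales"].sum()
print(per_product)
```

Both daily results contain a total of 30 for each of the two days. Which API reads better is largely a matter of taste when time is the only grouping key.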

## Create Dataframe

We will group the time series by time intervals using both Grouper() and resample(), since either can be used to group time-series data.

Let’s create our time-series dataframe with three columns (date, sales, product) and set the date column as the index.

```
# importing required libraries
import pandas as pd

# creating dataframe with date, sales and product as columns
df = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-11 06:29:57',
                            '2020-02-12 06:30:01',
                            '2020-03-14 08:37:22',
                            '2020-04-16 23:11:13',
                            '2021-05-18 23:21:43',
                            '2021-06-20 23:22:36',
                            '2022-07-26 08:08:15',
                            '2022-08-16 16:08:56',
                            '2022-09-04 19:05:30',
                            '2022-10-10 22:48:15']),
    'sales': [955, 889, 364, 856, 754,
              328, 999, 652, 742, 856],
    'product': ['Chips', 'Chocolate',
                'Popcorn', 'Nuts',
                'Crackers', 'Cookies',
                'Brownies', 'Marshmallows',
                'Gummy bears', 'Jelly beans']})

# setting date as index
df = df.set_index('date')

# printing the dataframe
df
```

**Out:**

| date | sales | product |
|---|---|---|
| 2020-01-11 06:29:57 | 955 | Chips |
| 2020-02-12 06:30:01 | 889 | Chocolate |
| 2020-03-14 08:37:22 | 364 | Popcorn |
| 2020-04-16 23:11:13 | 856 | Nuts |
| 2021-05-18 23:21:43 | 754 | Crackers |
| 2021-06-20 23:22:36 | 328 | Cookies |
| 2022-07-26 08:08:15 | 999 | Brownies |
| 2022-08-16 16:08:56 | 652 | Marshmallows |
| 2022-09-04 19:05:30 | 742 | Gummy bears |
| 2022-10-10 22:48:15 | 856 | Jelly beans |
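Setting the date as the index is what lets Grouper and resample find the timestamps automatically. If you would rather keep the date as a regular column, both APIs let you name that column instead. The sketch below uses a hypothetical three-row frame called `raw` (not the dataframe above) to show `pd.Grouper(key=...)` and `resample(on=...)`.

```python
import pandas as pd

# Hypothetical frame where the timestamps stay in a regular column
raw = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-11 06:29:57",
                            "2020-01-20 06:30:01",
                            "2020-02-14 08:37:22"]),
    "sales": [955, 889, 364],
})

# Grouper: point at the datetime column with key=
via_key = raw.groupby(pd.Grouper(key="date", freq="15D"))["sales"].mean()

# resample: point at the datetime column with on=
via_on = raw.resample("15D", on="date")["sales"].mean()

print(via_key)
print(via_on)
```

The first two rows fall into the same 15-day bin, so both results start with their average, 922.0.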

## Group by Year, Month, Days and Minutes

We want to group the above dataframe by year, 2 months, 15 days, and 10 minutes. We will use the freq parameter of Grouper and the rule parameter of resample.

*freq:* str / frequency object, defaults to None

*rule:* DateOffset, Timedelta or str

In `Grouper(freq=...)` and `resample(rule=...)`, this argument sets the length of each time bin. It is the most important argument for grouping time-interval data. See the pandas documentation for the full list of available frequency aliases.
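Since freq and rule accept frequency objects as well as strings, the bins can also be described with objects from `pd.offsets`. The following is a minimal sketch, assuming a Series named `sales` built from the first four rows of the dataframe above. Passing objects can be handy because the string spellings of some aliases have changed between pandas versions.

```python
import pandas as pd

# First four rows of the example data, as a Series (name chosen for this sketch)
sales = pd.Series(
    [955, 889, 364, 856],
    index=pd.to_datetime(["2020-01-11 06:29:57", "2020-02-12 06:30:01",
                          "2020-03-14 08:37:22", "2020-04-16 23:11:13"]),
    name="sales",
)

# Year-end bins described by an offset object instead of a string alias
yearly = sales.groupby(pd.Grouper(freq=pd.offsets.YearEnd())).mean()

# Two-month bins: MonthEnd(2) steps two month ends at a time
two_monthly = sales.resample(pd.offsets.MonthEnd(2)).mean()

print(yearly)
print(two_monthly)
```

Here `yearly` has a single bin labeled 2020-12-31 with a mean of 766.0, which matches the yearly figure for 2020 in this post.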

**a) Group by Year**

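```
# Grouper
# numeric_only=True skips the text column 'product'
df.groupby(pd.Grouper(freq = 'Y')).mean(numeric_only = True)
#OR
# resample
df.resample(rule = 'Y').mean(numeric_only = True)
# Note: pandas 2.2 and later deprecate 'Y' in favor of 'YE'
```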

We will use the mean() calculation throughout this article. You can use other statistical computations depending on your need.

**Out:**

| date | sales |
|---|---|
| 2020-12-31 | 766.00 |
| 2021-12-31 | 541.00 |
| 2022-12-31 | 812.25 |
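The mean is only one of the statistics you can compute per bin. As a rough sketch, `agg` can return several of them at once. The Series `sales` below is an assumption made for this example: it holds the 2020 and 2021 rows of the dataframe, and the yearly bins are passed as an offset object so that the snippet does not depend on which alias spelling your pandas version accepts.

```python
import pandas as pd

# 2020 and 2021 rows of the example data, as a Series
sales = pd.Series(
    [955, 889, 364, 856, 754, 328],
    index=pd.to_datetime(["2020-01-11 06:29:57", "2020-02-12 06:30:01",
                          "2020-03-14 08:37:22", "2020-04-16 23:11:13",
                          "2021-05-18 23:21:43", "2021-06-20 23:22:36"]),
    name="sales",
)

# One row per year, one column per statistic
summary = sales.groupby(pd.Grouper(freq=pd.offsets.YearEnd())).agg(
    ["mean", "sum", "count"]
)
print(summary)
```

The `count` column is a quick way to see how many rows landed in each bin, which helps when interpreting a mean.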

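**b) Group by 2 months**

```
# pandas 2.2 and later deprecate 'M' in favor of 'ME'
df.groupby(pd.Grouper(freq = '2M')).mean(numeric_only = True).head()
```

**Out:**

Each bin is labeled with the month end that closes it, so the 2020-03-31 row is the mean of the February and March sales (889 and 364).

| date | sales |
|---|---|
| 2020-01-31 | 955.0 |
| 2020-03-31 | 626.5 |
| 2020-05-31 | 856.0 |
| 2020-07-31 | NaN |
| 2020-09-30 | NaN |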

**c) Group by 15 days**

```
df.groupby(pd.Grouper(freq = '15D')).mean(numeric_only = True).head()
```

**Out:**

| date | sales |
|---|---|
| 2020-01-11 | 955 |
| 2020-01-26 | NaN |
| 2020-02-10 | 889 |
| 2020-02-25 | NaN |
| 2020-03-11 | 364 |

**d) Group by 10 minutes**

```
df.groupby(pd.Grouper(freq = '10min')).mean(numeric_only = True).head()
```

**Out:**

| date | sales |
|---|---|
| 2020-01-11 06:20:00 | 955 |
| 2020-01-11 06:30:00 | NaN |
| 2020-01-11 06:40:00 | NaN |
| 2020-01-11 06:50:00 | NaN |
| 2020-01-11 07:00:00 | NaN |
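Most of the rows above are NaN because a bin is created for every 10-minute interval between the first and last timestamp, whether or not any data falls into it. If only the non-empty bins are of interest, one option is to drop the rest after aggregating. The sketch below assumes a hypothetical two-row Series, `sales`, whose timestamps are about two hours apart.

```python
import pandas as pd

# Two hypothetical sales about two hours apart on the same day
sales = pd.Series(
    [955, 889],
    index=pd.to_datetime(["2020-01-11 06:29:57", "2020-01-11 08:30:01"]),
    name="sales",
)

# Every 10-minute bin between the two timestamps is created
binned = sales.resample("10min").mean()
print(len(binned), "bins,", int(binned.isna().sum()), "of them empty")

# Keep only the bins that contain data
non_empty = binned.dropna()
print(non_empty)
```

Dropping the empty bins keeps the bin labels (06:20 and 08:30 here) while removing the rows that carry no information.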

**Group by time interval and start the grouping at a specific time**

We want to group the dataframe into bins of 1 hour and 30 minutes and start the bins at 05:45. The offset parameter lets you add an offset to the origin of the bins. It takes either a Timedelta or a str, and its default is None.

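```
# Grouper
df.groupby(pd.Grouper(freq = '1H30min', offset = '-15min')).mean(numeric_only = True).head()
#OR
# resample
df.resample(rule = '1H30min', offset = '-15min').mean(numeric_only = True).head()
# Note: pandas 2.2 and later deprecate 'H' in favor of 'h'
```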

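We have set the offset to negative 15 minutes. By default the 1 hour 30 minute bins are aligned to midnight of the first day, so one of them would start at 06:00. Shifting the bins back by 15 minutes makes the first bin start at 05:45.

**Out:**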

| date | sales |
|---|---|
| 2020-01-11 05:45:00 | 955.0 |
| 2020-01-11 07:15:00 | NaN |
| 2020-01-11 08:45:00 | NaN |
| 2020-01-11 10:15:00 | NaN |
| 2020-01-11 11:45:00 | NaN |
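To see what the offset changes, it can help to compare the bin labels with and without it. This is a small sketch on a hypothetical two-row Series named `sales`. The interval is written as `'90min'`, which is the same length as 1 hour 30 minutes and is assumed here only to keep the alias spelling independent of the pandas version.

```python
import pandas as pd

# Two hypothetical sales on the same morning
sales = pd.Series(
    [955, 889],
    index=pd.to_datetime(["2020-01-11 06:29:57", "2020-01-11 09:30:01"]),
    name="sales",
)

# Default: 90-minute bins aligned to midnight (..., 04:30, 06:00, 07:30, ...)
plain = sales.resample("90min").mean()

# Same bins shifted back by 15 minutes (..., 04:15, 05:45, 07:15, ...)
shifted = sales.resample("90min", offset="-15min").mean()

print(plain.index[0])    # first bin label without the offset
print(shifted.index[0])  # first bin label with the offset
```

The sale at 06:29:57 lands in the bin starting at 06:00 without the offset and in the bin starting at 05:45 with it. The bin length stays the same, and only the bin edges move.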