How To Calculate Date Difference Between Rows In Pandas

Image for post
Image for post
Photo by Aron Visuals on Unsplash

Problem:

For instance, You have some sample data for GPS tracking, and it has the start and end time at each location (latitude and longitude). You would like to calculate the time gap within each location or between two locations.

import pandas as pd 
import numpy as np
df = pd.read_excel("GPS Data.xlsx")
df.head(10)

For a quick view, you can see the sample data output as per below:

Image for post
Image for post

Solutions:

Data frame diff function is the most straightforward way to compare the values between the current row and the previous rows. By default, it compare the current and previous row, and you can also specify the period argument in order to compare the current row and current — period row. To calculate the time gap of the start time between two consecutive rows:

You shall see the below output:

Image for post
Image for post

If you check the date type of the result, you may see it is showing as dtype(‘<m8[ns]’), you can convert the result into integer minutes for easier reading.

In this case, you can use the below timedelta from numpy to convert the date difference into minutes:

df["Start Time"].diff().apply(lambda x: x/np.timedelta64(1, 'm')).fillna(0).astype('int64')

You shall see the below output:

Image for post
Image for post

You can also select multiple date columns as a data frame to apply the diff function.

Option 2: Using Series or Data Frame shift with — operator

Shift function allows us to move the values up/down or left /right to the given periods depends on what axis you have specified. You can imagine it is the same as Excel shift cells function.

To calculate the difference between the current and next row, you will need to shift the subtrahend column up 1 cell, below is how to calculate the difference between current End Time and the Start Time from the following row:

df["End Time"] - df["Start Time"].shift(1)

Yous shall see the below result:

Image for post
Image for post

If you want to calculate the difference for multiple date columns, you can use the data frame shift operation as per below:

df[["End Time", "Start Time"]] - df[["Start Time", "End Time"]].shift(1)
Image for post
Image for post

Option 3: Using data frame sub

The data frame sub function is self-explanatory by it’s name. You can either apply the subtraction at row level or column level by specifying the aixs argument. For our case, to calculate the date difference between two rows, you can use the original data frame to subtract another data frame which starts from the second row of the original data frame. Below is the code:

(df.loc[:,["Start Time", "End Time"]].sub(df.loc[0,["Start Time", "End Time"]], axis='columns')/np.timedelta64(1, "m")).astype("int64")

You can see the below output:

Image for post
Image for post

If you would like to calculate the gap between current End Time and next Start Time, you can use the below:

df["End Time"].sub(df["Start Time"].shift(1))

It should produce the same result as previously when we use — with shift.

Conclusion:

Originally published at https://www.codeforests.com

Resources and tutorials for python, data science and automation solutions

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store