Recently, I was trying to group consecutive rows in SQL using one and then multiple column values. I looked up the internet on how to do and couldn't find anything so finally when I was able to figure it out, I thought I should share this so that it can be of some help others who might be facing the problem.
The Problem
Let's start with defining and understanding the problem first. Let's say we are running a mobility company (take for example, Rapido). In order to solve a particular use case, we are collecting information on the time spent by each bike in different areas. Let's say this is represented as a table named lapse_time
in our database with bike_id
, area_id
and the corresponding time_spent
(in minutes).
CREATE table lapse_time(
bike_id varchar,
area_id VARCHAR,
time_spent INTEGER
)
Here is the information we have collected for example:

*Note: This data is sorted by bike_id
and then by the order of entry of the row in the database. For example, bike_1
first spent 120 minutes in area_1
. Then, it went to area_2
and spent 10 minutes over there and so on.
Now if you take a close look at row numbers 2, 3, and 4 in figure 1, you will observe that all these consecutive rows have the same area_id
and bike_id
, and as the data is sorted by the order of entry of the row in the database, it means that bike_1
spent 10 minutes in area_2
, then 20 minutes again in area_2
, and then again 40 minutes in area_2
. This looks very cumbersome and to solve the particular use case I was looking at, I needed to group together these 3 consecutive columns so that it conveys that bike_1
spent 70 (40 + 20 +10) minutes in area_2
. Let's take another example to make it abundantly clear. Both the consecutive rows, 6 and 7, have the same area_id
and bike_id
, and thus can be grouped to the time_spent
by the bike_2
inarea_2
as 37 (12 + 25) minutes.
Many of you might be wondering why at all have we got separate rows for the same bike_id
in the same area. That can be because of multiple reasons, primarily owing to the way how the data collection has been done. For example, the hardware might be collecting information at specific intervals (e.g. every 10 minutes or so) and calculating the time spent in that interval. It might be the case that a bike is at the same location in multiple intervals, in which case we get separate rows. So, this scenario is expected in real-time data.
The Solution
So, now let's try to solve this problem. The end goal here is to convert what it looks like in the figure 2 to that in the figure 3.

To this →

I personally encountered this problem while I was trying to calculate median idle time_spent
in an area. If you take the data present in figure 2 and try to calculate the median idle time spent in let's say area_2
, you won't get the desired result as the data hasn't been arranged accordingly. In figure 3, we have arranged the data by merging consecutive rows having the same bike_id
and area_id
; thus making it perfect for all sorts of calculations. Merging consecutive rows according to values of multiple columns was something that wasn't easily available, which then inspired me to write a blog regarding it.
So, how to do this? Let's go step by step.
Let's first make a dummy table in postgres named as lapse_time
CREATE table lapse_time(
bike_id varchar,
area_id VARCHAR,
time_spent INTEGER
)
Now, let's insert some dummy data into it.
INSERT into lapse_time VALUES
('bike_1', 'area_1', 120),
('bike_1', 'area_2', 20),
('bike_1', 'area_2', 40),
('bike_1', 'area_2', 10),
('bike_1', 'area_3', 45),
('bike_2', 'area_2', 12),
('bike_2', 'area_2', 25),
('bike_2', 'area_3', 50)
We have a table exactly like the one in figure 2.
- Now, we need to add a
row_number()
to our table. This can be done simply using the following query. Let's call itquery_1
select *, ROW_NUMBER() OVER (ORDER BY bike_id) as rw_nm from lapse_time;
Here's the result:

We need to calculate LAG of the columns based on which we want to perform the sum (in this case, bike_id
and area_id
). So, let's call the following operation query_2
.
SELECT t.*,
LAG(bike_id) OVER (
ORDER BY rw_nm
) as prev_1,
LAG(area_id) OVER (
ORDER BY rw_nm
) AS prev_2
from ({query_1}) t;
Here's the result:

Now comes the tricky part. We'll need to use sum over window function
. In order to know more about the same, one should read more about window functions in postgres.
Before going to the actual query, here's one query that one should try to get clarity:
SELECT t.*,
SUM(1) OVER (
ORDER BY rw_nm
) as grp
from ({ query_2 }) t
Here's the result for this query

For the actual query, we would use this with a slight modification in the sum
function. We would be using CASE condition with sum, so here's the actual query. Let's call this query_3
.
SELECT t.*,
SUM(
CASE
WHEN bike_id = prev_1
AND area_id = prev_2 THEN 0
ELSE 1
END
) OVER (
ORDER BY rw_nm
) as grp
from ({ query_2 }) t
We would do the sum only when the column value is not equal to its lag value. Here's the result that we would get.

If we look at the grp
column over here, we would notice that the consecutive rows with the same values of bike_id
and area_id
have the same value for grp
. The final step is to use group by clause over the grp
column, here's the query for the same:
select bike_id,
area_id,
SUM(time_spent) as time_spent,
min(rw_nm) as rw_nm
from ({query_3}) t
GROUP by grp,
bike_id,
area_id
ORDER by rw_nm
And here's the result:

Now one can further go above this and select just the bike_id
, area_id
and time_spent
from this table to get the exact result as in the Figure 3.
The final query
Aggregating all the steps mentioned above, here's the final query for merging consecutive rows in SQL using one or multiple columns
select bike_id,
area_id,
SUM(time_spent) as time_spent,
min(rw_nm) as rw_nm
from (
SELECT t.*,
SUM(
CASE
WHEN bike_id = prev_1
AND area_id = prev_2 THEN 0
ELSE 1
END
) OVER (
ORDER BY rw_nm
) as grp
from (
SELECT t.*,
LAG(bike_id) OVER (
ORDER BY rw_nm
) as prev_1,
LAG(area_id) OVER (
ORDER BY rw_nm
) AS prev_2
from (
select *,
ROW_NUMBER() OVER (
ORDER BY bike_id
) as rw_nm
from lapse_time
) t
) t
) t
GROUP by grp,
bike_id,
area_id
ORDER by rw_nm
Further Reading
- This problem can further be generalized as recognition of patterns in SQL and rapid detection of unbroken sequences, better known as gaps and island problem.
- I personally faced this problem while working to measure fleet utilization and idle time. I have written another detailed blog post on the same. You might find interesting to deep dive into that problem here:

Read more on the problem we are solving at Locale.ai:
