Generating actionable insights from the data collected by location analytics helps a business grow and improve its operational efficiency. Here, we focus on the technical problem of analysing fleet data and measuring various metrics from it in real time. By analysing fleet utilisation, we can answer important questions such as:
At what time of the day are most of the fleets idle? In which areas is the time to get a ride minimal? In which lap of the journey are they idle?

In this blog, we talk about what fleet utilisation is and why it is important to measure. We also briefly describe a solution for measuring fleet utilisation and the complexities associated with it.

What is a Fleet in the Context of Location Analytics?

A fleet or a supplier in the location analytics context is any person or entity who delivers goods or services based on customer requests. Here are some examples which can be considered:

  • For a hyperlocal delivery company such as Swiggy or DoorDash, the delivery personnel are the fleet. They get orders from restaurants and deliver them.
  • For a cab service provider such as Ola or Uber, the cabs are the fleets; they get trips from users and complete those trips.
  • For a bike-sharing company such as Bounceshare, the bikes themselves are the fleets; a customer picks one up and drops it off after completing the trip.
  • For a logistics company such as FedEx or DHL, the drivers are the fleet. They pick up goods and drop them off at different locations.
Source: Here360

How is the Fleet Data Captured?

For the fleet data to make sense, 4 metrics must be captured. More data can be captured, but it has to include at least these 4 fields:

  • Fleet unique identifier e.g. bike_id
  • Current location of the fleet
  • Current status of the fleet: This is a very important metric, as the status identifies the fleet's current state. A typical set of statuses is [idle, busy, out_of_operation].
  • Timestamp of captured data

This data can be captured using an app that sends pings at a regular interval, a sensor attached to a bike or cab that is programmed to send pings at a regular interval, or any similar system developed for this purpose.
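As a rough illustration, the four required fields could be modelled as a small record type. This is a minimal sketch, not a prescribed schema: the names `Ping`, `Status`, `parse_ping` and the payload keys (`fleet_id`, `lat`, `lng`, `status`, `ts`) are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Status(Enum):
    """Example statuses; a real deployment may define others."""
    IDLE = "idle"
    BUSY = "busy"
    OUT_OF_OPERATION = "out_of_operation"


@dataclass(frozen=True)
class Ping:
    fleet_id: str        # unique identifier, e.g. a bike_id
    lat: float           # current latitude in degrees
    lng: float           # current longitude in degrees
    status: Status       # current state of the fleet
    timestamp: datetime  # when the ping was captured (UTC)


def parse_ping(raw: dict) -> Ping:
    """Validate a raw payload (hypothetical keys) and build a Ping."""
    lat, lng = float(raw["lat"]), float(raw["lng"])
    if not (-90 <= lat <= 90 and -180 <= lng <= 180):
        raise ValueError(f"coordinates out of range: {lat}, {lng}")
    return Ping(
        fleet_id=str(raw["fleet_id"]),
        lat=lat,
        lng=lng,
        status=Status(raw["status"]),  # raises ValueError on unknown status
        timestamp=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
    )
```

Rejecting malformed pings at this stage keeps bad coordinates or unknown statuses out of everything computed later.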

Required Analysis

Now, considering the minimal data we are getting (the 4 metrics mentioned previously), we need to answer questions like:

  • In which area is the time to get a ride minimal, and in which areas does it take a long time to get a ride?
  • At what time of the day are most of the fleets idle?
  • On average, how much time do the fleets spend idle, and in which area?
  • What is the pattern of idle time during peak hours?
  • In which lap of the journey/delivery are they idle?

Required Metrics

In order to generate the analysis from the raw data, here are the metrics that would be required.

1. Time Spent in a Specific State

In order to answer questions like "How much time do all the fleets spend idle?" or "At what time of the day are most of the fleets idle?", we need to know the time spent by a fleet in a particular state. This is captured by the start timestamp and end timestamp of that fleet state.

2. Area

In order to answer questions like "In a given area, when is the time to get a ride minimal?", we need a way to classify areas on a map. Although this seems like a big problem in itself, it has already been solved. To classify areas, we can use the H3 library developed by Uber to divide the map into smaller hexagonal grids. These grids are available at various resolutions. For more details on H3, refer to its official documentation.

H3 on GitHub (uber/h3): a hexagonal hierarchical geospatial indexing system.

The hexagons which can be seen in the image are called grids, and each of them has a unique grid ID. Moreover, these grids are defined for a particular resolution. Changing the resolution changes the grid IDs correspondingly.
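The idea of mapping a location to a resolution-dependent grid ID can be sketched without any dependency. Note that the function below is a square-grid stand-in, not H3: `square_cell` and `add_grid_id` are hypothetical names, and the cell IDs it produces are not H3 indexes. With the h3-py package installed, its lat/lng-to-cell function (`h3.latlng_to_cell` in version 4, `h3.geo_to_h3` in version 3) could be passed as `cell_fn` to get hexagonal cells instead.

```python
import math
from typing import Callable

# Signature shared by the stand-in below and by H3's lat/lng-to-cell lookup.
CellFn = Callable[[float, float, int], str]


def square_cell(lat: float, lng: float, resolution: int) -> str:
    """Stand-in for an H3 lookup: bucket a point into a square cell.

    The cell side is 1 / 2**resolution degrees, so a higher resolution
    gives smaller cells, and the ID changes with the resolution.
    """
    size = 1.0 / (2 ** resolution)
    row = math.floor(lat / size)
    col = math.floor(lng / size)
    return f"r{resolution}_{row}_{col}"


def add_grid_id(pings: list, resolution: int, cell_fn: CellFn = square_cell) -> list:
    """Return copies of the ping dicts with a 'grid_id' column added."""
    return [
        {**p, "grid_id": cell_fn(p["lat"], p["lng"], resolution)}
        for p in pings
    ]
```

Keeping the cell lookup pluggable means the rest of the pipeline only ever sees an opaque `grid_id`, whichever indexing scheme produced it.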

Thus, if we can generate records saying that a bike x was in state s in area/grid g from timestamp t1 to timestamp t2, the required analysis can be generated.

Basis of the Analysis Generated

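Thus, formalising the problem statement, we have the original data, where each ping carries [supplier id, timestamp, location, status].

And we have the required data that we need to generate; let's call it the analysis data. Each row carries [supplier id, start timestamp, end timestamp, grid, status].

A row of the analysis data shows that supplier x was in grid g with status s from timestamp t1 to timestamp t2.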

To generate the analysis data from the supplier data, we use an assumption based on the principle of continuity. The assumption states the following:

If a supplier x has an entry [t1, l, s] and the immediate next entry is [t2, l1, s1], then we assume that from t1 to t2, x was in location l with status s.


Consider the following example of pings from a particular supplier (note: we have already converted the lat, long sent into the corresponding grid).

If we get a ping that a supplier with id 123 was idle at 10:00 AM in area_1, and the immediate next ping from the same supplier shows that at 10:12 AM it was in area_2 and busy, we assume that the supplier was idle from 10:00 to 10:12 AM in area_1.
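The continuity assumption can be sketched in a few lines. This is an illustrative sketch only; `GridPing`, `RangeRow` and `pings_to_ranges` are names invented for this example, and the pings are assumed to already carry a grid ID.

```python
from datetime import datetime
from typing import List, NamedTuple


class GridPing(NamedTuple):
    supplier_id: str
    timestamp: datetime
    grid_id: str
    status: str


class RangeRow(NamedTuple):
    supplier_id: str
    start: datetime
    end: datetime
    grid_id: str
    status: str


def pings_to_ranges(pings: List[GridPing]) -> List[RangeRow]:
    """Apply the continuity assumption to consecutive pings.

    Each ping's grid and status are assumed to hold until the same
    supplier's next ping. A supplier's latest ping yields no row yet,
    because its end time is not known.
    """
    ordered = sorted(pings, key=lambda p: (p.supplier_id, p.timestamp))
    rows = []
    for cur, nxt in zip(ordered, ordered[1:]):
        if cur.supplier_id != nxt.supplier_id:
            continue  # never pair pings from different suppliers
        rows.append(
            RangeRow(cur.supplier_id, cur.timestamp, nxt.timestamp,
                     cur.grid_id, cur.status)
        )
    return rows
```

For the example of supplier 123, the two pings at 10:00 AM and 10:12 AM produce a single row covering 10:00 to 10:12 in area_1 with status idle.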

Once we have this analysis data, we can easily calculate the median idle time in an area, the total idle time spent and so on, using queries that aggregate over one of the columns in the above table.
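Such an aggregation might look like the following sketch, which works on rows held in memory rather than on a database table. The row layout (supplier_id, start, end, grid_id, status) and the function name `idle_stats_by_grid` are assumptions for illustration; in practice this would more likely be a grouped query on the analysis table.

```python
from collections import defaultdict
from statistics import median


def idle_stats_by_grid(rows):
    """Median and total idle minutes per grid.

    rows: iterable of (supplier_id, start, end, grid_id, status) tuples,
    where start and end are datetime objects.
    """
    durations = defaultdict(list)
    for _supplier, start, end, grid_id, status in rows:
        if status == "idle":
            durations[grid_id].append((end - start).total_seconds() / 60)
    return {
        grid_id: {"median_min": median(mins), "total_min": sum(mins)}
        for grid_id, mins in durations.items()
    }
```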

Unpacking the Solution

The solution to the problem is fairly complicated, especially if we are doing it in real time. It requires maintaining a separate table for the analysis data and a pipeline from the supplier data to the analysis data, against which the queries are executed. Below, we describe the pipeline in brief. The supplier data is:

We then add grid IDs based on the lat and long columns and convert the data into a range table.

But having just this data isn't sufficient, as there is redundancy in rows 4 and 5 (as highlighted): the same supplier is in the same grid with the same status consecutively. This would prevent us from doing operations such as the median or average of time_spent over a status (here s1). The data should be grouped wherever a supplier_id is in the same grid with the same status consecutively, and it should look like the one given below.
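The grouping step can be sketched as a single pass that merges back-to-back rows. This is a minimal illustration under assumptions: rows are (supplier_id, start, end, grid_id, status) tuples, `group_consecutive` is a hypothetical name, and rows are merged only when one ends exactly where the next begins, which holds for ranges built from consecutive pings.

```python
def group_consecutive(rows):
    """Merge consecutive rows with the same supplier, grid and status.

    rows: iterable of (supplier_id, start, end, grid_id, status) tuples.
    Timestamps can be any comparable values (datetimes, epoch seconds).
    """
    grouped = []
    for row in sorted(rows, key=lambda r: (r[0], r[1])):
        supplier, start, end, grid, status = row
        if grouped:
            p_supplier, p_start, p_end, p_grid, p_status = grouped[-1]
            same_state = (p_supplier, p_grid, p_status) == (supplier, grid, status)
            if same_state and p_end == start:
                # Extend the previous row instead of adding a duplicate.
                grouped[-1] = (p_supplier, p_start, end, p_grid, p_status)
                continue
        grouped.append(tuple(row))
    return grouped
```

A supplier that returns to a grid after visiting another one still gets a separate row, so each row represents one uninterrupted stay.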

Final Analysis table

Now this data contains all the information required to answer the questions asked at the start of the blog. For instance, take a ride-hailing company such as Uber, with the statuses of cars/bikes/fleets being idle, busy, or oos (out of operation). We can estimate the average time to get a ride in an area by calculating the average time spent in the idle status in that area's grid.

Thus, the table shown above, made by grouping the range table, is the final analysis table used for queries regarding supplier analysis.
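One of the opening questions, at what time of the day most fleets are idle, needs slightly more than a plain aggregate, because a single idle range can span several hours. One possible approach, sketched here under assumptions (tuple rows of supplier_id, start, end, grid_id, status; the hypothetical name `idle_minutes_by_hour`), is to split each idle range at hour boundaries and sum the pieces per hour of day.

```python
from collections import defaultdict
from datetime import timedelta


def idle_minutes_by_hour(rows):
    """Total idle minutes per hour of day (0-23).

    rows: iterable of (supplier_id, start, end, grid_id, status) tuples,
    where start and end are datetime objects.
    """
    totals = defaultdict(float)
    for _supplier, start, end, _grid, status in rows:
        if status != "idle":
            continue
        cursor = start
        while cursor < end:
            # Cut the range at the next full hour so that every piece
            # falls entirely within a single hour of the day.
            next_hour = cursor.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
            piece_end = min(end, next_hour)
            totals[cursor.hour] += (piece_end - cursor).total_seconds() / 60
            cursor = piece_end
    return dict(totals)
```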

Why is it So Hard to do in Real Time?

Doing the analysis in real time, in our context, essentially means updating the location analytics data at constant intervals. For every batch of incoming fleet data, we update the supplier table and then the final analysis/group table that is used for queries. Updating the supplier data isn't really the problem, as we just have to dump the new pings into the original supplier table. The real challenge lies in updating the final analysis table.

  • For the analysis table, the analysis generated from new supplier data also depends on the previous supplier data: without taking the prior pings of a supplier into consideration, we cannot generate the range table accurately. We need to query the prior supplier data and build the range table from it, after which the grouping of data takes place.
  • The next nuance is that we cannot directly insert the grouped data into the analysis table, as there are edge cases that we need to handle. This process is as complicated in code as it sounds, which makes real-time location analytics updates hard.
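To make the dependency on prior pings concrete, here is a simplified in-memory sketch of an incremental update. It is not the production pipeline described in this blog: the class `AnalysisTable` and its fields are hypothetical, a real system would persist this state in a database, and late or duplicate pings are simply dropped here, which is one of the edge cases a real pipeline has to treat more carefully.

```python
class AnalysisTable:
    """In-memory stand-in for the analysis table."""

    def __init__(self):
        # Finished rows: (supplier_id, start, end, grid_id, status)
        self.closed = []
        # Latest unfinished stay: supplier_id -> (start, last_seen, grid_id, status)
        self.open = {}

    def ingest(self, supplier_id, timestamp, grid_id, status):
        """Apply one new ping; timestamps can be any comparable values."""
        current = self.open.get(supplier_id)
        if current is None:
            # First ping for this supplier: start tracking a stay.
            self.open[supplier_id] = (timestamp, timestamp, grid_id, status)
            return
        start, last_seen, cur_grid, cur_status = current
        if timestamp <= last_seen:
            return  # late or duplicate ping: ignored in this sketch
        if (grid_id, status) == (cur_grid, cur_status):
            # Same grid and status: extend the open stay, add no new row.
            self.open[supplier_id] = (start, timestamp, cur_grid, cur_status)
        else:
            # State changed: by the continuity assumption the previous
            # stay lasted until this ping, so close it and open a new one.
            self.closed.append((supplier_id, start, timestamp, cur_grid, cur_status))
            self.open[supplier_id] = (timestamp, timestamp, grid_id, status)
```

Keeping the latest open stay per supplier is what lets a new ping be merged with earlier data without rebuilding the whole range table.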

The Pipeline

Thus, to sum it up, here is the pipeline that has to be developed in order to measure fleet utilisation in real time.

Using Location Analytics Platform as a Solution

We, at Locale, have solved location analytics problems and their related complexities using industry best practices, to improve operational efficiency.

You have the ability to customise the solution based on your industry. For example, a ride-hailing company can use this to measure the idle time of its vehicles or driver fleet. Similarly, a hyperlocal delivery service can track the idle time of its delivery personnel, and an e-commerce or logistics company can track the time not spent delivering goods.

These location insights are key to ensuring that your fleets and supply are being used to their fullest potential. With Locale, these hyperlocal insights about your fleets can be derived in a matter of a few minutes as opposed to weeks. We are dedicated to solving operational challenges and enabling companies to make the most out of their location data.

If you want to know more or want a custom build for your company, get in touch with Aditi Sinha, Co-founder at Locale on LinkedIn or Twitter.
