Geo is practically everywhere today!
With GPS all around us and the rise of smartphones, location data is being collected in abundance. An often-repeated (though hard to verify) estimate says that around 80% of the data companies store has a location component.
Our lives are now filled with products and services available at the tap of a phone, getting things delivered in minutes to our doorstep! Today, it is hard to find an app which doesn’t ask for your location permissions.
There is a famous quote in the GIS (Geographic Information Systems) world which says “everything happens somewhere.” When you place an order on a food delivery app, every event happens at a certain lat-long: the order being placed, a delivery executive (DE) being assigned to it, the DE picking it up from the restaurant, and the DE delivering it!
Which brings me to my next point:
Why are we obsessed with “location” as a dimension?
Firstly, there is no “mobility” without dynamic location. In other words, location is a fundamental aspect whenever there is movement of assets (people, vehicles, cargo, parcels) on the ground. After all, in cities and towns where conditions change from one square kilometre to the next, it’s important to add the “context” of those areas.
For the on-demand economy that we live in, analyzing location data in real time across all your dimensions (users, stores, partners) becomes critical. This is because you have to match your supply with demand, price and promote by location, and make deliveries faster and more accurate at scale!
Secondly, location also becomes crucial for analysis whenever there is a question of “where” in a business problem. For instance, where should I open my new store? Where should I launch campaigns and products? Where do I send my delivery personnel to?
And finally, the most successful location-based experiences for consumers are based on sequential activities at a very granular level: Where are you? Where do you need to go? What do you want to do when you get there? This becomes important when you need to acquire customers by targeting ads, and to retain and engage them by studying their behavior.
Location is exciting! If played right, it can drive significant revenue improvements (case in point: Uber & Airbnb). Its high usage and real-time nature make it really valuable and sticky.
Yet all this time, the location component has been particularly neglected in business decisions. This is because conventional analytics tools are fantastic for the treasure trove of statistical data that businesses have, but they fall short when it comes to location-based decision-making.
You might be wondering: “But, why?”
Before we deep-dive into that, let’s understand how geospatial data is different from statistical data.
What sets geospatial data apart from statistical data?
Just as text, sound, and images are different kinds of data, latitude-longitude is a different kind of data that can add immense depth, meaning and insights to statistical data in a space-time context.
Statistical data comes in a tabular format and usually comprises two elements: values and time. Geospatial data (also called geographic data or location data) is often represented as points (latitude-longitude coordinates), polygons or polylines.
It adds a third dimension on top, so you are dealing with values across time and dynamic location, which requires a completely different approach and treatment.
Two special properties of geospatial data are autocorrelation and non-stationarity, which make it difficult to meet the assumptions and requirements of traditional (nonspatial) statistical methods, like OLS regression.
Spatial autocorrelation describes how similar (correlated) a measure is between nearby observations. It is in line with Tobler's first law of geography: “everything is related to everything else, but near things are more related than distant things”.
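One common way to put a number on spatial autocorrelation is Moran's I. It is not mentioned in this post and is brought in here purely as an illustration. Here is a minimal sketch using only the Python standard library. The six-cell "street" and its adjacency weights are made up for the example: each cell is a neighbour of the cells directly next to it.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n spatial weight matrix.

    I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2
    where W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    variance_sum = sum(d * d for d in dev)
    return (n / w_total) * (cross / variance_sum)


def line_adjacency(n):
    """Hypothetical weights: cells on a line, each adjacent to its direct neighbours."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


weights = line_adjacency(6)

clustered = [1, 1, 1, 5, 5, 5]    # similar values sit next to each other
alternating = [1, 5, 1, 5, 1, 5]  # neighbours are always different

print(morans_i(clustered, weights))    # positive: about 0.6
print(morans_i(alternating, weights))  # negative: about -1.0
```

A clearly positive value means nearby observations look alike, which is exactly the situation that breaks the independence assumption behind methods like OLS regression.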
Spatial nonstationarity is a condition in which a simple “global” model cannot explain spatial relationships between the variables. The nature of the model alters over space and behaves differently in different parts of the study area.
Let’s answer our pending question now:
Why are current statistical tools not apt for geospatial analysis?
Today, the way companies deal with location data is broken. Because there are so few location analytics products out there, most businesses have no choice but to rely on traditional BI and analytics tools. Unsurprisingly, this is a highly ineffective strategy, because these tools are not really meant for geo-analysis.
Most companies collect location log data in the form of “pings”. The way to analyze the patterns they encompass is to have an infrastructure that can ingest these pings in real-time — something that platforms like Periscope or Tableau don’t cater to.
Moreover, location data is spread across disparate databases, in different structures. Hence, slicing and dicing variables across tables in real-time becomes even more complex.
All of us know that a statistical dashboard contains bars and charts which come from carrying out operations (sum, count, divide, average, etc.) on variables. While getting live trend updates through spikes and dips on graphs might be helpful, these charts work better on aggregated historical data.
Adding or dividing lat-longs and creating bars and charts on them is pretty futile. To make sense of these lat-longs, you need to have a map by your side to understand their spatial distribution!
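To see why plain arithmetic on lat-longs goes wrong, here is a small sketch with two made-up points on the equator, sitting on either side of the 180° meridian. The function names are hypothetical. Averaging the raw numbers puts the "centre" on the opposite side of the planet, while averaging the points as 3D unit vectors keeps it between them.

```python
import math


def naive_center(points):
    """Average latitudes and longitudes as if they were ordinary numbers."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return sum(lats) / len(lats), sum(lons) / len(lons)


def spherical_center(points):
    """Average the points as unit vectors on a sphere, then convert back to lat-long."""
    x = y = z = 0.0
    for lat, lon in points:
        phi, lam = math.radians(lat), math.radians(lon)
        x += math.cos(phi) * math.cos(lam)
        y += math.cos(phi) * math.sin(lam)
        z += math.sin(phi)
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon


# Two hypothetical points about 2 degrees of longitude apart, across the antimeridian.
points = [(0.0, 179.0), (0.0, -179.0)]

print(naive_center(points))      # longitude 0: the wrong side of the globe
print(spherical_center(points))  # longitude close to 180 in absolute value
```

The same caveat applies to sums, ratios and bar charts built directly on coordinates: the numbers only make sense once they are treated as positions on a map.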
Another important aspect of a geospatial dashboard is the layer. Without a layering mechanism that lets you display multiple types of data points on the same map, a dashboard misses the point. For instance, layers help you see how your orders (first layer) and partner locations (second layer) are distributed across your area clusters (third layer).
Maps also support better inferences than bars and charts when components are moving on the ground. Hence, real-time geographic analysis becomes fundamental when everything is dynamic.
Traditional BI tools like PowerBI and Geckoboard offer the capability to plot points on a map. However, just plotting points is not adequate, because a billion dots on a map are not very intuitive.
Location intelligence is so much more than tracking and plotting points on a map!
Strategies like clustering, heat mapping, aggregation, indexing, etc. come in handy to absorb a large number of points.
Some tools like Tableau and Periscope also allow the creation of heat maps, a fantastic way to depict the patterns of metrics. The disadvantage of heat maps is that they are only a visual representation, so you cannot run any further analysis on them.
A more efficient way to aggregate points is to index them on hexagonal grids or geohashes. Once you analyze the pattern of your metrics (such as demand and supply) across grids, you can use the cells as a single unit in your models as well. Indexing also helps you go very granular in your analysis.
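As a rough illustration of grid indexing, the sketch below encodes points into geohash cells with the standard geohash scheme (interleaved longitude and latitude bits, written in base 32) and then counts points per cell. The helper `count_by_cell` and the order coordinates are made up for the example. Hexagonal grids follow the same idea but need a dedicated library.

```python
from collections import Counter

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=6):
    """Encode a lat-long into a geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # geohash bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def count_by_cell(points, precision=6):
    """Aggregate (lat, lon) points into a count per geohash cell."""
    return Counter(geohash_encode(lat, lon, precision) for lat, lon in points)


# Hypothetical order locations (lat, lon).
orders = [(12.9352, 77.6245), (12.9353, 77.6246), (12.9716, 77.5946)]
for cell, n_orders in count_by_cell(orders, precision=6).items():
    print(cell, n_orders)

print(geohash_encode(42.6, -5.6, precision=5))  # ezs42
```

Each cell ID can then be used as a join key or a model feature, for example to compare demand and supply per cell. A shorter precision gives coarser cells and a longer one gives finer cells.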
If you want to read about grids, you can check this out:
(iv) Data Preparation:
Cleaning: Using Periscope and Thoughtspot, you can clean your statistical data by taking care of blanks, spaces, data formats, NAs, and so on. Cleaning GPS data, in contrast, involves snapping points back to the nearest road and correcting spatial outliers. (You must have noticed while using Google Maps that the GPS position sometimes jumps far away at random.) It is safe to say that GPS as a technology still has miles to go!
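One simple way to catch those random GPS jumps is a speed check: if getting from the last good ping to the next one would require an impossible speed, the ping is dropped. The sketch below is one possible approach, not a description of any particular product. The ping format `(timestamp_s, lat, lon)`, the 40 m/s threshold and the sample trace are all assumptions. Snapping to roads (map matching) needs a road network and is not shown.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat-long points, in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lam = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lam / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def drop_speed_outliers(pings, max_speed_mps=40.0):
    """Keep pings reachable from the previous kept ping at a plausible speed.

    pings: list of (timestamp_s, lat, lon), sorted by time.
    Assumes the first ping is trustworthy.
    """
    if not pings:
        return []
    kept = [pings[0]]
    for t, lat, lon in pings[1:]:
        t_prev, lat_prev, lon_prev = kept[-1]
        dt = t - t_prev
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp
        speed = haversine_m(lat_prev, lon_prev, lat, lon) / dt
        if speed <= max_speed_mps:
            kept.append((t, lat, lon))
    return kept


# Hypothetical trace: the third ping jumps roughly 11 km in 5 seconds.
trace = [
    (0, 12.9350, 77.6200),
    (5, 12.9352, 77.6201),
    (10, 13.0352, 77.6201),
    (15, 12.9354, 77.6203),
]
print(drop_speed_outliers(trace))  # the ping at t=10 is removed
```

The threshold should reflect how the asset actually moves: a two-wheeler in city traffic and a long-haul truck would need different limits.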
Merging: Platforms like Tableau Prep or Metabase allow you to merge two tables using an inner join, left join or a right join on the basis of a common identifier. It’s quite difficult to do spatial merges using these platforms if, for instance, you have data across three dimensions: users, delivery partners, and stores (sometimes they come in different formats).
A spatial join appends columns from one layer's feature table to another's based on the spatial relationship between their features. For example, merging a Koramangala area polygon with all the ride pickup points inside it.
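The core of such a join is a point-in-polygon test. Below is a minimal sketch that uses the classic ray-casting test and attaches an area name to each pickup point. The rectangle labelled "Koramangala" is a made-up box rather than the real boundary, and the function and field names are illustrative. The test treats coordinates as flat x/y, which is a reasonable approximation for city-sized polygons.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test. polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def spatial_join(points, areas):
    """Attach the name of the first containing area to each point (None if no match).

    points: list of dicts with 'lon' and 'lat' keys.
    areas: dict mapping area name -> polygon.
    """
    joined = []
    for p in points:
        area = next(
            (name for name, poly in areas.items()
             if point_in_polygon(p["lon"], p["lat"], poly)),
            None,
        )
        joined.append({**p, "area": area})
    return joined


# Hypothetical area polygon (a rough box, NOT the real boundary).
areas = {
    "Koramangala": [(77.61, 12.92), (77.64, 12.92), (77.64, 12.95), (77.61, 12.95)],
}
pickups = [
    {"ride_id": 1, "lon": 77.625, "lat": 12.935},  # inside the box
    {"ride_id": 2, "lon": 77.700, "lat": 12.935},  # outside the box
]
for row in spatial_join(pickups, areas):
    print(row)
```

With many polygons and millions of points, a brute-force loop like this gets slow, which is one reason spatial indexes and dedicated geospatial libraries exist.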
Enriching: Enriching of any data implies adding new layers of information and merging it with third-party or external data sources. In the GIS world, we enrich spatial data for a better context of areas in which the points are present. This means adding the environmental layer of roads, traffic, weather, points of interest, demographics, buildings, etc.
The internal solutions not working out?
Some companies, of course, realize this and hack around open-source tools (like Kepler.gl or QGIS). But these open source tools come with their own list of constraints and limitations.
Read more about their limitations here:
However, the issue doesn’t get resolved here because geospatial data itself comes with a bucket full of challenges.
Performing geospatial queries on streaming data becomes very compute-intensive, and legacy technologies (like ArcGIS) provide very little support. The complexity increases further when you visualize large geospatial datasets with any sort of interactivity at scale.
Sometimes developers also build their own internal tools, but most of the time these are not well suited to the different audiences inside the company. Since the tools are not built in a scalable way, maintaining them often eats up a lot of developer bandwidth!
A lot of times there is even a repetition of effort and the wheel keeps getting re-invented over and over again. As Jeff Lawson from Twilio said — “It is easier than ever to build software but harder than ever to operate it”.
Why are we building Locale.ai?
It all started with a personal problem. As data scientists working with geospatial data, we found the existing analytics products of little use in our daily workflows. Hence, we had to build our own tools and libraries for our everyday work.
We then realized data scientists around the globe face similar problems when it comes to location data. As a result, businesses are struggling to attain operational efficiency!
At Locale, we plan on solving these problems once and for all. We want to make all of these processes less painful and build a world where it is very easy to get all your geospatial answers in minutes! That’s why we are going back to the drawing board and handcrafting this experience completely from scratch.
So, the next time you want to order medicines in an emergency, you hopefully won’t read on the screen, “Delivery guys are not available. Please check again later.” Next time the delivery guys won’t have to stand idle in the scorching heat, cold or rain waiting for orders to come.
They can be incentivized to move to high-demand areas and earn more money. The push notifications that you get won’t be spam; they will be sent to you at the right place and the right time.
If you resonate with this problem and want to contribute, we are hiring for different roles. In case you would like a demo, get in touch with me on LinkedIn or Twitter.