Geo-spatial autocorrelation refers to the degree to which one object is similar to other nearby objects. “Auto” means self and “correlation” means association. In layman's terms, it measures how similar objects are to other objects close to them.
This idea is captured by the first law of geography, often called the first rule of GIS: “Everything is related to everything else, but near things are more related than distant things” (Waldo R. Tobler, 1970).
To understand this law, let’s take an example.
Suppose you randomly pick a house from a listing website like Housing, and assume that its price is $600K. Next, say the house right next to it is also listed for sale and you have to predict its price. You are given two options, $650K and $2.8M. Which one would you pick?
If you picked $650K, then you already subconsciously know what spatial autocorrelation is. It is the correlation between nearby objects in some common feature (for example, house prices).
A potential application of spatial autocorrelation is analyzing clustering and dispersion in ecology and disease. Questions such as “Is the disease an isolated case?” or “Is the rainfall pattern clustered or the same across places?” can be answered with spatial autocorrelation analysis.
Technically, spatial autocorrelation is a measure of association between observations of a variable that are close to each other in space. The variable could be measured:
1. at any point on a continuous surface (such as land-use type or annual precipitation level in a region)
2. at a set of fixed sites in a particular region (such as a set of retail outlets)
3. across a set of areas that subdivide a region (such as the count or proportion of households with two or more cars in census areas that divide an urban region).
Spatial autocorrelation violates a core assumption of classical statistics: that observations are independent of each other. Classical methods assume that observations between groups and observations within groups are independent, and spatially autocorrelated data clearly break that assumption.
The concept of spatial correlation is an extension of temporal correlation. The difference is that temporal correlation measures how a variable changes over time, while spatial correlation involves two things: the observation (values like income, rainfall, etc.) and its location.
Types of Spatial Correlation
The most common forms of spatial correlation are patches and gradients.
Spatial correlation in a variable can be exogenous (caused by another spatially auto-correlated variable like rainfall) or endogenous (caused by some process at play like the spread of a disease).
Here’s a video that discusses the importance of spatial autocorrelation in detail.
Spatial autocorrelation is commonly measured by Moran’s I, a correlation coefficient that measures the overall spatial correlation in your data set. Moran’s I values can be classified as positive, negative, or no spatial autocorrelation:
1. Positive correlation:
Spatial correlation is positive when similar values cluster together on a map. Positive autocorrelation occurs when Moran’s I is close to +1. The image below shows the land cover in an area; it is an example of positive correlation since similar values are near each other.
2. Negative correlation:
Spatial correlation is negative when dissimilar values sit next to each other on a map. Negative spatial autocorrelation occurs when Moran’s I is close to -1. A checkerboard is a good example of negative autocorrelation because dissimilar values are next to each other.
3. Zero correlation:
A Moran’s I value close to 0 denotes no spatial autocorrelation: values are arranged randomly across the map.
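To make the three cases concrete, here is a minimal Python sketch of global Moran’s I on a small grid. It assumes NumPy and binary rook-contiguity weights (cells that share an edge count as neighbours). The helper names `rook_weights` and `morans_i` are illustrative and not taken from any particular library, and other weighting schemes would give different values.

```python
import numpy as np

def rook_weights(n_rows, n_cols):
    """Binary weights for a regular grid: 1 if two cells share an edge, else 0."""
    n = n_rows * n_cols
    w = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if r + 1 < n_rows:              # neighbour below
                j = (r + 1) * n_cols + c
                w[i, j] = w[j, i] = 1.0
            if c + 1 < n_cols:              # neighbour to the right
                j = r * n_cols + (c + 1)
                w[i, j] = w[j, i] = 1.0
    return w

def morans_i(values, w):
    """Global Moran's I = (n / sum of weights) * (z' W z) / (z' z),
    where z holds the values minus their mean."""
    x = np.asarray(values, dtype=float).ravel()
    z = x - x.mean()
    return (x.size / w.sum()) * (z @ w @ z) / (z @ z)

w = rook_weights(4, 4)

# Checkerboard: every cell differs from all of its edge neighbours.
checkerboard = np.indices((4, 4)).sum(axis=0) % 2

# Two patches: left half is 0, right half is 1.
patches = np.zeros((4, 4))
patches[:, 2:] = 1

print(morans_i(checkerboard, w))  # about -1.0 (negative autocorrelation)
print(morans_i(patches, w))       # about 0.67 (positive autocorrelation)
```

The checkerboard reaches the negative extreme under these weights, while the two-patch grid scores clearly positive because most neighbouring pairs share the same value.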
Applications of Spatial Correlation
Spatial autocorrelation matters because it shows how strongly location affects a given object and whether objects have a clear relationship with their spatial properties. Here are some interesting industry use cases of spatial autocorrelation:
- Measure of Inequality: Spatial autocorrelation helps measure inequality and diversity, be it in terms of income, population, or race. Using Moran’s I, it analyzes whether parameters like income and population are clustered or uniformly distributed in a region. [Source]
- Environment: Spatial autocorrelation helps to spot contamination hotspots of rare earth elements in urban soils. [Source]
- Points of Interest: Autocorrelation is used to map variables of interest as a function of distance. For example, how far from the city center do house prices actually start decreasing?
- Ecology: Spatial autocorrelation is used widely in ocean and coral reef ecosystems for applications like site suitability analysis, for example to pinpoint areas for mussel longline farms or marine aquaculture planning.
- Demographics: Spatial autocorrelation is used to map and analyze voter turnout during elections. For example, it was used to map absenteeism during the French presidential and regional elections. [Source]
Case Study: Migration Analysis of the Italian Population
Autocorrelation has a large influence on migration analysis. This study is a reproduction of the paper here. It analyzes the migration of the foreign population in Italy.
Migration is a key factor in the evolution of population dynamics at different scales, with implications for the economy, culture, and environment. Using spatial autocorrelation, we identify spatial clusters representative of the concentration of migrants.
Technically, the Moran’s I coefficient here represents the difference between the weighted variance of the ratio of the foreign and local resident population and the generalized variance. In layman's terms, it expresses the correlation between the ratio of foreign to total population in a given place and the same ratio in the neighboring spatial units.
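The layman's reading can be sketched in a few lines of Python: compare each place's value with the average value of its neighbours (the "spatial lag"). The shares below are made-up numbers for six hypothetical municipalities along a line, not data from the study, and the row-standardised adjacency weights are an assumption of this sketch.

```python
import numpy as np

# Hypothetical share of foreign residents in six municipalities along a line
# (illustrative values only, not taken from the Italian study).
share = np.array([0.12, 0.11, 0.09, 0.04, 0.03, 0.02])

# Assumed neighbourhood structure: each municipality borders the next one.
n = share.size
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0
w = w / w.sum(axis=1, keepdims=True)   # row-standardise: each row sums to 1

z = share - share.mean()               # deviation from the overall mean
lag = w @ z                            # average deviation among the neighbours

# With row-standardised weights, Moran's I reduces to (z . lag) / (z . z),
# which equals the least-squares slope of the neighbours' average on the own value.
moran = (z @ lag) / (z @ z)
slope = np.polyfit(z, lag, 1)[0]

print(moran, slope)                    # the two numbers agree; both are positive here
```

A positive value means places with a high share tend to sit next to other places with a high share, which is the pattern the case study looks for.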
Using the LISA (Local Indicators of Spatial Association) index, we classified locations into three kinds:
1. Locations with high values of the phenomenon and a high level of similarity with their surroundings (high-high), defined as hot-spots.
2. Locations with low values of the phenomenon and a high level of similarity with their surroundings (low-low), defined as cold-spots.
3. Locations with high values of the phenomenon surrounded by low values, or vice versa (high-low and low-high). These are known as spatial outliers.
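The classification above can be sketched with a local Moran’s I computed per location. This is a simplified illustration that assumes NumPy and row-standardised weights; the function name `lisa_quadrants` and the sample values are hypothetical. A full LISA analysis also tests each location for statistical significance, usually with permutations, which is left out here.

```python
import numpy as np

def lisa_quadrants(values, w):
    """Return the local Moran's I and a quadrant label for every location."""
    x = np.asarray(values, dtype=float)
    z = x - x.mean()                              # deviation from the mean
    w = w / w.sum(axis=1, keepdims=True)          # row-standardise (copy, input untouched)
    lag = w @ z                                   # neighbours' average deviation
    local_i = z * lag / (z @ z / x.size)          # local Moran's I per location
    labels = np.where(
        z >= 0,
        np.where(lag >= 0, "high-high", "high-low"),
        np.where(lag >= 0, "low-high", "low-low"),
    )
    return local_i, labels

# Hypothetical values along a line: a high patch, a low patch, and one
# isolated high value at the end.
values = [9, 8, 9, 2, 1, 2, 9]
n = len(values)
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0               # adjacent locations are neighbours

local_i, labels = lisa_quadrants(values, w)
for v, li, lab in zip(values, local_i, labels):
    print(v, round(float(li), 2), lab)
```

In this toy example the first location comes out as high-high, the middle of the low patch as low-low, and the isolated high value at the end as a high-low outlier with a negative local Moran’s I.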
The areas where the migrant population settled could be divided into three clusters as follows:
- The first cluster was geographically concentrated in the north-eastern areas and represented values of positive correlation (type: high-high). These areas were characterized by increasing income opportunities and welfare, and hence attracted foreigners looking for employment.
- The second cluster was found in the central region and represented values of positive correlation (type: high-high). The areas showed similar characteristics of greater welfare.
- The third cluster was found in towns of Southern Italy (type: low-low). The areas were typically characterized by low incomes and fewer employment opportunities.
Spatial autocorrelation analysis not only identifies where similar objects cluster but also quantifies the degree of correlation or similarity. It is helpful in finding hidden patterns and relations, and it has many applications in ecology and demographics.
At Locale, we are building an “operational” analytics platform using location data for supply and operations teams in on-demand companies. If you want to delve further, check our website out or get in touch with Aditi on LinkedIn or Twitter.