The application of GIS is limited only by the imagination of those who use it— Jack Dangermond

Geospatial data? What is that?

The term geospatial is relatively new and has been gaining popularity since the 1980s. Data that contains geographic content is classified as geospatial data. This includes coordinates, postal codes, cities, and addresses. Coordinates alone, however, are not enough to understand a dataset. Each set of coordinates needs some description attached to it, for example the name of the location it represents. We call this attribute data.

Still not sure? Have a look at the data below.

VOLCANX020 : NAME        : LOCATION      : LATITUDE   :  LONGITUDE
   509     : Baker       : US-Washington : 48.7767982 : -121.8109970
   511     : GlacierPeak : US-Washington : 48.1118011 : -121.1110001
   513     : Rainier     : US-Washington : 46.8698006 : -121.7509995
   515     : St.Helens   : US-Washington : 46.1997986 : -122.1809998

Here, the number and name of the volcanoes are attribute data and the location, latitude, and longitude are geospatial data. Examples make things a thousand times easier, don’t they?
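To make the split concrete, here is a minimal sketch that parses the sample table and separates each row into attribute data and geospatial data. The helper names (`parse_table`, `split_record`) and the column grouping are illustrative assumptions, not part of any GIS library.

```python
# The sample table from above, kept as colon-separated text.
RAW_TABLE = """\
VOLCANX020 : NAME        : LOCATION      : LATITUDE   :  LONGITUDE
   509     : Baker       : US-Washington : 48.7767982 : -121.8109970
   511     : GlacierPeak : US-Washington : 48.1118011 : -121.1110001
   513     : Rainier     : US-Washington : 46.8698006 : -121.7509995
   515     : St.Helens   : US-Washington : 46.1997986 : -122.1809998
"""

# Columns treated as attribute data, following the article's own split.
ATTRIBUTE_COLUMNS = ("VOLCANX020", "NAME")


def parse_table(text):
    """Turn the colon-separated table into a list of dicts, one per row."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = [cell.strip() for cell in lines[0].split(":")]
    rows = []
    for line in lines[1:]:
        cells = [cell.strip() for cell in line.split(":")]
        rows.append(dict(zip(header, cells)))
    return rows


def split_record(row):
    """Separate one row into attribute data and geospatial data."""
    attributes = {key: row[key] for key in ATTRIBUTE_COLUMNS}
    geospatial = {
        "LOCATION": row["LOCATION"],
        "LATITUDE": float(row["LATITUDE"]),
        "LONGITUDE": float(row["LONGITUDE"]),
    }
    return attributes, geospatial


records = [split_record(row) for row in parse_table(RAW_TABLE)]
for attributes, geospatial in records:
    print(attributes, geospatial)
```

Each entry in `records` is a pair: the first dict holds the volcano number and name, the second holds the location and numeric coordinates.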

Types of data in GIS

As we already discussed, GIS data can be separated into two main categories: spatially referenced data and attribute data. Spatially referenced data can be further classified into two distinct types: vector and raster. Let’s dig into what the vector and raster data types are.

Vector Data

Vector data consists of individual points that are stored as pairs of (x, y) coordinates. These points can be joined in a particular order to create lines, or joined in closed rings to create polygons. Let’s go through all three types: point, line (or arc), and polygon.

Point data is commonly used to represent discrete data points and nonadjacent (not next to each other) features. On a map, we can vary the marker radius and color to differentiate features from each other. Examples include marking volcanoes (as in the banner image of this article) or locating government offices, schools, and shopping malls in a particular city.

Line data is, of course, used to represent linear features. These features have both starting and ending points. We commonly use solid lines versus dotted lines or a combination of colors or even line thickness to distinguish features from one another. Primary examples would be roads, rivers or metro lines.


Polygon data is used to represent the boundaries of lakes, cities, or even forests. These features are two-dimensional and can be used to measure the area of a geographic feature. The map below shows unemployment in the US, a good illustration of polygon data.

The unemployment rate in the US (city-wise)
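The three vector types can be sketched with plain coordinate lists. The example below is a minimal illustration under simplifying assumptions: the coordinates are made-up planar (x, y) values, and the helper names are hypothetical. Real latitude/longitude data would need a projection before lengths and areas are meaningful.

```python
import math

# Planar (x, y) coordinates in arbitrary units; illustrative values only.
point = (2.0, 3.0)
line = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
# A polygon is a closed ring: the first and last vertex are the same.
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]


def line_length(coords):
    """Sum the straight-line distance between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def polygon_area(ring):
    """Area of a simple closed ring using the shoelace formula."""
    twice_area = sum(
        x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    )
    return abs(twice_area) / 2.0


print("line length:", line_length(line))
print("polygon area:", polygon_area(polygon))
```

A point is a single pair, a line is an ordered list of pairs, and a polygon is a list that ends where it starts, which is what makes an area measurement possible.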

Format of Vector data

Vector and raster data each have their own file formats. It is difficult for any GIS analyst to process an unknown GIS file, so it’s good to know what file format you are going to work with. Some formats are common and some are not; let’s look at what they are and how they work.

  1. Esri Shapefile
    File type — .shp, .dbf, .shx, .prj
    Description — The shapefile is by far the most common geospatial file format. It is widely supported by both commercial and open-source software, to the point that it has become a de facto industry standard. A shapefile needs three mandatory files: .shp, .shx, and .dbf. Other files, such as .prj, can accompany them but are optional.
  2. Geographic JavaScript Object Notation
    File type — .geojson, .json
    Description — Commonly known as GeoJSON, it is one of the most widely used formats for web-based mapping. It stores coordinates as text in JSON form, covering vector points, lines, and polygons as well as tabular information. GeoJSON stores information (read: objects) within curly braces {} and is less intricate than GML, which we will look at next.
  3. Geography Markup Language
    File type — .gml
    Description — GML serves a similar purpose to GeoJSON but is XML-based. It stores information (or features) as text that can be easily updated in any text editor. Each feature has a list of properties along with its geometry: points, lines, curves, or polygons. However, as mentioned, GML is more intricate because it needs more data to express the same amount of information.
  4. Google Keyhole Markup Language
    File type — .kml, .kmz
    Description — This format is XML-based and is primarily used with Google Earth. It was initially developed by Keyhole Inc., which was later acquired by Google. KMZ (KML-Zipped) replaced KML as the default geospatial format for Google Earth because it is a compressed version of the KML file.
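To show what the GeoJSON structure described above looks like in practice, here is a minimal sketch that builds a FeatureCollection using only Python's standard `json` module. The two volcanoes come from the sample table earlier in the article; the property names (`id`, `name`) and the `to_feature` helper are illustrative choices. Note that GeoJSON orders coordinates as longitude first, then latitude.

```python
import json

# Two volcanoes from the sample table earlier in the article.
volcanoes = [
    {"id": 509, "name": "Baker", "lat": 48.7767982, "lon": -121.8109970},
    {"id": 513, "name": "Rainier", "lat": 46.8698006, "lon": -121.7509995},
]


def to_feature(volcano):
    """Build one GeoJSON Feature; coordinates are ordered [lon, lat]."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            "coordinates": [volcano["lon"], volcano["lat"]],
        },
        # The tabular (attribute) information lives under "properties".
        "properties": {"id": volcano["id"], "name": volcano["name"]},
    }


feature_collection = {
    "type": "FeatureCollection",
    "features": [to_feature(v) for v in volcanoes],
}

geojson_text = json.dumps(feature_collection, indent=2)
print(geojson_text)
```

Saving `geojson_text` to a file with a .geojson extension should give something most web mapping tools can load directly.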

Raster Data

The other type, raster data, is used to represent surfaces. It is cell-based data consisting of a matrix of cells (or pixels) organized into rows and columns (a grid), where each cell holds a value. To simplify a bit, a digital photo is an example of raster data in which each pixel value represents a particular color. Other examples of raster data are aerial photographs, digital elevation models, and scanned maps.
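A raster can be sketched as a nested list plus a small amount of georeferencing that ties cells to coordinates. The example below is a simplified illustration: the elevation values, origin, and cell size are all made up, and the helper names are hypothetical rather than taken from any raster library.

```python
# A tiny 3 x 4 raster: rows run north to south, columns west to east.
# Cell values are made-up elevations in metres.
elevation = [
    [120, 125, 131, 140],
    [118, 122, 128, 135],
    [115, 119, 124, 130],
]

# Hypothetical georeferencing: top-left corner and square cell size in degrees.
ORIGIN_X = -122.0  # longitude of the raster's left edge
ORIGIN_Y = 48.0    # latitude of the raster's top edge
CELL_SIZE = 0.5


def cell_center(row, col):
    """Return the (x, y) coordinate at the centre of a cell."""
    x = ORIGIN_X + (col + 0.5) * CELL_SIZE
    y = ORIGIN_Y - (row + 0.5) * CELL_SIZE
    return x, y


def value_at(x, y):
    """Look up the cell value that covers coordinate (x, y)."""
    col = int((x - ORIGIN_X) // CELL_SIZE)
    row = int((ORIGIN_Y - y) // CELL_SIZE)
    if not (0 <= row < len(elevation) and 0 <= col < len(elevation[0])):
        raise ValueError("coordinate falls outside the raster")
    return elevation[row][col]


print(cell_center(0, 0))
print(value_at(-120.3, 46.6))
```

The key idea is that a raster stores no coordinates per cell; the position of every cell follows from the origin, the cell size, and the row and column index.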


Format of Raster data

Raster data has its own file types too. Because rasters are made up of a grid, their cells are, in most cases, regularly spaced and square, but not always. Here are some of the formats.

  1. GeoTIFF
    File type — .tif, .tiff, .ovr
    Description — For GIS and satellite remote sensing applications, GeoTIFF has become an industry standard. The image itself is a .tif or .tiff file, often with an .ovr overview file beside it, and it may be accompanied by other files: .tfw, .xml, and .aux
  2. ERDAS Imagine
    File type — .img
    Description — It is a proprietary file format developed by Hexagon Geospatial. These files are commonly used to store single or multiple bands of satellite raster data. Imagine files use a hierarchical format that can optionally store basic information about the file.
  3. IDRISI Raster
    File type — .rst, .rdc
    Description — IDRISI uses the RST format for all raster layers, which consist of a numerical matrix of cell values stored as real numbers, integers, or bytes. The RDC (raster documentation) file is a companion text file to each RST file.
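Since knowing the file format up front matters, a simple lookup by file extension is often the first check. The sketch below uses only the extensions listed in this article; the `guess_format` helper is hypothetical, and an extension is only a hint (a .json file is not necessarily GeoJSON, and a .tif file is not necessarily a GeoTIFF).

```python
from pathlib import Path

# Extensions taken from the vector and raster format lists in this article.
FORMATS = {
    ".shp": ("vector", "Esri Shapefile"),
    ".geojson": ("vector", "GeoJSON"),
    ".json": ("vector", "GeoJSON"),
    ".gml": ("vector", "Geography Markup Language"),
    ".kml": ("vector", "Keyhole Markup Language"),
    ".kmz": ("vector", "Keyhole Markup Language (zipped)"),
    ".tif": ("raster", "GeoTIFF"),
    ".tiff": ("raster", "GeoTIFF"),
    ".img": ("raster", "ERDAS Imagine"),
    ".rst": ("raster", "IDRISI Raster"),
}


def guess_format(filename):
    """Return (data model, format name) for a file name, or None if unknown."""
    suffix = Path(filename).suffix.lower()
    return FORMATS.get(suffix)


for name in ["volcanoes.SHP", "dem.tif", "notes.txt"]:
    print(name, "->", guess_format(name))
```

A result of `None` is a signal to inspect the file more closely before handing it to a GIS tool.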

Data Sources

We now know what types of files are used in geospatial work. But where do we get them? Getting data from reliable sources makes it less likely that the files we fetch come in uncommon formats.

  1. Esri Open Data
    Data type — Spreadsheets, KML, shapefile, GeoJSON and more.
    This is like a hidden treasure of free GIS data, with over 67,300 open data sets from more than 4,000 organizations. In some cases, you might have to put in the effort of merging your downloads into one. Despite that effort, this single source is a strong bet for finding precisely what you are looking for.
  2. Natural Earth Data
    Data type — Cultural, physical and raster (basemap) data.
    It does an excellent job of meeting the needs of cartographers. On a large scale, all key cultural and physical vector GIS data are available for you to use. And the best part? It is in the public domain. That means you have the right to modify the data in any manner for your own use.
  3. USGS Earth Explorer
    Data type — Remote sensing data
    For people observing the earth and seeking remote sensing data, this is a go-to destination. It has a user-friendly interface and gives you access to one of the largest databases of aerial and satellite imagery. It even has a bulk download application, in case you need it.
  4. NASA’s Socioeconomic Data and Applications Center (SEDAC)
    Data type — Socioeconomic data
    SEDAC is all about human interaction with the environment. It has a wide range of data including (but not limited to) agriculture, hazards, health, population, sustainability, poverty, and water.
  5. UNEP Environmental Data Explorer
    Data type — Freshwater, population, forests, climate, emissions.
    It holds more than 500 variables, but the interface makes it somewhat difficult to explore the GIS data. Once there, you can filter by “Geospatial Data Sets” and download the data.
  6. DIVA-GIS
    Data type — Country, Global level data
    DIVA-GIS is a free computer program for mapping and geographic data analysis. But on the data page, you’ll find a good list of data sets ranging from global climate to species occurrence data.

Visualization tools

Packages

With the help of dedicated analysis packages, we can easily visualize small as well as large-scale data. Even if you have zero experience with these packages, the learning curve is minimal. So don’t worry, just dive in.

  1. Geoplot
    Geoplot is a high-level Python geospatial visualization library. It is an extension to cartopy and matplotlib (two more great visualization libraries) that makes mapping easy.
  2. Folium
    Folium builds on the data wrangling strengths of Python and the mapping strengths of the Leaflet.js library. It has plenty of built-in tilesets from OpenStreetMap, Mapbox, and Stamen. It also supports image and video overlays, as well as GeoJSON and TopoJSON overlays.
  3. Geopandas
    Geopandas is fundamentally an open-source project to make working with geospatial data in Python easier. It also supports mapping, and creating basic maps with Geopandas is a walk in the park.
  4. PySAL
    PySAL is an open-source, cross-platform library designed for geospatial data science with an emphasis on vector data. It supports the development of high-level applications such as the detection of spatial clusters and outliers, exploratory spatio-temporal data analysis, and many more.
  5. rworldmap and rworldxtra
    It is equally easy to plot geospatial data with R. rworldmap enables mapping of country-level and gridded user datasets, while rworldxtra provides high-resolution vector country boundaries derived from Natural Earth data.
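As a taste of how little code these packages need, here is a minimal Folium sketch that plots the four volcanoes from the sample table earlier in the article. It assumes Folium is installed (`pip install folium`); the `map_center` and `build_map` helpers, the zoom level, and the output file name are illustrative choices, not a recommended setup.

```python
# Volcano names and coordinates from the sample table earlier in the article.
VOLCANOES = [
    ("Baker", 48.7767982, -121.8109970),
    ("GlacierPeak", 48.1118011, -121.1110001),
    ("Rainier", 46.8698006, -121.7509995),
    ("St.Helens", 46.1997986, -122.1809998),
]


def map_center(points):
    """Average latitude and longitude, used as a rough map centre."""
    lats = [lat for _, lat, _ in points]
    lons = [lon for _, _, lon in points]
    return sum(lats) / len(lats), sum(lons) / len(lons)


def build_map(points, output_path="volcanoes.html"):
    """Draw one marker per point and save an interactive HTML map."""
    # Third-party import kept local so map_center works without Folium.
    import folium

    lat, lon = map_center(points)
    fmap = folium.Map(location=[lat, lon], zoom_start=7)
    for name, p_lat, p_lon in points:
        folium.Marker(location=[p_lat, p_lon], popup=name).add_to(fmap)
    fmap.save(output_path)
    return output_path
```

Calling `build_map(VOLCANOES)` should write an HTML file that opens in any browser, with one clickable marker per volcano.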

Open Source Software

With this free GIS software, you have the power to get the job done as if you were working with proprietary GIS software.

  1. PostGIS
    PostGIS is an open-source spatial database extender for the PostgreSQL object-relational database. It adds support for geographic objects, allowing location queries to be run in SQL. Many software products can use PostGIS as a database backend, including QGIS and GrassGIS listed below.
  2. QGIS
    QGIS is loaded with hidden gems at your fingertips. You can automate map production, process large-scale geospatial data, and generate cartographic images. What else do we need? The latest version (QGIS 3) came with a whole new set of cartography and 3D as well as analysis tools.
  3. GrassGIS
    Geographic Resources Analysis Support System is a widely used GIS software suite. It provides features like geospatial data management and analysis, image processing, map production, spatial modeling, and visualization.
  4. Kepler.gl
    The visualization team at Uber creates industry-grade open-source frameworks to supercharge big data. There are four main suites available: deck.gl, luma.gl, react-map-gl, and react-vis. Kepler.gl is built with deck.gl and uses WebGL to render large data sets quickly and efficiently.

At Locale, we are big fans of geospatial data and passionate about the community. You can check out similar tutorials in this series:

Geospatial Clustering: Types and Use Cases
Deep dive into all the different kinds of clustering with their use cases.
Visualizing Tesla Superchargers in France Geospatially
A complete guide on visualizing points using Python and Folium, from scratch.