Building JSON Documents from Relational Tables

March 15, 2016

Contributed by Jamey Hanson

Adoption of document databases is growing rapidly in response to the need for solutions that can handle large volumes of data. These databases are adept at handling non-relational data models, but the distinction between database types (document, relational, key-value, etc.) is blurring. Organizations are sweeping up data from multiple sources and storing it in single large documents for later analysis.

PostgreSQL has kept up with evolving storage patterns in the data center with the addition of the JSON data type. PostgreSQL now supports many of the same workloads as MongoDB. Because of this capability, PostgreSQL users are seeking additional knowledge and expertise in working with JSON documents. 

 


This post explores how to create nested JSON documents from relational tables with PostgreSQL 9.5 using a data set of 47,000 airports from http://ourairports.com. (A nod of thanks to the folks at OurAirports.com.)  This is the first of two PostgreSQL JSON posts using the airports data set.  The second post will explore how to index, query, update and present the JSON documents in PostgreSQL 9.5.

Data Set

OurAirports.com's data set includes approximately 47,000 airports worldwide. There are six *.csv files corresponding to the tables in the ER diagram below. The files contain UTF-8 non-ASCII characters and embedded special characters, including single/double quotes and curly braces.

Airports are uniquely identified by the airports.ident field, which contains the three-letter airport identifier preceded by the country prefix ('K' for the USA). For example, Baltimore-Washington International Airport is 'KBWI'. Airports have zero-to-many runways, navaids (navigational aids) and airport_frequencies. I used custom enums for airports.type and several navaids fields because they are simple and useful in PostgreSQL. airports.keywords is a comma-separated list of values indicating the associated major city; for example, Baltimore-Washington International, Reagan National and Dulles all have the keyword 'WAS'. countries.keywords and regions.keywords are similar lists, a structure that is not reflected in the relational model.
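
For example, a quick way to see the 'WAS' grouping while the data is still relational (an illustrative query, not part of the load script at the end of the post) is:

SELECT ident, name
FROM airports
WHERE 'WAS' = ANY (STRING_TO_ARRAY(keywords, ', '));
-- should return KBWI, KDCA and KIAD, among others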

I loaded the files using the file foreign data wrapper, file_fdw, because it allows PostGIS and other SQL functions to transform the data during the load. I highly recommend this technique. SQL for loading the data files and creating the custom data types is at the end of the post. The database cluster is PostgreSQL 9.5 with the file_fdw and PostGIS 2.2 extensions, running in a CentOS 7.x VM on a laptop.

Figure 1. Six tables used to create the airport JSON documents

Generate airport JSON documents

The (pretty complicated) SQL below builds airport JSON documents and loads them into a new table, airports_json. I created airports_json from a SELECT statement rather than export / import for compactness, but in practice I recommend exporting the documents to a file and importing them into a working environment.
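
If you do want to move the generated documents between environments, a minimal sketch using COPY (the file path is only an example, and COPY reads and writes files on the database server) is:

COPY (SELECT airports FROM airports_json) TO '/tmp/airports_json.dat';

-- and on the target cluster:
-- CREATE TABLE airports_json (airports JSONB);
-- COPY airports_json FROM '/tmp/airports_json.dat';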

Create the JSON document model

The first step in creating the airports JSON document is to create a document model. The three zero-to-many relationships, airports-to-runways, airports-to-navaids and airports-to-airport_frequencies, map naturally to JSON arrays. The location data for airports and runway ends (the le_ / he_ fields) should be modeled as GeoJSON, the standard for GIS data in JSON. Some of the column names are ambiguous and need to be changed when converted to JSON tags; for example, airports.home_link becomes airport_home_link for clarity. The three keyword fields could be modeled as strings, as they are in the source data, or as arrays in JSON. I chose to model them as arrays because it seems more natural.

The next modeling decision is whether to repeat countries and regions data in each document.  The advantage of not repeating the values is better storage efficiency and simpler updates.  This is why they are separate tables in the relational model.  The advantages of including countries and regions data in each document are completeness and simplicity, which is why I chose to repeat them in each document in this example.

The final modeling decision is whether to remove NULLs (or nulls in JSON-speak) from the final document.  The JSON purist approach is always to remove nulls, but relational and hybrid models often keep (some of) them.  I removed all nulls in this example.
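
A one-line illustration of what JSONB_STRIP_NULLS does:

SELECT JSONB_STRIP_NULLS('{"ident": "KCGS", "iata_code": null}'::JSONB);
-- {"ident": "KCGS"}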

Make Sure All Needed Data Is Available or Can Be Generated

The next step in generating JSON is to make sure all the data needed for the JSON document is available or easily generated in the relational model. In this example, all six tables are needed, plus joins: airports, regions and countries are related with inner joins, while airports to runways, navaids and airport_frequencies are left outer (zero-to-many) joins.
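
In the final statement the inner joins appear in the outer query, while the zero-to-many relationships are handled with correlated sub-selects that behave like left outer joins (an airport with no runways simply produces a null runways value, which is stripped later). A simplified version of the outer join skeleton:

SELECT air.ident, air.name, reg.name AS region, cntry.name AS country
FROM airports air
  INNER JOIN regions reg ON air.iso_region = reg.code
  INNER JOIN countries cntry ON air.iso_country = cntry.code;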

The location data is provided as latitude_deg / longitude_deg columns, which are awkward to turn into GeoJSON by hand. Luckily, PostGIS has a function that creates GeoJSON, ST_AsGeoJSON(geometry). It requires that lat/long first be converted to a geometry, which is easily done as part of the SELECT statement from the external to the internal table: ST_GeomFromText('POINT(' || longitude_deg || ' ' || latitude_deg || ')', 4326).
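
A minimal round trip from raw coordinates to GeoJSON, using College Park Airport's approximate coordinates:

SELECT ST_AsGeoJSON(
         ST_GeomFromText('POINT(-76.9223 38.9806)', 4326)
       )::JSONB AS location;
-- {"type": "Point", "coordinates": [-76.9223, 38.9806]}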

Build the JSON Document from the Inside-out (using JSONB)

NOTE: Use JSONB functions and casting when building JSON documents, because PostgreSQL then automatically handles control characters, single/double quotes and curly braces in sub-documents and GeoJSON. PostgreSQL treats JSON (not JSONB) sub-documents as text strings and escapes all control characters. This was not self-evident when I started this project.
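
A small demonstration of the difference, using a hand-written GeoJSON string:

-- Passed as text, the GeoJSON ends up as a single escaped string:
SELECT JSONB_BUILD_OBJECT('loc', '{"type":"Point","coordinates":[0,0]}');
-- {"loc": "{\"type\":\"Point\",\"coordinates\":[0,0]}"}

-- Cast to JSONB first, it becomes a true nested object:
SELECT JSONB_BUILD_OBJECT('loc', '{"type":"Point","coordinates":[0,0]}'::JSONB);
-- {"loc": {"type": "Point", "coordinates": [0, 0]}}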

The document itself is then built step-by-step from the inside out, starting with the runways, navaids and airport_frequencies embedded JSONB.

The steps for creating the three embedded documents are the same (a minimal sketch combining steps 2 and 3 follows the list):

1. Create the GeoJSON document and cast it to JSONB: ST_AsGeoJSON(location)::JSONB

2. Create a JSONB document for each row in the respective table with TO_JSONB(anyelement).
NOTE: The syntax for this is awkward, but you can cut-and-paste from the template below.

3. Create a JSONB array of the JSONB rows using JSONB_AGG(expression).

4. Add SQL to relate the sub-documents to the parent airports.
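
Taking airport_frequencies, the simplest of the three, steps 2 and 3 reduce to the following sketch (with a literal ident in place of the correlated reference to the parent airport used in the full statement):

SELECT JSONB_AGG(TO_JSONB(f)) AS airport_frequencies
FROM (
  SELECT type, description, frequency_mhz
  FROM airport_frequencies
  WHERE airport_ident = 'KCGS'
) f;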

After the sub-documents are created and related to the parent airports, the next step is to build the outer-level tags with JSONB_BUILD_OBJECT(VARIADIC "any"). This is also the step where the three keyword columns are converted to arrays with STRING_TO_ARRAY and renamed to descriptive JSON tags.

The final step is to remove nulls with JSONB_STRIP_NULLS.

The SQL to generate the JSONB is below:

CREATE TABLE airports_json AS (
SELECT JSONB_STRIP_NULLS(
  JSONB_BUILD_OBJECT(
  'ident',          ident,
  'type',           type,
  'name',           air.name,
  'location',       ST_AsGeoJSON(location)::JSONB,
  'elevation_ft',   elevation_ft,
  'continent',      air.continent,
  'iso_country',    air.iso_country,
  'country',        cntry.name,
  'country_keywords', TO_JSONB(STRING_TO_ARRAY(cntry.keywords, ', ')),
  'iso_region',     air.iso_region,
  'region',         reg.name,
  'region_keywords', TO_JSONB(STRING_TO_ARRAY(reg.keywords, ', ')),
  'municipality',   municipality,
  'scheduled_service', scheduled_service,
  'gps_code',       gps_code,
  'airport_keywords', TO_JSONB(STRING_TO_ARRAY(air.keywords, ', ')),
  'airport_home_link', home_link,
  'airport_wikipedia', air.wikipedia_link,
  'country_wikipedia', cntry.wikipedia_link,
  'region_wikipedia',  reg.wikipedia_link,
  'runways', (
    SELECT JSONB_AGG(TO_JSONB(r))
    FROM (
      SELECT
        le_ident,
        he_ident,
        length_ft,
        width_ft,
        surface,
        lighted AS is_lighted,
        closed AS is_closed,
        ST_AsGeoJSON(le_location)::JSONB AS le_location,
        le_elevation_ft,
        le_heading_degt,
        le_displaced_threshold_ft,
        ST_AsGeoJSON(he_location)::JSONB AS he_location,
        he_elevation_ft,
        he_heading_degt,
        he_displaced_threshold_ft
      FROM runways
      WHERE airport_ident = air.ident
      ) r ),
  'navaids', (
    SELECT JSONB_AGG(TO_JSONB(n))
    FROM (
      SELECT
        name,
        filename,
        ident,
        type,
        frequency_khz,
        ST_AsGeoJSON(location)::JSONB AS location,
        elevation_ft,
        dme_frequency,
        dme_channel,
        ST_AsGeoJSON(dme_location)::JSONB AS dme_location,
        dme_elevation,
        slaved_variation_deg,
        magnetic_variation_deg,
        usagetype,
        power
      FROM navaids
      WHERE associated_airport = air.ident
      ) n ),
  'airport_frequencies', (
    SELECT JSONB_AGG(TO_JSONB(f))
    FROM (
      SELECT
        type,
        description,
        frequency_mhz
      FROM airport_frequencies
      WHERE airport_ident = air.ident
      ) f )
  )) AS airports
  FROM airports air
    INNER JOIN regions reg ON air.iso_region = reg.code
    INNER JOIN countries cntry ON air.iso_country = cntry.code
);

Look at the Output

Take a look at an airports_json document to validate the model and make sure the fields are labeled and converted properly.  The SQL and JSON document below are for College Park Airport, the oldest continuously operating airport in the world.

SELECT College Park Airport

SELECT JSONB_PRETTY(airports)
FROM airports_json
WHERE airports ->> 'ident' = 'KCGS';

College Park Airport JSON document

{
    "name": "College Park Airport",
    "type": "small_airport",
    "ident": "KCGS",
    "region": "Maryland",
    "country": "United States",
    "runways": [
        {
            "surface": "ASP",
            "he_ident": "33",
            "le_ident": "15",
            "width_ft": 60,
            "is_closed": false,
            "length_ft": 2607,
            "is_lighted": true,
            "he_location": {
                "type": "Point",
                "coordinates": [
                    -76.9192,
                    38.9779
                ]
            },
            "le_location": {
                "type": "Point",
                "coordinates": [
                    -76.9253,
                    38.9833
                ]
            },
            "he_elevation_ft": 40,
            "he_heading_degt": 318.3,
            "le_elevation_ft": 47,
            "le_heading_degt": 138.3,
            "le_displaced_threshold_ft": 416
        }
    ],
    "gps_code": "KCGS",
    "location": {
        "type": "Point",
        "coordinates": [
            -76.9223022461,
            38.9805984497
        ]
    },
    "continent": "NA",
    "iso_region": "US-MD",
    "iso_country": "US",
    "elevation_ft": 48,
    "municipality": "College Park",
    "airport_keywords": [
        "DC3",
        "FRZ",
        "ADIZ"
    ],
    "country_keywords": [
        "America"
    ],
    "region_wikipedia": "http://en.wikipedia.org/wiki/Maryland",
    "airport_home_link": "http://www.collegeparkairport.aero/",
    "airport_wikipedia": "http://en.wikipedia.org/wiki/College_Park_Airport",
    "country_wikipedia": "http://en.wikipedia.org/wiki/United_States",
    "scheduled_service": false,
    "airport_frequencies": [
        {
            "type": "A/D",
            "description": "POTOMAC APP/DEP",
            "frequency_mhz": 119.85
        },
        {
            "type": "AWOS",
            "description": "AWOS 3",
            "frequency_mhz": 121.225
        },
        {
            "type": "UNIC",
            "description": "CTAF/UNICOM",
            "frequency_mhz": 122.975
        }
    ]
}

Load RDBMS data

The SQL below was used to create and populate the relational tables.

-- The file_fdw and PostGIS extensions must be available in the database:
CREATE EXTENSION IF NOT EXISTS file_fdw;
CREATE EXTENSION IF NOT EXISTS postgis;

CREATE SERVER ext_table FOREIGN DATA WRAPPER file_fdw;

-- The pattern for each data set is the same: create the permanent table,
-- create a foreign table over the CSV file, then INSERT ... SELECT,
-- transforming lat/long into GEOMETRY where needed.

-- Countries
CREATE TABLE countries (
  id             BIGINT PRIMARY KEY,
  code           CHAR(2) NOT NULL UNIQUE,
  name           TEXT,
  continent      CHAR(2),
  wikipedia_link TEXT,
  keywords       TEXT);
COMMENT ON TABLE countries IS 'Data from http://ourairports.com/data/';

CREATE FOREIGN TABLE fdw_countries (
  id             BIGINT,
  code           CHAR(2),
  name           TEXT,
  continent      CHAR(2),
  wikipedia_link TEXT,
  keywords       TEXT
) SERVER ext_table
OPTIONS (filename '/mnt/hgfs/VMShare/Data/Airports/countries.csv',
  format 'csv', header 'true');

INSERT INTO countries (
  SELECT * FROM fdw_countries);

-- Regions
CREATE TABLE regions (
  id             BIGINT PRIMARY KEY,
  code           VARCHAR(7) NOT NULL UNIQUE,
  local_code     VARCHAR(4),
  name           TEXT,
  continent      CHAR(2),
  iso_country    CHAR(2) REFERENCES countries(code),
  wikipedia_link TEXT,
  keywords       TEXT);
COMMENT ON TABLE regions IS 'Data from http://ourairports.com/data/';

CREATE FOREIGN TABLE fdw_regions (
  id             INT,
  code           VARCHAR(7),
  local_code     VARCHAR(4),
  name           TEXT,
  continent      CHAR(2),
  iso_country    CHAR(2),
  wikipedia_link TEXT,
  keywords       TEXT
) SERVER ext_table
OPTIONS (filename '/mnt/hgfs/VMShare/Data/Airports/regions.csv',
  format 'csv', header 'true');

INSERT INTO regions (
  SELECT * FROM fdw_regions);

-- Airports
CREATE TYPE airport_type AS ENUM (
  'balloonport',
  'closed',
  'heliport',
  'large_airport',
  'medium_airport',
  'seaplane_base',
  'small_airport');

CREATE TABLE airports (
  id             BIGINT PRIMARY KEY,
  ident          VARCHAR(7) NOT NULL UNIQUE,
  type           airport_type,
  name           TEXT,
  latitude_deg   NUMERIC NOT NULL,
  longitude_deg  NUMERIC NOT NULL,
  location       GEOMETRY,
  elevation_ft   INT,
  continent      CHAR(2),
  iso_country    CHAR(2) REFERENCES countries(code),
  iso_region     VARCHAR(7) REFERENCES regions(code),
  municipality   TEXT,
  scheduled_service BOOLEAN,
  gps_code       VARCHAR(4),
  iata_code      VARCHAR(3),
  local_code     VARCHAR(7),
  home_link      TEXT,
  wikipedia_link TEXT,
  keywords       TEXT);
COMMENT ON TABLE airports IS 'Data from http://ourairports.com/data/';

CREATE FOREIGN TABLE fdw_airports (
  id             BIGINT,
  ident          VARCHAR(7),
  type           airport_type,
  name           TEXT,
  latitude_deg   NUMERIC,
  longitude_deg  NUMERIC,
  elevation_ft   INT,
  continent      CHAR(2),
  iso_country    CHAR(2),
  iso_region     VARCHAR(7),
  municipality   TEXT,
  scheduled_service BOOLEAN,
  gps_code       VARCHAR(4),
  iata_code      VARCHAR(3),
  local_code     VARCHAR(7),
  home_link      TEXT,
  wikipedia_link TEXT,
  keywords       TEXT
  ) SERVER ext_table
OPTIONS (filename '/mnt/hgfs/VMShare/Data/Airports/airports.csv',
  format 'csv', header 'true');

INSERT INTO airports (
SELECT
  id,
  ident,
  type,
  name,
  latitude_deg,
  longitude_deg,
  ST_GeomFromText('POINT(' || longitude_deg || ' ' || latitude_deg || ')', 4326),
  elevation_ft,
  continent,
  iso_country,
  iso_region,
  municipality,
  scheduled_service,
  gps_code,
  iata_code,
  local_code,
  home_link,
  wikipedia_link,
  keywords
FROM fdw_airports);

-- Runways
CREATE TABLE runways (
  id             BIGINT PRIMARY KEY,
  airport_ref    BIGINT,
  airport_ident  VARCHAR(7) REFERENCES airports(ident),
  length_ft      INT,
  width_ft       INT,
  surface        TEXT,
  lighted        BOOLEAN,
  closed         BOOLEAN,
  le_ident       VARCHAR(6),
  le_latitude_deg NUMERIC,
  le_longitude_deg NUMERIC,
  le_location    GEOMETRY,
  le_elevation_ft INT,
  le_heading_degt NUMERIC,
  le_displaced_threshold_ft INT,
  he_ident       VARCHAR(6),
  he_latitude_deg NUMERIC,
  he_longitude_deg NUMERIC,
  he_location    GEOMETRY,
  he_elevation_ft INT,
  he_heading_degt NUMERIC,
  he_displaced_threshold_ft INT);
COMMENT ON TABLE runways IS 'Data from http://ourairports.com/data/';

CREATE FOREIGN TABLE fdw_runways (
  id             BIGINT,
  airport_ref    BIGINT,
  airport_ident  VARCHAR(7),
  length_ft      INT,
  width_ft       INT,
  surface        TEXT,
  lighted        BOOLEAN,
  closed         BOOLEAN,
  le_ident       VARCHAR(6),
  le_latitude_deg NUMERIC,
  le_longitude_deg NUMERIC,
  le_elevation_ft INT,
  le_heading_degt NUMERIC,
  le_displaced_threshold_ft INT,
  he_ident       VARCHAR(6),
  he_latitude_deg NUMERIC,
  he_longitude_deg NUMERIC,
  he_elevation_ft INT,
  he_heading_degt NUMERIC,
  he_displaced_threshold_ft INT
    ) SERVER ext_table
OPTIONS (filename '/mnt/hgfs/VMShare/Data/Airports/runways.csv',
  format 'csv', header 'true');

INSERT INTO runways (
SELECT
  id,
  airport_ref,
  airport_ident,
  length_ft,
  width_ft,
  surface,
  lighted,
  closed,
  le_ident,
  le_latitude_deg,
  le_longitude_deg,
  ST_GeomFromText('POINT(' || le_longitude_deg || ' ' || le_latitude_deg || ')', 4326),
  le_elevation_ft,
  le_heading_degt,
  le_displaced_threshold_ft,
  he_ident,
  he_latitude_deg,
  he_longitude_deg,
  ST_GeomFromText('POINT(' || he_longitude_deg || ' ' || he_latitude_deg || ')', 4326),
  he_elevation_ft,
  he_heading_degt,
  he_displaced_threshold_ft
FROM fdw_runways);

CREATE TYPE navaid_type AS ENUM (
  'DME', 'NDB', 'NDB-DME', 'TACAN', 'VOR', 'VOR-DME', 'VORTAC');

CREATE TYPE navaid_usagetype AS ENUM (
  'BOTH', 'HI', 'LO', 'RNAV', 'TERMINAL');

CREATE TYPE navaid_power AS ENUM (
  'HIGH', 'LOW', 'MEDIUM', 'UNKNOWN');

-- Navaids
CREATE TABLE navaids (
  id             BIGINT,
  filename       TEXT,
  ident          TEXT,
  name           TEXT,
  type           navaid_type,
  frequency_khz  BIGINT,
  latitude_deg   NUMERIC,
  longitude_deg  NUMERIC,
  location       GEOMETRY,
  elevation_ft   INT,
  iso_country    TEXT,
  dme_frequency  NUMERIC,
  dme_channel    TEXT,
  dme_latitude_deg NUMERIC,
  dme_longitude_deg NUMERIC,
  dme_location   GEOMETRY,
  dme_elevation  INT,
  slaved_variation_deg NUMERIC,
  magnetic_variation_deg NUMERIC,
  usagetype      navaid_usagetype,
  power          navaid_power,
  associated_airport VARCHAR(7) REFERENCES airports(ident)
);

CREATE FOREIGN TABLE fdw_navaids (
  id             INT,
  filename       TEXT,
  ident          TEXT,
  name           TEXT,
  type           navaid_type,
  frequency_khz  BIGINT,
  latitude_deg   NUMERIC,
  longitude_deg  NUMERIC,
  elevation_ft   INT,
  iso_country    TEXT,
  dme_frequency  NUMERIC,
  dme_channel    TEXT,
  dme_latitude_deg NUMERIC,
  dme_longitude_deg NUMERIC,
  dme_elevation  INT,
  slaved_variation_deg NUMERIC,
  magnetic_variation_deg NUMERIC,
  usagetype      navaid_usagetype,
  power          navaid_power,
  associated_airport VARCHAR(7)
  ) SERVER ext_table
OPTIONS (filename '/mnt/hgfs/VMShare/Data/Airports/navaids.csv',
  format 'csv', header 'true');

INSERT INTO navaids (
SELECT
  id,
  filename,
  ident,
  name,
  type,
  frequency_khz,
  latitude_deg,
  longitude_deg,
  ST_GeomFromText('POINT(' || longitude_deg || ' ' || latitude_deg || ')', 4326),
  elevation_ft,
  iso_country,
  dme_frequency,
  dme_channel,
  dme_latitude_deg,
  dme_longitude_deg,
  ST_GeomFromText('POINT(' || dme_longitude_deg || ' ' || dme_latitude_deg || ')', 4326),
  dme_elevation,
  slaved_variation_deg,
  magnetic_variation_deg,
  usagetype,
  power,
  associated_airport
FROM fdw_navaids);

-- Airport frequencies
CREATE TABLE airport_frequencies (
  id             BIGINT PRIMARY KEY,
  airport_ref    BIGINT,
  airport_ident  VARCHAR(7) REFERENCES airports(ident),
  type           TEXT,
  description    TEXT,
  frequency_mhz  NUMERIC);

CREATE FOREIGN TABLE fdw_airport_frequencies (
  id             BIGINT,
  airport_ref    BIGINT,
  airport_ident  VARCHAR(7),
  type           TEXT,
  description    TEXT,
  frequency_mhz  NUMERIC
) SERVER ext_table
OPTIONS (filename '/mnt/hgfs/VMShare/Data/Airports/airport-frequencies.csv',
  format 'csv', header 'true');

INSERT INTO airport_frequencies (
  SELECT * FROM fdw_airport_frequencies);

Congratulations! You are well on your way to mastering the use of JSON documents in PostgreSQL. Stay tuned to the EDB Postgres Blog for part II in this series to learn how to index, query, update and present the JSON documents in PostgreSQL 9.5.

Jamey Hanson is a Data Architect at Freedom Consulting Group, an EnterpriseDB Partner. 
