The CRUD of JSON in PostgreSQL

April 07, 2016

Today’s connected enterprise requires a single database that can handle both structured and unstructured data efficiently and that adapts dynamically to swiftly changing and emerging data types. For many organizations, that database is Postgres. With JSON, Postgres can support document databases alongside relational tables and even combine structured and unstructured data. This provides organizations with the atomicity, consistency, isolation and durability (ACID) compliance and common business logic required to ensure data integrity.

Postgres database administrators are expanding their expertise working with the NoSQL capabilities in Postgres. Critical to this skill set is building document databases using JSON. This post is the second in a series dedicated to helping DBAs enrich their abilities for implementing the new advantages of Postgres and transform their data center into an innovation platform. The first post, Building JSON Documents from Relational Tables, published on March 3, 2016, explained how to create nested JSON documents from relational tables with PostgreSQL 9.5 using a data set of 47,000 airports.

Enterprise-ready Postgres tools for high availability, monitoring, and disaster recovery. Download Now.

 This post explores the Create, Read, Update, and Delete operations in PostgreSQL, known as CRUD. It is the second in a series of posts exploring PostgreSQL's unique JSON capabilities within the RDBMS framework. The (many) examples use the airports.json data set created in the previous post and can be downloaded here.  (* READ includes CREATE, UPDATE and DELETE. READ and indexing examples will be explored in a later post in this series.)

The airports.json data set contains a document for ~47K worldwide airports. It is based on the six csv files / RDBMS tables in the http://ourairports.com data set. See the previous post for more details.

PostgreSQL's JSON CRUD capabilities expand the range of JSON applications we can build with PostgreSQL because they free developers from the limited previous workflow need to:

  1. Create JSON in the application / get JSON from an external source
  2. Store JSON documents in the database
  3. Retrieve documents when needed
  4. Modify JSON by

a.  Retrieve the document to the application tier

b. Use application code to make changes

c. Put the document back in the database when changes are finished

With PostgreSQL's JSON CRUD capabilities, developers can now:

  1. Create JSON (and GeoJSON!) from PostgreSQL tables, application code or external sources
  2. Store JSON documents in the PostgreSQL
  3. Retrieve tags, values, arrays, subdocuments and documents using SQL
  4. Modify JSON inside PostgreSQL using

CREATE

CREATE documents by loading from a *.json file and create new elements within documents.

Load *.json file of documents into PostgreSQL

The airports.json file is available here for download or it can be built using the SQL in the blog post, Building JSON Documents from Relational Tables.

NOTE: Think about loading *.json files just like loading any other type of data.  I generally use Foreign Data Wrappers (FDWs) because they are simple, fast, and easy to troubleshoot.  The syntax using COPY is even simpler and is included for reference:

 

CREATE EXTENSION IF NOT EXISTS file_fdw;

CREATE SERVER ext_table FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE airports_fdw (

  airport  JSONB) SERVER ext_table

  OPTIONS (filename '/path/to/file/airports.json', format 'csv',
  header 'false', delimiter '|', quote '$');

-- delimiter and quote should not appear in the file

 

CREATE TABLE airports (

  id       BIGSERIAL PRIMARY KEY,

  airport JSONB);

 

INSERT INTO airports (airport)

  (SELECT airport FROM airports_fdw);

 

Load using COPY

A simpler, less flexible alternative is to use COPY, http://www.postgresql.org/docs/9.5/static/sql-copy.html.

 

COPY airports (airport)

FROM '/path/to/file/airports.json'

WITH delimiter '|' quote '$' CSV;

 

See what a record looks like

Look at Baltimore/Washington International Thurgood Marshall Airport (KBWI) as an example.

 

SELECT JSONB_PRETTY(airport)

FROM airports

WHERE airport ->> 'ident' = 'KBWI';
 

NOTE: ->> operator returns JSONB.  -> returns TEXT/ INT.  See http://www.postgresql.org/docs/9.5/static/functions-json.html

 

Add an outer-level tag and value

The airport JSON documents don’t have a tag to store the airlines that service them.  Add a tag,  carrier_hub, to Southwest Airlines hubs using the JSONB concatenation operator, ||, and jsonb_set.

 

UPDATE airports

SET airport = airport || '{"carrier_hub": ["Southwest"]}'
WHERE airport ->> 'ident' IN (
  'KBWI',   -- Baltimore-Washington International

  'KMDW',   -- Chicago Midway

  'KLAS',   -- Las Vegas

  'KDAL');  -- Love Field, Dallas;

 

Add an id tag to the airport JSON

jsonb set.http://www.postgresql.org/docs/9.5/static/functions-json.html, is the primary PostgreSQL JSON CRUD operator.  Note the optional create missing flag, which make jsonb_set behave like a INSERT when false, the default, or like an UPSERT when the flag is true.

Note that jsonb_set has access to other data within PostgreSQL, such as columns, variables, etc..

 

UPDATE airports_json

SET airport = jsonb_set(

  airport,

  '{id}',

  id::TEXT::JSONB,

  true); -- creates id column in JSON using airports_json.id

 

Verify the results:

 

SELECT

  airport ->> 'name',

  airport ->> 'ident',

  airport ->> 'carrier_hub'

FROM airports

WHERE airport ->> 'ident' IN ('KBWI', 'KMDW', 'KLAS', 'KDAL');

 

Add an element to an array

The tag airport_keyword is an array of cities that the airport is associated with.  For example, Baltimore/Washington International Thurgood Marshall Airport (KBWI), Ronald Reagan Washington National Airport (KDCA) and Washington Dulles International Airport (KIAD), all have the WAS element.

The SQL below adds 'BAL' to the airport_keyword array for BWI, Baltimore-Washington Airport using the concatenation operator, ||, and JSONB_SET with a subquery.

 

UPDATE airports

SET airport = JSONB_SET(

  airport,

  '{airport_keywords}',

  (SELECT (airport -> 'airport_keywords') || TO_JSONB('BWI'::TEXT)
  FROM airports WHERE airport ->> 'ident' = 'KBWI'),
  false)

WHERE airport ->> 'ident' = 'KBWI';

 

Verify the results

 

SELECT airport ->> 'airport_keywords'

FROM airports

WHERE airport ->> 'ident' = 'KBWI';

 

READ

READing documents, subdocuments, arrays, tags and values will be in the next blog post along with indexing.

 

UPDATE

Change the value of a first-level tag

Change Salisbury Ocean City Wicomico Regional Airport, KSBY, from a medium_airport to a large_airport.

 

Check type first

 

SELECT airport ->> 'type'

FROM airports

WHERE airport ->> 'ident' = 'KSBY';

 

Update type

 

UPDATE airports

SET airport = JSONB_SET(

  airport,

  '{type}',

  TO_JSONB('large airport'::TEXT),

  false)

WHERE airport ->> 'ident' = 'KSBY';

 

Recheck type

 

Change the value of a tag in a nested document

Change the Howard County General Hospital Heliport, MD25, runway surface from 'GRASS / SOD' to 'ASPH'.

Note that I used a nested JSONB_SET. The inner level is to change the value in the runways nested document and the outer level is to update the parent document.

Check the surface first:

 

SELECT JSONB_ARRAY_ELEMENTS(airport -> 'runways') ->> 'surface'

FROM airports

WHERE airport ->> 'ident' = 'MD25';

 

JSONB_ARRAY_ELEMENTS returns a row for each JSONB subdocument.

It is a helpful technique to think of the -> operator, which returns JSONB, in a series of pipes - just like in Linux.  In this example, the first (and only) element in the runways array of subdocuments is ‘piped’ to the ->> operator, which returns TEXT / INT/

Update the value.

 

UPDATE airports

SET airport = JSONB_SET(

  airport,

  '{runways}',

  (WITH heli_runs AS (

    SELECT airport -> 'runways' AS heli_run

    FROM airports

    WHERE airport ->> 'ident' = 'MD25')

  SELECT JSONB_SET(

    JSONB_ARRAY_ELEMENTS(heli_run),

    '{surface}',

    TO_JSONB('asph'::text),

    false)

  FROM heli_runs),

  false)

WHERE airport ->> 'ident' = 'MD25';

Recheck surface

 

DELETE

Remove an outer-level tag

Use the minus operator, '-', to remove the airport_wikipedia and region_wikipedia tags from each document.

 

UPDATE airports

SET airport = (airport - 'airport_wikipedia' - 'region_wikipedia');

Remove an element from an array

Remove the 'Idlewild' element from the airport_keywords array for John F. Kennedy International Airport, KJFK.

 

UPDATE airports
SET airport =

JSONB_SET(

  airport,

  '{airport_keywords}',
  (WITH kjfk_keys AS (

    SELECT JSONB_ARRAY_ELEMENTS(airport -> 'airport_keywords') keywords

    FROM airports

    WHERE airport ->> 'ident' = 'KJFK')
  SELECT JSONB_AGG(keywords)

  FROM kjfk_keys

  WHERE keywords::TEXT != '"Idlewild"'))
WHERE airport ->> 'ident' = 'KJFK';

 

Remove a tag from a sub-document

BWI airport has four runways:

●22 / 04                           6,000 ft

●28 / 10                         10,520 ft

●33R / 15L                       5,000 ft

●33L / 15R                       9,501 ft

 

NOTE:  Runways are named as 10 * the magnetic direction a plane is heading when on the centerline.  For example, runway 28 points almost due west: 280°.  The second name is the same runway in the other direction, 180° off, or subtracting 18.  Runway 22 - 18 = runway 04.  Parallel runways follow the same rules and add L (left) or R (right).

This SQL removes the le_displaced_threshold_ft from BWI runway '28'.

NOTE:  A runway’s displaced threshold is the distance past the beginning of runway pavement that a plan should land.

 

UPDATE airports

SET airport =

JSONB_SET(

  airport,

  '{runways}',

  (SELECT

    (WITH kbwi_runways AS (

      SELECT JSONB_ARRAY_ELEMENTS (airport -> 'runways') runway

      FROM airports

      WHERE airport ->> 'ident' = 'KBWI')

    SELECT runway - 'le_displaced_threshold_ft'

    FROM kbwi_runways

    WHERE runway ->> 'he_ident' = '28')

    ||

    (WITH kbwi_runways AS (

      SELECT JSONB_ARRAY_ELEMENTS(airport -> 'runways') AS runway

      FROM airports

      WHERE airport ->> 'ident' = 'KBWI')

    SELECT JSONB_AGG(runway)

    FROM kbwi_runways

    WHERE runway ->> 'he_ident' != '28'))

  )

WHERE airport ->> 'ident' = 'KBWI';

 

The logic of this SQL is:

  1. Create a row/JSONB_document for each runway with JSONB ARRAY ELEMENTS, which returns each element in a JSONB array as a result set row.
  2. Find the row for runway '28' and remove the le displaced threshold ft ' tag
  3. Concatenate the modified runway '28' subdocument with the other runway subdocuments
  4. Update the BWI airport JSONB document with the updated array or runways.

It would be simpler if there was only one runway or we knew the array position of runway '28'.

 

Summary

PostgreSQL 9.5 has powerful and unique JSON CRUD capabilities that enable new use cases, data models and (hopefully) new markets! 

Jamey Hanson is a Data Architect at Freedom Consulting Group, an EnterpriseDB Partner.  

  

Share this