Querying JSON fields in the LDP (old, unfinished)

Querying a JSON field that contains a simple value, either top-level or nested

If what you're looking for is a single, simple value, and the value is either top-level in the main JSON object or in a nested object (no arrays), you can use a single JSON function to extract the value. As documented in the LDP User Guide, "it is recommended to use the built in function json_extract_path_text() to retrieve data from a path of up to five nested JSON fields."

Example:

SELECT json_extract_path_text(data, 'status', 'name') AS status,
       count(*)
    FROM circulation_loans
    GROUP BY status;


Result:

 status | count
--------+--------
 Closed | 504407
 Open   |   1294

Querying a JSON field that contains an array of simple values

If the desired JSON element contains an array of simple values, there are a few ways you can approach the query.

Return the array as-is

You can use the procedure above to return the array as-is, if you have no need for extra processing.

Join the individual array elements to another table

In PostgreSQL:

WITH instance_formats_extract AS (
    SELECT
        inst.id AS instance_id,
        JSON_ARRAY_ELEMENTS_TEXT(JSON_EXTRACT_PATH(inst.data, 'instanceFormatIds')) AS instance_format_id
    FROM inventory_instances AS inst
),

In Amazon Redshift:

Any array processing is exponentially more difficult in Amazon Redshift, which does not support the more handy JSON functions used by the latest versions of PostgreSQL.

Clean up the way the array is displayed in the final report

Let's say you want to transform the array a bit, for example by adding a "|" to separate the elements of the array. You can take the array and aggregate the elements into a different formatted list by first separating the elements and then reaggregating them.

In PostgreSQL:


WITH inst_subjs AS (
    SELECT
        inst.id AS instance_id,
        JSON_ARRAY_ELEMENTS_TEXT(JSON_EXTRACT_PATH(inst.data, 'subjects')) AS subject
    FROM inventory_instances AS inst
)
SELECT
    instance_id,
    STRING_AGG(distinct subject, '|'::TEXT) AS subjects
FROM inst_subjs
GROUP BY instance_id








...

In Amazon Redshift:


Querying a JSON field that contains an array of objects 

An example of this kind of procedure is that you may want to identify the primary author of a work. You would first need to access the contributors array, and for each object in the array, you would need to test if the object is the "primary" contributor, then extract the information about that contributor. Because this requires operating on each object in the array separately and bringing the results back together, you can't just use the normal json_extract_path_text() function. (That is, what you pass to json_extract_path_text() must be a path through a series of objects to an element, with no arrays in the path except perhaps for the final element.)

In PostgreSQL:

WITH identifiers_extract AS (
    SELECT
        inst.id AS instance_id,
        JSON_EXTRACT_PATH_TEXT(JSON_ARRAY_ELEMENTS(JSON_EXTRACT_PATH(inst.data, 'identifiers')),'identifierTypeId') :: VARCHAR AS type_id,
        JSON_EXTRACT_PATH_TEXT(JSON_ARRAY_ELEMENTS(JSON_EXTRACT_PATH(inst.data, 'identifiers')),'value') :: VARCHAR AS value
    FROM inventory_instances as inst
)

This code identifies the array of objects through a call to JSON_EXTRACT_PATH(), which returns the array in JSON (rather than text). The JSON_ARRAY_ELEMENTS() "pivots" the array so that each object in the array appears in a separate row. This new table arrangement (a "set") is then passed to the JSON_EXTRACT_PATH_TEXT(), which can traverse the objects in each row and extract whatever element is needed. From here, the data can be sorted, filtered, joined, etc. like any other query result.

If you want values from each array object on separate rows, you're done. If you want to extract a single element from each object and join those elements into a single list for each of the original rows, you can use a similar process as described above for cleaning the array output:


WITH inst_pub_date_extract AS (
    SELECT
        inst.id AS instance_id,
        JSON_EXTRACT_PATH_TEXT(JSON_ARRAY_ELEMENTS(JSON_EXTRACT_PATH(inst.data, 'publication')),'dateOfPublication') :: VARCHAR AS pub_date
    FROM inventory_instances as inst
),
inst_pub_dates AS (
    SELECT
        instance_id,
        STRING_AGG(pub_date, '|'::TEXT) AS pub_dates
    FROM inst_pub_date_extract
    GROUP BY instance_id
)

In Amazon Redshift:

Just as we mentioned in the previous section, Amazon Redshift cannot natively process JSON arrays.