(contributed by Joanne Leary - 3-4-26)
Data Array Extraction Cheat Sheet - 3-5-26.xlsx
The following jsonb extract expressions will be used to get values from the json data array of a table.
jsonb_extract_path ---- this expression specifies the PATH necessary to get the last field entered in the path; the result is a json object and will display in double quotes
jsonb_extract_path_text ---- this expression does the same thing as jsonb_extract_path, but returns a TEXT object (displays without quotes)
jsonb_array_elements ---- this expression is used for extracting values within an array object
the operator #>>. The format for using this operator is:
table_name.jsonb #>> '{path_of_fields_in_the_hierarchy_separated_by_commas}'Example: audit_loan.jsonb #>> '{loan, metadata, updatedByUserId}'
-- this will get the value for "updatedByUserId" (the result is in Text format)
Extracting "First Level" Data Fields
Example: In the data array of the folio_users.user table (screenshot below), the field 'active' is a "first level" hierarchical field. Other first-level fields include: id, barcode, metadata, personal, proxyFor, username, createdDate, departments, patronGroup, updatedDate and externalSystemId.
You will see that some of the first-level fields have subordinate data in second-levels fields ("metadata" and "personal" have subordinate fields). The following query will get the data out of first-level fields, when those fields don't have subordinate fields:
SELECT
users.id AS user_id,
jsonb_extract_path_text (users.jsonb, 'active')::boolean AS patron_status – "active" is a boolean value (true or false)
FROM folio_users.users
The following expression will also get the value for "active":
SELECT
users.id AS user_id,
users.jsonb #>> '{active}" AS patron_status
FROM folio_users.users
Extracting "Second Level" Data Fields
In the example below, 'lastName' and 'firstName' are both "second-level" fields found under "personal."
SELECT
users.id AS user_id,
jsonb_extract_path_text (users.jsonb, 'personal', 'lastName') AS user_last_name,
jsonb_extract_path_text (users.jsonb, 'personal', 'firstName') AS user_first_name
FROM
folio_users.users
The following expression will also get "lastName" and "firstName":
SELECT
users.id AS user_id,
users.jsonb #>> '{personal, lastName}' AS user_last_name
users.jsonb #>> '{personal, firstName}' AS user_first_name
FROM
folio_users.users
Extracting "Third level" Data Fields
Some tables, such as the folio_circulation.audit_loan table, have third-level data fields that are comprised of text objects (they can also have array objects, but that is covered in the next section – note: array objects can appear at any level within the json data). The json statement for extracting third-level text fields is exactly the same as for first- and second-level data fields, except there are three fields listed in the parentheses, instead of just one or two.
In this example, "name" and "updatedByUserId" are third-level fields.
This is the code to extract them. Note that the fields listed within the parentheses must be in the same hierarchical order as they appear in the json data.
SELECT
audit_loan.id AS audit_loan_id,
jsonb_extract_path_text (audit_loan.jsonb, 'loan', 'status', 'name') AS loan_status_name,
jsonb_extract_path_text (audit_loan.jsonb, 'loan', 'metadata', 'updatedByUserId') AS loan_updated_by_user_id
FROM
folio_circulation.audit_loan
The following expression will also get the third-level values:
SELECT
audit_loan.id AS audit_loan_id,
audit_loan.jsonb #>> '{loan, status, name}' AS loan_status__name,
audit_loan.jsonb #>> '{loan, metadata, updatedByUserId}' AS loan_updated_by_user_id
FROM
folio_circulation.audit_loan
Extracting Arrays
1. Arrays embedded in text objects
In the previous examples, the elements being extracted were "text objects" within the table's JSON array, and were denoted by curly brackets { }. In this next example, we need to extract multiple values from an "array" object (denoted by square brackets [ ] ) that is embedded in a hierarchy of text elements. This calls for nested json extract statements, as well as a special function to extract the array.
When extracting arrays embedded in text objects, it's usually best to use a "cross join lateral" statement in the FROM stanza. This cross join creates a "json object" that can then be further extracted in the Select stanza. Warning: the records retrieved through a cross join will only be those that have the array elements you're looking for; records without those elements will drop out. See "Alternative to using cross joins" (that captures all the records) near the end of this page.
It's also possible to NOT use a cross join at all, but to nest the json extract statements in one long expression in the Select stanza. However, you lose the ability to include the ordinality of the values extracted. See Summary at the end of this page ("Three ways to extract arrays").
Nested extract statements are applied from the outside in. So the left-most json statement applies to the right-most element in the parentheses
The entire nested extract statement (created by the cross join) is a json object
That json object is then used in the select stanza to get the needed fields within it (by using another json extract statement on it)
Note that SQL performs the work in the FROM stanza before it performs the work in the SELECT stanza; that's how it knows about the json object you created through the cross join, and why you can use that object in the Select stanza
The example below uses the folio_users.custom_fields table, which has an embedded array called "values" containg three fields: "id","value" and "default". The "values" array is embedded within two preceding text objects:
"selectField" (text object) – topmost level
"options" (text object) – middle level
"values" (array object) – bottom level. This array object contains the fields we need: "id" and "value"
SELECT
cft.id AS custom_fields_id,
cft.ref_id,
cft.name AS custom_fields_name,
jsonb_extract_path_text (values.jsonb,'id') AS value_id,
----- this extracts the "id" element from the values object that we created in the cross join below
jsonb_extract_path_text (values.jsonb,'value') AS value_name
----- this extracts the "value" element from the values object that we created in the cross join below. NOTE: the two json extract statements above can be replaced by a shorthand formula, using the "#>>" operator:
values.jsonb #>> '{id}'
values.jsonb #>> '{value}'
FROM folio_users.custom_fields AS cf
CROSS JOIN LATERAL jsonb_array_elements (jsonb_extract_path (cf.jsonb, 'selectField', 'options', 'values')) AS values (jsonb)
----- "jsonb_array_elements" works on the 'values' array object at the end of the parentheses; we use this extract expression because "values" is an array
----- "jsonb_extract_path" looks for the path of elements to follow in order to get to the values array – the elements listed must be in the same hierarchical order as they appear in the table
----- the entire cross join statement produces a jsonb object which I named "values" (note: what you name this object is totally arbitrary - you could call it "stuff" if you felt like it – "stuff (jsonb)")
LEFT JOIN folio_users.custom_fields__t AS cft
ON jsonb_extract_path_text (cf.jsonb,'id')::UUID = cft.id
----- the custom_fields__t table has the custom fields id, the ref_id and the custom fields name (joining to this table is not required – I just wanted to include those fields in the results)
ORDER BY name, jsonb_extract_path_text (values.jsonb,'id'), jsonb_extract_path_text (values.jsonb,'value')
Note:
If you want to include "ordinality" as part of your results – this is the order in which the extracted values appear in the source record – you would insert "WITH ordinality" before "AS values (jsonb)":
CROSS JOIN LATERAL jsonb_array_elements (jsonb_extract_path (cf.jsonb, 'selectField', 'options', 'values')) WITH ordinality AS values (jsonb)Then you would add a line in the Select stanza to show the ordinality: values.ordinality AS custom_fields_ordinality
RESULT:
2. Array is at the top-most hierarchical level in a table
Sometimes you want to extract an array that is at the top-most hierarchical level of a table – that is, it is not embedded within any other object. An example of this is "departments" element in the folio_users.users table:
Because "departments" is an array, you still have to extract it with a cross join statement, as in the previous example. In the code below, I am using the shorthand method to extract the various elements:
SELECT
WHEN (users.jsonb #>> '{active}')::BOOLEAN = true
THEN 'Active'
ELSE 'Expired'
END AS patron_status_name,
users.jsonb #>>'{personal, lastName}' AS last_name, ------- this extrcts the last name from the "personal" object (second-level extraction)
users.jsonb #>>'{personal, firstName}' AS first_name, ------- this extracts the first name from the "personal' object (second-level extraction)
users.jsonb #>>'{username}' AS net_id, ------ this extracts "username" from the table (first-level extraction)
users.jsonb #>>'{barcode}' AS user_barcode, ------ this extracts "barcode" from the users table (first-level extraction)
depts.jsonb #>>'{}' AS dept_id, ----- this extracts the array values from the "depts" json object created through the cross join. Note that the curly brackets are empty; this is because there is no tag for the values, it's just the values themselves. If there were a tag for the values, the tag name would go in the curly brackets
ud.name AS department_name,
depts.ordinality AS dept_ordinality ------ this displays the ordinality of the departments (the sequence of occurrence in the record)
FROM folio_users.users
CROSS JOIN LATERAL jsonb_array_elements (jsonb_extract_path (users.jsonb,'departments'))
WITH ordinality AS depts (jsonb)
LEFT JOIN folio_users.departments__t AS ud ------- I am joining to the the departments__t table to get the department name
ON (depts.jsonb #>> '{}')::UUID = ud.id ------- note that I'm using the shortcut expression to join to the id in the departments__t table
RESULT:
3. Summary: Three ways to extract arrays
This illustrates three methods for extracting an embedded array from the folio_users.custom_fields table (example)
-- a. No cross join -- just use triple-nested json extract statements in the Select clause
SELECT
cft.id AS custom_fields_id,
cft.ref_id,
cft.name AS custom_fields_name,
jsonb_extract_path_text (jsonb_array_elements (jsonb_extract_path (cf.jsonb, 'selectField', 'options', 'values')),'id') AS value_id,
jsonb_extract_path_text (jsonb_array_elements (jsonb_extract_path (cf.jsonb, 'selectField', 'options', 'values')),'value') AS value_name
FROM folio_users.custom_fields AS cf
LEFT JOIN folio_users.custom_fields__t AS cft
ON jsonb_extract_path_text (cf.jsonb,'id')::UUID= cft.id
ORDER BY name, value_id, value_name
;
-- b. Use a cross join and put the result in a single-level json extract statement in the Select clause
SELECT
cft.id AS custom_fields_id,
cft.ref_id,
cft.name AS custom_fields_name,
jsonb_extract_path_text (values.jsonb,'id') AS value_id,
jsonb_extract_path_text (values.jsonb,'value') AS value_name
FROM folio_users.custom_fields AS cf
CROSS JOIN LATERAL
jsonb_array_elements (jsonb_extract_path (cf.jsonb, 'selectField', 'options', 'values'))
AS values (jsonb)
LEFT JOIN folio_users.custom_fields__t AS cft
ON jsonb_extract_path_text (cf.jsonb,'id') :: UUID= cft.id
ORDER BY name, jsonb_extract_path_text (values.jsonb,'id'), jsonb_extract_path_text (values.jsonb,'value')
;
-- c. Use a cross join but use the shortcut method in the Select clause to get the values in the array
SELECT
cft.id AS custom_fields_id,
cft.ref_id,
cft.name AS custom_fields_name,
values.jsonb #>> '{id}' AS value_id,
values.jsonb #>> '{value}' AS value_name
FROM folio_users.custom_fields AS cf
CROSS JOIN LATERAL
jsonb_array_elements (jsonb_extract_path (cf.jsonb, 'selectField', 'options', 'values'))
AS values (jsonb)
LEFT JOIN folio_users.custom_fields__t AS cft
ON jsonb_extract_path_text (cf.jsonb,'id') :: UUID= cft.id
ORDER BY name, jsonb_extract_path_text (values.jsonb,'id'), jsonb_extract_path_text (values.jsonb,'value')
---- can also use: ORDER BY name, values.jsonb #>> '{id}', values.jsonb #>> '{value}'
4. Alternative to using cross joins, and how to prevent records without the array elements from dropping out
Cross join queries will return only those records that have all the array elements you are trying to find. It's possible to use multiple cross join statements in the FROM clause to get multiple array values (for example, subjects, contributors and languages from the instance table) - but the results will show only those records that have ALL the array elements specified in the cross joins.
If you want to find records that have ANY of the particular array fields, AND show all the records that don't have any of those array fields, the better plan is to do a 2-part query.
First, create a query that has a nested jsonb extract statement (in the Select clause) for EACH of the arrays you're trying to extract (example below) – do not use cross joins
NB: if you really want to use cross joins, create one cross join query for each array field that you want to extract; don't combine all the extracts in one cross-join query
You will have to left-join each of the query results to the all-records query (created in the second step)
Second, create a query that finds all records from the source table (for example, get all instance ids from the folio_inventory.instance table), and left-join the results of the first query (or queries) to it
In this example, we're finding three array elements from the folio_inventory.instance table: contributors, languages, and subjects, then joining the result to a complete set of instance records.
WITH recs AS ----- Get the array elements for contributors, languages and subjects using nested jsonb extract statements in the Select clause
(SELECT
ii.id,
ii.jsonb #>> '{hrid}' AS instance_hrid, ---- first level text extract
ii.jsonb #>> '{title}' AS title, ---- first level text extract
jsonb_extract_path_text (jsonb_array_elements (jsonb_extract_path (ii.jsonb, 'contributors')),'name') AS contributors, -- contributors array extract (second-level extract)
(jsonb_array_elements (jsonb_extract_path (ii.jsonb,'languages'))) #>>'{}' AS languages, ----- languages array extract (first level array extract) – note the empty curly brackets
jsonb_extract_path_text (jsonb_array_elements (jsonb_extract_path (ii.jsonb, 'subjects')),'value') AS subjects ----- subject array extract (second-level extract)
FROM folio_inventory.instance AS ii
)
SELECT ------- Get all the records from the instance__t table; left join the results of the first query that got the array elements
instance__t.id,
instance__t.hrid,
instance__t.title,
string_agg (distinct recs.contributors,' | ') AS contributors_aggregated,
string_agg (distinct recs.languages,' | ') AS languages_aggregated,
string_agg (distinct recs.subjects,' | ') AS subjects_aggregated
FROM folio_inventory.instance__t
LEFT JOIN recs
ON instance__t.id = recs.id
GROUP BY
instance__t.id,
instance__t.hrid,
instance__t.title
;
RESULT:
5. Example - Revising your query to use Data Array Extractions instead of Derived Tables to get up-to-the minute data
Suppose your library has a heavily-used collection of materials (such as course reserve books or equipment) that are loaned for very short loan periods throughout the day. You want to find the items that are overdue, with item and borrower information, loan dates, due dates and amount of time overdue.
Using derived tables to write the query will not give you current data, since those tables are a snapshot of data from the previous day. You will have to write your query with primary source tables, because those are constantly being updated throughout the day.
Here is a comparison of methods for using derived tables for this query, and using primary source tables. Primary source tables will require extracting data elements from arrays, and the queries will usually require using more tables.
Find overdue hourly loans using derived tables (you will get yesterday’s data):
--Find overdue hourly loans using derived tables (you will get yesterday’s data) SELECT loans_items.loan_id, loans_items.hrid AS item_hrid, loans_items.barcode, loans_items.material_type_name, loans_items.loan_policy_name, users_groups.user_first_name, users_groups.user_last_name, items_holdings_instances.title, items_holdings_instances.call_number, items_holdings_instances.enumeration, items_holdings_instances.item_copy_number, loans_items.current_item_effective_location_name, TO_CHAR (now()::timestamp,'mm-dd-yyyy hh:mi am') AS current_date_and_time, TO_CHAR (loans_items.loan_date::timestamp,'mm-dd-yyyy hh:mi am') AS loan_date, TO_CHAR (loans_items.loan_due_date::timestamp,'mm-dd-yyyy hh:mi am') AS due_date, loans_items.loan_return_date::timestamp, CONCAT ( EXTRACT (day FROM (now()::timestamp - loans_items.loan_due_date::timestamp)),' days ', EXTRACT (hour FROM (now()::timestamp - loans_items.loan_due_date::timestamp)),' hours ', EXTRACT (minute FROM (now()::timestamp - loans_items.loan_due_date::timestamp)),' minutes' ) AS time_overdue, loans_items.loan_status FROM folio_derived.loans_items LEFT JOIN folio_derived.users_groups ON coalesce (loans_items.user_id::varchar ,'0') = users_groups.user_id::varchar LEFT JOIN folio_derived.items_holdings_instances ON loans_items.item_id = items_holdings_instances.item_id WHERE loans_items.loan_due_date::timestamp < now()::timestamp AND loans_items.loan_status = 'Open' AND (loans_items.material_type_name in ('Laptop','Equipment','Peripherals') OR loans_items.loan_policy_name ILIKE '%hour%' OR loans_items.current_item_effective_location_name ILIKE '%res%') ;
Find overdue hourly loans using primary tables (you will get current data)
-- Find overdue hourly loans using primary tables (you will get current data) SELECT loan__t.id AS loan_id, item__t.hrid AS item_hrid, item__t.barcode, material_type__t.name AS material_type_name, loan_policy__t.name AS loan_policy_name, users.jsonb#>>'{personal, firstName}' AS user_first_name, users.jsonb#>>'{personal, lastName}' AS user_last_name, instance__t.title, item.jsonb#>>'{effectiveCallNumberComponents,callNumber}' AS call_number, item__t.enumeration, item__t.copy_number AS item_copy_number, location__t.name AS current_item_effective_loation_name, TO_CHAR (now()::timestamp,'mm-dd-yyyy hh:mi am') AS current_date_and_time, TO_CHAR (loan__t.loan_date::timestamp,'mm-dd-yyyy hh:mi am') AS loan_date, TO_CHAR (loan__t.due_date::timestamp,'mm-dd-yyyy hh:mi am') AS due_date, loan__t.return_date::timestamp, CONCAT ( EXTRACT (day FROM (now()::timestamp - loan__t.due_date::timestamp)),' days ', EXTRACT (hour FROM (now()::timestamp - loan__t.due_date::timestamp)),' hours ', EXTRACT (minute FROM (now()::timestamp - loan__t.due_date::timestamp)),' minutes' ) AS time_overdue, loan.jsonb#>>'{status,name}' AS loan_status FROM folio_circulation.loan__t LEFT JOIN folio_circulation.loan ON loan__t.id = loan.id LEFT JOIN folio_users.users ON coalesce (loan__t.user_id::varchar,'1')::varchar = users.id::varchar LEFT JOIN folio_inventory.item__t ON loan__t.item_id = item__t.id LEFT JOIN folio_inventory.item ON item__t.id = item.id LEFT JOIN folio_inventory.material_type__t ON item__t.material_type_id = material_type__t.id LEFT JOIN folio_circulation.loan_policy__t ON loan__t.loan_policy_id = loan_policy__t.id LEFT JOIN folio_inventory.holdings_record__t ON item__t.holdings_record_id = holdings_record__t.id LEFT JOIN folio_inventory.instance__t ON holdings_record__t.instance_id = instance__t.id LEFT JOIN folio_inventory.location__t ON item__t.effective_location_id = location__t.id WHERE loan__t.due_date::timestamp < now()::timestamp AND loan.jsonb#>>'{status,name}' = 'Open' AND (material_type__t.name IN ('Laptop','Equipment','Peripherals') OR loan_policy__t.name ILIKE '%hour%' OR location__t.name ILIKE '%res%') ;
A Third Option:
If going through the trouble of extracting array elements is just too daunting, you can cheat: copy and paste the derivation code for the derived tables you need into your query, as subqueries at the beginning. Then, refer to the subqueries in the same was as you would the original derived tables. The results will be current data, because you are running the code on-the-fly.
Upon testing, all three options take about the same amount of time to run (at least in this example). Generally speaking, queries doing array extractions will take longer to run, but may be preferable if you want up-to-the-minute data or if you aren’t confident in derived table accuracy (because the source tables DO occasionally change while the derived tables don’t, or the derived tables don’t have all the fields you need, or the derived tables sometimes have coding errors). If you can do your own extractions, you will be invincible.
