Handling array structures in PostgreSQL JSON objects (FOLIO-921)

[FOLIO-922] Searching in repeated fields (arrays) in PostgreSQL JSON records. Created: 03/Nov/17  Updated: 03/Apr/19  Resolved: 03/Apr/19

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None
Parent: Handling array structures in PostgreSQL JSON objects

Type: Sub-task Priority: P2
Reporter: Niels Erik Nielsen Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: platform-backlog
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Duplicate
duplicates RMB-380 implement new approach for searching ... Closed
Relates
relates to CQLPG-42 Array searches in the FT mode Closed
relates to MODINVSTOR-182 analyze and optimize identifiers arra... Closed
Sprint:
Development Team: Core: Platform

 Description   

The issue is how to perform searches for records that contain certain values in the elements of an array.

They can be search along the lines of object identities for sure, like asking for records that have an element that is an object with certain properties containing certain values:

For example find within array objects with a property "value": "xyz"

select * from diku_mod_inventory_storage.instance where jsonb->'identifiers' @> '[

{"value": "9780552142352"}

]';

Or with two properties - in this example one of the properties is a reference to the identifier type "ISBN":

select * from diku_mod_inventory_storage.instance where jsonb->'identifiers' @> '[

{"value": "9780552142352", "typeId": "8261054f-be78-422d-bd51-4ed9f33c3422"}

]';

But the question is, can we also somehow search with truncation? That would obviously be needed for searching for authors in the 'creators' array.

And how will it perform?



 Comments   
Comment by Julian Ladisch [ 08/Dec/17 ]

This shell script searches for all instances where ISBN equals the passed argument $1.

SERVER=http://localhost:9130
TOKEN=$( curl -D - -H "X-Okapi-Tenant: diku" -H "Content-type: application/json" \
  -d '{"tenant":"diku","username":"diku_admin","password":"admin"}' $SERVER/authn/login | grep -i "^x-okapi-token: " )
ISBNID=$( curl -H "$TOKEN" -H "X-Okapi-Tenant: diku" $SERVER/identifier-types?query=name==ISBN | sed -E -n 's/.*"id"\s*:\s*"([^"]+)".*/\1/p' )
ISBN=$1
CQL=$(echo 'identifiers=="*\"value\": \"ISBN\", \"identifierTypeId\": \"ISBNID\"*"' | sed "s/ISBN/$ISBN/; s/ISBNID/$ISBNID/" );
echo $CQL
CQL=$(echo "$CQL" | sed 's/ /%20/g; s/"/%22/g; s/\\/%5c/g')  # urlencode
curl -H "$TOKEN" -H "X-Okapi-Tenant: diku" $SERVER/instance-storage/instances?query=$CQL

This is an example CQL query:

identifiers=="*\"value\": \"9783782791120\", \"identifierTypeId\": \"8261054f-be78-422d-bd51-4ed9f33c3422\"*"

value is always before identifierTypeId because shorter keys are stored before longer keys: https://www.postgresql.org/docs/current/static/datatype-json.html

Comment by Julian Ladisch [ 18/Jan/19 ]

We may use a trigger that automatically extracts the values and writes them into read only fields:

UPDATE instance SET jsonb=jsonb_set(jsonb_set(
  jsonb,
  '{identifiervalues}',
  ( SELECT COALESCE(jsonb_agg(value), '[]')
    FROM jsonb_to_recordset(jsonb->'identifiers')
    AS x(key text, value text)
    WHERE value IS NOT NULL
  )),
  '{identifiervaluesisbn}',
  ( SELECT COALESCE(jsonb_agg(value), '[]')
    FROM jsonb_to_recordset(jsonb->'identifiers')
      AS y(key text, value text)
    WHERE key='26978ebf-eceb-4537-b5a7-9482c2cce893'
   )
);

A record like this

{
  "identifiers": 
  [{"key": "26978ebf-eceb-4537-b5a7-9482c2cce893", "value": "020163385X"},
   {"key": "26978ebf-eceb-4537-b5a7-9482c2cce893", "value": "9780201633856"},
   {"key": "1e276ff3-48d5-41c3-483a-58298be393a8", "value": "ocn294998755"}
  ]
}

becomes

{
  "identifiers": 
  [{"key": "26978ebf-eceb-4537-b5a7-9482c2cce893", "value": "020163385X"},
   {"key": "26978ebf-eceb-4537-b5a7-9482c2cce893", "value": "9780201633856"},
   {"key": "1e276ff3-48d5-41c3-483a-58298be393a8", "value": "ocn294998755"}
  ],
  "identifiervalues": ["020163385X", "9780201633856", "ocn294998755"],
  "identifiervaluesisbn": ["020163385X", "9780201633856"]
}

These new fields can be indexed and used with full text search including right truncation.

Comment by Jakub Skoczen [ 03/Apr/19 ]

Duplicate of RMB-380 Closed

Generated at Thu Feb 08 23:09:18 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.