[FOLIO-921] Handling array structures in PostgreSQL JSON objects Created: 03/Nov/17  Updated: 28/Jan/19  Resolved: 13/Dec/17

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Task Priority: P2
Reporter: Niels Erik Nielsen Assignee: shale99
Resolution: Done Votes: 0
Labels: core, sprint26, sprint28
Σ Remaining Estimate: Not Specified Remaining Estimate: Not Specified
Σ Time Spent: 2 hours, 30 minutes Time Spent: 2 hours, 30 minutes
Σ Original Estimate: Not Specified Original estimate: Not Specified

Issue links:
Relates
relates to MODCXINV-7 Convert from Codex CQL to Inventory CQL Closed
relates to UIIN-3 Search Instances v1 Closed
Sub-tasks:
Key
Summary
Type
Status
Assignee
FOLIO-922 Searching in repeated fields (arrays)... Sub-task Closed  
FOLIO-923 Sorting on repeated fields in Postgre... Sub-task Open  
FOLIO-924 Data integrity for repeated fields wi... Sub-task Open  
Sprint:

 Description   

We may have some challenges with FOLIO records containing repeated fields, represented as array of either strings or complex objects, within the main JSON object.

The issues pertain to searching, sorting and data integrity and includes potential performance issues with either of these requirements.

The use of repeated fields is especially prevalent in the "Instance" record at this point, I believe. It has around 20 fields with three quarters of them being repeatable.

The spec for "Instance" (work in progress): https://github.com/folio-org/mod-inventory-storage/blob/MODINV24-from-v-5-1-1/ramls/instance.json

Example of an instance: https://github.com/folio-org/mod-inventory-storage/blob/MODINV24-from-v-5-1-1/sample-data/instances/transparent-water.json

Some of the repeated fields are arrays of simple strings so far (subjects, notes), some are arrays of simple strings that serves as foreign key identifiers (languages), but most are multi-property objects with their own foreign key identifiers, like 'creators', 'classifications', 'identifiers'.



 Comments   
Comment by Jakub Skoczen [ 05/Dec/17 ]

Array searching workaround: https://github.com/folio-org/cql2pgjson-java

language = \"en\"

For array of objects:

contributor = "\"value\" : \"xxx\""

Comment by Jakub Skoczen [ 05/Dec/17 ]

shale99 can you please check if the above will use an index, if not what can be done to support it.

Comment by Julian Ladisch [ 05/Dec/17 ]

For array of objects: (removed one space)

contributor = "\"value\": \"xxx\""

Comment by shale99 [ 06/Dec/17 ]

correct me if i am missing something, the above generates an AND query looking for "value": AND "xxx" in the contributor field - no ordering, meaning theoretically we can get back (although unlikely)
"contributor": ["anyfield": "asd asasd \"value\"" , "anyfield2": "\xxx\""]
or
"contributor": ["value": "asd asasd" , "xxx": "asd swwqq"]

Comment by shale99 [ 06/Dec/17 ]

from a trimming standpoint:
the value must have at least one space before it, key does not. for example

GOOD
query=creators=\"name\": \"ari*\"
query=creators=\"name\" : \"ari*\"
query=creators=\"name\" : \"ari*\"

BAD - query=creators=\"name\" :\"ari*\" - unless we are looking for a value that starts with ari , in which case this will be ok - this will not find an \"ari\" in the value - once again, this may be ok, but just need to be aware

Comment by shale99 [ 06/Dec/17 ]

trigram gin index is used

Comment by shale99 [ 06/Dec/17 ]

if we are ok with the above, seems reasonable at this stage, unless there are use cases which require more exact searching, i think we can close this?

Comment by Julian Ladisch [ 06/Dec/17 ]

contributor adj "\"value\": \"xxx\"" is the way to go.

contributor = "\"value\": \"xxx\"" contains two words that can match at any position so it matches
"contributor": ["anyfield": "xxx", "value": "asd asasd"]

adj enforces consecutive matching.

Comment by Jakub Skoczen [ 13/Dec/17 ]

Closing, let's make sure we have added the notes in the CQL2PG README shale99 Julian Ladisch

Generated at Thu Feb 08 23:09:18 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.