[FOLIO-921] Handling array structures in PostgreSQL JSON objects Created: 03/Nov/17 Updated: 28/Jan/19 Resolved: 13/Dec/17 |
|
| Status: | Closed |
| Project: | FOLIO |
| Components: | None |
| Affects versions: | None |
| Fix versions: | None |
| Type: | Task | Priority: | P2 |
| Reporter: | Niels Erik Nielsen | Assignee: | shale99 |
| Resolution: | Done | Votes: | 0 |
| Labels: | core, sprint26, sprint28 | ||
| Σ Remaining Estimate: | Not Specified | Remaining Estimate: | Not Specified |
| Σ Time Spent: | 2 hours, 30 minutes | Time Spent: | 2 hours, 30 minutes |
| Σ Original Estimate: | Not Specified | Original estimate: | Not Specified |
| Issue links: |
|
||||||||||||||||||||
| Sub-tasks: |
|
||||||||||||||||||||
| Sprint: | |||||||||||||||||||||
| Description |
|
We may have some challenges with FOLIO records containing repeated fields, represented as array of either strings or complex objects, within the main JSON object. The issues pertain to searching, sorting and data integrity and includes potential performance issues with either of these requirements. The use of repeated fields is especially prevalent in the "Instance" record at this point, I believe. It has around 20 fields with three quarters of them being repeatable. The spec for "Instance" (work in progress): https://github.com/folio-org/mod-inventory-storage/blob/MODINV24-from-v-5-1-1/ramls/instance.json Example of an instance: https://github.com/folio-org/mod-inventory-storage/blob/MODINV24-from-v-5-1-1/sample-data/instances/transparent-water.json Some of the repeated fields are arrays of simple strings so far (subjects, notes), some are arrays of simple strings that serves as foreign key identifiers (languages), but most are multi-property objects with their own foreign key identifiers, like 'creators', 'classifications', 'identifiers'. |
| Comments |
| Comment by Jakub Skoczen [ 05/Dec/17 ] |
|
Array searching workaround: https://github.com/folio-org/cql2pgjson-java language = \"en\" For array of objects: contributor = "\"value\" : \"xxx\"" |
| Comment by Jakub Skoczen [ 05/Dec/17 ] |
|
shale99 can you please check if the above will use an index, if not what can be done to support it. |
| Comment by Julian Ladisch [ 05/Dec/17 ] |
|
For array of objects: (removed one space) contributor = "\"value\": \"xxx\"" |
| Comment by shale99 [ 06/Dec/17 ] |
|
correct me if i am missing something, the above generates an AND query looking for "value": AND "xxx" in the contributor field - no ordering, meaning theoretically we can get back (although unlikely) |
| Comment by shale99 [ 06/Dec/17 ] |
|
from a trimming standpoint: GOOD BAD - query=creators=\"name\" :\"ari*\" - unless we are looking for a value that starts with ari , in which case this will be ok - this will not find an \"ari\" in the value - once again, this may be ok, but just need to be aware |
| Comment by shale99 [ 06/Dec/17 ] |
|
trigram gin index is used |
| Comment by shale99 [ 06/Dec/17 ] |
|
if we are ok with the above, seems reasonable at this stage, unless there are use cases which require more exact searching, i think we can close this? |
| Comment by Julian Ladisch [ 06/Dec/17 ] |
|
contributor adj "\"value\": \"xxx\"" is the way to go. contributor = "\"value\": \"xxx\"" contains two words that can match at any position so it matches adj enforces consecutive matching. |
| Comment by Jakub Skoczen [ 13/Dec/17 ] |
|
Closing, let's make sure we have added the notes in the CQL2PG README shale99 Julian Ladisch |