[FOLIO-420] CQL2PgJSON: check index field name Created: 19/Dec/16  Updated: 12/Nov/18  Resolved: 10/Mar/17

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Bug Priority: P3
Reporter: Julian Ladisch Assignee: frances.webb@cornell.edu
Resolution: Done Votes: 0
Labels: sprint10, sprint5, sprint6, sprint7, sprint8
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Relates
relates to FOLIO-454 CQL2PgJSON: resolve ambiguous field n... Open
relates to FOLIO-405 implement CQL-to-Postgres/JSONB trans... Closed
Sprint:

 Description   

When a schema has been passed into CQL2PgJSON it should check that any index name in the CQL exists in the schema, and otherwise should throw some exception.



 Comments   
Comment by frances.webb@cornell.edu [ 20/Dec/16 ]

I'm working on this issue, but don't have the permissions to actually assign it to myself.

Comment by Jakub Skoczen [ 03/Jan/17 ]

frances.webb@cornell.edu has there been any progress on this item?

Just a note that the look up should be recursive: not only top level object keys should be searched but it should recurse deep into the object. This means that there can be multiple matches and all those matches should be reported to the client code. The client code can decide what to do, e.g use the first match or consider some external configuration to select one.

Additionally, the look up should check the type of the value under the key (string, number, etc) because that will be important for constructing the WHERE clause.

Comment by frances.webb@cornell.edu [ 17/Jan/17 ]

Julian and I met about this issue yesterday, and it looks like the code that I had been working on is compatible with the changes that have been made to the class. The class's API will stay essentially the same except that some new exceptions will be thrown for the added validation being run.

Comment by Jakub Skoczen [ 19/Jan/17 ]

OK. I would like us to touch base about a potential format for some configuration file to drive the schema-backed translation (e.g a way to express what indices are mapped into what fields).

Comment by frances.webb@cornell.edu [ 02/Feb/17 ]

As I am working on this issue, I am finding the need to make more minor changes to the class API. As expected, most of that relates to the possibility of new exceptions. One case where I'm uncertain whether to throw an exception is the case of a CQL query that cannot be mapped unambiguously to a single node in the object model json. One one hand, JSONB is accepting of ambiguous queries, and for smaller databases there's no problem. On the other hand, leaving the query in an ambiguous state means that Postgres will not be able to take full advantage of its indexes, which may be a significant problem once databases grow to production size. I know that a single query that fails to take advantage of indexing can cause a great deal of trouble, particularly if it's hard to tell that that's happening. (For our institution, mod-users will contain tens of thousands of records, and be tiny compared to many other modules' databases.)

My instinct is to do what SQL would traditionally do, and reject ambiguity in queries, but to possibly allow a option for looser interpretation. That option, if implemented, might be better determined in the object model json by the module owning the database than in the client code, as the database owner can have a better understanding of whether a particular database will be negatively impacted.

Regardless, this is something that can be switched if I implement one way but the group would prefer something different. I would like to know whether the group has strong feelings about it, though.

Comment by Julian Ladisch [ 02/Feb/17 ]

I suggest that you implement in the way that is most easy for you.

There are several possible ways to handle ambiguity so that should be done discussed and implemented in a separate issue: FOLIO-454 Open .

Comment by frances.webb@cornell.edu [ 07/Mar/17 ]

There's a candidate commit here, but I am not ready to make a pull request on this. I'd like to get Julian's feedback on my integration with his code, and see if everyone is comfortable with the new thrown exceptions.

There's an infrastructure here for possibly validating index field types, but I'm not sure how we want that to work so it may require reworking if it's more helpful to have the main CQL2PgJSON code able to identify the internal storage type of fields than it is to be able to reject CQL queries that provide string arguments for boolean fields.

My primary efficiency concern was the validation of CQL fields, and the secondary efficiency concern was the validation of the Schemas themselves. To that end I assumed that the data structures stored may grow to have large JSON schemas, but that name collision would remain moderate.

I'm sure that I haven't trapped for all possible errors in the JSON schema, but since we don't currently have a formal schema for the schemas, that can evolve if needed.

https://github.com/folio-org/cql2pgjson-java/commit/a49e312ace0cbc176b00453a5590978c3bedb0ae

Generated at Thu Feb 08 23:05:38 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.