[FOLIO-592] meta-data section in FOLIO records (createdDate, updatedDate, creator, etc) Created: 08/May/17  Updated: 18/Jan/19  Resolved: 03/Aug/17

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Task Priority: P1
Reporter: Jakub Skoczen Assignee: Marc Johnson
Resolution: Done Votes: 0
Labels: core, for-next-sprint, sprint14, sprint16, sprint17, sprint19, team2
Remaining Estimate: Not Specified
Time Spent: 3 hours, 15 minutes
Original estimate: Not Specified

Issue links:
Blocks
blocks CIRCSTORE-16 operator_id is missing in loan_histor... Closed
blocks RMB-48 implement handling of meta-data secti... Closed
is blocked by RMB-30 Trouble with a schema file referring ... Closed
Relates
relates to MODUSERS-23 set createdDate and updatedDate field... Closed
relates to CIRCSTORE-14 create an endpoint for requests objects Closed
relates to RMB-50 Clean the temp space after build to r... Closed
relates to UIREQ-1 Requests: View Request Details v1 Closed
relates to UX-39 UX: Record Created, Last Updated and ... Closed
relates to FOLIO-972 The regex pattern for UUID in schemas... Closed
relates to MODUSERS-16 extend schema to support fields requi... Closed
Sprint:

 Description   

Consider returning this information in form of HTTP headers and include for any entity stored in FOLIO.



 Comments   
Comment by shale99 [ 23/May/17 ]

looking at the normal workflow

1. get request (rmb gets the requests , parses and calls implementing function)
2. process (implementing module gets called and handles request)
3. push to db (implementing module uses a db client - potentially rmb's client to save to db)
3.a auto save timestamps
Option 1: implementing module handles this, for example, the config module creates a trigger to create these two timestamps when it creates the schema during tenant activation
Option 2: rmb handles this, rmb adds two columns to module's tables - prefer not to do this

3. get from db
3.b auto get timestamp fields (rmb) - the rmb's postgres client can pull these two fields into the result set so that each result would be: id | json | creation date | update date

At this point the RMB hands off the result set to the implementing module and the module returns the result set which conforms to the declared raml - but there may be additional processing by the module before it is returned.

we can add two headers to the response so that the module will need to populate this info in the response, but it will need to be done by the module, the rmb is not db result set aware, only http response conforming to raml aware. however, how will this work for responses that include more then one record? it would seem cumbersome to have an array which matches up to the result set, no?

Comment by Jakub Skoczen [ 23/May/17 ]

Decided to allow users of RMB to wire headers (to be defined as a RAML trait) with database fields.

Decision needs to be made about how to report timestamp field for lists of records (e.g user list)

Comment by Jakub Skoczen [ 24/May/17 ]

For the time being we have decided that the timestamp fields will be included directly in the schema, see MODUSERS-16 Closed

Comment by shale99 [ 25/May/17 ]

one more minor issue to decide on, currently for example, in mod-config - the dates are part of the schema. there is an option to create a general schema which contains updateDate, creationDate and can be referenced to by a storage / bl schema. this can be a type of meta-data schema which contains those two field for now. a future field may be updatedBy, etc...

Comment by shale99 [ 25/May/17 ]

i can add the meta-data schema if we decide to go that route (to the raml repo)

Comment by shale99 [ 28/May/17 ]

i have done some research on this from the server side angle. if we decide on a metadata schema it can be populated by triggers in the following manner


CREATE OR REPLACE FUNCTION update_user_trigger()
RETURNS TRIGGER AS $$
DECLARE
    inject jsonb;
BEGIN
    inject = json_build_object('createDate',current_timestamp,'createdBy', current_user);
    NEW.jsonb = jsonb_set(NEW.jsonb, '{metaData}' ,  inject , true);
    RETURN NEW;
END;
$$ language 'plpgsql';
CREATE TRIGGER update_trigger BEFORE INSERT ON myuniversity3_mod_users.users FOR EACH ROW EXECUTE PROCEDURE  update_user_trigger();

the above will create a trigger on insert and create the metaData section, populating it with an createDate and and createdBy

the same trigger (with a few minor tweaks) can also be added for on updates which will populate the metaData section with an updateDate and updatedBy fields.

the above trigger has been tested however, i have not run intense performance testing on it.

thoughts?

Comment by shale99 [ 29/May/17 ]

some performance numbers:

4 cores / 8 threads (logical processors) (i7-6700hQ)
16 gb ram
ssd disk (dont have details)

trigger performance is very linear (which is a good thing):

Trigger update_trigger: time=8.678ms calls=1,000
Trigger update_trigger: time=80.979ms calls=10,000
Trigger - update_trigger: time=826.325ms calls=100,000
Trigger update_trigger: time=8659.108ms calls=1,000,000

note this was tested using psql and seemed to be using just 1 cpu when running

query used:


EXPLAIN ANALYSE INSERT INTO myuniversity3_mod_users.users(jsonb) SELECT '{"username" : "shane14","id" : "b18e19f1-07b2-4367-9ee8-fa46a4881fd0", "active" : true}' FROM generate_series(1,100000);

note that the uuid was auto-generated by the db

Comment by shale99 [ 30/May/17 ]

so if there are no objections i would like to move forward with the metaData schema

the schema at this point would look like this (to be placed in raml repo):
(note that module schemas will need to reference '$ref' this schema)

{
  "title": "Metadata Schema",
  "type": "object",
  "properties": {
    "createDate": {
      "type": "string",
      "required": true
    },
    "updateDate": {
      "type": "string",
      "required": true
    },
    "updatedBy": {
      "type": "string"
    },
    "createdBy": {
      "type": "string"
    }
  }
}

two triggers should be added to a storage module to auto populate a returned json response with this info -

CREATE OR REPLACE FUNCTION create_date_<TABLE>_trigger()
RETURNS TRIGGER AS $$
DECLARE
    inject jsonb;
BEGIN
    inject = json_build_object('createDate',current_timestamp,'createdBy', current_user);
    NEW.jsonb = jsonb_set(NEW.jsonb, '{metaData}' ,  inject , true);
    RETURN NEW;
END;
$$ language 'plpgsql';
CREATE TRIGGER create_<TABLE>_trigger BEFORE INSERT ON myuniversity_mymodule.<TABLE> FOR EACH ROW EXECUTE PROCEDURE  create_date_<TABLE>_trigger();

CREATE OR REPLACE FUNCTION update_date_<TABLE>_trigger()
RETURNS TRIGGER AS $$
DECLARE
    inject jsonb;
BEGIN
    inject = json_build_object('updateDate',current_timestamp,'updateBy', current_user);
    NEW.jsonb = jsonb_set(NEW.jsonb, '{metaData}' ,  inject , true);
    RETURN NEW;
END;
$$ language 'plpgsql';
CREATE TRIGGER update_<TABLE>_trigger BEFORE UPDATE ON myuniversity_mymodule.<TABLE> FOR EACH ROW EXECUTE PROCEDURE  update_date_<TABLE>_trigger();

there is one drawback to this:
while the update trigger fires with every update (hence , there will always be an updateDate entry in the response), the createDate is only populated when a record is created. if an updated is requested, and the metaData section isnt part of the json passed in, we will lose this info forever. this can be avoided by having the createDate as a separate column - the RMB postgres client pulls columns and can populate the fields in the returned json so that we dont lose this info.

Comment by shale99 [ 30/May/17 ]

one last comment on this - if we agree on this approach

raml files should add a reference to the metaData schema

- ../raml-util/schemas/metadata.schema: !include ../raml-util/schemas/metadata.schema

schema (for example userdata.json) should add a reference to the schema

    "metaData": {
      "type": "object",
      "$ref": "../raml-util/schemas/metadata.schema"
    }
Comment by Mike Taylor [ 30/May/17 ]

Sorry, I am late to this and lack the necessary background and context to comment intelligently; but of course I won't let that stop me.

Is there a compelling reason to use string for the datestamps instead of a proper datestamp type?

Comment by shale99 [ 30/May/17 ]

since the server is generating this i was thinking about a "type" : "string" with a "pattern" field indicating the format. but date-time is legit

Comment by Marc Johnson [ 30/May/17 ]

Is it proposed that we include it within a domain representation, e.g:

{
  "username": "jhandey",
  "id": "7261ecaae3a74dc68b468e12a70b1aec",
  ...
  "personal": {
    "lastName": "Handey",
    "firstName": "Jack",
    ...
  },
  "metaData": {
    "createDate": "2016-11-05T07:23:32Z"
    ...
  },
}

Given that createDate is a required property, are we are expecting a client to include this information in any request to create or update a record, or is the metadata object itself going to be optional and it is expected that a client won't provide any of this information in requests? This might be related to the question about keeping this information in the stored JSON, or separately (which I think is a challenging tradeoff).

In the example schema above, the updateDate is required, does this mean that a newly created record, will have both a createDate and an updateDate?

Personally, I'd prefer that all of the properties were in the past tense, e.g. createdDate or updatedDate, but that might just be a stylistic viewpoint.

Comment by shale99 [ 30/May/17 ]

right, the metaData is optional, but this comes back to the issue of the client having to supply it when updating a record otherwise it is lost. i guess you could say that since the metaData is returned by the server as part of the object that it is part of the record and leaving it out means a type of update, but this is dangerous.

i have no preference about past tense

i would like to verify we are all on the same page from a metaData schema standpoint - the comments are good and the implementation can be tuned to accommodate.

Comment by Jakub Skoczen [ 30/May/17 ]

Marc Johnson shale99 I agree with Marc's stylistic viewpoint, let's go for createdDate and updatedDate.

What is the standard way for dealing with non-optional but server-provided fields, like the metadata section? Is there a recommendation in JSON schema or JSON-api?

Comment by shale99 [ 30/May/17 ]

i will check

Comment by shale99 [ 01/Jun/17 ]

for the schema type part

this can either be

type: date-time

or


type: string
pattern: "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}\\.[0-9]{6}\\+[0-9]{2}:[0-9]{2}$"

can not combine type: date-time with pattern - i think the pattern is a bit clearer for someone looking at the schema - we can add a description as well with more info.

on the other hand - we can add a description to the date-time type with the pattern.

validating tools will have an easier time with the type string and its pattern - so this would be my vote. if anyone feels otherwise please let me know

Comment by Marc Johnson [ 01/Jun/17 ]

shale99

If I'm understanding the pattern option, we would effectively be defining our own profile of ISO8601 that defines a date and time without timezone information, is that a sensible interpretation? Would this be interpreted by the RAML-Module-Builder as a date time in the POJOs or kept as a string?

My personal preference is to use the date-time type as it is defined as part of the JSON.Schema validation specification (I find https://tools.ietf.org/html/rfc3339#section-5.6 easier to read than a regular expression, but I might be alone in that regard). I was under the impression that the RAML-module-builder had recently been updated to support this?

Comment by shale99 [ 01/Jun/17 ]

not really

ISO 8601 Combined date and time in UTC looks like this: 2017-06-01T09:33:49+00:00
(this is represented by the regex above)
this is the format rmb saves / retrieves to the db when encountering date-time types in schema
this is not necessarily the format that will be used by non rmb modules when encountering a date-time type

Comment by shale99 [ 01/Jun/17 ]

my mistake - yes, this is iso with microsecond resolution.
i guess we can change the 6 digit partial second resolution to a two digit partial second resolution. right now rmb uses 6 digits

Comment by Marc Johnson [ 01/Jun/17 ]

shale99

I think I'm confused, so I'll share my reflection and hopefully you can help me with my understanding by correcting my understanding.

We are attempting to define a standard for date time representation in FOLIO APIs (maybe this conversation needs to move to [DMOD-256]).

Interface

There are two proposed options for API representations:

1. String type with pattern (see above), that is an ISO8601 profile (subset of available representations) and represents date and time (up to partial second), with no time zone information (assumed to be UTC).

2. date-time type (defined in the JSON.schema validation specification) and is also an ISO8601 profile (from RFC3339) which represents date and time (up to partial second) with explicit time zone information.

Standard Implementation

The RAML-module-builder currently converts a property of type date-time into a Java DateTime in the POJO and stores this information in the format described above (date and time, with optional partial seconds and no time zone information). I don't know what the RAML-module-builder would do in the pattern case, I assume it will keep the string as is.

Comment by shale99 [ 01/Jun/17 ]

will explain my thinking...

lets start with the meta data schema for starters
we are talking about a generic schema available in the raml repo to all developers. so we need to indicate the format of the update and created date fields.

option 1 indicate that these fields are of type date-type
this is a legit option, the rmb handles this as well and formats the content of the field into a date and time value format indicated previously. meaning all rmb storage modules will save the jsons in the database in a single format as indicated .

the one issue i have with this would be non-rmb modules / api consumers - there is no info in the schema indicating what they should expect for a date-time type. this is solvable by documentation and adding a description field to the date entries, however, an api consumer using a third party json validator will have a bit more work to do to indicate what an acceptable date-time value is acceptable.
for example, postgres jsonb columns support indicating validation at the db level for an inserted json - so that if the json does not validate , it wont be inserted - other 3rd party products may offer the same tooling - the question is, if we pass in a type: date-time - will all those validation tools validate the value in the same way. passing a string with a pattern will mean yes. this is the only thing that bothers me a bit with the type: date-time

Comment by Mike Taylor [ 01/Jun/17 ]

I can't see any significant reason not to use a real date-time type. It seems we already have all the support we need within our own projects; and support for it is only going to improve in other people's tools. Let's say what we mean.

Comment by shale99 [ 01/Jun/17 ]

that is kind of the issue i think - with date-time , we arent really saying what we mean, we are saying in general this is a date-time field, but we give no info to parsers, validators what that date-time field really means from a content standpoint.

I am ok going with the date-time field as well, my main concern was what i stated, it isnt a burning issue with me to go with the strings

Comment by Mike Taylor [ 01/Jun/17 ]

Hmm. Well, saying the field's type is dateTime is certainly more honest and explicit than saying string.

Your concern is that tooling support may not take advantage of what can be known about the dataTime field. But the key tool in our project is RMB, and that does support it. Which to my mind is good enough.

But I am well out of the mainstream of this issue, so if you guys who are better informed and closer to the metal have differing opinions I am more than happy to stand down.

Comment by Marc Johnson [ 01/Jun/17 ]

shale99

I agree that both module authors and clients need a clear understanding of expectations. From my perspective, the validation standard states what it is intended and is supported by fairly comprehensive documentation in RFC3339. Would referring directly to that allay some of your concerns around clarity?

As far as implementation goes, I don't know what the tool support for JSON schema validation or RFC3339 is like. I would expect that many people would just use ISO8601 validation, and so be slightly weaker than the specification, which could be a concern for compatibility (we could support that with a regular expression for internal use).

Comment by shale99 [ 05/Jun/17 ]

i would like to summarize the current status

1. a metadata schema will be created and placed in the raml repo
a. the metadata schema will contain two fields at this point
i. updatedDate , createdDate
b. storage / bl schemas should reference this schema
2. storage modules will need to create two triggers for each table (table representing a json)
a. trigger on insert - this will be slightly different then the trigger indicated previously - this trigger will populate a createdDate column in the table. The rmb will auto populate the metadata schema with this value when returning results from this table
b. trigger on update - can either work as the insert trigger, or as previously described (inject metadata into object in db) - i prefer as insert trigger
NOTE that rmb already does this type of inject already - this will hence not be a big change in the rmb.
3. date format - seems like the consensus is to go with date-time type. note that this will leave the actual format in the hands of the developer / consumer - there are 2 variants (with Z / without Z + indicating offset of UTC) - so there may be two representations of date
4. the metadata schema should be declared optional - the client does not need to pass it in. it will, however, always be returned by the server
a. the createdDate - required
b. the updatedDate - optional (relevant only if record was updated)

i would like to finalize this and push into a release - so please comment if you find something you do not necessarily agree with

Comment by Niels Erik Nielsen [ 05/Jun/17 ]

i would like to summarize the current status
1. a metadata schema will be created and placed in the raml repo

To make it reusable I assume, good

a. the metadata schema will contain two fields at this point
i. updatedDate , createdDate

Okay, good

b. storage / bl schemas should reference this schema

That's the reuse bit?

2. storage modules will need to create two triggers for each table (table representing a json)

Out of curiosity (as I'm only going through the RMB documentation now for the first time in quite a while): Is the module developer doing this by Java annotations somehow or by explicit DDL in postgres?

a. trigger on insert - this will be slightly different then the trigger indicated previously - this trigger will populate a createdDate column in the table. The rmb will auto populate the metadata schema with this value when returning results from this table
b. trigger on update - can either work as the insert trigger, or as previously described (inject metadata into object in db) - i prefer as insert trigger
NOTE that rmb already does this type of inject already - this will hence not be a big change in the rmb.

Guess I don't get why the trigger on update is not an update trigger but I'm sure there's a good explanation for that.

3. date format - seems like the consensus is to go with date-time type. note that this will leave the actual format in the hands of the developer / consumer - there are 2 variants (with Z / without Z + indicating offset of UTC) - so there may be two representations of date

Right, the JavaScript client code seems to seamlessly parse either flavor of ISO8601, but maybe the client would need to know what specific variation of ISO8601 that is behind "date-time" when pushing data to the module? I think Marc Johnson suggested that we documented the specific format somehow, I guess that would be good, especially if it is so that it breaks an interface if the internal, specific representation of date-time were to be changed in a module (between using Z and not for instance) ?

4. the metadata schema should be declared optional - the client does not need to pass it in. it will, however, always be returned by the server
a. the createdDate - required
b. the updatedDate - optional (relevant only if record was updated)

Sounds good.

i would like to finalize this and push into a release - so please comment if you find something you do not necessarily agree with

It looks fine to me (with my only question mark being whether variations in internal date-time representations can break an interface without it showing in the API spec)

Comment by Marc Johnson [ 05/Jun/17 ]

shale99 Niels Erik Nielsen

I'm mostly interested in the behaviour from the clients perspective (though the answers might affect the discussion about how to store the information), so I'll focus on points 3 and 4. If my interpretation is reasonable, I have a couple of questions.

3. date format - seems like the consensus is to go with date-time type. note that this will leave the actual format in the hands of the developer / consumer - there are 2 variants (with Z / without Z + indicating offset of UTC) - so there may be two representations of date

Do we want to be strict about only having audit metadata dates be represented explicitly in UTC?

If we do, then the date-time format isn't necessarily strict enough for our purposes and we might want to create a very specific profile of 8601 which only allows Z and not a specified offset.

4. the metadata schema should be declared optional - the client does not need to pass it in. it will, however, always be returned by the server

and Niels Erik Nielsen related comment:

Right, the JavaScript client code seems to seamlessly parse either flavor of ISO8601, but maybe the client would need to know what specific variation of ISO8601 that is behind "date-time" when pushing data to the module?

My interpretation of the conversation so far has been that the server owns this information, is this a sensible interpretation?

If so, what should the behaviour be for clients supplying audit metadata? For example, what should the server do if the client provides a different createdDate? Or if it repeats back the current audit metadata?

If the expectation is that the client should omit this information in subsequent requests, then it does not need to understand which format to create only how to read the acceptable formats (which it appears to be able to do already).

Comment by Mike Taylor [ 05/Jun/17 ]

Sorry to bring this up so late in the day, but ...

a. the metadata schema will contain two fields at this point
i. updatedDate , createdDate

What would it take to also include updatedBy and createdBy fields, that tell us who created and most recently updated the record? I think those would also be useful. Perhaps that belongs in a separate Jira, but I guess it's at least potentially relevant to this conversation.

Also: I agree with Marc in being perplexed about what it would be mean for a client to try to set these automatic fields. I would support that a request doing this be rejected with a 4xx error.

Comment by shale99 [ 05/Jun/17 ]

Marc Johnson -

Do we want to be strict about only having audit metadata dates be represented explicitly in UTC?

i thought you were saying the exact opposite, no? i have stated that previously, hence wanted this as a specific pattern + string since this removes any ambiguity, but you and mike have argued against that and that date-time as spec'ed by json schema is sufficient. or am i missing your point here?

since the server is always populating these fields, it does not matter what the client provides , as it will always be overwritten , hence ignored. i think that can be documented, i dont know if i would fail a legit request because of that

Mike Taylor - if you look at one of my previous comments about this issue - you will see that i indicated that a createdBy and an updatedBy can be included as well. since we dont necessarily need them now i am ok with adding them now or later.

original suggestion :

{
  "title": "Metadata Schema",
  "type": "object",
  "properties": {
    "createDate": {
      "type": "string",
      "required": true
    },
    "updateDate": {
      "type": "string",
      "required": true
    },
    "updatedBy": {
      "type": "string"
    },
    "createdBy": {
      "type": "string"
    }
  }
}
Comment by Mike Taylor [ 05/Jun/17 ]

Cool, thanks. Sorry that I missed that – this thread has got long

Comment by Marc Johnson [ 05/Jun/17 ]

shale99

I may have mixed up this context with the general context of [DMOD-256], which might explain my prior confusion (I had interpreted the suggestion as being used for the general case, which is far more important to me).

I agree that whatever policy we choose about client provided metadata then it should be clearly documented, be that either refusal or that it will be entirely ignored (in cases like this, the intended semantics of PUT get pretty complicated).

If we definitely want to restrict the audit metadata date time information to only be UTC, then we might be better in providing a specific pattern stating that explicitly.

If the policy will be that the client provided information will be ignored, then I think the practical difference between a custom pattern and the predefined pattern is fairly limited.

Comment by Niels Erik Nielsen [ 05/Jun/17 ]

Right, Marc Johnson, my bad, in the scenario discussed here it doesn't matter to the client what flavor of ISO8601 is used. My thoughts were straddling into the issue of the format of date-time dates that the client does control.

Well, DMOD-256 like you just said

Comment by Jakub Skoczen [ 20/Jun/17 ]

shale99 Kurt Nordstrom we didn't manage to discuss this on the call: what is the current status of this work?

I'd like to make sure we can wrap up UIU-31 Closed quickly, if the easier way forward is to set those fields in mod-user rather than in the DB through a trigger – let's go for that. shale99 you mentioned that you produced a meta-data schema that include the timestamp fields? How does it look like and how is it used?

Comment by Kurt Nordstrom [ 20/Jun/17 ]

Jakub Skoczen I can go ahead and add a routine to mod-users to update the fields whenever there's a write. That will get us to done for now, and perhaps we can settle on a system wide convention for the next sprint?

Comment by Jakub Skoczen [ 26/Jul/17 ]

We have decided that it's better to return the timestamps as part of the user. shale99 remaining question is if we can standardize this somehow across schemas, is there an equivalent concept to RAML 'traits' in JSON schemas?

Comment by shale99 [ 26/Jul/17 ]

create a metaData schema to hold info about the object - two fields may be update and creation date, additional info may be added in the future (updatedBy, etc...) - and then reference the metaData schema from all schemas.
note that the current creation date impl is problematic and should not be considered reliable - there is a mechanism in place to do this via a trigger in the db so that this value is not tampered with / lost - during updates

Comment by Marc Johnson [ 27/Jul/17 ]

shale99 Jakub Skoczen

Does this mean that this information would look something like (in the case of a user):

{
	"id": "9ed6bc8e-c833-4788-baa7-024611f931a3",
        ...
	"audit": {
		"createdUser": "9b2d46ee-707c-4a12-998f-d65ed9baa0e5",
		"createdDate": "2017-07-25T22:06:54Z",
		"updatedUser": "07f4e0fe-2f31-46f1-8018-16a14dd2bc3d",
		"updatedDate": "2017-07-27T11:04:22Z"
	}
}

I don't think I have seen an example of this yet, if there is one please refer me to that instead.

(This doesn't yet take into account the conversations about operator and name of the property we should use, in various issues like UIREQ-1 Closed and CIRCSTORE-16 Closed )

Comment by Jakub Skoczen [ 27/Jul/17 ]

Marc Johnson I think this makes sense, maybe we should use a more generic term than "audit"? Anyhow this looks good, what do you think shale99?

Comment by shale99 [ 27/Jul/17 ]

this doesnt exist - you can scroll up this convo (to 30/May/17 ) to see the metadata schema i suggested along with date formatting. i would prefer not to call this audit but rather metadata on the actual record

Comment by shale99 [ 27/Jul/17 ]

also, i would prefer to use a username and not uuid - in folio, a username is unique and can be used to lookup anything about a user (including the uuid if needed) - on the bonus side - it can be displayed in the ui without a need for additional http requests as displaying a uuid as an operator wont mean anything to the end user without an additional lookup

Comment by Marc Johnson [ 27/Jul/17 ]

shale99 Jakub Skoczen

Apologies, I'd forgotten we'd had many of these conversations already.

I'm not attached to the audit name, my primary interest in the property name is that it effectively becomes reserved in all/most of our schema, so I think we need to be really careful when choosing it (e.g. creator, which was suggested elsewhere, I think could easily clash with bibliographic metadata).

If I'm understanding this in the context of other conversations we've been having, the idea of operator (for loans / requests) is actually the updatedBy (I prefer updatedUser as it makes it explicit it is a reference to a user record) and we don't need to define that as well?

Comment by shale99 [ 27/Jul/17 ]

Marc Johnson - not sure i follow the need to define comment ? (maybe updatedByUser)? the updatedBy sounds very understandable
but i am the last person to comment on english

Comment by Marc Johnson [ 27/Jul/17 ]

shale99

I think I saw a similar question to this on CIRCSTORE-16 Closed .

I think we need to be explicit (I realise that my previous example was also ambiguous ) and consistent. My understanding of previous conversations, is that there is an intent to refer to related records by id (and there is ongoing efforts in this space to do this for users), so I think we need to be especially clear if we want to do something different with metadata (we might, as a consequence of ID is that we would likely never want to ever delete a user from the system, though we might have already decided on this constraint).

Maybe we need a convention when we reference another record, that makes it explicit what property the reference is for? For example createdUsername (I didn't want to do createdUserUsername) would refer to the the user who created the record by the username, whereas createdUserId would refer to the user who created the record by the ID.

Alternatively, if we go with createdUser, then I think we need to be explicit in our description of the property what this should contain.

Comment by Marc Johnson [ 27/Jul/17 ]

shale99

In reference to the define comment about operator. I'm wondering if the operatorId property (discussed for loans and requests) is redundant when we have the updatedByUser (or whatever we decide is appropriate) property in the metadata object, or whether we want to define both?

Comment by shale99 [ 27/Jul/17 ]

it is, this info can basically become part of all records we chose , so that we can have this type of functionality almost built into all record types we decide . the naming will also be consistent so that the ui doesnt have to do anything different

Comment by shale99 [ 27/Jul/17 ]

my thinking on this is that these two types (loans, requests) are 2 of many types that will need this info - i can imagine many types of records needing to track the dates and users making changes - so this is a generic baseline for this type of info that we can easily add to others in the future if needed in a standardized way

Comment by Marc Johnson [ 27/Jul/17 ]

shale99 Jakub Skoczen

Following the conversation in today's meeting, Jakub Skoczen asked me to put together a schema (to put in the shared schemas).

We discussed that the ID would be the primary way of referencing the user and having an optional username for both created and updated, that would be populated when available. I've also incorporated feedback on the requests issue to use a pattern for the IDs, to increase confidence that they are UUIDs.

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "title": "Metadata Schema",
  "type": "object",
  "properties": {
    "createdDate": {
      "description": "Date and time when the record was created",
      "type": "string",
      "format": "date-time"
    },
    "createdByUserId": {
      "description": "ID of the user who created the record",
      "type": "string",
      "pattern": "^[a-f0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}$"
    },
    "createdByUsername": {
      "description": "Username of the user who created the record (when available)",
      "type": "string"
    },
    "updatedDate": {
      "description": "Date and time when the record was last updated",
      "type": "string",
      "format": "date-time"
    },
    "updatedByUserId": {
      "description": "ID of the user who last updated the record",
      "type": "string",
      "pattern": "^[a-f0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}$"
    },
    "updatedByUsername": {
      "description": "Username of the user who last updated the record (when available)",
      "type": "string"
    }
  },
  "additionalProperties": false,
  "required": [
    "createdDate",
    "createdByUserId"
  ]
}

An example being:

{
  "createdDate": "2017-07-22T11:22:07Z",
  "createdByUserId": "dee12548-9cee-45fa-bbae-675c1cc0ce3b",
  "createdByUsername": "jaynesmith",
  "updatedDate": "2017-07-27T13:28:54Z",
  "updatedByUserId": "695540df-cf63-4c67-91c1-d8746920d1dd",
  "updatedByUsername": "robertjones"
}

An alternative could be to nest the updated and created aspects:

{
  "created": {
    "date": "2017-07-22T11:22:07Z",
    "byUserId": "dee12548-9cee-45fa-bbae-675c1cc0ce3b",
    "byUsername": "jaynesmith"
  },
  "updated": {
    "date": "2017-07-27T13:28:54Z",
    "byUserId": "695540df-cf63-4c67-91c1-d8746920d1dd",
    "byUsername": "robertjones"
  }
}

Is now a good time to consider records created / updated by systems / batch processes etc, or is reasonable to assume that they will be executed within the context of a user?

Thoughts and feedback welcome.

Comment by Mike Taylor [ 27/Jul/17 ]

If you're going to have a record with a nested structure, surely you'd want to go the whole hog?

{
  "created": {
    "date": "2017-07-22T11:22:07Z",
    "user": {
      "id": "dee12548-9cee-45fa-bbae-675c1cc0ce3b",
      "name": "jaynesmith"
    }
  },
  "updated": {
    "date": "2017-07-27T13:28:54Z",
    "user": {
      "id": "695540df-cf63-4c67-91c1-d8746920d1dd",
      "name": "robertjones"
    }
  }
}
Comment by shale99 [ 27/Jul/17 ]

Marc Johnson -
1. a question - all of the ids are across the board uuids but do not contain a pattern (in all of our schemas) - this is enforced today by declaring the id field as a uuid in the database and this is enforced there - are we making a change to all ids of all schemas? - what is the purpose of doing it here ? or is this a first step ? - the ids in the metadata schema will be enforced by the db in any case. i am asking because this is something that will be auto-populated along with the dates by the server / database - this will not be input passed in by the client (this is more of a read only schema) - i would actually prefer to indicate the date-time format so that clients are aware of the format they will be receiving the date in - as this is something they will need to parse / display - i assume the userid will be either used as a straight lookup to retrieve the username or display the username if one exists. I have no problem with the pattern - just wondering if this should be done across the board to be consistent - or not at all - in order to give id flexibility (if that is at all needed)
2. i have no preference about whether it is two sections or one
3. since the metadata is really server / db generated , if we indicate required fields in the metadata it must become optional and not required - probably just stating the obvious here. the purpose of the required fields is to indicate that the server will definitely be passing these?

Comment by Marc Johnson [ 28/Jul/17 ]

@shale Thanks for your thoughts, I'll try to address them.

UUID pattern

I am crossing the streams a little here and conflating multiple changes. The interest in validating that id properties are UUIDs came from a request from Jakub Skoczen : https://folio-org.atlassian.net/browse/UIREQ-1?focusedCommentId=58767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel

I believe the intention is to eventually use this (or similar pattern) for all id properties. It seemed to me like this and requests were a reasonable place to start, we can easily omit it here for the time being (given the way we share schema, controlled upgrades of common schema is challenging).

User ID and Username

My understanding from the conversation on yesterday's call was that we wanted references to a user to be by id. Yet it could be useful (where possible), to also include an (optional) property containing the username which could help with easier presentation and diagnostics.

Date Format

Personally I find the semantic expression of formats preferable over patterns. Being a profile of ISO8601 I believe clients can comfortably parse this (most seem to parse it as though it is the full spectrum of ISO8601). If we want to further constrain this to always UTC we could do so with a compatible pattern.

Required Properties

It is true that the metadata object will likely be server generated, which means that the property in the parent object will likely need to be optional, as we share schema between read and write requests/responses. This does mean that the required properties within it are therefore mostly for documentation purposes (unless a client chose to validate a response from a server, for example).

I left the updated properties optional as I wanted to leave it open to those properties only being provided after the first update (one advantage of the nesting is that it allows this to be more explicit in the schema).

Comment by Mike Taylor [ 28/Jul/17 ]

Yes, validating UUID by regexp is a nice incremental step.
Yes, including userName as well as userId will make this much more useful.
Yes, ISO 8601 is absolutely the way to go for dates.

Comment by Jakub Skoczen [ 28/Jul/17 ]

Marc Johnson I have to say I prefer your flat structure rather than nesting under "creator", etc. It's more context-free and we would be already nesting this per endpoint using "metadata" or similar.

For batch processes, assume that those are always going to be executed by a system user, so same fields apply.

Logging in username will be useful, although it has to be clear that this is "historical" information, (as the username may be changed) and the up-to-date user info needs to be read from the user record referred to by ID.

Agree on date-time and UUID regex.

Comment by Marc Johnson [ 28/Jul/17 ]

shale99 Jakub Skoczen Mike Taylor

Ok, so I'm going to go ahead and create the schema (keeping it flat) in the shared raml git repository. Let me know if there are any immediate concerns.

Comment by shale99 [ 28/Jul/17 ]

ok, so my understnading is that we are starting to indicate patterns for the id fields across the board - and this is a first step

right now, rmb supports iso8601 and not a profile of iso8601 - that is why i prefer the regex on the date rather than date-time - so that clients know exactly what they are getting back unless we are changing the rmb support for dates, which i rather not get into

the issue with username - which i originally thought was the way to go as (i thought mistakenly) - it was unique and never changing - is that - if a username is changed (which seems to be something we are supporting) - the ui can not rely on this field any more for display - and should always look up via the uuid - unless the username update will update the history table as well (this gets complicated as its cross module changes)

Comment by Marc Johnson [ 28/Jul/17 ]

One thing we still need to consider is what the server response to a request made with this metadata present should be:

  • Ignore the metadata property entirely, wholly replacing them with server derived state
  • Reject the request citing that clients should not provide the metadata property
Comment by Marc Johnson [ 28/Jul/17 ]

I've pushed this on a branch.

The schema is available to look at here: https://github.com/folio-org/raml/blob/FOLIO-592-metadata-schema/schemas/metadata.schema

I'd like David Crossley to take a look, with regards to RMB-30 Closed , about whether this location is ok from the perspective of including it in schema below it in the hierarchy.

Comment by Marc Johnson [ 28/Jul/17 ]

shale99

I regard an ISO8601 profile as a more specific subset of the standard (that supports a wide variety of representations). My understanding of proposing a pattern regular expression, is to define our own profile, is that a sensible expression of your intent?

Are you proposing that, based upon your new understanding of the potential for a username to change, that the createdByUsername and updatedByUsername properties are no longer useful?

Comment by shale99 [ 28/Jul/17 ]

I regard an ISO8601 profile as a more specific subset of the standard (that supports a wide variety of representations). My understanding of proposing a pattern regular expression, is to define our own profile, is that a sensible expression of your intent?

yes, i think we should define an iso8601 profile format so that there are no surprises - for example, lets take last weeks example - of setting ā€œ2017-04-20T07:21:45.000Zā€ and getting back ā€œ2017-04-20T07:21:45.000+0000" - if we are ok with this - that the date format is controlled by the server but it is iso8601 conformant and not necessarily what is passed in. then date-time works for me as well

Are you proposing that, based upon your new understanding of the potential for a username to change, that the createdByUsername and updatedByUsername properties are no longer useful?

kind of yes, once a username is removed across the system and replaced by the uuid - and the username becomes a updateble field - i think its not as valuable to keep it around in different places in the system (especially to start displaying it in different modules) - it will be a real pain to start cleaning this up when customers encounter non existent usernames in different places in the system

Comment by shale99 [ 30/Jul/17 ]

shale99 I moved your comment to RMB-48 Closed , let's keep this one about schema defintion

Comment by Jakub Skoczen [ 31/Jul/17 ]

shale99 Are you concerned that the format of the date-time would not be letter by letter what the client specified? E.g that the timezone info can be reported differently? I think this is totally OK, any client needs to be able to sensibly parse the RFC3339 date, which is already way more constrained compared to standard ISO8601. It's not rocket science. So I would stick with that.

I would still keep the readable username in the meta-data section, for convenience of annotating log-lines, etc.

Comment by shale99 [ 31/Jul/17 ]

1. yes, this was my first comment - it needs to be clear that whatever the client sends in - it needs to be understood that this is not necessarily what they will get back - (there is one system date format or at least i believe the server should work with / save / parse with one date format) - if we are ok with the clients getting the variations then this is ok by me.
--> take jackson for example - the rfc indicates that it may be possible to pass in a lower case 't' in the format - however, jackson will fail to parse in such a case - so we need to indicate this - also there are some variances (i see only one milli is needed by the rfc - but i will check - i am pretty sure jackson will fail to parse that as well) - and all of rmb json processing is jackson based - just something to be aware of if we dont indicate a single supported format

2. as for the user name - honestly,not crazy about this - once username becomes a mutable field - then a minute after it is logged it can be changed and become irrelevant - so the ui cannot use this anymore for display. this means a lookup based on the uuid is needed to get reliable data about the updater / creator so then the username loses its meaning - since you will always need a uuid lookup to get the username - but if we are sure about this i can add it as i left it out.

Comment by David Crossley [ 02/Aug/17 ]

Marc asked above (28/Jul/17) whether the placement of this metadata schema will suffer from the effects described in RMB-30 Closed .

I tested it today on mod-users and mod-configuration.

Unfortunately, yes it does.

As explained at RMB-30 Closed , the build is successful. However due to the "../" path in the $ref in the parent schema, copies of "metadata.schema" remain at "/tmp/metadata.schema" for the mod-users, and at "/tmp/raml-util/schemas/metadata.schema" for mod-configuration.

There is only one set of dot-dots in the path, so at least the build is successful.

Comment by shale99 [ 02/Aug/17 ]

David Crossley - the current issue is the leftovers that are not deleted, right?

Comment by shale99 [ 02/Aug/17 ]

In general, schemas such as the meta data schema will run into this situation since they are generic shared schemas placed in the raml-util/schemas dir, correct? we can open an issue on this and attempt a delete of such files from the /tmp dir by the rmb after the pojos are generated. David Crossley - is there something else that can be done here or can we merge this in?

Comment by shale99 [ 02/Aug/17 ]

David Crossley , Marc Johnson - i have merged this as it was a blocker for me to continue with the implementation. As discussed, David will be opening an issue on rmb , to delete /tmp schemas after the pojo generation step

Comment by David Crossley [ 02/Aug/17 ]

That cleanup idea is RMB-50 Closed .

In general, yes the shared schema area. In particular, when any schema has $ref with dot-dots, then there will be temp remnants. Any more than one set of dot-dots will be broken.

Comment by Marc Johnson [ 02/Aug/17 ]

shale99 Jakub Skoczen

What have we decided to do for state altering requests that have the metadata section present?

Either:

  • Ignore the metadata property entirely, wholly replacing it with server derived state
  • Reject the request citing that clients should not provide the metadata property
Comment by shale99 [ 02/Aug/17 ]

right now it is ignored and replaced by the server side generated info every time - meaning the client does not need to send it, but if the client does, the section will be ignored but the request will not fail. Not sure we closed this issue - ignoring gives a bit more flexibility as it does not require the client to parse out sections of the json before sending (for example an update) - and if the info is included, the request wont fail it will just be ignored. this appriach gives a bit more flexibility but may be a bit less clean.

Comment by shale99 [ 03/Aug/17 ]

pull request Folio 592
mod-circ-storage - temporarily depending on rmb 13.1.0-snapshot - will release rmb today and update pom

Generated at Thu Feb 08 23:06:56 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.