[FOLIO-1781] Change metadata when user performing action is unknown Created: 05/Feb/19  Updated: 07/Apr/21  Resolved: 16/Sep/19

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Task Priority: P2
Reporter: Marc Johnson Assignee: Unassigned
Resolution: Done Votes: 0
Labels: platform-backlog, potential-decision
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Relates
relates to RMB-373 Enable/disable logging date and user ... Open
relates to RMB-459 Populate metadata for reference data/... Closed
relates to RMB-353 Metadata without user id Closed
relates to FOLIO-1786 SPIKE: evaluate "init" tokens as a w... Draft
relates to MODSOURMAN-432 Create job execution requires user wi... Closed
relates to RMB-320 Updating records containing metadata ... Closed
Sprint:
Development Team: Core: Platform

 Description   

What should the created by / updated by change metadata be when the performing user is unknown (i.e. an anonymous request)?



 Comments   
Comment by Ann-Marie Breaux (Inactive) [ 05/Feb/19 ]

Comments from Marc Johnson in the #development Slack channel:

Example Scenarios

1. Denying login following a number of failed login attempts (mod-users-bl has module permissions to update users in storage)
2. Creation of reference or sample records during tenant activation

Options

1. Use a nil UUID by default
2. Provide a specific, well known UUID for an anonymous or system user (might be 1, 2 or more special known values)
3. Do not have a created by / updated by under these circumstances
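
A minimal sketch of how the three options might look in a record's change metadata. The property names (`createdDate`, `createdByUserId`, and so on) and the well-known UUID are assumptions made for illustration, not values agreed in this issue.

```python
import uuid

# Option 1: the nil UUID, "00000000-0000-0000-0000-000000000000".
NIL_UUID = str(uuid.UUID(int=0))
# Option 2: a hypothetical well-known value for a system user (invented here).
SYSTEM_USER_ID = "11111111-1111-4111-8111-111111111111"


def build_metadata(timestamp, user_id=None, option="omit"):
    """Build change metadata for a newly created record.

    When the user is unknown, `option` selects one of the three choices:
    "nil", "well-known" or "omit".
    """
    metadata = {"createdDate": timestamp, "updatedDate": timestamp}
    if user_id is None:
        if option == "nil":
            user_id = NIL_UUID
        elif option == "well-known":
            user_id = SYSTEM_USER_ID
        elif option != "omit":
            raise ValueError(f"unknown option: {option}")
    # Option 3: leave the "by" properties out entirely.
    if user_id is not None:
        metadata["createdByUserId"] = user_id
        metadata["updatedByUserId"] = user_id
    return metadata
```

With option 3 a client only has to check whether the property is present; with options 1 and 2 it has to know the special values in advance.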

Questions
1. How should clients interpret these options?
2. What impact do these options have upon the trust placed upon these properties?

Personal Thoughts

We are trying to decide what to do when we don’t know who or what is making a change, and we are allowing it because authorisation has been performed in another way.

I don’t understand the value of interpreting `we don't know` by inferring that it can be a well known user. What is the value in doing that? I don’t think we can trust that value, as anything can choose one of those specific, well known, values.

My preference is to explicitly record that `we don't know`. For that I would prefer the properties are omitted (rather than null) and not a special `nil` value, as I believe these are harder to interpret and require special logic to understand (or will just be a failed de-reference request to users)

This leaves us the option to expand to the use of specific values in the future if we want, and can trust their usage.

Comment by Ann-Marie Breaux (Inactive) [ 05/Feb/19 ]

Marc Johnson: Here are a few more use case examples from acquisitions and data import:

1. We have system-supplied default file extension settings (see Settings/Data Import/File Extension). In the detail record, we'll want to show the standard metadata component with the Created When, Created By, Last Updated When, Last Updated By data elements. Should the "by" data elements be handled with a "system" user ID?

2. User orders something on a vendor site. There is a FOLIO API that creates the FOLIO order automatically, using the data supplied from a vendor. We're just starting to figure out who we should assign as the Created By user. FOLIO? System? Anonymous? The vendor name?

3. With Data Import, we will have many cases where a file is being uploaded, and CRUDing various types of records in SRS, MARCcat, Inventory, Orders, and Invoices. Should those records have created by/updated by of System? or Data Import? Even though a person will be managing the import process, it doesn't really seem right to assign that person's user ID as the Create/Update ID, since they didn't manually make the changes.

4. In Settings/Inventory, there are many default/system-supplied "types" with a source of FOLIO. I'm not sure if that's the same as Created By, and based on a user ID or a different type of data element, but it seems worth reviewing.

And a general question - if we leave it blank/unassigned, won't that mess up many data schemas where created by/updated by is a required data element?

Comment by Jakub Skoczen [ 05/Feb/19 ]

Some comments based on the conversation we had about it:

  • since the platform allows for anonymous requests (anonymous tokens) it is honest to keep the change metadata fields nil to represent this fact. What needs to be done in this case is to make sure the code handling change metadata can handle nil fields without problem. The solution proposed via a PR in RMB-320 Closed (setting a dummy user during a record update) acts as a workaround for the storage layer modules that use RMB only.
  • on RMB-320 Closed and on the #development channel some comments indicated that the platform should not allow anonymous requests (audits, etc) and instead introduce special user accounts that are used when performing those requests. This would require changes to core FOLIO components (mod-authtoken, Okapi).
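
A rough sketch of what "handle nil fields without problem" could mean when a record is updated by an unknown user. The property names are assumed, and this is not the RMB implementation or the workaround from the RMB-320 PR.

```python
def apply_update_metadata(existing, timestamp, user_id=None):
    """Return the metadata for an updated record, tolerating an unknown user.

    `existing` is the metadata currently stored on the record (may be None).
    """
    existing = existing or {}
    updated = {"updatedDate": timestamp}
    # Carry over the creation properties only when they are actually present.
    for key in ("createdDate", "createdByUserId"):
        if existing.get(key) is not None:
            updated[key] = existing[key]
    # An anonymous update simply leaves updatedByUserId out.
    if user_id is not None:
        updated["updatedByUserId"] = user_id
    return updated
```

The point of the sketch is that no dummy user has to be invented: a missing property is carried through as missing.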

Comment by Craig McNally [ 05/Feb/19 ]

2. User orders something on a vendor site. There is a FOLIO API that creates the FOLIO order automatically, using the data supplied from a vendor. We're just starting to figure out who we should assign as the Created By user. FOLIO? System? Anonymous? The vendor name?

Ann-Marie Breaux The way we handle this for GOBI integration is as follows:

  • Someone places an order on the vendor's site (e.g. GOBI).
  • The vendor site sends a request to the configured edge API, providing an API key (edge-orders in this case)
  • edge-orders parses the API key into its parts: salt, username, tenantId
  • edge-orders looks up the credentials for that salt/user/tenant
  • edge-orders logs into FOLIO and gets back an OKAPI Token
  • edge-orders makes calls into FOLIO using this token (e.g. to mod-gobi)
  • mod-gobi maps the request into something understood by FOLIO using tenant-specific or default mappings, gathering data from other sources when needed, e.g.
    • look up the vendor UUID from mod-vendors
    • get the user's ID from the OKAPI Token
    • etc.
  • mod-gobi sends a request to mod-orders to place the order
  • mod-orders calls mod-orders-storage to persist the order
  • A response propagates back up the call stack to the vendor system.

So the createdBy field on the orders record(s) in this scenario would be set to the UUID of the institutional user provisioned for use by edge-orders.
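
Two steps of the flow above (parsing the API key, and reading the user's ID from the token) can be sketched as follows. The key layout (base64-encoded JSON with `s`, `u` and `t` fields) and the `user_id` claim in a JWT-style token are assumptions for illustration; the real formats used by edge-orders and Okapi may differ.

```python
import base64
import json


def parse_api_key(api_key):
    """Split an API key into (salt, username, tenant_id).

    Assumes the key is base64-encoded JSON with hypothetical fields
    "s" (salt), "u" (username) and "t" (tenant ID).
    """
    decoded = json.loads(base64.urlsafe_b64decode(api_key))
    return decoded["s"], decoded["u"], decoded["t"]


def user_id_from_token(token):
    """Read the user ID from the payload of a JWT-style token.

    The signature is NOT verified here; this only illustrates where
    the createdBy value would come from.
    """
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload["user_id"]


def created_by_metadata(token):
    """Change metadata for an order created on behalf of the token's user."""
    return {"createdByUserId": user_id_from_token(token)}
```

Because the token belongs to the institutional user provisioned for the edge API, the resulting `createdByUserId` identifies that user rather than the person who placed the order on the vendor's site.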

Comment by Craig McNally [ 05/Feb/19 ]

1. We have system-supplied default file extension settings (see Settings/Data Import/File Extension). In the detail record, we'll want to show the standard metadata component with the Created When, Created By, Last Updated When, Last Updated By data elements. Should the "by" data elements be handled with a "system" user ID?

...elided...

4. In Settings/Inventory, there are many default/system-supplied "types" with a source of FOLIO. I'm not sure if that's the same as Created By, and based on a user ID or a different type of data element, but it seems worth reviewing.

Ann-Marie Breaux I think you're talking about reference or sample data in both of these cases, right? If so, and the modules implement the reference/sample data loading via the mechanism described in FOLIO-1519 Closed, the createdBy field would (or at least could/should?) be that of the user who made the call to the tenant API (or to OKAPI to install/upgrade/enable the module for a tenant)... That said, this approach to loading sample/reference data is fairly new and hasn't been widely adopted yet, so it's still a fair point to bring up here.

Comment by Craig McNally [ 05/Feb/19 ]

3. With Data Import, we will have many cases where a file is being uploaded, and CRUDing various types of records in SRS, MARCcat, Inventory, Orders, and Invoices. Should those records have created by/updated by of System? or Data Import? Even though a person will be managing the import process, it doesn't really seem right to assign that person's user ID as the Create/Update ID, since they didn't manually make the changes.

I personally don't have a problem with these records being "createdBy" or "updatedBy" the user managing the import process... They took some action which caused records to be created/updated, why does it matter if the creation/modification of records happened directly or indirectly? Just my two cents.

Comment by Ann-Marie Breaux (Inactive) [ 05/Feb/19 ]

Thanks for the clarifications, Craig McNally. I may not understand the definition of sample/reference data. These are definitely system-supplied settings that will be filled automatically. It's not fake data. We're not expecting the tenant to wipe it out and replace it. Tenants will have the option of manually adding other settings; for those, the user ID would be the individual user who created/updated it. Would the system-supplied settings data be reference/sample data in this case?

Comment by Craig McNally [ 05/Feb/19 ]

Denying login following a number of failed login attempts (mod-users-bl has module permissions to update users in storage)

I think this scenario makes the most compelling case for a system or anonymous user ID since there are changes being made which are the result of someone that isn't logged into the system.

Comment by Craig McNally [ 05/Feb/19 ]

I may not understand the definition of sample/reference data. These are definitely system-supplied settings that will be filled automatically. It's not fake data. We're not expecting the tenant to wipe it out and replace it. Tenants will have the option of manually adding other settings; for those, the user ID would be the individual user who created/updated it. Would the system-supplied settings data be reference/sample data in this case?

I would call that reference data. An example of sample data is what we have in the orders app... I grouped them together because the preferred mechanism for loading them is essentially the same. However, their intended use is not. The way I understand it is there's a fair chance that reference data would be loaded in a production system and possibly augmented to suit the tenant's needs. OTOH I'd say there's a very slim chance that sample data would be loaded in a production system - it's essentially for testing/demo/dev purposes.

Comment by Ann-Marie Breaux (Inactive) [ 05/Feb/19 ]

Craig McNally So yes, it sounds like it is reference data. You said that the createdBy field would (or at least could/should?) be that of the user who made the call to the tenant API (or to OKAPI to install/upgrade/enable the module for a tenant). What would that look like in the UI? Whatever we want? So we could just call it SYSTEM for the UI display?

Comment by Craig McNally [ 06/Feb/19 ]

Ann-Marie Breaux I had a realization while typing a response to your question... The user enabling modules for tenants might actually belong to the supertenant. This would likely be the case in a multi-tenant system. In this case, the UUID in the createdBy field of the reference data wouldn't be resolvable to any user record in the tenant's user table, and therefore you wouldn't be able to get the user information (username for instance) for display in the UI. This means the UI would have to make some uninformed decision about that UUID... If we can't resolve the UUID, do we assume it's valid and display some canned value, e.g. "System"? That's certainly not ideal, nor is displaying the UUID.

I spoke with VBar and he has some ideas for addressing this. I believe he's planning on responding here with details.
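
The display problem described above can be sketched as a lookup with three outcomes. The function name, the `createdByUserId` property, and the shape of the user records are hypothetical, and the "System" fallback is only the canned value floated above, not a recommendation.

```python
def created_by_display(metadata, tenant_users, fallback="System"):
    """Decide what a UI might show as the "Created By" value.

    `tenant_users` maps user UUIDs to user records for the current tenant.
    Returns None when no creator was recorded at all.
    """
    user_id = metadata.get("createdByUserId")
    if user_id is None:
        return None  # nothing recorded: the UI can leave the field blank
    user = tenant_users.get(user_id)
    if user is not None:
        return user["username"]
    # The UUID is not resolvable in this tenant. It might belong to a
    # supertenant user, a deleted user, or be an arbitrary value; the UI
    # cannot tell which, so any canned label here is an uninformed guess.
    return fallback
```

The last branch is the uncomfortable one: the code cannot distinguish a legitimate supertenant user from any other unresolvable UUID.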

Comment by VBar [ 07/Feb/19 ]

Here are my thoughts on the broader topic of "anonymous user activity".

I don’t think there should ever be any empty or anonymous change metadata "bylines". There should always be a context available to record - otherwise the lack of traceability and accountability will inevitably bite us sooner or later.

I do mean context rather than user since there are many situations where changes are being performed without the intervention of a specific user.

• If something is being done broadly in the context of a tenant, then I would recommend formalizing the notion of an "institutional user" that acts as a proxy for the tenant as a whole. We already use this approach for RTAC Edge API activity. That implementation is currently fragile, however, as it relies on an ordinary user record which can easily be deleted through regular user management activities. Furthermore, tenant provisioning would ideally automatically add one of these for any tenant that is created.
• We should definitely stay away from using a NULL/NIL (or all-0) UUID to represent a system user, in any case. I think that it would be a good idea to systematically create an actual systemID UUID for each and every FOLIO deployment - that's the system. It could be used when recording system activity. We will be happy to have those in place as soon as we start tackling the problem of communications between FOLIO installations.
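
A sketch of how the "context rather than user" idea might be resolved in code, assuming one institutional user per tenant and one systemID per deployment. All names and the order of precedence are hypothetical; nothing like this is defined in this issue.

```python
def resolve_byline(user_id=None, tenant_id=None, institutional_users=None, system_id=None):
    """Pick the context to record for a change, most specific first.

    `institutional_users` maps tenant IDs to the UUID of that tenant's
    institutional user; `system_id` is the deployment-wide systemID.
    Returns a (context_type, uuid) pair and never an empty byline.
    """
    if user_id is not None:
        return ("user", user_id)
    institutional_users = institutional_users or {}
    if tenant_id is not None and tenant_id in institutional_users:
        return ("institutional-user", institutional_users[tenant_id])
    if system_id is not None:
        return ("system", system_id)
    # No context at all: refuse rather than record an anonymous change.
    raise ValueError("no context available to record for this change")
```

Raising when nothing is available reflects the position that a change should never be recorded without some traceable context.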
