Request anonymization
Problem statement
The institution requires a solution to anonymize closed requests in the request app (and corresponding data in the circulation log) to break the link to patron records while preserving anonymized information for reporting purposes. The scope includes creating an API endpoint to initiate anonymization for individual requests, removing user-sensitive data (e.g., requester and proxy information, delivery addresses) from request records, and updating circulation log entries accordingly, including adding a log entry for the anonymization action. Configurable settings will be implemented to enable automated anonymization with adjustable timing. Out of scope are UI-based tools for initiating anonymization, adding additional reporting-specific data to request objects, and settings tailored to specific closed request statuses.
Related information
Loan anonymization settings: https://folio-etesting-snapshot-consortium.ci.folio.org/settings/circulation/loan-anonymization
Task list and mapping to Jira
# | Area | Jira |
---|---|---|
1 | Settings management | UICIRC-1202: Request anonymization - Settings Page (In Refinement): just the UI form |
 | | CIRC-2324: Request Anonymization - Settings Page Behavior (In Refinement): behavior. Note: settings should be stored in mod-circulation-storage (as opposed to the current mod-configurations approach, which has been declared obsolete by the Technical Council); use this API: https://s3.amazonaws.com/foliodocs/api/mod-circulation/p/circulation-settings.html / https://s3.amazonaws.com/foliodocs/api/mod-circulation-storage/r/circulation-settings-storage.html |
 | | CIRC-2320: Settings Capability Sets - Automatic Request Anonymization (In Refinement): permissions |
2 | Run a task on schedule | CIRC-2289: Automated Request Anonymization (In Refinement): the timer starts once an hour, checks what can be anonymized, and does the work; this is what needs to be expanded to support request anonymization |
3 | Anonymization process | CIRC-2364: Anonymize Single Request Post API (In Refinement): API to start the process |
 | | CIRC-2292: Request Anonymization - Anonymization (In Refinement): the anonymization itself. It also has an event publisher so that the anonymization event can be sent to mod-audit via mod-pubsub |
 | | CIRC-2384: Procedural Capability Set - Manual Request Anonymization - Single (Open): permissions. (Why "manual"? Amelia: When the second part of this feature is implemented, this capability set will also be used for manual anonymization through the request actions menu. Until then, this set will only allow access to the single anonymization API endpoint.) |
4 | Circ Log | UICIRCLOG-178: Add Request "Anonymized" facet to Circ Log (In Refinement): UI facet for filtering (no backend work expected) |
5 | Requests UI | UIREQ-1313: Handle the users display in requests details where a user has been anonymized (In Refinement): proper display of anonymized requests |
 | | UIREQ-1314: Handle the users display in requests search results where a user has been anonymized (In Refinement): proper display of anonymized requests |
Questions and answers
Below is a list of questions and necessary clarifications. (Open questions in Green, answers in Yellow)
Assumption: The scope of this feature is limited to the setting and the background anonymization process. Is it correct that no new UI forms (for manually starting anonymization, viewing progress, etc.) are included in the scope?
Amelia: Yes, the manual anonymization process is in the scope of the second part of the request anonymization implementation.
What does "remove fields" mean at the data level? E.g., replace the value with NULL, with an empty string, with a dash, or with a fixed placeholder such as "Anonymized Patron" or just "Anonymized"?
Amelia: To follow the loans implementation of anonymization, the removed fields should be replaced with NULL. (I believe this will require handling on the front end to show that the request was anonymized when the removed fields are not present.)
"Circulation log entry created for the anonymization action" - what should be in this entry?
This seems to be described in CIRC-2292: Request Anonymization - Anonymization (In Refinement).
Amelia: Yes, everything is described there. Is any additional detail required?
What should happen when you first run this new logic in production? There may be a huge number of old, previously created requests in the DB that have to be anonymized now, according to new settings. Process them one by one? Add one-time batch anonymization? Make it a data migration script when updating the module?
Raman: I found in the source code https://github.com/folio-org/mod-circulation/blob/master/src/main/java/org/folio/Environment.java#L20 that the current loan anonymization implementation has a protective mechanism - no more than 50,000 records are processed at a time. This means that if, after introducing the new anonymization logic, there are a large number of records in the database, the anonymization process will process them in batches of 50,000. All records will eventually be anonymized, but this may take some time.
Amelia: With batches that large I don’t think there will be any issues with having some delay, so we can follow the loans implementation on this as well.
Raman: I believe it makes sense to add a performance testing task to the feature to find out how quickly we can anonymize 50K requests and what impact this has on the performance of the modules mod-circulation/mod-pubsub/mod-audit, and the platform as a whole.
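The batching behavior discussed above can be pictured with a minimal sketch. The 50,000 cap is the figure Raman cites for loan anonymization; everything else (the function names, the in-memory list of pending request IDs, the `anonymize` callback) is hypothetical and not taken from mod-circulation.

```python
from typing import Callable, List

# Per-run cap cited in the answer above; the constant name is hypothetical.
MAX_RECORDS_PER_RUN = 50_000


def run_anonymization_pass(
    pending_ids: List[str],
    anonymize: Callable[[str], None],
    limit: int = MAX_RECORDS_PER_RUN,
) -> List[str]:
    """Anonymize at most `limit` requests and return the IDs left for later runs."""
    batch, remaining = pending_ids[:limit], pending_ids[limit:]
    for request_id in batch:
        anonymize(request_id)
    return remaining


def runs_needed(backlog_size: int, limit: int = MAX_RECORDS_PER_RUN) -> int:
    """Number of scheduled runs required to clear a backlog (ceiling division)."""
    return -(-backlog_size // limit)
```

Under these assumptions, a backlog of 120,000 closed requests would need `runs_needed(120_000)`, that is three runs, to clear. This kind of estimate could feed the performance testing task proposed above.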
The server response also contains patronGroup and patronGroupId - should they also be deleted?
Amelia: No, those fields are required for reporting purposes and should remain on the record. They are not considered PII.
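Taken together, the answers so far (removed fields become NULL; patronGroup and patronGroupId stay) could be sketched as follows. This is an illustration only: the field names in `PII_FIELDS` are assumptions standing in for "requester and proxy information, delivery addresses", and the sketch assumes patronGroup and patronGroupId are top-level fields, which may not match the real request schema.

```python
from copy import deepcopy
from typing import Any, Dict

# Hypothetical field names; the real request schema may differ.
PII_FIELDS = (
    "requesterId",
    "requester",
    "proxyUserId",
    "proxy",
    "deliveryAddressTypeId",
    "deliveryAddress",
)


def anonymize_request(request: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the request with the PII fields set to None (NULL).

    Fields not listed in PII_FIELDS, such as patronGroup and patronGroupId,
    are left untouched so they remain available for reporting.
    """
    result = deepcopy(request)
    for field in PII_FIELDS:
        if field in result:
            result[field] = None
    return result
```

Returning a copy rather than mutating the input keeps the sketch easy to test; an actual implementation would persist the change through the storage module.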
"Adding additional data to the request object for reporting purposes" - What is this about?
In this part of the Request Anon. feature, no additional data will be added to the request object. This was something mentioned on the original request anonymization ticket, but is not in-scope for any planned work at this time.
Given Settings > Circulation > Request anonymization is configured to {value} {time unit} after the request closes - What is the acceptable time lag between the anonymization time and the actual anonymization? In other words, how often should FOLIO check requests and perform automatic anonymization (say, once per hour, once per 12 hours, once per day, etc.)?
This should follow the timing used by the automated loan anonymization. I don’t know what that number is, but I believe it is either 1/hour or 1/half-hour.
Raman: Yes, the previously implemented loan anonymization process is launched once every 60 minutes.
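To make the "{value} {time unit} after the request closes" rule concrete, here is a minimal sketch of the eligibility check a scheduled run might perform. The function name, parameter names, and the supported unit strings are assumptions for illustration; the actual settings format is defined in the Jira tickets, not here.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical set of unit strings; each one is also a valid timedelta keyword.
SUPPORTED_UNITS = {"minutes", "hours", "days", "weeks"}


def is_due_for_anonymization(
    closed_at: datetime, value: int, unit: str, now: datetime
) -> bool:
    """Return True once `value` `unit`s have elapsed since the request closed."""
    if unit not in SUPPORTED_UNITS:
        raise ValueError(f"unsupported time unit: {unit}")
    delay = timedelta(**{unit: value})
    return now >= closed_at + delay


closed = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
check_time = datetime(2024, 1, 3, 12, 0, tzinfo=timezone.utc)
due = is_due_for_anonymization(closed, 2, "days", check_time)
```

If request anonymization runs on the same 60-minute schedule as loan anonymization, a request that becomes eligible just after one run would wait until the next run, so the lag could be up to roughly an hour plus processing time.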
Does the scope of the feature include anonymization of requests only in a single-tenant environment? What about ECS Circulation requests (for consortia) and mediated requests?
Yes, we only need to handle requests in a single-tenant environment for this initial implementation. Anonymization for cross-tenant requests and mediated requests will be developed in a later release.
Does the anonymization of loans/requests affect those records that were already in the circulation log?
Amelia: Yes, the user barcode is removed from all circulation log entries related to the anonymized loan. This should also be the case for Requests.
Raman: I tested this in the https://folio-snapshot.dev.folio.org/ environment. The Anonymized circ action event is visible in the screenshot, and the user barcode has been removed for the corresponding loan (and, it seems, for one request as well).
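The expected effect on the circulation log (user barcode removed from existing entries, plus a new entry for the anonymization action) could look roughly like the sketch below. The entry shape and the field names `requestId`, `userBarcode`, `action`, and `date` are hypothetical; the actual log record format lives in mod-audit and is not reproduced here.

```python
from typing import Any, Dict, List


def anonymize_log_entries(
    log: List[Dict[str, Any]], request_id: str, now_iso: str
) -> List[Dict[str, Any]]:
    """Clear the user barcode on entries for one request and append an 'Anonymized' entry.

    Returns a new list; the input list and its entries are not modified.
    """
    updated = []
    for entry in log:
        if entry.get("requestId") == request_id:
            # Copy the entry and blank out the barcode (hypothetical field name).
            entry = {**entry, "userBarcode": None}
        updated.append(entry)
    # Record the anonymization action itself as a new log entry.
    updated.append(
        {
            "requestId": request_id,
            "action": "Anonymized",
            "userBarcode": None,
            "date": now_iso,
        }
    )
    return updated
```

Entries for other requests pass through unchanged, which matches the expectation that only log records related to the anonymized request lose their barcode.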