[FOLIO-1551] Missing or incomplete documentation of data attributes in many module APIs Created: 04/Oct/18  Updated: 26/Feb/19  Resolved: 07/Feb/19

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Umbrella Priority: P3
Reporter: Nassib Nassar Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: core
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Relates
relates to CIRCSTORE-144 Missing or incomplete documentation o... Open
relates to MODUSERS-119 Missing or incomplete documentation o... Closed
relates to FOLIO-1447 FOLIO policy for RAML and JSON descri... Blocked
relates to MODGQL-119 Investigate keeping JSON Schema link ... Closed
relates to MODINVSTOR-203 Use description fields in JSON schemas Closed
Sprint:
Development Team: Prokopovych

 Description   

Many module APIs have missing or incomplete documentation of data attributes. Documentation for each attribute should include:

  • A description of what the attribute means
  • The precise domain of the attribute (range of values, controlled vocabulary, etc.)
  • Whether the attribute is required or optional
  • Where to find referenced data, if the attribute is a foreign key
  • Any other constraints on the data, e.g. whether an attribute allows only unique values

This is needed to ensure data quality for reporting.

This issue is a blocker for: https://folio-org.atlassian.net/browse/UXPROD-1128

This issue relates to feature issue: https://folio-org.atlassian.net/browse/UXPROD-1414

Two examples of attribute documentation:

I.

Attribute name:     username
Description:        The user's login name.  This also serves as a unique,
                    human-readable identifier for the user.
Domain:             String of alphanumeric Unicode characters, beginning with
                    an alphabetic character.  Maximum of 16 characters.
Required:           Yes
References:         N/A
Other constraints:  Unique, but may be reused after the user is deleted.

II.

Attribute name:     patronGroup
Description:        The patron group that the user belongs to.
Domain:             UUID
Required:           Yes
References:         "User Groups" /groups/{groupId} (e.g. mod-users)
Other constraints:  None

These examples are intended not to prescribe a documentation format or style but to illustrate further the basic documentation content being requested.
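
As an illustration only, the two examples above could also be held in a machine-checkable form. The following Python sketch is not part of any FOLIO module: the `AttributeDoc` class and the `is_valid_username` and `is_uuid` helpers are hypothetical names, and the username check is one possible reading of the Domain text in example I, using Python's own definitions of "alphabetic" and "alphanumeric".

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttributeDoc:
    """Hypothetical container for the five requested pieces of documentation."""
    name: str
    description: str
    domain: str
    required: bool
    references: Optional[str]  # None when the attribute is not a foreign key
    other_constraints: str


USERNAME = AttributeDoc(
    name="username",
    description="The user's login name.",
    domain="Alphanumeric Unicode string, alphabetic first character, max 16",
    required=True,
    references=None,
    other_constraints="Unique, but may be reused after the user is deleted.",
)

PATRON_GROUP = AttributeDoc(
    name="patronGroup",
    description="The patron group that the user belongs to.",
    domain="UUID",
    required=True,
    references='"User Groups" /groups/{groupId}',
    other_constraints="None",
)


def is_valid_username(value: str) -> bool:
    """Assumed reading of the domain: 1 to 16 alphanumeric characters,
    the first of which is alphabetic (per str.isalnum / str.isalpha)."""
    return 1 <= len(value) <= 16 and value.isalnum() and value[0].isalpha()


def is_uuid(value: str) -> bool:
    """Check that a string parses as a UUID. Note that uuid.UUID is lenient
    and also accepts forms without hyphens."""
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False
```

A check such as `is_valid_username` covers only the Domain line; uniqueness, listed under "Other constraints", depends on the other stored records and cannot be verified from a single value.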



 Comments   
Comment by Jakub Skoczen [ 09/Oct/18 ]

Nassib Nassar, can you provide examples of how this documentation could be provided for selected data elements (e.g. controlled vocabulary, UUID, etc.)? Should this be part of the JSON schema and RAML with a specific syntax, or are we talking about "human-readable" descriptions?

Comment by Nassib Nassar [ 09/Oct/18 ]

I have no preference about what form the documentation takes.

Comment by Sebastian Hammer [ 01/Nov/18 ]

From my perspective, it seems obvious that the documentation should accompany the interface definitions and get picked up by the API reference (https://dev.folio.org/reference/api/). It's a compelling aspect of FOLIO that the interfaces are all there and available to developers, but we really undermine that aspect by leaving so many interfaces undocumented. This will impact any client developer and anyone looking to build reporting using any kind of methodology.

It also seems like there is at least a possibility that documenting these interfaces will ultimately speed up development by reducing wrongful assumptions about how the interfaces are used, and by reducing the need for developers asking other developers about usage.

Comment by Nassib Nassar [ 01/Nov/18 ]

I didn't intend for my response above to be quite so terse. In my reply to the same question sent by Jakub on Slack, I had added that it's only the "content" of the documentation that is a problem for reporting, not particularly the "form" it is presented in. I don't want to impose more prescriptive detail here than necessary. For one thing, there may be related considerations in FOLIO that I am unaware of. The only thing essential for reporting is, I think, that we have the information listed in the Description field above; basically complete documentation of the allowed values for inputs/outputs in the API/interface. I could offer opinions about what form the information content should take, but I am not really in the best position to make recommendations about that for FOLIO at the moment. Having said that, I agree with the sentiment above that the documentation should be worked into the JSON schema definition, ideally making use of its self-documenting features if they will be clearly reflected in the autogenerated API documentation, or at least in prose within the description field, which I think is possible for the requested information.

Comment by Charlotte Whitt [ 01/Nov/18 ]

Here is a link to a guideline document that Ann-Marie Breaux, Tiziana Possemato, and I started: https://docs.google.com/document/d/1T0cQ5SpbuwefPkdpkP9F-olxm6iYcNetg8aRZq7QQ6Y/edit

Comment by Marc Johnson [ 01/Nov/18 ]

I'm probably wading into a conversation that I'm missing some context on, so please let me know if my thoughts are appropriate or valuable, and I can step away if that would be more valuable.

My interpretation is that the primary purpose of this is to allow developers/users of an interface (or the records, in the case of reporting) to have more understanding of the expectations around what the range of acceptable records are.

Is that a reasonable summary of the goal?

In effect this will expand (or at least describe better) the minimal set of behavioural expectations for any implementation as well. How strongly are we expecting implementations to comply with these constraints, e.g. is it a bug if an implementation does not enforce that a value is unique or that a property value is in a specified set?

As we aren't able to express much of these constraints (e.g. unique values) in the schema itself, is it intended that much of this will only be in the human readable description property?

Is this only going to apply to the interfaces/implementations that the core team provides, or is this intended to apply to all FOLIO modules?

I think we need to be aware of the trade-off between expressing these expectations in the interface and limiting the variability allowed by implementations (e.g. an implementation may not be possible if it wanted to weaken a constraint). And conversely, also expect that implementations may impose further constraints (so not all records fulfilling these expectations may be valid).

References and controlled vocabularies

Where to find referenced data, if the attribute is a foreign key

Would that take the form of stating which interface would be used to find records referenced by this property?

I believe this is intended to guide a developer or user about where to find records referenced by this property.

Is it also intended to set an expectation that an implementation needs to verify that the referred to record exists (during some operations)? In the sense that, clients/users won't expect to handle that a reference might not be valid (that no record can be found using the reference).

Is that the same for references to controlled vocabularies (or are the expectations different)?

Conditional validation

Whether the attribute is required or optional

Does this include conditional situations, where a property is only required when a different property has a particular value? For example, a userId is only required for an open loan and is optional for a closed loan, so it is optional in the schema (some of our tooling does not allow us to use some of the more complex structures which could describe this explicitly).
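
The conditional case can be made concrete with a small check. This is a Python sketch under stated assumptions: the `status` key and its `"open"` and `"closed"` values are hypothetical stand-ins for however a loan records its state, and `missing_conditional_fields` is an illustrative name. Only `userId` comes from the example above.

```python
from typing import List


def missing_conditional_fields(loan: dict) -> List[str]:
    """Return the names of fields that are required only because of the
    value of another field, and that are absent or empty in this record."""
    missing = []
    # Assumed rule from the example: userId is required while the loan is open.
    if loan.get("status") == "open" and not loan.get("userId"):
        missing.append("userId")
    return missing
```

With this rule, `{"status": "closed"}` passes even without a `userId`, while `{"status": "open"}` is reported as missing `userId`. A schema that simply marks `userId` as optional cannot express that distinction, which is why the rule would need to be stated in the documentation.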

Comment by David Crossley [ 02/Nov/18 ]

The document at dev.folio.org/guides/describe-schema/ has been improved, and also linked back to these issues for further guidance.

Comment by David Crossley [ 02/Nov/18 ]

Marc, your summary seems spot-on to me. You raise some important technical considerations too.

Yes, this applies to all modules, with emphasis on core modules at this stage. See the related FOLIO-1447 (Blocked), which has links to all back-end RAML-related modules that have missing descriptions.

Comment by Marc Johnson [ 02/Nov/18 ]

David Crossley Thanks

Comment by Jakub Skoczen [ 06/Nov/18 ]

Marc Johnson what do you think about the following approach:

• for optional attributes: the schema should capture whether an attribute is optional or mandatory
• for allowed values: the schema should include the “type” and “format” fields to constrain the non-reference values
• for referential values: there’s no standard in JSON schema to capture this information, so I assume we need something in the description field. By its nature it is not going to be very precise, but it could still be helpful. This is probably the hardest one to address because it essentially needs some guidelines for people on how to write “descriptions”. David can get something done here.
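
One way to check a schema against these three points is a small audit script. The sketch below uses only the Python standard library. The `audit_properties` and `required_status` functions and the `SAMPLE_SCHEMA` dictionary are hypothetical, and the audit inspects only the top-level `properties`, `required`, `type`, `$ref`, and `description` keywords of a JSON Schema document.

```python
from typing import Dict, List

# Hypothetical schema fragment; only the property names come from this issue.
SAMPLE_SCHEMA = {
    "type": "object",
    "properties": {
        "username": {"type": "string", "description": "The user's login name."},
        "patronGroup": {"type": "string"},
    },
    "required": ["username"],
}


def required_status(schema: dict) -> Dict[str, bool]:
    """Map each top-level property to True if it is listed under 'required'."""
    required = set(schema.get("required", []))
    return {name: name in required for name in schema.get("properties", {})}


def audit_properties(schema: dict) -> Dict[str, List[str]]:
    """Report top-level properties that lack a type or a description."""
    report: Dict[str, List[str]] = {}
    for name, spec in schema.get("properties", {}).items():
        gaps = []
        if "type" not in spec and "$ref" not in spec:
            gaps.append("no type")
        if not str(spec.get("description", "")).strip():
            gaps.append("no description")
        if gaps:
            report[name] = gaps
    return report
```

An audit like this can only confirm that a description exists. Whether the description states the referenced interface or the controlled vocabulary still has to be reviewed by a person.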

Comment by David Crossley [ 07/Nov/18 ]

Enhanced that document a bit more, following yesterday's meeting and the comments here. Also linked to another more complete example.

(The cache can take 24 hours to propagate, though the update may already be available to you.)

Comment by David Crossley [ 07/Nov/18 ]

In RAML 1.0 the "description" nodes can utilise Markdown.

Should we encourage that to enable links to specific further information?

Comment by Mike Taylor [ 09/Jan/19 ]

The machine-readable link descriptions mentioned in MODGQL-119 (Closed) may provide what we need.

Comment by Nassib Nassar [ 09/Jan/19 ]

Thanks. My understanding at this time is that MODGQL-119 (Closed) documents foreign key relationships but not the other constraints requested in the description of this issue.

Comment by Mike Taylor [ 10/Jan/19 ]

For referential values: there’s no standard in JSON schema to capture this information, so I assume we need something in the description field. By its nature it is not going to be very precise, but it could still be helpful. This is probably the hardest one to address because it essentially needs some guidelines for people on how to write “descriptions”. David can get something done here.

To note again (possibly new information for the present audience): mod-graphql needs this information in a machine-readable form. I therefore defined some JSON-Schema extension fields, having established that there is no suitable pre-existing standard: see https://github.com/folio-org/mod-graphql/blob/master/src/autogen/README.md#option-1-json-schema-extensions
We have added some link-fields that are defined using these extensions: for example, holdingsRecords2 in the Instance schema is defined by a specification of how to go out and get holdings records from elsewhere and embed them in the instance: see https://github.com/folio-org/mod-inventory-storage/blob/1c2571d695bdc500e71a3dfb556df5a7e4564f5d/ramls/instance.json#L303-L316

Note 1. In general it does not suffice to say "We'll add a description to the link field" because sometimes there is no link field. An instance record doesn't have a holdingsId field that links to its holdings records; instead, each holdings record has an instanceId that links to its instance record.
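
The direction of the link in Note 1 can be shown with in-memory records. This Python sketch is not mod-graphql code and calls no FOLIO API. The sample records and the `holdings_for_instance` helper are hypothetical, the `id` key on each record is an assumption, and only the `instanceId` field name is taken from the note.

```python
from typing import List

# Hypothetical sample data: the instance carries no field naming its holdings.
INSTANCE = {"id": "inst-1", "title": "Example title"}

HOLDINGS = [
    {"id": "hold-1", "instanceId": "inst-1"},
    {"id": "hold-2", "instanceId": "inst-1"},
    {"id": "hold-3", "instanceId": "inst-2"},
]


def holdings_for_instance(instance: dict, holdings: List[dict]) -> List[dict]:
    """Follow the link in reverse: select the holdings records whose
    instanceId matches the id of the given instance."""
    return [h for h in holdings if h.get("instanceId") == instance["id"]]
```

Because the lookup starts from a field on the holdings record, a description attached to an instance property has nowhere to live unless a virtual link field is defined for it.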

Note 2. Does this information properly belong in the JSON Schema? Opinions differ (I say yes), but see MODGQL-119 (Closed) for a way to keep our options open on this.

Comment by Cate Boerema (Inactive) [ 07/Feb/19 ]

Closing as duplicate of UXPROD-1414 (Closed). I've moved the requirements from this umbrella into the individual stories as acceptance criteria. The links have also been migrated.
