[FOLIO-2910] folio reference environment builds broken, incompatible version "notes" Created: 11/Dec/20  Updated: 13/Jan/21  Resolved: 15/Dec/20

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Bug Priority: P3
Reporter: David Crossley Assignee: Sobha Duvvuri
Resolution: Done Votes: 0
Labels: back-end, epam-spitfire
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Blocks
is blocked by CIRC-1055 Support notes 2.0 interface Closed
Relates
relates to FOLIO-2957 report reference env build errors to ... Open
relates to MODNOTES-163 Assign/Unassign Notes Modal: Support ... Closed
relates to STSMACOM-466 Assign/Unassign Notes Modal: Support ... Closed
Sprint:
Development Team: Spitfire

 Description   

The hourly jobs of build-platform-complete-snapshot have been broken for many runs:

Incompatible version for module folio_notes-4.0.100069
interface notes. Need 1.0. Have 2.0/mod-notes-2.10.3-SNAPSHOT.163.

Incompatible version for module mod-circulation-19.3.0-SNAPSHOT.774
interface notes. Need 1.0. Have 2.0/mod-notes-2.10.3-SNAPSHOT.163.

Incompatible version for module folio_stripes-smart-components-6.0.1000883
interface notes. Need 1.0. Have 2.0/mod-notes-2.10.3-SNAPSHOT.163

The folio-testing daily build 742, which failed on 2020-12-11, is showing similar errors.

Note that the other reference environment daily builds (folio-snapshot related ones) are indicating "successful" runs. However, they are out of date because they are using the old artifacts from the most recent successful run of platform-complete-snapshot.

As to the cause, see Zak's notes in comments below here (and STSMACOM-466 Closed and other related).
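
The "Need 1.0. Have 2.0" messages above follow from how interface versions are matched. A minimal sketch of that rule, assuming Okapi treats a provided version as compatible only when the major numbers are equal and the provided minor is at least the required minor, and that a requirement may list several space-separated alternatives. The function names are illustrative and are not Okapi's API.

```python
def parse_version(text):
    """Split 'major.minor' into a tuple of ints, e.g. '2.0' -> (2, 0)."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def is_compatible(provided, required):
    """Return True if a provided interface version satisfies a requirement.

    Assumed rule: same major, and provided minor >= required minor.
    `required` may hold several alternatives separated by spaces ("1.0 2.0").
    """
    have_major, have_minor = parse_version(provided)
    for alternative in required.split():
        need_major, need_minor = parse_version(alternative)
        if have_major == need_major and have_minor >= need_minor:
            return True
    return False


# The case from this ticket: mod-notes provides notes 2.0, consumers need 1.0.
print(is_compatible("2.0", "1.0"))      # major version differs
print(is_compatible("2.0", "1.0 2.0"))  # consumer accepts either major
print(is_compatible("1.3", "1.0"))      # newer minor within the same major
```

Under this assumed rule, a consumer that declares both "1.0 2.0" keeps building across a major bump, which is one way for dependents to ride out this kind of change.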



 Comments   
Comment by Bohdan Suprun (Inactive) [ 11/Dec/20 ]

The notes major version has been changed to 2.0 recently https://github.com/folio-org/mod-notes/commit/6fce921a96f80f4be61250954a004bad026bd2f4. Mod-circulation needs to be updated to require this version of the API.

CC: Pavlo Smahin, Dima Tkachenko.

Comment by Marc Johnson [ 11/Dec/20 ]

Bohdan Suprun

Thanks for investigating

Comment by Zak Burke [ 11/Dec/20 ]

There were three UI PRs that merged yesterday related to this via STSMACOM-466 Closed (stripes-smart-components, ui-users, ui-notes) that updated the notes interface version to 2.0. I understand the incompatibility here is with mod-circulation wanting notes v1.0 and others wanting v2.0. What I don't understand is how folio-snapshot built successfully with notes v1.0 given the MDs generated by the UI should all be asking for v2.0. Can somebody help me to understand that piece?

Lastly, is there a way to mimic the build process to uncover these kinds of incompatibilities, e.g. generate all the MDs and warn about conflicts? I asked about server-side interface incompatibilities in the comments on the SSC PR and heard we were clear. I'm not trying to point fingers (I break the build all the time!); I'm trying to understand how, given that we specifically looked for exactly this problem, we still missed it.
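
One way to approximate such a check outside the build is sketched here, assuming every module descriptor is available as a dict with "id", "provides" and "requires" entries. The matching rule (same major, provided minor at least the required minor) is an assumption about Okapi's behaviour, and `find_conflicts` is a hypothetical helper, not part of any FOLIO tool.

```python
def parse(version):
    """Turn 'major.minor' into a tuple of ints."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def satisfies(have, need):
    """Assumed rule: same major and at least the required minor.

    `need` may list several alternatives separated by spaces.
    """
    h = parse(have)
    return any(h[0] == n[0] and h[1] >= n[1] for n in map(parse, need.split()))


def find_conflicts(descriptors):
    """Return (module_id, interface, need, have) for every unmet requirement."""
    provided = {}
    for md in descriptors:
        for iface in md.get("provides", []):
            provided[iface["id"]] = (iface["version"], md["id"])
    conflicts = []
    for md in descriptors:
        for req in md.get("requires", []):
            have = provided.get(req["id"])
            if have is None:
                conflicts.append((md["id"], req["id"], req["version"], None))
            elif not satisfies(have[0], req["version"]):
                conflicts.append(
                    (md["id"], req["id"], req["version"], f"{have[0]}/{have[1]}")
                )
    return conflicts


descriptors = [
    {"id": "mod-notes-2.10.3-SNAPSHOT.163",
     "provides": [{"id": "notes", "version": "2.0"}]},
    {"id": "folio_notes-4.0.100069",
     "requires": [{"id": "notes", "version": "1.0"}]},
    {"id": "mod-circulation-19.3.0-SNAPSHOT.774",
     "requires": [{"id": "notes", "version": "1.0"}]},
    {"id": "folio_stripes-smart-components-6.0.1000883",
     "requires": [{"id": "notes", "version": "1.0"}]},
    # Hypothetical module, not from this ticket: accepts either major version.
    {"id": "folio_example-1.0.0",
     "requires": [{"id": "notes", "version": "1.0 2.0"}]},
]

for module_id, iface, need, have in find_conflicts(descriptors):
    print(f"Incompatible version for module {module_id} "
          f"interface {iface}. Need {need}. Have {have}.")
```

The first four descriptors reuse the module ids and versions from the ticket description. A check like this only helps if it runs against the descriptors that would actually be published, including those from unmerged branches.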

Comment by Marc Johnson [ 11/Dec/20 ]

Zak Burke

I understand the incompatibility here is with mod-circulation wanting notes v1.0 and others wanting v2.0. What I don't understand is how folio-snapshot built successfully with notes v1.0 given the MDs generated by the UI should all be asking for v2.0. Can somebody help me to understand that piece?

As I understand it, the snapshot builds use the install endpoint, which, if it is not given specific versions of dependencies, will use the latest dependencies that are compatible.

I think that usually manifests itself quietly as a successfully built yet out of date environment. Sometimes this goes without notice for some time.

The current build actually still uses a version of mod-notes that provides the 1.x interface. I imagine when folks come to test the new changes they will notice this.

The overnight folio snapshot build wouldn't have picked up most of these changes at all, as the implicit precursor build that it relies on for configuration, platform-complete-snapshot, had been failing due to these discrepancies and so did not update the config.

This was one of the reasons why we kept both the testing and snapshot environments, because the testing builds tend to fail far more noisily when incompatibilities are encountered and that might lead to folks resolving them.

John Malconian Ian Hardy Adam Dickmeiss Please feel free to correct or expand upon my clumsy understanding of this.

Comment by John Malconian [ 11/Dec/20 ]

This issue also prevents folio-snapshot from being updated. folio-snapshot uses yarn.lock and install.json files that have passed interface dependency checks. This check is performed in the Automation/build-platform-complete-snapshot Jenkins job. You can see that it is currently failing.

https://jenkins-aws.indexdata.com/job/Automation/job/build-platform-complete-snapshot/

Comment by David Crossley [ 13/Dec/20 ]

I modified this ticket's Title and Description to better explain.

Comment by Zak Burke [ 14/Dec/20 ]

Huh; looks like the ui-notes and stripes-smart-components PRs that updated the interface version to 2.0 failed in a very quiet way during the "publish module descriptor" step: ui-notes #70, stripes-smart-components #884. I totally missed both of these as the PRs themselves appeared successful.

Building those master branches just now was successful; I'll start a new folio-snapshot build.

I'm a little embarrassed to report that I even have monitoring set up for the snapshot branch of platform-complete and the master builds of SSC (among others...) and see those have been failing since Thursday. IOW, having been bit by this kind of quiet failure in the past, I set up monitors to make those failures noisier ... and still missed them. Sheesh. Maybe/additionally I can also publish those branch-failure warnings on Slack in #devops or #hosted-reference-environments? I'm using a free uptimerobot.com account with a Slack webhook; happy to abandon that in favor of something more official, or hand over the reins, etc.

Comment by Zak Burke [ 14/Dec/20 ]

p.s. So the proximal reason for the platform-complete#snapshot build failing is that the most-recent SSC and ui-notes builds had failed. Do we have any insight into why the "publish module descriptor" step of both repositories failed?

Comment by Zak Burke [ 14/Dec/20 ]

p.p.s. It looks like I have screwed up the dependency hierarchy below @folio/stripes, so even if the notes interface gets straightened out here, it won't surprise me if the UI is out of whack because I see two versions of @folio/stripes-components and @folio/stripes-core in the build, and that doesn't bode well.

... and I see the build I started failed because mod-circulation still wants notes 1.0, as in the ticket description:

Incompatible version for module mod-circulation-19.3.0-SNAPSHOT.774 interface notes. 
Need 1.0. Have 2.0/mod-notes-2.10.3-SNAPSHOT.164

Phew; I get some breathing room to get my stripes hierarchy problem sorted.

Comment by Marc Johnson [ 14/Dec/20 ]

Khalilah Gambrell Sobha Duvvuri Dima Tkachenko Oleksii Petrenko Pavlo Smahin

The original work related to this appears to have been done by the Spitfire team. I think that means that the remedial work for this should be picked up by the Spitfire team.

Until CIRC-1055 Closed is done, the folio-snapshot environment will not be updated which will prevent testing of new features.

cc: Cate Boerema Holly Mistlebauer

Comment by Zak Burke [ 14/Dec/20 ]

Thanks, Marc Johnson. As noted on Slack, the implication of this bug is that no UI work merged from Thursday 2020-12-10 onward is available in any reference environment.

There is a sprint demo tomorrow. While teams can/should be using the rancher environments to review their work internally, there is a longstanding, if unwritten, policy that local environments should not be used for demos during sprint review. What can be done to raise the profile of this issue?

Comment by Marc Johnson [ 14/Dec/20 ]

Zak Burke

the implication of this bug is that no UI work merged from Thursday 2020-12-10 onward is available in any reference environment.

And any back end work done to modules that rely on the notes interface after it was bumped.

While teams can/should be using the rancher environments to review their work internally, there is a longstanding, if unwritten, policy that local environments should not be used for demos during sprint review. What can be done to raise the profile of this issue?

I don't know. I've pinged most of the folks that I think could / should be interested. Folks could contact them directly.

Comment by Oleksii Petrenko [ 14/Dec/20 ]

Marc Johnson Could you please clarify why this work should be done by Spitfire team?
I completely agree with you that interface version has been changed and at the same time I could not find any reference or procedure where noted that team which changes interface version should proceed with upgrades of dependent modules.
As I know modules owners are responsible for supporting of new dependent interface versions.
Please correct me if I am wrong

Holly Mistlebauer

Comment by Marc Johnson [ 14/Dec/20 ]

Oleksii Petrenko

Firstly, I apologise in advance if my response comes across as frustrated or terse, I did not anticipate this reaction.

Could you please clarify why this work should be done by Spitfire team?

As I understand it, the team making the breaking compatibility change is responsible for updating the dependents, either by doing the work or coordinating with other teams to synchronise the merging of the work.

If a team does not do this it:

  • means that their own work cannot be reviewed by a PO and closed because it won't be deployed to the folio-snapshot environment
  • almost guarantees environmental outages that affect all teams.

To me, this is only a small step from the implicit goal of all development work, to not break the build.

Thus, I think CIRC-1055 Closed should have been part of the original scope of the feature and been done (or at least coordinated) by Spitfire.

I could not find any reference or procedure where noted that team which changes interface version should proceed with upgrades of dependent modules.

There might not be any written down policy. We can raise it with the Technical Leads / Technical Council to do so.

I believe it was discussed as part of the checklist work that Craig McNally and I did, which unfortunately did not get rolled out very far. An example of which explicitly states the need for coordinated work, and this is also described on the dev website.

As I know modules owners are responsible for supporting of new dependent interface versions.

How would that work in practice?

How would they know about the compatibility breaking change? How would they know to plan that work?

If we expect other teams to react without prior knowledge or coordination, we would effectively be prolonging the outages (like this one) I referred to above.

Craig McNally Zak Burke Cate Boerema Am I misrepresenting FOLIO's policies (both for compatibility breaking changes and for changes needing to be reviewed on the folio-snapshot environment) in this regard?

If Spitfire can't or won't do this work, then please assign this to Core Functional :-/

cc: Zak Burke Holly Mistlebauer

Comment by Craig McNally [ 14/Dec/20 ]

My understanding of the processes aligns with what Marc said. I've seen this done, and have done this myself in the past...

In a previous release (fameflower?) I made breaking compatibility changes in mod-login which required coordination across several modules (FE and BE). I first tried to identify all of the consumers of the interface which was changing, and reached out to the maintainers of those modules. In some cases there was a fair amount of back and forth to try and ensure we didn't break the build. IIRC I also helped create some of the JIRAs needed to cover the work and necessary releases. It was not a trivial amount of work, which is why we generally try to stay away from these breaking changes.

As I hinted above, I don't think we have a good way of tracking these interface dependencies. At one point we were using spreadsheets for each platform release, but then we moved to tracking this in JIRA which IMO proved to be confusing and a lot more work. The information in JIRA was out of date and misleading. This topic was raised during at least one of the last few release retrospectives, though I don't think the problem has been solved yet.

We can raise it with the Technical Leads / Technical Council

I'm on board with this. I'm not sure what's on the agenda for this week but I'll add it to the topics list for TL.

Comment by Zak Burke [ 15/Dec/20 ]

In UI-land, it is relatively easy to parse the package.json files for the modules in a platform to get a list of okapi interfaces and their expected/accepted versions. Indeed, this is what Denys Bohdan did in the SSC PR to find the other modules that needed to be updated.
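
A minimal sketch of that kind of parsing, assuming each UI module declares its interfaces under `stripes.okapiInterfaces` in package.json and that the platform's modules are installed under `node_modules/@folio/`. Both the directory layout and the helper names are assumptions for illustration, and this is not necessarily how it was done in the SSC PR.

```python
import json
from pathlib import Path


def interfaces_of(package):
    """Return the interface -> version mapping declared by one parsed package.json."""
    return package.get("stripes", {}).get("okapiInterfaces", {})


def collect_interfaces(platform_dir):
    """Map each interface id to {module name: required version} across a platform."""
    required = {}
    pattern = "node_modules/@folio/*/package.json"  # assumed install layout
    for path in sorted(Path(platform_dir).glob(pattern)):
        package = json.loads(path.read_text(encoding="utf-8"))
        name = package.get("name", path.parent.name)
        for iface, version in interfaces_of(package).items():
            required.setdefault(iface, {})[name] = version
    return required


# Hypothetical package.json content, for illustration only.
sample = json.loads("""
{
  "name": "@folio/example",
  "stripes": {"okapiInterfaces": {"notes": "2.0", "users": "15.0"}}
}
""")
print(interfaces_of(sample))
```

Calling `collect_interfaces(".")` from a platform checkout would then show, per interface, which modules ask for which version, so that mixed requirements such as notes 1.0 and 2.0 stand out.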

Is there a way to do something similar with backend modules?

Comment by David Crossley [ 15/Dec/20 ]

Not sure if this is what is needed, but when investigating something similar, I enhanced this doc:
https://dev.folio.org/faqs/how-to-which-module-which-interface-endpoint/

For example:

curl -s -S -w'\n' \
  'http://folio-registry.aws.indexdata.com/_/proxy/modules?latest=1&require=notes%3D1.0'
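
The same query can be built programmatically. This is a sketch that only reuses the registry host, path and parameters from the curl command above. The function names are hypothetical, and the assumption that the response is a JSON array of module descriptors, each with an "id", is not confirmed in this ticket.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

REGISTRY = "http://folio-registry.aws.indexdata.com"


def consumers_query_url(interface, version, registry=REGISTRY):
    """Build the URL listing the latest modules that require interface=version."""
    query = urlencode({"latest": 1, "require": f"{interface}={version}"})
    return f"{registry}/_/proxy/modules?{query}"


def fetch_module_ids(url):
    """Fetch the module list and return the ids (assumes a JSON array of descriptors)."""
    with urlopen(url) as response:
        return [module["id"] for module in json.load(response)]


# Only the URL is built here; no network request is made.
print(consumers_query_url("notes", "1.0"))
```

Passing the resulting URL to `fetch_module_ids` would give the list of modules still asking for notes 1.0, which is the set of consumers to coordinate with before bumping the interface.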

Comment by Marc Johnson [ 15/Dec/20 ]

Sobha Duvvuri Thank you for submitting a pull request for CIRC-1055 Closed

Comment by Marc Johnson [ 15/Dec/20 ]

Oleksii Petrenko Khalilah Gambrell Sobha Duvvuri Dima Tkachenko Cate Boerema Holly Mistlebauer Charlotte Whitt Zak Burke Craig McNally David Crossley John Malconian Bohdan Suprun

Folks will be pleased to know that Sobha Duvvuri's pull request resolved the discrepancy and the folio-snapshot environment has been successfully rebuilt with updated versions.

I have updated the proposed pull request guidelines to include a section about interface changes. Hopefully, once these are approved, this will reduce the chances of confusion in the future. We may want to consider also updating any acceptance testing policy documentation we may have.

Comment by David Crossley [ 15/Dec/20 ]

All reference environments are now re-built successfully.

Comment by Oleksii Petrenko [ 15/Dec/20 ]

Marc Johnson Thank you for sharing community practices regarding this topic.
It looks like the team had not previously faced a major interface version change that affects other teams.
In fact, we carried out impact analysis of this change only for our own modules.
As release coordinator I shall prepare tool for getting data regarding affected dependencies.
Spitfire team will add note to definition of done to make sure that affected teams be notified.

Comment by Marc Johnson [ 15/Dec/20 ]

Oleksii Petrenko Thank you

As release coordinator I shall prepare tool for getting data regarding affected dependencies.

That would be cool and appreciated (I haven't contemplated the options folks have shared above yet).

Spitfire team will add note to definition of done to make sure that affected teams be notified.

Thank you for reflecting upon this and changing the process.

Personally, I think the responsibility goes further than notifying other teams. Otherwise, there is still a pretty good chance of disruption to the hosted environments.

I think the team making the breaking change is responsible for coordinating the changes (with best efforts) to minimise the disruption to our hosted environments (and hence review processes).

The team making the change does not have to make those changes (although it can). I think it does need to work with the other teams to ensure they are all completed and merged in a coordinated fashion.

Maybe this is what you meant, and I misunderstood what notified meant.

Generated at Thu Feb 08 23:24:10 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.