[FOLIO-2910] folio reference environment builds broken, incompatible version "notes" Created: 11/Dec/20 Updated: 13/Jan/21 Resolved: 15/Dec/20 |
|
| Status: | Closed |
| Project: | FOLIO |
| Components: | None |
| Affects versions: | None |
| Fix versions: | None |
| Type: | Bug | Priority: | P3 |
| Reporter: | David Crossley | Assignee: | Sobha Duvvuri |
| Resolution: | Done | Votes: | 0 |
| Labels: | back-end, epam-spitfire | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original estimate: | Not Specified | ||
| Issue links: |
|
||||||||||||||||||||||||
| Sprint: | |||||||||||||||||||||||||
| Development Team: | Spitfire | ||||||||||||||||||||||||
| Description |
|
The hourly jobs of build-platform-complete-snapshot have been broken for many runs:
Incompatible version for module folio_notes-4.0.100069 interface notes. Need 1.0. Have 2.0/mod-notes-2.10.3-SNAPSHOT.163
Incompatible version for module mod-circulation-19.3.0-SNAPSHOT.774 interface notes. Need 1.0. Have 2.0/mod-notes-2.10.3-SNAPSHOT.163
Incompatible version for module folio_stripes-smart-components-6.0.1000883 interface notes. Need 1.0. Have 2.0/mod-notes-2.10.3-SNAPSHOT.163
The daily build for folio-testing 742, which failed on 2020-12-11, shows similar errors. Note that the other reference environment daily builds (the folio-snapshot related ones) are indicating "successful" runs. However, they are out of date because they are using the old artifacts from the most recent successful run of platform-complete-snapshot. As to the cause, see Zak's notes in comments below here (and
|
| Comments |
| Comment by Bohdan Suprun (Inactive) [ 11/Dec/20 ] |
|
The notes major version has been changed to 2.0 recently https://github.com/folio-org/mod-notes/commit/6fce921a96f80f4be61250954a004bad026bd2f4. Mod-circulation needs to be updated to require this version of the API. CC: Pavlo Smahin, Dima Tkachenko. |
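The "Need 1.0. Have 2.0" messages follow from how interface versions are compared. The following is a minimal sketch of that comparison, not Okapi's actual code. It assumes the commonly described rule that the major versions must match and the provided minor version must be at least the required one, and that a module may list several acceptable versions separated by spaces. The function names are hypothetical.

```python
def parse_version(text):
    """Split an interface version such as "2.0" into (major, minor)."""
    parts = [int(p) for p in text.split(".")]
    major = parts[0]
    minor = parts[1] if len(parts) > 1 else 0
    return major, minor


def is_compatible(required, provided):
    """Return True if the provided interface version satisfies the requirement.

    `required` may hold several acceptable versions separated by spaces,
    for example "1.0 2.0" (assumed convention, see the lead-in).
    """
    prov_major, prov_minor = parse_version(provided)
    for candidate in required.split():
        req_major, req_minor = parse_version(candidate)
        # Same major version, and the provider is at least as new in minor.
        if req_major == prov_major and prov_minor >= req_minor:
            return True
    return False


# The situation in this ticket: mod-circulation needs notes 1.0,
# while mod-notes now provides notes 2.0.
print(is_compatible("1.0", "2.0"))
# A module that accepts either major version would still resolve.
print(is_compatible("1.0 2.0", "2.0"))
```

Under these assumptions, a dependent module can be made to accept both the old and the new interface before the provider is bumped, which avoids a window where the build cannot resolve.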
| Comment by Marc Johnson [ 11/Dec/20 ] |
|
Thanks for investigating |
| Comment by Zak Burke [ 11/Dec/20 ] |
|
There were three UI PRs that merged yesterday related to this via
Lastly, is there a way to mimic the build process to uncover these kinds of incompatibilities, e.g. generate all the MDs and warn about conflicts? I asked about server-side interface incompatibilities in the comments on the SSC PR and heard we were clear. I'm not trying to point fingers (I break the build all the time!); I'm trying to understand how, given that we specifically looked for exactly this problem, we still missed it. |
| Comment by Marc Johnson [ 11/Dec/20 ] |
As I understand it, the snapshot builds use the install endpoint which, when it is not given specific versions of dependencies, uses the latest dependencies that are compatible. I think that usually manifests itself quietly as a successfully built yet out-of-date environment. Sometimes this goes unnoticed for some time. The current build actually still uses a version of mod-notes that provides the 1.x interface. I imagine when folks come to test the new changes they will notice this. The overnight folio-snapshot build wouldn't have picked up most of these changes at all, as the implicit precursor build that it relies on for configuration, platform-complete-snapshot, had been failing due to these discrepancies and so did not update the config. This was one of the reasons why we kept both the testing and snapshot environments: the testing builds tend to fail far more noisily when incompatibilities are encountered, and that might lead to folks resolving them. John Malconian Ian Hardy Adam Dickmeiss Please feel free to correct or expand upon my clumsy understanding of this. |
| Comment by John Malconian [ 11/Dec/20 ] |
|
This issue also prevents folio-snapshot from being updated. folio-snapshot uses yarn.lock and install.json files that have passed interface dependency checks. This check is performed in the Automation/build-platform-complete-snapshot Jenkins job. You can see that it is currently failing. https://jenkins-aws.indexdata.com/job/Automation/job/build-platform-complete-snapshot/ |
| Comment by David Crossley [ 13/Dec/20 ] |
|
I modified this ticket's Title and Description to better explain. |
| Comment by Zak Burke [ 14/Dec/20 ] |
|
Huh; looks like the ui-notes and stripes-smart-components PRs that updated the interface version to 2.0 failed in a very quiet way during the "publish module descriptor" step: ui-notes #70, stripes-smart-components #884. I totally missed both of these as the PRs themselves appeared successful. Building those master branches just now was successful; I'll start a new folio-snapshot build. I'm a little embarrassed to report that I even have monitoring set up for the snapshot branch of platform-complete and the master builds of SSC (among others...) and see those have been failing since Thursday. IOW, having been bit by this kind of quiet failure in the past, I set up monitors to make those failures noisier ... and still missed them. Sheesh. Maybe/additionally I can also publish those branch-failure warnings on Slack in #devops or #hosted-reference-environments? I'm using a free uptimerobot.com account with a Slack webhook; happy to abandon that in favor of something more official, or hand over the reins, etc. |
| Comment by Zak Burke [ 14/Dec/20 ] |
|
p.s. So the proximal reason for the platform-complete#snapshot build failing is that the most-recent SSC and ui-notes builds had failed. Do we have any insight into why the "publish module descriptor" step of both repositories failed? |
| Comment by Zak Burke [ 14/Dec/20 ] |
|
p.p.s. It looks like I have screwed up the dependency hierarchy below @folio/stripes, so even if the notes interface gets straightened out here, it won't surprise me if the UI is out of whack because I see two versions of @folio/stripes-components and @folio/stripes-core in the build, and that doesn't bode well. ... and I see the build I started failed because mod-circulation still wants notes 1.0, as in the ticket description: Incompatible version for module mod-circulation-19.3.0-SNAPSHOT.774 interface notes. Need 1.0. Have 2.0/mod-notes-2.10.3-SNAPSHOT.164 Phew; I get some breathing room to get my stripes hierarchy problem sorted. |
| Comment by Marc Johnson [ 14/Dec/20 ] |
|
Khalilah Gambrell Sobha Duvvuri Dima Tkachenko Oleksii Petrenko Pavlo Smahin The original work related to this appears to have been done by the Spitfire team. I think that means that the remedial work for this should be picked up by the Spitfire team. Until
|
| Comment by Zak Burke [ 14/Dec/20 ] |
|
Thanks, Marc Johnson. As noted on Slack, the implication of this bug is that no UI work merged from Thursday 2020-12-10 onward is available in any reference environment. There is a sprint demo tomorrow. While teams can/should be using the rancher environments to review their work internally, there is a longstanding, if unwritten, policy that local environments should not be used for demos during sprint review. What can be done to raise the profile of this issue? |
| Comment by Marc Johnson [ 14/Dec/20 ] |
And any back end work done to modules that rely on the notes interface after it was bumped.
I don't know. I've pinged most of the folks that I think could / should be interested. Folks could contact them directly. |
| Comment by Oleksii Petrenko [ 14/Dec/20 ] |
|
Marc Johnson Could you please clarify why this work should be done by Spitfire team? |
| Comment by Marc Johnson [ 14/Dec/20 ] |
|
Firstly, I apologise in advance if my response comes across as frustrated or terse; I did not anticipate this reaction.
As I understand it, the team making the breaking compatibility change is responsible for updating the dependents, either by doing the work or coordinating with other teams to synchronise the merging of the work. If a team does not do this it:
To me, this is only a small step from the implicit goal of all development work, to not break the build. Thus, I think
There might not be any written-down policy. We can raise it with the Technical Leads / Technical Council to create one. I believe it was discussed as part of the checklist work that Craig McNally and I did, which unfortunately did not get rolled out very far. An example of which explicitly states the need for coordinated work and is described on the dev website.
How would that work in practice? How would they know about the compatibility breaking change? How would they know to plan that work? If we expect other teams to react without prior knowledge or coordination, we would effectively be prolonging the outages (like this one) I referred to above. Craig McNally Zak Burke Cate Boerema Am I misrepresenting FOLIO's policies (both for compatibility breaking changes and for changes needing to be reviewed on the folio-snapshot environment) in this regard? If Spitfire can't or won't do this work, then please assign this to Core Functional :-/ |
| Comment by Craig McNally [ 14/Dec/20 ] |
|
My understanding of the processes aligns with what Marc said. I've seen this done, and have done this myself in the past... In a previous release (fameflower?) I made breaking compatibility changes in mod-login which required coordination across several modules (FE and BE). I first tried to identify all of the consumers of the interface which was changing, and reached out to the maintainers of those modules. In some cases there was a fair amount of back and forth to try and ensure we didn't break the build. IIRC I also helped create some of the JIRAs needed to cover the work and necessary releases. It was not a trivial amount of work, which is why we generally try to stay away from these breaking changes. As I hinted above, I don't think we have a good way of tracking these interface dependencies. At one point we were using spreadsheets for each platform release, but then we moved to tracking this in JIRA which IMO proved to be confusing and a lot more work. The information in JIRA was out of date and misleading. This topic was raised during at least one of the last few release retrospectives, though I don't think the problem has been solved yet.
I'm on board with this. I'm not sure what's on the agenda for this week but I'll add it to the topics list for TL. |
| Comment by Zak Burke [ 15/Dec/20 ] |
|
In UI-land, it is relatively easy to parse the package.json files for the modules in a platform to get a list of okapi interfaces and their expected/accepted versions. Indeed, this is what Denys Bohdan did in the SSC PR to find the other modules that needed to be updated. Is there a way to do something similar with backend modules? |
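The package.json approach described above can be sketched roughly as follows. This is an illustration only, not the script used in the SSC PR. It assumes each UI module declares its interfaces under a `stripes.okapiInterfaces` key with space-separated versions, and that the modules sit one directory deep under a platform directory. The directory layout and function names are hypothetical.

```python
import json
from pathlib import Path


def collect_okapi_interfaces(platform_dir):
    """Map interface name -> {module name: [accepted versions]}.

    Assumes a layout of <platform_dir>/<module>/package.json (hypothetical).
    """
    result = {}
    for package_file in sorted(Path(platform_dir).glob("*/package.json")):
        data = json.loads(package_file.read_text(encoding="utf-8"))
        module = data.get("name", package_file.parent.name)
        interfaces = data.get("stripes", {}).get("okapiInterfaces", {})
        for name, versions in interfaces.items():
            # Versions are assumed to be space separated, e.g. "1.0 2.0".
            result.setdefault(name, {})[module] = versions.split()
    return result


def modules_not_accepting(interfaces, name, version):
    """List modules that accept no version of `name` with the same major."""
    major = version.split(".")[0]
    return sorted(
        module
        for module, accepted in interfaces.get(name, {}).items()
        if all(a.split(".")[0] != major for a in accepted)
    )
```

Calling `modules_not_accepting(collect_okapi_interfaces("path/to/platform"), "notes", "2.0")` would then list the UI modules that still need updating before the notes interface is bumped. The comparison here only looks at the major version, which is a simplification.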
| Comment by David Crossley [ 15/Dec/20 ] |
|
Not sure if this is what is needed, but when investigating something similar, I enhanced this doc: For example:
curl -s -S -w'\n' \
  'http://folio-registry.aws.indexdata.com/_/proxy/modules?latest=1&require=notes%3D1.0' |
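For anyone wanting to script the registry query above, here is a minimal sketch using only the Python standard library. It builds the same URL as the curl example. The function names are hypothetical, and the assumption that the response is a JSON array of objects carrying an `id` field should be checked against the registry's actual output.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

REGISTRY = "http://folio-registry.aws.indexdata.com"


def modules_requiring_url(interface, version, registry=REGISTRY):
    """Build the query URL for the latest modules requiring an interface version."""
    query = urlencode({"latest": 1, "require": f"{interface}={version}"})
    return f"{registry}/_/proxy/modules?{query}"


def modules_requiring(interface, version, registry=REGISTRY):
    """Fetch the module ids from the registry (performs a network request).

    Assumes the response is a JSON array of objects with an "id" field.
    """
    url = modules_requiring_url(interface, version, registry)
    with urlopen(url, timeout=30) as response:
        return [module["id"] for module in json.load(response)]


if __name__ == "__main__":
    # Prints the URL only; call modules_requiring("notes", "1.0") to query.
    print(modules_requiring_url("notes", "1.0"))
```

Running such a query before merging an interface bump gives the list of backend modules that still require the old version and therefore need coordinated updates.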
| Comment by Marc Johnson [ 15/Dec/20 ] |
|
Sobha Duvvuri Thank you for submitting a pull request for
|
| Comment by Marc Johnson [ 15/Dec/20 ] |
|
Oleksii Petrenko Khalilah Gambrell Sobha Duvvuri Dima Tkachenko Cate Boerema Holly Mistlebauer Charlotte Whitt Zak Burke Craig McNally David Crossley John Malconian Bohdan Suprun Folks will be pleased to know that Sobha Duvvuri's pull request resolved the discrepancy and the folio-snapshot environment has been successfully rebuilt with updated versions. I have updated the proposed pull request guidelines to include a section about interface changes; hopefully, once these are approved, this will reduce the chances of confusion in the future. We may also want to consider updating any acceptance testing policy documentation we may have. |
| Comment by David Crossley [ 15/Dec/20 ] |
|
All reference environments are now re-built successfully. |
| Comment by Oleksii Petrenko [ 15/Dec/20 ] |
|
Marc Johnson Thank you for sharing community practices regarding this topic. |
| Comment by Marc Johnson [ 15/Dec/20 ] |
|
Oleksii Petrenko Thank you
That would be cool and appreciated (I haven't contemplated the options folks have shared above yet).
Thank you for reflecting upon this and changing the process. Personally, I think the responsibility goes further than notifying other teams. Otherwise, there is still a pretty good chance of disruption to the hosted environments. I think the team making the breaking change is responsible for coordinating the changes, with best efforts, to minimise the disruption to our hosted environments (and hence review processes). The team making the change does not have to make those changes itself (although it can), but I think it does need to work with the other teams to ensure they are all completed and merged in a coordinated fashion. Maybe this is what you meant, and I misunderstood what "notified" meant. |