Sunflower (R1 2025): Bug Fest Support Protocol

    The FSE team is responsible for creating, maintaining, and updating the Sunflower Bug Fest environment.

    Bug Fest Update

    The Bug Fest environment is expected to be updated daily at an agreed-upon time. An update typically takes 2 hours. No Bug Fest updates are planned on weekends. Information about environment updates is posted in the #eureka-bugfest-notify Slack channel.

    Activities calendar

    Daily:

    • Triage and resolve newly reported issues according to priorities

    • Update Sunflower Bug Fest environment according to schedule

    o   Morning update: 7.00-9.30 UTC+3 | 0:00 - 3:00 am ET

    Please note: only modules released by 17.30 UTC+3 | 10.30 am ET on the day before the update are included in the Morning update.
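
    The cutoff rule above can be sketched in code. This is a minimal illustration, assuming a fixed UTC+3 offset as written in this protocol; the function name and its arguments are hypothetical and are not part of any existing FSE tooling.

```python
from datetime import date, datetime, time, timedelta, timezone

# Assumption: the schedule uses a fixed UTC+3 offset, as written in this protocol.
UTC_PLUS_3 = timezone(timedelta(hours=3))
RELEASE_CUTOFF = time(17, 30)  # 17.30 UTC+3 on the day before the update


def included_in_morning_update(released_at: datetime, update_date: date) -> bool:
    """Return True if a module released at `released_at` makes the update on `update_date`.

    `released_at` must be a timezone-aware datetime; `update_date` is the
    calendar day (in UTC+3) on which the Morning update runs.
    """
    day_before = update_date - timedelta(days=1)
    cutoff = datetime.combine(day_before, RELEASE_CUTOFF, tzinfo=UTC_PLUS_3)
    # Aware datetimes compare correctly across different offsets.
    return released_at <= cutoff
```

    For example, a module released at 14:00 UTC (17.00 UTC+3) on the day before the update is included, while one released at 15:00 UTC (18.00 UTC+3) waits for the next update.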

    • Communicate in the Slack channel about other planned/unplanned maintenance activities required to complete tasks or resolve blocking issues with the environment (if any)

    Weekly:

    • Review the ‘In Progress’ and ‘Closed’ tickets and make sure all labels are added (see guidance below)

    • Based on the completed tickets, update wiki page with helpful notes for hosting providers

    • Notify development teams in the respective Jira issues to update release notes or module configuration details (as needed)

    Monthly | Optional

    • Run survey (in Slack) to collect ideas and feedback from Bug Fest participants

    Bug Fest on duty person responsibilities

    To streamline the Bug Fest update process, an on-duty engineer from the FSE team is assigned to daily updates (planned/unplanned).

    The on-duty engineer should:

    1. Monitor and respond to #folio-bug-fest Slack channel messages related to Sunflower Bug Fest environment

    2. Monitor #folio-releases Slack channel with respect to Sunflower modules' releases as well as Eureka management modules updates

    3. Monitor/trigger planned updates according to schedule (if new releases have been posted since the previous Bug Fest environment update)

    4. Share information in the #eureka-bugfest-notify channel if an update fails or gets stuck, and post updates on the resolution at reasonable intervals

    5. Change Jira statuses after an environment update from ‘Awaiting Deployment’ to ‘In bugfix Review’ (see the job Move-Jira-Tickets-depend-on-deployed-version-of-module at https://jenkins-aws.indexdata.com ). The remaining Jira tickets should also be reviewed and their statuses changed manually.

    Dashboard for the Sunflower BF environmental issues monitoring: Sunflower R1 2025 BF environmental issues

    Slack Communication

    The #bug-fest Slack channel is the primary means of informing interested parties about any maintenance activities other than the standard daily updates of the Sunflower Bug Fest environment.

    For planned maintenance, a notification should be posted up to 2 hours in advance, specifying the expected duration of the work. For unplanned maintenance, a notification should be posted with the available details on the type of issue and the expected resolution time (if known).

    Messages about the standard daily updates of the environment are posted in the #eureka-bugfest-notify channel and follow a specific structure: the start time of the upcoming update, recently released modules (if any), new versions of applications (if any), and the expected completion time. An automated notification is also posted in case of a deployment issue.

    After maintenance is complete, an update should be posted in the thread of the initial message.
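
    As an illustration of the structure of a daily update message, the sketch below assembles the four pieces of information listed above into one text. The wording, the layout, and the function name are assumptions; the actual messages in the channel may look different.

```python
def format_update_notice(start, expected_completion, modules=(), applications=()):
    """Build the text of a daily update notice (hypothetical layout)."""
    lines = [f"Sunflower Bug Fest update starts at: {start}"]
    # Modules and application versions are listed only when there are any.
    if modules:
        lines.append("Recently released modules: " + ", ".join(modules))
    if applications:
        lines.append("New application versions: " + ", ".join(applications))
    lines.append(f"Expected completion: {expected_completion}")
    return "\n".join(lines)
```

    Optional parts are simply left out when empty, which mirrors the "(if any)" wording of the description.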

    Bug Fest Issues Reporting

    When a QA engineer from a development team finds a bug on Eureka Bugfest, they should first report the issue to their development team for initial triaging. The bug is created in the Jira project for the relevant module or functional area, with their development team assigned.

    During the development team’s triage process, if the issue is determined to belong to that team's functionality, the bug either remains in the initial Jira project or is moved to the appropriate project owned by the development team.

    If the issue is considered outside the scope of the development team's modules/functional area, the QA engineer moves the bug to the BF Jira project (for issues likely related to infrastructure) or the Eureka Jira project (for all other cases).

    The following details should be added to tickets reported for Sunflower Bug Fest issues:

    • URL of the tenant/environment where the issue is observed

    • URL of other environments where the issue is not reproduced (if any)

    • priority of the issue (P1, P2, P3, P4) | optional

    • 'bugfest_R1.2025' label | required

    • 'SunflowerBF' label | required

    • expected results

    • actual results

    • details on how to reproduce the issue (a video recording is preferred; otherwise, screenshot(s))

    • screenshot of the network tab when some of the functionality is not working (failed call, status, headers, response)
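
    The checklist above can be expressed as a small validation helper. This is a sketch only: the two label names come from this page, while the dictionary keys and the function name are hypothetical and do not correspond to Jira field identifiers.

```python
REQUIRED_LABELS = {"bugfest_R1.2025", "SunflowerBF"}
# Hypothetical keys for the details requested above; not Jira field identifiers.
EXPECTED_FIELDS = (
    "environment_url",
    "expected_results",
    "actual_results",
    "steps_to_reproduce",
)


def missing_ticket_details(ticket: dict) -> list[str]:
    """List what a reported ticket still lacks according to the checklist."""
    problems = []
    for label in sorted(REQUIRED_LABELS - set(ticket.get("labels", []))):
        problems.append(f"missing required label: {label}")
    for field in EXPECTED_FIELDS:
        if not ticket.get(field):
            problems.append(f"missing detail: {field}")
    return problems
```

    An empty result means the ticket carries both required labels and all of the listed details; priority is left out of the check because it is optional.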

    Bug Fest issues triaging process

    During the Bug Fix period, the FSE team runs triaging sessions daily (12:00 PM UTC+3 | 5:00 AM ET)

    Process:

    1. During the Triaging session, tickets from the Sunflower R1 2025 BF environmental issues dashboard are reviewed (starting with P1s)

    2. ‘fse-reviewed’ label is added for the reviewed tickets

    3. an owner from the FSE team is assigned

    4. in case of questions on requirements, a comment is added with an @mention of the person who created the ticket

    5. if it is determined that the issue is not caused by any of the following root causes:

      • deployment (incorrect version of application is installed or application is not installed at all, module is not present in the application, …)

      • infrastructure malfunctioning (Kafka is down or overloaded, or messages are stuck, OpenSearch is <>, DB is <>, task is not stable or restarts, …)

      • misconfiguration (CPU, memory, env configuration, or settings differ from those documented on the Modules Configuration details page or in the README file)

      • improper installation (not performed according to the Release notes, a feature flag is not set or is set incorrectly (if any), etc.)

      • timeouts configuration

      • application/service visibility

      • permission misconfiguration

    then the ticket should be moved to the project of the team that owns the module where the issue was observed, and the ‘development team’ field should be changed accordingly (see the matrix of responsibility)
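
    The routing decision in step 5 can be summarized as a short sketch. The root-cause names are taken from the list above; the enum, the function name, and the returned strings are illustrative assumptions and not part of any existing tooling.

```python
from enum import Enum
from typing import Optional


class RootCause(Enum):
    """Environmental root causes listed in step 5 of the triaging process."""
    DEPLOYMENT = "deployment"
    INFRASTRUCTURE = "infrastructure malfunctioning"
    MISCONFIGURATION = "misconfiguration"
    IMPROPER_INSTALLATION = "improper installation"
    TIMEOUTS = "timeouts configuration"
    VISIBILITY = "application/service visibility"
    PERMISSIONS = "permission misconfiguration"


def triage_destination(root_cause: Optional[RootCause]) -> str:
    """Decide where a triaged ticket goes.

    `None` means triage found none of the environmental root causes.
    """
    if root_cause is None:
        return "move to the owning team's project"
    return "keep in the Bug Fest project"
```

    Only the absence of every listed root cause sends a ticket to the owning development team; any single match keeps it with the FSE team.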

    Please note:

    • Permissions/capabilities-related issues are expected to be addressed by QA engineers/development teams themselves (unless they relate to the Admin role)

    • Data-related Jira tasks should have the “data-related” label, and the appropriate SQL scripts should be added to the description field

    • Any configuration changes during the Bug Fest/Bug Fix period should be documented in the Release notes by development teams for their respective module(s)/Eureka applications, and the FSE team should be notified of the change with an @mention in the #bug-fest Slack channel

    Lifecycle of the tickets in Bug Fest jira project

    1. When work on the ticket starts, its status should be changed from ‘Open’ to ‘In Progress’.

    2. If another existing ticket is blocking progress, the active ticket should be marked as blocked and the blocking ticket should be linked to it with the relation type ‘Blocks’. If a NEW blocking reason is identified, a ticket should be created for it in the corresponding Jira project and linked to the initial ticket with the ‘Blocks’ relation.

    3. When the work is completed and testing by other parties is required, the status of the ticket should be changed to ‘In Review’ and the ticket should be assigned to the Jira user who will perform the testing (typically the one who reported the issue). That user should also be @mentioned in a comment to make the validation request more visible.

    4. If the testing confirms that the issue has been fully resolved, the status of the ticket should be changed to ‘Closed’ | ‘Done’. The required labels should also be added (see guidance below). When closing an issue whose root cause is configuration/environment related (and not functional), the 'bugfest_R1.2025' label should be removed from the Jira ticket.

    5. If triaging/troubleshooting shows that the work is not needed, the ticket should be closed with the resolution ‘Won’t do’, ‘Cannot Reproduce’, ‘Declined’, or, for issues that already exist or have already been addressed, ‘Duplicate’.

    Basic rules:

    • higher-priority tickets take precedence over the rest

    • for P1 tickets, it is recommended to add updates/comments daily to keep stakeholders informed of status, progress, and challenges (if any)

    • tickets that were placed in the BF project by mistake and require work from another team should be moved to the corresponding project (contact the SM/TL of the team if clarification is needed)

    Troubleshooting guidelines

    1. check cluster state

    2. review recent changes

    3. review release notes

    4. search for similar cases in jira

    5. talk to development team responsible for appropriate functionality

    6. get in touch with QA and/or development team for complex workflows and/or unclear requirements

    7. get in touch with Kitfox

    8. start war-room for critical (blocking) issues to collaboratively brainstorm/resolve issue

    Labels for jira tickets in Bug Fest project

    To support further analysis and preparation for future bug fests, the following labels should be added to environmental issues:

    • data-related - fix/changes related to data tuning/fixing

    • configuration-related - fix/changes related to configuration

    • capabilities-related - fix/changes related to capabilities
