POC: mod-search queries performance optimization

Description

Intro:
New features for mod-search was introduced in Ramsons. And a lot of functionality of document merging were moved to DB side. New tables also added to mod-search schema (contributor, contributor_instance, instance, holding, item, etc. )

Issue description:
As mod-search now got multiple processes running on DB side it may affect severely another DB related processes (for example data import).
It was observed by PTF that with running mod-search, DI 25 K MARC BIB created affected. (17 minutes with running mod-search vs 11 min without it).
It’s clearly visible that queries like this start to run as soon as DI started, occupying almost 4 VCPU’s of DB.

SELECT c.id, c.name, c.name_type_id, c.authority_id, json_agg( json_build_object( 'count', sub.instance_count, 'typeId', sub.type_ids, 'shared', sub.shared, 'tenantId', sub.tenant_id ) ) AS instances FROM ( SELECT ins.contributor_id, ins.tenant_id, ins.shared, array_agg(DISTINCT ins.type_id) FILTER (WHERE ins.type_id <> '') as type_ids, COUNT(DISTINCT ins.instance_id) AS instance_count FROM cs00000int_mod_search.instance_contributor ins WHERE ins.contributor_id IN (?, ?) GROUP BY ins.contributor_id, ins.tenant_id, ins.shared ) sub JOIN cs00000int_mod_search.contributor c ON c.id = sub.contributor_id WHERE c.id IN (?, ?) GROUP BY c.id;

Moreover this query continue running even after DI already completed (as mod-search working asynchronously).

Please optimise this typical queries to not load DB as it is now.

Environment

Ramsons

Potential Workaround

None

CSP Request Details

None

CSP Rejection Details

None

CSP Approved

None

Attachments

5
  • 06 Jan 2025, 05:04 PM
  • 06 Jan 2025, 05:04 PM
  • 27 Dec 2024, 01:16 PM
  • 26 Dec 2024, 12:50 PM
  • 26 Dec 2024, 12:50 PM

Confluence content

mentioned on

Checklist

hide

Activity

Show:

Roman_Fedynyshyn January 7, 2025 at 11:05 AM

Verified by PTF. No slow queries found

Valery_Pilko January 6, 2025 at 5:04 PM

Verified on Ramsons Bugfest environments.
All testable linked tickets to MSEARCH-923 are verified and closed (except https://folio-org.atlassian.net/browse/MSEARCH-905 ).
On Ramsons Bugfest environments, search works as expected (new/updated instance record could be found via search/browse) when big Data import job (create of MARC bib records) is in progress and right after it is finished:

  • On okapi tested with import of 25k MARC bibliographic records:

image-20250106-164919.png
  • On Eureka tested with import of 100k MARC bibliographic records:

    image-20250106-165020.png


NOTE: a slight delay (2-3 seconds) sometimes is observed between moments of Subject/Contributor/Classification creation and when created record could be found via search (I don't think that it is crucial).

, , - do you want to review, or we can close this ticket?

JenkinsNotifications January 5, 2025 at 6:19 AM

Deployed to the Ramsons bf env. Moved status to In bugfix review from status Awaiting deployment. Please proceed with the verification.

Valery_Pilko December 27, 2024 at 1:16 PM

Today Re-tested upload of 25k of MARC bibliographic records Sprint testing environment https://folio-testing-sprint-fs09000000.ci.folio.org/
Import was successful - no queue in logs. Manually created Instances with Contributors, Subjects, Classifications could be found via Search/Browse.
Waiting for PTF testing results.

image-20241227-131442.png

Valery_Pilko December 26, 2024 at 12:50 PM

Today we’ve tried to upload 25k of MARC bibliographic records on Sprint testing environment https://folio-testing-sprint-fs09000000.ci.folio.org/
Only about 18k of records were created

image-20241226-124943.png
image-20241226-124956.png

investigating it

Done

Details

Assignee

Reporter

Priority

Story Points

Sprint

Development Team

Spitfire

Fix versions

Release

Ramsons (R2 2024) Bug Fix

TestRail: Cases

Open TestRail: Cases

TestRail: Runs

Open TestRail: Runs
Created December 16, 2024 at 2:48 PM
Updated March 13, 2025 at 7:05 PM
Resolved January 3, 2025 at 4:04 PM
TestRail: Cases
TestRail: Runs

Flag notifications