POC: mod-search queries performance optimization
Description
Environment
Ramsons
Potential Workaround
CSP Request Details
CSP Rejection Details
CSP Approved
Attachments
- 06 Jan 2025, 05:04 PM
- 06 Jan 2025, 05:04 PM
- 27 Dec 2024, 01:16 PM
- 26 Dec 2024, 12:50 PM
- 26 Dec 2024, 12:50 PM
blocks
clones
continues
has to be done before
requires
Checklist
hideActivity
Roman_Fedynyshyn January 7, 2025 at 11:05 AM
Verified by PTF. No slow queries found
Valery_Pilko January 6, 2025 at 5:04 PM
Verified on Ramsons Bugfest environments.
All testable linked tickets to MSEARCH-923 are verified and closed (except https://folio-org.atlassian.net/browse/MSEARCH-905 ).
On Ramsons Bugfest environments, search works as expected (new/updated instance record could be found via search/browse) when big Data import job (create of MARC bib records) is in progress and right after it is finished:
On okapi tested with import of 25k MARC bibliographic records:
On Eureka tested with import of 100k MARC bibliographic records:
NOTE: a slight delay (2-3 seconds) sometimes is observed between moments of Subject/Contributor/Classification creation and when created record could be found via search (I don't think that it is crucial).
@Kalibek Turgumbayev , @Christine Schultz-Richert , @Khalilah Gambrell - do you want to review, or we can close this ticket?
JenkinsNotifications January 5, 2025 at 6:19 AM
Deployed to the Ramsons bf env. Moved status to In bugfix review from status Awaiting deployment. Please proceed with the verification.
Valery_Pilko December 27, 2024 at 1:16 PM
Today Re-tested upload of 25k of MARC bibliographic records Sprint testing environment https://folio-testing-sprint-fs09000000.ci.folio.org/
Import was successful - no queue in logs. Manually created Instances with Contributors, Subjects, Classifications could be found via Search/Browse.
Waiting for PTF testing results.
Valery_Pilko December 26, 2024 at 12:50 PM
Today we’ve tried to upload 25k of MARC bibliographic records on Sprint testing environment https://folio-testing-sprint-fs09000000.ci.folio.org/
Only about 18k of records were created
@Kalibek Turgumbayev investigating it
Intro:
New features for mod-search was introduced in Ramsons. And a lot of functionality of document merging were moved to DB side. New tables also added to mod-search schema (contributor, contributor_instance, instance, holding, item, etc. )
Issue description:
As mod-search now got multiple processes running on DB side it may affect severely another DB related processes (for example data import).
It was observed by PTF that with running mod-search, DI 25 K MARC BIB created affected. (17 minutes with running mod-search vs 11 min without it).
It’s clearly visible that queries like this start to run as soon as DI started, occupying almost 4 VCPU’s of DB.
SELECT c.id, c.name, c.name_type_id, c.authority_id, json_agg( json_build_object( 'count', sub.instance_count, 'typeId', sub.type_ids, 'shared', sub.shared, 'tenantId', sub.tenant_id ) ) AS instances FROM ( SELECT ins.contributor_id, ins.tenant_id, ins.shared, array_agg(DISTINCT ins.type_id) FILTER (WHERE ins.type_id <> '') as type_ids, COUNT(DISTINCT ins.instance_id) AS instance_count FROM cs00000int_mod_search.instance_contributor ins WHERE ins.contributor_id IN (?, ?) GROUP BY ins.contributor_id, ins.tenant_id, ins.shared ) sub JOIN cs00000int_mod_search.contributor c ON c.id = sub.contributor_id WHERE c.id IN (?, ?) GROUP BY c.id;
Moreover this query continue running even after DI already completed (as mod-search working asynchronously).
Please optimise this typical queries to not load DB as it is now.