Purpose
Currently, the Search API in SRS works only with newly imported MARC records.
This page describes the approach for enabling SRS Search for existing MARC records.
Approach
The approach is to provide a filler script to the customer. The DB administrator/staff can run it once SRS is deployed and ready for use.
The script copies records from the marc_records_lb table (only those missing from marc_indexers) and inserts them into the marc_indexers table.
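The set of records still to be migrated can be previewed with the same anti-join the script itself uses (a read-only sketch; run it in the tenant's SRS schema):

```sql
-- Records present in marc_records_lb but not yet indexed for search
select count(*)
from marc_records_lb
where id not in (select marc_id from marc_indexers);
```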
The script consists of the following steps:
- Define search_path to make the script run in the context of the tenant/institution
SET search_path TO <tenant_mod_source_record_storage>;
Note: The DB administrator should replace <tenant_mod_source_record_storage> with the proper value. For example, on the bugfest environment <tenant_mod_source_record_storage> = fs09000000_mod_source_record_storage, and on the reference environment <tenant_mod_source_record_storage> = diku_mod_source_record_storage.
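For example, on the bugfest environment the first statement of the script would read as follows (a sketch; substitute your own tenant's schema name):

```sql
-- Run the rest of the script inside the tenant's SRS schema
SET search_path TO fs09000000_mod_source_record_storage;
SHOW search_path;  -- verify the schema is now active
```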
- Remove existing indexes to speed up the migration that follows
RAISE INFO 'Step1: Removing indexes %', NOW();
drop index if exists idx_marc_indexers_leader_marc_id;
for index in 0 .. 999 loop
    suffix = lpad(index::text, 3, '0');
    execute 'drop index if exists idx_marc_indexers_marc_id_' || suffix || ';';
end loop;
- Migrate records from the marc_records_lb table to the marc_indexers & marc_indexers_leader tables
The SQL copies MARC fields from the marc_records_lb table to the marc_indexers table, and MARC leaders from marc_records_lb to marc_indexers_leader. Both marc_indexers & marc_indexers_leader are involved in the search mechanism, so at this point the MARC records become searchable.
Note: If a MARC field of a record involved in the migration has a symbolic tag (cat, own, lkr, and so on), that field is skipped by the migration and remains stored only in marc_records_lb.
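The skipping happens via the numeric-tag filter in the migration query (field_no between '000' and '999'): under common collations a symbolic tag such as cat or own sorts after '999' and falls outside the range. A standalone sketch of the predicate:

```sql
-- Text-range comparison used by the migration to keep only numeric tags;
-- '001' and '245' pass, while 'cat', 'own', 'lkr' are skipped
select tag, tag between '000' and '999' as migrated
from (values ('001'), ('245'), ('cat'), ('own'), ('lkr')) t(tag);
```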
RAISE INFO 'Step2: Migration: %', NOW();
for rec in (select id, content from marc_records_lb x
            where id not in (select marc_id from marc_indexers)) loop
    IF (recordIndex % logFrequency = 0) THEN
        RAISE INFO 'RecordIndex: %', recordIndex;
    END IF;
    recordIndex := recordIndex + 1;
    insert into marc_indexers (field_no, ind1, ind2, subfield_no, value, marc_id)
    (with vals as (select value
                   from jsonb_array_elements((select value from jsonb_each(rec.content) x where key = 'fields')) y),
          fields as (select x.key as field_no, x.value as field_value
                     from vals, jsonb_each(vals.value) x),
          fields_subfields as (select field_no,
                                      trim(field_value ->> 'ind1'::text) ind1,
                                      trim(field_value ->> 'ind2') ind2,
                                      field_value -> 'subfields' subfields,
                                      field_value
                               from fields
                               where field_no between '000' and '999'),
          marc_raw as (select fs.field_no, fs.ind1, fs.ind2, fs.field_value,
                              null::text subfield_no, null::text subfield_value
                       from fields_subfields fs
                       where subfields is null
                       union all
                       select fs.field_no, fs.ind1, fs.ind2, fs.field_value,
                              subfs.key::text subfield_no, subfs.value::text subfield_value
                       from fields_subfields fs,
                            jsonb_array_elements(fs.subfields) sx,
                            jsonb_each(sx.value) subfs
                       where subfields is not null),
          marc as (select m.field_no,
                          CASE WHEN ind1 IS NULL or ind1 = '' THEN '#' ELSE ind1 END as ind1,
                          CASE WHEN ind2 IS NULL or ind2 = '' THEN '#' ELSE ind2 END as ind2,
                          CASE WHEN subfield_no IS NULL or trim(subfield_no) = '' THEN '0' ELSE subfield_no END as subfield_no,
                          trim(both '"' from coalesce(subfield_value, field_value::text)) as value
                   from marc_raw m)
     select distinct lower(field_no) field_no, ind1, ind2, subfield_no, value, rec.id marc_id
     from marc);
    insert into marc_indexers_leader(p_00_04, p_05, p_06, p_07, p_08, p_09, p_10, p_11, p_12_16,
                                     p_17, p_18, p_19, p_20, p_21, p_22, marc_id)
    (select substring(value from 1 for 5)  p_00_04,
            substring(value from 6 for 1)  p_05,
            substring(value from 7 for 1)  p_06,
            substring(value from 8 for 1)  p_07,
            substring(value from 9 for 1)  p_08,
            substring(value from 10 for 1) p_09,
            substring(value from 11 for 1) p_10,
            substring(value from 12 for 1) p_11,
            substring(value from 13 for 5) p_12_16,
            substring(value from 18 for 1) p_17,
            substring(value from 19 for 1) p_18,
            substring(value from 20 for 1) p_19,
            substring(value from 21 for 1) p_20,
            substring(value from 22 for 1) p_21,
            substring(value from 23 for 1) p_22,
            marc_id
     from (select replace(lower(trim(both '"' from value::text)), ' ', '#') as value,
                  rec.id marc_id
           from jsonb_each(rec.content) x
           where key = 'leader') y);
end loop;
- Restore/Create indexes
RAISE INFO 'Step3: Creating indexes %', NOW();
create index idx_marc_indexers_leader_marc_id on marc_indexers_leader (marc_id);
for index in 0 .. 999 loop
    suffix = lpad(index::text, 3, '0');
    execute 'create index idx_marc_indexers_marc_id_' || suffix || ' on marc_indexers_' || suffix || '(marc_id);';
end loop;
- Running
Source on gist: https://gist.github.com/Igor-Gorchakov/4d71e19620f47af9da45aa5cf22f578b
Source on github repo on SRS: <TODO put a link later>
Running from psql:
psql -h hostname -U folio -f fill-marc-indexes.sql
Execution process:
Starting:
Ending:
Time needed for complete execution: approximately 5 hours
Tested number of records: approximately 8 million records in marc_records_lb
As a result of the execution, the records from the marc_records_lb table are populated into marc_indexers & marc_indexers_leader.
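A quick way to confirm the migration completed is to compare record counts across the three tables (a sketch; indexed and leader counts should match the source count, allowing for records whose fields were all skipped as symbolic):

```sql
-- After migration, every source record should have index rows and a leader row
select (select count(*) from marc_records_lb)              as source_records,
       (select count(distinct marc_id) from marc_indexers) as indexed_records,
       (select count(*) from marc_indexers_leader)         as leader_records;
```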
Restrictions, limitations & edge cases
- If there is a need to import (using data-import) new MARC records that contain symbolic fields (cat, own, and so on) on Iris SRS and make such records available to the Search API, a hotfix is required: partitions for symbolic fields must be added to the search tables to make such fields searchable.
- It is not yet known exactly how adding/posting (using the REST API) new MARC records with symbolic fields will affect the Search API and data-import.
For the Search API, such records will not be available, because the database trigger that indexes the record for search will fail.
For data-import, this can probably fail the import job, because the DB trigger will try to put a MARC field into a non-existent partition and throw a runtime exception; this case needs to be double-checked.