SPIKE: investigate options for improved OAI-PMH harvesting
- Oleksandr Bozhko
One possible improvement for OAI-PMH harvesting is to eliminate the current two-step process of downloading instance ids and then retrieving the instances by those ids. A more promising approach is to use a direct view to retrieve instances from Inventory/SRS. In this case, the views should return records ordered by instance id, which allows a full harvest to be performed as a sequence of requests, each one starting from the id that follows the last id returned by the previous request.
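The request loop described above can be sketched as keyset pagination. The fetch_page callable below is a stand-in for the actual database call to the view; its name and shape are illustrative, not part of the proposed view.

```python
# Sketch of the keyset-pagination harvest loop: each request asks for
# records whose id follows the last id seen so far. fetch_page(last_id, n)
# is a hypothetical wrapper around the database view.

def harvest_all(fetch_page, page_size=50):
    """Drain a keyset-paginated source into a single list."""
    records = []
    last_id = None  # NULL on the very first request
    while True:
        page = fetch_page(last_id, page_size)
        if not page:
            break
        records.extend(page)
        last_id = page[-1]["id"]  # resume after the last returned id
        if len(page) < page_size:
            break  # a short page means the source is exhausted
    return records
```

Because the view returns rows ordered by instance id, no offset bookkeeping is needed and pages stay stable across requests.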
The following two parameters can be added to the function to track the last instance id and the number of records to return:
CREATE OR REPLACE FUNCTION get_updated_instance_ids_view(
    startdate timestamp with time zone,
    ...
    lastinstanceid uuid DEFAULT NULL::uuid,
    numrecords integer DEFAULT 50)
The returned table may also contain the instance jsonb in the first column instead of the id:
RETURNS TABLE( instance jsonb, ...
In the last two SELECT queries, combined by UNION ALL, an additional condition can be introduced to check whether the lastinstanceid parameter ($7) is not null and, if it is set, to select only the requested number of instances ($8) with ids after $7. In this case, all instances must be ordered by id:
SELECT jsonb AS instance, ...
FROM instanceIdsInRange, instance
WHERE instanceIdsInRange.maxDate BETWEEN dateOrMin($1) AND dateOrMax($2)
  ...
  AND ($7 IS NULL OR instance.id > $7)
GROUP BY 1, 2, 4, instance.id
UNION ALL
SELECT (jsonb #>> '{record}')::jsonb AS instance, ...
FROM audit_instance
WHERE $3 ...
  AND ($7 IS NULL OR (jsonb #>> '{record,id}')::uuid > $7)
ORDER BY instance.id, (jsonb #>> '{record,id}')::uuid
LIMIT $8
Since the instances are retrieved from two different tables (the second table contains the deleted ones), the LIMIT value is applied only once, after the two selections are combined.
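The semantics of "UNION ALL ... ORDER BY id LIMIT n" over the two sources can be modelled in miniature: the live and deleted rows are concatenated, filtered by the keyset condition, sorted by id, and only then truncated, so a single page may mix rows from both tables. A small Python model of that behaviour (table contents and ids are made up for illustration):

```python
# Miniature model of the combined query: keyset filter on both branches,
# union of the branches, ORDER BY id over the union, LIMIT applied once.

def page_after(live, deleted, last_id, limit):
    """Return one page of the union of live and deleted records."""
    combined = [r for r in live + deleted
                if last_id is None or r["id"] > last_id]
    combined.sort(key=lambda r: r["id"])  # ORDER BY id over the whole union
    return combined[:limit]               # LIMIT applied once, at the end
```

Limiting each branch separately would only be safe as an optimization if the union were still sorted and limited afterwards.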
The next function can be helpful for a full harvest since it does not require dates:
CREATE OR REPLACE FUNCTION get_instance_view(
    deletedrecordssupport boolean DEFAULT TRUE,
    skipsuppressedfromdiscoveryrecords boolean DEFAULT TRUE,
    source varchar DEFAULT NULL,
    lastinstanceid uuid DEFAULT NULL::uuid,
    numrecords integer DEFAULT 50)
RETURNS TABLE(
    instance jsonb,
    source character varying,
    "updatedDate" timestamp with time zone,
    "suppressFromDiscovery" boolean,
    deleted boolean)
LANGUAGE 'sql'
COST 100 VOLATILE PARALLEL UNSAFE ROWS 1000
AS $BODY$
SELECT instance.jsonb AS instance,
       instance.jsonb ->> 'source' AS source,
       strToTimestamp(instance.jsonb -> 'metadata' ->> 'createdDate') AS maxDate,
       (instance.jsonb ->> 'discoverySuppress')::bool AS suppressFromDiscovery,
       false AS deleted
FROM instance
WHERE NOT ($2 AND COALESCE((instance.jsonb ->> 'discoverySuppress')::bool, false))
  AND ($4 IS NULL OR instance.id > $4)
  AND ($3 IS NULL OR (instance.jsonb ->> 'source') = $3)
GROUP BY instance.id
UNION ALL
SELECT (jsonb #>> '{record}')::jsonb AS instance,
       jsonb #>> '{record,source}' AS source,
       strToTimestamp(jsonb ->> 'createdDate') AS maxDate,
       false AS suppressFromDiscovery,
       true AS deleted
FROM audit_instance
WHERE $1
  AND ($4 IS NULL OR (jsonb #>> '{record,id}')::uuid > $4)
  AND ($3 IS NULL OR (jsonb #>> '{record,source}') = $3)
ORDER BY instance.id, (jsonb #>> '{record,id}')::uuid
LIMIT $5
$BODY$;
The function above returns instances only in json format, even if the source is MARC; in other words, there is no MARC in the response. A more appropriate solution here is probably to remove source as a parameter and return only records with source FOLIO:
CREATE OR REPLACE FUNCTION get_instances_with_folio_view(
    deletedrecordssupport boolean DEFAULT TRUE,
    skipsuppressedfromdiscoveryrecords boolean DEFAULT TRUE,
    lastinstanceid uuid DEFAULT NULL::uuid,
    numrecords integer DEFAULT 50)
RETURNS TABLE(
    instance jsonb,
    source character varying,
    "updatedDate" timestamp with time zone,
    "suppressFromDiscovery" boolean,
    deleted boolean)
LANGUAGE 'sql'
COST 100 VOLATILE PARALLEL UNSAFE ROWS 1000
AS $BODY$
SELECT instance.jsonb AS instance,
       'FOLIO' AS source,
       strToTimestamp(instance.jsonb -> 'metadata' ->> 'createdDate') AS maxDate,
       (instance.jsonb ->> 'discoverySuppress')::bool AS suppressFromDiscovery,
       false AS deleted
FROM instance
WHERE NOT ($2 AND COALESCE((instance.jsonb ->> 'discoverySuppress')::bool, false))
  AND ($3 IS NULL OR instance.id > $3)
  AND ((instance.jsonb ->> 'source') = 'FOLIO')
UNION ALL
SELECT (jsonb #>> '{record}')::jsonb AS instance,
       'FOLIO' AS source,
       strToTimestamp(jsonb ->> 'createdDate') AS maxDate,
       false AS suppressFromDiscovery,
       true AS deleted
FROM audit_instance
WHERE $1
  AND ($3 IS NULL OR (jsonb #>> '{record,id}')::uuid > $3)
  AND ((jsonb #>> '{record,source}') = 'FOLIO')
ORDER BY instance.id, (jsonb #>> '{record,id}')::uuid
LIMIT $4
$BODY$;
To harvest only MARC records, the following function can be applied:
CREATE OR REPLACE FUNCTION get_instances_with_marc_view(
    deletedrecordssupport boolean DEFAULT TRUE,
    skipsuppressedfromdiscoveryrecords boolean DEFAULT TRUE,
    lastinstanceid uuid DEFAULT NULL::uuid,
    numrecords integer DEFAULT 50)
RETURNS TABLE(
    instanceId uuid,
    marc jsonb,
    source character varying,
    "updatedDate" timestamp with time zone,
    "suppressFromDiscovery" boolean,
    deleted boolean)
LANGUAGE 'sql'
COST 100 VOLATILE PARALLEL UNSAFE ROWS 1000
AS $BODY$
SELECT lb.external_id AS instanceId,
       marc.content AS marc,
       'MARC' AS source,
       lb.created_date AS maxDate,
       lb.suppress_discovery::bool AS suppressFromDiscovery,
       false AS deleted
FROM marc_records_lb marc
JOIN records_lb lb ON lb.id = marc.id
WHERE NOT ($2 AND COALESCE(lb.suppress_discovery::bool, false))
  AND ($3 IS NULL OR lb.external_id > $3)
UNION ALL
SELECT (jsonb #>> '{record,id}')::uuid AS instanceId,
       (jsonb #>> '{record}')::jsonb AS marc,
       'MARC' AS source,
       strToTimestamp(jsonb ->> 'createdDate') AS maxDate,
       false AS suppressFromDiscovery,
       true AS deleted
FROM audit_instance
WHERE $1
  AND ($3 IS NULL OR (jsonb #>> '{record,id}')::uuid > $3)
  AND (jsonb #>> '{record,source}') = 'MARC'
ORDER BY instanceId
LIMIT $4
$BODY$;
However, if deleted records come from audit_instance, the marc column will contain only the instance json rather than a MARC json record, so the MARC record would have to be generated on the fly.
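The on-the-fly generation mentioned above could be handled on the harvester side: when a deleted row carries instance json instead of MARC json, build a minimal surrogate record. This is a hedged sketch only; the field mapping below (hrid to 001, title to 245$a) and the record shape are purely illustrative, not a proposed standard.

```python
# Illustrative handling of mixed payloads: live rows carry MARC json from
# marc_records_lb, deleted rows carry instance json from audit_instance.
# The surrogate MARC structure generated for deleted rows is hypothetical.

def to_marc_json(record):
    """Return MARC json, generating a minimal surrogate for deleted rows."""
    payload = record["marc"]
    if not record["deleted"]:
        return payload  # already MARC json
    # deleted rows hold instance json; build a minimal MARC-like structure
    return {
        "leader": "00000dam a2200000 a 4500",
        "fields": [
            {"001": payload.get("hrid", "")},
            {"245": {"ind1": "0", "ind2": "0",
                     "subfields": [{"a": payload.get("title", "")}]}},
        ],
    }
```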
The following function combines MARC and FOLIO sources and returns all records:
CREATE OR REPLACE FUNCTION get_instances_with_folio_and_marc_view(
    deletedrecordssupport boolean DEFAULT TRUE,
    skipsuppressedfromdiscoveryrecords boolean DEFAULT TRUE,
    lastinstanceid uuid DEFAULT NULL::uuid,
    numrecords integer DEFAULT 50)
RETURNS TABLE(
    instance jsonb,
    source character varying,
    "updatedDate" timestamp with time zone,
    "suppressFromDiscovery" boolean,
    deleted boolean)
LANGUAGE 'sql'
COST 100 VOLATILE PARALLEL UNSAFE ROWS 1000
AS $BODY$
SELECT instance.jsonb AS instance,
       'FOLIO' AS source,
       strToTimestamp(instance.jsonb -> 'metadata' ->> 'createdDate') AS maxDate,
       (instance.jsonb ->> 'discoverySuppress')::bool AS suppressFromDiscovery,
       false AS deleted
FROM instance
WHERE NOT ($2 AND COALESCE((instance.jsonb ->> 'discoverySuppress')::bool, false))
  AND ($3 IS NULL OR instance.id > $3)
  AND ((instance.jsonb ->> 'source') = 'FOLIO')
UNION ALL
SELECT marc.content AS instance,
       'MARC' AS source,
       lb.created_date AS maxDate,
       lb.suppress_discovery::bool AS suppressFromDiscovery,
       false AS deleted
FROM marc_records_lb marc
JOIN records_lb lb ON lb.id = marc.id
WHERE NOT ($2 AND COALESCE(lb.suppress_discovery::bool, false))
  AND ($3 IS NULL OR lb.external_id > $3)
UNION ALL
SELECT (jsonb #>> '{record}')::jsonb AS instance,
       jsonb #>> '{record,source}' AS source,
       strToTimestamp(jsonb ->> 'createdDate') AS maxDate,
       false AS suppressFromDiscovery,
       true AS deleted
FROM audit_instance
WHERE $1
  AND ($3 IS NULL OR (jsonb #>> '{record,id}')::uuid > $3)
  AND ((jsonb #>> '{record,source}') = 'FOLIO')
ORDER BY instance.id, lb.external_id, (jsonb #>> '{record,id}')::uuid
LIMIT $4
$BODY$;
All the approaches above require the returned records to be ordered by id so that the full harvest can issue multiple requests in a row, but combining the tables with UNION ALL prevents us from utilizing this ordering effectively.
A more efficient solution is to unify the view so that it returns MARC data (when the record source is SRS, or SRS + Inventory) instead of the instance jsonb. This approach spares us from requesting SRS separately for MARC records.
To achieve that, the SRS tables containing MARC content also need to be joined:
Based on the ERD above, we can omit UNION ALL and use only LEFT JOINs to select all instances from the instance table, attaching deleted instances from the audit table with FULL JOIN ON false:
CREATE OR REPLACE FUNCTION get_instances_view(
    startdate timestamp with time zone,
    enddate timestamp with time zone,
    deletedrecordsupport boolean DEFAULT true,
    skipsuppressedfromdiscoveryrecords boolean DEFAULT true,
    onlyinstanceupdatedate boolean DEFAULT true,
    "source" character varying DEFAULT NULL::character varying,
    -- id of the instance the result should start after (the last instance id
    -- from the previous result, or null for the first request)
    nextinstanceid uuid DEFAULT NULL,
    "limit" integer DEFAULT 100)
RETURNS TABLE(
    "instanceId" uuid,
    -- can be either the jsonb of the instance or marc content, depending on the source
    "instanceRecord" jsonb,
    "source" character varying,
    "updatedDate" timestamp with time zone,
    "suppressFromDiscovery" boolean,
    "deleted" boolean)
LANGUAGE 'sql'
VOLATILE PARALLEL UNSAFE ROWS 1000
AS $$
SELECT DISTINCT
       -- if the instance is deleted, take the id from audit, otherwise from instances
       CASE WHEN instance_record.id IS NOT NULL
            THEN instance_record.id
            ELSE (audit_instance_record.jsonb #>> '{record,id}')::uuid END,
       -- return the jsonb of marc content if present
       CASE WHEN marc_record.content IS NOT NULL
            THEN marc_record.content
            -- otherwise the jsonb of the instance if present, or from audit
            ELSE (CASE WHEN instance_record.jsonb IS NOT NULL
                       THEN instance_record.jsonb
                       ELSE (audit_instance_record.jsonb -> 'record')::jsonb END) END,
       -- return source from instances if not deleted, from audit if deleted
       CASE WHEN instance_record.jsonb IS NOT NULL
            THEN instance_record.jsonb ->> 'source'
            ELSE audit_instance_record.jsonb #>> '{record,source}' END,
       -- if the instance is deleted, use the audit table to find the created date
       -- (created date in the audit table means the last updated date for an
       -- instance that was deleted)
       CASE WHEN instance_record.jsonb IS NOT NULL
            THEN strToTimestamp(instance_record.jsonb -> 'metadata' ->> 'updatedDate')
            ELSE strToTimestamp(audit_instance_record.jsonb ->> 'createdDate') END,
       -- first, check SRS for discovery suppress; if not in SRS, try to find it in
       -- inventory, and if not there either, the instance is not suppressed from
       -- discovery, hence false
       CASE WHEN marc_record.content IS NOT NULL
            THEN record_lb.suppress_discovery
            ELSE COALESCE((instance_record.jsonb ->> 'discoverySuppress')::bool, false) END,
       -- true if the current id is present in the audit table (means it was deleted)
       audit_instance_record.id IS NOT NULL
FROM instance instance_record
LEFT JOIN holdings_record holdings_record
       ON instance_record.id = holdings_record.instanceid
LEFT JOIN item item_record
       ON holdings_record.id = item_record.holdingsrecordid
-- add marc records
LEFT JOIN records_lb record_lb
       ON record_lb.external_id = instance_record.id
LEFT JOIN marc_records_lb marc_record
       ON marc_record.id = record_lb.id
-- add audit tables: just add all records from the audit_instance table because an
-- instance can be either in the audit table or the instance table, but not in both
FULL JOIN audit_instance audit_instance_record ON false
-- case when the instance is not deleted, but the holding is deleted
LEFT JOIN audit_holdings_record audit_holdings_record
       ON (audit_holdings_record.jsonb #>> '{record,instanceId}')::uuid = instance_record.id
-- case when the instance is deleted and the holding is deleted
LEFT JOIN audit_holdings_record audit_holdings_record_deleted
       ON (audit_holdings_record_deleted.jsonb #>> '{record,instanceId}')::uuid = audit_instance_record.id
-- case when the holding is deleted and the item is deleted
LEFT JOIN audit_item audit_item_record
       ON (audit_item_record.jsonb #>> '{record,holdingsRecordId}')::uuid = audit_holdings_record.id
-- case when only the item is deleted (the holding is still in inventory)
LEFT JOIN audit_item audit_item_record_deleted
       ON (audit_item_record_deleted.jsonb #>> '{record,holdingsRecordId}')::uuid = holdings_record.id
WHERE (
       -- apply the date range to instances in inventory
       strToTimestamp(instance_record.jsonb -> 'metadata' ->> 'updatedDate')
         BETWEEN dateOrMin($1) AND dateOrMax($2)
       OR (
         -- case when there are deleted instances: check deletedrecordsupport and the date range
         $3 AND strToTimestamp(audit_instance_record.jsonb ->> 'createdDate')
           BETWEEN dateOrMin($1) AND dateOrMax($2)
         -- add related items and holdings: check only the instance updated date and,
         -- if false, proceed with items and holdings
         OR NOT $5
           AND (strToTimestamp(holdings_record.jsonb -> 'metadata' ->> 'updatedDate')
                  BETWEEN dateOrMin($1) AND dateOrMax($2)
                OR strToTimestamp(item_record.jsonb -> 'metadata' ->> 'updatedDate')
                  BETWEEN dateOrMin($1) AND dateOrMax($2)
                -- add related deleted records
                OR strToTimestamp(audit_holdings_record.jsonb ->> 'createdDate')
                  BETWEEN dateOrMin($1) AND dateOrMax($2)
                OR strToTimestamp(audit_holdings_record_deleted.jsonb ->> 'createdDate')
                  BETWEEN dateOrMin($1) AND dateOrMax($2)
                OR strToTimestamp(audit_item_record.jsonb ->> 'createdDate')
                  BETWEEN dateOrMin($1) AND dateOrMax($2)
                OR strToTimestamp(audit_item_record_deleted.jsonb ->> 'createdDate')
                  BETWEEN dateOrMin($1) AND dateOrMax($2))
       ))
  -- mandatory condition to check the next instance id (can be null)
  AND ($7 IS NULL OR
       (CASE WHEN instance_record.id IS NOT NULL
             THEN instance_record.id
             ELSE (audit_instance_record.jsonb #>> '{record,id}')::uuid END) > $7)
  -- mandatory condition to check the source (can be null)
  AND ($6 IS NULL OR
       (CASE WHEN instance_record.jsonb IS NOT NULL
             THEN instance_record.jsonb ->> 'source'
             ELSE audit_instance_record.jsonb #>> '{record,source}' END) = $6)
  -- mandatory condition to check discovery suppress (can be null in inventory)
  AND NOT ($4 AND
       CASE WHEN marc_record.content IS NOT NULL
            THEN record_lb.suppress_discovery
            ELSE COALESCE((instance_record.jsonb ->> 'discoverySuppress')::bool, false) END)
ORDER BY (CASE WHEN instance_record.id IS NOT NULL
               THEN instance_record.id
               ELSE (audit_instance_record.jsonb #>> '{record,id}')::uuid END)
LIMIT $8
$$
As the view above shows, if an instance with source MARC is deleted via the inventory API, its MARC content is still present in the marc_records_lb table, so it would be possible to retrieve it using the record id from audit_instance; however, this case is not supported yet and can be considered an extension of the current view.
Query plan for a full harvest without a date range, with LIMIT 500 and 500k instances:
"QUERY PLAN" "Limit (cost=24187557.85..24187566.60 rows=500 width=90) (actual time=77055.632..77057.034 rows=500 loops=1)" " -> Unique (cost=24187557.85..24270317.46 rows=4729121 width=90) (actual time=77055.630..77056.939 rows=500 loops=1)" " -> Sort (cost=24187557.85..24199380.65 rows=4729121 width=90) (actual time=77055.628..77056.324 rows=500 loops=1)" " Sort Key: (CASE WHEN (instance_record.id IS NOT NULL) THEN instance_record.id ELSE ((audit_instance_record.jsonb #>> '{record,id}'::text[]))::uuid END), (CASE WHEN (marc_record.content IS NOT NULL) THEN marc_record.content ELSE CASE WHEN (instance_record.jsonb IS NOT NULL) THEN instance_record.jsonb ELSE (audit_instance_record.jsonb -> 'record'::text) END END), (CASE WHEN (instance_record.jsonb IS NOT NULL) THEN (instance_record.jsonb ->> 'source'::text) ELSE (audit_instance_record.jsonb #>> '{record,source}'::text[]) END), (CASE WHEN (instance_record.jsonb IS NOT NULL) THEN diku_mod_inventory_storage.strtotimestamp(((instance_record.jsonb -> 'metadata'::text) ->> 'updatedDate'::text)) ELSE diku_mod_inventory_storage.strtotimestamp((audit_instance_record.jsonb ->> 'createdDate'::text)) END), (CASE WHEN (marc_record.content IS NOT NULL) THEN record_lb.suppress_discovery ELSE COALESCE(((instance_record.jsonb ->> 'discoverySuppress'::text))::boolean, false) END), ((audit_instance_record.id IS NOT NULL))" " Sort Method: external merge Disk: 367488kB" " -> Hash Left Join (cost=118536.32..23095205.52 rows=4729121 width=90) (actual time=2584.668..73885.868 rows=282512 loops=1)" " Hash Cond: (audit_instance_record.id = ((audit_holdings_record_deleted.jsonb #>> '{record,instanceId}'::text[]))::uuid)" " Filter: (((diku_mod_inventory_storage.strtotimestamp(((instance_record.jsonb -> 'metadata'::text) ->> 'updatedDate'::text)) >= '1970-01-01 00:00:00+00'::timestamp with time zone) AND (diku_mod_inventory_storage.strtotimestamp(((instance_record.jsonb -> 'metadata'::text) ->> 'updatedDate'::text)) <= '2050-01-01 
00:00:00+00'::timestamp with time zone)) OR ((diku_mod_inventory_storage.strtotimestamp((audit_instance_record.jsonb ->> 'createdDate'::text)) >= '1970-01-01 00:00:00+00'::timestamp with time zone) AND (diku_mod_inventory_storage.strtotimestamp((audit_instance_record.jsonb ->> 'createdDate'::text)) <= '2050-01-01 00:00:00+00'::timestamp with time zone)) OR ((diku_mod_inventory_storage.strtotimestamp(((holdings_record.jsonb -> 'metadata'::text) ->> 'updatedDate'::text)) >= '1970-01-01 00:00:00+00'::timestamp with time zone) AND (diku_mod_inventory_storage.strtotimestamp(((holdings_record.jsonb -> 'metadata'::text) ->> 'updatedDate'::text)) <= '2050-01-01 00:00:00+00'::timestamp with time zone)) OR ((diku_mod_inventory_storage.strtotimestamp(((item_record.jsonb -> 'metadata'::text) ->> 'updatedDate'::text)) >= '1970-01-01 00:00:00+00'::timestamp with time zone) AND (diku_mod_inventory_storage.strtotimestamp(((item_record.jsonb -> 'metadata'::text) ->> 'updatedDate'::text)) <= '2050-01-01 00:00:00+00'::timestamp with time zone)) OR ((diku_mod_inventory_storage.strtotimestamp((audit_holdings_record.jsonb ->> 'createdDate'::text)) >= '1970-01-01 00:00:00+00'::timestamp with time zone) AND (diku_mod_inventory_storage.strtotimestamp((audit_holdings_record.jsonb ->> 'createdDate'::text)) <= '2050-01-01 00:00:00+00'::timestamp with time zone)) OR ((diku_mod_inventory_storage.strtotimestamp((audit_holdings_record_deleted.jsonb ->> 'createdDate'::text)) >= '1970-01-01 00:00:00+00'::timestamp with time zone) AND (diku_mod_inventory_storage.strtotimestamp((audit_holdings_record_deleted.jsonb ->> 'createdDate'::text)) <= '2050-01-01 00:00:00+00'::timestamp with time zone)) OR ((diku_mod_inventory_storage.strtotimestamp((audit_item_record.jsonb ->> 'createdDate'::text)) >= '1970-01-01 00:00:00+00'::timestamp with time zone) AND (diku_mod_inventory_storage.strtotimestamp((audit_item_record.jsonb ->> 'createdDate'::text)) <= '2050-01-01 00:00:00+00'::timestamp with time zone)) OR 
((diku_mod_inventory_storage.strtotimestamp((audit_item_record_deleted.jsonb ->> 'createdDate'::text)) >= '1970-01-01 00:00:00+00'::timestamp with time zone) AND (diku_mod_inventory_storage.strtotimestamp((audit_item_record_deleted.jsonb ->> 'createdDate'::text)) <= '2050-01-01 00:00:00+00'::timestamp with time zone)))" " -> Hash Left Join (cost=118502.25..700569.46 rows=883948 width=3783) (actual time=2583.902..6956.745 rows=282512 loops=1)" " Hash Cond: (holdings_record.id = ((audit_item_record_deleted.jsonb #>> '{record,holdingsRecordId}'::text[]))::uuid)" " -> Hash Left Join (cost=118468.17..528289.45 rows=304012 width=3767) (actual time=2583.873..6758.927 rows=282512 loops=1)" " Hash Cond: (audit_holdings_record.id = ((audit_item_record.jsonb #>> '{record,holdingsRecordId}'::text[]))::uuid)" " -> Hash Left Join (cost=118434.10..361808.81 rows=304012 width=3751) (actual time=2583.835..6552.516 rows=282512 loops=1)" " Hash Cond: (instance_record.id = ((audit_holdings_record.jsonb #>> '{record,instanceId}'::text[]))::uuid)" " -> Merge Full Join (cost=118400.02..198357.43 rows=304012 width=3703) (actual time=2583.817..6297.398 rows=282512 loops=1)" " Join Filter: false" " -> Hash Left Join (cost=118400.02..194556.27 rows=304012 width=3655) (actual time=2583.797..6086.891 rows=282511 loops=1)" " Hash Cond: (instance_record.id = holdings_record.instanceid)" " -> Merge Right Join (cost=118261.36..192539.27 rows=299937 width=1511) (actual time=2581.813..5684.808 rows=282471 loops=1)" " Merge Cond: (marc_record.id = record_lb.id)" " -> Index Scan using marc_records_lb_pkey on marc_records_lb marc_record (cost=0.42..68665.84 rows=284128 width=1341) (actual time=0.020..1309.478 rows=283700 loops=1)" " -> Materialize (cost=117914.21..119413.90 rows=299937 width=202) (actual time=2581.783..3406.530 rows=282471 loops=1)" " -> Sort (cost=117914.21..118664.06 rows=299937 width=202) (actual time=2581.779..3111.661 rows=282471 loops=1)" " Sort Key: record_lb.id" " Sort Method: 
external merge Disk: 59912kB" " -> Merge Left Join (cost=0.84..38384.25 rows=299937 width=202) (actual time=0.030..1729.037 rows=282471 loops=1)" " Merge Cond: (instance_record.id = record_lb.external_id)" " -> Index Scan using instance_pkey on instance instance_record (cost=0.42..20872.97 rows=299937 width=185) (actual time=0.013..624.993 rows=282445 loops=1)" " -> Index Scan using idx_records_external_id on records_lb record_lb (cost=0.42..14269.29 rows=283705 width=33) (actual time=0.013..594.909 rows=274418 loops=1)" " -> Hash (cost=134.00..134.00 rows=373 width=2160) (actual time=1.965..1.968 rows=378 loops=1)" " Buckets: 1024 Batches: 1 Memory Usage: 805kB" " -> Hash Right Join (cost=68.28..134.00 rows=373 width=2160) (actual time=0.732..1.460 rows=378 loops=1)" " Hash Cond: (item_record.holdingsrecordid = holdings_record.id)" " -> Seq Scan on item item_record (cost=0.00..64.73 rows=373 width=1115) (actual time=0.006..0.276 rows=364 loops=1)" " -> Hash (cost=63.68..63.68 rows=368 width=1061) (actual time=0.714..0.715 rows=369 loops=1)" " Buckets: 1024 Batches: 1 Memory Usage: 402kB" " -> Seq Scan on holdings_record (cost=0.00..63.68 rows=368 width=1061) (actual time=0.014..0.343 rows=369 loops=1)" " -> Materialize (cost=0.00..1.01 rows=1 width=48) (actual time=0.015..0.018 rows=1 loops=1)" " -> Seq Scan on audit_instance audit_instance_record (cost=0.00..1.01 rows=1 width=48) (actual time=0.010..0.012 rows=1 loops=1)" " -> Hash (cost=20.70..20.70 rows=1070 width=48) (actual time=0.003..0.004 rows=0 loops=1)" " Buckets: 2048 Batches: 1 Memory Usage: 16kB" " -> Seq Scan on audit_holdings_record (cost=0.00..20.70 rows=1070 width=48) (actual time=0.003..0.003 rows=0 loops=1)" " -> Hash (cost=20.70..20.70 rows=1070 width=32) (actual time=0.030..0.031 rows=3 loops=1)" " Buckets: 2048 Batches: 1 Memory Usage: 20kB" " -> Seq Scan on audit_item audit_item_record (cost=0.00..20.70 rows=1070 width=32) (actual time=0.007..0.009 rows=3 loops=1)" " -> Hash 
(cost=20.70..20.70 rows=1070 width=32) (actual time=0.020..0.021 rows=3 loops=1)" " Buckets: 2048 Batches: 1 Memory Usage: 20kB" " -> Seq Scan on audit_item audit_item_record_deleted (cost=0.00..20.70 rows=1070 width=32) (actual time=0.005..0.007 rows=3 loops=1)" " -> Hash (cost=20.70..20.70 rows=1070 width=32) (actual time=0.003..0.003 rows=0 loops=1)" " Buckets: 2048 Batches: 1 Memory Usage: 16kB" " -> Seq Scan on audit_holdings_record audit_holdings_record_deleted (cost=0.00..20.70 rows=1070 width=32) (actual time=0.002..0.002 rows=0 loops=1)" "Planning Time: 6.777 ms" "Execution Time: 77154.775 ms"
However, this approach takes too much time for 10 million records and needs to be revised.
The investigation will be continued in a follow-up spike.