Performance of getting source records and counting the total number of records
While working on bug MODSOURCE-85, the following queries were measured against a database containing 7.6 million records:
Query 1: a query to source_records_view with LIMIT 10 and a nested subquery that counts the total number of records.
EXPLAIN ANALYZE SELECT *, (SELECT COUNT(_id) FROM diku_mod_source_record_storage.records WHERE (lower(f_unaccent(diku_mod_source_record_storage.records.jsonb->>'deleted')) ~ lower(f_unaccent('(^|[[:punct:]]|[[:space:]]|(?=[[:punct:]]|[[:space:]]))false($|[[:punct:]]|[[:space:]]|(?<=[[:punct:]]|[[:space:]]))')))) AS totalRows FROM diku_mod_source_record_storage.source_records_view WHERE lower(f_unaccent(diku_mod_source_record_storage.source_records_view.jsonb->>'deleted')) ~ lower(f_unaccent('(^|[[:punct:]]|[[:space:]]|(?=[[:punct:]]|[[:space:]]))false($|[[:punct:]]|[[:space:]]|(?<=[[:punct:]]|[[:space:]]))')) LIMIT 10
Query plan"Limit (cost=2590083.17..2617567.28 rows=10 width=56) (actual time=224095.763..224106.902 rows=10 loops=1)" " InitPlan 1 (returns $0)" " -> Aggregate (cost=2590071.75..2590071.75 rows=1 width=8) (actual time=224093.579..224093.579 rows=1 loops=1)" " -> Seq Scan on records records_1 (cost=0.00..2590067.74 rows=1602 width=16) (actual time=0.315..220471.600 rows=7641917 loops=1)" " Filter: (lower(f_unaccent((jsonb ->> 'deleted'::text))) ~ '(^|[[:punct:]]|[[:space:]]|(?=[[:punct:]]|[[:space:]]))false($|[[:punct:]]|[[:space:]]|(?<=[[:punct:]]|[[:space:]]))'::text)" " -> Merge Left Join (cost=11.41..105016785.14 rows=38210 width=56) (actual time=224095.762..224106.892 rows=10 loops=1)" " Merge Cond: ((records.jsonb ->> 'parsedRecordId'::text) = (marc_records.jsonb ->> 'id'::text))" " Filter: (lower(f_unaccent((json_build_object('recordId', (records.jsonb ->> 'id'::text), 'snapshotId', (records.jsonb ->> 'snapshotId'::text), 'recordType', (records.jsonb ->> 'recordType'::text), 'deleted', (records.jsonb ->> 'deleted'::text), 'order', (records.jsonb ->> 'order'::text), 'additionalInfo', (records.jsonb -> 'additionalInfo'::text), 'metadata', (records.jsonb -> 'metadata'::text), 'rawRecord', raw_records.jsonb, 'parsedRecord', COALESCE(marc_records.jsonb)) ->> 'deleted'::text))) ~ '(^|[[:punct:]]|[[:space:]]|(?=[[:punct:]]|[[:space:]]))false($|[[:punct:]]|[[:space:]]|(?<=[[:punct:]]|[[:space:]]))'::text)" " -> Nested Loop (cost=1.12..77965435.14 rows=7641928 width=1808) (actual time=0.049..0.321 rows=10 loops=1)" " -> Index Scan using records_parsed_record_id_idx_btree on records (cost=0.56..24763758.44 rows=7641928 width=550) (actual time=0.016..0.038 rows=10 loops=1)" " Index Cond: ((jsonb ->> 'parsedRecordId'::text) IS NOT NULL)" " -> Index Scan using raw_records_id_idx_btree on raw_records (cost=0.56..6.95 rows=1 width=1258) (actual time=0.023..0.024 rows=1 loops=10)" " Index Cond: ((jsonb ->> 'id'::text) = (records.jsonb ->> 'rawRecordId'::text))" " -> Index Scan using marc_records_id_idx_btree on marc_records (cost=0.56..24740116.58 rows=7638396 width=587) (actual time=0.009..0.046 rows=10 loops=1)" "Planning time: 2.454 ms" "Execution time: 224107.039 ms"
Query 2: a query that counts the total number of records with the filter condition applied.
EXPLAIN ANALYZE SELECT COUNT(_id) FROM diku_mod_source_record_storage.records WHERE (lower(f_unaccent(diku_mod_source_record_storage.records.jsonb->>'deleted')) ~ lower(f_unaccent('(^|[[:punct:]]|[[:space:]]|(?=[[:punct:]]|[[:space:]]))false($|[[:punct:]]|[[:space:]]|(?<=[[:punct:]]|[[:space:]]))')))
Query plan"Aggregate (cost=2590071.75..2590071.75 rows=1 width=8) (actual time=210258.396..210258.396 rows=1 loops=1)" " -> Seq Scan on records (cost=0.00..2590067.74 rows=1602 width=16) (actual time=0.097..206949.371 rows=7641917 loops=1)" " Filter: (lower(f_unaccent((jsonb ->> 'deleted'::text))) ~ '(^|[[:punct:]]|[[:space:]]|(?=[[:punct:]]|[[:space:]]))false($|[[:punct:]]|[[:space:]]|(?<=[[:punct:]]|[[:space:]]))'::text)" "Planning time: 0.224 ms" "Execution time: 210258.430 ms"
Query 3: a query that counts the total number of records without any filter condition.
EXPLAIN ANALYZE SELECT COUNT(_id) FROM diku_mod_source_record_storage.records
Query plan"Finalize Aggregate (cost=586653.92..586653.93 rows=1 width=8) (actual time=17830.882..17830.883 rows=1 loops=1)" " -> Gather (cost=586653.71..586653.92 rows=2 width=8) (actual time=17830.601..17839.107 rows=3 loops=1)" " Workers Planned: 2" " Workers Launched: 2" " -> Partial Aggregate (cost=585653.71..585653.72 rows=1 width=8) (actual time=17817.788..17817.788 rows=1 loops=3)" " -> Parallel Seq Scan on records (cost=0.00..577693.37 rows=3184137 width=16) (actual time=0.464..17129.066 rows=2547306 loops=3)" "Planning time: 0.118 ms" "Execution time: 17839.161 ms"
Query 4: a query to source_records_view with LIMIT 10 and no total-count subquery.
EXPLAIN ANALYZE SELECT * FROM diku_mod_source_record_storage.source_records_view WHERE lower(f_unaccent(diku_mod_source_record_storage.source_records_view.jsonb->>'deleted')) ~ lower(f_unaccent('(^|[[:punct:]]|[[:space:]]|(?=[[:punct:]]|[[:space:]]))false($|[[:punct:]]|[[:space:]]|(?<=[[:punct:]]|[[:space:]]))')) LIMIT 10
Query plan"Limit (cost=11.41..27495.52 rows=10 width=48) (actual time=2.389..15.713 rows=10 loops=1)" " -> Merge Left Join (cost=11.41..105016785.14 rows=38210 width=48) (actual time=2.388..15.703 rows=10 loops=1)" " Merge Cond: ((records.jsonb ->> 'parsedRecordId'::text) = (marc_records.jsonb ->> 'id'::text))" " Filter: (lower(f_unaccent((json_build_object('recordId', (records.jsonb ->> 'id'::text), 'snapshotId', (records.jsonb ->> 'snapshotId'::text), 'recordType', (records.jsonb ->> 'recordType'::text), 'deleted', (records.jsonb ->> 'deleted'::text), 'order', (records.jsonb ->> 'order'::text), 'additionalInfo', (records.jsonb -> 'additionalInfo'::text), 'metadata', (records.jsonb -> 'metadata'::text), 'rawRecord', raw_records.jsonb, 'parsedRecord', COALESCE(marc_records.jsonb)) ->> 'deleted'::text))) ~ '(^|[[:punct:]]|[[:space:]]|(?=[[:punct:]]|[[:space:]]))false($|[[:punct:]]|[[:space:]]|(?<=[[:punct:]]|[[:space:]]))'::text)" " -> Nested Loop (cost=1.12..77965435.14 rows=7641928 width=1808) (actual time=0.099..0.932 rows=10 loops=1)" " -> Index Scan using records_parsed_record_id_idx_btree on records (cost=0.56..24763758.44 rows=7641928 width=550) (actual time=0.012..0.107 rows=10 loops=1)" " Index Cond: ((jsonb ->> 'parsedRecordId'::text) IS NOT NULL)" " -> Index Scan using raw_records_id_idx_btree on raw_records (cost=0.56..6.95 rows=1 width=1258) (actual time=0.076..0.077 rows=1 loops=10)" " Index Cond: ((jsonb ->> 'id'::text) = (records.jsonb ->> 'rawRecordId'::text))" " -> Index Scan using marc_records_id_idx_btree on marc_records (cost=0.56..24740116.58 rows=7638396 width=587) (actual time=0.009..0.155 rows=10 loops=1)" "Planning time: 3.728 ms" "Execution time: 15.810 ms"
Summary of query execution times:
| Query number | Execution time (ms) |
| --- | --- |
| 1 | 224107 |
| 2 | 210258 |
| 3 | 17839 |
| 4 | 15 |

Comparing queries 1 and 4 shows that almost all of the execution time is spent in the nested total-count subquery with the filter condition (measured in isolation as query 2), not in fetching the limited page of source records itself.