[FOLIO-1920] SPIKE: optimize query instance by location Created: 28/Mar/19  Updated: 03/Jun/20  Resolved: 04/Apr/19

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Task Priority: P3
Reporter: Hongwei Ji Assignee: Nassib Nassar
Resolution: Done Votes: 0
Labels: platform-backlog
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Attachments: File postgresql.conf    
Issue links:
Blocks
blocks MODINVSTOR-256 Improve performance of /instance-stor... Closed
Relates
relates to CQLPG-95 SPIKE: design foreign key search support Closed
relates to MODINVSTOR-185 Analyze filtering by location Closed
relates to MODINVSTOR-273 SPIKE: analyze instance storage API l... Closed
Sprint: Core: Platform - Sprint 60
Story Points: 5
Development Team: Core: Platform

 Description   

When query instance by location, following query is generated but the performance is very poor. Query explain analyze shows about 50 seconds. Find a way to improve existing query, or come up a different and better query, or propose other solution like re-organize how instance/holdings data are stored. BTW, the query is generated when users click the Inventory app and select a location filter. Note, I adjusted the ~ to LIKE for the location condition, otherwise, it takes about 8 minutes.

Note: No implementation in the scope of this ticket.

WITH headrecords AS 
(
  SELECT
    jsonb,
    lower(f_unaccent(jsonb ->> 'title')) AS title 
  FROM
    supertenant_mod_inventory_storage.instance_holding_view 
  WHERE
    (
(((lower(f_unaccent(instance_holding_view.jsonb ->> 'title')) ~ lower(f_unaccent('(^|[[:punct:]]|[[:space:]]|(?=[[:punct:]]|[[:space:]])).*($|[[:punct:]]|[[:space:]]|(?<=[[:punct:]]|[[:space:]]))'))) 
      OR 
      (
        lower(f_unaccent(instance_holding_view.jsonb ->> 'contributors')) ~ lower(f_unaccent('(^|[[:punct:]]|[[:space:]]|(?=[[:punct:]]|[[:space:]]))"name":([[:punct:]]|[[:space:]])+".*"($|[[:punct:]]|[[:space:]]|(?<=[[:punct:]]|[[:space:]]))')) 
      )
) 
      OR 
      (
        lower(f_unaccent(instance_holding_view.jsonb ->> 'identifiers')) ~ lower(f_unaccent('(^|[[:punct:]]|[[:space:]]|(?=[[:punct:]]|[[:space:]]))"value":([[:punct:]]|[[:space:]])+".*"($|[[:punct:]]|[[:space:]]|(?<=[[:punct:]]|[[:space:]]))')) 
      )
) 
      AND 
      (
        lower(f_unaccent(instance_holding_view.ho_jsonb ->> 'permanentLocationId')) LIKE lower(f_unaccent('53cf956f-c1df-410b-8bea-27f712cca7c0')) 
      )
    )
    AND lower(f_unaccent(jsonb ->> 'title')) < ( 
    SELECT
      lower(f_unaccent(jsonb ->> 'title')) 
    FROM
      supertenant_mod_inventory_storage.instance_holding_view 
    ORDER BY
      lower(f_unaccent(jsonb ->> 'title')) OFFSET 10000 LIMIT 1 ) 
    ORDER BY
      lower(f_unaccent(jsonb ->> 'title')) LIMIT 30 OFFSET 0 
)
,
allrecords AS 
(
  SELECT
    jsonb,
    lower(f_unaccent(jsonb ->> 'title')) AS title 
  FROM
    supertenant_mod_inventory_storage.instance_holding_view 
  WHERE
    (
(((lower(f_unaccent(instance_holding_view.jsonb ->> 'title')) ~ lower(f_unaccent('(^|[[:punct:]]|[[:space:]]|(?=[[:punct:]]|[[:space:]])).*($|[[:punct:]]|[[:space:]]|(?<=[[:punct:]]|[[:space:]]))'))) 
      OR 
      (
        lower(f_unaccent(instance_holding_view.jsonb ->> 'contributors')) ~ lower(f_unaccent('(^|[[:punct:]]|[[:space:]]|(?=[[:punct:]]|[[:space:]]))"name":([[:punct:]]|[[:space:]])+".*"($|[[:punct:]]|[[:space:]]|(?<=[[:punct:]]|[[:space:]]))')) 
      )
) 
      OR 
      (
        lower(f_unaccent(instance_holding_view.jsonb ->> 'identifiers')) ~ lower(f_unaccent('(^|[[:punct:]]|[[:space:]]|(?=[[:punct:]]|[[:space:]]))"value":([[:punct:]]|[[:space:]])+".*"($|[[:punct:]]|[[:space:]]|(?<=[[:punct:]]|[[:space:]]))')) 
      )
) 
      AND 
      (
        lower(f_unaccent(instance_holding_view.ho_jsonb ->> 'permanentLocationId')) LIKE lower(f_unaccent('53cf956f-c1df-410b-8bea-27f712cca7c0')) 
      )
    )
    AND 
    (
      SELECT
        COUNT(*) 
      FROM
        headrecords 
    )
    < 30 
)
SELECT
  jsonb,
  title,
  0 AS count 
FROM
  headrecords 
WHERE
  (
    SELECT
      COUNT(*) 
    FROM
      headrecords 
  )
  >= 30 
UNION
( 
SELECT
  jsonb, title, 
  (
    SELECT
      COUNT(*) 
    FROM
      allrecords 
  )
  AS count 
FROM
  allrecords 
ORDER BY
  title LIMIT 30 OFFSET 0 ) 
ORDER BY
  title;


 Comments   
Comment by Nassib Nassar [ 29/Mar/19 ]

Running locally, the query was fast. HJ suggested changing the "." term to ".star" to help reproduce the problem, and this results in the query taking 2m23s running locally.

The time appears to be spent in CTE "headrecords" (1m11s) which performs a bitmap heap scan on a large number of records. Disabling the bitmap scan results in the query optimizer using an index scan on holdings_record_permanentlocationid_idx (holdings_record). This setting is used for the tests below.

There are 2763587 rows with jsonb->>'permanentLocationId' = '53cf956f-c1df-410b-8bea-27f712cca7c0' (95% of the rows in holdings_record). If the query is modified to use jsonb->>'permanentLocationId' = 'fcd64ce1-6995-48f0-840e-89ffa2288371' (of which there are only 36451 rows), running time drops to below 1 second. This suggests query optimization should focus on the case where the query value for permanentLocationId matches a large number of rows in holdings_record.

Comment by Nassib Nassar [ 01/Apr/19 ]

After further debugging, it seems that regular expression queries in the CTEs are running as sequential scans which take most of the time:

... ~ lower(f_unaccent('(^|[[:punct:]]|[[:space:]]|(?=[[:punct:]]|[[:space:]])).star*($|[[:punct:]]|[[:space:]]|(?<=[[:punct:]]|[[:space:]]))'))

This form of query is possibly being generated to detect word boundaries in order to match only whole words. It is unclear why the GIN indexes are not used by the query optimizer in this case.

Using tsvector/tsquery may be an alternative, if it meets the use case requirements:

to_tsvector(...) @@ 'star'::tsquery
Comment by Nassib Nassar [ 02/Apr/19 ]

First tests with tsvector/tsquery indicate that it is very fast, but more context about the use case is needed.

Comment by Hongwei Ji [ 02/Apr/19 ]

Nassib Nassar, can you paste/attach the full query that performs better? I thought the FK was the bottleneck.

Comment by Nassib Nassar [ 02/Apr/19 ]

This is a test using tsquery on "star":

/*
CREATE INDEX ON supertenant_mod_inventory_storage.instance USING GIN
    ( to_tsvector('simple'::regconfig, jsonb ->> 'contributors'::text) );
CREATE INDEX ON supertenant_mod_inventory_storage.instance USING GIN
    ( to_tsvector('simple'::regconfig, jsonb ->> 'identifiers'::text) );
ANALYZE supertenant_mod_inventory_storage.instance;
*/
WITH headrecords AS 
(
  SELECT
    jsonb,
    lower(f_unaccent(jsonb ->> 'title')) AS title 
  FROM
    supertenant_mod_inventory_storage.instance_holding_view 
  WHERE
    (
(((to_tsvector('simple'::regconfig, jsonb ->> 'title'::text) @@ 'star'::tsquery)
      OR 
      (
	to_tsvector('simple'::regconfig, jsonb ->> 'contributors'::text) @@ 'star'::tsquery
      )
) 
      OR 
      (
	to_tsvector('simple'::regconfig, jsonb ->> 'identifiers'::text) @@ 'star'::tsquery
      )
) 
      AND 
      (
        lower(f_unaccent(instance_holding_view.ho_jsonb ->> 'permanentLocationId')) LIKE lower(f_unaccent('53cf956f-c1df-410b-8bea-27f712cca7c0')) 
      )
    )
    AND lower(f_unaccent(jsonb ->> 'title')) < ( 
    SELECT
      lower(f_unaccent(jsonb ->> 'title')) 
    FROM
      supertenant_mod_inventory_storage.instance_holding_view 
    ORDER BY
      lower(f_unaccent(jsonb ->> 'title')) OFFSET 10000 LIMIT 1 ) 
    ORDER BY
      lower(f_unaccent(jsonb ->> 'title')) LIMIT 30 OFFSET 0 
)
,
allrecords AS 
(
  SELECT
    jsonb,
    lower(f_unaccent(jsonb ->> 'title')) AS title 
  FROM
    supertenant_mod_inventory_storage.instance_holding_view 
  WHERE
    (
(((to_tsvector('simple'::regconfig, jsonb ->> 'title'::text) @@ 'star'::tsquery)
      OR 
      (
	to_tsvector('simple'::regconfig, jsonb ->> 'contributors'::text) @@ 'star'::tsquery
      )
) 
      OR 
      (
	to_tsvector('simple'::regconfig, jsonb ->> 'identifiers'::text) @@ 'star'::tsquery
      )
) 
      AND 
      (
        lower(f_unaccent(instance_holding_view.ho_jsonb ->> 'permanentLocationId')) LIKE lower(f_unaccent('53cf956f-c1df-410b-8bea-27f712cca7c0')) 
      )
    )
    AND 
    (
      SELECT
        COUNT(*) 
      FROM
        headrecords 
    )
    < 30 
)
SELECT
  jsonb,
  title,
  0 AS count 
FROM
  headrecords 
WHERE
  (
    SELECT
      COUNT(*) 
    FROM
      headrecords 
  )
  >= 30 
UNION
( 
SELECT
  jsonb, title, 
  (
    SELECT
      COUNT(*) 
    FROM
      allrecords 
  )
  AS count 
FROM
  allrecords 
ORDER BY
  title LIMIT 30 OFFSET 0 ) 
ORDER BY
  title;
Comment by Jakub Skoczen [ 02/Apr/19 ]

Hongwei Ji Nassib Nassar What is the culprit here? From what I can see the clause that is responsible for filtering by location does not use any regexes ( LIKE lower(f_unaccent('53cf956f-c1df-410b-8bea-27f712cca7c0')) and should be fast. The problem seems to be with the title search that uses the regex match when it SHOULD use the fulltext matching (afaik the fulltext index is enabled for "title" in the mod-inventory-storage schema.json)

Nassib Nassar can you please verify when we are not seeing fulltext queries (tsvesctor/tsquery) generated for the title search when location filtering is enabled. It will required looking at the code in RMB and CQL2PG/ Is this something you can do?

Comment by Nassib Nassar [ 02/Apr/19 ]

1) Yes, "title", "contributors", and "identifiers" are all using regular expressions which are not using the indexes. Those are the regular expressions I was referring to. In the revised query using tsquery which I posted above, all three regular expressions have been replaced by tsquery/tsvector queries, bringing the running time from 2m23s to less than 1s.

2) Yes, but it would save a lot of time if someone that knows the RMB, CQL2PG and relevant module source codes can walk me through them and offer sufficient context, since I have not spent any time ramping up on them. Otherwise the database layer is much easier for me to contribute in.

Comment by Hongwei Ji [ 02/Apr/19 ]

Nassib Nassar, the same query takes much longer for me in perf env even after adding the new full text indexes.

folio=# \d supertenant_mod_inventory_storage.instance;
...
    "instance_to_tsvector_idx" gin (to_tsvector('simple'::regconfig, jsonb ->> 'contributors'::text))
    "instance_to_tsvector_idx1" gin (to_tsvector('simple'::regconfig, jsonb ->> 'identifiers'::text))
...
 Sort  (cost=3949680.02..3949680.08 rows=23 width=72) (actual time=28979.723..28979.724 rows=30 loops=1)
   Sort Key: "*SELECT* 1".title
   Sort Method: quicksort  Memory: 67kB
   CTE headrecords
     ->  Limit  (cost=3883614.87..3883614.89 rows=6 width=882) (actual time=14484.151..14484.152 rows=2 loops=1)
           InitPlan 1 (returns $1)
             ->  Limit  (cost=3816349.30..3816730.94 rows=1 width=32) (actual time=223.278..223.279 rows=1 loops=1)
                   ->  Nested Loop  (cost=1.37..1110040518.33 rows=2908646 width=32) (actual time=0.113..222.755 rows=10001 loops=1)
                         ->  Index Scan using instance_title_idx on instance  (cost=0.56..1429603.63 rows=2762804 width=850) (actual time=0.030..42.604 rows=
9693 loops=1)
                         ->  Index Scan using holdings_record_instanceid_idx on holdings_record  (cost=0.81..255.56 rows=14543 width=350) (actual time=0.012.
.0.012 rows=1 loops=9693)
                               Index Cond: (lower(f_unaccent((jsonb ->> 'instanceId'::text))) = lower(f_unaccent((instance.jsonb ->> 'id'::text))))
           ->  Sort  (cost=66883.93..66883.95 rows=6 width=882) (actual time=14484.150..14484.150 rows=2 loops=1)
                 Sort Key: (lower(f_unaccent((instance_1.jsonb ->> 'title'::text))))
                 Sort Method: quicksort  Memory: 28kB
                 ->  Hash Join  (cost=16236.66..66883.86 rows=6 width=882) (actual time=5169.755..14484.141 rows=2 loops=1)
                       Hash Cond: (lower(f_unaccent((holdings_record_1.jsonb ->> 'instanceId'::text))) = lower(f_unaccent((instance_1.jsonb ->> 'id'::text)))
)
                       ->  Bitmap Heap Scan on holdings_record holdings_record_1  (cost=533.26..47415.86 rows=14543 width=350) (actual time=217.233..7782.060
 rows=2763587 loops=1)
                             Filter: (lower(f_unaccent((jsonb ->> 'permanentLocationId'::text))) ~~ '53cf956f-c1df-410b-8bea-27f712cca7c0'::text)
                             Rows Removed by Filter: 93697
                             Heap Blocks: exact=54267 lossy=98820
                             ->  Bitmap Index Scan on holdings_record_permanentlocationid_idx  (cost=0.00..529.63 rows=14543 width=0) (actual time=206.863..2
06.863 rows=2763587 loops=1)
                                   Index Cond: (lower(f_unaccent((jsonb ->> 'permanentLocationId'::text))) = '53cf956f-c1df-410b-8bea-27f712cca7c0'::text)
                       ->  Hash  (cost=15689.59..15689.59 rows=1105 width=850) (actual time=231.100..231.100 rows=2 loops=1)
                             Buckets: 2048  Batches: 1  Memory Usage: 18kB
                             ->  Bitmap Heap Scan on instance instance_1  (cost=85.69..15689.59 rows=1105 width=850) (actual time=225.802..231.024 rows=2 loo
ps=1)
                                   Recheck Cond: ((to_tsvector('simple'::regconfig, (jsonb ->> 'title'::text)) @@ '''star'''::tsquery) OR (to_tsvector('simpl
e'::regconfig, (jsonb ->> 'contributors'::text)) @@ '''star'''::tsquery) OR (to_tsvector('simple'::regconfig, (jsonb ->> 'identifiers'::text)) @@ '''star''':
:tsquery))
                                   Filter: (lower(f_unaccent((jsonb ->> 'title'::text))) < $1)
                                   Rows Removed by Filter: 900
                                   Heap Blocks: exact=884
                                   ->  BitmapOr  (cost=85.69..85.69 rows=3315 width=0) (actual time=0.194..0.194 rows=0 loops=1)
                                         ->  Bitmap Index Scan on instance_title_idx_ft  (cost=0.00..29.32 rows=1243 width=0) (actual time=0.142..0.143 rows=
846 loops=1)
                                               Index Cond: (to_tsvector('simple'::regconfig, (jsonb ->> 'title'::text)) @@ '''star'''::tsquery)
                                         ->  Bitmap Index Scan on instance_to_tsvector_idx  (cost=0.00..29.32 rows=1243 width=0) (actual time=0.031..0.031 ro
ws=86 loops=1)
                                               Index Cond: (to_tsvector('simple'::regconfig, (jsonb ->> 'contributors'::text)) @@ '''star'''::tsquery)
                                         ->  Bitmap Index Scan on instance_to_tsvector_idx1  (cost=0.00..26.22 rows=829 width=0) (actual time=0.019..0.019 ro
ws=0 loops=1)
                                               Index Cond: (to_tsvector('simple'::regconfig, (jsonb ->> 'identifiers'::text)) @@ '''star'''::tsquery)
   CTE allrecords
     ->  Result  (cost=15412.47..66062.57 rows=17 width=882) (actual time=232.841..14492.653 rows=915 loops=1)
           One-Time Filter: ($3 < 30)
           InitPlan 3 (returns $3)
             ->  Aggregate  (cost=0.14..0.15 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1)
                   ->  CTE Scan on headrecords headrecords_1  (cost=0.00..0.12 rows=6 width=0) (actual time=0.000..0.001 rows=2 loops=1)
           ->  Hash Join  (cost=15412.47..66062.57 rows=17 width=882) (actual time=232.836..14492.194 rows=915 loops=1)
                 Hash Cond: (lower(f_unaccent((holdings_record_2.jsonb ->> 'instanceId'::text))) = lower(f_unaccent((instance_2.jsonb ->> 'id'::text))))
                 ->  Bitmap Heap Scan on holdings_record holdings_record_2  (cost=533.26..47415.86 rows=14543 width=350) (actual time=213.747..7894.179 rows=
2763587 loops=1)
                       Filter: (lower(f_unaccent((jsonb ->> 'permanentLocationId'::text))) ~~ '53cf956f-c1df-410b-8bea-27f712cca7c0'::text)
                       Rows Removed by Filter: 93697
                       Heap Blocks: exact=54267 lossy=98820
                       ->  Bitmap Index Scan on holdings_record_permanentlocationid_idx  (cost=0.00..529.63 rows=14543 width=0) (actual time=203.450..203.450
 rows=2763587 loops=1)
                             Index Cond: (lower(f_unaccent((jsonb ->> 'permanentLocationId'::text))) = '53cf956f-c1df-410b-8bea-27f712cca7c0'::text)
                 ->  Hash  (cost=14837.63..14837.63 rows=3314 width=850) (actual time=7.514..7.514 rows=902 loops=1)
                       Buckets: 4096  Batches: 1  Memory Usage: 824kB
                       ->  Bitmap Heap Scan on instance instance_2  (cost=87.35..14837.63 rows=3314 width=850) (actual time=0.290..4.810 rows=902 loops=1)
                             Recheck Cond: ((to_tsvector('simple'::regconfig, (jsonb ->> 'title'::text)) @@ '''star'''::tsquery) OR (to_tsvector('simple'::re
gconfig, (jsonb ->> 'contributors'::text)) @@ '''star'''::tsquery) OR (to_tsvector('simple'::regconfig, (jsonb ->> 'identifiers'::text)) @@ '''star'''::tsque
ry))
                             Heap Blocks: exact=884
                             ->  BitmapOr  (cost=87.35..87.35 rows=3315 width=0) (actual time=0.203..0.203 rows=0 loops=1)
                                   ->  Bitmap Index Scan on instance_title_idx_ft  (cost=0.00..29.32 rows=1243 width=0) (actual time=0.147..0.147 rows=846 lo
ops=1)
                                         Index Cond: (to_tsvector('simple'::regconfig, (jsonb ->> 'title'::text)) @@ '''star'''::tsquery)
                                   ->  Bitmap Index Scan on instance_to_tsvector_idx  (cost=0.00..29.32 rows=1243 width=0) (actual time=0.033..0.033 rows=86
loops=1)
                                         Index Cond: (to_tsvector('simple'::regconfig, (jsonb ->> 'contributors'::text)) @@ '''star'''::tsquery)
                                   ->  Bitmap Index Scan on instance_to_tsvector_idx1  (cost=0.00..26.22 rows=829 width=0) (actual time=0.022..0.022 rows=0 l
oops=1)
                                         Index Cond: (to_tsvector('simple'::regconfig, (jsonb ->> 'identifiers'::text)) @@ '''star'''::tsquery)
   ->  HashAggregate  (cost=1.81..2.04 rows=23 width=72) (actual time=28979.683..28979.688 rows=30 loops=1)
         Group Key: "*SELECT* 1".jsonb, "*SELECT* 1".title, ((0)::bigint)
         ->  Append  (cost=0.15..1.63 rows=23 width=72) (actual time=28979.604..28979.612 rows=30 loops=1)
               ->  Subquery Scan on "*SELECT* 1"  (cost=0.15..0.34 rows=6 width=72) (actual time=14484.161..14484.161 rows=0 loops=1)
                     ->  Result  (cost=0.15..0.27 rows=6 width=68) (actual time=14484.160..14484.160 rows=0 loops=1)
                           One-Time Filter: ($6 >= 30)
                           InitPlan 6 (returns $6)
                             ->  Aggregate  (cost=0.14..0.15 rows=1 width=8) (actual time=14484.158..14484.158 rows=1 loops=1)
                                   ->  CTE Scan on headrecords headrecords_2  (cost=0.00..0.12 rows=6 width=0) (actual time=14484.153..14484.155 rows=2 loops
=1)
                           ->  CTE Scan on headrecords  (cost=0.15..0.27 rows=6 width=68) (never executed)
               ->  Limit  (cost=1.08..1.12 rows=17 width=72) (actual time=14495.442..14495.447 rows=30 loops=1)
                     InitPlan 5 (returns $5)
                       ->  Aggregate  (cost=0.38..0.39 rows=1 width=8) (actual time=14261.885..14261.885 rows=1 loops=1)
                             ->  CTE Scan on allrecords allrecords_1  (cost=0.00..0.34 rows=17 width=0) (actual time=0.000..14261.371 rows=915 loops=1)
                     ->  Sort  (cost=0.69..0.73 rows=17 width=72) (actual time=14495.441..14495.442 rows=30 loops=1)
                           Sort Key: allrecords.title
                           Sort Method: top-N heapsort  Memory: 92kB
                           ->  CTE Scan on allrecords  (cost=0.00..0.34 rows=17 width=72) (actual time=14494.732..14494.981 rows=915 loops=1)
 Planning time: 1.312 ms
 Execution time: 28979.925 ms
(82 rows)
Comment by Nassib Nassar [ 02/Apr/19 ]

Hongwei Ji I don't have access to the perf env system, but would you send the full list of indexes on supertenant_mod_inventory_storage.instance.

Comment by Hongwei Ji [ 02/Apr/19 ]

Nassib Nassar, here you go:

folio=# \d supertenant_mod_inventory_storage.instance;
              Table "supertenant_mod_inventory_storage.instance"
      Column      |           Type           | Collation | Nullable | Default
------------------+--------------------------+-----------+----------+---------
 _id              | uuid                     |           | not null |
 jsonb            | jsonb                    |           | not null |
 creation_date    | timestamp with time zone |           |          |
 created_by       | text                     |           |          |
 instancestatusid | uuid                     |           |          |
 modeofissuanceid | uuid                     |           |          |
Indexes:
    "instance_pkey" PRIMARY KEY, btree (_id)
    "instance_hrid_idx_unique" UNIQUE, btree (lower(f_unaccent(jsonb ->> 'hrid'::text)))
    "instance_classifications_idx_gin" gin (lower(f_unaccent(jsonb ->> 'classifications'::text)) gin_trgm_ops)
    "instance_contributors_idx" btree (lower(f_unaccent(jsonb ->> 'contributors'::text)))
    "instance_contributors_idx_gin" gin (lower(f_unaccent(jsonb ->> 'contributors'::text)) gin_trgm_ops)
    "instance_id_idx" btree (lower(f_unaccent(jsonb ->> 'id'::text)))
    "instance_identifiers_idx_gin" gin (lower(f_unaccent(jsonb ->> 'identifiers'::text)) gin_trgm_ops)
    "instance_indextitle_idx" btree (lower(f_unaccent(jsonb ->> 'indexTitle'::text)))
    "instance_indextitle_idx_gin" gin (lower(f_unaccent(jsonb ->> 'indexTitle'::text)) gin_trgm_ops)
    "instance_instancestatusid_idx_gin" gin (lower(jsonb ->> 'instanceStatusId'::text) gin_trgm_ops)
    "instance_instancetypeid_idx_ft" gin (to_tsvector('simple'::regconfig, jsonb ->> 'instanceTypeId'::text))
    "instance_instancetypeid_idx_gin" gin (lower(f_unaccent(jsonb ->> 'instanceTypeId'::text)) gin_trgm_ops)
    "instance_languages_idx_ft" gin (to_tsvector('simple'::regconfig, jsonb ->> 'languages'::text))
    "instance_modeofissuanceid_idx_gin" gin (lower(jsonb ->> 'modeOfIssuanceId'::text) gin_trgm_ops)
    "instance_publication_idx" btree (lower(f_unaccent(jsonb ->> 'publication'::text)))
    "instance_statisticalcodeids_idx" btree (lower(jsonb ->> 'statisticalCodeIds'::text))
    "instance_subjects_idx_gin" gin (lower(f_unaccent(jsonb ->> 'subjects'::text)) gin_trgm_ops)
    "instance_title_idx" btree (lower(f_unaccent(jsonb ->> 'title'::text)))
    "instance_title_idx_ft" gin (to_tsvector('simple'::regconfig, jsonb ->> 'title'::text))
    "instance_title_idx_gin" gin (lower(f_unaccent(jsonb ->> 'title'::text)) gin_trgm_ops)
    "instance_to_tsvector_idx" gin (to_tsvector('simple'::regconfig, jsonb ->> 'contributors'::text))
    "instance_to_tsvector_idx1" gin (to_tsvector('simple'::regconfig, jsonb ->> 'identifiers'::text))
Foreign-key constraints:
    "instance_instancestatusid_fkey" FOREIGN KEY (instancestatusid) REFERENCES supertenant_mod_inventory_storage.instance_status(_id)
    "instance_modeofissuanceid_fkey" FOREIGN KEY (modeofissuanceid) REFERENCES supertenant_mod_inventory_storage.mode_of_issuance(_id)
Referenced by:
    TABLE "supertenant_mod_inventory_storage.holdings_record" CONSTRAINT "holdings_record_instanceid_fkey" FOREIGN KEY (instanceid) REFERENCES supertenant_mod_inventory_storage.instance(_id)
    TABLE "supertenant_mod_inventory_storage.instance_relationship" CONSTRAINT "instance_relationship_subinstanceid_fkey" FOREIGN KEY (subinstanceid) REFERENCES supertenant_mod_inventory_storage.instance(_id)
    TABLE "supertenant_mod_inventory_storage.instance_relationship" CONSTRAINT "instance_relationship_superinstanceid_fkey" FOREIGN KEY (superinstanceid) REFERENCES supertenant_mod_inventory_storage.instance(_id)
    TABLE "supertenant_mod_inventory_storage.instance_source_marc" CONSTRAINT "instance_source_marc__id_fkey" FOREIGN KEY (_id) REFERENCES supertenant_mod_inventory_storage.instance(_id)
Triggers:
    set_instance_hrid BEFORE INSERT ON supertenant_mod_inventory_storage.instance FOR EACH ROW EXECUTE PROCEDURE supertenant_mod_inventory_storage.set_instance_hrid()
    set_instance_md_json_trigger BEFORE UPDATE ON supertenant_mod_inventory_storage.instance FOR EACH ROW EXECUTE PROCEDURE supertenant_mod_inventory_storage.set_instance_md_json()
    set_instance_md_trigger BEFORE INSERT ON supertenant_mod_inventory_storage.instance FOR EACH ROW EXECUTE PROCEDURE supertenant_mod_inventory_storage.instance_set_md()
    set_instance_sourcerecordformat BEFORE INSERT OR UPDATE ON supertenant_mod_inventory_storage.instance FOR EACH ROW EXECUTE PROCEDURE supertenant_mod_inventory_storage.set_instance_sourcerecordformat()
    update_instance_references BEFORE INSERT OR UPDATE ON supertenant_mod_inventory_storage.instance FOR EACH ROW EXECUTE PROCEDURE supertenant_mod_inventory_storage.update_instance_references()
Comment by Nassib Nassar [ 02/Apr/19 ]

Hongwei Ji Also the postgres configuration file "postgresql.conf" if possible.

Comment by Nassib Nassar [ 02/Apr/19 ]

Hongwei Ji Also, if not too much trouble, the output of:

\d diku_mod_inventory_storage.holdings_record
Comment by Hongwei Ji [ 02/Apr/19 ]

Nassib Nassar, here you go postgresql.conf and

folio=# \d supertenant_mod_inventory_storage.holdings_record;
            Table "supertenant_mod_inventory_storage.holdings_record"
       Column        |           Type           | Collation | Nullable | Default
---------------------+--------------------------+-----------+----------+---------
 _id                 | uuid                     |           | not null |
 jsonb               | jsonb                    |           | not null |
 creation_date       | timestamp with time zone |           |          |
 created_by          | text                     |           |          |
 instanceid          | uuid                     |           |          |
 permanentlocationid | uuid                     |           |          |
 temporarylocationid | uuid                     |           |          |
 holdingstypeid      | uuid                     |           |          |
 callnumbertypeid    | uuid                     |           |          |
 illpolicyid         | uuid                     |           |          |
Indexes:
    "holdings_record_pkey" PRIMARY KEY, btree (_id)
    "holdings_record_hrid_idx_unique" UNIQUE, btree (lower(f_unaccent(jsonb ->> 'hrid'::text)))
    "holdings_record_id_idx" btree (lower(f_unaccent(jsonb ->> 'id'::text)))
    "holdings_record_instanceid_idx" btree (lower(f_unaccent(jsonb ->> 'instanceId'::text)))
    "holdings_record_instanceid_idx_gin" gin (lower(f_unaccent(jsonb ->> 'instanceId'::text)) gin_trgm_ops)
    "holdings_record_permanentlocationid_idx" btree (lower(f_unaccent(jsonb ->> 'permanentLocationId'::text)))
    "holdings_record_permanentlocationid_idx_gin" gin (lower(f_unaccent(jsonb ->> 'permanentLocationId'::text)) gin_trgm_ops)
    "holdings_record_temporarylocationid_idx" btree (lower(f_unaccent(jsonb ->> 'temporaryLocationId'::text)))
    "holdings_record_temporarylocationid_idx_gin" gin (lower(f_unaccent(jsonb ->> 'temporaryLocationId'::text)) gin_trgm_ops)
Foreign-key constraints:
    "holdings_record_callnumbertypeid_fkey" FOREIGN KEY (callnumbertypeid) REFERENCES supertenant_mod_inventory_storage.call_number_type(_id)
    "holdings_record_holdingstypeid_fkey" FOREIGN KEY (holdingstypeid) REFERENCES supertenant_mod_inventory_storage.holdings_type(_id)
    "holdings_record_illpolicyid_fkey" FOREIGN KEY (illpolicyid) REFERENCES supertenant_mod_inventory_storage.ill_policy(_id)
    "holdings_record_instanceid_fkey" FOREIGN KEY (instanceid) REFERENCES supertenant_mod_inventory_storage.instance(_id)
    "holdings_record_permanentlocationid_fkey" FOREIGN KEY (permanentlocationid) REFERENCES supertenant_mod_inventory_storage.location(_id)
    "holdings_record_temporarylocationid_fkey" FOREIGN KEY (temporarylocationid) REFERENCES supertenant_mod_inventory_storage.location(_id)
Referenced by:
    TABLE "supertenant_mod_inventory_storage.item" CONSTRAINT "item_holdingsrecordid_fkey" FOREIGN KEY (holdingsrecordid) REFERENCES supertenant_mod_inventory_storage.holdings_record(_id)
Triggers:
    set_holdings_record_hrid BEFORE INSERT ON supertenant_mod_inventory_storage.holdings_record FOR EACH ROW EXECUTE PROCEDURE supertenant_mod_inventory_storage.set_holdings_record_hrid()
    set_holdings_record_md_json_trigger BEFORE UPDATE ON supertenant_mod_inventory_storage.holdings_record FOR EACH ROW EXECUTE PROCEDURE supertenant_mod_inventory_storage.set_holdings_record_md_json()
    set_holdings_record_md_trigger BEFORE INSERT ON supertenant_mod_inventory_storage.holdings_record FOR EACH ROW EXECUTE PROCEDURE supertenant_mod_inventory_storage.holdings_record_set_md()
    update_holdings_record_references BEFORE INSERT OR UPDATE ON supertenant_mod_inventory_storage.holdings_record FOR EACH ROW EXECUTE PROCEDURE supertenant_mod_inventory_storage.update_holdings_record_references()
Comment by Nassib Nassar [ 03/Apr/19 ]

Hongwei Ji Thanks. Also the outputs of:

SELECT version();

and

SHOW ALL;

Then "EXPLAIN ANALYZE" the query again, but just before that set:

SET enable_hashjoin TO off;

(in the same session as the query)

Comment by Jakub Skoczen [ 03/Apr/19 ]

Julian Ladisch any idea why the "title" search uses regexes and not the fulltext query (tsvector/tsquery) when we apply the location filter? Is there a bug in CQLPG or is this because the view is somehow hardcoded in the mod-inventory-storage?

Comment by Nassib Nassar [ 04/Apr/19 ]

Hongwei Ji Jakub Skoczen I am not sure why the query is running slowly on the perf env. From the query plan it looks like one problem is incorrect statistics, though presumably autovacuum is running. But just to be sure could you do:

VACUUM ANALYZE supertenant_mod_inventory_storage.instance;
VACUUM ANALYZE supertenant_mod_inventory_storage.holdings_record;

Then re-explain-analyze the query:

BEGIN;
EXPLAIN ANALYZE WITH headrecords AS [the rest of the query . . .]
ROLLBACK;
Comment by Nassib Nassar [ 04/Apr/19 ]

Hongwei Ji Jakub Skoczen Also again it might be helpful to see the cost parameters and other settings from

SHOW ALL;

on the perf env.

Alternatively, is it possible for me to clone the perf env locally?

Comment by Hongwei Ji [ 04/Apr/19 ]

Nassib Nassar, please see slack message how to access the perf db directly. Let me know if you have connection issues.

Comment by Nassib Nassar [ 04/Apr/19 ]

On the perf db, after running the two VACUUM ANALYZE commands above to recompile statistics, the query runs in less than one second (~300 ms).

Comment by Nassib Nassar [ 04/Apr/19 ]

Hongwei Ji Could you run the sequence in https://folio-org.atlassian.net/browse/FOLIO-1920?focusedCommentId=193197&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel and see if that works for you?

Comment by Nassib Nassar [ 04/Apr/19 ]

Hongwei confirms the query is now very fast. I have closed this issue.

Comment by Hongwei Ji [ 04/Apr/19 ]

Just want to add that the original query (regex version) is also fast now. Is it a pure stats issue?

Comment by Hongwei Ji [ 04/Apr/19 ]

BTW, see this in the perf db starting log:

2019-04-04 12:06:26.547 UTC [208] ERROR:  canceling autovacuum task
2019-04-04 12:06:26.547 UTC [208] CONTEXT:  automatic analyze of table "folio.supertenant_mod_inventory_storage.instance"
2019-04-04 12:06:30.115 UTC [208] ERROR:  canceling autovacuum task
2019-04-04 12:06:30.115 UTC [208] CONTEXT:  automatic analyze of table "folio.supertenant_mod_inventory_storage.holdings_record"
Comment by Nassib Nassar [ 04/Apr/19 ]

Hongwei asks about a related query:

SELECT COUNT(*) FROM supertenant_mod_inventory_storage.instance_holding_view  WHERE lower(f_unaccent(instance_holding_view.ho_jsonb->>'permanentLocationId')) ~ lower(f_unaccent('(^|[[:punct:]]|[[:space:]]|(?=[[:punct:]]|[[:space:]])) 53cf956f-c1df-410b-8bea-27f712cca7c0($|[[:punct:]]|[[:space:]]|(?<=[[:punct:]]|[[:space:]]))'));

I will take a quick look at this.

Comment by Nassib Nassar [ 04/Apr/19 ]

Most of the time in this related query is spent in performing the join. I think the question should be asked whether the join is necessary or if the problem can be reformulated. Other than that, some performance optimization may be possible.

A secondary issue is the unnecessary use of a regular expression. The WHERE clause should use (ho_jsonb->>'permanentLocationId' = '53cf956f-c1df-410b-8bea-27f712cca7c0') if that satisfies the use case.

Comment by Nassib Nassar [ 05/Apr/19 ]

Hongwei Ji Running locally (VM), the original regex "star" query takes 140.0 s compared with the new tsvector query which takes 0.3 s. I didn't investigate the regex query on the perf env, as tsvector would seem a more correct form for the use case.

Generated at Thu Feb 08 23:16:52 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.