[RMB-441] Consider to use = over LIKE when search term has no wildcard Created: 31/Jul/19 Updated: 31/May/21 Resolved: 31/May/21 |
|
| Status: | Closed |
| Project: | RAML Module Builder |
| Components: | None |
| Affects versions: | None |
| Fix versions: | None |
| Type: | Task | Priority: | P3 |
| Reporter: | Hongwei Ji | Assignee: | Unassigned |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | platform-backlog | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original estimate: | Not Specified | ||
| Issue links: |
|
||||||||||||||||||||||||
| Sprint: | CP: Roadmap backlog | ||||||||||||||||||||||||
| Development Team: | Core: Platform | ||||||||||||||||||||||||
| Description |
|
While working on
1) if there is only BTREE index: All searches use index except LIKE search for "Available".
1) with just b-tree index: "item_status_name_idx" btree (lower(f_unaccent((jsonb -> 'status'::text) ->> 'name'::text))) folio=# vacuum analyze supertenant_mod_inventory_storage.item; VACUUM folio=# explain analyze SELECT COUNT(1) FROM supertenant_mod_inventory_storage.item WHERE lower(f_unaccent(item.jsonb->'status'->>'name')) LIKE lower(f_unaccent('Available')); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=1056834.36..1056834.38 rows=1 width=8) (actual time=7326.753..7326.753 rows=1 loops=1) -> Seq Scan on item (cost=0.00..1049759.31 rows=2830022 width=0) (actual time=0.054..7060.456 rows=2837576 loops=1) Filter: (lower(f_unaccent(((jsonb -> 'status'::text) ->> 'name'::text))) ~~ 'available'::text) Rows Removed by Filter: 71070 Planning time: 0.215 ms Execution time: 7326.774 ms (6 rows) folio=# explain analyze SELECT COUNT(1) FROM supertenant_mod_inventory_storage.item WHERE lower(f_unaccent(item.jsonb->'status'->>'name')) LIKE lower(f_unaccent('Checked out')); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=58888.39..58888.40 rows=1 width=8) (actual time=469.995..469.996 rows=1 loops=1) -> Index Scan using item_status_name_idx on item (cost=0.43..58691.81 rows=78631 width=0) (actual time=0.445..461.876 rows=71070 loops=1) Index Cond: (lower(f_unaccent(((jsonb -> 'status'::text) ->> 'name'::text))) = 'checked out'::text) Filter: (lower(f_unaccent(((jsonb -> 'status'::text) ->> 'name'::text))) ~~ 'checked out'::text) Planning time: 0.158 ms Execution time: 470.019 ms (6 rows) folio=# explain analyze SELECT COUNT(1) FROM supertenant_mod_inventory_storage.item WHERE lower(f_unaccent(item.jsonb->'status'->>'name')) = lower(f_unaccent('Available')); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=443045.82..443045.83 rows=1 width=8) (actual time=1465.292..1465.292 rows=1 loops=1) -> Index Scan using item_status_name_idx on item (cost=0.43..435970.76 rows=2830022 width=0) (actual time=0.063..1263.410 rows=2837576 loops=1) Index Cond: (lower(f_unaccent(((jsonb -> 'status'::text) ->> 'name'::text))) = 'available'::text) Planning time: 0.154 ms Execution time: 1465.317 ms (5 rows) folio=# explain analyze SELECT COUNT(1) FROM supertenant_mod_inventory_storage.item WHERE lower(f_unaccent(item.jsonb->'status'->>'name')) = lower(f_unaccent('Checked out')); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=38444.33..38444.34 rows=1 width=8) (actual time=291.678..291.679 rows=1 loops=1) -> Index Scan using item_status_name_idx on item (cost=0.43..38247.75 rows=78631 width=0) (actual time=0.094..285.701 rows=71070 loops=1) Index Cond: (lower(f_unaccent(((jsonb -> 'status'::text) ->> 'name'::text))) = 'checked out'::text) Planning time: 0.154 ms Execution time: 291.704 ms (5 rows) 2) With just gin index: "item_status_name_idx_gin" gin (lower(f_unaccent((jsonb -> 'status'::text) ->> 'name'::text)) gin_trgm_ops) folio=# vacuum analyze supertenant_mod_inventory_storage.item; VACUUM folio=# explain analyze SELECT COUNT(1) FROM supertenant_mod_inventory_storage.item WHERE lower(f_unaccent(item.jsonb->'status'->>'name')) LIKE lower(f_unaccent('Available')); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=1056851.82..1056851.83 rows=1 width=8) (actual time=7149.606..7149.606 rows=1 loops=1) -> Seq Scan on item (cost=0.00..1049759.31 rows=2837003 width=0) (actual time=0.053..6895.869 rows=2837576 loops=1) Filter: (lower(f_unaccent(((jsonb -> 'status'::text) ->> 'name'::text))) ~~ 'available'::text) Rows Removed by Filter: 71070 Planning time: 0.202 ms Execution time: 7149.625 ms (6 rows) folio=# explain analyze SELECT COUNT(1) FROM supertenant_mod_inventory_storage.item WHERE lower(f_unaccent(item.jsonb->'status'->>'name')) LIKE lower(f_unaccent('Checked out')); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=189376.56..189376.57 rows=1 width=8) (actual time=1305.208..1305.208 rows=1 loops=1) -> Bitmap Heap Scan on item (cost=9923.29..189197.43 rows=71650 width=0) (actual time=43.358..1296.635 rows=71070 loops=1) Recheck Cond: (lower(f_unaccent(((jsonb -> 'status'::text) ->> 'name'::text))) ~~ 'checked out'::text) Rows Removed by Index Recheck: 337220 Heap Blocks: exact=37348 lossy=33722 -> Bitmap Index Scan on item_status_name_idx_gin (cost=0.00..9905.37 rows=71650 width=0) (actual time=37.155..37.155 rows=71070 loops=1) Index Cond: (lower(f_unaccent(((jsonb -> 'status'::text) ->> 'name'::text))) ~~ 'checked out'::text) Planning time: 0.161 ms Execution time: 1305.239 ms (9 rows) folio=# explain analyze SELECT COUNT(1) FROM supertenant_mod_inventory_storage.item WHERE lower(f_unaccent(item.jsonb->'status'->>'name')) = lower(f_unaccent('Available')); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=1056851.82..1056851.83 rows=1 width=8) (actual time=7151.198..7151.198 rows=1 loops=1) -> Seq Scan on item (cost=0.00..1049759.31 rows=2837003 width=0) (actual time=0.063..6896.387 rows=2837576 loops=1) Filter: (lower(f_unaccent(((jsonb -> 'status'::text) ->> 'name'::text))) = 'available'::text) Rows Removed by Filter: 71070 Planning time: 0.146 ms Execution time: 7151.217 ms (6 rows) folio=# explain analyze SELECT COUNT(1) FROM supertenant_mod_inventory_storage.item WHERE lower(f_unaccent(item.jsonb->'status'->>'name')) = lower(f_unaccent('Checked out')); QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Aggregate (cost=1049938.44..1049938.45 rows=1 width=8) (actual time=6624.144..6624.144 rows=1 loops=1) -> Seq Scan on item (cost=0.00..1049759.31 rows=71650 width=0) (actual time=0.431..6615.394 rows=71070 loops=1) Filter: (lower(f_unaccent(((jsonb -> 'status'::text) ->> 'name'::text))) = 'checked out'::text) Rows Removed by Filter: 2837576 Planning time: 0.142 ms Execution time: 6624.164 ms (6 rows) |
| Comments |
| Comment by Julian Ladisch [ 03/Aug/19 ] |
|
https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-LIKE says: Reproducable example: create table t(jsonb jsonb); create unique index on t ((jsonb->>'barcode')); explain analyse select * from t where jsonb->>'barcode' LIKE 'abc'; QUERY PLAN --------------------------------------------------------------------------------------------------------------- Index Scan using t_expr_idx on t (cost=0.15..8.18 rows=1 width=32) (actual time=0.028..0.028 rows=0 loops=1) Index Cond: ((jsonb ->> 'barcode'::text) = 'abc'::text) Filter: ((jsonb ->> 'barcode'::text) ~~ 'abc'::text) Planning Time: 0.571 ms Execution Time: 0.075 ms Note that the query uses the index and uses = to lookup 'abc' in the index. PostgreSQL has already implemented the optimization to use = when the LIKE term has no wildcard. We should not duplicate such code. |
| Comment by Julian Ladisch [ 03/Aug/19 ] |
|
Changing from LIKE to = doesn't make the query fast enough, 1465 ms is still to slow for other reasons. These other reasons are discussed in
|
| Comment by Hongwei Ji [ 29/Aug/19 ] |
|
Could it be that in simple cases, the optimizer can do it right but not in more complex cases like what we observed above? I am not sure it is a wasted effort to try to use the most suitable syntax where you can. I guess I do not have as much faith on the optimizer as Julian does. |
| Comment by Julian Ladisch [ 29/Aug/19 ] |
|
This should be closed as "Won't do". |
| Comment by Hongwei Ji [ 29/Aug/19 ] |
|
The numbers were from platform perf test env. The db was populated from https://s3.amazonaws.com/folio-public-sample-data/perf.tar.gz. It contains a bash file to copy the data to db. |
| Comment by Adam Dickmeiss [ 03/Sep/19 ] |
|
If anything this shows that in many cases it does not make sense to provide make a separate gin-index .. (It modules can simply choose not to define it). Fewer indexes – faster updates/insertion. By the way: if a module does not define a gin-index will our CQL2PgJSON fail a search using == ? Or just be terribly slow? |
| Comment by Jakub Skoczen [ 03/Sep/19 ] |
|
Guys, I think it might make sense to try removing the GIN index from the status field in mod-inventory-storage and verifying the performance. But I don't think we will see general gains. it might make more sense to look at limiting the size of the results in the subquery (see
|
| Comment by Hanna Hulevich [ 31/May/21 ] |
|
Hongwei Ji Jakub Skoczen we are closing this as we think this is not needed. Please reopen if you think in different way. Thanks! |