Investigation: mod-orders/mod-orders-storage performance
1. Performance testing methods (general information)
Performance testing methods for mod-orders-storage should cover common operations: search (by one/several* fields), filtering (by one/several* fields), filtering & sorting (by one/several* fields). Corresponding endpoints for GET requests for covering common operations are presented in the Table 1.1. JMeter script for measurement performance markers can be founded here mod-orders-storage-perf-scenario.zip. Default script configuration designed to perform 1K requests in one thread for each endpoint from Table 1.1. This configuration can be simple improved/changed in future to obtain more precise performance markers.
Table 1.1 - Endpoints for reference GET requests
Search | |
/orders-storage/purchase-orders?query=poNumber==${poNumber} | Searching purchase order/po-line records by unique value for both direct endpoints and po+po-line view. One need manually set poNumber/poLineNumber value whenever possible in the middle of the range of corresponding values. |
/orders-storage/orders?query=poNumber==${poNumber} | |
/orders-storage/po-lines?query=poLineNumber==${poLineNumber} | |
/orders-storage/order-lines?query=poLineNumber==${poLineNumber} | |
Filtering | |
/orders-storage/purchase-orders?query=workflowStatus==Open | Filtering purchase order/po-line records by multiple same value for both direct endpoints and po+po-line view. |
/orders-storage/orders?query=workflowStatus==Open | |
/orders-storage/po-lines?query=orderFormat==Physical Resource | |
/orders-storage/order-lines?query=orderFormat==Physical Resource | |
Sorting | |
/orders-storage/orders?query=workflowStatus==Open AND orderFormat=Physical Resource sortBy metadata.createdDate/sort.ascending | Filtering and sorting purchase order/po-line records by multiple same value for po+po-line view. |
/orders-storage/order-lines?query=workflowStatus==Open AND orderFormat=Physical Resource sortBy metadata.createdDate/sort.ascending |
* At the present moment searching and filtering performance testing implemented only for one field. It is interesting to add scenarios for several fields searching/filtering.
2. Indexes
2.1 General information
Current RMB implementation contains some PostgreSQL indexes. Correspondence between RMB indexes and standard PostgreSQL indexes is presented in the Table 2.1.
Table 2.1 - Correspondence between RMB indexes and standard PostgreSQL indexes
PostgreSQL index | Description | RMB index type | RMB-generated snippet | Folio schemas application area |
---|---|---|---|---|
B-tree | B-trees fit the most common situations. B-trees can handle equality and range queries on data that can be sorted into some ordering. | index | CREATE INDEX IF NOT EXISTS ${table.tableName}_${indexes.fieldName}_idx ON ${myuniversity}_${mymodule}.${table.tableName} | B-trees can handle equality and range queries on data that can be sorted into some ordering. This is most common case. |
uniqueIndex | CREATE UNIQUE INDEX IF NOT EXISTS ${table.tableName}_${indexes.fieldName}_idx_unique ON ${myuniversity}_${mymodule}.${table.tableName} ( ${indexes.fieldPath} ) | Unique indexes. This should be used for unique fields, for example, poNumber. *Currently, only B-tree indexes can be declared unique https://www.postgresql.org/docs/11/indexes-unique.html. | ||
likeIndex | CREATE INDEX IF NOT EXISTS ${table.tableName}_${indexes.fieldName}_idx_like ON ${myuniversity}_${mymodule}.${table.tableName} ((${indexes.fieldPath}) text_pattern_ops) | Due to text_pattern_ops the values are compared strictly character by character rather than according to the locale-specific collation rules. This makes these operator classes suitable for use by queries involving pattern matching expressions. | ||
Hash | Hash indexes can only handle simple equality comparisons. The query planner will consider using a hash index whenever an indexed column is involved in a comparison using the = operator. | Not supported | ||
GiST | GiST indexes are not a single kind of index, but rather an infrastructure within which many different indexing strategies can be implemented. The standard distribution of PostgreSQL includes GiST operator classes for several two-dimensional geometric data types, which support indexed queries using these operators: <<, &<, &>, >>, <<|, &<|, |&>, |>>, @>, <@, ~=, &&. GiST indexes are also capable of optimizing “nearest-neighbor” searches. | Not supported | ||
SP-GiST | SP-GiST indexes, like GiST indexes, offer an infrastructure that supports various kinds of searches. SP-GiST permits implementation of a wide range of different non-balanced disk-based data structures, such as quadtrees, k-d trees, and radix trees (tries). As an example, the standard distribution of PostgreSQL includes SP-GiST operator classes for two-dimensional points, which support indexed queries using these operators: <<, >>, ~=, <@, <^, >^. | Not supported | ||
GIN | GIN indexes are “inverted indexes” which are appropriate for data values that contain multiple component values, such as arrays. An inverted index contains a separate entry for each component value, and can efficiently handle queries that test for the presence of specific component values. | ginIndex | CREATE INDEX IF NOT EXISTS ${table}_${field}_idx_gin ON ${tenant}_${module}.${table} USING GIN ((${indexes.fieldPath}) gin_trgm_ops) | The gin_trgm_ops module provides GIN index operator classes that allows to create an index over a text column for the purpose of very fast similarity searches. These indexes do not support equality nor simple comparison operators. |
fullTextIndex | CREATE INDEX IF NOT EXISTS ${table}_${indexes.fieldName}_idx_ft ON ${myuniversity}_${mymodule}.${table.tableName} USING GIN ( to_tsvector('${ft_defaultDictionary}', ${indexes.fieldPath}) ); | This index can be used for used for full text searching, for example, comments, notes, etc. by using function to_tsvector that converts a document to the tsvector data type. | ||
BRIN | BRIN indexes (a shorthand for Block Range INdexes) store summaries about the values stored in consecutive physical block ranges of a table. | Not supported |
2.2 Purchase order, purchase order line, piece indexes
In scope of - MODORDSTOR-85Getting issue details... STATUS the following indexes were introduced for PO, PO Line, Piece (Table 2.2)
Table 2.2 - Correspondence between RMB indexes and PO, PO Line, Piece fields
RMB index | Field | |
---|---|---|
index | PO | assignedTo, metadata.createdDate, metadata.createdByUserId, workflowStatus, approved, dateOrdered, orderType, reEncumber, renewal.manualRenewal, renewal.renewalDate, renewal.reviewPeriod |
PO Line | eresource.accessProvider, eresource.activated, eresource.expectedActivation, eresource.trial, eresource.materialType, details.subscriptionFrom, details.subscriptionTo, location.locationId, physical.receiptDue, physical.expectedReceiptDate, physical.volumes, physical.materialType, claim.grace, claim.sent, collection, receiptStatus, paymentStatus, orderFormat, receiptDate, metadata.createdDate, rush, source | |
Piece | format, itemId, locationId, poLineId, receivingStatus, receivedDate, supplement | |
uniqueIndex | PO | poNumber |
PO Line | - | |
Piece | - | |
ginIndex | PO | - |
PO Line | fundDistribution.code, details.productIds, title, vendor.vendorAccount, vendor.refNumber, tags, contributors, poLineNumber, requester | |
Piece | caption | |
fullTextIndex | PO | closeReason.reason, closeReason.note |
PO Line | publisher, donor, selector | |
Piece | comment, caption |
Indexes can be verified by using query:
SELECT tablename, indexname, indexdef FROM pg_indexes WHERE schemaname = 'diku_mod_orders_storage' ORDER BY tablename, indexname;
2.3 Performance testing results
Since there is currently no reference environment for measuring performance markers, it is difficult to obtain a time trend for their values. Therefore, it makes sense to compare values before/after critical changes, however, measured on the same environment. Performance testing results (Table 2.3) was measured and averaged over 1000 iterations for requests from Table 1.1 based on sample data generated with using data loader executed with configuration:
number of records (po + po_line) : 5000
orders distribution: open (50%), pending (20%), closed (30%);
order lines distribution: physical (50%), electronic (20%), p/e_mix (30%).
Table 2.3 - Performance testing results (from JMeter)
Operation | Average, ms | Median, ms | 90% Line, ms | 95% Line, ms | 99% Line, ms | Min, ms | Max, ms | Throughput, op/sec |
Non-indexed version | ||||||||
Search : Orders (purchase-orders) | 25 | 23 | 33 | 37 | 58 | 18 | 82 | 0.37987 |
Search : Orders (orders) | 518 | 511 | 541 | 557 | 638 | 484 | 1230 | 0.37980 |
Search : Order-Lines (po-lines) | 91 | 89 | 100 | 106 | 128 | 82 | 186 | 0.37988 |
Search : Order-Lines (order-lines) | 303 | 299 | 321 | 336 | 388 | 280 | 499 | 0.37985 |
Filtering : Orders (purchase-orders) | 22 | 21 | 27 | 31 | 38 | 18 | 76 | 0.37990 |
Filtering : Orders (orders) | 387 | 383 | 408 | 416 | 477 | 360 | 683 | 0.37985 |
Filtering : Order-Lines (po-lines) | 74 | 73 | 83 | 87 | 102 | 67 | 129 | 0.37990 |
Filtering : Order-Lines (order-lines) | 204 | 201 | 220 | 230 | 255 | 187 | 328 | 0.37989 |
Sorting : Orders | 511 | 505 | 536 | 549 | 629 | 467 | 932 | 0.37985 |
Sorting : Order-Lines | 493 | 463 | 490 | 508 | 596 | 433 | 13357 | 0.37985 |
Indexed version | ||||||||
Search : Orders (purchase-orders) | 26 | 25 | 35 | 39 | 62 | 19 | 112 | 0.44100 |
Search : Orders (orders) | 315 | 308 | 330 | 367 | 437 | 291 | 667 | 0.44095 |
Search : Order-Lines (po-lines) | 76 | 73 | 85 | 93 | 114 | 69 | 237 | 0.44101 |
Search : Order-Lines (order-lines) | 289 | 283 | 305 | 334 | 401 | 268 | 545 | 0.44097 |
Filtering : Orders (purchase-orders) | 17 | 16 | 23 | 25 | 30 | 14 | 69 | 0.44104 |
Filtering : Orders (orders) | 393 | 385 | 412 | 460 | 534 | 367 | 649 | 0.44097 |
Filtering : Order-Lines (po-lines) | 47 | 45 | 55 | 59 | 70 | 42 | 106 | 0.44103 |
Filtering : Order-Lines (order-lines) | 177 | 173 | 192 | 203 | 244 | 163 | 339 | 0.44101 |
Sorting : Orders | 480 | 471 | 502 | 547 | 660 | 450 | 818 | 0.44096 |
Sorting : Order-Lines | 434 | 424 | 460 | 505 | 611 | 402 | 733 | 0.44097 |
Comparison of the non-indexed and indexed version is presented in Table 2.4.
Table 2.4 - Average time decreasing (positive values are expected - time should be decreasing), throughput increasing (positive values are expected - throughput should be increasing)
Operation | Average time decreasing, % | Throughput increasing, % |
---|---|---|
Search : Orders (purchase-orders) | -4 | 16,09 |
Search : Orders (orders) | 39 | 16,10 |
Search : Order-Lines (po-lines) | 16 | 16,09 |
Search : Order-Lines (order-lines) | 5 | 16,09 |
Filtering : Orders (purchase-orders) | 23 | 16,09 |
Filtering : Orders (orders) | -2 | 16,09 |
Filtering : Order-Lines (po-lines) | 36 | 16,09 |
Filtering : Order-Lines (order-lines) | 13 | 16,09 |
Sorting : Orders | 6 | 16,09 |
Sorting : Order-Lines | 12 | 16,09 |
2.4 Indexes: summary
The Table 2.4 demonstrates that there is a general trend of decreasing the average request time for most endpoints from Table 1.1. For some types of requests (standard order searching and orders filtering by view) the changes are slightly negative, indicating a degradation of performance after the introduction of indexes. However, the magnitude of the decrease is small enough to talk about performance degradation. In general, one can conclude that the introduction of indexes for the fields has a positive effect on the performance of the mod-orders-storage. The throughput for all request types increased by about 16% which also indicates an improvement in performance due to the introduction of indexes.
3. Performance analysis
3.1 Reference testing methodology
3.1.1 Methodology
Open system load.
Virtual users arrive in a certain amount at equal time intervals and cyclically perform the sequence of operations C-U-R-D. This load mode corresponds to an open system. Open system have no control over the number concurrent users: users keep on arriving even though applications has some trouble serving them. This load mode allows to study the dependence of the response time of the system depending on the number of clients/number of active requests in the conditions of a real service workload and peak service throughtput for which service fails.
Constant load for a fixed time / number of iterations.
Such a load scenario implies the cyclic execution of sequential operations C-U-R-D for virtual users, the number of which is constant during the test. This load scenario is used for the main purpose - measuring the throughput of services in the case of one (or several) request flows. Additionally, it can be used to check the stability of the system over time when the service is used continuously for a long time.
3.1.2 JMeter script
In order to analyze performance changes one need to have a reference script that allows to cover scenarios that are close to the actual module operating. Among these scenarios the most universal are CRUD operations and Searching/Filtering/Sorting. At the same time it is important to analyze the performance of the system for parallel requests from different clients. JMeter script has been developed for this purposes: https://github.com/folio-org/folio-perf-test/tree/mod-orders-performance-test/Folio-Test-Plans/mod-orders/performance-test.
At the moment it contains the following scenarious:
- CRUD operations (cyclic creating, updating, reading and deleting) for order and order-line;
- Searching/Filtering/Sorting for orders/order-lines by query.
In accordance with JMeter ramp-up period feature this script allows to execute scenarious cyclically with increasing the number of clients up to num_of_threads every ramp_up_period (in seconds) during test_duration (in seconds). This means for typical configuration num_of_threads=20, ramp_up_period=1200 and during test_duration=1200 that each new thread will be started in 1200/20 = 60 s and all test execution will be finished after 1200 s. It is important that test_duration >= ramp_up_period so that all threads are created during the script execution.
3.1.3 Gatling tool
3.2 Results analysis
TBD
4. Performance-related stories list
- MODORDSTOR-85Getting issue details... STATUS : PO, PO Line, Piece fields indexes introduced