#1 Browsing by Contributors using dedicated Elasticsearch index
This approach is the same as browsing by subjects.
Pros | Cons |
---|---|
Easy to implement by reusing the existing code base | Requires additional space to store the dedicated index |
 | Requires additional code to manage update and delete events (each batch containing such an event refreshes the Elasticsearch index) |
#2 Browsing by Contributors using PostgreSQL table
This option provides the ability to browse using a PostgreSQL table with an index on the contributor field.
```sql
create table instance_subjects (
    subject     text not null,
    instance_id text not null,
    constraint instance_subject_pk primary key (subject, instance_id)
);

create index instance_subjects_subject
    on diku_mod_search.instance_subjects (lower(subject));
```
...
```sql
select subject, count(*)
from instance_subjects
where subject in (
    select distinct on (lower(subject)) subject
    from instance_subjects
    where lower(subject) >= :anchor
    order by lower(subject)
    limit :limit
)
group by subject
order by lower(subject);
```
Pros | Cons |
---|---|
Fast to query (faster than the other options) | Requires additional space to store the dedicated index (~1 GB per million resources) |
Easy to manage update and delete events | |
#3 Browsing by Contributors using PostgreSQL index and Elasticsearch terms aggregation
This approach can be implemented in two steps:
- Create a new PostgreSQL index on the lowercase contributor values in the instance JSON and browse by it
- Retrieve the counts for each contributor using an Elasticsearch terms aggregation
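The second step can be sketched as an Elasticsearch request body built in Python. This is only an illustration under assumptions: the field name `contributors.name`, the aggregation name `contributor_counts`, and both helper functions are hypothetical, and the keyword field is assumed to hold values in the same form that the first step returns.

```python
from typing import Dict, List, Tuple


def contributor_counts_request(contributors: List[str],
                               field: str = "contributors.name") -> Dict:
    """Build a search body that counts documents for the given contributors only.

    `field` is a hypothetical keyword field name, not taken from the real mapping.
    """
    return {
        "size": 0,  # no hits are needed, only the aggregation buckets
        "aggs": {
            "contributor_counts": {
                "terms": {
                    "field": field,
                    # restrict buckets to the exact values found in step one
                    "include": contributors,
                    # terms aggregation requires a positive size
                    "size": max(len(contributors), 1),
                }
            }
        },
    }


def merge_counts(contributors: List[str],
                 buckets: List[Dict]) -> List[Tuple[str, int]]:
    """Attach counts to contributors, keeping the browse order from step one."""
    counts = {bucket["key"]: bucket["doc_count"] for bucket in buckets}
    return [(name, counts.get(name, 0)) for name in contributors]
```

Keeping the browse order from the PostgreSQL step and treating missing buckets as a zero count means the aggregation result never has to be re-sorted.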
Pros | Cons |
---|---|
Can be slightly better than option #1, because there is no need to store and manage a dedicated index or table | Additional load on the existing data storage and mod-inventory-storage |
No need to manage update and delete events | An additional index can slow down document indexing for mod-inventory-storage |
 | Slower than option #2 |
#4 Browse by range query and numeric representation of incoming value
This approach can be modeled on Call-Number browsing. The main idea is to derive a numeric (long) value from the contributor string and use it in a range query to limit the number of documents to aggregate.
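One way to picture the numeric representation is the minimal sketch below. It is not the actual Call-Number implementation: the alphabet, the 12-character prefix length, the field name `contributorNumeric`, and both function names are assumptions chosen for illustration. The encoding reads the first characters of the lowercased string as digits of a base-37 number, so numeric order follows alphabetical order of the prefix.

```python
# Hypothetical alphabet: space sorts first, then digits, then letters.
ALPHABET = " 0123456789abcdefghijklmnopqrstuvwxyz"
RANK = {ch: index for index, ch in enumerate(ALPHABET)}
BASE = len(ALPHABET)  # 37
PREFIX_LENGTH = 12    # 37**12 is below 2**63, so the value fits a signed long


def to_sortable_long(value: str) -> int:
    """Encode the first PREFIX_LENGTH characters as a base-37 number.

    Characters outside ALPHABET are treated like a space (rank 0), and
    short strings are padded with spaces, so they sort before longer ones.
    """
    normalized = value.lower()
    result = 0
    for position in range(PREFIX_LENGTH):
        ch = normalized[position] if position < len(normalized) else " "
        result = result * BASE + RANK.get(ch, 0)
    return result


def browse_range_query(anchor: str, field: str = "contributorNumeric") -> dict:
    """Build an Elasticsearch range query selecting values at or after the anchor."""
    return {"range": {field: {"gte": to_sortable_long(anchor)}}}
```

Because only a fixed-length prefix is encoded, distinct contributors that share their first 12 characters receive the same number, so the range query narrows the candidate set rather than producing an exact ordering.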
Items to check/investigate:
- Retrieve the number of instances per contributor
- Decide how to handle redundant contributors in the terms aggregation result (script query?)
Pros | Cons |
---|---|
Approximately the same performance as Call-Number browsing | An additional numeric value must be stored in each document for each contributor |
No need to store a dedicated Elasticsearch index, PostgreSQL table, or index | |