
#1 Browsing by Contributors using dedicated Elasticsearch index

This approach is the same as browsing by subjects.

Pros	Cons
Easy to implement by reusing existing code base	Requires additional space to store the dedicated index
	Requires additional code to manage update and delete events (each batch containing such an event refreshes the Elasticsearch index)

#2 Browsing by Contributors using PostgreSQL table

This option provides the ability to browse using a PostgreSQL table with an index on the contributors field.

Database schema (SQL)

-- One row per (subject, instance) pair.
create table instance_subjects
(
    subject     text not null,
    instance_id text not null,
    constraint instance_subject_pk
        primary key (subject, instance_id)
);

-- Expression index that supports case-insensitive range scans on subject.
create index instance_subjects_subject
    on diku_mod_search.instance_subjects (lower(subject));

...

Succeeding query (SQL)

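-- Count instances for the next :limit subjects at or after :anchor.
-- :anchor is compared with lower(subject), so it should be passed in lowercase.
select subject, count(*)
from instance_subjects
where subject in (
  -- One spelling per lowercase value, in case-insensitive order.
  select distinct on (lower(subject)) subject
  from instance_subjects
  where lower(subject) >= :anchor
  order by lower(subject)
  limit :limit
)
group by subject
order by lower(subject);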
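
To make the result shape of the succeeding query easier to reason about, the following is a minimal in-memory sketch of the same logic in Python. It is not part of the proposed implementation: the function name `browse_succeeding` and the sample rows are hypothetical, Python string comparison stands in for the PostgreSQL collation, and the first spelling seen is used where `distinct on` would pick an unspecified row.

```python
from collections import Counter


def browse_succeeding(rows, anchor, limit):
    """Emulate the succeeding query over a list of (subject, instance_id) pairs."""
    # Inner select: keep one spelling per lowercase value at or after the anchor.
    representatives = {}
    for subject, _instance_id in rows:
        key = subject.lower()
        if key >= anchor:
            # Assumption: first spelling seen wins; PostgreSQL does not guarantee which.
            representatives.setdefault(key, subject)

    # order by lower(subject) limit :limit
    page = [representatives[key] for key in sorted(representatives)[:limit]]

    # Outer select: count rows whose subject equals a chosen spelling exactly.
    selected = set(page)
    counts = Counter(subject for subject, _instance_id in rows if subject in selected)
    return [(subject, counts[subject]) for subject in page]


if __name__ == "__main__":
    sample = [
        ("History", "i1"), ("history", "i2"), ("History", "i3"),
        ("Art", "i1"), ("Music", "i2"), ("Zoology", "i4"),
    ]
    print(browse_succeeding(sample, "h", 2))
```

In this sketch, as in the query, spellings that differ only by case are counted separately, and only the chosen spelling is returned for each lowercase value.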

Pros	Cons
Fast to query (faster than other options)	Requires additional space to store the dedicated index (~1 GB per million resources)
Easy to manage update and delete events

#3 Browsing by Contributors using PostgreSQL index and Elasticsearch terms aggregation

This approach can be implemented in two steps:

1. Create a new index on lowercase contributor values from the instance JSON and browse by it.
2. Retrieve counts per contributor entity using a terms aggregation.
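
The second step could look roughly like the sketch below, which builds an Elasticsearch search body that restricts a terms aggregation to the contributor values returned by the browse step. The field name `contributors.name`, the aggregation name `contributor_counts`, and both function names are assumptions made for illustration; the field is assumed to be mapped as a keyword.

```python
import json


def contributor_counts_request(contributors, field="contributors.name"):
    """Build a search body that counts documents per given contributor value."""
    values = list(contributors)
    return {
        "size": 0,  # only aggregation results are needed, not hits
        "query": {"terms": {field: values}},
        "aggs": {
            "contributor_counts": {
                "terms": {
                    "field": field,
                    "include": values,  # exact values to keep as buckets
                    "size": max(len(values), 1),
                }
            }
        },
    }


def parse_counts(response):
    """Map bucket keys to document counts from a search response."""
    buckets = response["aggregations"]["contributor_counts"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


if __name__ == "__main__":
    print(json.dumps(contributor_counts_request(["Anton", "Bach"]), indent=2))
```

The `include` list keeps buckets for other contributors on the same documents out of the result, which is one possible way to avoid redundant values in the aggregation.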

Pros	Cons
It can be slightly better than option #1, because there is no need to store and manage a dedicated index or table	Additional load on the existing data storage and mod-inventory-storage
No need to manage update and delete events	An additional index can slow down document indexing for mod-inventory-storage
	Slower than option #2

#4 Browse by range query and numeric representation of incoming value

This approach can be based on Call-Number browsing. The main idea is to compute a long (64-bit integer) value from the string and use it in a range query to limit the number of documents to aggregate.
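
One possible way to derive such a value is sketched below. It is an illustration under stated assumptions, not the Call-Number browsing algorithm itself: the 37-symbol alphabet, the 12-character prefix, the field name `contributor_numeric`, and the function names are all hypothetical. The encoding treats the lowercase prefix as a base-37 number, so strings that sort earlier never get a larger value.

```python
# Assumed alphabet: space sorts lowest, then digits, then letters.
ALPHABET = " 0123456789abcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)   # 37
PREFIX_LENGTH = 12     # 37**12 is below 2**63, so the value fits a signed long
RANK = {ch: index for index, ch in enumerate(ALPHABET)}


def to_long(value):
    """Encode the first PREFIX_LENGTH characters as an order-preserving integer."""
    normalized = value.lower()[:PREFIX_LENGTH]
    result = 0
    for position in range(PREFIX_LENGTH):
        # Short strings are padded with the lowest symbol.
        ch = normalized[position] if position < len(normalized) else " "
        # Characters outside the alphabet are treated like a space (assumption).
        result = result * BASE + RANK.get(ch, 0)
    return result


def range_query(anchor, field="contributor_numeric"):
    """Build a range clause selecting values at or after the anchor."""
    return {"range": {field: {"gte": to_long(anchor)}}}


if __name__ == "__main__":
    for name in ["anton", "antonov", "bach", "zola"]:
        print(name, to_long(name))
    print(range_query("bach"))
```

Because only a fixed prefix is encoded, values that share their first 12 characters map to the same number, so the range query narrows the candidate set but the final ordering still has to be done on the string values.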

Items to check/investigate:

Retrieve the number of instances per contributor
Decide how to handle redundant contributors in the terms aggregation result (script query?)

Pros	Cons
Approximately the same performance as Call-Number browsing	An additional value must be stored within each document: a numeric value for each contributor
No need to store a dedicated Elasticsearch index, PostgreSQL table, or PostgreSQL index
