Browsing by Contributors using dedicated Elasticsearch index
This approach is the same as browsing by subjects.
Pros | Cons |
---|---|
Easy to implement by reusing existing code base | Requires additional space to store the dedicated index |
Browsing by Contributors using PostgreSQL table
Database schema
create table instance_subjects ( subject text not null, instance_id text not null, constraint instance_subject_pk primary key (subject, instance_id) ); create index instance_subjects_subject on diku_mod_search.instance_subjects (lower(subject));
Insertions can be done in batch, which can be done configuring Spring Data Jpa:
Insert script
insert into instance_subjects(instance_id, subject) values (?,?) on conflict do nothing;
Required configuration
spring: jpa: properties: hibernate: order_inserts: true order_updates: true jdbc.batch_size: 500
Java Entities
@Data @Entity @NoArgsConstructor @Table(name = "instance_subjects") @AllArgsConstructor(staticName = "of") @SQLInsert(sql = "insert into instance_subjects(instance_id, subject) values (?, ?) on conflict do nothing") public class InstanceSubjectEntity implements Persistable<InstanceSubjectEntityId> { @EmbeddedId private InstanceSubjectEntityId id; @Override public boolean isNew() { return true; } } @Data @Embeddable @NoArgsConstructor @AllArgsConstructor(staticName = "of") public class InstanceSubjectEntityId implements Serializable { private String subject; private String instanceId; }
Preceding Query
select subject, count(*) from instance_subjects where subject in ( select distinct on (lower(subject)) subject from instance_subjects where lower(subject) < :anchor order by lower(subject) desc limit :limit ) group by subject order by lower(subject);
Succeeding Query
select subject, count(*) from instance_subjects where subject in ( select distinct on (lower(subject)) subject from instance_subjects where lower(subject) >= :anchor order by lower(subject) limit :limit ) group by subject order by lower(subject);
Pros | Cons |
---|---|
Fast | Requires additional space to store the dedicated index (~1Gb per million resources) |
Browsing by Contributors using PostgreSQL index and Elasticsearch terms aggregation
This approach can be implemented in two steps:
- Create new index for lowercase contributor values from instance jsons and browse by it
- Retrieve counts using terms aggregation per contributor entity
Pros | Cons |
---|---|
It can be slightly better than 1st option, because there is no need to store and manage dedicated index or table | Additional load to the existing data storage and mod-inventory-storage |
Additional index can slow down document indexing for mod-inventory-storage | |
Slower than 2nd option |