Skip to end of banner
Go to start of banner

MSEARCH-320 Technical approach to browse Contributors

Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

Version 1 Next »

Browsing by Contributors using dedicated Elasticsearch index

This approach is the same as browsing by subjects. 

ProsCons

Easy to implement by reusing existing code base

Requires additional space to store the dedicated index

Browsing by Contributors using PostgreSQL table


Database schema
create table instance_subjects
(
    subject     text not null,
    instance_id text not null,
    constraint instance_subject_pk
        primary key (subject, instance_id)
);

create index instance_subjects_subject
    on diku_mod_search.instance_subjects (lower(subject));

Insertions can be done in batch, which can be done configuring Spring Data Jpa:

Insert script
insert into instance_subjects(instance_id, subject) values (?,?) on conflict do nothing;
Required configuration
spring:
  jpa:
    properties:
      hibernate:
        order_inserts: true
        order_updates: true
        jdbc.batch_size: 500
Java Entities
@Data
@Entity
@NoArgsConstructor
@Table(name = "instance_subjects")
@AllArgsConstructor(staticName = "of")
@SQLInsert(sql = "insert into instance_subjects(instance_id, subject) values (?, ?) on conflict do nothing")
public class InstanceSubjectEntity implements Persistable<InstanceSubjectEntityId> {

  @EmbeddedId
  private InstanceSubjectEntityId id;

  @Override
  public boolean isNew() {
    return true;
  }
}

@Data
@Embeddable
@NoArgsConstructor
@AllArgsConstructor(staticName = "of")
public class InstanceSubjectEntityId implements Serializable {

  private String subject;
  private String instanceId;
}
Preceding Query
select subject, count(*)
from instance_subjects
where subject in (
  select distinct on (lower(subject)) subject
  from instance_subjects
  where lower(subject) < :anchor
  order by lower(subject) desc
  limit :limit
)
group by subject
order by lower(subject);


Succeeding Query
select subject, count(*)
from instance_subjects
where subject in (
  select distinct on (lower(subject)) subject
  from instance_subjects
  where lower(subject) >= :anchor
  order by lower(subject)
  limit :limit
)
group by subject
order by lower(subject);
ProsCons

Fast

Requires additional space to store the dedicated index (~1Gb per million resources)

Browsing by Contributors using PostgreSQL index and Elasticsearch terms aggregation

This approach can be implemented in two steps:

  1. Create new index for lowercase contributor values from instance jsons and browse by it
  2. Retrieve counts using terms aggregation per contributor entity
ProsCons

It can be slightly better than 1st option, because there is no need to store and manage dedicated index or table

Additional load to the existing data storage and mod-inventory-storage

Additional index can slow down document indexing for mod-inventory-storage

Slower than 2nd option
  • No labels