...
This approach is the same as browsing by subjects, but with a few extensions.
Pros | Cons |
---|---|
Easy to implement by reusing the existing code base | Requires additional space to store the dedicated index with all linked instance IDs |
Requires only Elasticsearch and Kafka | Requires integration with Kafka to prevent race conditions and to avoid optimistic locking on Elasticsearch documents |
 | Requires additional code to manage update and delete events (each batch containing such an event refreshes the Elasticsearch index) |
Implementing this approach takes several steps:
- Send contributor events from the instance to the search.instance-contributor topic (message key = SHA-1 hash of the concatenation tenantId + contributorTypeNameId + name)
- Implement a new Kafka listener that consumes contributor events only (Kafka places contributors with the same key in the same partition, which allows event processing without optimistic locking)
- Implement a new InstanceContributor repository that issues the following upsert request for groups of events
- Implement a query that finds a contributor by exact name and type (this can be CQL with a double equals sign, although John and John. appear to be treated as the same value by the Elasticsearch standard tokenizer)
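The message key from the first step can be sketched as follows. This is a minimal illustration, assuming the three parts are concatenated without a delimiter and the digest is hex-encoded; the class name `ContributorKey`, the method `messageKey`, and the sample values are hypothetical and not taken from the existing code base.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ContributorKey {

  // Hypothetical helper: SHA-1 over tenantId + contributorTypeNameId + name, hex-encoded.
  static String messageKey(String tenantId, String contributorTypeNameId, String name) {
    try {
      MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
      String source = tenantId + contributorTypeNameId + name;
      byte[] digest = sha1.digest(source.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-1 is one of the algorithms every Java platform must provide.
      throw new IllegalStateException("SHA-1 is not available", e);
    }
  }

  public static void main(String[] args) {
    // Sample values are placeholders for illustration only.
    String key = messageKey("diku", "contributorTypeNameId", "Antoniou, Grigoris");
    System.out.println(key);
  }
}
```

Because the key is deterministic, every event for the same tenant, contributor type, and name carries the same key, so Kafka's default partitioner routes those events to the same partition.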
Upsert query example:
No Format |
---|
{
  "script": {
    "source": "def set=new LinkedHashSet(ctx._source.instances);set.addAll(params.ins);params.del.forEach(set::remove);ctx._source.instances=set",
    "lang": "painless",
    "params": {
      "ins": [ "instanceId#1|contributorTypeId#1", "instanceId#2|contributorTypeId#1", "instanceId#3|contributorTypeId#1" ],
      "del": [ "instanceId#4|contributorTypeId#1" ]
    }
  },
  "upsert": {
    "id": "abshc",
    "name": "Antoniou, Grigoris",
    "contributorTypeNameId": "contributorTypeNameId",
    "instances": [ "instanceId#1|contributorTypeId#1", "instanceId#2|contributorTypeId#1", "instanceId#3|contributorTypeId#1" ]
  }
} |
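To make the effect of the script on an existing document easier to follow, the same set logic can be reproduced in plain Java. This is only an in-memory sketch of what the Painless source does with `ins` and `del`; the class `InstanceSetMerge` and the method `merge` are hypothetical names, and no Elasticsearch client is involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class InstanceSetMerge {

  // Mirrors the Painless script: start from the stored list, add "ins", then remove "del".
  static List<String> merge(List<String> stored, List<String> ins, List<String> del) {
    Set<String> set = new LinkedHashSet<>(stored); // drops duplicates, keeps insertion order
    set.addAll(ins);
    del.forEach(set::remove);
    return new ArrayList<>(set);
  }

  public static void main(String[] args) {
    List<String> stored = List.of("instanceId#1|contributorTypeId#1", "instanceId#4|contributorTypeId#1");
    List<String> ins = List.of("instanceId#1|contributorTypeId#1", "instanceId#2|contributorTypeId#1");
    List<String> del = List.of("instanceId#4|contributorTypeId#1");
    // prints [instanceId#1|contributorTypeId#1, instanceId#2|contributorTypeId#1]
    System.out.println(merge(stored, ins, del));
  }
}
```

Applying the same event twice leaves the result unchanged, which is why replayed Kafka batches do not corrupt the `instances` list.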
#2 Browsing by Contributors using PostgreSQL table
...
Code Block |
---|
language | sql |
---|
title | Database schema |
---|
|
create table instance_contributors
(
    contributor text not null,
    instance_id text not null,
    constraint instance_contributors_pk
        primary key (contributor, instance_id)
);
create index instance_contributors_contributor
    on diku_mod_search.instance_contributors (lower(contributor)); |
Insertions can be batched by configuring Spring Data JPA:
Code Block |
---|
language | sql |
---|
title | Insert script |
---|
|
insert into instance_contributors(instance_id, contributor) values (?,?) on conflict do nothing; |
...
Code Block |
---|
language | java |
---|
title | Java Entities |
---|
|
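// Imports assume Spring Boot 2.x (javax.persistence); newer stacks use jakarta.persistence instead.
import java.io.Serializable;
import javax.persistence.Embeddable;
import javax.persistence.EmbeddedId;
import javax.persistence.Entity;
import javax.persistence.Table;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.hibernate.annotations.SQLInsert;
import org.springframework.data.domain.Persistable;

@Data
@Entity
@NoArgsConstructor
@Table(name = "instance_contributors")
@AllArgsConstructor(staticName = "of")
@SQLInsert(sql = "insert into instance_contributors(instance_id, contributor) values (?, ?) on conflict do nothing")
public class InstanceContributorEntity implements Persistable<InstanceContributorEntityId> {

  @EmbeddedId
  private InstanceContributorEntityId id;

  // Always report the entity as new so that save() persists it directly instead of merging.
  @Override
  public boolean isNew() {
    return true;
  }
}

@Data
@Embeddable
@NoArgsConstructor
@AllArgsConstructor(staticName = "of")
public class InstanceContributorEntityId implements Serializable {

  private String contributor;
  private String instanceId;
} |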
...
Code Block |
---|
language | sql |
---|
title | Preceding Query |
---|
|
select contributor, count(*)
from instance_contributors
where contributor in (
    select distinct on (lower(contributor)) contributor
    from instance_contributors
    where lower(contributor) < :anchor
    order by lower(contributor) desc
    limit :limit
)
group by contributor
order by lower(contributor); |
Code Block |
---|
language | sql |
---|
title | Succeeding Query |
---|
|
select contributor, count(*)
from instance_contributors
where contributor in (
    select distinct on (lower(contributor)) contributor
    from instance_contributors
    where lower(contributor) >= :anchor
    order by lower(contributor)
    limit :limit
)
group by contributor
order by lower(contributor); |
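The browse-around-anchor logic of the two queries can be illustrated in memory. The sketch below is not the repository code: `ContributorBrowse` and its methods are hypothetical names, it groups by the lower-cased name (the SQL groups by the exact value), and Java string ordering may differ from the PostgreSQL collation.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ContributorBrowse {

  // One input element per (contributor, instance) row; the value is the instance count.
  static TreeMap<String, Integer> countByContributor(List<String> contributorRows) {
    TreeMap<String, Integer> counts = new TreeMap<>();
    for (String contributor : contributorRows) {
      counts.merge(contributor.toLowerCase(), 1, Integer::sum);
    }
    return counts;
  }

  // Up to "limit" contributors strictly before the anchor, returned in ascending order.
  static Map<String, Integer> preceding(TreeMap<String, Integer> counts, String anchor, int limit) {
    TreeMap<String, Integer> page = new TreeMap<>();
    for (Map.Entry<String, Integer> e
        : counts.headMap(anchor.toLowerCase(), false).descendingMap().entrySet()) {
      if (page.size() == limit) {
        break;
      }
      page.put(e.getKey(), e.getValue());
    }
    return page;
  }

  // Up to "limit" contributors at or after the anchor, returned in ascending order.
  static Map<String, Integer> succeeding(TreeMap<String, Integer> counts, String anchor, int limit) {
    TreeMap<String, Integer> page = new TreeMap<>();
    for (Map.Entry<String, Integer> e : counts.tailMap(anchor.toLowerCase(), true).entrySet()) {
      if (page.size() == limit) {
        break;
      }
      page.put(e.getKey(), e.getValue());
    }
    return page;
  }

  public static void main(String[] args) {
    TreeMap<String, Integer> counts = countByContributor(List.of(
        "Adams", "Antoniou, Grigoris", "Antoniou, Grigoris", "Brown", "Clark", "Davis"));
    System.out.println(preceding(counts, "b", 2));
    System.out.println(succeeding(counts, "b", 2));
  }
}
```

A browse response is then the preceding page followed by the succeeding page, with the anchor itself (when present) appearing as the first succeeding entry.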
Pros | Cons |
---|---|
Fast to query (faster than other options) | Requires additional space to store the dedicated index (~1 GB per million resources) |
Easy to manage update and delete events | |
...
Pros | Cons |
---|---|
Approximately the same performance as Call-Number Browsing | An additional value must be stored within each document: a numeric value for each contributor |
No need to store a dedicated Elasticsearch index, PostgreSQL table, or index | There is no way to collect facets for contributor type/name |
 | Filtering contributors by type will also be hard |
 | In case of large collisions (2000-3000 resources per contributor), the response will be slow |