Reindex of contributors fails with 'index row size exceeds maximum 2712' error

Description

I recently reindexed our full inventory data set at Chicago and found that the reindex failed on instance_contributors_idx with this error: "index row size 3256 exceeds maximum 2712 for index". It turns out that we have around 6000 instances with contributors whose aggregate length exceeds 2712 characters. As a workaround, I removed the index and completed the reindex with no other problems. The application, including searches, seemed to work just fine without it, presumably because each of these jsonb collections also has gin and full text indexes defined for it. It seems that creating a btree index on contributors, which converts the whole array of contributors to text and uses that text as the index key, is too brittle to accommodate the wide range of contributors that research libraries typically have. The same may be true for btree indexes on other types of jsonb collection. Also, I don't actually see any need for this type of index.
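To estimate which records are affected before a reindex, a rough check along the following lines may help. This is only a sketch under stated assumptions: the function name `oversized_contributors` and the shape of the input (a list of instance dicts with `id` and `contributors` keys) are hypothetical, and `json.dumps` does not reproduce PostgreSQL's exact jsonb-to-text formatting, nor any compression PostgreSQL may apply before storing an index entry. The byte count is therefore an approximation of the index row size, not the value the database reports.

```python
import json

# Limit in bytes, taken from the error message quoted in the description.
BTREE_ROW_LIMIT = 2712


def oversized_contributors(instances, limit=BTREE_ROW_LIMIT):
    """Return the ids of instances whose serialized contributors exceed `limit` bytes.

    `instances` is assumed to be an iterable of dicts that each carry an
    "id" and an optional "contributors" list (hypothetical input shape).
    """
    flagged = []
    for instance in instances:
        # Serialize the whole contributors array, as the btree index key does.
        text = json.dumps(instance.get("contributors", []), ensure_ascii=False)
        if len(text.encode("utf-8")) > limit:
            flagged.append(instance["id"])
    return flagged
```

Records flagged this way are candidates for the failure rather than confirmed cases, since the serialization here only approximates what the index stores.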

Attachments

6

Activity

Cate Boerema January 6, 2020 at 4:32 PM

Charlotte Whitt January 6, 2020 at 11:44 AM
Edited

Manual test in the FOLIO environment for loading large data sets (https://folio-snapshot-load.aws.indexdata.com/), using Chrome

I loaded the attached file problem_bibs.mrc into the FOLIO environment (https://folio-snapshot-load.aws.indexdata.com/) using Data Import. All looks good: I could load all 20 problematic records and got no error messages.

Here is a test search on the first of the listed problematic titles: Rekishi kyōiku shakaika kyōiku nenpō

Search on contributor:

Search on title:

I'll close the ticket.

Julian Ladisch November 12, 2019 at 10:14 PM

I also support dropping the current b-tree index for contributors. It is broken and rarely needed.
There is a single use case for it: sorting by contributors when the search is very unspecific. Inventory allows sorting the result set by contributors. The contributors b-tree index is used if all records shown on the front-end are within the first 10000 of all the library's records (matching and non-matching) when sorted by contributors.
The contributors b-tree index sorts the JSON, which means that it sorts first by contributorNameTypeId and then by name. This is broken because the result set view gives no indication why the contributor column runs from A to Z several times (once for each contributor name type).
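The ordering problem described above can be illustrated with a small sketch. It assumes, as the comment states, that the serialized contributors text begins with contributorNameTypeId, so that this value dominates the sort. The sample records, the type ids `type-1` and `type-2`, and both function names are made up for illustration, and `json.dumps` only stands in for the database's own text conversion.

```python
import json

# Hypothetical sample data; contributorNameTypeId is deliberately the first key.
RECORDS = [
    {"id": 1, "contributors": [{"contributorNameTypeId": "type-2", "name": "Adams"}]},
    {"id": 2, "contributors": [{"contributorNameTypeId": "type-1", "name": "Zhang"}]},
    {"id": 3, "contributors": [{"contributorNameTypeId": "type-1", "name": "Baker"}]},
]


def sort_by_serialized(records):
    """Sort by the text of the whole contributors array (what the comment describes)."""
    return sorted(records, key=lambda r: json.dumps(r["contributors"]))


def sort_by_first_name(records):
    """Sort by the first contributor's name only (what a user would expect to see)."""
    def first_name(record):
        contributors = record["contributors"]
        return contributors[0]["name"].lower() if contributors else ""
    return sorted(records, key=first_name)


def names(records):
    """Return the first contributor name of each record, in order."""
    return [r["contributors"][0]["name"] for r in records]
```

With this data, `names(sort_by_serialized(RECORDS))` yields Baker, Zhang, Adams: the alphabet restarts when the name type changes, whereas `names(sort_by_first_name(RECORDS))` yields Adams, Baker, Zhang.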

Ann-Marie Breaux November 8, 2019 at 8:03 PM

Hi Absolutely - that will be fantastic!!

Dale Arntson November 8, 2019 at 1:23 AM

It is hard to identify particular records from a reindex failure, but I am pretty sure there are some examples in the file that I have just attached.

Done

Details

Development Team

Core: Platform

Created October 22, 2019 at 8:55 PM
Updated March 9, 2020 at 11:26 AM
Resolved January 6, 2020 at 11:48 AM