...
Time | Item | Who | Notes |
---|---|---|---|
5 | Welcome | Ingolf | Ingolf will be the note taker today. Introductions: new member Craig Boman. |
30 | List of Integrations | | Which issues need development and a decision by the PC? Discuss the List of Integrations. These kinds of issues are not yet represented in the Backlog, which so far focuses on UI application features. The List will be presented to the PC (Ingolf, Chris; today). Texas has mostly custom-developed integrations. |
15 | Early results on load timings / bulk user load | Tod | Early results on load timings for the user module and their implications for other forms of bulk loading.
Tod reports that UChicago (John) has done some testing in a cloud environment. The bulk user load took 2h 28m for 90,000 users. For a one-time load this is so-so, but extrapolated to bib loading (millions of records) it will be too slow. The bulk import calls the user model once for each user; this could easily be optimized. The bulk user loader has not been brought back to the User Management SIG for user acceptance (Chris M. to follow up with Katalin). Deleting users was not discussed in the scope of the bulk loader, but it is important for testing.
The Sprint Review reports a bulk load of 2.6 million bib records in 1h 15m for a raw load into a blank database. But what about merges and overloads? There should be a review at Culto, IndexData and the other developer groups to optimize database commits. Bottlenecks and performance issues must be eliminated.
A best design practice for a similar look and feel across endpoints is desired. Our guess is that this is being developed quite independently by different teams. We need an emphasis on consistency, and this SIG should keep an eye on it. The methods and the way of calling them should be similar. One might call it API consistency.
Acceptance criteria: what is acceptable for patron loading? 10 minutes: yes. 2 hours: maybe. 3 days: no. It is hard to set hard limits here; it depends on many variables, the external conditions and the institutional environment. Each institution has its own acceptance criteria.
Objection to the use of UUIDs as primary identifiers: identifiers should be the same in the legacy system and in the new system. Creating UUIDs as new identifiers will add complexity to the migration. One will probably have to watch the migration time and take special care about data integrity. Apart from that, large loads will be a little slower than with integer counters. UUIDs are very bad as primary keys from a performance standpoint.
Also, you will probably have to insert the row first and then read it back in order to get the UUID. You probably can't build the UUID first. This is really cumbersome for migration; we want to do the translation of IDs before we do the bulk load. |
10 | Conceptual Architectural Diagram | Wayne | |
20 | Data migration | Ingolf / group | Data migration is a large workload for the SIG.
It seems to me (Ingolf) that just one hour per week is not enough to discuss data migration and work on it further. Can we choose one of the following solutions:
|
5 | Next Meeting | Ingolf | topics for next meeting |
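
To make the load timings reported in the table easier to compare, here is a minimal arithmetic sketch. It assumes throughput scales linearly with record count, which the notes do not confirm; the function names and the 1,000,000-record figure are illustrative only (the notes just say "millions of records").

```python
def records_per_second(records: int, hours: int, minutes: int) -> float:
    """Average throughput for a load that took hours:minutes."""
    seconds = hours * 3600 + minutes * 60
    return records / seconds


def hours_to_load(records: int, rate: float) -> float:
    """Hours needed to load `records` at `rate` records per second,
    assuming (hypothetically) that the rate stays constant."""
    return records / rate / 3600


# Figures from the notes: 90,000 users in 2h 28m; 2.6 million bibs in 1h 15m.
user_rate = records_per_second(90_000, 2, 28)
bib_rate = records_per_second(2_600_000, 1, 15)

# Illustrative extrapolation: one million records at the user-load rate.
one_million_hours = hours_to_load(1_000_000, user_rate)

print(f"user load: {user_rate:.1f} records/s")
print(f"raw bib load: {bib_rate:.1f} records/s")
print(f"1,000,000 records at user-load rate: {one_million_hours:.1f} h")
```

Under these assumptions the user loader runs at roughly 10 records per second against roughly 578 per second for the raw bib load, so a million records at the user-load rate would take more than a day.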
Action items
- List of Integrations. Chris still has a list from the OLE migration. It would be useful to give some examples. Texas (Steve) will contribute and adapt as appropriate. Tod and Chris M. will schedule a time to bash through the
- Chris M. will reach out to Katalin and Cate (the PO for User Management) and will talk about our issues on bulk user loading, which are important for migration and testing.
- Holly or Sharon (B. ? W-Y?) could present the API consistency issue (best practices for endpoints' look and feel) to the developers at the Madrid conference (Jan 22 - 24). Tod is going to write notes on that for reference. Tod will also chat with Wayne. Who will pass the notes to whom?
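
For reference on the UUID objection recorded in the notes: translating IDs before the bulk load could look like the sketch below, if the target system accepted client-supplied UUIDs. The notes doubt that it does, so this is an assumption, not a description of the platform. The namespace name, the `record_type:legacy_id` key format and the function names are all hypothetical. The sketch derives a name-based (version 5) UUID from each legacy identifier, so the same legacy ID always maps to the same UUID and the mapping can be built offline.

```python
import uuid

# Hypothetical namespace for one migration project; any fixed UUID would do.
MIGRATION_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "migration.example.org")


def translate_id(record_type: str, legacy_id: str) -> str:
    """Derive a deterministic UUID from a legacy identifier.

    The record type is part of the name so that, for example, user 42
    and item 42 do not collide.
    """
    return str(uuid.uuid5(MIGRATION_NAMESPACE, f"{record_type}:{legacy_id}"))


def build_id_map(record_type: str, legacy_ids: list[str]) -> dict[str, str]:
    """Build the legacy-ID to UUID translation table before loading."""
    return {lid: translate_id(record_type, lid) for lid in legacy_ids}


id_map = build_id_map("user", ["1001", "1002", "1003"])
for legacy_id, new_id in id_map.items():
    print(legacy_id, "->", new_id)
```

Because the mapping is deterministic, a rerun of the migration would produce the same identifiers, which may help with the data-integrity concern raised in the notes.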