POC/Data Lake Project | Sharon, Tod, Anne, Mark, Vince, Joanne, Doreen, Scott, Karen | - structure of POC/Data Lake Project
- current status of Tod's data loader Python script
Tod Olson has a script for creating loans on GitHub: https://github.com/todolson/folio-loan-tool.git It's not complete, but it does this:
1. authenticates
2. gets a list of users
3. gets a list of items
4. creates a list of user/item pairs to loan
5. POSTs a loan request (not yet working)
It's on GitHub, so people can see how it works.
- Each step is simplistic. There are a number of things marked TODO, so some suggestions for further work if we can get some help fleshing it out.
- schedule another small group meeting?
- important considerations for a data lake environment
questions?
Notes:
- Sharon gave a short overview of the POC/Data Lake Project.
- A small work group met to discuss the Proof of Concept (POC) for a Data Lake on 2/13/2018.
- Minutes, notes, and a recording of the meeting are available.
- The POC/Data Lake is a 3-week project; we are in the 2nd week with a deadline for completion – March 2nd.
- The goal is to design a Data Lake environment. The group clarified what kind of data will go into the Data Lake and what type of report should be built. The report should include information from all three areas of FOLIO (patron, circulation and inventory).
- Tod Olson will write a Python script to load the data.
- The working group decided to use an open source tool, BIRT, as the reporting tool for this project. Chris Creswell will write the BIRT report.
Reporting will be done in 2 steps:
- EBSCO will give Chris a data extract from the Data Lake
- Chris will try to connect BIRT to the Data Lake
Tod Olson shared his notes about the Python script. The scope of the script is to create loans automatically. The idea is to:
- pull users from user storage
- pull items from item storage
- make random loans
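The pull-and-pair idea above can be sketched roughly as follows. This is a minimal illustration, not Tod's script: the function name `make_random_loans`, the stand-in user and item records, and the `userId`/`itemId` keys are assumptions made for the example, and no FOLIO API calls are made.

```python
import random


def make_random_loans(users, items, count, seed=None):
    """Pair randomly chosen users with distinct items.

    Each item appears in at most one pair, since one item cannot be
    on loan to two users at once; a user may receive several items.
    """
    rng = random.Random(seed)
    # Cannot loan more items than exist, and cannot loan without users.
    count = min(count, len(items)) if users else 0
    chosen_items = rng.sample(items, count)
    return [
        {"userId": rng.choice(users)["id"], "itemId": item["id"]}
        for item in chosen_items
    ]


# Stand-in records (hypothetical); a real run would pull these from
# user storage and item storage after authenticating.
users = [{"id": "u1"}, {"id": "u2"}, {"id": "u3"}]
items = [{"id": "i1"}, {"id": "i2"}, {"id": "i3"}, {"id": "i4"}]

loans = make_random_loans(users, items, count=3, seed=42)
for loan in loans:
    print(loan)
```

Passing a fixed `seed` makes the pairs repeatable, which may help when comparing test loads; each resulting pair would then be the body of one loan request.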
Tod will connect with Matt Reno to work on this. Someone from Texas A&M has some Python skills and could take a look at the script.
- Sharon will help set up some meetings to help Tod with the Python script:
- One meeting between Tod and the person from Texas A&M
- One meeting between Tod and Matt Reno
- One meeting between Matt Reno & Chris Creswell
Update on this project at the next meeting.
|
Current Reporting Tools | All | Review of Current Reporting Tools used by Reporting SIG participants
Most of the Reporting SIG participants use SQL in a relational database system to generate reports.
- Duke University uses a combination of SQL + Perl + IBM Cognos + some Ex Libris canned reports. Data source: ALEPH
- University of Alabama: same
- University of Chicago: Access + Excel. The assessment librarian uses Tableau
Sharon: We don’t have experience with these open source tools. Do we need to do an analysis of the current reporting tools? We should create a short list of tools that will work well in this environment.
- Is there someone from FOLIO/EBSCO/Index Data that can help us come up with the short list of tools?
- We will probably be tasked to provide training on these tools. This will be part of our roles.
Mark: This request should go to the Product Council. They might be able to assign resources to help with the tool analysis request.
- Our goal is to recommend either commercial reporting tools or open source reporting tools. This would likely be a decision made by each institution participating in FOLIO.
- We need to determine what skills are needed to use the open source reporting tools.
- We need to have a list of tools that we are recommending.
Questions to consider:
- How does BIRT work?
- Should we test each tool with the current set-up?
- Do we need a list of current issues (top 3 or top 5 issues) known for each reporting tool? For example, one known problem with Cognos is that data goes through 2 transformations; for invoicing packages it’s difficult to get item data.
- Is it worth spending time compiling a list of issues, given that we are moving to a new environment?
Some issues are documented on the Master Spreadsheet. For example, the Metadata management tab discusses data integrity and consistency checking (a category of reports); we need to be able to do these kinds of reports in FOLIO. We’ll use the Master Spreadsheet as the source for analysis.
|
Resource/Format Working Group | Sharon | update from 2/13/18 meeting
- This group is working on an inventory set-up (how the data will be structured and how the data will look)
- Sharon is attending as a placeholder, but we need someone from the Reporting SIG to attend their meetings
- We’ll try to build a report using the structure that they are using
- We need to look through those data elements
More info at the next meeting. |