<!-- 
RSS generated by JIRA (1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d) at Fri Feb 09 00:39:07 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary add field=key&field=summary to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>FOLIO Jira</title>
    <link>https://folio-org.atlassian.net</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>1001.0.0-SNAPSHOT</version>
        <build-number>100246</build-number>
        <build-date>07-02-2024</build-date>
    </build-info>

<item>
            <title>[UXPROD-4337] Reliably process large files in DI by automatically splitting and processing source files</title>
                <link>https://folio-org.atlassian.net/browse/UXPROD-4337</link>
                <project id="10000" key="UXPROD">UX Product</project>
                    <description>&lt;p&gt;&lt;b&gt;Problem:&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Loading large files* into the system isn&#8217;t reliable. The current process is inconvenient: it&#8217;s labor-intensive, time-consuming, and needs to be done off-hours to avoid negatively impacting the system. Data import needs to reliably support loading and successful processing of large files.&lt;/p&gt;

&lt;p&gt;*Large file = a file of any reasonable size, typically with 100,000+ records.&lt;/p&gt;

&lt;p&gt;Background:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Even small record loads (fewer than 1,000 records) may time out or take a long time to complete&lt;/li&gt;
	&lt;li&gt;FOLIO users must plan to work off hours to complete import jobs; otherwise, the jobs can bring other systems down&lt;/li&gt;
	&lt;li&gt;Libraries are unable to complete large cataloging projects&lt;/li&gt;
	&lt;li&gt;Libraries are scared to even try the larger loads because they will have to deal with the &quot;mess&quot;&lt;/li&gt;
	&lt;li&gt;Smaller jobs (jobs with a handful of records) get delayed by larger jobs, often by hours.&#160;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;In the current system, to successfully load a large file, I must:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;manually break up the original file into several smaller files (~1,000-5,000 records per file)&lt;/li&gt;
	&lt;li&gt;wait until the end of the day to kick off imports for each of the smaller files (otherwise these imports can negatively impact the system)&lt;/li&gt;
	&lt;li&gt;The total elapsed time can be days to weeks to load all records in a large file.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Out of scope&lt;/b&gt;&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Existing data import flaws will not be addressed&lt;/li&gt;
	&lt;li&gt;Major changes to the existing Data Import workflow&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Use case(s)&lt;/b&gt;&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Reliably complete large file imports with the following actions:
	&lt;ul&gt;
		&lt;li&gt;create&lt;/li&gt;
		&lt;li&gt;modify&lt;/li&gt;
		&lt;li&gt;update&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
	&lt;li&gt;Reliably complete large file imports with the following record types:
	&lt;ul&gt;
		&lt;li&gt;MARC Bibliographic records&lt;/li&gt;
		&lt;li&gt;MARC Authority records&lt;/li&gt;
		&lt;li&gt;MARC Holdings records&lt;/li&gt;
		&lt;li&gt;&lt;del&gt;EDIFACT invoice records&lt;/del&gt;&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
	&lt;li&gt;When importing a single large file via Data Import:
	&lt;ul&gt;
		&lt;li&gt;I can view the status of the file being uploaded/split&lt;/li&gt;
		&lt;li&gt;I can view the status of a large job that is running&lt;/li&gt;
		&lt;li&gt;I can view the status of a large job that has completed in the logs&lt;/li&gt;
		&lt;li&gt;In the event errors occur while processing my large file, I can see information about the error(s)&lt;/li&gt;
		&lt;li&gt;While the large file is being processed, I can cancel the import (cancelling doesn&#8217;t undo whatever was done prior to the point of being stopped)&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
	&lt;li&gt;The following can be configured:
	&lt;ul&gt;
		&lt;li&gt;Feature flag on/off (at the cluster level)&lt;/li&gt;
		&lt;li&gt;Max file size&lt;/li&gt;
		&lt;li&gt;Number of active data import jobs&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
	&lt;li&gt;TODO: Identify critical profiles / files for us to test&lt;/li&gt;
	&lt;li&gt;TODO: UI mockups&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Proposed solutions/stories:&lt;/b&gt;&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Direct uploading of large files: AWS S3 to upload source files with MARC records&lt;/li&gt;
	&lt;li&gt;File slicing logic: splitting/chunking of large files automatically&lt;/li&gt;
	&lt;li&gt;Process each chunk file independently&lt;/li&gt;
	&lt;li&gt;Implementation of Data Import queue management to allow processing small jobs in between larger chunks&#160;&lt;/li&gt;
	&lt;li&gt;Result aggregation: aggregate results of Data Import Jobs that process chunk files&lt;/li&gt;
&lt;/ul&gt;
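
&lt;p&gt;A minimal sketch of the file slicing story above, assuming the source file is binary MARC (ISO 2709), where every record ends with the record terminator byte 0x1D. The names split_marc and records_per_chunk are hypothetical and are not taken from the FOLIO code base; this illustrates the idea only and is not the implemented solution.&lt;/p&gt;

```python
from typing import Iterator

# MARC 21 / ISO 2709 end-of-record byte.
RECORD_TERMINATOR = b"\x1d"


def split_marc(data: bytes, records_per_chunk: int) -> Iterator[bytes]:
    """Yield chunks holding at most records_per_chunk whole MARC records.

    Assumption: the input is well formed, so every record ends with the
    record terminator and the terminator never occurs inside a record.
    """
    if not records_per_chunk > 0:
        raise ValueError("records_per_chunk must be positive")

    # Split on the terminator, drop the empty tail after the last record,
    # and restore the terminator so each record stays intact.
    records = [r + RECORD_TERMINATOR for r in data.split(RECORD_TERMINATOR) if r]

    # Group consecutive records into chunks; the last chunk may be smaller.
    for start in range(0, len(records), records_per_chunk):
        yield b"".join(records[start:start + records_per_chunk])
```

&lt;p&gt;Each yielded chunk could then be submitted as its own import job. Reading the whole file into memory is a simplification; a streaming variant would likely be needed for very large files.&lt;/p&gt;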


&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;&lt;b&gt;NFRs:&lt;/b&gt;&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Establish performance baselines: the response times or throughput of all tested endpoint methods, pages, or operations must not degrade when compared with an agreed-upon metric or baseline&lt;/li&gt;
	&lt;li&gt;Processing of large jobs is performant during the day&lt;/li&gt;
	&lt;li&gt;Other systems downstream are not negatively affected by large data import jobs&lt;/li&gt;
	&lt;li&gt;Additional requirements (&lt;a href=&quot;https://folio-org.atlassian.net/wiki/display/FOLIJET/Requirements+questions+and+assumptions&quot; class=&quot;external-link&quot; rel=&quot;nofollow noreferrer&quot;&gt;questions and assumptions&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Links to additional info&lt;/b&gt;&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;&lt;a href=&quot;https://folio-org.atlassian.net/wiki/x/lWEV&quot; class=&quot;external-link&quot; rel=&quot;nofollow noreferrer&quot;&gt;Solution to slice large data import files into chunks&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Questions&lt;/b&gt;&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;What is the maximum load?&lt;/li&gt;
	&lt;li&gt;What needs to be logged?&lt;/li&gt;
	&lt;li&gt;Specific requirements about errors?&lt;/li&gt;
&lt;/ul&gt;
</description>
                <environment></environment>
        <key id="12720">UXPROD-4337</key>
            <summary>Reliably process large files in DI by automatically splitting and processing source files</summary>
                <type id="10002" iconUrl="https://folio-org.atlassian.net/rest/api/2/universal_avatar/view/type/issuetype/avatar/10322?size=medium">New Feature</type>
                            <parent id="13571">UXPROD-47</parent>
                                    <priority id="10000" iconUrl="https://dev.folio.org/assets/jira-priority/jira-p1.svg">P1</priority>
                        <status id="6" iconUrl="https://folio-org.atlassian.net/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="green"/>
                                    <resolution id="10003">Done</resolution>
                                                        <assignee accountid="5fd3d8ac692b7901101561d2">Kathleen Moore</assignee>
                                                                <reporter accountid="5fd3d8ac692b7901101561d2">Kathleen Moore</reporter>
                                    <labels>
                    </labels>
                <created>Tue, 6 Jun 2023 01:52:40 +0000</created>
                <updated>Thu, 8 Feb 2024 15:35:57 +0000</updated>
                            <resolved>Mon, 4 Dec 2023 18:25:58 +0000</resolved>
                                                                    <component>Batch Importer</component>
                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                    <issuelinks>
                            <issuelinktype id="10008">
                    <name>Defines</name>
                                            <outwardlinks description="defines">
                                        <issuelink>
            <issuekey id="50894">UIDATIMP-1563</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is defined by ">
                                        <issuelink>
            <issuekey id="27033">PERF-565</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10003">
                    <name>Relates</name>
                                            <outwardlinks description="relates to">
                                        <issuelink>
            <issuekey id="62339">MODDATAIMP-843</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="49688">UIDATIMP-1489</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="62756">MODDATAIMP-842</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="62760">MODDATAIMP-852</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="50841">UIDATIMP-1464</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="50843">UIDATIMP-1466</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="50846">UIDATIMP-1469</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="50836">UIDATIMP-1510</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="16683">FAT-7307</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="39903">FOLS3CL-11</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="62757">MODDATAIMP-846</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="62758">MODDATAIMP-849</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="62759">MODDATAIMP-850</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="62322">MODDATAIMP-853</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="62762">MODDATAIMP-857</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="62764">MODDATAIMP-860</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="62923">MODDATAIMP-863</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="62934">MODDATAIMP-893</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="58386">MODSOURMAN-1062</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="50840">UIDATIMP-1463</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="50844">UIDATIMP-1467</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="50845">UIDATIMP-1468</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="50848">UIDATIMP-1472</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="50763">UIDATIMP-1487</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="50765">UIDATIMP-1488</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="relates to">
                                        <issuelink>
            <issuekey id="62324">MODDATAIMP-832</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="62753">MODDATAIMP-820</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="62833">MODDATAIMP-829</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="62752">MODDATAIMP-830</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="62323">MODDATAIMP-831</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="62835">MODDATAIMP-833</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="62751">MODDATAIMP-834</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="62839">MODDATAIMP-835</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="62750">MODDATAIMP-836</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="62838">MODDATAIMP-837</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="62837">MODDATAIMP-838</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="62836">MODDATAIMP-839</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="62922">MODDATAIMP-861</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="62924">MODDATAIMP-864</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="50837">UIDATIMP-1460</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="50839">UIDATIMP-1462</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="50803">UIDATIMP-1475</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="50805">UIDATIMP-1476</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="50807">UIDATIMP-1477</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="50809">UIDATIMP-1478</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="50813">UIDATIMP-1479</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="50811">UIDATIMP-1480</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="50814">UIDATIMP-1481</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="50816">UIDATIMP-1482</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10007">
                    <name>Requires</name>
                                                                <inwardlinks description="is required by">
                                        <issuelink>
            <issuekey id="63006">MODSOURCE-627</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="11129" name="DI - Large jobs - Current large job DI user flow in FOLIO.jpg" size="243693" author="5fd3d8ac692b7901101561d2" created="Tue, 6 Jun 2023 02:04:06 +0000"/>
                            <attachment id="11130" name="DI - Large jobs - Detail_ Existing flow for initiating an import from DI app.jpg" size="322545" author="5fd3d8ac692b7901101561d2" created="Tue, 6 Jun 2023 02:04:06 +0000"/>
                            <attachment id="11131" name="DI - Large jobs - Detail_ Existing flow showing progress of import from DI app.jpg" size="361753" author="5fd3d8ac692b7901101561d2" created="Tue, 6 Jun 2023 02:04:06 +0000"/>
                            <attachment id="11132" name="DI - Large jobs - Large job DI user flow in other ILS.jpg" size="204276" author="5fd3d8ac692b7901101561d2" created="Tue, 6 Jun 2023 02:04:06 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10000" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummarycf">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10057" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Development Team</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10146"><![CDATA[Data Import Task Force]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10014" key="com.pyxis.greenhopper.jira:gh-epic-link">
                        <customfieldname>Epic Link</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue key="$xmlutils.escape($text)">Batch Importer (Bib/Acq)</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10063" key="com.atlassian.jira.plugin.system.customfieldtypes:float">
                        <customfieldname>PO Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>0.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10019" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>0|hzx1ev:0430i0000o</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10046" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Release</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10076"><![CDATA[Poppy (R2 2023)]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10061" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Report Functional Area(s)</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10192"><![CDATA[Import and Export]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10020" key="com.pyxis.greenhopper.jira:gh-sprint">
                        <customfieldname>Sprint</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10025" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>[CHART] Time in Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>