SPIKE - improve parsing of .csv files with identifiers


Requirements are described in MODBULKOPS-219.

Identify a more robust way to parse the file with different separators

During bulk editing, the CSV file is processed by mod-data-export-worker with Spring Batch. Spring Batch throws IncorrectTokenCountException if a line of the file uses a comma as a separator. The cause is the use of DelimitedLineTokenizer: it tries to convert the line to an object using the comma as the field separator, so a line that holds several comma-separated identifiers is split into more tokens than the expected fields. Ways to resolve the issues with parsing CSV files:

  1. Update the documentation with specific rules for the file with identifiers so that it is processed correctly. This way users are notified about the expected format of the initial file with identifiers.

  2. Pre-process the initial CSV file with identifiers before retrieving records by identifiers. Identify the separator between identifiers: comma or new line. If the identifiers were split by commas, create a new file with a new line as the separator, then process it with mod-data-export-worker to get the records.
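
Option 2 could be sketched roughly as follows. This is a minimal illustration in plain Java, not code from mod-data-export-worker: the class name IdentifierFileNormalizer and its methods are hypothetical, and it assumes identifiers never contain commas or quotes themselves. It treats both commas and line breaks as separators, trims whitespace, drops empty entries, and writes one identifier per line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of mod-data-export-worker.
final class IdentifierFileNormalizer {

    private IdentifierFileNormalizer() {
    }

    // Splits the raw content on commas and line breaks, trims each
    // identifier, and drops entries that are empty after trimming.
    // Assumption: identifiers contain no commas or quoted values.
    static List<String> normalize(String content) {
        List<String> identifiers = new ArrayList<>();
        for (String token : content.split("[,\\r\\n]+")) {
            String trimmed = token.strip();
            if (!trimmed.isEmpty()) {
                identifiers.add(trimmed);
            }
        }
        return identifiers;
    }

    // Reads the source file and writes a new file with one identifier
    // per line, which can then be handed to the existing batch job.
    static Path rewrite(Path source, Path target) throws IOException {
        String content = Files.readString(source, StandardCharsets.UTF_8);
        return Files.write(target, normalize(content), StandardCharsets.UTF_8);
    }
}
```

For example, normalize("id1, id2,id3\n\n  id4  \r\nid5,\n") returns the five identifiers id1 to id5 in order. Because the same pass also removes blank lines and surrounding whitespace, a pre-processing step of this kind might cover the empty-line requirement as well, at the cost of reading the whole file into memory.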

Disregard white spaces and empty lines

The CSV file with identifiers is processed by mod-data-export-worker with Spring Batch. Empty lines in the initial file prevent the identifiers from being read.

The Spring Batch options for reading the initial file are configured in the IdentifiersConfig class.

LineMapper is used to map lines read from a file to domain objects. For bulk editing, mod-data-export-worker uses the DefaultLineMapper implementation of LineMapper.

A custom subclass of DefaultLineMapper provides the opportunity to process each line while reading, or to skip empty lines:

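import org.apache.commons.lang3.StringUtils; // assumed: StringUtils comes from Apache Commons Lang
import org.springframework.batch.item.file.mapping.DefaultLineMapper;

class SkipEmptyLine<T> extends DefaultLineMapper<T> {

  @Override
  public T mapLine(String line, int lineNumber) throws Exception {
    // Blank or whitespace-only line: do not pass it to the tokenizer.
    if (StringUtils.isBlank(line)) {
      // Note: an ItemReader treats a null item as end of input,
      // so verify that reading continues past blank lines.
      return null;
    }
    // Any other line is tokenized and mapped as usual.
    return super.mapLine(line, lineNumber);
  }
}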