fixed-length-file-handler's Introduction

Fixed Length File Handler

Introduction

When processing data from some systems (mainly legacy ones), it's common to run into Fixed Length Files: files in which each line is a record, and each field of that record occupies a specific number of characters.

These files are often tricky to handle, as the code easily turns into a spaghetti of string manipulation, padding, character counting and... well, many things to take care of.

This library comes to the rescue of programmers dealing with fixed-length files. It lets you define how your records are structured with a nice Kotlin DSL, and it turns each line into one of those records for further processing.

Using with Gradle

Import it into your dependencies:

dependencies {
    implementation("br.com.guiabolso:FixedLengthFileHandler:{version}")
}

Basic Usage

The basic usage assumes that you're reading a file with a single type of record.

Given a Fixed-Length File:

Definition

Field                Type        Initial Position   Final Position (Exclusive)
UserName             String      0                  30
User Document        Int         30                 39
User Registry Date   LocalDate   39                 49

File

FirstUsername                 1234567892019-02-09
SecondAndLongerUsername       9876543212018-03-10
ThirdUsernameWithShorterDoc   0000001232017-04-11
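Done by hand, each row of the definition table is a slice that starts at the initial position and stops just before the final position. That is exactly what Kotlin's String.substring with an `until` range does. The following is a minimal stdlib-only sketch, not the library's code, that takes the raw (still padded) columns of a line:

```kotlin
// Sketch only: column boundaries copied from the definition table above.
// `until` excludes the final position, matching the "Final Position (Exclusive)" column.
val userName = 0 until 30       // 30 characters
val userDocument = 30 until 39  // 9 characters
val registryDate = 39 until 49  // 10 characters

// Returns the raw, still padded slice of each column for one line
fun rawColumns(line: String): List<String> =
    listOf(userName, userDocument, registryDate).map { line.substring(it) }
```

For the third line of the file, the second column comes back as "000000123". Stripping that padding and converting to Int/LocalDate is the part the DSL's field calls take care of.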

We can parse it with the fixedLengthFileParser DSL:

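import java.io.InputStream
import java.time.LocalDate

data class MyUserRecord(val username: String, val userDoc: Int, val registryDate: LocalDate)

val fileInputStream: InputStream = getFileInputStream()

// The parser returns a lazy sequence with one MyUserRecord per line
val records = fixedLengthFileParser<MyUserRecord>(fileInputStream) {
    MyUserRecord(
        field(0, 30, Padding.PaddingRight(' ')),  // UserName, right-padded with spaces
        field(30, 39, Padding.PaddingLeft('0')),  // User Document, left-padded with zeros
        field(39, 49)                             // User Registry Date
    )
}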

The library handles both left padding (Padding.PaddingLeft) and right padding (Padding.PaddingRight), as well as many common Kotlin/Java types.
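The following is a stdlib-only sketch of what removing each padding style amounts to. Pad and unpad are hypothetical names, not the library's Padding API. How the library treats a field made up entirely of padding is not documented here, so the last case below only shows what naive trimming would do:

```kotlin
// Hypothetical model of the padding styles (not the library's Padding class)
sealed class Pad {
    data class Left(val char: Char) : Pad()   // value is right-aligned, e.g. "000000123"
    data class Right(val char: Char) : Pad()  // value is left-aligned, e.g. "FirstUsername     "
    object None : Pad()
}

// Removes the padding characters from the side they were added to
fun unpad(raw: String, pad: Pad): String = when (pad) {
    is Pad.Left -> raw.trimStart(pad.char)
    is Pad.Right -> raw.trimEnd(pad.char)
    Pad.None -> raw
}
```

Note the edge case: with left zero padding, a document number of 000000000 trims down to an empty string. A real parser has to decide whether that means 0.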

Closing the file stream

Attention - the library reads from the stream but does not close it. You're responsible for closing the stream after processing the sequence, so be sure to close it!
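One way to do this is Kotlin's `use`, which closes the stream when its block ends, even if parsing throws. Because the sequence is lazy, it has to be consumed inside the block; reading from an already closed stream typically fails with an IOException. This sketch uses a plain lineSequence() and a hypothetical `parse` function as a stand-in for the library's DSL:

```kotlin
import java.io.InputStream

// Sketch: `parse` stands in for the per-line record mapping the DSL would build
fun <T> readAllRecords(stream: InputStream, parse: (String) -> T): List<T> =
    stream.use { input ->
        // Consume the lazy sequence while the stream is still open
        input.bufferedReader().lineSequence().map(parse).toList()
    }
```

With the library, the same idea would mean calling fixedLengthFileParser on the stream and iterating over the resulting sequence inside one `use` block.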

Default parsing

This library can parse some of the most common Kotlin/Java types by default, and more types may be added if they're required. The default types are:

  • String
  • Int
  • UInt
  • Double
  • Long
  • ULong
  • Char
  • Boolean (Case insensitive)
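  • LocalDate (using the default DateTimeFormatter, ISO_LOCAL_DATE, e.g. 2019-02-09)
  • LocalTime (using the default DateTimeFormatter, ISO_LOCAL_TIME, e.g. 10:15:30)
  • LocalDateTime (using the default DateTimeFormatter, ISO_LOCAL_DATE_TIME, e.g. 2019-02-09T10:15:30)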
  • BigDecimal
  • Enum types

Decimal Parsing

There might be scenarios where the default parsing of a decimal (Double / BigDecimal) isn't enough, because the value has an implied scale. For example, 4299 might represent 42.99 (a scale of 2 that isn't written in the String itself).

For this particular case, you can use decimalField instead of field:

decimalField(from = 0, toExclusive = 4, scale = 2, padding = NoPadding) // reads 4299 as 42.99
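Applying a scale means treating the digits as an unscaled integer and moving the decimal point `scale` places to the left. Here is a stdlib sketch of that arithmetic. The `withScale` name is hypothetical and not part of the library, and the field is assumed to contain only digits, with an optional leading sign:

```kotlin
import java.math.BigDecimal
import java.math.BigInteger

// value = digits × 10^(-scale), e.g. "4299" with scale 2 -> 42.99
fun withScale(digits: String, scale: Int): BigDecimal = BigDecimal(BigInteger(digits), scale)
```

Building the value from a BigInteger avoids a detour through Double, so no rounding happens before you decide whether you want a Double or a BigDecimal.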

Custom parsing

There might be times when the default types are not enough, and you need a custom parser for a given field.

For example: you know that a specific number represents a currency amount, and its last two digits are the cents.

For cases like this, you can pass a custom parsing function from String to your type as the last argument of the field invocation:

// Parsing the field 0000520099 to 5200.99 

field(15, 25, Padding.PaddingLeft('0')) { str: String -> StringBuilder(str).insert(str.length - 2, ".").toString().toBigDecimal() }
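The string-insertion approach above assumes that at least two digits reach the function. If the left zero padding is stripped before the function runs, which PaddingLeft('0') suggests but which is assumed here, a field of 0000000005 arrives as "5", and inserting at index -1 throws. The sketch below compares that approach with BigDecimal.movePointLeft, which handles any length. Both function names are hypothetical:

```kotlin
import java.math.BigDecimal

// The approach from the example above: insert a '.' before the last two digits
fun centsByInsert(str: String): BigDecimal =
    StringBuilder(str).insert(str.length - 2, ".").toString().toBigDecimal()

// Alternative: parse the whole number and move the decimal point two places left
fun centsByMovePoint(str: String): BigDecimal = str.toBigDecimal().movePointLeft(2)
```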

Advanced Usage

For reasons that aren't always clear, many Fixed-Length file providers put more than one type of record in the same file, reserving a specific character to identify the record type, so you may come across something like this:

1 FirstUserName       123.12
1 SecondUserName      002.50
2 123456789    2019-02-09UserDocs
2 000812347    2018-03-08AnotherUserDocs

In these cases, the software must look at the first character to determine the record type, and this is usually what leads to spaghetti string manipulation. We can solve it with this library's "advanced" options:

data class FirstRecordType(val username: String, val userMoney: BigDecimal)
data class SecondRecordType(val userCode: Int, val registerDate: LocalDate, val docs: String)

fixedLengthFileParser<Any>(fileInputStream) {
    withRecord({ line -> line[0] == '1' }) {
        FirstRecordType(
            field(2, 22, Padding.PaddingRight(' ')),
            field(22, 28, Padding.PaddingLeft('0'))
        )
    }
    
    withRecord({ line -> line[0] == '2' }) {
        SecondRecordType(
            field(2, 15, Padding.PaddingRight(' ')),
            field(15, 25),
            field(25, 40, Padding.PaddingRight(' '))
        )
    }
}
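Conceptually, each withRecord block pairs a predicate with a parser, and each line is handled by a block whose predicate matches it. The following is a stdlib-only sketch of that dispatch. It assumes the first matching rule wins and unmatched lines throw, which may not be how the library behaves. RecordRule and parseMixed are hypothetical names, and the record types are the ones from the example:

```kotlin
import java.math.BigDecimal
import java.time.LocalDate

data class FirstRecordType(val username: String, val userMoney: BigDecimal)
data class SecondRecordType(val userCode: Int, val registerDate: LocalDate, val docs: String)

// Hypothetical: "does this line belong to me?" paired with "how do I parse it?"
class RecordRule(val matches: (String) -> Boolean, val parse: (String) -> Any)

// Sends every line to the first rule whose predicate accepts it
fun parseMixed(lines: Sequence<String>, rules: List<RecordRule>): Sequence<Any> =
    lines.map { line ->
        val rule = rules.firstOrNull { it.matches(line) }
            ?: throw IllegalArgumentException("No record type matches line: $line")
        rule.parse(line)
    }

val rules = listOf(
    RecordRule({ it[0] == '1' }) { line ->
        FirstRecordType(
            line.substring(2, 22).trimEnd(' '),                    // right-padded name
            line.substring(22, 28).trimStart('0').toBigDecimal()   // left zero-padded amount
        )
    },
    RecordRule({ it[0] == '2' }) { line ->
        SecondRecordType(
            line.substring(2, 15).trimEnd(' ').toInt(),
            LocalDate.parse(line.substring(15, 25)),
            line.substring(25, 40).trimEnd(' ')
        )
    }
)
```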

Features

  • The file is streamed into a sequence of values and is never loaded into memory in its entirety, so very large files can be processed without holding them in memory.
  • The Kotlin DSL lets you define the file parsing in a single place, while the resulting sequence can be processed anywhere.
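The streaming behaviour comes from Kotlin sequences being lazy: a line is read and mapped only when the consumer asks for the next element. Here is a minimal sketch using BufferedReader.lineSequence(). `parseLazily` is a hypothetical name, not the library's entry point:

```kotlin
import java.io.BufferedReader

// Sketch: nothing is parsed here; each line is read and mapped only when the result is iterated
fun <T> parseLazily(reader: BufferedReader, parse: (String) -> T): Sequence<T> =
    reader.lineSequence().map(parse)
```

With three input lines and `.take(1).toList()`, the parse function runs exactly once. Likewise, a forEach over a huge file handles the records one at a time, unless you collect them all with toList().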

Changelog

Check the complete changelog here

fixed-length-file-handler's People

Contributors

arthurgomes, dependabot[bot], leocolman

fixed-length-file-handler's Issues

Impossible to parse nullable records

The field method's type parameter is bounded by Any rather than Any?. For this reason, nullable fields can't be parsed, even when using a custom parser.
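A small sketch shows the effect of the bound. These are hypothetical functions, not the library's signatures. With `T : Any`, a parser such as `{ it.trim().toIntOrNull() }` is rejected at compile time because T would have to be Int?. Without the bound, the same parser is accepted, and a blank field can become null:

```kotlin
// Bounded as the issue describes: T can never be a nullable type.
// boundedField("   ") { it.trim().toIntOrNull() } would not compile.
fun <T : Any> boundedField(raw: String, parse: (String) -> T): T = parse(raw)

// Unbounded: T may be nullable, so the parser can return null
fun <T> unboundedField(raw: String, parse: (String) -> T): T = parse(raw)
```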

Add special parsing for decimals with specified decimal places

Some fixed length files might contain decimal values written as an Int/Long and specify how many places should be used for decimals, for example:

399 might represent 3.99 if 2 decimal places are specified.

For this, we can create overloads on the field method specifically for Double/BigDecimal.


The current workaround for this is creating a custom parser for these cases, which is already supported.

Add overload to use Sequence<String> instead of InputStream

The default usage of the API is:

val fileInputStream: InputStream = getFileInputStream()

fixedLengthFileParser<MyUserRecord>(fileInputStream) {
    MyUserRecord(
        field(0, 30, Padding.PaddingRight(' ')),
        field(30, 39, Padding.PaddingLeft('0')),
        field(39, 49)
    )    
}

In this case, the parser uses fileInputStream but doesn't close it. With a standard library function such as useLines, one gets a sequence of a file's lines instead, so we should also allow parsing/transforming that sequence directly, not only a stream.
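Here is a sketch of what such an overload could look like, and how useLines would feed it. parseRecords and parseFile are hypothetical names. The key constraint is that useLines closes the file when its block returns, so the records must be consumed inside the block:

```kotlin
import java.io.File

// Hypothetical overload: accept lines the caller already has instead of an InputStream
fun <T> parseRecords(lines: Sequence<String>, parse: (String) -> T): Sequence<T> = lines.map(parse)

// useLines opens the file, passes a lazy Sequence<String> to the block and closes the file afterwards,
// so the records are collected before the block returns
fun <T> parseFile(file: File, parse: (String) -> T): List<T> =
    file.useLines { lines -> parseRecords(lines, parse).toList() }
```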

Add possibility to treat lines with exception

Sometimes parsing a line throws an exception, mainly when the fixed-length file isn't as fixed as it should be or its documentation is wrong.

For these cases, it should be possible to handle the failing lines, for example for logging or retrying purposes.
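Here is one possible shape for this, sketched with the standard library only. Each line is parsed, but an exception does not end the whole sequence. Instead, the line number, the line and the exception go to a callback, and the line is skipped. parseTolerantly and its callback are hypothetical names:

```kotlin
// Hypothetical: failures are reported to onError and the failing line is skipped
fun <T : Any> parseTolerantly(
    lines: Sequence<String>,
    onError: (lineNumber: Int, line: String, error: Exception) -> Unit,
    parse: (String) -> T
): Sequence<T> = lines.mapIndexedNotNull { index, line ->
    try {
        parse(line)
    } catch (e: Exception) {
        onError(index + 1, line, e) // 1-based line number, handy for logs
        null
    }
}
```

Skipping is only one policy. A caller that wants to retry could collect the failing lines in the callback and run them through a corrected parser later.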

Create Fixed Length File Writer

The DSL currently only supports parsing a file stream into record objects defined by the user.

That was what we needed when we created the library.

However, it would be nice to also support writing these files, taking record objects as input and an output stream to write to.
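Writing is essentially the inverse of parsing: pad each value to its column width and concatenate the results. The following is a minimal sketch under stated assumptions. OutField and writeLine are hypothetical names, fields must be contiguous and in order, and values are already formatted as strings:

```kotlin
// Hypothetical column description for writing: [start, endExclusive) plus padding
class OutField(val start: Int, val endExclusive: Int, val padChar: Char, val padLeft: Boolean)

fun writeLine(values: List<String>, fields: List<OutField>): String = buildString {
    require(values.size == fields.size) { "Expected ${fields.size} values, got ${values.size}" }
    for ((value, column) in values.zip(fields)) {
        require(column.start == length) { "Fields must be contiguous and ordered" }
        val width = column.endExclusive - column.start
        require(value.length <= width) { "'$value' does not fit in $width characters" }
        append(if (column.padLeft) value.padStart(width, column.padChar) else value.padEnd(width, column.padChar))
    }
}
```

With the columns of the basic example ((0, 30) right-padded with spaces, (30, 39) left-padded with '0', (39, 49) unpadded), writing "ThirdUsernameWithShorterDoc", "123" and "2017-04-11" reproduces the third line of that file. Each line could then be written to an OutputStream, for instance through bufferedWriter().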

Add a way to parse from a point to the end of the line

Sometimes, for no apparent reason, fixed-length files don't pad the end of a line with whitespace.

Parsing such a line then fails, as the last field's end index is out of bounds.

There should be a way to declare a field that has a starting point but extends to the end of the line, like the String.substring(startIndex) overload already does.
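Here is a hypothetical helper along the lines the issue suggests. `fieldOrRest` is not part of the library. It falls back to substring(start) when no end is given or when the line is shorter than expected:

```kotlin
// Hypothetical: reads [start, endExclusive) when the line is long enough,
// otherwise reads from start to the end of the line
fun fieldOrRest(line: String, start: Int, endExclusive: Int? = null): String =
    if (endExclusive == null || endExclusive > line.length) line.substring(start)
    else line.substring(start, endExclusive)
```

Silently shortening a field can hide truncated data, so a stricter variant might allow this fallback only for the last field of a record.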
