Jackson dataformat: Spreadsheet

Let’s get rid of the Spread Sh!t.

This Jackson extension library is a component that memory-efficiently reads and writes the SpreadsheetML format^[1] using Apache POI and StAX via Jackson abstraction. — Implemented as full streaming implementation, which allows full access (streaming, data-binding, and tree-model).

💬	The same abstraction is provided for the Horrible Microsoft Excel format^[2]. However, not based on Event API, so may not be suitable for large .xls files.

Table of Contents

Installation
- Maven
- Gradle
Busy Developers' Guide to Features
And more.
- Supported Types
- References
License

Installation

📙	This project is currently under development. If you are interested, you can always use the latest snapshot artifacts from OSS Sonatype.

Maven

<repository>
    <id>sonatype-snapshots</id>
    <url>https://s01.oss.sonatype.org/content/repositories/snapshots/</url>
</repository>

<dependency>
    <groupId>io.github.scndry</groupId>
    <artifactId>jackson-dataformat-spreadsheet</artifactId>
    <version>0.0.1-SNAPSHOT</version>
</dependency>

Gradle

repositories {
    maven { url "https://s01.oss.sonatype.org/content/repositories/snapshots/" }
}

dependencies {
    implementation "io.github.scndry:jackson-dataformat-spreadsheet:0.0.1-SNAPSHOT"
}

Busy Developers' Guide to Features

Want to use HSSF and XSSF read and write spreadsheets in a hurry? This guide is for you. — Apache POI’s Quick Guide.

Reading Spreadsheet

First, let’s look at input-entries.xlsx file to read.

Table 1. input-entries.xlsx

Column A	Column B
1	2
3	4

Then let’s define the Entry class annotated @DataGrid.

@DataGrid
public class Entry {
    private int a;
    private int b;
    // Constructors, Getters, Setters and toString
}

We use SpreadsheetMapper to read the file into an Entry object, so let’s set it up now.

SpreadsheetMapper mapper = new SpreadsheetMapper();
File file = new File("input-entries.xlsx");

Once we have SpreadsheetMapper configured, simply use readValue.

Entry value = mapper.readValue(file, Entry.class);
// Output: Entry(a=1, b=2)

You can use readValues to read the entire list.

List<Entry> values = mapper.readValues(file, Entry.class);
// Output: [Entry(a=1, b=2), Entry(a=3, b=4)]

By default, it reads the first sheet. If you need to find and read a specific sheet by name or index, you can use SheetInput.

String sheetName = "Entries";
SheetInput<File> input = SheetInput.source(file, sheetName);
List<Entry> values = mapper.readValues(input, Entry.class);
// Output: [Entry(a=1, b=2), Entry(a=3, b=4)]

Writing Spreadsheet

We also use SpreadsheetMapper to write an Entry object to the file. Just call writeValue.

SpreadsheetMapper mapper = new SpreadsheetMapper();
File file = new File("output-entries.xlsx");
Entry value = new Entry(1, 2);
mapper.writeValue(file, value);

The output of the above in the file will be:

Table 2. output-entries.xlsx

a	b
1	2

This time, let’s write a list.

List<Entry> values = Arrays.asList(new Entry(1, 2), new Entry(3, 4));
mapper.writeValue(file, values);

java.lang.IllegalArgumentException: `valueType` MUST be specified to write a value of a Collection or array type

Oh, Sheet!

Okay. Let’s add a valueType to the 3rd parameter.

Class<Entry> valueType = Entry.class;
mapper.writeValue(file, values, valueType);

Table 3. output-entries.xlsx

a	b
1	2
3	4

Now let’s annotate on properties with @DataColumn to write each column’s name.

@DataGrid
public class Entry {
    @DataColumn("Column A")
    private int a;
    @DataColumn("Column B")
    private int b;
    // Constructors, Getters, Setters and toString
}

Table 4. output-entries.xlsx

Column A	Column B
1	2
3	4

And also you can use SheetOutput to write the sheet name. By default, the created sheet is named "Sheet" + indexOfSheet.

String sheetName = "Entries";
SheetOutput<File> output = SheetOutput.target(file, sheetName);
mapper.writeValue(output, values, Entry.class);

Nested Object (de)Structuring

Spreadsheets are a flat data representation of a 2-dimensional structure, but most of the POJOs we have take complex nested structures. Of course, it provides (de)structuring these complex models, as you can see below:

ID	NAME	ZIPCODE	ADDRESS LINE 1	ADDRESS LINE 2	DESIGNATION	SALARY
1	John Doe	12345	123 Main St.	Anytown, USA	CEO	$300,000.00

As we are Object-Oriented Developers, you probably don’t want to manage a lot of column data in a flat model. Let’s first define the well-structured Employee class.

@DataGrid
class Employee {
    int id;
    String name;
    Address address;
    Employment employment;
    // ...
}

class Address {
    String zipcode;
    AddressLines addressLines;
    // ...
}

class AddressLines {
    String line1;
    String line2;
    // ...
}

class Employment {
    Designation designation;
    long salary;
    // ...
}

enum Designation {
    CEO, CTO, SM, ARCH, INT
}

No additional code is required when deserializing data into nested structures, you can deserialize the same way.

Employee value = mapper.readValue(input, Employee.class);

Here’s the Employee object that output will be:

Employee(
    id=1
    name=John doe
    address=Address(
        zipcode=12345
        addressLines=AddressLines(
            line1=123 Main St.
            line2=Anytown, USA
        )
    )
    employment=Emeployment(
        designation=CEO
        salary=300000
    )
)

Just as can be deserialized without any additional code, you can serialize these nested objects.

mapper.writeValue(output, values, Employee.class);

Writing the Nested List

Although limited, it supports serializing nested list structures. However, deserializing is not supported due to implementation complexity.

@DataGrid
class Outer {
    @DataColumn("A") int a;
    List<Inner> inners;
    @DataColumn("E") int e;
    // ...
}

class Inner {
    @DataColumn("B") int b;
    @DataColumn("C") int[] c;
    @DataColumn("D") int d;
    // ...
}

Outer value = new Outer(
        1, Arrays.asList(
            new Inner(2, new int[]{3, 4}, 5),
            new Inner(6, new int[]{7, 8}, 9)),
        10);
mapper.writeValue(output, value);

This will lead to the desired result like this:

A	B	C	D	E
1	2	3	5	10
`BLANK`	`BLANK`	4	`BLANK`	`BLANK`
`BLANK`	6	7	9	`BLANK`
`BLANK`	`BLANK`	8	`BLANK`	`BLANK`

Annotations and Schema

For most column schema generation requirements, @DataGrid and @DataColumn annotations will provide all you need. The two have almost similar attributes, so only listing attributes of @DataColumn.

Attributes in @DataColumn

String value(): The column’s name to write in header
String style(): The name of the cellStyle to apply to the column, it should be registered with StylesBuilder.
String headerStyle(): Same as style(), but only applied to the column’s header.
int width(): Fixed width for the column.
OptBoolean autoSize(): Adjusts the column width to fit the contents. Not accurate for fullwidth forms like the CJK characters.
int minWidth(): Minimum width of the auto-sized column.
int maxWidth(): Maximum width of the auto-sized column.

Attribute Resolution

Column schema uses particular order designed to allow for value overrides. Attributes are considered in the following order:

@DataColumn's attribute on property.
@DataGrid's attribute on the class in which the property is declared.
@DataGrid's attribute of the class of which the class in which the property is declared is a member.

For a better understanding, let’s look at the code below:

@DataGrid(autoSizeColumn = OptBoolean.FALSE)
class Foo {
    @DataColumn(style = "Foo_a_style")
    int a;
    @DataColumn(headerStyle = "Foo_b_headerStyle")
    int b;
    // ...
}

@DataGrid(columnHeaderStyle = "Bar_columnHeaderStyle")
class Bar {
    Foo foo;
    // ...
}

@DataGrid(columnStyle = "Baz_columnStyle", autoSizeColumn = OptBoolean.TRUE)
class Baz {
    Foo foo;
    // ...
}

Schema is generated and used when (de)serializing within SpreadsheetMapper, but you can generate using sheetSchemaFor.

SpreadsheetSchema foo = mapper.sheetSchemaFor(Foo.class);
SpreadsheetSchema bar = mapper.sheetSchemaFor(Bar.class);
SpreadsheetSchema baz = mapper.sheetSchemaFor(Baz.class);

Column attributes of schemas generated for each of the above types are as follows:

Column	Style	Header Style	Auto Size
foo.a	`"Foo_a_style"`	`""`	`FALSE`
foo.b	`""`	`"Foo_b_headerStyle"`	`FALSE`
bar.foo.a	`"Foo_a_style"`	`"Bar_columnHeaderStyle"`	`FALSE`
bar.foo.b	`""`	`"Foo_b_headerStyle"`	`FALSE`
baz.foo.a	`"Foo_a_style"`	`""`	`FALSE`
baz.foo.b	`"Baz_columnStyle"`	`"Foo_b_headerStyle"`	`FALSE`

Build Your Own Styles

@DataGrid(columnHeaderStyle = "baseHeader")
class Employee {
    // ...
    Address address;
    Employment employment;
    // ...
}

class Address {
    @DataColumn(value = "ZIPCODE", style = "zipcode")
    String zipcode;
    // ...
}

@DataGrid(columnHeaderStyle = "employmentHeader")
class Employment {
    // ...
    @DataColumn(value = "SALARY", style = "salary", headerStyle = "salaryHeader")
    long salary;
    // ...
}

StylesBuilder builder = new StylesBuilder()
        .cellStyle(/*name*/ "baseHeader")
            .border().thin()
            .fillForegroundColor(IndexedColors.GREY_25_PERCENT)
            .fillPattern().solidForeground()
            .font().bold().end()
            .end()
        .cellStyle(/*name*/ "employmentHeader", /*cloneStyleFrom*/ "baseHeader")
            .fillForegroundColor(198, 239, 206)
            .end()
        .cellStyle("salaryHeader", "baseHeader")
            .fillForegroundColor(0xFFC7CE)
            .font().bold().color(0x9C0006).end()
            .end()
        .cellStyle("salary")
            .dataFormat("[Red][>=100000]$#,##0.00;$#,##0.00")
            .end()
        .cellStyle("zipcode")
            .dataFormat().text()
            .end();
SpreadsheetMapper mapper = SpreadsheetMapper.builder()
        .stylesBuilder(builder)
        .build();
mapper.writeValue(output, values, type);

Built-in data formats for convenience

general(): "General"
text(): "@"
numberInt(): "0"
numberFloat(): "0.00"
numberIntWithComma(): "#,##0"
numberFloatWithComma(): "#,##0.00"
date(): "yyyy-mm-dd"
dateTime(): "yyyy-mm-dd hh:mm:ss"

SpreadsheetMapper mapper = SpreadsheetMapper.builder()
        .addModule(new ExcelDateModule())
        .build();

Change Origin Address

CellAddress address = ...;
SpreadsheetMapper mapper = SpreadsheetMapper.builder()
        .origin(address)
        .origin("B2")
        .origin(1, 1)
        .build();
mapper.writeValue(output, values, type);

`BLANK`	`BLANK`	`BLANK`
`BLANK`	Column A	Column B
`BLANK`	1	2
`BLANK`	3	4

Overwrite Column Names

SpreadsheetMapper mapper = SpreadsheetMapper.builder()
        .columnNameResolver(prop -> "Overwrite " + prop.getName().toUpperCase())
        .build();
mapper.writeValue(output, values, type);

Overwrite A	Overwrite B
1	2
3	4

enum ColumnCode {
    A("Code A"), B("Code B");
    String text;
    // ...
}

@Retention(RetentionPolicy.RUNTIME)
@interface NameOf {
    ColumnCode value();
}

@DataGrid
class Entity {
    @NameOf(ColumnCode.A)
    @DataColumn("It will be overwritten")
    int a;
    @NameOf(ColumnCode.B)
    int b;
    // ...
}

ColumnNameResolver byText = AnnotatedNameResolver.forValue(NameOf.class, ColumnCode::getText);
SpreadsheetMapper mapper = SpreadsheetMapper.builder()
        .columnNameResolver(byText)
        .build();
mapper.writeValue(output, values, type);

Code A	Code B
1	2
3	4

Reading Under Control

SpreadsheetReader reader = mapper.sheetReaderFor(Entry.class);
try (SheetMappingIterator<Entry> iterator = reader.readValues(input)) {
    while (iterator.hasNext()) {
        Entry value = iterator.next();
        SheetLocation location = iterator.getCurrentLocation();
        boolean done = ...;
        if (done) {
            break;
        }
    }
}

SpreadsheetMapper mapper = SpreadsheetMapper.builder();
        .enable(SheetParser.Feature.BREAK_ON_BLANK_ROW)
        .build();

On/Off Features in SheetParser.Feature

BLANK_ROW_AS_NULL (default true)
BREAK_ON_BLANK_ROW (default false)

And more.

Supported Types

Deserialization
- java.io.File
- java.io.InputStream
- io.github.scndry.jackson.dataformat.spreadsheet.deser.SheetInput<T>
- org.apache.poi.ss.usermodel.Sheet
Serialization
- java.io.File
- java.io.OutputStream
- io.github.scndry.jackson.dataformat.spreadsheet.ser.SheetOutput<T>
- org.apache.poi.ss.usermodel.Sheet

References

License

Project is licensed under Apache License 2.0.

1. XLSX — XSSF (XML SpreadSheet Format)

2. XLS — HSSF (Horrible SpreadSheet Format)

scndry / jackson-dataformat-spreadsheet Goto Github PK