
fastods's Introduction


FastODS

(C) J. Férard 2016-2023

(C) M. Schulz 2008-2013 for SimpleODS

A very fast and lightweight (no dependency) library for creating ODS (Open Document Spreadsheet, mainly for Calc) files in Java. It is a fork of Martin Schulz's SimpleODS.

TLDR;

  • FastODS is compatible with Java 8 or any later version and OpenDocument v1.2 (actually, documents are 1.3 valid but tagged "1.2").
  • FastODS cannot read ODS documents;
  • FastODS can produce complex and large ODS documents very fast;
  • FastODS is almost ready for production use; version 1.0 is coming;
  • There is a short tutorial that covers most of the features of FastODS;
  • All documents produced in the tutorial are validated against OpenDocument RELAX NG schemas;
  • Important: feel free to ask a question or make a suggestion.

Features

  • Cell styles, content formatting (dates, numbers, ...), formulas;
  • Page formatting (header, footer);
  • Document embedding, images;
  • Filters & Autofilters;
  • Pivot Tables (Data Pilot);
  • Easy export of SQL ResultSets;
  • Support for Macros, events;
  • Document encryption (AES).

Examples

Here's what some of the produced documents in the tutorial look like:

A regular table (export of a SQL ResultSet):
[Screenshot: Periodic Table List of elements]

A multiplication table:
[Screenshot: Multiplication Table]

The periodic table of Mr. Dmitri Mendeleev:
[Screenshot: Periodic Table]


Why FastODS?

Because I need to write big and simple ODS files very fast in Java.

There are some very good libraries for the OASIS Open Document Format, like Simple ODF or JOpenDocument, but they are a little bit slow and cumbersome when the only goal is to write very simple spreadsheets. There is a simple and fast library by Martin Schulz, SimpleODS, but it is now discontinued and outdated (Java 1.3), and it has a few limitations (incorrect handling of UTF-8 encoding, missing XML escaping for attributes, etc.).

FastODS is a fork of SimpleODS that aims to be a very fast ODS writing library in Java. A lot of features have been added.

(Thanks to M. Schulz for his work.)

Limitations

FastODS won't deal with odt, odg, odf, or other od_ files. It won't even read ods files, because it doesn't use XML internally, only to write files. That's why it is fast and lightweight.

It's an OpenDocument producer (Open Document Format for Office Applications (OpenDocument) Version 1.2, 2.3.1) and only an OpenDocument producer:

An OpenDocument producer is a program that creates at least one conforming OpenDocument document

FastODS documents and LibreOffice/OpenOffice/Excel/...

While an OpenDocument producer like FastODS can be reasonably simple because it only has to create OpenDocument files, OpenDocument consumers like LibreOffice, OpenOffice, Excel, ... are expected to handle numerous files created by various producers, and are therefore more complex. Although the spec states that a consumer has to:

parse and interpret OpenDocument documents according to the semantics defined by this specification [...] but it need not interpret the semantics of all elements, attributes and attribute values. (emphasis mine)

we expect those applications to open almost every proper OpenDocument file. But that's not so easy, because LibreOffice, OpenOffice, Excel, ... do not understand some tags, attributes or attribute values.

To be pragmatic, I chose to consider LibreOffice as the reference implementation. Hence the documents created by FastODS should be fully understood by LibreOffice (although there is a specific option to disable this enforced compatibility and produce slightly more concise files). There is no plan to adapt the documents created by FastODS to OpenOffice, Excel or other readers, although it will be done if possible (your help is welcome).

Platforms and dependencies

FastODS has no runtime dependency beyond the standard Java Library version 6 or higher. Even the XML code is manually produced.

Specific modules like fastods-crypto and fastods-extra do have dependencies, but those modules are usually not needed.

FastODS works on Android (tested on Android 9 Pie, see this issue).

Installation

Standard

Add the following dependency to your POM:

<dependency>
		<groupId>com.github.jferard</groupId>
		<artifactId>fastods</artifactId>
		<version>0.8.1</version>
</dependency>

From sources

For the latest version, type the following command:

git clone https://github.com/jferard/fastods.git

Then:

mvn clean install

And add the following dependency to your POM:

<dependency>
		<groupId>com.github.jferard</groupId>
		<artifactId>fastods</artifactId>
		<version>0.8.2-SNAPSHOT</version>
</dependency>

From jar

First download the jar file from the latest release.

Then run the following command to install the jar in your local repo:

mvn org.apache.maven.plugins:maven-install-plugin:2.5.2:install-file -Dfile=fastods-<version>.jar

Examples

Basic example

Taken from the tutorial:

final OdsFactory odsFactory = OdsFactory.create(Logger.getLogger("hello-world"), Locale.US);
final AnonymousOdsFileWriter writer = odsFactory.createWriter();
final OdsDocument document = writer.document();
final Table table = document.addTable("hello-world");
final TableRowImpl row = table.getRow(0);
final TableCell cell = row.getOrCreateCell(0);
cell.setStringValue("Hello, world!");
writer.saveAs(new File("generated_files", "readme_example.ods"));

Documentation

Writing full documentation would be considerable work, because every time the library changes, the doc has to be rewritten.

My idea is to provide a set of examples of the features of FastODS. This ensures that the doc is up to date.

Those examples are located in the examples module and are fully commented. A tutorial was extracted from these examples.

To run those examples, one has to run:

mvn verify

The resulting ods files are written in the generated_files directory, and can be opened with LibreOffice or OpenOffice.

(Note: All documents produced in the tutorial are validated against OpenDocument RELAX NG schemas. This is done by the odfvalidator from OdfToolkit.)

Other examples

Other examples are implemented as integration tests: OdsFileCreationIT.java, OdsFileWithHeaderAndFooterCreationIT.java, etc.

To run those examples, one has to run:

mvn verify

The resulting ods files are written in the generated_files directory, and can be opened with LibreOffice or OpenOffice. See the integration tests directory.

Speed

Let's be concrete: FastODS is approximately twice as fast as SimpleODS and ten times faster than JOpenDocument for writing large ODS files. (SimpleODF is clearly not the right tool to write large ODS files.)

For more details, see https://github.com/jferard/fastods/wiki/Benchmarking-and-profiling.

History

See https://github.com/jferard/fastods/releases

fastods's People

Contributors

dependabot[bot], jferard, juergen-albert


fastods's Issues

The example in the README is outdated

The API was modified a couple of times since the example was written. To avoid a desynchronization between the README and the API, the simple example should have its own test class in the misc dir.

Add a logger for exceptions

This logger will trace exceptions instead of ignoring them or calling e.printStackTrace(). It could also provide valuable debug information.
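A minimal sketch of the idea, using only java.util.logging from the JDK. The class ExceptionLogging and its closeQuietly method are hypothetical illustrations, not part of FastODS: the point is that a caught exception goes to a Logger (with its stack trace attached to the log record) instead of being dropped or printed with e.printStackTrace().

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

final class ExceptionLogging {
    // Hypothetical: one logger per class, named after the class.
    private static final Logger LOGGER = Logger.getLogger(ExceptionLogging.class.getName());

    private ExceptionLogging() {
    }

    /**
     * Closes a resource. Instead of swallowing the exception or calling
     * e.printStackTrace(), the failure is traced through the logger.
     * Returns true if the resource was closed without error.
     */
    static boolean closeQuietly(Closeable closeable) {
        try {
            closeable.close();
            return true;
        } catch (IOException e) {
            // The Throwable is attached to the log record, so the stack trace
            // reaches whatever handlers the application has configured.
            LOGGER.log(Level.WARNING, "Can't close resource", e);
            return false;
        }
    }
}
```

Since the application controls the handlers and levels, the same calls can stay silent in production and become verbose when debugging.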

Gather non unit tests in a misc folder

There are four fake tests:

  • src/test/.../OdsFileWithHeaderAndFooterCreation.java (kind of example)
  • src/test/.../OdsFileCreation.java (kind of example)
  • src/test/.../ProfileFastODS.java (for profiling)
  • src/bench/.../Benchmark.java (for benchmarks)

All those files could be in a new src/misc/java folder (misc would replace bench). The first two could get a ...Test.java suffix, to make them easy to run from Maven.

Maybe create a length class

This should be carefully tested, since it could slow down the generation of files if used intensively, but would be useful.

A sign that such a class is missing is the javadoc: the comments are polluted with explanations on what a length is (see 18.3.18 length, 18.3.23 percent).
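A minimal sketch of what such a class could look like, assuming the units accepted by the OpenDocument length data type (cm, mm, in, pt, pc, px). The names Length and LengthUnit are hypothetical. To limit the cost raised above, the attribute string is computed once, in the constructor.

```java
import java.math.BigDecimal;

/** Hypothetical units of the OpenDocument "length" data type. */
enum LengthUnit {
    CENTIMETER("cm"), MILLIMETER("mm"), INCH("in"), POINT("pt"), PICA("pc"), PIXEL("px");

    private final String abbreviation;

    LengthUnit(String abbreviation) {
        this.abbreviation = abbreviation;
    }

    String getAbbreviation() {
        return abbreviation;
    }
}

/** Hypothetical immutable length: a value plus a unit, written as e.g. "1.5cm". */
final class Length {
    private final double value;
    private final LengthUnit unit;
    // Computed once, since a length may be written many times.
    private final String attributeValue;

    private Length(double value, LengthUnit unit) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            throw new IllegalArgumentException("Not a finite length: " + value);
        }
        this.value = value;
        this.unit = unit;
        // toPlainString avoids the "1.0E-4" notation, which is not a valid ODF length.
        this.attributeValue = BigDecimal.valueOf(value).toPlainString() + unit.getAbbreviation();
    }

    static Length of(double value, LengthUnit unit) {
        return new Length(value, unit);
    }

    static Length cm(double value) {
        return of(value, LengthUnit.CENTIMETER);
    }

    static Length mm(double value) {
        return of(value, LengthUnit.MILLIMETER);
    }

    static Length pt(double value) {
        return of(value, LengthUnit.POINT);
    }

    double getValue() {
        return value;
    }

    LengthUnit getUnit() {
        return unit;
    }

    /** The value as written in an XML attribute. */
    @Override
    public String toString() {
        return attributeValue;
    }
}
```

With such a type, a javadoc could simply say "@param width the width" and let Length document the format once.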

Java1.6 compliance

The code is Java 1.6 compliant. It would be useful to set source & target to 1.6 in the pom.xml, in order to have a jre1.6 compatible jar.

Check DomTester usage

Every time XML code is tested, DomTester should be used to avoid spurious failures caused by attribute order, since:
<a b="1" c="2" />
is equivalent to:
<a c="2" b="1" />
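To make the point concrete, here is a minimal sketch of an attribute-order-insensitive comparison built on the JDK DOM parser. It is not DomTester's actual implementation; the class name XmlEquivalence is hypothetical. Attributes are compared as an unordered map, while children (including whitespace text) are compared in order.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class XmlEquivalence {
    private XmlEquivalence() {
    }

    /** True if both XML strings have the same tree, ignoring attribute order. */
    static boolean equivalent(String xml1, String xml2) {
        return equivalent(parse(xml1), parse(xml2));
    }

    private static Node parse(String xml) {
        try {
            // Not namespace aware: "table:table-row" is compared as a plain name.
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML: " + xml, e);
        }
    }

    private static boolean equivalent(Node n1, Node n2) {
        if (n1.getNodeType() != n2.getNodeType() || !n1.getNodeName().equals(n2.getNodeName())) {
            return false;
        }
        if (n1.getNodeType() != Node.ELEMENT_NODE) {
            // Text, comments, etc.: compare the raw value.
            return String.valueOf(n1.getNodeValue()).equals(String.valueOf(n2.getNodeValue()));
        }
        // Attributes: unordered comparison.
        if (!attributes(n1).equals(attributes(n2))) {
            return false;
        }
        // Children: ordered comparison.
        NodeList children1 = n1.getChildNodes();
        NodeList children2 = n2.getChildNodes();
        if (children1.getLength() != children2.getLength()) {
            return false;
        }
        for (int i = 0; i < children1.getLength(); i++) {
            if (!equivalent(children1.item(i), children2.item(i))) {
                return false;
            }
        }
        return true;
    }

    private static Map<String, String> attributes(Node element) {
        Map<String, String> map = new HashMap<>();
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            map.put(attribute.getNodeName(), attribute.getNodeValue());
        }
        return map;
    }
}
```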

Create a TextProperties class

TextStyle.appendAnonymousXMLToContentEntry outputs a <style:text-properties> tag. A TextProperties object in the TextStyle class, with delegation, would clarify the API.

TableCellStyle and DataStyle

Presently, HeavyTableRow uses a fake TableCellStyle to set the cell DataStyle. This puts one <style:table-cell> definition per data style type in the common styles of styles.xml. These fake styles are visible to the LO user (press F11).

We need another way of setting a data style for a cell.

Check printable header and footer

Some features have not been tested, like the header and footer for printing, because they were not used. They should now be tested, and fixed if necessary.

Improve Javadoc

The quality of the Javadoc is dropping. With the new JDK 8 doclint, it does not even pass a basic mvn site.

An effort on the doc is necessary.

A new API for FooterHeader building

Let's make it like this:
FooterHeader.builder("fh").par().styledText("A", ts1).styledText("B", ts2).styledPar(ps1).text("C").text("D").text("E")
The output would be:

AB

CDE
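A minimal sketch of such a fluent builder, assuming that par()/styledPar() open a new paragraph and text()/styledText() append a chunk to the current one. The class FooterHeaderSketch is hypothetical, and styles are plain style names here (in FastODS they would be TextStyle and ParagraphStyle objects). Text is assumed to be already XML-escaped.

```java
import java.util.ArrayList;
import java.util.List;

final class FooterHeaderSketch {
    private static final class Span {
        final String text;
        final String styleName; // null: no text:span wrapper

        Span(String text, String styleName) {
            this.text = text;
            this.styleName = styleName;
        }
    }

    private static final class Paragraph {
        final String styleName; // null: unstyled paragraph
        final List<Span> spans = new ArrayList<>();

        Paragraph(String styleName) {
            this.styleName = styleName;
        }
    }

    private final String name;
    private final List<Paragraph> paragraphs = new ArrayList<>();

    private FooterHeaderSketch(String name) {
        this.name = name;
    }

    static FooterHeaderSketch builder(String name) {
        return new FooterHeaderSketch(name);
    }

    FooterHeaderSketch par() {
        return styledPar(null);
    }

    FooterHeaderSketch styledPar(String paragraphStyleName) {
        paragraphs.add(new Paragraph(paragraphStyleName));
        return this;
    }

    FooterHeaderSketch text(String text) {
        return styledText(text, null);
    }

    FooterHeaderSketch styledText(String text, String textStyleName) {
        if (paragraphs.isEmpty()) {
            throw new IllegalStateException("Call par() or styledPar() first");
        }
        paragraphs.get(paragraphs.size() - 1).spans.add(new Span(text, textStyleName));
        return this;
    }

    String getName() {
        return name;
    }

    /** One line per paragraph, styles ignored. */
    String toPlainText() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < paragraphs.size(); i++) {
            if (i > 0) {
                sb.append('\n');
            }
            for (Span span : paragraphs.get(i).spans) {
                sb.append(span.text);
            }
        }
        return sb.toString();
    }

    /** One text:p per paragraph, one text:span per styled chunk. */
    String toXml() {
        StringBuilder sb = new StringBuilder();
        for (Paragraph paragraph : paragraphs) {
            sb.append("<text:p");
            appendStyle(sb, paragraph.styleName);
            sb.append('>');
            for (Span span : paragraph.spans) {
                if (span.styleName == null) {
                    sb.append(span.text);
                } else {
                    sb.append("<text:span");
                    appendStyle(sb, span.styleName);
                    sb.append('>').append(span.text).append("</text:span>");
                }
            }
            sb.append("</text:p>");
        }
        return sb.toString();
    }

    private static void appendStyle(StringBuilder sb, String styleName) {
        if (styleName != null) {
            sb.append(" text:style-name=\"").append(styleName).append('"');
        }
    }
}
```

With the call chain from the issue, toPlainText() gives the two lines "AB" and "CDE", and the unstyled "C", "D", "E" chunks are merged into the ps1 paragraph without extra spans.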

Escape tooltip text

The setTooltip(c, text) does not escape the text. It's a bug and will make LO crash if the text contains < or > characters.

A simple workaround is to escape the text before passing it to setTooltip with an XMLUtil instance.

EDIT: I've seen that newlines aren't correctly rendered under LO. Try to fix this too.
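A minimal sketch of both fixes. The TooltipText class is hypothetical, not FastODS's XMLUtil. escape() replaces the five XML special characters. For newlines, one option (assuming the tooltip content may contain several text:p elements) is to emit one paragraph per line, since ODF collapses a raw newline inside a paragraph into a space; a <text:line-break/> inside a single paragraph would be another option.

```java
final class TooltipText {
    private TooltipText() {
    }

    /** Escapes the characters that are special in XML text and attribute values. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':
                    sb.append("&amp;");
                    break;
                case '<':
                    sb.append("&lt;");
                    break;
                case '>':
                    sb.append("&gt;");
                    break;
                case '"':
                    sb.append("&quot;");
                    break;
                case '\'':
                    sb.append("&apos;");
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    /** One escaped text:p per line; limit -1 keeps trailing empty lines. */
    static String toParagraphs(String text) {
        StringBuilder sb = new StringBuilder();
        for (String line : text.split("\r?\n", -1)) {
            sb.append("<text:p>").append(escape(line)).append("</text:p>");
        }
        return sb.toString();
    }
}
```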

Allow ZipOutputStream configuration

Currently, the ZipOutputStream uses the BEST_SPEED level. It seems good for most use cases, but sometimes one could have a good reason to choose BEST_COMPRESSION or NO_COMPRESSION.
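With java.util.zip, the level is a single ZipOutputStream.setLevel(int) call, so it could simply be passed in by the caller. A minimal sketch (the class ZipLevelSketch is hypothetical). Whatever the level, the ODF packaging rules require the "mimetype" entry to be stored uncompressed, which with ZipOutputStream means setting STORED, the sizes and the CRC before putNextEntry.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class ZipLevelSketch {
    static final String ODS_MIMETYPE = "application/vnd.oasis.opendocument.spreadsheet";

    private ZipLevelSketch() {
    }

    /** Level: Deflater.BEST_SPEED, Deflater.BEST_COMPRESSION, Deflater.NO_COMPRESSION, ... */
    static ZipOutputStream open(OutputStream out, int level) {
        ZipOutputStream zip = new ZipOutputStream(out);
        zip.setLevel(level);
        return zip;
    }

    /** The ODF "mimetype" entry must stay uncompressed, whatever the level. */
    static void writeMimetype(ZipOutputStream zip) throws IOException {
        byte[] bytes = ODS_MIMETYPE.getBytes(StandardCharsets.US_ASCII);
        CRC32 crc = new CRC32();
        crc.update(bytes);
        ZipEntry entry = new ZipEntry("mimetype");
        entry.setMethod(ZipEntry.STORED); // STORED entries need size and CRC up front
        entry.setSize(bytes.length);
        entry.setCompressedSize(bytes.length);
        entry.setCrc(crc.getValue());
        zip.putNextEntry(entry);
        zip.write(bytes);
        zip.closeEntry();
    }

    /** A regular DEFLATED entry: its size depends on the level passed to open(). */
    static void writeEntry(ZipOutputStream zip, String name, byte[] data) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(data);
        zip.closeEntry();
    }
}
```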

Create an in memory ZipUTF8Writer

To help test some features, it would be nice to have a precise view of what the zip entries (content.xml, ...) contain. To do so, we need a ZipUTF8WriterTester with the same API as ZipUTF8Writer and the following methods:

String ZipUTF8WriterTester.getEntryAsString(String entryName)
org.w3c.dom.Document ZipUTF8WriterTester.getEntryAsDocument(String entryName)
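A minimal in-memory sketch. The writing methods putNextEntry(String), write(CharSequence) and closeEntry() are hypothetical stand-ins, since the real ZipUTF8Writer API would have to be mirrored; only the two getters come from the issue. Entries are kept as strings in insertion order, and getEntryAsDocument parses them with the JDK DOM parser (namespace aware, so prefixes must be declared).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class ZipUTF8WriterTester {
    private final Map<String, StringBuilder> entries = new LinkedHashMap<>();
    private StringBuilder current;

    void putNextEntry(String entryName) {
        current = new StringBuilder();
        entries.put(entryName, current);
    }

    void write(CharSequence text) {
        if (current == null) {
            throw new IllegalStateException("No open entry");
        }
        current.append(text);
    }

    void closeEntry() {
        current = null;
    }

    /** The raw text of an entry, or null if there is no such entry. */
    String getEntryAsString(String entryName) {
        StringBuilder sb = entries.get(entryName);
        return sb == null ? null : sb.toString();
    }

    /** The parsed entry, or null if there is no such entry. */
    Document getEntryAsDocument(String entryName) {
        String xml = getEntryAsString(entryName);
        if (xml == null) {
            return null;
        }
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Entry " + entryName + " is not well-formed XML", e);
        }
    }
}
```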

Clarify styles destinations

There are four possible destinations for style definitions:

  • content.xml/automatic-styles
  • styles.xml/styles
  • styles.xml/automatic-styles
  • styles.xml/master-styles

LO seems to follow some kind of "group rule". E.g. a page layout is in styles.xml/automatic-styles, therefore all styles referenced in that page layout have to be in styles.xml/automatic-styles. If FastODS does not follow that rule, LO won't be able to read FastODS generated files.

See #24
See #19

Use a specific FullList for raw types

A basic implementation of a list of bytes or ints with 0 as default value would probably be faster than an ArrayList<Integer> with auto boxing/unboxing, and would avoid the problem of null values.
Some profiling is needed to determine whether it is worth it.
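A minimal sketch of the int variant, for such a profiling session. The name IntFullList is hypothetical. Unset indices read as 0, so there is no boxing and no null check, and the backing array grows geometrically so that set() stays amortized O(1).

```java
import java.util.Arrays;

final class IntFullList {
    private int[] values;
    private int size; // 1 + highest index ever set

    IntFullList() {
        this(16);
    }

    IntFullList(int initialCapacity) {
        values = new int[Math.max(1, initialCapacity)];
    }

    /** Returns the value at index, or 0 if it was never set. */
    int get(int index) {
        if (index < 0) {
            throw new IndexOutOfBoundsException("Negative index: " + index);
        }
        return index < size ? values[index] : 0;
    }

    void set(int index, int value) {
        if (index < 0) {
            throw new IndexOutOfBoundsException("Negative index: " + index);
        }
        if (index >= values.length) {
            // Arrays.copyOf pads with zeros, which is exactly the default value.
            values = Arrays.copyOf(values, Math.max(index + 1, values.length * 2));
        }
        values[index] = value;
        if (index >= size) {
            size = index + 1;
        }
    }

    int size() {
        return size;
    }
}
```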

Spans

When merging a cell with its neighbours, the generated XML should look like:

<table:table-row>
	cells
	<table:table-cell table:number-columns-spanned="A" table:number-rows-spanned="B">
		value of the cell
	</table:table-cell>
	<table:covered-table-cell table:number-columns-repeated="A-1"/>
	cells
</table:table-row>

Followed by B-1 times (the row of the spanning cell is the first of the B rows):

<table:table-row>
	cells
	<table:covered-table-cell table:number-columns-repeated="A"/>
	cells
</table:table-row>

Currently, all covered cells are ignored, and the XML looks like:

<table:table-row>
	cells
	<table:table-cell table:number-columns-spanned="A" table:number-rows-spanned="B">
		value of the cell
	</table:table-cell>
	cells
</table:table-row>

And nothing on the following rows.

Adding the covered cells on the row should be easy, since the XML of the row is created cell by cell. But it's more difficult to "cover the cells" of the next rows, because each HeavyTableRow is independent.

One solution would be:

  • create, for each Table, two methods setCovered(row, col) and isCovered(row, col).
  • call table.setCovered(row, col) when one cell covers neighbor cells.
  • call table.isCovered(row, col) when outputting the XML.
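The proposed solution can be sketched as a per-table record of covered cells, one BitSet of columns per row. CoveredCells and its methods are hypothetical. rowXml is deliberately simplified: every non-covered cell is an empty table:table-cell and no table:number-columns-repeated grouping is done; it only shows where isCovered would be called during output.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

final class CoveredCells {
    private final Map<Integer, BitSet> coveredByRow = new HashMap<>();

    void setCovered(int row, int col) {
        coveredByRow.computeIfAbsent(row, r -> new BitSet()).set(col);
    }

    boolean isCovered(int row, int col) {
        BitSet columns = coveredByRow.get(row);
        return columns != null && columns.get(col);
    }

    /** Covers a rowSpan x colSpan block, except its top-left (spanning) cell. */
    void setSpan(int row, int col, int rowSpan, int colSpan) {
        for (int r = row; r < row + rowSpan; r++) {
            for (int c = col; c < col + colSpan; c++) {
                if (r != row || c != col) {
                    setCovered(r, c);
                }
            }
        }
    }

    /** Simplified output of one row of the given width. */
    String rowXml(int row, int width) {
        StringBuilder sb = new StringBuilder("<table:table-row>");
        for (int c = 0; c < width; c++) {
            sb.append(isCovered(row, c) ? "<table:covered-table-cell/>" : "<table:table-cell/>");
        }
        return sb.append("</table:table-row>").toString();
    }
}
```

Because the record lives in the table rather than in a row, a row written later can still learn that some of its cells are covered by a cell of a previous row.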

Add a smaller file generation for profiling

When profiling all methods (including java. ...), the profiling is very slow and blocks on a weak laptop. Adding a smaller, less greedy test would make profiling possible in that case.

Add formula

It is sometimes useful to create formulas programmatically. It should not be too difficult to implement in HeavyTableColdRow.

Move StyleTag to StylesEntry

The StyleTag XML is written in content.xml > automatic-styles. That's the wrong destination, since FastODS does not allow anonymous styles. The output has to be moved to styles.xml > styles.

Threading

Reminder:

  • one should write ods files sequentially;
  • LO needs the styles in content.xml to be defined before they are used (that means that a style-name attribute must refer to a previously defined style);
  • the basic use case of FastODS implies threading: one thread gathers data (e.g. performs a query on a database) and puts it on a bus; another thread reads data from the bus and writes it to the file (this is the FastODS part).

Now, under some assumptions, it is possible to split FastODS work into two threads:

  • one thread builds HeavyTableRows and puts them on a bus;
  • another thread reads the rows from the bus and writes them to the file.

These assumptions are:

  • all styles are defined when the first table row is written;
  • a row is never modified after it was put on the bus (there is a kind of flush).

I think there is only one way to discover if it will really speed up the file creation: try!
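A minimal sketch of the two-thread split, with a bounded BlockingQueue as the bus and a sentinel marking the end of the stream. RowPipeline is hypothetical, rows are plain Strings standing in for HeavyTableRows, and styles are assumed to be registered before the first put. It only shows the mechanics; whether it is faster is exactly what the "try!" has to measure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class RowPipeline {
    // End-of-stream sentinel, compared by identity.
    private static final String END = new String("END");

    private RowPipeline() {
    }

    static List<String> run(int rowCount) throws InterruptedException {
        // Bounded: the builder thread waits if the writer thread lags behind.
        BlockingQueue<String> bus = new ArrayBlockingQueue<>(64);
        List<String> written = new ArrayList<>();

        Thread writer = new Thread(() -> {
            try {
                for (String row = bus.take(); row != END; row = bus.take()) {
                    written.add(row); // stand-in for appending the row XML to content.xml
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        for (int i = 0; i < rowCount; i++) {
            // Once put on the bus, a row is never modified again (the "flush").
            bus.put("<table:table-row>" + i + "</table:table-row>");
        }
        bus.put(END);
        writer.join(); // join() makes the writer's additions visible here
        return written;
    }
}
```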

What happens when a file cannot be saved?

Currently, there is no way to distinguish the following cases:

  • the file cannot be saved because the filename is a directory name;
  • the file can be saved, but the current file will be overwritten;
  • the file can be saved without problem.

(I do not consider usual I/O problems.) The library may be used in an application which has to handle the cases above (e.g. in the GUI), so it's important to distinguish those cases (exception or boolean return).
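A minimal sketch of the "return value" option, using java.io.File. The enum SaveTarget is hypothetical. Note that the answer can change between the check and the actual write (another process may create or delete the file), so real I/O errors still have to surface as exceptions.

```java
import java.io.File;

enum SaveTarget {
    IS_DIRECTORY,   // cannot save: the filename is a directory name
    WILL_OVERWRITE, // can save, but the current file will be overwritten
    NEW_FILE;       // can save without problem

    static SaveTarget of(File file) {
        if (file.isDirectory()) {
            return IS_DIRECTORY;
        }
        if (file.exists()) {
            return WILL_OVERWRITE;
        }
        return NEW_FILE;
    }
}
```

A GUI could then ask for a confirmation on WILL_OVERWRITE and show an error on IS_DIRECTORY before calling the writer.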

A separate DomTester module

The DomTester class provides different tests for XML code equivalence. It is implemented in the test dir and it is not so simple. Perhaps a separate module would be better?

The name of the module would be fastods-testlib (like in Guava), in a multi-module structure:

fastods/fastods/... <- current module without DomTester
fastods/fastods-testlib/... <- DomTester

Fix copyright date

We're in 2017, the copyright date should be 2016-2017:

  • in the README.md
  • in the header of files

Footer and header builder are incomplete

There are two builders: RegionFooterHeaderBuilder, which builds a RegionFooterHeader, and SimpleFooterHeaderBuilder, which builds a SimpleFooterHeader. Neither of them allows the user to set margins or min height.

Improve benchmarking results presentation

The benchmarks should present results like that:

FastODS 10 tables of 5000 rows, 20 columns without warmup:
avg time: ... ms
best time: ... ms
worst time: ... ms

FastODS 10 tables of 5000 rows, 20 columns with warmup:
avg time: ... ms
best time: ... ms
worst time: ... ms

The same for the SimpleODS and JOpenDocument libraries.
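A minimal sketch of a helper producing that format. BenchmarkReport is hypothetical; the task to measure (e.g. writing 10 tables of 5000 rows) is passed as a Runnable, and the "with/without warmup" variants are just different warmup counts. System.nanoTime() is used because it is meant for measuring elapsed time.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

final class BenchmarkReport {
    private BenchmarkReport() {
    }

    /** Runs the task `warmups` times untimed, then `runs` times timed. */
    static String measure(String title, Runnable task, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be > 0");
        }
        for (int i = 0; i < warmups; i++) {
            task.run(); // lets the JIT compile the hot paths before timing
        }
        long total = 0;
        long best = Long.MAX_VALUE;
        long worst = Long.MIN_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            long elapsed = System.nanoTime() - start;
            total += elapsed;
            best = Math.min(best, elapsed);
            worst = Math.max(worst, elapsed);
        }
        return format(title, total / runs, best, worst);
    }

    static String format(String title, long avgNanos, long bestNanos, long worstNanos) {
        return String.format(Locale.ROOT, "%s:\navg time: %d ms\nbest time: %d ms\nworst time: %d ms",
                title,
                TimeUnit.NANOSECONDS.toMillis(avgNanos),
                TimeUnit.NANOSECONDS.toMillis(bestNanos),
                TimeUnit.NANOSECONDS.toMillis(worstNanos));
    }
}
```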

SimpleODS dependency

The POM contains a SimpleODS dependency, used only in Benchmark.java, which is not part of unit tests, but has to be run manually. Since SimpleODS is not available in Maven central repo, this leads to a compilation failure.

Three options:

  • install SimpleODS manually;
  • remove dependency;
  • manage to keep the dependency, but avoid the failure (how?)

Separate Footer and Header

The FooterHeader class stores both header and footer, but we need to check the type (h or f) at some places. This check could be avoided with two classes.
Perhaps a design like this: keep the FooterHeader class but let it store an instance of FooterOrHeader interface that gathers the specific code of each "type".

Create an OdsDocument.getOrAddTable()

The common idiom to access a table when one does not know if it exists is:

Table t;
try {
    t = document.getTable("t");
} catch (FastOdsException e) {
    t = document.addTable("t");
}

A document.getOrAddTable() method would be welcome.
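A minimal sketch of such a method on a hypothetical document that keeps its tables in a LinkedHashMap (name to table, insertion order preserved). SketchDocument and SketchTable are stand-ins, not the real OdsDocument and Table. Map.computeIfAbsent does the lookup and the creation in one call, so no exception is used for control flow.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a table; only the name matters here. */
final class SketchTable {
    private final String name;

    SketchTable(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

/** Hypothetical document keeping its tables by name, in insertion order. */
final class SketchDocument {
    private final Map<String, SketchTable> tableByName = new LinkedHashMap<>();

    SketchTable addTable(String name) {
        if (tableByName.containsKey(name)) {
            throw new IllegalArgumentException("Table already exists: " + name);
        }
        SketchTable table = new SketchTable(name);
        tableByName.put(name, table);
        return table;
    }

    /** Returns the existing table, or adds a new one with that name. */
    SketchTable getOrAddTable(String name) {
        return tableByName.computeIfAbsent(name, SketchTable::new);
    }

    int getTableCount() {
        return tableByName.size();
    }
}
```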

New cell pseudo-type text

Some cells may contain complex text, similar to the footer/header text. With a new cell pseudo-type, this would be easy.
String version:
TableCell.setStringValue("string")
Outputs a very short XML:

<table:table-cell office:value-type="string" office:string-value="string"/>

Text version (see syntax for builder here #14):
TableCell.setTextValue(TextValue.builder().<some methods to build the text value>.build())
Outputs a longer XML:

<table:table-cell office:value-type="string" calcext:value-type="string">
    <text:p>
        <text:span text:style-name="...">te</text:span><text:span text:style-name="...">xt</text:span>
    </text:p>
</table:table-cell>

Favor 'object composition' over 'class inheritance' in style builders

There is still a lot of class inheritance. A refactoring should limit the use of inheritance, since it's error prone.

EDIT: It is acceptable for styles, because it's easy to manage, but in the case of builders, there is a dirty F-bound trick to keep the fluent style alive.

Handle duplicate styles at builder level

Most of the styles are created with a <Style>.builder(<name>). <some setters...> .build().

It should be possible to avoid name conflicts at that level, with a kind of styles dictionary (name -> style). Currently, name conflicts are not handled by FastODS: it is possible to have multiple styles with the same name in the document.

It may require some digging into the OASIS standard to see what happens then, but it's better to avoid it.
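A minimal sketch of such a dictionary, assuming styles implement equals(). StyleRegistry is hypothetical; a builder's build() could call register(name, style). Registering the same style twice is harmless, while a different style under an already used name is rejected instead of producing two styles with the same name in the document.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class StyleRegistry<S> {
    private final Map<String, S> styleByName = new HashMap<>();

    /** Returns the registered style; fails if the name is taken by a different style. */
    S register(String name, S style) {
        S previous = styleByName.putIfAbsent(name, Objects.requireNonNull(style));
        if (previous == null) {
            return style;
        }
        if (previous.equals(style)) {
            return previous;
        }
        throw new IllegalArgumentException("Style name already used by a different style: " + name);
    }

    boolean contains(String name) {
        return styleByName.containsKey(name);
    }
}
```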

Make ConfigItem accessible

Each Table object initializes a bunch of ConfigItems, but there is currently no way to change their values.

A ConfigItems class, where each ConfigItem would be accessible by name, seems a good way to expose setters:

ConfigItems.set("zoomValue", "60");

Merge Util and XMLUtil

Both classes have the same purpose: give helper methods for the library. The merge could be a facade pattern, with the two classes remaining the same, or a full merge.

Before merging, the argument order must be the same in every method or constructor: XMLUtil then Util.

Autofilter

This should be relatively easy to set up:

table:calculation-settings
table:named-expressions
<table:database-ranges>
	<table:database-range table:name="__Anonymous_Sheet_DB__0" table:target-range-address="Sheet1.A1:Sheet1.A3" table:display-filter-buttons="true"/>
</table:database-ranges>

Improve unit tests coverage

According to Cobertura, the coverage of unit tests is 64 %, which is low. Coverage should be increased.

Allow page layout share between master pages

Currently, PageStyle handles two tags:

  • style:master-page in styles.xml > master-styles
  • style:page-layout in styles.xml > automatic-styles

If we split PageStyle into MasterPageStyle and PageLayout, a page layout could be shared between two master pages.

Move object type inference in a specific class

The use of TableCellWalker.setObjectValue should be avoided: FastODS has to infer the "ods type" of the object, which is not always obvious.

Instead of TableCellWalker.setObjectValue, we should use a new class CellValue, and a new TableCellWalker.setCellValue, which is unambiguous.

More details on CellValue:

  • CellValue.fromObject: where the inference is done
  • CellValue.fromDate, CellValue.fromBoolean, etc.
  • CellValue.fromTypeAndObject, unambiguous but slow.

This class will be a little cumbersome, but useful for helpers.
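A minimal sketch of that design, limited to four of the ODF value types (float, boolean, date, string). This CellValue is hypothetical and only illustrates the split: the typed factories are unambiguous, fromTypeAndObject checks the object against a type stated by the caller, and all the inference is confined to fromObject, with unknown classes falling back to their string form.

```java
import java.util.Date;

final class CellValue {
    enum Type { FLOAT, BOOLEAN, DATE, STRING }

    private final Type type;
    private final Object value;

    private CellValue(Type type, Object value) {
        this.type = type;
        this.value = value;
    }

    static CellValue fromNumber(Number n) {
        return new CellValue(Type.FLOAT, n);
    }

    static CellValue fromBoolean(boolean b) {
        return new CellValue(Type.BOOLEAN, b);
    }

    static CellValue fromDate(Date d) {
        return new CellValue(Type.DATE, d);
    }

    static CellValue fromString(String s) {
        return new CellValue(Type.STRING, s);
    }

    /** Unambiguous: the caller states the type; a mismatching object fails with a ClassCastException. */
    static CellValue fromTypeAndObject(Type type, Object o) {
        switch (type) {
            case FLOAT:
                return fromNumber((Number) o);
            case BOOLEAN:
                return fromBoolean((Boolean) o);
            case DATE:
                return fromDate((Date) o);
            default:
                return fromString(String.valueOf(o));
        }
    }

    /** The only place where the "ods type" is inferred. */
    static CellValue fromObject(Object o) {
        if (o instanceof Number) {
            return fromNumber((Number) o);
        }
        if (o instanceof Boolean) {
            return fromBoolean((Boolean) o);
        }
        if (o instanceof Date) {
            return fromDate((Date) o);
        }
        return fromString(String.valueOf(o));
    }

    Type getType() {
        return type;
    }

    Object getValue() {
        return value;
    }
}
```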
