exificient-core's Introduction

EXIficient

EXIficient - open source implementation of the W3C Efficient XML Interchange (EXI) format specification.

The EXI format is a very compact representation for the Extensible Markup Language (XML) Information Set that is intended to simultaneously optimize performance and the utilization of computational resources.

API support

  • SAX 1
  • SAX 2
  • DOM
  • StAX
  • XmlPull

Apache Maven Dependency

<dependency>
   <groupId>com.siemens.ct.exi</groupId>
   <artifactId>exificient</artifactId>
   <version>1.0.7</version>
</dependency>

Requirements

  • Java 1.5 or higher
  • Xerces2 Java Parser

Library - Code Sample

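// JDK imports used below. The EXIficient classes (EXIFactory, DefaultEXIFactory,
// EXIResult, EXISource, ...) must be imported from the packages of the
// library version in use.
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.transform.Result;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

/*
 *  Set up EXIFactory as required
 */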
EXIFactory exiFactory = DefaultEXIFactory.newInstance();
// e.g., add additional settings beyond the default values
// exiFactory.setGrammars(GrammarFactory.newInstance().createGrammars("foo.xsd")); // use XML schema
// exiFactory.setCodingMode(CodingMode.COMPRESSION); // use deflate compression for larger XML files

/*
 *  encode XML to EXI
 */
String fileEXI = "foo.xml.exi"; // EXI output
OutputStream osEXI = new FileOutputStream(fileEXI);
EXIResult exiResult = new EXIResult(exiFactory);
exiResult.setOutputStream(osEXI);
XMLReader xmlReader = XMLReaderFactory.createXMLReader();
xmlReader.setContentHandler( exiResult.getHandler() );
xmlReader.parse("foo.xml"); // parse XML input
osEXI.close();

/*
 *  decode EXI to XML
 */
String fileXML = "foo.xml.exi.xml"; // XML output again
Result result = new StreamResult(fileXML);
InputSource is = new InputSource(fileEXI);
SAXSource exiSource = new EXISource(exiFactory);
exiSource.setInputSource(is);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.transform(exiSource, result);

Command-line Interface

EXIficient also provides a command-line interface.

/* Class: com.siemens.ct.exi.cmd.EXIficientCMD */

      

######################################################################### 
###   EXIficient                                                      ### 
###   Command-Shell Options                                           ### 
######################################################################### 

 -h                               /* shows help */ 

 -encode 
 -decode 

 -i  
 -o  

 -schema  
 -xsdSchema                       /* XML schema datatypes only */ 
 -noSchema                        /* default */ 

 -strict 
 -preservePrefixes 
 -preserveComments 
 -preserveLexicalValues 
 -preservePIs                     /* processing instructions */ 
 -preserveDTDs                    /* DTDs & entity references */ 

 -bytePacked 
 -preCompression 
 -compression 

 -blockSize  
 -valueMaxLength  
 -valuePartitionCapacity  

 -noLocalValuePartitions          /* EXI Profile parameters */ 
 -maximumNumberOfBuiltInProductions  
 -maximumNumberOfBuiltInElementGrammars  
 
 -includeOptions 
 -includeCookie 
 -includeSchemaId 
 -includeSchemaLocation 
 -includeInsignificantXsiNil 
 -includeProfileValues 
 -retainEntityReference
 -fragment 
 -selfContained <{urn:foo}elWithNS,elDefNS>
 -datatypeRepresentationMap  

# Examples
 -encode -schema notebook.xsd -i notebook.xml
 -decode -schema notebook.xsd -i notebook.xml.exi -o notebookDec.xml

EXIFactory Settings

Note: by default, all options are set so that a small EXI stream is produced. For larger XML files (e.g., use COMPRESSION) or when specific fidelity options are needed (e.g., preserve comments), different settings may be preferable.

General | Information | Default | Hint
blockSize | Specifies the block size used for EXI compression | 1,000,000 | The default blockSize is intentionally large but can be reduced
valueMaxLength | Specifies the maximum string length of value content items to be considered for addition to the string table | unbounded | Larger strings (> 16 characters) are often unlikely to be a string table hit, so setting a lower value may reduce memory usage and speed up processing when there are no string table hits
valuePartitionCapacity | Specifies the total capacity of value partitions in a string table | unbounded | Reducing the number reduces the memory that may be required

com.siemens.ct.exi.core.FidelityOptions
Option | Information | Default
FEATURE_COMMENT | Comments are preserved | false
FEATURE_PI | Processing instructions are preserved | false
FEATURE_DTD | DTDs and entity references are preserved | false
FEATURE_PREFIX | Namespace prefixes are preserved | false
FEATURE_LEXICAL_VALUE | Lexical form of values is preserved (e.g., float 1.00 vs 1) | false

com.siemens.ct.exi.core.CodingMode | Information (default is BIT_PACKED) | Hint
BIT_PACKED | Alignment option value bit-packed indicates that the event codes and associated content are packed in bits without any padding in between | Small files
BYTE_PACKED | Alignment option value byte-alignment indicates that the event codes and associated content are aligned on byte boundaries | Small files
PRE_COMPRESSION | Alignment option value pre-compression indicates that all steps involved in compression are to be done with the exception of the final step of applying the DEFLATE algorithm | Large files (e.g., when compression is built into the transport)
COMPRESSION | The compression option is used to increase compactness using additional computational resources (DEFLATE algorithm) | Large files
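
The "Small files" and "Large files" hints can be illustrated with a minimal sketch that applies raw DEFLATE from the JDK (java.util.zip) to a tiny and to a large, repetitive input. It does not use EXIficient; the class name DeflateSizeDemo and the sample inputs are assumptions made for illustration, and actual EXI compression additionally regroups values into channels before DEFLATE is applied.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

final class DeflateSizeDemo {

    /** Returns the number of bytes raw DEFLATE (no zlib header) produces for the input. */
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buffer = new byte[4096];
            int total = 0;
            // Drain the compressor until it reports that all output was produced.
            while (!deflater.finished()) {
                total += deflater.deflate(buffer);
            }
            return total;
        } finally {
            deflater.end(); // release native resources
        }
    }

    /** Builds a repetitive sample document (illustrative data only). */
    static byte[] repetitiveXml(int items) {
        StringBuilder sb = new StringBuilder("<items>");
        for (int i = 0; i < items; i++) {
            sb.append("<item>value</item>");
        }
        sb.append("</items>");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] tiny = "<a/>".getBytes(StandardCharsets.UTF_8);
        byte[] large = repetitiveXml(10000);
        System.out.println("tiny:  " + tiny.length + " -> " + deflatedSize(tiny) + " bytes");
        System.out.println("large: " + large.length + " -> " + deflatedSize(large) + " bytes");
    }
}
```

For the tiny input the DEFLATE framing outweighs any savings, whereas the large input shrinks considerably; this is the trade-off behind choosing BIT_PACKED for small files and COMPRESSION for large ones.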

com.siemens.ct.exi.core.EncodingOptions | Information | Default | Hint
INCLUDE_COOKIE | EXI Cookie, a four-byte field that serves to indicate that the stream is an EXI stream | false | Useful if the stream may be of a type other than EXI
INCLUDE_OPTIONS | EXI Options, which provide a way to specify the options used to encode the body of the EXI stream | false | Useful if options may vary or are unknown to the recipient
INCLUDE_SCHEMA_ID | Identifies the schema information, if any, used to encode the body | false | Useful if the schema information is unknown to the recipient
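
As an illustration of the INCLUDE_COOKIE hint, a recipient that may receive streams of different types could check for the cookie before decoding. The EXI specification defines the cookie as the four ASCII characters "$EXI". The sketch below uses only the JDK; the class name ExiCookieSniffer is hypothetical and not part of EXIficient.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ExiCookieSniffer {

    // The four-byte EXI cookie: '$', 'E', 'X', 'I'
    private static final byte[] COOKIE = "$EXI".getBytes(StandardCharsets.US_ASCII);

    /**
     * Peeks at the first four bytes without consuming them.
     * The stream must support mark/reset (e.g., a BufferedInputStream).
     */
    static boolean startsWithCookie(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(COOKIE.length);
        try {
            byte[] head = new byte[COOKIE.length];
            int read = 0;
            // read() may return fewer bytes than requested, so loop until full or EOF
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    break;
                }
                read += n;
            }
            return read == COOKIE.length && Arrays.equals(head, COOKIE);
        } finally {
            in.reset(); // leave the stream positioned at its start for the decoder
        }
    }
}
```

Because the cookie is optional, a negative result does not prove that a stream is not EXI; it only means the sender did not include the cookie.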

exificient-core's Issues

Feature request: Ability to save/restore Grammar

I have a large set of XSDs (hundreds of quite large XSD files, 10+ megabytes of XSD in total).

I need to use schema-aware EXI, both with and without the compression option.

The time taken to create the Grammar from these XSDs is significant.

The Grammar objects are currently not serializable.

Could they be made serializable, so that one could compile the XSDs to a grammar once, save it to a file, and then reload it from the file for reuse?
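
The requested workflow could look roughly like the following sketch, which uses standard Java serialization. CompiledGrammar is a hypothetical stand-in class: as noted above, the real Grammar objects are not serializable, so this only shows the compile-once, save, reload pattern that a serializable grammar would enable.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

final class GrammarCacheSketch {

    /** Hypothetical stand-in for a compiled grammar (not an EXIficient class). */
    static final class CompiledGrammar implements Serializable {
        private static final long serialVersionUID = 1L;
        final String schemaId;

        CompiledGrammar(String schemaId) {
            this.schemaId = schemaId;
        }
    }

    /** Writes the compiled grammar to a file so later runs can skip XSD compilation. */
    static void save(CompiledGrammar grammar, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(grammar);
        }
    }

    /** Reads a previously saved grammar back from the file. */
    static CompiledGrammar load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (CompiledGrammar) in.readObject();
        }
    }
}
```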

NullPointerException when decoding an EXI stream with FidelityOptions.FEATURE_PREFIX

For OpenDaylight's NETCONF implementation we have tried to switch from OpenEXI to Exificient (https://git.opendaylight.org/gerrit/35131), but our unit tests (which use FidelityOptions.FEATURE_PREFIX) are triggering:

Caused by: java.lang.NullPointerException
at com.siemens.ct.exi.util.xml.QNameUtilities.getQualifiedName(QNameUtilities.java:86)
at com.siemens.ct.exi.core.AbstractEXIBodyDecoder.getAttributeQNameAsString(AbstractEXIBodyDecoder.java:297)
at com.siemens.ct.exi.api.sax.SAXDecoder.handleAttribute(SAXDecoder.java:588)
at com.siemens.ct.exi.api.sax.SAXDecoder.parseEXIEvents(SAXDecoder.java:288)

This seems to indicate that the prefix passed to getQualifiedName() is null and trips the pfx.length() check. I am not sure whether this is a codec problem (and pfx should never be null) or whether null should be treated as an empty string.
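
The second option, treating a null prefix like an empty one, could be sketched as follows. QNameSketch is a hypothetical stand-in and not the actual QNameUtilities source; whether a null prefix should reach this point at all remains the open question.

```java
import java.util.Objects;

final class QNameSketch {

    /**
     * Builds "prefix:localName", or just "localName" when the prefix is
     * null or empty. Treating null as "no prefix" is an assumption here.
     */
    static String qualifiedName(String localName, String prefix) {
        Objects.requireNonNull(localName, "localName");
        if (prefix == null || prefix.isEmpty()) {
            return localName;
        }
        return prefix + ":" + localName;
    }
}
```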

BigDecimal parsing in DecimalValue class

Currently it is not possible to pass a BigDecimal to DecimalValue directly; users need to create a string first.

I assume there are better and faster ways of extracting the integer and fractional part from a BigDecimal.
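
One possible approach, sketched with the JDK only: the EXI format represents a decimal as a sign, an integral part, and a fractional part whose digits are stored in reverse order so that leading zeros survive. The class DecimalParts below is hypothetical and not the DecimalValue API; it splits a BigDecimal via unscaledValue() and scale() rather than formatting the whole number as a string. Whether this is actually faster would need to be measured.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class DecimalParts {
    final boolean negative;
    final BigInteger integral;      // magnitude of the integer part
    final BigInteger revFractional; // fractional digits in reverse order

    private DecimalParts(boolean negative, BigInteger integral, BigInteger revFractional) {
        this.negative = negative;
        this.integral = integral;
        this.revFractional = revFractional;
    }

    static DecimalParts of(BigDecimal value) {
        boolean negative = value.signum() < 0;
        BigDecimal abs = value.abs();
        if (abs.scale() < 0) {
            abs = abs.setScale(0); // e.g. 1E+3 becomes 1000
        }
        int scale = abs.scale();
        // unscaled = integral * 10^scale + fraction
        BigInteger[] parts = abs.unscaledValue().divideAndRemainder(BigInteger.TEN.pow(scale));
        // Left-pad the fraction to 'scale' digits so leading zeros are kept, then reverse.
        String fraction = parts[1].toString();
        StringBuilder digits = new StringBuilder();
        for (int i = fraction.length(); i < scale; i++) {
            digits.append('0');
        }
        digits.append(fraction);
        BigInteger revFractional = new BigInteger(digits.reverse().toString());
        return new DecimalParts(negative, parts[0], revFractional);
    }
}
```

For example, -12.034 splits into negative = true, integral = 12, and the fractional digits 034 reversed, that is 430.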
