Saxy

Saxy (Sá xị) is an XML SAX parser and encoder in Elixir that focuses on speed, usability and standards compliance.

It complies with Extensible Markup Language (XML) 1.0 (Fifth Edition).

Feature highlights

  • An incredibly fast XML 1.0 SAX parser.
  • An extremely fast XML encoder.
  • Native support for streaming parsing of large XML files.
  • Parsing of XML documents into a simple DOM format.
  • Support for quick returns from event handlers.

Installation

Add :saxy to your mix.exs.

def deps do
  [{:saxy, "~> 0.9.1"}]
end

Overview

Full documentation is available on HexDocs.

SAX parser

Parsing requires a SAX event handler implementation, so define one first.

defmodule MyEventHandler do
  @behaviour Saxy.Handler

  def handle_event(:start_document, prolog, state) do
    IO.inspect("Start parsing document")
    {:ok, [{:start_document, prolog} | state]}
  end

  def handle_event(:end_document, _data, state) do
    IO.inspect("Finish parsing document")
    {:ok, [{:end_document} | state]}
  end

  def handle_event(:start_element, {name, attributes}, state) do
    IO.inspect("Start parsing element #{name} with attributes #{inspect(attributes)}")
    {:ok, [{:start_element, name, attributes} | state]}
  end

  def handle_event(:end_element, name, state) do
    IO.inspect("Finish parsing element #{name}")
    {:ok, [{:end_element, name} | state]}
  end

  def handle_event(:characters, chars, state) do
    IO.inspect("Receive characters #{chars}")
    {:ok, [{:characters, chars} | state]}
  end
end

Then start parsing XML documents with:

iex> xml = "<?xml version='1.0' ?><foo bar='value'></foo>"
iex> Saxy.parse_string(xml, MyEventHandler, [])
{:ok,
 [{:end_document},
  {:end_element, "foo"},
  {:start_element, "foo", [{"bar", "value"}]},
  {:start_document, [version: "1.0"]}]}
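The quick-return feature listed among the highlights can be sketched with a handler that stops at the first element it sees. This is a minimal sketch, not the library's reference usage: the module name FirstElementHandler is hypothetical, and it assumes that returning {:stop, value} from handle_event/3 ends parsing early and makes value the result of the parse call. Check the Saxy.Handler documentation for the version you use.

```elixir
defmodule FirstElementHandler do
  # Same behaviour as MyEventHandler above.
  @behaviour Saxy.Handler

  # Assumption: {:stop, value} stops parsing immediately and `value`
  # is handed back to the caller of the parse function.
  def handle_event(:start_element, {name, _attributes}, _state) do
    {:stop, name}
  end

  # Every other event keeps the state unchanged and continues parsing.
  def handle_event(_event, _data, state) do
    {:ok, state}
  end
end

# With Saxy available, the handler would be used like this:
#   Saxy.parse_string(xml, FirstElementHandler, nil)
```

Stopping early avoids walking the rest of a large document when only the root element name, or the first match of some kind, is needed.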

Streaming parsing

Saxy also accepts a file stream as input:

stream = File.stream!("/path/to/file")

Saxy.parse_stream(stream, MyEventHandler, initial_state)

It can also parse any other stream, such as a filtered one:

stream = File.stream!("/path/to/file") |> Stream.filter(&(&1 != "\n"))

Saxy.parse_stream(stream, MyEventHandler, initial_state)

Partial parsing

Saxy can parse part of an XML document, and parse more of it later.

alias Saxy.Parser.Partial

xml = """
<?xml version='1.0' ?>
<foo bar='value'>
</foo>
"""
split_xml = String.split(xml, "\n")

{:ok, context} = Partial.init(MyEventHandler, initial_state)
{:ok, context} = Partial.parse(Enum.at(split_xml, 0), context)
{:ok, context} = Partial.parse(Enum.at(split_xml, 1), context)
{:ok, context} = Partial.parse(Enum.at(split_xml, 2), context)
{:ok, state} = Partial.finish(context)

Simple DOM format exporting

Sometimes it is convenient to export the whole XML document into simple DOM format, in which each element is a 3-element tuple of the tag name, a list of attributes, and a list of children.

The Saxy.SimpleForm module supports this:

Saxy.SimpleForm.parse_string(data)

{"menu", [],
 [
   {"movie",
    [{"id", "tt0120338"}, {"url", "https://www.imdb.com/title/tt0120338/"}],
    [{"name", [], ["Titanic"]}, {"characters", [], ["Jack &amp; Rose"]}]},
   {"movie",
    [{"id", "tt0109830"}, {"url", "https://www.imdb.com/title/tt0109830/"}],
    [
      {"name", [], ["Forest Gump"]},
      {"characters", [], ["Forest &amp; Jenny"]}
    ]}
 ]}
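Because simple form is made of plain tuples, lists and binaries, it can be traversed with ordinary pattern matching. The following is a minimal sketch that collects the text of every element with a given tag name; the SimpleFormText module and its texts/2 function are hypothetical helpers and not part of Saxy, and the menu value is a shortened copy of the output above.

```elixir
defmodule SimpleFormText do
  # Collects the text of every element named `tag` in a simple-form tree.
  def texts({name, _attributes, children}, tag) do
    own = if name == tag, do: [text_of(children)], else: []
    own ++ Enum.flat_map(children, &texts(&1, tag))
  end

  # Text nodes (binaries) and any other node shape contain no elements.
  def texts(_other, _tag), do: []

  # Joins the direct text children of an element.
  defp text_of(children) do
    children |> Enum.filter(&is_binary/1) |> Enum.join()
  end
end

menu =
  {"menu", [],
   [
     {"movie", [{"id", "tt0120338"}], [{"name", [], ["Titanic"]}]},
     {"movie", [{"id", "tt0109830"}], [{"name", [], ["Forest Gump"]}]}
   ]}

names = SimpleFormText.texts(menu, "name")
IO.inspect(names)
```

The same recursive shape works for other queries, such as collecting attribute values, by changing what the first clause returns.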

XML builder

Saxy offers two APIs for building simple form and encoding XML documents.

Use Saxy.XML to build and compose XML simple form, then Saxy.encode!/2 to encode the built element into an XML binary.

iex> import Saxy.XML
iex> element = element("person", [gender: "female"], "Alice")
{"person", [{"gender", "female"}], [{:characters, "Alice"}]}
iex> Saxy.encode!(element, [])
"<?xml version=\"1.0\"?><person gender=\"female\">Alice</person>"

See Saxy.XML for more XML building APIs.

Saxy also provides the Saxy.Builder protocol to help compose structs into simple form.

defmodule Person do
  @derive {Saxy.Builder, name: "person", attributes: [:gender], children: [:name]}

  defstruct [:gender, :name]
end

iex> jack = %Person{gender: :male, name: "Jack"}
iex> john = %Person{gender: :male, name: "John"}
iex> import Saxy.XML
iex> root = element("people", [], [jack, john])
iex> Saxy.encode!(root, [])
"<?xml version=\"1.0\"?><people><person gender=\"male\">Jack</person><person gender=\"male\">John</person></people>"

Benchmarking

Benchmarking XML parsers is hard, and the results depend heavily on the complexity of the document. In benchmarks, Saxy is usually 1.4 times faster than Erlsom, and with deeply nested documents it is 4.35 times faster.

As for the XML builder, Saxy is usually 4 times faster than xml_builder when encoding simple elements, and 17 times faster when encoding deeply nested elements.

The benchmark suite can be found in this repository.

Limitations

  • XSD is not supported.
  • DTD is not supported. When the parser encounters a <!DOCTYPE declaration, it simply stops parsing.

Where did the name come from?

☝️ Sa Xi, pronounced like sa-see, is an awesome soft drink made by Chuong Duong.

Contributing

If you have any issues or ideas, feel free to open an issue at https://github.com/qcam/saxy/issues.

To start developing:

  1. Fork the repository.
  2. Write your code and related tests.
  3. Create a pull request at https://github.com/qcam/saxy/pulls.
