easysax's Introduction

EASYSAX - pure javascript sax-style parser for xml

A simple and fast SAX parser for XML. The parser is not streaming and is not designed for huge files: the whole XML document must be in memory. It has a built-in mechanism for working with namespaces.

The parser was written for the RSS reader http://zzreader.com. As of the end of 2017 it remains the fastest SAX parser for XML under Node.js.

Install

$ npm install easysax

Benchmark

https://github.com/vflash/sax-benchmark

sh: node bench-01.js

count - 100000
size - 25

saxjs : 346.182ms
libxml: 852.098ms
expat : 705.867ms
expat buffer: 712.212ms
ltx: 137.998ms
easysax ns=on  entityDecode=on  getAttr=on : 100.050ms
easysax ns=off entityDecode=on  getAttr=on : 82.520ms
easysax ns=off entityDecode=off getAttr=on : 69.133ms
easysax ns=off entityDecode=off getAttr=off: 29.226ms

sh: node bench-02.js

count - 1000
size - 22750

saxjs : 1484.910ms
libxml: 1058.808ms
expat : 1028.151ms
expat buffer: 853.925ms
ltx: 359.173ms
easysax ns=on  entityDecode=on  getAttr=on : 151.511ms
easysax ns=off entityDecode=on  getAttr=on : 114.646ms
easysax ns=off entityDecode=off getAttr=on : 88.604ms
easysax ns=off entityDecode=off getAttr=off: 80.773ms

sh: node bench-03.js

count - 1000
size - 121786

saxjs : 10765.309ms
libxml: 5387.832ms
expat : 6734.018ms
expat buffer: 5865.209ms
ltx: 2953.910ms
easysax ns=on  entityDecode=on  getAttr=on : 1769.676ms
easysax ns=off entityDecode=on  getAttr=on : 1475.585ms
easysax ns=off entityDecode=off getAttr=on : 1214.665ms
easysax ns=off entityDecode=off getAttr=off: 405.799ms

Usage example

var parser = new EasySax();

// if namespaces are needed
parser.ns('rss', {
	'http://www.w3.org/2005/Atom': 'atom',
	'http://www.w3.org/1999/xhtml': 'xhtml',

	'http://search.yahoo.com/mrss/': 'media',
	'http://purl.org/rss/1.0/': 'rss',
	'http://purl.org/dc/elements/1.1/': 'dc',
	'http://www.w3.org/1999/02/22-rdf-syntax-ns#' : 'rdf',
	'http://purl.org/rss/1.0/modules/content/': 'content',
	'http://www.yandex.ru': 'yandex',
	'http://news.yandex.ru': 'yandex',
	'http://backend.userland.com/rss2': 'rss'

});

parser.on('error', function(msg) {
	// console.log('error - ' + msg);
});

parser.on('startNode', function(elementName, getAttr, isTagEnd, getStringNode) {
	// elementName -- (string) element name; when namespaces are configured, the mapped prefix is substituted automatically
	// getAttr() -- (function) parses the attributes and returns an object
	// isTagEnd -- (boolean) flag indicating that the element is empty, e.g. "<elem/>"
	// getStringNode() -- (function) returns the unparsed string of the element, e.g. <item title="text" id="x345">
});

parser.on('endNode', function(elementName, isTagStart, getStringNode) {
	// isTagStart -- (boolean) flag indicating that the element is empty, e.g. "<elem/>"
});

parser.on('textNode', function(text) {
	// text -- (String) the text string
});

parser.on('cdata', function(text) {
	// text -- (String) the text of the CDATA section
});

parser.on('comment', function(text) {
	// text -- (String) the comment text
});

//parser.on('question', function() {}); // <? ... ?>
//parser.on('attention', function() {}); // <!XXXXX zzzz="eeee">


parser.parse(xml); // xml -- (String) the XML string
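
The handlers above can be combined into a small collector. The following is a minimal sketch, not part of easysax: the helper name `collectTitles` is hypothetical, it assumes that no namespace mapping has been configured (so element names arrive as written in the document), and it assumes that `parse` calls the handlers synchronously before returning. It uses only the `on` and `parse` calls and the callback signatures listed above.

```javascript
// Hypothetical helper (not part of easysax): gathers the text of every <title> element.
// `parser` is assumed to be an EasySax instance created as shown above.
function collectTitles(parser, xml) {
	var titles = [];
	var current = null; // text buffer while inside <title>, otherwise null

	parser.on('startNode', function(elementName) {
		if (elementName === 'title') current = '';
	});

	parser.on('textNode', function(text) {
		if (current !== null) current += text;
	});

	parser.on('cdata', function(text) {
		if (current !== null) current += text;
	});

	parser.on('endNode', function(elementName) {
		if (elementName === 'title' && current !== null) {
			titles.push(current);
			current = null;
		}
	});

	// Assumption: parse() is synchronous, so `titles` is complete when it returns.
	parser.parse(xml);
	return titles;
}
```

With the package installed, this would presumably be called as `collectTitles(new EasySax(), xml)`.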

easysax's People

Contributors

vflash

easysax's Issues

Working with streams

My XML files are 30 GB each and will not fit in RAM, so I have to look for packages that work with streams :)
Are there any plans to add this functionality?

Various improvements

First of all dobroi djen and thanks for this awesome, blazing fast XML parser @vflash.

I've forked your library as nikku/easysax and plan to use it in production. In the course of trying it out, I've made various changes. A few of these are improvements you could consider merging:

  • CHORE: rewrite test suite to use deep equals (nikku@6b5d829), this provides proper diffs between expected and actual data
  • FIX: handle nested namespaces correctly (nikku@60daae3)
  • FEAT: provide parse context with { line, column, data } to use upon user request (nikku@1733b57), this is helpful for library users to retrieve the context on parser or XML processing errors

Keep up the good work and have a nice day!

Nico
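
For readers unfamiliar with the idea of a parse context, here is a minimal sketch of how a line and column could be derived from a character offset in the XML string. It is an illustration only: the function name `positionAt` is hypothetical, and nothing here is taken from the commits referenced above, which may compute the context quite differently.

```javascript
// Hypothetical helper: maps a character offset in `xml` to a 1-based line and column.
function positionAt(xml, offset) {
	var end = Math.min(offset, xml.length);
	var line = 1;
	var lastBreak = -1; // index of the most recent "\n" before `end`

	for (var i = 0; i < end; i++) {
		if (xml.charCodeAt(i) === 10) { // 10 is "\n"
			line++;
			lastBreak = i;
		}
	}

	return { line: line, column: end - lastBreak };
}
```

An error handler could attach such a position to its message so that users can locate the offending markup.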

":" and "_" are valid first characters of a nodeName

The XML specification defines:

NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]

But easysax only accepts [A-Za-z]: see here, else it throws "first char nodeName".

Personally, I don't care too much about those higher Unicode code points from the spec (and that would touch some other areas inside easysax, I think), but I have some data on my hands where some tags start with an underscore.

Could at least ":" and "_" be added to the list of valid first chars for nodeName?
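
The requested check could be sketched as follows. This is an assumption about what such a change might look like, not the code used inside easysax: the function name `isNameStartChar` is hypothetical, and the sketch covers only the ASCII part of the production quoted above.

```javascript
var COLON = ':'.charCodeAt(0);
var UNDERSCORE = '_'.charCodeAt(0);

// Hypothetical check: accepts the ASCII subset of NameStartChar,
// i.e. [A-Za-z] plus ":" and "_". Higher code points are deliberately left out.
function isNameStartChar(w) {
	return (
		(w >= 65 && w <= 90) ||  // A-Z
		(w >= 97 && w <= 122) || // a-z
		w === COLON ||
		w === UNDERSCORE
	);
}
```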

English :)

Hi,

Thank you for this excellent library. I plan to learn Russian sometime soon but still can't read or write any Russian :) I know many people would love to use this library. Is there any chance I could help you translate the README into English?

Cheers

License of easysax

Hi,
Thank you for releasing such great software!

I would like you to add a LICENSE file.
I hope you choose an MIT or BSD-style license.

Thanks in advance!

Hiroaki

Easysax improvements

I use easysax in a trading application (you can see the first dependency on the npmjs site). There are issues that make using easysax not an easy decision.

  1. Easysax has no unit tests, while sax has excellent unit tests. Can you please add serious regression unit testing?
  2. Can you please translate the Russian in the code and documentation? People do not like languages they do not understand.
  3. Code cleanup would be great to have.

Is charCodeAt so much better than charAt?

w = x.charCodeAt(q);
if ( w>96 && w < 123 || w > 47 && w < 59 || w>64 && w< 91 || w ===45) {

I cannot read your code without an ASCII table, and even then it's not easy.

The only reason I use easysax is its excellent speed.
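
One way to keep the numeric comparison while making it readable is to name the boundaries. The sketch below is an illustration, not a proposed patch: the names `charCode` and `isAllowedChar` are hypothetical. It accepts the same code points as the quoted condition, namely a-z, A-Z, the range from "0" through ":" (code 58 directly follows "9"), and "-".

```javascript
// Hypothetical helper: looks up the code once so no ASCII table is needed when reading.
function charCode(ch) {
	return ch.charCodeAt(0);
}

var LOWER_A = charCode('a'), LOWER_Z = charCode('z');
var UPPER_A = charCode('A'), UPPER_Z = charCode('Z');
var DIGIT_0 = charCode('0'), COLON = charCode(':');
var HYPHEN = charCode('-');

// Same result as: w>96 && w<123 || w>47 && w<59 || w>64 && w<91 || w===45
function isAllowedChar(w) {
	return (
		(w >= LOWER_A && w <= LOWER_Z) ||
		(w >= DIGIT_0 && w <= COLON) || // "0".."9" and ":"
		(w >= UPPER_A && w <= UPPER_Z) ||
		w === HYPHEN
	);
}
```

The constants are computed once at load time, so the comparison itself still works on plain integers.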
