
efrt's Introduction

compression of key-value data
npm install efrt

if your data looks like this:

var data = {
  bedfordshire: 'England',
  aberdeenshire: 'Scotland',
  buckinghamshire: 'England',
  argyllshire: 'Scotland',
  bambridgeshire: 'England',
  cheshire: 'England',
  ayrshire: 'Scotland',
  banffshire: 'Scotland'
}

you can compress it like this:

import { pack } from 'efrt'
var str = pack(data)
//'England:b0che1;ambridge0edford0uckingham0;shire|Scotland:a0banff1;berdeen0rgyll0yr0;shire'

then _very!_ quickly flip it back into:

import { unpack } from 'efrt'
var obj = unpack(str)
obj['bedfordshire'] //'England'

Yep,

efrt packs category-type data into a very compressed prefix trie format, so that redundancies in the data are shared, and nothing is repeated.

By doing this clever-stuff ahead-of-time, efrt lets you ship much more data to the client-side, without hassle or overhead.

The whole library is 8kb, the unpack half is barely 2kb.
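To make the idea of sharing redundancy concrete, here is a minimal sketch of a plain, uncompressed prefix trie. It is not efrt's actual implementation or output format; `buildTrie`, `countNodes`, and the `$` end-of-word marker are names made up for this illustration.

```javascript
// Build a plain prefix trie: each node is an object keyed by one character.
// Assumption for this sketch: '$' never appears in a word, so it can mark word ends.
function buildTrie(words) {
  const root = Object.create(null)
  for (const word of words) {
    let node = root
    for (const ch of word) {
      if (!node[ch]) node[ch] = Object.create(null)
      node = node[ch]
    }
    node.$ = true
  }
  return root
}

// Count the character nodes stored in the trie (end markers are skipped).
function countNodes(node) {
  let total = 0
  for (const ch in node) {
    if (ch === '$') continue
    total += 1 + countNodes(node[ch])
  }
  return total
}

const words = ['june', 'july', 'january']
console.log(words.join('').length) // 15 characters written out in full
console.log(countNodes(buildTrie(words))) // 12 nodes, because 'j' and 'ju' are stored once
```

A plain prefix trie like this only shares the beginnings of words; shared endings need further merging, which this sketch does not attempt.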

it is based on:

Benchmarks!

Basically,
  • get a js object into very compact form
  • reduce filesize/bandwidth a bunch
  • ensure the unpacking time is negligible
  • keep word-lookups on critical-path

import { pack, unpack } from 'efrt' // const {pack, unpack} = require('efrt')

var foods = {
  strawberry: 'fruit',
  blueberry: 'fruit',
  blackberry: 'fruit',
  tomato: ['fruit', 'vegetable'],
  cucumber: 'vegetable',
  pepper: 'vegetable'
}
var str = pack(foods)
//'{"fruit":"bl0straw1tomato;ack0ue0;berry","vegetable":"cucumb0pepp0tomato;er"}'

var obj = unpack(str)
console.log(obj.tomato)
//['fruit', 'vegetable']
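The packed output above lists keys under each category, with tomato appearing under both. A rough sketch of that grouping step, before any trie compression, might look like the following. `groupByCategory` is a hypothetical helper written for this example and is not part of efrt's API.

```javascript
// Invert { key: category } into { category: [keys] }.
// A key whose value is an array is listed under every category it names.
function groupByCategory(obj) {
  const groups = Object.create(null) // no inherited properties to collide with
  for (const [key, value] of Object.entries(obj)) {
    const categories = Array.isArray(value) ? value : [value]
    for (const category of categories) {
      if (!groups[category]) groups[category] = []
      groups[category].push(key)
    }
  }
  return groups
}

const grouped = groupByCategory({
  strawberry: 'fruit',
  blueberry: 'fruit',
  blackberry: 'fruit',
  tomato: ['fruit', 'vegetable'],
  cucumber: 'vegetable',
  pepper: 'vegetable'
})
console.log(grouped.fruit) // ['strawberry', 'blueberry', 'blackberry', 'tomato']
console.log(grouped.vegetable) // ['tomato', 'cucumber', 'pepper']
```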

or, an Array:

if you pass it an array of strings, it just creates an object with true values:

const data = [
  'january',
  'february',
  'april',
  'june',
  'july',
  'august',
  'september',
  'october',
  'november',
  'december'
]
const packd = pack(data)
// true¦a6dec4febr3j1ma0nov4octo5sept4;rch,y;an1u0;ly,ne;uary;em0;ber;pril,ugust
const sameArray = Object.keys(unpack(packd))
// same thing !

Reserved characters

the keys of the object are normalized. Spaces/unicode are good, but numbers, case-sensitivity, and some punctuation (semicolon, comma, exclamation-mark) are not (yet) supported.

specialChars = new RegExp('[0-9A-Z,;!:|¦]')
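Since unsupported keys are easy to miss in a large dataset, it may help to scan for them before packing. The sketch below reuses the regular expression shown above; `findUnsupportedKeys` is a hypothetical helper, not something efrt exports.

```javascript
// Same pattern as above: digits, uppercase letters, and the reserved punctuation.
const specialChars = new RegExp('[0-9A-Z,;!:|¦]')

// Return every key that contains at least one unsupported character.
// Accepts either an object (its keys are checked) or an array of strings.
function findUnsupportedKeys(data) {
  const keys = Array.isArray(data) ? data : Object.keys(data)
  return keys.filter((key) => specialChars.test(key))
}

const problems = findUnsupportedKeys({
  'tony hawk': 'skater',
  '101domain.com': 'domain',
  Toronto: 'city'
})
console.log(problems) // ['101domain.com', 'Toronto']
```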

efrt is built-for, and used heavily in compromise, to expand the amount of data it can ship onto the client-side. If you find another use for efrt, please drop us a line🎈

Performance

efrt is tuned to be very quick to unpack. Once unpacked, the result is a plain object, so each lookup is O(1). Packing the data is the slowest part, which is usually fine because it happens ahead of time:

var compressed = pack(skateboarders) //1k words (on a macbook)
var trie = unpack(compressed)
// unpacking-step: 5.1ms

trie.hasOwnProperty('tony hawk')
// cached-lookup: 0.02ms

Size

efrt will pack filesize down as much as possible, depending upon the redundancy of the prefixes/suffixes in the words, and the size of the list.

  • list of countries - 1.5k -> 0.8k (46% compressed)
  • all adverbs in wordnet - 58k -> 24k (58% compressed)
  • all adjectives in wordnet - 265k -> 99k (62% compressed)
  • all nouns in wordnet - 1,775k -> 692k (61% compressed)

but there are some things to consider:

  • bigger files compress further (see 🎈 birthday problem)
  • using efrt will reduce gains from gzip compression, which most webservers quietly use
  • english is more suffix-redundant than prefix-redundant, so non-english words may benefit from other styles

Assuming your data has a low category-to-data ratio, you will hit break-even at about 250 keys. If your data is in the thousands, you can be very confident about saving your users considerable bandwidth.
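Because gzip changes the picture, it can be worth measuring both raw and gzipped sizes for your own data before committing. Here is a small sketch using Node's built-in zlib module, with the county data and packed string from the first example; the `sizes` helper is made up for this illustration.

```javascript
const zlib = require('zlib')

const data = {
  bedfordshire: 'England',
  aberdeenshire: 'Scotland',
  buckinghamshire: 'England',
  argyllshire: 'Scotland',
  bambridgeshire: 'England',
  cheshire: 'England',
  ayrshire: 'Scotland',
  banffshire: 'Scotland'
}
// The packed form of the same data, copied from the example at the top.
const packed =
  'England:b0che1;ambridge0edford0uckingham0;shire|Scotland:a0banff1;berdeen0rgyll0yr0;shire'

// Report the byte size of a string as-is and after gzip.
function sizes(str) {
  return { raw: Buffer.byteLength(str), gzipped: zlib.gzipSync(str).length }
}

console.log('json  ', sizes(JSON.stringify(data)))
console.log('packed', sizes(packed))
```

With only eight keys this sits far below the break-even point, and gzip's fixed overhead dominates, so treat the numbers as a way to check your own dataset rather than as a benchmark.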

Use

IE9+

<script src="https://unpkg.com/efrt@latest/builds/efrt.min.cjs"></script>
<script>
  var smaller = efrt.pack(['larry', 'curly', 'moe'])
  var trie = efrt.unpack(smaller)
  console.log(trie['moe'])
</script>

if you only do the second step (unpacking) in the client, you can load just the CJS unpack-half of the library (~3k):

const unpack = require('efrt/unpack') // node/cjs

<script src="https://unpkg.com/efrt@latest/builds/efrt-unpack.min.cjs"></script>
<script>
  var trie = unpack(compressedStuff)
  trie.hasOwnProperty('miles davis')
</script>

Thanks to John Resig for his fun trie-compression post on his blog, and Wiktor Jakubczyc for his performance analysis work

MIT

efrt's People

Contributors

spencermountain


Forkers

paulyc vitaly-z

efrt's Issues

error when word = "constructor"

Please make a note in the readme that if the data contains the word "constructor" (a property name that every plain JavaScript object inherits, rather than a reserved word), you get strange errors (TypeError: h[val].push is not a function). Either that, or have the word filtered out automatically.
Thanks for the great library, it's a massive improvement over Steve Hanov's work!
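One plausible cause of this error, sketched here without reference to efrt's actual source, is a lookup table built on a plain `{}` object. Such an object inherits `constructor` from `Object.prototype`, so the usual "create the list if missing" check is skipped. `groupNaive` and `groupSafe` are hypothetical names for this illustration.

```javascript
// Naive version: {} inherits 'constructor', so h['constructor'] is already
// truthy (it is the Object function) and no array is ever created.
function groupNaive(pairs) {
  const h = {}
  for (const [key, val] of pairs) {
    h[val] = h[val] || []
    h[val].push(key) // throws TypeError when val is 'constructor'
  }
  return h
}

// Safer version: an object with no prototype has no inherited properties.
function groupSafe(pairs) {
  const h = Object.create(null)
  for (const [key, val] of pairs) {
    h[val] = h[val] || []
    h[val].push(key)
  }
  return h
}

const pairs = [['builder', 'constructor']]
try {
  groupNaive(pairs)
} catch (err) {
  console.log(err instanceof TypeError) // true
}
console.log(groupSafe(pairs).constructor) // ['builder']
```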

Missing License

The project indicates that it is licensed under MIT, but includes no license file or copyright notice. As the MIT license requires a copyright notice (2nd paragraph - The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.) It is not possible to comply with this license. Please include in the repository a copy of the MIT license with an appropriate copyright.

Could we get support for strings with numbers

We are looking to use this library for storing a large collection of domains, some of which have numbers (e.g. 101domain.com). Would it be possible to add support for numbers in this library?

We found a workaround by encoding the numbers with special characters but it would be nicer to not have to do that. Furthermore, if the values are just an array, the unpack function should probably just have the option of returning an Array or Set, instead of an object.

import { unpack } from 'efrt';

const chars =
{
	')': '0',
	'~': '1',
	'@': '2',
	'#': '3',
	'$': '4',
	'%': '5',
	'^': '6',
	'&': '7',
	'*': '8',
	'(': '9',
};

let obj = unpack (packed);
obj = Object.keys (obj).join ('\n').replace (/[\)\~\@\#\$\%\^\&\*\(]/g, (m) => chars[m]);
const set = new Set ([ ...obj.split ('\n') ]);

console.log (set.has ('101domain.com'));
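The snippet above shows only the decoding half. A sketch of both directions, using the same symbol table, could look like this; `encodeDigits` and `decodeDigits` are hypothetical helper names, and the approach assumes none of these symbols occur in the original keys.

```javascript
// Symbol-to-digit table from the workaround above.
const decodeMap = {
  ')': '0', '~': '1', '@': '2', '#': '3', '$': '4',
  '%': '5', '^': '6', '&': '7', '*': '8', '(': '9'
}
// Digit-to-symbol table, derived by swapping keys and values.
const encodeMap = Object.fromEntries(
  Object.entries(decodeMap).map(([symbol, digit]) => [digit, symbol])
)

// Apply before packing: swap every digit for its stand-in symbol.
const encodeDigits = (key) => key.replace(/[0-9]/g, (d) => encodeMap[d])
// Apply after unpacking: swap the symbols back to digits.
const decodeDigits = (key) => key.replace(/[)~@#$%^&*(]/g, (s) => decodeMap[s])

const encoded = encodeDigits('101domain.com')
console.log(encoded) // '~)~domain.com'
console.log(decodeDigits(encoded)) // '101domain.com'
```

If a key already contains one of the stand-in symbols, decoding would turn it into a digit, so this round trip is only safe for data where those symbols never appear.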
