hyphenopoly's Introduction

Hyphenopoly.js

Hyphenopoly.js is a JavaScript polyfill for hyphenation in HTML: it hyphenates text if the user agent does not support CSS hyphenation at all, or does not support it for the required languages. It is also available as a Node.js module.

The package consists of the following parts:

  • Hyphenopoly_Loader.js (~11KB unpacked, ~2KB minified and compressed): feature-checks the client and loads other resources if necessary.
  • Hyphenopoly.js (~36KB unpacked, ~5KB minified and compressed): does the whole DOM-foo and wraps wasm.
  • wasm-Modules (sizes differ! e.g. en-us.wasm: ~21KB uncompressed, ~15KB compressed): core hyphenation functions and hyphenation patterns in a space saving binary format (including pattern license).
  • hyphenopoly.module.js: the node module to hyphenate plain text strings.

Usage (Browser)

Place all the code for Hyphenopoly at the top of the <head> element (immediately after the <title> tag) to ensure resources are loaded as early as possible.

You'll have to insert two script blocks. In the first block, load Hyphenopoly_Loader.js as an external script. In the second block, provide the initial configuration for Hyphenopoly_Loader as an inline script; calling Hyphenopoly.config() there also triggers all further steps.

Also, don't forget to enable CSS hyphenation.

Example:

<!DOCTYPE html>
<html>
    <head>
        <meta http-equiv="content-type" content="text/html; charset=UTF-8">
        <title>Example 1</title>
        <script src="./Hyphenopoly_Loader.js"></script>
        <script>
        Hyphenopoly.config({
            require: {
                "la": "honorificabilitudinitas",
                "de": "Silbentrennungsalgorithmus",
                "en-us": "Supercalifragilisticexpialidocious"
            },
            setup: {
                selectors: {
                    ".container": {}
                }
            }
        });
        </script>
        <style type="text/css">
            body {
                width:60%;
                margin-left:20%;
            }
            p {
                text-align: justify;
                margin: 0 2em 0 0;
            }
            .container {
                display: flex;
                hyphens: auto;
                -ms-hyphens: auto;
                -moz-hyphens: auto;
                -webkit-hyphens: auto;
            }
        </style>
    </head>
    <body>
        <h1>Example 1</h1>
        <div class="container">
            <p lang="la">Qua de causa Helvetii quoque reliquos Gallos virtute praecedunt, quod fere cotidianis proeliis cum Germanis contendunt, cum aut suis finibus eos prohibent aut ipsi in eorum finibus bellum gerunt.</p>
            <p lang="en-us">For which reason the Helvetii also surpass the rest of the Gauls in valor, as they contend with the Germans in almost daily battles, when they either repel them from their own territories, or themselves wage war on their frontiers.</p>
            <p lang="de">Aus diesem Grund übertreffen auch die Helvetier die übrigen Gallier an Tapferkeit, weil sie sich in fast täglichen Gefechten mit den Germanen messen, wobei sie diese entweder von ihrem Gebiet fernhalten oder selbst in deren Gebiet kämpfen.</p>
        </div>
    </body>
</html>

Let's go through this example step by step:

UTF-8

Make sure your page is encoded as UTF-8.

script blocks – load, configure and run Hyphenopoly_Loader.js

Hyphenopoly_Loader.js needs some information to run. This information is passed as a parameter object to the function Hyphenopoly.config() and is stored in a globally accessible object called window.Hyphenopoly. Hyphenopoly_Loader.js and (if necessary) Hyphenopoly.js add further methods and properties only to this object; they create no other global variables or functions.

require

The configuration object must have exactly one property called require. Its value is an object containing at least one name/value pair, where the name is a language code string and the value is a long word in that language (preferably more than 12 characters long). Some languages are region-specific; see the patterns directory for the supported languages. For example, just using en won't work: use en-us or en-gb.

If you want to force the usage of Hyphenopoly.js for a language (e.g. for testing purposes), write "FORCEHYPHENOPOLY" instead of the long word.

Hyphenopoly_Loader.js tests if the client (aka browser, aka user agent) supports CSS hyphenation for the language(s) given in require. In the example above, it will test if the client supports CSS-hyphenation for Latin, German and US-English.

If one of the given languages isn't supported, it automatically hides the document's contents and loads Hyphenopoly.js and the necessary WebAssembly modules.

Hyphenopoly.js – once loaded – will hyphenate the elements according to the settings and unhide the document when it's done.

If something goes wrong and Hyphenopoly.js is unable to unhide the document, a timeout in Hyphenopoly_Loader.js kicks in (after 1000 ms by default), unhides the document, and writes a message to the console.

If the browser supports all required languages, the script deletes the Hyphenopoly-object and terminates without further ado.
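The decision described above can be sketched as follows. This is a hypothetical illustration, not the code of Hyphenopoly_Loader.js: `languagesNeedingPolyfill` is an illustrative name, and the `supportsCssHyphenation` callback is an assumed stand-in for the loader's real feature test, which this README does not describe.

```javascript
// Hypothetical sketch of the loader's decision, not the code of Hyphenopoly_Loader.js.
// `supportsCssHyphenation(lang, longWord)` is an assumed stand-in for the real feature test.
function languagesNeedingPolyfill(requireMap, supportsCssHyphenation) {
    const langs = Object.keys(requireMap);
    if (langs.length === 0) {
        throw new Error("require must contain at least one language");
    }
    return langs.filter((lang) => {
        const longWord = requireMap[lang];
        // "FORCEHYPHENOPOLY" bypasses the test so Hyphenopoly.js is always used.
        if (longWord === "FORCEHYPHENOPOLY") {
            return true;
        }
        return !supportsCssHyphenation(lang, longWord);
    });
}

// Usage: pretend the browser hyphenates German natively but not Latin;
// en-us is forced to use Hyphenopoly.js.
const needed = languagesNeedingPolyfill(
    {
        "la": "honorificabilitudinitas",
        "de": "Silbentrennungsalgorithmus",
        "en-us": "FORCEHYPHENOPOLY"
    },
    (lang) => lang === "de"
);
if (needed.length > 0) {
    // The loader would now hide the document and load Hyphenopoly.js and the wasm modules.
    console.log("polyfill needed for:", needed.join(", ")); // polyfill needed for: la, en-us
} else {
    // Everything is supported: the loader would delete window.Hyphenopoly and stop.
    console.log("native CSS hyphenation suffices");
}
```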

enable CSS-hyphenation

By default, Hyphenopoly hyphenates elements (and their children) with the class name hyphenate (selector .hyphenate). Don't forget to enable CSS hyphenation for all classes that may be handled by Hyphenopoly.

Usage (node)

Install:

npm i hyphenopoly

Example:

import hyphenopoly from "hyphenopoly";

const hyphenator = hyphenopoly.config({
    "require": ["de", "en-us"],
    "hyphen": "•",
    "loader": async (file) => {
        const {readFile} = await import("node:fs/promises");
        const {dirname} = await import("node:path");
        const {fileURLToPath} = await import("node:url");
        const cwd = dirname(fileURLToPath(import.meta.url));
        return readFile(`${cwd}/../patterns/${file}`);
    },
    "exceptions": {
        "en-us": "en-han-ces"
    }
});

async function hyphenate_en(text) {
    const hyphenateText = await hyphenator.get("en-us");
    console.log(hyphenateText(text));
}

async function hyphenate_de(text) {
    const hyphenateText = await hyphenator.get("de");
    console.log(hyphenateText(text));
}

hyphenate_en("hyphenation enhances justification.");
hyphenate_de("Silbentrennung verbessert den Blocksatz.");

Automatic hyphenation

The algorithm used for hyphenation was developed by Franklin M. Liang for TeX. It works more or less like this:

  1. Load a set of precomputed language specific patterns. The patterns are stored in a structure called a trie, which is very efficient for this task.
  2. Collect all patterns that are substrings of the word to be hyphenated.
  3. Combine the numerical values between characters: higher values overwrite lower values.
  4. Odd values are hyphenation points (except where a hyphenation point would leave fewer than leftmin characters before it or fewer than rightmin characters after it); replace them with a soft hyphen and drop the other values.
  5. Repeat steps 2–4 for all words longer than minWordLength.

Example:

Hyphenation
h y p h e n a t i o n
h y3p h
      h e2n
      h e n a4
      h e n5a t
         1n a
          n2a t
             1t i o
               2i o
                  o2n
h0y3p0h0e2n5a4t2i0o2n
Hy-phen-ation
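The steps above can be sketched in a few dozen lines of JavaScript. This is a minimal illustration, not Hyphenopoly's implementation (which stores the patterns in a succinct trie inside a wasm module): it uses a plain Map-based trie, assumes lowercase patterns without word-boundary markers, prints a visible "-" instead of a soft hyphen, and its option defaults (leftmin 2, rightmin 2, minWordLength 6) are chosen for the example only. Fed the nine patterns from the table, it reproduces the result shown above.

```javascript
// Minimal sketch of Liang's algorithm with a Map-based trie (not Hyphenopoly's code).
// Assumptions: lowercase patterns such as "hy3ph", no word-boundary markers,
// and a visible "-" instead of a soft hyphen so the output is readable.

// "hy3ph" -> letters "hyph", values [0, 0, 3, 0, 0]; values[i] is the gap before letters[i].
function parsePattern(pattern) {
    let letters = "";
    const values = [0];
    for (const ch of pattern) {
        if (ch >= "0" && ch <= "9") {
            values[values.length - 1] = Number(ch);
        } else {
            letters += ch;
            values.push(0);
        }
    }
    return {letters, values};
}

// Step 1: store every pattern in a trie, keeping its values at the node where it ends.
function buildTrie(patterns) {
    const root = {children: new Map(), values: null};
    for (const pattern of patterns) {
        const {letters, values} = parsePattern(pattern);
        let node = root;
        for (const ch of letters) {
            if (!node.children.has(ch)) {
                node.children.set(ch, {children: new Map(), values: null});
            }
            node = node.children.get(ch);
        }
        node.values = values;
    }
    return root;
}

function hyphenateWord(word, trie, {leftmin = 2, rightmin = 2, minWordLength = 6, hyphen = "-"} = {}) {
    // Step 5: only words longer than minWordLength are hyphenated.
    if (word.length <= minWordLength) {
        return word;
    }
    // Assumes toLowerCase() keeps the length unchanged (true for plain Latin letters).
    const lower = word.toLowerCase();
    // points[i] is the combined value of the gap before character i.
    const points = new Array(word.length + 1).fill(0);
    // Step 2: walk the trie from every start position to find all matching patterns.
    for (let start = 0; start < lower.length; start += 1) {
        let node = trie;
        for (let i = start; i < lower.length; i += 1) {
            node = node.children.get(lower[i]);
            if (!node) {
                break;
            }
            if (node.values) {
                // Step 3: combine values, the higher one wins.
                node.values.forEach((value, k) => {
                    points[start + k] = Math.max(points[start + k], value);
                });
            }
        }
    }
    // Step 4: odd values become hyphens, except too close to either end of the word.
    let result = "";
    for (let i = 0; i < word.length; i += 1) {
        if (i >= leftmin && i <= word.length - rightmin && points[i] % 2 === 1) {
            result += hyphen;
        }
        result += word[i];
    }
    return result;
}

const trie = buildTrie(["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]);
console.log(hyphenateWord("Hyphenation", trie)); // Hy-phen-ation
```

Raising leftmin to 3 suppresses the first break ("hyphen-ation"), which shows how step 4 trades recall for safety near the word edges.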

The patterns are precomputed and available for many languages on CTAN and tex-hyphen. For Hyphenopoly.js they are converted to a succinct trie data structure (including pattern license, metadata, and the patterns).

The original patterns are computed from a large list of hyphenated words by a program called patgen. They aim to find some hyphenation points, not all, because it's better to miss a hyphenation point than to have some false hyphenation points. Most patterns are really good, but none are error-free.

These patterns vary in size. This is mostly due to the different linguistic characteristics of the languages.

Contributors ✨

Thanks go to these wonderful people (emoji key: 💻 code, 📖 documentation, 💡 examples, 🚧 maintenance):

  • Stephan Hoyer 📖 💻
  • Thomas Broadley 📖
  • Kai Lüke 💻
  • Sebastian Blank 💡
  • ReLater 🚧
  • julian-zatloukal 📖
  • Maik Jablonski 📖
  • yashha 💻
  • Dan Burzo 💻
  • Tobias Speicher 💻

This project follows the all-contributors specification. Contributions of any kind welcome!

hyphenopoly's People

Contributors

blankse, commanderroot, dependabot[bot], mnater, pothos, relater, stephanhoyer, tbroadley, yashha


hyphenopoly's Issues

Question concerning lang attribute

Just a question if you have the time to answer.
Example: If a page starts with

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de-de" lang="de-de" >

(no other lang attributes), does Hyphenopoly understand that the pattern de.hpb is relevant for the whole page? Or do I have to add lang="de" explicitly (to the BODY tag or similar)? Or duplicate the pattern file as de-de.hpb?

The same goes for en-au or similar. Is there an automatic fallback to en-gb or another en-xy pattern? Or to en.hpb, if there were one?

Thank you!

Catch more errors

Currently, errors are neither caught nor handled.

  • add error handling for dangerous things (specifically: resource loading, DOM manipulations)
  • see if an error event would be better than the error function in setup

Trigger hyphenation after AJAX content update?

I'm adding new content to a page with an AJAX request. The hyphenation works fine when the page is first loaded, but after an AJAX refresh no hyphenation happens.

What is the recommended way to make Hyphenopoly do it again? Not sure if this is a FAQ, but maybe it could be documented as well.

All files in Hyphenopoly folder plus path modification

I put the Hyphenopoly files in a separate folder with the same name and put the folder in the root. But that produced 404 errors. Still, that is where I need them to be, and I'd think many more people do. Could you modify the package and paths as such? It would speed up the implementation of your system substantially.

provide node module

Something like this:

const Hyphenopoly = require("hyphenopoly.module");
const hyphenate = Hyphenopoly.createHyphenateFunction("de", {hyphen: "•"});
console.log(hyphenate("Silbentrennung")); // Sil•ben•tren•nung

make path and setup optional

Most users will probably leave the following lines unchanged:

paths: {
    patterndir: "../patterns/",
    maindir: "../"
},
setup: {
    classnames: {
        "hyphenate": {}
    }
}

So they could be the defaults and remain overridable…

On Load site is blinking

I don't know why, but the whole site is hidden on load and shown afterwards. This leads to the site blinking while navigating through it.

minor issue with language codes that aren't region specific

Many general examples specify an html tag with a lang property like <html lang="en">, but Hyphenopoly uses region-specific pattern files like en-us and en-gb.

So, it doesn't work if I have code that looks something like this:

<html lang="en">
...
  <script>
    var Hyphenopoly = {
      require: {
        'en': 'automatically'
      },
      paths: {
        patterndir: "js/patterns/",
        maindir: "js/"
      },
      setup: {
        classnames: {
          'hyphenate': {},
          'description': {}
        }
      }
    };
  </script>
  <script src="js/Hyphenopoly_Loader.js"></script>

There are a thousand ways you could handle this, including just a note in the documentation to be aware of it. After I realized what was wrong, I just changed the lang property on my html tag and in the require object property.

Thank you.

Uncaught TypeError: Cannot read property 'maindir' of undefined

Please update the example in the readme with the example from example1.html.

The current example throws:
Uncaught TypeError: Cannot read property 'maindir' of undefined In Hyphenopoly_Loader.js:359

Because paths is undefined in that example.


For everyone with the same issue, use this:

        var Hyphenopoly = {
            require: {
                "la": "honorificabilitudinitas",
                "de": "Silbentrennungsalgorithmus",
                "en-us": "Supercalifragilisticexpialidocious"
            },
            paths: {
                patterndir: "./patterns/",
                maindir: "./"
            },
            setup: {
                classnames: {
                    "container": {
                    }
                }
            }
        };

Wordpress de-DE

Hi,
Inside a WordPress theme I'm using this to get the language. For German I get "de-DE".

So if I add
"de-DE": "Silbentrennungsalgorithmus" as required, the pattern file "de-DE.hpb" returns a 404.

Then I copied the pattern file de.hpb and renamed it to "de-DE.hpb".

But nothing happens. No error message in the console.

Is there any way I can fix this without getting rid of the WordPress function and writing lang="de" myself?

How to hide Error Msg?

How can I hide errors?
handleEvent: { error: function (e) { e.preventDefault(); return; /*don't show error messages in console*/ } }
seems not to work.
I keep getting this error:
engine for language 'de' loaded, but no elements found.

Fallback-Definition for lang-Attrs

Hi there,

this is a great package, thank you for all the work you put in, this is a very mature and flexible solution for a complicated problem and everything so far works fine.

Except this one thing of course :D

https://github.com/mnater/Hyphenopoly/wiki/Global-Hyphenopoly-Object#fallbacks

If I'm trying to define fallbacks to map lang-Attr values to specific dictionaries I seem to be out of luck.

Example: I'm setting lang="de-CH" on html or another element and define a fallback "de-CH" : "de" in the Hyphenopoly config.

In that case it loads the correct dict, but gives me the "Hyphenopoly_Loader.js:105 Element with 'de-ch' found, but 'de-ch.hpb' not loaded. Check language tags!" exception nonetheless.

After reading your source, I think the reason is that you only use the fallbacks to determine which dict to load (by remapping the requested filename in Hyphenopoly_Loader.js), but do not use them to connect elements to the loaded dict in Hyphenopoly.js itself.

Or am I missing something important here?

The plan is to decouple lang-Values from dictionary filenames to map CMS langs to specific hpb-files (to differentiate German in CMS content for countries, but not for the hyphenation algo).
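A minimal sketch of the fix the reporter describes, assuming a hypothetical helper `makeLangResolver` (not part of Hyphenopoly): if one function maps a lang attribute to a pattern name, and that same function is used both when choosing the .hpb to load and when looking up the engine for an element, the two steps cannot disagree. Tags are lowercased because language tags are case-insensitive, which also matches the lowercased 'de-ch' in the error message above.

```javascript
// Hypothetical sketch, not Hyphenopoly's code: a single resolver maps a lang attribute
// to a pattern name, so loading and element lookup always agree.
function makeLangResolver(fallbacks) {
    // Language tags are case-insensitive, so compare them in lowercase.
    const table = new Map(
        Object.entries(fallbacks).map(([from, to]) => [from.toLowerCase(), to.toLowerCase()])
    );
    return (lang) => {
        const tag = lang.toLowerCase();
        return table.get(tag) || tag;
    };
}

const resolveLang = makeLangResolver({"de-CH": "de", "de-AT": "de"});

// 1. When deciding which pattern file to load:
const fileToLoad = `${resolveLang("de-CH")}.hpb`; // "de.hpb"

// 2. When looking up the engine for an element with lang="de-ch":
const engines = new Map([["de", "engine built from de.hpb"]]); // stand-in for loaded engines
const engine = engines.get(resolveLang("de-ch")); // found: both steps resolve to "de"
console.log(fileToLoad, engine !== undefined); // de.hpb true
```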

WebAssembly.Memory() error when requiring many languages in Node

I've added 29 languages to the require statement, and now I'm getting the error below when running the program:

const hyphenator = hyphenopoly.config({
  require: Object.keys(hyphenationLanguages)
})

/.../node_modules/hyphenopoly/hyphenopoly.module.js:380
    const wasmMemory = new WebAssembly.Memory({
                       ^
RangeError: WebAssembly.Memory(): Property value 256 is below the lower bound 196
    at instantiateWasmEngine (/.../node_modules/hyphenopoly/hyphenopoly.module.js:380:24)

overflow

If a word is longer than 61 characters it overflows.

There are 128 bytes allocated for words encoded in UTF-16, i.e. 64 characters.
Two fields (2 × 16 bits) are reserved for marking the beginning and end with "_" and one field for the length of the word: 64 - 2 - 1 = 61 characters.

Find a performant way to check for words that are too long (e.g. in the word-finding regex, or by throwing an error when hyphenateFunction gets a word that is too long).
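A minimal sketch of the second option, assuming the buffer layout described in this issue rather than anything read from the source code: a hypothetical wrapper `guardWordLength` throws before an over-long word reaches the engine. JavaScript's `String.prototype.length` counts UTF-16 code units, so it can be compared with the buffer size directly.

```javascript
// Sketch assuming the buffer layout described in this issue (not taken from the source code):
// 128 bytes of UTF-16 text, 2 bytes per code unit, 2 units for the "_" markers, 1 for the length.
const BUFFER_BYTES = 128;
const BYTES_PER_CODE_UNIT = 2;
const MAX_WORD_LENGTH = BUFFER_BYTES / BYTES_PER_CODE_UNIT - 2 - 1; // 61

// Hypothetical wrapper: rejects over-long words before they reach the engine.
function guardWordLength(hyphenateWord) {
    return (word) => {
        // String length counts UTF-16 code units, the same unit the buffer uses.
        if (word.length > MAX_WORD_LENGTH) {
            throw new RangeError(
                `word has ${word.length} UTF-16 code units, the maximum is ${MAX_WORD_LENGTH}`
            );
        }
        return hyphenateWord(word);
    };
}

// The identity function stands in for a real hyphenate function.
const safeHyphenate = guardWordLength((word) => word);
console.log(safeHyphenate("x".repeat(61)).length); // 61
try {
    safeHyphenate("x".repeat(62));
} catch (error) {
    console.log(error instanceof RangeError); // true
}
```

Returning the word unhyphenated instead of throwing would be the quieter alternative; throwing makes the limit visible to callers.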

Make FOUHC prevention optional

Sometimes it's annoying when the text "flashes".

An option like "hideOnHyphenate" (even if defaults to true) would be a nice thing in my opinion.

Make node-module sync

Currently the node module Hyphenopoly.module.js returns promises. This is great, but it doesn't work well with existing scripts that generally work synchronously.

The module is currently async because the files (.wasm and .hpb) are loaded asynchronously and the WebAssembly module is instantiated asynchronously.

With fs.readFileSync() and new WebAssembly.Instance() there are synchronous (although not recommended) alternatives available.
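A minimal sketch of the synchronous path mentioned above, using Node's `fs.readFileSync()` together with the synchronous `WebAssembly.Module` and `WebAssembly.Instance` constructors. `instantiateWasmSync` is a hypothetical name, not part of hyphenopoly.module.js, and the import object that any particular .wasm file expects is not specified here.

```javascript
const fs = require("node:fs");

// Hypothetical helper, not part of hyphenopoly.module.js: reads, compiles and
// instantiates a .wasm file without any promises.
function instantiateWasmSync(filePath, importObject = {}) {
    const bytes = fs.readFileSync(filePath); // a Buffer is a valid BufferSource
    // Both constructors block until compilation/instantiation is finished.
    const wasmModule = new WebAssembly.Module(bytes);
    return new WebAssembly.Instance(wasmModule, importObject);
}
```

For example, a call like `instantiateWasmSync("./patterns/en-us.wasm", imports)` (path and `imports` are placeholders) would return an instance immediately. The trade-off is that compilation blocks the thread until it finishes, which is why the asynchronous API remains the recommended default.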

Use node version with browser.

Hi, thank you for the tool!
How can I use the Node version in a browser?
Instead of triggering on 'contentLoaded', I want to call the library on the client side the same way as in Node.
Is that possible?

[Request] Pattern update for PTBR

Hi,

What version of the PT patterns do you use in this repo? I did some testing and the PT patterns do not work properly with pt-BR. I just found an updated version here or in the babelpdf package for TeX.

I'm very new to TeX, so how do you compile the .tex file to .hpb?

Possible bug in algorithm

I've stumbled over some strange/incorrect hyphens in some words.

To validate, I tried http://pyphen.org/ and compared the results on a large list of words, with a ton of differences. In this list I found one wrong word (I didn't look any further):
"zweihenklig" should be "zwei-henk-lig" but is "zwei-hen-klig"

It seems there are multiple pattern lists for german available therefore I've created a custom de.hpb with the patterns found in the MiKTeX Portable Package (6/30/2018) to fix this.

BUT: then TeX and Hyphenopoly seem to disagree on other words (again, I did not look further):
"zytosol" => "zyto-s-ol" in TeX: "zy-to-sol" (which is correct)
"indestructible" => "in-des-t-ruc-tible" in TeX: "in-de-struc-tible" (while not german this is almost correct)

Your de.hpb results in: "zy-to-sol" and "in-de-st-ruc-ti-ble"

Can you look into this?
I would like to avoid doing some ajax request to get this done with the Python solution.

I can provide a TeX test file and the custom de.hpb if you need it.

Implement faster pattern-loading and pattern-to-trie-conversion

Patterns are currently downloaded and interpreted as JavaScript and later converted to a trie. The trie is based on numbers, the patterns are characters, thus a lot of char-to-number conversions happen. This takes time (~7ms for en-us).

Concept:

  • define a pattern-file format. E.g.:
:3:<patterns of length 3>\nl
:4:<patterns of length 4>\nl
:n:<patterns of length n>\nl

with extra header fields for meta information.

  • load binary patterns as ArrayBuffer using XHR or File-API (data could be streamed, so conversion starts earlier)
  • Convert patterns in ArrayBuffer to trie using WASM (or fall back to asm.js)

Intent:

  • faster loading (not interpreting patterns as js)
  • faster execution (no conversions from string to charPoints)
  • support for the full Unicode BMP range
  • having fun by learning WASM

Use Selectors instead of classnames

Currently Hyphenopoly uses single class names to select the elements to be hyphenated (line 300):

const nl = w.document.querySelectorAll("." + cn);

Using fully featured CSS-Selectors would give more flexibility.

class-based settings

Different text-elements (e.g. title vs. main text) often require different settings.
So support for multiple classes, each with its own settings, is required.

Remove getScriptPath

Let users decide where the resources (Hyphenopoly.js and patterns) are located.
(document.currentScript allocates lots of memory!)

Hyphenate one word

Is there a way to call Hyphenopoly with a single word and get back an array of all possible line breaks?

I need to just know where a word could break.

Thanks!
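Assuming the Node.js module is configured with a hyphen character that never occurs in normal text, such as "•" (as in the Node.js usage example above), its output can be turned into the array the question asks for. `breakPoints` is a hypothetical helper, not part of Hyphenopoly; it only post-processes an already hyphenated string such as "Sil•ben•tren•nung".

```javascript
// Hypothetical helper: turns a hyphenated word into its syllables and the
// indices (in the unhyphenated word) where a line break is allowed.
function breakPoints(hyphenated, hyphen = "•") {
    const syllables = hyphenated.split(hyphen);
    const points = [];
    let offset = 0;
    // Every boundary between two syllables is a possible line break.
    for (const syllable of syllables.slice(0, -1)) {
        offset += syllable.length;
        points.push(offset);
    }
    return {syllables, points};
}

const {points} = breakPoints("Sil•ben•tren•nung");
console.log(points); // [ 3, 6, 10 ]
```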

prepare("*") and prepare(lang)

In 080d5e7, engine instantiation for .hpb files stored in the tmp hub-array was commented out (to get 100% coverage on Coveralls).

That was an error!

With that commented out and if the hpb gets loaded before the engine (that may happen with small hpbs like pt.hpb) the promise returned by Hyphenopoly.config is never resolved.
(Thanks to isaacs on tapjs/tapjs#491 for the hint).

Therefore, the code was reactivated in 93d7f31.

Now, if no hpb gets loaded before the wasm file in the tests, we may not get 100% coverage.

Thus I need to find a way to cover both cases in tests.

Hyphenate in small viewports only

Got mail from V.:

Find a way to support something like:

h1.hyphenate { hyphenate: no; } 

@media screen and (max-width: 300px){
    h1.hyphenate { hyphenate: yes; } 
}

Config Options

Thank you for Hyphenopoly.
I used Hyphenator in the past. Is it possible to use config options like minwordlength with Hyphenopoly, too?

contentEditable is possible?

Is it possible to use this polyfill with contentEditable elements? I would like to edit and see the result in real time.

JS Interface Documentation

Hello,

I would like to use Hyphenopoly in a single-page application via a JS interface. I imagine it would look something like the Node.js usage example, but I cannot find any docs on the topic.

Could you please help - where should I look?

Thanks and best regards,
Jordan

dontHyphenateClass setup option

First of all, thanks for this awesome work! I love it!
I struggled with the dontHyphenateClass setup option.

Reading the wiki section, I understood that dontHyphenateClass can be modified (maybe I'm wrong). I ran several tests and did not succeed in changing its default value.

The donthyphenate class works perfectly well (even if I try to modify the dontHyphenateClass value).

Is there something I missed?

Disable Fallback to Native

Is it possible to disable the fallback to native browser hyphenation? So hyphenopoly is always used?

Apparently, the native hyphenation rules of browsers are worse. At least for safari and german, I get much better results with your library.

settings for `minWordLength`, `leftmin` and `rightmin` in all Browsers

Many thanks for this extension. I hope I may ask something again.
I'm using hyphenopoly and hyphenation only recently.

In Firefox, my settings for minWordLength, leftmin and rightmin are ignored; I think the browser's native CSS hyphenation is used there instead.
Chrome uses my settings for the options minWordLength, leftmin and rightmin.

Is there a way to use the options in all browsers?

Double hyphen in Firefox

Windows 10 Pro x64 Polish
Firefox 63.0.1 x64 Polish
Hyphenopoly (newest)

In Firefox, Hyphenopoly adds an extra hyphen in words of the form x-abcdefg… (for example: e-commerce, e-bookami, n-kreślny, A-klasowy, L-karnityna, T-Mobile, u-kształtny, X-Factor)

These are normal words in Polish (although the problem is not only about the Polish language).

If the problem is Firefox then maybe some workaround?

Maximum word length problem

Hi,
I have run into a problem with the maximum length of words that can be hyphenated.

For example, I have initialized the German language and am trying to hyphenate the word "Kindercarnavalsoptochtvoorbereidingswerkzaamhedenplan" -> "Kin-der-car-na-vals-op-tocht-vo-or-berei-dings-werk-zaam-he-den"

Problem: the last 4 characters are missing, which happens with other languages too.

Is there a limit on word length, or is this a bug?
Could you suggest a workaround or a fix for the current situation?

Regards,
Stefan
