Unicode eXtension
=================

License: `Apache License, Version 2.0 <http://www.apache.org/licenses/LICENSE-2.0.html>`_

Alternative license: `LGPLv3 <http://www.gnu.org/licenses/lgpl-3.0.html>`_

Author: Uvarov Michael

Unidata version: 6.1.0

Read the `edoc documentation <https://github.com/erlang-unicode/ux/blob/master/doc/README.md>`_.

A module for working with strings. A string is a flat list of Unicode characters.

All operations on Unicode text follow the `Unicode Standards <http://www.unicode.org/reports/>`_.

.. image:: https://secure.travis-ci.org/erlang-unicode/ux.png?branch=master
    :alt: Build Status
    :target: http://travis-ci.org/erlang-unicode/ux

This library implements only these documents:

-  `UAX 15 <http://www.unicode.org/reports/tr15/>`_ Unicode Normalization Forms
-  `UTS 10 <http://www.unicode.org/reports/tr10/>`_ Unicode Collation Algorithm

and some parts of:

-  `UAX 44 <http://www.unicode.org/reports/tr44/>`_ Unicode Character Database

Structure of the library
========================

ux_string uses ux_char and ux_unidata.

ux_uca uses ux_char and ux_unidata.

ux_char uses ux_unidata.

ux_unidata provides internal data access.

ux_string.erl: String Functions for lists of Unicode characters
================================================================

This module provides functions that operate on `UNIDATA <http://www.ksu.ru/eng/departments/ktk/test/perl/lib/unicode/UCDFF301.html>`_. UNIDATA contains data about Unicode characters.

Functions for working with Unicode Normal Forms (UNF)
-----------------------------------------------------

-  to_nfc/1
-  to_nfd/1
-  to_nfkd/1
-  to_nfkc/1
-  is_nfc/1
-  is_nfd/1
-  is_nfkc/1
-  is_nfkd/1
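
As a minimal sketch of how the normalization functions above fit together,
the module below decomposes and then recomposes U+00E9. It assumes that ux is
on the code path and that the arity-1 functions accept and return flat lists
of code points, as the string definition above suggests. The module name
``nf_sketch`` and the function ``round_trip/0`` are hypothetical.

.. code-block:: erlang

    -module(nf_sketch).
    -export([round_trip/0]).

    %% Hypothetical helper: decompose a precomposed character, then
    %% recompose it, using only functions listed above.
    round_trip() ->
        Composed = [16#E9],                        % U+00E9, "e" with acute
        Decomposed = ux_string:to_nfd(Composed),   % canonical decomposition
        Recomposed = ux_string:to_nfc(Decomposed), % canonical composition
        {Decomposed, Recomposed}.

Under UAX 15 the decomposed form is U+0065 followed by U+0301, and
recomposition should give back the original one-element list.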

Functions from stdlib for Unicode strings
-----------------------------------------

-  to_lower/1
-  to_upper/1

Functions for processing strings as groups of graphemes
-------------------------------------------------------

A grapheme is a letter together with its modifiers.

-  length/1
-  reverse/1
-  first/2
-  last/2

Examples
--------

Code:

.. code-block:: erlang

    (ux@delta)11> ux_string:length("FF g̈").
    4
    (ux@delta)12> string:len("FF g̈").
    5
    (ux@delta)13> ux_string:to_graphemes("FF g̈").
    ["F","F"," ",[103,776]]

"PHP-style" string functions
----------------------------

-  explode/2,3
-  html_special_chars/1 (htmlspecialchars in PHP)
-  strip_tags/1,2

Examples
--------

Code:

.. code-block:: erlang

    ux_string:explode(["==", "++", "|"], "+++-+=|==|==|=+-+++").

Result:

.. code-block:: erlang

    [[],"+-+=",[],[],[],[],"=+-","+"]

Code:

.. code-block:: erlang

    ux_html:strip_tags("<b>bold text</b>").

Result:

.. code-block:: erlang

    "bold text"

Type functions
--------------

A type is a Unicode General Category.

Code:

.. code-block:: erlang

    %% Adjacent string literals are joined at compile time, so Str
    %% contains no line breaks.
    Str = "Erlang created the field of telephone "
          "networks analysis. His early work in scrutinizing the use of local, exchange "
          "and trunk telephone line usage in a small community, to understand the "
          "theoretical requirements of an efficient network led to the creation of the "
          "Erlang formula, which became a foundational element of present day "
          "telecommunication network studies.",
    ux_string:explode_types(['Zs', 'Lu'], Str).

Result:

.. code-block:: erlang

    [[],"rlang","created","the","field","of","telephone",
     "networks","analysis.",[],"is","early","work","in",
     "scrutinizing","the","use","of","local,","exchange","and",
     "trunk","telephone","line","usage","in","a","small",
     [...]|...]

Code:

.. code-block:: erlang

    ux_string:types(Str).

Result:

.. code-block:: erlang

    ['Lu','Ll','Ll','Ll','Ll','Ll','Zs','Ll','Ll','Ll','Ll',
     'Ll','Ll','Ll','Zs','Ll','Ll','Ll','Zs','Ll','Ll','Ll','Ll',
     'Ll','Zs','Ll','Ll','Zs','Ll'...]

Here the atom ``'Lu'`` is Letter, Uppercase and ``'Ll'`` is Letter, Lowercase.
Read more about types in the description of ``ux_char:type/1``.

Code:

.. code-block:: erlang

    ux_string:delete_types(['Ll'], Str).

Result:

.. code-block:: erlang

    "E       . H        ,          ,                E ,           ."

ux_char.erl: Char Functions
===========================

Code:

.. code-block:: erlang

    ux_char:type($ ).

Result:

.. code-block:: erlang

    'Zs'

`List of types <http://www.ksu.ru/eng/departments/ktk/test/perl/lib/unicode/UCDFF301.html#General%20Category>`_
---------------------------------------------------------------------------------------------------------------

-  Normative Categories:

   -  Lu Letter, Uppercase
   -  Ll Letter, Lowercase
   -  Lt Letter, Titlecase
   -  Mn Mark, Non-Spacing
   -  Mc Mark, Spacing Combining
   -  Me Mark, Enclosing
   -  Nd Number, Decimal Digit
   -  Nl Number, Letter
   -  No Number, Other
   -  Zs Separator, Space
   -  Zl Separator, Line
   -  Zp Separator, Paragraph
   -  Cc Other, Control
   -  Cf Other, Format
   -  Cs Other, Surrogate
   -  Co Other, Private Use
   -  Cn Other, Not Assigned (no characters in the file have this
      property)

-  Informative Categories:

   -  Lm Letter, Modifier
   -  Lo Letter, Other
   -  Pc Punctuation, Connector
   -  Pd Punctuation, Dash
   -  Ps Punctuation, Open
   -  Pe Punctuation, Close
   -  Pi Punctuation, Initial quote (may behave like Ps or Pe depending
      on usage)
   -  Pf Punctuation, Final quote (may behave like Ps or Pe depending on
      usage)
   -  Po Punctuation, Other
   -  Sm Symbol, Math
   -  Sc Symbol, Currency
   -  Sk Symbol, Modifier
   -  So Symbol, Other
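
The category atoms listed above are what ``ux_char:type/1`` is shown to
return, so they can drive simple character classification. The following is a
minimal sketch, assuming ux is on the code path. The module and function
names are hypothetical, and grouping the three ``Z*`` categories as
"separators" is a choice made for this illustration only.

.. code-block:: erlang

    -module(type_sketch).
    -export([is_separator/1, count_separators/1]).

    %% True when the character's General Category is one of the
    %% Separator categories (Zs, Zl, Zp).
    is_separator(Char) ->
        lists:member(ux_char:type(Char), ['Zs', 'Zl', 'Zp']).

    %% Count the separator characters in a flat list of code points.
    count_separators(Str) ->
        length([C || C <- Str, is_separator(C)]).

For example, ``type_sketch:is_separator($ )`` relies on the ``'Zs'`` result
shown for ``ux_char:type($ )`` above.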

ux_uca.erl: Unicode Collation Algorithm
=======================================

See `Unicode Technical Standard #10 <http://unicode.org/reports/tr10/>`_.

Functions
---------

-  compare/2,3
-  sort/1,2
-  sort_key/1,2
-  sort_array/1,2
-  search/2,3,4

Examples
--------

Code from the Erlang shell:

.. code-block:: erlang

    1> ux_uca:sort_key("a").   
    <<21,163,0,0,32,0,0,2,0,0,255,255>>

    2> ux_uca:sort_key("abc"). 
    <<21,163,21,185,21,209,0,0,34,0,0,4,0,0,255,255,255,255,
      255,255>>

    3> ux_uca:sort_key("abcd").
    <<21,163,21,185,21,209,21,228,0,0,35,0,0,5,0,0,255,255,
      255,255,255,255,255,255>>

Code:

.. code-block:: erlang

    ux_uca:compare("a", "a").
    ux_uca:compare("a", "b").
    ux_uca:compare("c", "b").

Result:

::

    equal
    lower
    greater
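
The result atoms above can be mapped to the boolean ordering function that
``lists:sort/2`` expects. This is only a sketch of that mapping, assuming
``compare/2`` with default options and the ``equal``/``lower``/``greater``
results shown above. The module name is hypothetical, and ``ux_uca:sort/1``
is probably the better choice in real code.

.. code-block:: erlang

    -module(uca_sort_sketch).
    -export([sort_with_compare/1]).

    %% lists:sort/2 expects fun(A, B) to return true when A may come
    %% before B, so both 'lower' and 'equal' map to true.
    sort_with_compare(Strings) ->
        lists:sort(fun(A, B) -> ux_uca:compare(A, B) =/= greater end,
                   Strings).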

Code:

.. code-block:: erlang

    Options = ux_uca_options:get_options([ 
            {natural_sort, false}, 
            {strength, 3}, 
            {alternate, shifted} 
        ]),
    InStrings = ["erlang", "esl", "nitrogen", "epm", "mochiweb", "rebar", "eunit"],
    OutStrings = ux_uca:sort(Options, InStrings),
    [io:format("~ts~n", [S]) || S <- OutStrings],

    SortKeys = [{Str, ux_uca:sort_key(Options, Str)} || Str <- OutStrings],
    [io:format("~ts ~w~n", [S, K]) || {S, K} <- SortKeys],

    ok.

Result:

::

    epm
    erlang
    esl
    eunit
    mochiweb
    nitrogen
    rebar
    epm [5631,5961,5876,0,32,32,32,0,2,2,2]
    erlang [5631,6000,5828,5539,5890,5700,0,32,32,32,32,32,32,0,2,2,2,2,2,2]
    esl [5631,6054,5828,0,32,32,32,0,2,2,2]
    eunit [5631,6121,5890,5760,6089,0,32,32,32,32,32,0,2,2,2,2,2]
    mochiweb [5876,5924,5585,5735,5760,6180,5631,5561,0,32,32,32,32,32,32,32,32,0,2,2,2,2,2,2,2,2]
    nitrogen [5890,5760,6089,6000,5924,5700,5631,5890,0,32,32,32,32,32,32,32,32,0,2,2,2,2,2,2,2,2]
    rebar [6000,5631,5561,5539,6000,0,32,32,32,32,32,0,2,2,2,2,2]
    ok
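
Sort keys are useful when the same strings are compared many times: each key
is computed once and the keys are then compared directly. The sketch below
assumes that keys returned by ``ux_uca:sort_key/1`` order under plain Erlang
term comparison the same way the collation orders the strings. That is the
intent of UCA sort keys, but it is an assumption about this library. The
module name is hypothetical.

.. code-block:: erlang

    -module(uca_key_sketch).
    -export([sort_by_key/1]).

    %% Compute each sort key once, sort the {Key, String} pairs with
    %% Erlang term order, then drop the keys.
    sort_by_key(Strings) ->
        Keyed = [{ux_uca:sort_key(S), S} || S <- Strings],
        [S || {_Key, S} <- lists:sort(Keyed)].

For the seven strings in the example above this should give the same order
as ``ux_uca:sort/1``.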

Searching
---------

Code:

.. code-block:: erlang

    (ux@delta)30> ux_uca:search("The quick brown fox jumps over the lazy dog.",
    "fox").
    {"The quick brown ","fox"," jumps over the lazy dog."}

    (ux@delta)33> ux_uca:search("The quick brown fox jumps over the lazy dog.",
    "cat").         
    false
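
The two result shapes above, a three-element tuple or ``false``, can be
handled with a single ``case``. This is a minimal sketch assuming
``search/2`` behaves as in the shell session above. The module name and the
``highlight/2`` helper are hypothetical.

.. code-block:: erlang

    -module(uca_search_sketch).
    -export([highlight/2]).

    %% Wrap the first match in square brackets, or return the text
    %% unchanged when search/2 reports no match.
    highlight(Text, Pattern) ->
        case ux_uca:search(Text, Pattern) of
            {Before, Match, After} ->
                Before ++ "[" ++ Match ++ "]" ++ After;
            false ->
                Text
        end.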

Searching and Strength
----------------------

Code:

.. code-block:: erlang

    (ux@delta)20> CF = fun(S) -> ux_uca_options:get_options([{strength,S}]) end.      
    #Fun<erl_eval.6.80247286>

    (ux@delta)32> ux_uca:search(CF(2), "The quick brown fox jumps over the lazy dog.",
    "dog", maximal).
    {"The quick brown fox jumps over the lazy"," dog.",[]}

    (ux@delta)21> ux_uca:search(CF(2), "fF", "F").                                    
    {[],"f","F"}

    (ux@delta)22> ux_uca:search(CF(3), "fF", "F").
    {"f","F",[]}

Searching and Match-Style
-------------------------

Code:

.. code-block:: erlang

    (ux@delta)20> CF = fun(S) -> ux_uca_options:get_options([{strength,S}]) end.      
    #Fun<erl_eval.6.80247286>

    (ux@delta)27> ux_uca:search(CF(3), "! F   ?S?", "! F !", 'minimal').
    {"! ","F","   ?S?"}

    (ux@delta)28> ux_uca:search(CF(3), "! F   ?S?", "! F !", 'maximal').
    {[],"! F   ?","S?"}

    (ux@delta)29> ux_uca:search(CF(3), "! F   ?S?", "! F !", 'medium'). 
    {[],"! F ","  ?S?"}

ux_unidata.erl
==============

Stores UNIDATA information. For internal use only.

Data loading
============

.. code-block:: erlang

    ux_unidata_filelist:set_source(Level, ParserType, ImportedDataTypes,
    FromFile).

For example:

.. code-block:: erlang

    ux_unidata_filelist:set_source(process, blocks, all, code:priv_dir(ux) ++ "/UNIDATA/Blocks.txt").

loads data about Unicode blocks from ``priv/UNIDATA/Blocks.txt``.

Level is ``process``, ``application`` or ``node``, so different processes
can use their own unidata dictionaries.

Parsers are located in the ``ux_unidata_parser_*`` modules.

Default unidata files are loaded when the application tries to access
them.

ux's Issues

race conditions? randomly on custom deployments

I have an app that uses idna (and ux) very often. I generate the release with rebar generate. The release usually fails to start with the following error (though it sometimes starts, on slower servers):

=ERROR REPORT==== 11-Nov-2012::12:52:51 ===
** Generic server ux_unidata_server terminating 
** Last message in was {get_default,{unidata,to_lower}}
** When Server state == []
** Reason for termination == 
** {{case_clause,<0.454.0>},
    [{ux_unidata_server,handle_call,3,
                        [{file,"src/unidata/ux_unidata_server.erl"},
                         {line,102}]},
     {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,588}]},
     {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}

=CRASH REPORT==== 11-Nov-2012::12:52:51 ===
  crasher:
    initial call: ux_unidata_server:init/1
    pid: <0.169.0>
    registered_name: ux_unidata_server
    exception exit: {{case_clause,<0.454.0>},
                     [{ux_unidata_server,handle_call,3,
                          [{file,"src/unidata/ux_unidata_server.erl"},
                           {line,102}]},
                      {gen_server,handle_msg,5,
                          [{file,"gen_server.erl"},{line,588}]},
                      {proc_lib,init_p_do_apply,3,
                          [{file,"proc_lib.erl"},{line,227}]}]}
      in function  gen_server:terminate/6 (gen_server.erl, line 747)
    ancestors: [ux_sup,<0.161.0>]
    messages: []
    links: [<0.166.0>]
    dictionary: [{{unidata,to_lower},<0.454.0>},
                  {#Ref<0.0.0.2659>,{comp_exclusions,is_exclusion}},
                  {{comp_exclusions,is_exclusion},
                   #Fun<ux_unidata_filelist.2.116846164>},
                  {#Ref<0.0.0.2603>,{unidata,to_lower}}]
    trap_exit: false
    status: running
    heap_size: 2584
    stack_size: 24
    reductions: 407
  neighbours:

=SUPERVISOR REPORT==== 11-Nov-2012::12:52:51 ===
     Supervisor: {local,ux_sup}
     Context:    child_terminated
     Reason:     {{case_clause,<0.454.0>},
                  [{ux_unidata_server,handle_call,3,
                       [{file,"src/unidata/ux_unidata_server.erl"},
                        {line,102}]},
                   {gen_server,handle_msg,5,
                       [{file,"gen_server.erl"},{line,588}]},
                   {proc_lib,init_p_do_apply,3,
                       [{file,"proc_lib.erl"},{line,227}]}]}
     Offender:   [{pid,<0.169.0>},
                  {name,ux_unidata_server},
                  {mfargs,{ux_unidata_server,start_link,[]}},
                  {restart_type,permanent},
                  {shutdown,10000},
                  {child_type,worker}]

..............

=SUPERVISOR REPORT==== 11-Nov-2012::13:01:08 ===
     Supervisor: {local,ux_sup}
     Context:    shutdown
     Reason:     reached_max_restart_intensity
     Offender:   [{pid,<0.895.0>},
                  {name,ux_unidata_server},
                  {mfargs,{ux_unidata_server,start_link,[]}},
                  {restart_type,permanent},
                  {shutdown,10000},
                  {child_type,worker}]


=INFO REPORT==== 11-Nov-2012::13:01:08 ===
    application: ux
    exited: shutdown
    type: permanent
** Found 0 name clashes in code paths 
{"Kernel pid terminated",application_controller,"{application_terminated,ux,shutdown}"}

Crash dump was written to: ./log/erl_crash.dump
Kernel pid terminated (application_controller) ({application_terminated,ux,shutdown})

friendly apache 2 license ?

Hi,

I was about to propose a patch to eventually replace our use of icu by ux in CouchDB but actually you are using the LGPL3 license which is on the list of licenses where the product "may NOT be included within Apache " :

http://apache.org/legal/resolved.html#category-x

So I'm asking if you could eventually move ux to an Apache-friendly license (maybe Apache 2 or any BSD), or maybe dual-license it? Just asking anyway, I understand that a license is a personal choice.

Let me know.
