
node-diacritics's Introduction

node-diacritics

Remove diacritics from strings.

Useful when implementing search or filter functionality.

Installation

$ npm install diacritics

API

var removeDiacritics = require('diacritics').remove;
console.log(removeDiacritics("Iлtèrnåtïonɑlíƶatï߀ԉ"));
// prints "Internationalizati0n"
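
Since search and filter are the stated use cases, here is a minimal sketch of an accent-insensitive filter. `stripMarks` and `filterByQuery` are hypothetical names, not part of this package. `stripMarks` is only a stand-in built on `String.prototype.normalize` so the snippet runs without installing anything; in a real project you would presumably pass `require('diacritics').remove` as the third argument.

```javascript
// Stand-in for require('diacritics').remove (an assumption for this sketch):
// decompose to NFD, then drop combining marks in U+0300..U+036F.
function stripMarks(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Hypothetical helper: keep the items whose folded form contains the folded query.
function filterByQuery(items, query, removeFn = stripMarks) {
  const fold = (s) => removeFn(s).toLowerCase();
  const needle = fold(query);
  return items.filter((item) => fold(item).includes(needle));
}

const cities = ['Málaga', 'Zürich', 'Malmö', 'Oslo'];
console.log(filterByQuery(cities, 'mal')); // [ 'Málaga', 'Malmö' ]
```

Folding both the query and the items means the original strings are returned untouched, which is usually what a search UI wants to display.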

node-diacritics's People

Contributors

andrewrk, cesine, j3bb9z, jengeb, rachel-carvalho, thejoshwolfe, trott

node-diacritics's Issues

Work in console but not in my code

Hi, thanks for this tool, but I've hit a very strange bug. I use node.js webkit.

It doesn't work in my function:
[image]

But it works when I call the function in the console 😒:
[image]

WTF? Do you have any idea? Thanks

Support for Arabic?

Hello. I wrote this function on Stack Overflow that removes the most common Arabic diacritics and also normalizes some characters that are commonly (although incorrectly) substituted for each other.

Please feel free to incorporate the code in your library as you wish.

л is not a diacritic

This is the Cyrillic lowercase letter el (л, uppercase Л), which is not a diacritic symbol.

U+043B
U+041B

List of diacritics

Do you have a list of the diacritics that get converted?

In Italy, the fiscal code ("codice fiscale") has recently changed so that all diacritics are converted to ASCII characters. This table has been provided for the conversion.

How can I know whether those characters are actually supported by your module, given that the source code only contains a list of Unicode code points?

Thanks
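
One way to answer this empirically is to run each character of interest through the function and see which ones come back unchanged. Below is a minimal sketch, assuming a hypothetical helper `findUnconverted`. The `standIn` function exists only so the snippet runs by itself, and the sample characters are illustrative rather than taken from the fiscal-code table.

```javascript
// Hypothetical helper: return the non-ASCII characters that removeFn leaves untouched.
function findUnconverted(removeFn, chars) {
  return Array.from(chars).filter(
    (ch) => ch.codePointAt(0) > 0x7f && removeFn(ch) === ch
  );
}

// Stand-in so the sketch runs without the package (an assumption);
// swap in require('diacritics').remove to test the real module.
const standIn = (s) => s.normalize('NFD').replace(/[\u0300-\u036f]/g, '');

console.log(findUnconverted(standIn, 'àèéìòùß')); // [ 'ß' ]
```

An empty result would suggest that every character in the sample is handled; anything listed needs a closer look.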

No case-sensitive option

It would be nice to have a version of the function that preserves case as much as possible (i.e., turn œ into oe and ș into s, but turn Æ into AE and Ș into S).
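
To make the request concrete, here is a rough sketch of what a case-preserving table could look like. The table and the name `removePreservingCase` are assumptions for illustration only; they cover just the characters mentioned above and are not the library's data.

```javascript
// Illustrative table (an assumption): each character maps to a replacement
// written in the same case as the original.
const CASE_PRESERVING = new Map([
  ['\u0153', 'oe'], ['\u0152', 'OE'], // œ, Œ
  ['\u00E6', 'ae'], ['\u00C6', 'AE'], // æ, Æ
  ['\u0219', 's'], ['\u0218', 'S'],   // ș, Ș
]);

// Hypothetical helper: replace mapped characters and leave everything else as-is.
// NFC composition first, so "s" + combining comma below also matches ș.
function removePreservingCase(text) {
  return Array.from(text.normalize('NFC'), (ch) =>
    CASE_PRESERVING.has(ch) ? CASE_PRESERVING.get(ch) : ch
  ).join('');
}

console.log(removePreservingCase('Æther și Șapte œuvres')); // "AEther si Sapte oeuvres"
```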

Use Unicode normalization?

Unicode defines normalized forms for characters and character classes.

It might work to normalize strings to NFKD and remove any characters of class Mn (Nonspacing_Mark) (see table 12).

It might be necessary to handle conversions like ß to ss specially.

See also this Python answer on Stack Overflow.
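
The idea above could be sketched roughly like this in JavaScript. `stripNonspacingMarks` is a hypothetical name, and the sketch assumes a runtime that supports Unicode property escapes (`\p{...}` with the `u` flag); it is not how the package currently works.

```javascript
// Decompose to NFKD, then drop every code point whose general category is
// Mn (Nonspacing_Mark). ß has no decomposition, so it is handled explicitly.
function stripNonspacingMarks(text) {
  return text
    .replace(/\u00DF/g, 'ss')
    .normalize('NFKD')
    .replace(/\p{Mn}/gu, '');
}

console.log(stripNonspacingMarks('m\u00E1s'));      // "mas"
console.log(stripNonspacingMarks('Fu\u00DFball'));  // "Fussball"
console.log(stripNonspacingMarks('\uFB01n'));       // "fin" (NFKD expands the fi ligature)
```

Note that NFKD is a compatibility decomposition, so it also rewrites characters such as ligatures that are not diacritics at all; NFD would be the more conservative choice if that is unwanted.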

Combining diacritics

> console.log(removeDiacritics("más"));
mas
> console.log(removeDiacritics("más"));
más

The first has U+00E1 LATIN SMALL LETTER A WITH ACUTE and the second has U+0061 LATIN SMALL LETTER A + U+0301 COMBINING ACUTE ACCENT.
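
Because the two inputs render identically, writing them with escapes makes the difference visible. The snippet below only uses built-in string methods. Composing to NFC before removal is one possible workaround, assuming the lookup table contains the precomposed form.

```javascript
const precomposed = 'm\u00E1s'; // U+00E1 LATIN SMALL LETTER A WITH ACUTE
const decomposed = 'ma\u0301s'; // U+0061 + U+0301 COMBINING ACUTE ACCENT

console.log(precomposed === decomposed);                  // false
console.log(precomposed.length, decomposed.length);       // 3 4
console.log(decomposed.normalize('NFC') === precomposed); // true
```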

Add a changelog

So we can follow what is being changed on the releases. Thanks :)

.find() method to retrieve the whole group of diacritics for a specific char

Hi @andrewrk,

What do you think about this? Right now I'm facing a case where I need the group of all possible diacritics for a specific char. I remembered your great list of diacritics, and that your package is named 'diacritics' rather than something like 'remove-diacritics', so I thought it would be better to extend it with one more method instead of creating another package.

I have already written the new method:

function findDiacritics(chr) {
  // Find the entry whose base letter or list of variants contains chr.
  var diacriticsFound = replacementList.find(o => o.base === chr || o.chars.indexOf(chr) >= 0);
  return diacriticsFound ? diacriticsFound.base + diacriticsFound.chars : null;
}

If you think it's OK, I can send you a pull request.
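
For anyone who wants to try the proposal in isolation, here is a self-contained sketch. The shape of `replacementList` (objects with a `base` string and a `chars` string) is inferred from the snippet above, and the two sample entries are illustrative, not the package's real data.

```javascript
// Assumed shape: { base, chars }. Sample data for illustration only.
const replacementList = [
  { base: 'a', chars: '\u00E0\u00E1\u00E2\u00E3\u00E4\u00E5' }, // àáâãäå
  { base: 'e', chars: '\u00E8\u00E9\u00EA\u00EB' },             // èéêë
];

// Return the base letter followed by all of its variants, or null if unknown.
function findDiacritics(chr) {
  const found = replacementList.find(
    (entry) => entry.base === chr || entry.chars.includes(chr)
  );
  return found ? found.base + found.chars : null;
}

console.log(findDiacritics('\u00E9')); // "eèéêë"
console.log(findDiacritics('a'));      // "aàáâãäå"
console.log(findDiacritics('z'));      // null
```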

String.prototype.normalize

'Sîne klâwen Johan Öbert'.normalize('NFKD').replace(/[\u0080-\uF8FF]/g, '');
//-> 'Sine klawen Johan Obert'
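
One caveat worth illustrating: the range U+0080..U+F8FF covers far more than combining marks, so letters that have no decomposition are deleted outright. The comparison below is a sketch; `broad` and `narrow` are hypothetical names, and the narrower range U+0300..U+036F (Combining Diacritical Marks) is only one possible alternative.

```javascript
const broad = (s) => s.normalize('NFKD').replace(/[\u0080-\uF8FF]/g, '');
const narrow = (s) => s.normalize('NFKD').replace(/[\u0300-\u036f]/g, '');

console.log(broad('Stra\u00DFe'));  // "Strae"  (ß is deleted)
console.log(narrow('Stra\u00DFe')); // "Straße" (ß is kept as-is)
console.log(broad('\u00D8lsted'));  // "lsted"  (Ø has no decomposition)
console.log(narrow('\u00D6bert'));  // "Obert"
```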

Weird behavior for "Cassiopée" string

It looks like "C.....é..." always results in the 'C' being turned into a lowercase 'c'.

See below:

> var removeDiacritics = require('diacritics').remove;
undefined
> removeDiacritics("Cassiopée")
'cassiopee'
> removeDiacritics("Cassiopee")
'Cassiopee'
> removeDiacritics("Bassiopée")
'Bassiopee'
> removeDiacritics("Clé")
'cle'
> removeDiacritics("Cé")
'ce'
> 

`ß` should be `ss` not `s`

In German a ß should be replaced by ss. See http://www.duden.de/sprachwissen/rechtschreibregeln/doppel-s-und-scharfes-s

Regel 160:

  1. Fehlt das ß auf der Tastatur eines Computers oder einer Schreibmaschine, schreibt man dafür ss. In der Schweiz kann das ß generell durch ss ersetzt werden <§ 25 E2>.

Rule 160:

  1. If the ß is missing from the keyboard of a computer or typewriter, ss is written instead. In Switzerland, ß can generally be replaced by ss <§ 25 E2>.

I'm a native German speaker and I can confirm that Fußball and Weißbier should become Fussball and Weissbier.

Handling of German umlauts

German umlauts are currently transformed by simply removing the diacritics. However, it would make more sense to transform them as follows:

ä -> ae
ö -> oe
ü -> ue
ß -> ss
Ä -> Ae
Ö -> Oe
Ü -> Ue
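
The mapping above could be applied as a pre-processing step before any generic diacritics removal. This is only a sketch: `GERMAN` and `transliterateGerman` are hypothetical names, and the table contains nothing beyond the seven entries listed in this issue.

```javascript
// Table copied from the list above; keys are written as escapes for clarity.
const GERMAN = {
  '\u00E4': 'ae', '\u00F6': 'oe', '\u00FC': 'ue', '\u00DF': 'ss', // ä ö ü ß
  '\u00C4': 'Ae', '\u00D6': 'Oe', '\u00DC': 'Ue',                 // Ä Ö Ü
};

// Hypothetical helper: compose to NFC so decomposed umlauts also match, then
// replace only the German characters and leave everything else untouched.
function transliterateGerman(text) {
  return text
    .normalize('NFC')
    .replace(/[\u00E4\u00F6\u00FC\u00DF\u00C4\u00D6\u00DC]/g, (ch) => GERMAN[ch]);
}

console.log(transliterateGerman('Fu\u00DFball'));     // "Fussball"
console.log(transliterateGerman('Wei\u00DFbier'));    // "Weissbier"
console.log(transliterateGerman('\u00DCber \u00D6l')); // "Ueber Oel"
```

Because these rules are specific to German, they would probably need to be opt-in; other languages that use ö or ü do not necessarily expect oe or ue.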
