inexorabletash / text-encoding

715.0 35.0 267.0 2.16 MB

Polyfill for the Encoding Living Standard's API

License: Other

JavaScript 96.67% HTML 3.33%

encoding polyfill

text-encoding's People

rickeyre emailjs paivaric pmq favila mpls orzundher rillke pabl0v webcoding cosium coolaj86 paddymahoney auvik agilebits foolin toenseo mcanthony arv langri-sha eugenpodaru mobilekosmos sedwards2009 cgwyllie dictbox ladislavsopko rasata maka-io dizuo icellan mstaas kaiqinetwork keeweb cnooljaw chenrui2014 uphold-forks 19317362 robmclarke lodexinc waipptt cloudtracer hectorm documatrix cr0ck therufa heyuanchao daniel-007 717009629 lopidav calvinmetcalf leer561 loophq javonc pengfei2017 aiham bugapple ferrymanz lele-110 bjlaa wdlomeur georges-grajchen c01nd01r berzerk-interactive fuchunfen alexxnica kryndex sedukhinapolina rafaelfigueiredo83 canv15 littlemyx zhengxingjian shadowkimi520 jonathanperret mvasilkov avaer lizarani-vine pengxiang01 meganz n040661 icefoxs z121388038 zhengwk chezhi timothygu meggiepeng tjysunset qianlidongfeng dhilip89 mikeswei myvn alaanasreng liangnex dyl169 webcirque remember888 ameshkov ivankudinov ko-haga brewer-algosec sinonjs

text-encoding's Issues

throw TypeError on "GBK"

Hi! I'm Chinese.

When I use the library like this:

var dataString = '"操作","运单状态","运单号","货物名称","件数","寄件客户","寄件人电话","寄件地址","收件客户","收件人电话","收件地址","寄件网点","目的地","目的网点","重量(kg)","体积(M³)","到货时间","包装","到付款","代收货款","回付金额","寄货日期","回单号","回单","回单备注","提货密码"'

var textEncoder = new CustomTextEncoder('gbk', {NONSTANDARD_allowLegacyEncoding: true});

var csvContentEncoded = textEncoder.encode([dataString]);

var blob = new Blob([csvContentEncoded], {type: 'text/csv;charset=gbk;'});

Uncaught TypeError: The code point 179 could not be encoded.

Are there some characters that cannot be encoded?

I found that 'M³' in the header causes the error; 'M' on its own is fine. Can 'M³' be supported?
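The reported code point 179 is U+00B3 SUPERSCRIPT THREE. One possible workaround, sketched here under the assumption that the data tolerates it (this is not something the polyfill does), is to fold compatibility characters to plain equivalents before handing the string to the GBK encoder. `foldCompatibilityChars` is a hypothetical helper built on the standard `String.prototype.normalize`:

```javascript
// Assumption: folding compatibility characters is acceptable for this data.
// NFKC maps U+00B3 SUPERSCRIPT THREE to "3", but it also rewrites other
// compatibility characters (for example full-width "（" becomes "(").
function foldCompatibilityChars(text) {
  return text.normalize('NFKC');
}

const header = '"重量(kg)","体积(M³)"';
const folded = foldCompatibilityChars(header);
console.log(folded); // "重量(kg)","体积(M3)"
```

The folded string no longer contains U+00B3; ordinary CJK ideographs such as 体 and 积 are left unchanged by NFKC.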

publish encoding-indexes as it's own module

to make it easier to include

Encode in windows-1252 Problems

I have installed this package via Bower:
bower install text-encoding
It is included in my index.html:

...
  <body ng-app="myApp">
    <script>
      window.TextEncoder = window.TextDecoder = null;
    </script>
...
    <!-- build:js({client,node_modules}) app/vendor.js -->
    <!-- bower:js -->
    <script src="bower_components/text-encoding/lib/encoding.js"></script>
    <script src="bower_components/text-encoding/lib/encoding-indexes.js"></script>
    <!-- endbower -->

I am using it here:

  case 'php':
    ...
    var uint8array = new TextEncoder(
      'windows-1252', { NONSTANDARD_allowLegacyEncoding: true }).encode(str);
    data = new Blob([uint8array], {
      type: 'application/json; charset=windows-1252;'
    });
    this.FileSaver.saveAs(data, this.$scope.selectedLang[i].code + '.php');
  }

It throws the error:
Encoder not present. Did you forget to include encoding-indexes.js?
What did I do wrong?
If I remove

  <script>
    window.TextEncoder = window.TextDecoder = null;
    </script>

the error becomes:
Failed to construct 'TextEncoder': The encoding provided ('windows-1252') is not one of 'utf-8', 'utf-16', or 'utf-16be'
So it finds the encoder but can't use it because of the polyfill?
Did I place the

  <script>
    window.TextEncoder = window.TextDecoder = null;
    </script>

in the wrong place? Where does it need to go?

npm/bower

TextEncoder throws TypeError on 'windows-1252'

Hi, and first of all thanks for this wonderful library. I needed it to export CSV files in CP1252. Fun!

I'm using your library like this: var array = TextEncoder('windows-1252').encode(text);

To make it work, I had to change encoding.js:933 to remove the check forcing the encoding name to be 'utf-8', 'utf-16le' or 'utf-16be'. I'm not sure why this check is needed, but I'm probably terribly wrong.

Anyway, removing this makes it work perfectly. Will generate a pull request so you can read the diff more easily. Thanks again!
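The check appears to follow the standard API, under which TextEncoder only produces Unicode output (the error quoted in the previous report lists only UTF-8 and UTF-16 variants); the polyfill's opt-in for legacy targets is the NONSTANDARD_allowLegacyEncoding option used in other reports here. For comparison, a minimal standalone sketch of what a windows-1252 encoder has to do. `encodeWindows1252` is hypothetical and not the polyfill's API; the table for bytes 0x80-0x9F follows the windows-1252 index in the WHATWG Encoding Standard:

```javascript
// Hypothetical standalone encoder (not the polyfill's API).
// Code points for bytes 0x80-0x9F; 0x81, 0x8D, 0x8F, 0x90 and 0x9D map to
// the C1 control characters with the same value.
const WIN1252_HIGH = [
  0x20ac, 0x0081, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008d, 0x017d, 0x008f,
  0x0090, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x009d, 0x017e, 0x0178,
];

// Reverse lookup: code point -> byte, built once.
const WIN1252_REVERSE = new Map(WIN1252_HIGH.map((cp, i) => [cp, 0x80 + i]));

function encodeWindows1252(text) {
  const out = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp < 0x80 || (cp >= 0xa0 && cp <= 0xff)) {
      // ASCII and U+00A0..U+00FF are encoded as the byte with the same value.
      out.push(cp);
    } else if (WIN1252_REVERSE.has(cp)) {
      out.push(WIN1252_REVERSE.get(cp));
    } else {
      throw new TypeError(`The code point ${cp} could not be encoded.`);
    }
  }
  return Uint8Array.from(out);
}

const csvBytes = encodeWindows1252('café €');
// csvBytes could then be wrapped in a Blob with a matching charset, e.g.
// new Blob([csvBytes], { type: 'text/csv;charset=windows-1252' })
```

Most of the table is the identity mapping; only the 0x80-0x9F block needs explicit entries, which is why the sketch keeps just those 32 values.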

Publish ES6 modules

A lot of tool chains are moving towards using ES6 modules (rollup for example) and having ES6 modules published to npm would make this package easier to consume.

npm outdated

The npm package is still at version 0.6.2. Could you publish it?

Proposal: use native string_decoder for popular encodings

Using https://nodejs.org/api/string_decoder.html could significantly reduce the amount of hand-maintained code, since it already implements correct decoding for the popular encodings that Node supports.
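A small sketch of what string_decoder offers, assuming a CommonJS Node environment. Per the Node documentation it handles only a few encodings (such as 'utf8', 'utf16le', 'latin1', 'ascii', 'base64' and 'hex'), and Node's 'latin1' is ISO-8859-1 rather than the windows-1252 mapping that the Encoding Standard uses for the "latin1" label, so it could at most replace the UTF-8/UTF-16LE paths:

```javascript
const { StringDecoder } = require('string_decoder');

// StringDecoder buffers incomplete multi-byte sequences between writes,
// similar to TextDecoder's decode(bytes, { stream: true }).
const decoder = new StringDecoder('utf8');
const euro = Buffer.from('€', 'utf8'); // bytes e2 82 ac

const part1 = decoder.write(euro.subarray(0, 2)); // '' (sequence incomplete)
const part2 = decoder.write(euro.subarray(2));    // '€'
const rest = decoder.end();                       // '' (nothing left buffered)
```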

What is the copyright of test/testharness*.js?

I know the license is BSD-3-Clause, but what is a license that mentions the copyright holders worth if they are not named?

I can't help but notice that testharness.js looks like a mix of the W3C version and its API, so perhaps the copyright belongs to the W3C?

Remove require of encoding-indexes.js

Right now encoding-indexes.js is always required in a node like environment (such as browserify).

This was done in a85ce6f

It would be better if encoding-indexes.js was never implicitly included.

Null terminator option for encoder (Feature Request).

I'm serializing stuff to a C++ library that wants null terminated strings. I noticed a "TODO: any options?" in your code, and this seems like a candidate. Would a pull request offering such an option be welcomed?
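Until such an option exists, the same result can be produced outside the encoder. A minimal sketch using the standard UTF-8 TextEncoder; `encodeNullTerminated` is a hypothetical helper, not an existing option of the polyfill:

```javascript
// Hypothetical helper (not part of the polyfill): encode as UTF-8 and append
// a single 0x00 byte, as expected by C/C++ APIs taking NUL-terminated strings.
function encodeNullTerminated(text) {
  const utf8 = new TextEncoder().encode(text);
  const out = new Uint8Array(utf8.length + 1); // zero-filled, so the last byte is 0
  out.set(utf8);
  return out;
}

const cString = encodeNullTerminated('hi'); // bytes 104, 105, 0
```

One caveat for any such option: a string containing U+0000 encodes to an embedded 0x00 byte, so the C++ side would see it as truncated at that point.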

Add unit tests for optional parameters

The current unit tests pass in all parameters, but several of the parameters are specified (and implemented in the shim) as optional. Verify that the optional parameter behavior is per spec.
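A sketch of what such tests might check, based on the defaults in the Encoding Standard. `check` is a hypothetical stand-in for the project's test harness, and the checks run here against the environment's native TextDecoder/TextEncoder; pointing the constructors at the polyfill would exercise the shim instead:

```javascript
// `check` is a hypothetical stand-in for the project's test harness.
function check(condition, message) {
  if (!condition) throw new Error('Failed: ' + message);
}

// new TextDecoder(): label defaults to "utf-8", options default to {}.
const decoder = new TextDecoder();
check(decoder.encoding === 'utf-8', 'default label is utf-8');
check(decoder.fatal === false, 'fatal defaults to false');
check(decoder.ignoreBOM === false, 'ignoreBOM defaults to false');

// decode() with no input returns the empty string.
check(decoder.decode() === '', 'decode() returns ""');

// encode() with no input defaults to "" and returns an empty Uint8Array.
const encoded = new TextEncoder().encode();
check(encoded instanceof Uint8Array && encoded.length === 0, 'encode() returns empty bytes');
```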

License ?

I couldn't find a license in the repo. What is the license for this project? BSD, MIT?

Thanks

Incorporate Shift_JIS PUA fix

whatwg/encoding@236196e

windows-1255 map 0xCA to U+05BA

Track the spec update: whatwg/encoding@e32a57b

Bug: whatwg/encoding#73

Seems to be broken under node 0.12.0

Since the upgrade it gets stuck in an endless loop.

stringencoding vs text-encoding

I am using stringencoding in one of my projects... but it feels pretty dead to me
https://www.npmjs.com/package/stringencoding
https://github.com/defunctzombie/stringencoding

It seems text-encoding is an updated fork. Can someone explain the relationship between stringencoding and text-encoding, and what the functional difference is?

Release new version on npm?

I'd like to use the new NONSTANDARD_allowLegacyEncoding option, but it hasn't been released in a version on npm yet.

Reason for going from Apache 2 license to Unlicense?

I am just curious, what was the reason to make 630eed1 happen?

I was about to request changing to MIT license from Unlicense, but instead opened this question...

Usage outside of Polyfill

I'm working on a tool to generate a CSV export of some data in a browser, and unfortunately, the encoding situation is a bit of a mess. TL;DR: it seems that the best outcome is a tab-separated, UTF-16LE encoded file. [1][2]

I'm interested in using this package not as a polyfill, but by directly loading it when I need to perform that encoding. Would you be open to a PR that moved the core encoding functionality into a standalone module, and implementing the polyfill as a consumer of that standalone library?

[1] https://donatstudios.com/CSV-An-Encoding-Nightmare
[2] http://terminaln00b.ghost.io/excel-just-save-it-as-a-csv-noooooooo/

Divide into Encode/Decode

It would be nice to be able to include only the encoding or only the decoding code. The code is really long, and my current project must be as small as possible and only needs the encoding part. Is this possible?

Remove U+FFFD entries in euc-kr index

Per: whatwg/encoding@7991e7b

... which really just means: sync to the indexes.json file

treatment of invalid 2-byte sequence is different in different CJK encodings

whatwg/encoding@640bf69

Special cases for U+2022 should be for U+2212

Per whatwg/encoding#21

Submit tests to wpt.

I'd like to get these APIs implemented in Servo, but I'm unwilling to do one-off test imports. @inexorabletash, are you willing to do the work on this?

"Indexes missing" error is thrown no matter what.

Whenever I try to use encoding.js without encoding-indexes.js, it throws this error. If I recall correctly, the indexes used to be optional. Did something change?

If it is required, why not just put the files together?

Support ArrayBuffer

...not just ArrayBufferView, per spec (via WebIDL/BufferSource)
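WebIDL defines BufferSource as either an ArrayBufferView or an ArrayBuffer. A sketch of normalizing both to a Uint8Array view over the same bytes before decoding; `toBytes` is a hypothetical helper name, not the polyfill's code:

```javascript
// Hypothetical helper: accept an ArrayBuffer or any ArrayBufferView
// (Uint8Array, Uint16Array, DataView, ...) and return a Uint8Array view
// over the same bytes, without copying.
function toBytes(input) {
  if (input instanceof ArrayBuffer) {
    return new Uint8Array(input);
  }
  if (ArrayBuffer.isView(input)) {
    // Respect the view's offset and length, not the whole underlying buffer.
    return new Uint8Array(input.buffer, input.byteOffset, input.byteLength);
  }
  throw new TypeError('Expected an ArrayBuffer or ArrayBufferView');
}

const whole = toBytes(new Uint8Array([1, 2, 3]).buffer);          // bytes 1, 2, 3
const slice = toBytes(new DataView(new Uint8Array([0, 1, 2, 3]).buffer, 1, 2)); // bytes 1, 2
```

Note that `instanceof ArrayBuffer` is false for buffers created in another realm (for example an iframe), so a production check may need to be more lenient.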

Decoder not present in Node.

I am not able to reproduce this error myself, but I have received several issue reports to my shapefile parser of an error generated by text-encoding: 1 2 3. The error is:

Decoder not present. Did you forget to include encoding-indexes.js first?

But since the Node API is simply require("text-encoding"), the error is internal, because the text-encoding library should be loading the indexes itself in Node.

In an attempt to work around this error, I reverted shapefile to an earlier text-encoding release rather than 0.6.3, as it appears that changes 8c43765 b8503e7 may be related. Sorry I don't have any more information to go on. If users report any more details I will relay them here. Thank you!

Non standard encoding in Firefox

The option NONSTANDARD_allowLegacyEncoding: true works fine in Chrome and Internet Explorer but is ignored in Firefox.

encoding-indexes aren't properly loaded

I get the error "Uncaught TypeError: Cannot read property 'ibm866' of undefined" in the client console. The issue likely occurs because the code expects encoding-indexes.js to return an object containing the indexes, whereas it assigns them to the global object.

Decode to =?GB18030?B?lDnaMw==?=: failed

The following GB18030 encoded word is decoded incorrectly:
=?GB18030?B?lDnaMw==?=:
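This is an RFC 2047 encoded word: charset GB18030, "B" (base64) encoding, payload lDnaMw==. Working it through by hand, assuming the Encoding Standard's gb18030 four-byte rule: the payload is the bytes 94 39 DA 33, a four-byte sequence whose pointer lies in the range that maps linearly onto U+10000..U+10FFFF, which suggests the expected result is U+1F4A9. The sketch below covers only that supplementary-plane range (BMP four-byte sequences go through the gb18030 ranges table, not reproduced here); the function names are hypothetical, and `atob` is assumed to be available (browsers, Node 16+):

```javascript
// Decode the base64 payload of an RFC 2047 "B" encoded word.
function base64ToBytes(b64) {
  return Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));
}

// Pointer for a GB18030 four-byte sequence, as in the Encoding Standard.
function gb18030FourBytePointer([b1, b2, b3, b4]) {
  return (b1 - 0x81) * 12600 + (b2 - 0x30) * 1260 + (b3 - 0x81) * 10 + (b4 - 0x30);
}

// Pointers 189000..1237575 map in order onto U+10000..U+10FFFF.
function gb18030SupplementaryCodePoint(pointer) {
  if (pointer < 189000 || pointer > 1237575) {
    throw new RangeError('pointer is outside the supplementary-plane range');
  }
  return 0x10000 + (pointer - 189000);
}

const gbBytes = base64ToBytes('lDnaMw==');       // 0x94 0x39 0xDA 0x33
const pointer = gb18030FourBytePointer(gbBytes); // 251633
const decoded = String.fromCodePoint(gb18030SupplementaryCodePoint(pointer)); // U+1F4A9
```

A correct decoder should therefore produce a single astral character, stored in JavaScript as the surrogate pair D83D DCA9.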

Problem with encoding SJIS text

I am having problems encoding Shift_JIS text. If I try to encode a character (in my case "点"), I get 3 incorrect bytes instead of 2. Decoding seems to work OK. Am I missing something?

Test code:

const textEncoder = new TextEncoder("Shift_JIS")
const textDecoder = new TextDecoder("Shift_JIS")

const text = "点"

console.log("Decode from bytes:")
const bytes = Uint8Array.from([0x93,0x5f])
const textResultFromBytes = textDecoder.decode(bytes)
console.log(bytes, "->", textResultFromBytes, textResultFromBytes === text ? "OK": "NOT OK")

console.log("Encode text then decode again:")
const bytesOfText = textEncoder.encode(text)
const textResultFromText = textDecoder.decode(bytesOfText)
console.log(text, "->", bytesOfText, "->", textResultFromText, textResultFromText === text ? "OK": "NOT OK")

Console output:

Decode from bytes:
Uint8Array(2) [147, 95] "->" "点" "OK"
Encode text then decode again:
点 -> Uint8Array(3) [231, 130, 185] -> 轤ｹ NOT OK

This quick test is also available here:
https://jsfiddle.net/aleris/pt17dkz3/
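The bytes [231, 130, 185] (E7 82 B9) are exactly the UTF-8 encoding of 点 (U+70B9), which suggests the text was encoded as UTF-8 rather than Shift_JIS. In the standard API, TextDecoder accepts legacy labels but TextEncoder always produces UTF-8; with the polyfill, legacy encoder targets appear to require NONSTANDARD_allowLegacyEncoding: true, and a native TextEncoder that is already defined is not replaced. A sketch run against a native TextEncoder reproduces the bytes from the report:

```javascript
// The standard TextEncoder constructor takes no arguments and always
// encodes to UTF-8, so the "Shift_JIS" argument below is ignored.
const nativeEncoder = new TextEncoder('Shift_JIS');
console.log(nativeEncoder.encoding); // "utf-8"

// U+70B9 in UTF-8 is E7 82 B9, i.e. the [231, 130, 185] from the report.
const utf8Bytes = Array.from(nativeEncoder.encode('点'));
console.log(utf8Bytes);
```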

Contribute to https://cdn.polyfill.io/

I would love to see this feature polyfill if (and only if) I need it!

https://cdn.polyfill.io/

Please tag releases

Could you please tag releases, so GitHub generates release tarballs?

Thanks!

Override TextEncoder()

Hi all,

Chrome has its own implementation of TextEncoder(), but it doesn't support legacy encodings.

I use your implementation by renaming "TextEncoder" to "NewTextEncoder" directly in your source code.

Is there a better way?

Thanks

Add unit tests for invalid parameters

The current tests don't validate the behavior of the implementation when invalid parameters are passed. Add tests for these cases.

Add way to override native implementation

Add MS932 as alias for shift-jis

Add MS932 as alias for shift-jis:
whatwg/encoding@01db1f8

Inconsistent behavior

new TextEncoder().encode(false) // Chrome, Firefox: [102, 97, 108, 115, 101]
new TextEncoder().encode(0)     // Chrome, Firefox: [48]

new TextEncoder().encode(false) // Polyfill: []
new TextEncoder().encode(0)     // Polyfill: []
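Native implementations convert the argument of encode() to a USVString per WebIDL, so false becomes "false" and 0 becomes "0" before encoding. A sketch of that conversion; `toUSVString` and `encodeLikeNative` are hypothetical helper names. A template literal applies the same ToString conversion WebIDL uses (and, like it, throws for Symbols), and lone surrogates are then replaced with U+FFFD:

```javascript
// Hypothetical helper: WebIDL-style USVString conversion.
function toUSVString(value) {
  // Template literals use ToString, which throws a TypeError for Symbols.
  return `${value}`.replace(
    // A high surrogate not followed by a low one, or a low surrogate
    // not preceded by a high one.
    /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g,
    '\uFFFD'
  );
}

// The argument defaults to "" when omitted, as in the standard signature.
function encodeLikeNative(input = '') {
  return new TextEncoder().encode(toUSVString(input));
}

const fromFalse = Array.from(encodeLikeNative(false)); // 102, 97, 108, 115, 101
const fromZero = Array.from(encodeLikeNative(0));      // 48
```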

Undocumented breaking changes introduced somewhere between 0.5.2 and 0.5.5

Hi,

We depended on ^0.5.2, and our CI server just installed 0.5.5, which broke the build. We received an exception thrown from here https://github.com/inexorabletash/text-encoding/blob/master/lib/encoding.js#L979.

The solution is simple enough (a change from TextDecoder('utf-8').decode(...) to creating an instance of var decoder = new TextDecoder('utf-8'); decoder.decode(...)), but it caused us some debug time to fix because the documentation isn't up to date, and the breaking change was introduced in a patch release.
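In the Encoding Standard, TextDecoder is a constructor that must be invoked with new, and native implementations reject a plain function call with a TypeError, so the stricter behavior appears to match the platform. A short sketch against a native TextDecoder contrasting the two forms:

```javascript
const bytes = new Uint8Array([0x68, 0x69]); // "hi" in UTF-8

// Calling the constructor without `new` throws a TypeError.
let threwWithoutNew = false;
try {
  TextDecoder('utf-8');
} catch (e) {
  threwWithoutNew = e instanceof TypeError;
}

// Create an instance first, then decode.
const decoder = new TextDecoder('utf-8');
const text = decoder.decode(bytes); // "hi"
```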

Was this intentional, or are the documentation updates just pending?

Thanks!

CP437 Encoder/Decoder

Just thinking it would be nice to have CP437 (IBM US-DOS extended ASCII) converters available.

There have been a number of projects to convert ANSI/ASCII art for display on modern UTF-based systems, so it would be a nice-to-have feature here. I'm not sure about the format used in the lib/encoding-indexes.js file. Is the array simply a mapping of position X to character code/pair Y? Does it start at byte value 1 (skipping null)? Am I correct in assuming I can go all the way from character 1 through 255, and does null mean "don't map" or "default map"?

I'd be happy to make a PR if these questions could be clarified... possibly adding comments as to the structure of the encoding-indexes in the file itself.
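For reference, the Encoding Standard defines each single-byte index as 128 entries: entry i is the code point for byte 0x80 + i, bytes 0x00-0x7F are always ASCII and are not stored, and null means the byte has no mapping (an error, or U+FFFD when decoding non-fatally). Assuming lib/encoding-indexes.js uses the same layout for its single-byte encodings, the sketch below reads such an array in both directions. `makeSingleByteCodec` is hypothetical; CP437 is not an Encoding Standard encoding, and this layout would not cover CP437's graphical glyphs for bytes below 0x80:

```javascript
// Hypothetical codec over a WHATWG-style single-byte index:
// index[i] is the code point for byte 0x80 + i, or null if unmapped.
function makeSingleByteCodec(index) {
  if (index.length !== 128) throw new RangeError('expected 128 entries');
  return {
    decode(bytes) {
      let out = '';
      for (const b of bytes) {
        if (b < 0x80) {
          out += String.fromCharCode(b); // ASCII range is implicit
        } else {
          const cp = index[b - 0x80];
          out += cp === null ? '\uFFFD' : String.fromCodePoint(cp);
        }
      }
      return out;
    },
    encode(text) {
      const out = [];
      for (const ch of text) {
        const cp = ch.codePointAt(0);
        if (cp < 0x80) {
          out.push(cp);
          continue;
        }
        const pointer = index.indexOf(cp); // linear search over the index
        if (pointer === -1) {
          throw new TypeError(`The code point ${cp} could not be encoded.`);
        }
        out.push(0x80 + pointer);
      }
      return Uint8Array.from(out);
    },
  };
}

// Toy index: byte 0x80 -> U+20AC, byte 0xE9 -> U+00E9, everything else unmapped.
const toyIndex = new Array(128).fill(null);
toyIndex[0x00] = 0x20ac;
toyIndex[0x69] = 0x00e9;
const toy = makeSingleByteCodec(toyIndex);
```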

Encoding failing with windows-1254

I found a problem when trying to encode a string with Turkish characters in windows-1254:

TypeError: Cannot read property 'indexOf' of undefined
     at indexPointerFor (encoding.js:823)
     at SingleByteEncoder.handler (encoding.js:1434)
     at Object.encode (encoding.js:1122)

Way to reproduce:

var textEncoder = new CustomTextEncoder('windows-1254', {NONSTANDARD_allowLegacyEncoding: true}),
    contentEncoded = textEncoder.encode(["Tuğce"]);