
forceutf8's People

Contributors

byjg, codelingobot, hugopakula, j03k64, j0k3r, mcuadros, mmarynich, neitanod, neoteknic, pborreli, podolinek, postalservice14, redolent, superhero

forceutf8's Issues

Support for the non-breaking space before group B in toUTF8 function

The toUTF8 function is currently unusable for me because a non-breaking space followed by group B characters is not supported. Alas, the non-breaking space character is widely used on many sites. Would it be possible to add a specific fix for this character when it is followed by group B characters?

unable to fix a string

Hi, I'm trying to fix a garbled UTF-8 string, but I don't know why it doesn't work:

require_once("./Encoding.php");

echo \ForceUTF8\Encoding::fixUTF8("Fédération Camerounaise de Football\n");

echo \ForceUTF8\Encoding::fixUTF8("Fédération Camerounaise de Football\n");

echo \ForceUTF8\Encoding::fixUTF8("Fédération Camerounaise de Football\n");

echo \ForceUTF8\Encoding::fixUTF8("Fédération Camerounaise de Football\n");

$data = "Année 80 International 587 tubes année par année";

$new = \ForceUTF8\Encoding::UTF8FixWin1252Chars($data);
echo "Test 1\n";
echo $new;
$new = \ForceUTF8\Encoding::fixUTF8($data);
echo "Test 2\n";
echo $new;

It returns this:

Fédération Camerounaise de Football 

Fédération Camerounaise de Football 

Fédération Camerounaise de Football 

Fédération Camerounaise de Football 

Test 1 
Année 80 International 587 tubes année par année

Test 2 
Année 80 International 587 tubes année par année

Thanks for the help

Failed to detect 4-byte chars

I'm passing the text in as quoted-printable:

Testing emojis =F0=9F=98=84

That character (F09F9884) isn't detected correctly.
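Quoted-printable is only a transfer encoding, so the =XX escapes have to be decoded before anything can see the real bytes. A minimal sketch using PHP's built-in quoted_printable_decode() and preg_match() (not forceutf8 itself); utf8_sequence_length() is a hypothetical helper. It shows that F0 9F 98 84 is a well-formed 4-byte sequence for U+1F604:

```php
<?php
// Hypothetical helper (not part of forceutf8): number of bytes a UTF-8
// sequence occupies, derived from its lead byte as defined in RFC 3629.
function utf8_sequence_length(int $leadByte): int
{
    if ($leadByte <= 0x7F) {
        return 1;
    }
    if ($leadByte >= 0xC2 && $leadByte <= 0xDF) {
        return 2;
    }
    if ($leadByte >= 0xE0 && $leadByte <= 0xEF) {
        return 3;
    }
    if ($leadByte >= 0xF0 && $leadByte <= 0xF4) {
        return 4;
    }
    return 0; // continuation byte, or a byte that can never start a sequence
}

// Decode the quoted-printable transfer encoding first.
$raw = quoted_printable_decode('Testing emojis =F0=9F=98=84');
$emoji = substr($raw, -4);

// Combine the payload bits of the 4 bytes into one code point.
$codePoint = ((ord($emoji[0]) & 0x07) << 18)
    | ((ord($emoji[1]) & 0x3F) << 12)
    | ((ord($emoji[2]) & 0x3F) << 6)
    | (ord($emoji[3]) & 0x3F);

echo utf8_sequence_length(ord($emoji[0])), "\n"; // 4
printf("U+%X\n", $codePoint);                    // U+1F604
echo preg_match('//u', $raw) === 1 ? "valid UTF-8\n" : "invalid UTF-8\n";
```

If the converter receives the text before quoted_printable_decode() has run, it only ever sees ASCII such as =F0, which may be worth ruling out first.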

Some UTF8 characters become ? with fixUTF8

Hello neitanod, great lib!
However, some characters are converted into question marks with fixUTF8 method.
For example, the whole Russian alphabet and letters š and ž.

echo \ForceUTF8\Encoding::fixUTF8('hello žš'); // outputs hello ??
echo \ForceUTF8\Encoding::fixUTF8('привет'); // outputs ??????
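A plausible explanation for this report (and the Czech, Japanese and Œ reports below) is that fixUTF8 round-trips text through a single-byte Western encoding, so characters that encoding cannot hold come back as "?". That is an assumption, not a reading of the library's code. One way to sidestep it, sketched here with a hypothetical wrapper and a stand-in callback, is to skip the repair step whenever the input is already valid UTF-8:

```php
<?php
// Hypothetical guard (not part of forceutf8): run a repair function only when
// the input is NOT already valid UTF-8. preg_match() with the /u modifier
// returns false when the subject contains ill-formed UTF-8.
function repair_only_if_invalid(string $text, callable $repair): string
{
    if (preg_match('//u', $text) === 1) {
        return $text; // valid UTF-8 (Cyrillic, Czech, CJK, ...) passes through untouched
    }
    return $repair($text);
}

// Stand-in repair callback for the demo; in practice this could be
// [\ForceUTF8\Encoding::class, 'toUTF8'] or another converter.
$markRepaired = fn(string $s): string => '[repaired]';

echo repair_only_if_invalid('hello žš', $markRepaired), "\n"; // hello žš
echo repair_only_if_invalid('привет', $markRepaired), "\n";   // привет
echo repair_only_if_invalid("caf\xE9", $markRepaired), "\n";  // [repaired]
```

The trade-off: double-encoded text such as "FÃ©dÃ©ration" is itself valid UTF-8, so this guard would skip it too. It prefers leaving correct non-Latin text alone over repairing that kind of mojibake.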

Garbles some characters

The script garbles some characters.
For instance, if applied directly to a garbled source that contains 'â€�', this script converts it into 'â??'.
If I use str_replace first to turn 'â€�' into the correct '”', fixUTF8 then converts the '”' into a plain '?'.

The problem is that forceUTF8 turns almost all of these characters into the same representation, which makes it impossible to apply both str_replace and forceUTF8 to the same string.

Here are the characters it doesn't convert correctly:

'“'
'�'
'’'
'‘'
'—'
'–'
'•'
'…'
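Each garbled form in that list looks like a 3-byte UTF-8 punctuation mark (bytes E2 80 xx) whose bytes were read as Windows-1252: for example “ is E2 80 9C, which reads as "â€œ". A minimal sketch of a targeted, single-pass repair with strtr(); the function name is hypothetical, and the "â€�" entry assumes, as in the report above, that the undefined byte 0x9D was replaced by U+FFFD, so it guesses the closing quote ”. Applying only this map, rather than str_replace followed by fixUTF8, keeps a second conversion from touching the repaired quotes:

```php
<?php
// Hypothetical helper: undo one layer of "UTF-8 read as Windows-1252" damage
// for common punctuation. Keys are the garbled forms, values the intended characters.
function fix_cp1252_punctuation(string $text): string
{
    $map = [
        "\u{E2}\u{20AC}\u{153}"  => "\u{201C}", // â€œ -> “ (E2 80 9C)
        "\u{E2}\u{20AC}\u{FFFD}" => "\u{201D}", // â€� -> ” (guess: byte 9D was lost)
        "\u{E2}\u{20AC}\u{2122}" => "\u{2019}", // â€™ -> ’ (E2 80 99)
        "\u{E2}\u{20AC}\u{2DC}"  => "\u{2018}", // â€˜ -> ‘ (E2 80 98)
        "\u{E2}\u{20AC}\u{201D}" => "\u{2014}", // â€” -> em dash (E2 80 94)
        "\u{E2}\u{20AC}\u{201C}" => "\u{2013}", // â€“ -> en dash (E2 80 93)
        "\u{E2}\u{20AC}\u{A2}"   => "\u{2022}", // â€¢ -> • (E2 80 A2)
        "\u{E2}\u{20AC}\u{A6}"   => "\u{2026}", // â€¦ -> … (E2 80 A6)
    ];
    // strtr() works in a single pass, so already-replaced text is never re-scanned.
    return strtr($text, $map);
}

$garbled = "\u{E2}\u{20AC}\u{153}Hello\u{E2}\u{20AC}\u{FFFD} it\u{E2}\u{20AC}\u{2122}s\u{E2}\u{20AC}\u{A6}";
echo fix_cp1252_punctuation($garbled), "\n"; // “Hello” it’s…
```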

Create a release

It would be really good if you could create a release, so we can target a specific version. I feel uncomfortable targeting 'master'.

Thanks in advance!

fixUTF8 Problem with certain characters

Hi, I have found that fixUTF8 has issues when the input string contains ligature characters such as Œ: it is converted to a ? sign, even though the input string does not need any fixing.

Input: Café Nöel Œuf Aoüt
Output: Café Nöel ?uf Août

script breaks two-byte (e.g. Czech) symbols

$ cat a.php
<?php
require_once('Encoding.php');
use \ForceUTF8\Encoding;
$str = 'šřěī';
$newstr = Encoding::fixUTF8($str);
var_dump($str);
var_dump($newstr);

$ php a.php
string(8) "šřěī"
string(4) "????"

Japanese characters are turned into ?

Hi,

I'm using your library and it works, but I have one problem: when I enter Chinese or Japanese characters, they are converted to "?". Is there any solution for this?

Thanks

Conversion fix

Hello,

I have this string:

...in „Test string�

This string is stored in a database field which previously had the collation latin1_swedish_ci. I have converted it to utf8_general_ci.

The value should be

...in „Test string”

I have tried many solutions, including yours, but none of them seem to work. A quick solution would be very welcome.

Cheers !

Unable to convert string

I am unable to convert the string, no matter what function I try to use.

<?php

require_once 'vendor/forceutf8/src/ForceUTF8/Encoding.php';

$string = "“Grinvich�";

echo "string to convert: {$string}<br>";

$string1 = \ForceUTF8\Encoding::toUTF8($string);

echo "<hr>toUTF8: {$string1}";

$string2 = \ForceUTF8\Encoding::fixUTF8($string);

echo "<hr>fixUTF8: {$string2}";

$string3 = \ForceUTF8\Encoding::UTF8FixWin1252Chars($string);

echo "<hr>UTF8FixWin1252Chars: {$string3}";

$string4 = \ForceUTF8\Encoding::toWin1252($string);

echo "<hr>toWin1252: {$string4}";

Truncated text when strlen is overloaded by mb_strlen.

Text is truncated when calling Encoding::fixUTF8 if mb_strlen is overloading strlen. This is because mb_strlen returns the char length instead of the byte length of the string.

This fixes the issue:

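/** Encoding::toUTF8 @line 184 */
// With mbstring.func_overload & 2, strlen() is silently replaced by mb_strlen(),
// which counts characters. Passing the '8bit' encoding makes mb_strlen() count
// bytes, which is what the following byte-by-byte loop over $text needs.
if (function_exists('mb_strlen') && ((int) ini_get('mbstring.func_overload')) & 2) {
  $max = mb_strlen($text, '8bit');
} else {
  $max = strlen($text);
}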

Translit symbols

Hi

According to the PHP manual, iconv with the //TRANSLIT flag should convert symbols (e.g. € to EUR).

But it doesn't work with your version (still € after the function call).

Is it because you are using "windows-1252" instead of the "iso-8859-1" used in the PHP manual's example?

I am using it with Encoding::fixUTF8($a, Encoding::ICONV_TRANSLIT)
with $a = array('symbol' => '€')

Any help on this matter would be appreciated.

Also, isn't it odd not to allow flags for the "toUTF8" and "toLatin1" functions?

Could a new v1.5 release get tagged?

There's a lot of fantastic work happening and I love seeing so much activity. But having a newer tag release would help me feel more comfortable with my build stability.

To latin1 didn't work

I've tried with a pure UTF-8 string:
$s = 'Những Ca Khúc Nhạc Vàng';

and with mixed UTF-8:
$s = 'Những Ca Khúc Nhạc Vàng';

Converting them to Latin-1 didn't work.
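Latin-1 covers only U+0000 to U+00FF. Vietnamese letters such as ữ (U+1EEF) and ạ (U+1EA1) lie outside that range, so no converter can map this title to Latin-1 without losing them; ú and à, on the other hand, are fine. A minimal sketch to check which characters are at risk before converting; the function name is hypothetical and it relies only on PCRE's /u mode:

```php
<?php
// Hypothetical helper: list the characters of a UTF-8 string that have no
// Latin-1 (ISO-8859-1) equivalent, i.e. code points above U+00FF.
function non_latin1_chars(string $utf8): array
{
    if (preg_match_all('/[^\x{00}-\x{FF}]/u', $utf8, $matches) === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return array_values(array_unique($matches[0]));
}

$s = "Nh\u{1EEF}ng Ca Kh\u{FA}c Nh\u{1EA1}c V\u{E0}ng"; // "Những Ca Khúc Nhạc Vàng"
echo implode(' ', non_latin1_chars($s)), "\n";        // ữ ạ
```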

How to call the class?

Kindly: how can I call the class?
This gives me an error:
use \ForceUTF8\Encoding;
Any suggestions?
Thanks

Leave alone numbers

Hi.
I'm working with this library in an existing project.
We are having an issue with numbers: they come back wrapped in quotes, and we'd like them to be left alone.

Something like this would be nice:

if (is_numeric($text)) {
    return $text;
}

So I'm wondering whether this chunk was intended to do that:

if(!is_string($text)) {
      return $text;
}

I don't want to touch something that might break something else in the existing code.

Do you remember?

Thanks in advance.
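As far as the quoted guard goes, it returns every non-string value unchanged, so integers and floats should keep their type; numeric strings such as "42" or "01000" are still strings, though, and will stay quoted (for example in JSON). A minimal sketch of that behavior with a hypothetical wrapper, using strtoupper as a stand-in for the real converter:

```php
<?php
// Hypothetical wrapper: apply a string converter to every string in a value,
// recursing into arrays, while ints, floats, booleans and null keep their type.
function convert_strings_only($value, callable $convert)
{
    if (is_array($value)) {
        foreach ($value as $key => $item) {
            $value[$key] = convert_strings_only($item, $convert);
        }
        return $value;
    }
    if (!is_string($value)) {
        return $value; // same idea as the `!is_string($text)` guard quoted above
    }
    return $convert($value);
}

$row = ['id' => 42, 'price' => 9.5, 'name' => 'cafe', 'zip' => '01000'];
$out = convert_strings_only($row, 'strtoupper'); // strtoupper stands in for a real converter
echo json_encode($out), "\n"; // {"id":42,"price":9.5,"name":"CAFE","zip":"01000"}
```

Casting numeric strings to numbers is left out on purpose: it would, for instance, turn "01000" into 1000.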

The example does not work

<?php
require_once("ForceUTF8/Encoding.php");

echo \ForceUTF8\Encoding::fixUTF8("Fédération Camerounaise de Football");
echo "<br>";
echo \ForceUTF8\Encoding::fixUTF8("FÃédÃération Camerounaise de Football");
echo "<br>";
echo \ForceUTF8\Encoding::fixUTF8("FÃÃédÃÃération Camerounaise de Football");
echo "<br>";
echo \ForceUTF8\Encoding::fixUTF8("FÃÃÃédÃÃÃération Camerounaise de Football");

output:

Fédération Camerounaise de Football
FÃédÃération Camerounaise de Football
FÃÃédÃÃération Camerounaise de Football
FÃÃÃédÃÃÃération Camerounaise de Football

convert character à to à problem

Hello

I have a problem when I try to convert a string with the character "à": it is not converted, so "Ã" is printed.
Is that normal?

I use fixUTF8() on this string: "La vidéo-surveillance se développe à Chambéry"
Thanks
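One possible cause, offered only as a guess: in UTF-8, "à" is the bytes C3 A0, and A0 is the non-breaking space in Latin-1/Windows-1252. Misread UTF-8 therefore shows "à" as "Ã" plus a non-breaking space, and if that space is later normalized to an ordinary one, the pair no longer matches the damaged form of "à", so a byte-level repair has nothing to reverse. The bytes involved:

```php
<?php
// "à" (U+00E0) is encoded in UTF-8 as the bytes C3 A0.
echo bin2hex("\u{E0}"), "\n";       // c3a0

// Read as Latin-1/Windows-1252, C3 is "Ã" and A0 is a non-breaking space (U+00A0).
$misread = "\u{C3}\u{A0}";
echo bin2hex($misread), "\n";       // c383c2a0

// If the non-breaking space is later normalized to an ordinary space,
// the pair is no longer the garbled form of "à".
$normalized = str_replace("\u{A0}", ' ', $misread);
echo bin2hex($normalized), "\n";    // c38320
```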

No effect

Everyone seems to have this working, but I get the same output before and after:

Before:
96 kr/mån

After:
96 kr/mån

If I run utf8_decode on the string, I get:
96 kr/mån

I tried to find the issue myself, but with no result.

First example in readme works but not the other 3

Examples:

echo Encoding::fixUTF8("Fédération Camerounaise de Football");
echo Encoding::fixUTF8("FÃédÃération Camerounaise de Football");
echo Encoding::fixUTF8("FÃÃédÃÃération Camerounaise de Football");
echo Encoding::fixUTF8("FÃÃÃédÃÃÃération Camerounaise de Football");

will output:

Fédération Camerounaise de Football
Fédération Camerounaise de Football
Fédération Camerounaise de Football
Fédération Camerounaise de Football

The first one works, but the three after it only remove one of the à instead of all of them.

single quote encoding

Hello, I use forceutf8 to encode parts of email bodies in PHP IMAP.
I've found an issue encoding the single quote:
it appears as '?'.
Is there any way to fix this?

Is this project still active?

Hi, this project is very useful for the community, and there are a lot of suggestions and requests, but there are no answers from the maintainers. Is this project still alive?

Packagist was not updated with the new tags

First of all, thank you for creating the tags.

However, Packagist was not updated with these tags. I realize that the package neitanod/forceutf8 does not belong to you or to any other forceutf8 developer. I have already contacted the owner of the Packagist package (https://github.com/nidelson), but there has been no answer.

I suggest you claim ownership of this package to prevent this from happening again. The Packagist email is [email protected].

Sorry if I am bothering you, but I would prefer the project to be maintained by its creator rather than creating a fork and splitting the code.

toUTF8() returns invalid UTF-8?

Hi there,
great lib. It works nearly perfectly in my project.
However, it seems that toUTF8() sometimes returns an invalid UTF-8 string.

See this example:

$utf8_string = \ForceUTF8\Encoding::toUTF8(base64_decode('PGk+PCFbQ0RBVEFbMS4gUGVyZm9ybWFuY2UgYW5kIMxm5mdi9G504WF56atlOiBHZW5yZSwgS25vd2xlZGdlLCBhbmQgUG9saXRpY3MgIDIuIEEgQ3JpdGlxdWUgb2YgWW9y+WLhIEp1ZGdtZW50OiBJbmRpdmlkdWFsIEF1dGhvcml0eSwgQ29tbXVuaXR5IENyZWF0aW9uLCBhbmQgdGhlIEVtYm9kaW1lbnQgb2YgwKv3PEJSPjMuIFdoYXQgTWF0dGVyIFdobyBEYW5jZXM6IFNlbGYtZmFzaGlvbmluZywgKG5vbilTdWJqZWN0cywgYW5kIHRoZSBOYXRpb248QlI+NC4gTm8gVmljdG9yLCBubyBWYW5xdWlzaGVkLCBubyBQYXN0OiBPbGEgUm90aW1pLCBZYWt1YnUgR293b24sIFNhbmkgQWJhY2hhLCBhbmQgICcgJ1RoZSBFbmQgb2YgTmlnZXJpYW4gSGlzdG9yeSAnICc8QlI+NS4gVmFsdWVzIGJleW9uZCBFdGhpY3M6IEZyb20gU3RlbGxhIERpYSBPeWVkZXBvIHRvIFRlc3MgT253dWVtZTxCUj42LiBDb25jbHVzaW9uczogQ2l2aWwgR292ZXJuYW5jZSBhbmQgdGhlIFBvbGl0aWNzIG9mIFlvcvli4SBUaGVhdHJlPEJSPkJpYmxpb2dyYXBoeTxCUj5JbmRleDxCUj4gIF1dPjwvaT4NCg=='));

var_export(mb_check_encoding($utf8_string, 'UTF-8'));

It should print "true" but prints "false".

Any ideas?

Thank you
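mb_check_encoding() only says yes or no. To see where the output goes wrong, here is a minimal sketch of a hypothetical helper that returns the byte offset of the first ill-formed sequence, using a byte-level regex for well-formed UTF-8 as defined in RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF):

```php
<?php
// Hypothetical helper: byte offset of the first ill-formed byte, or null if the
// whole string is valid UTF-8. The pattern runs WITHOUT /u, so it matches raw bytes.
function first_invalid_utf8_offset(string $s): ?int
{
    $wellFormed = '[\x00-\x7F]'
        . '|[\xC2-\xDF][\x80-\xBF]'
        . '|\xE0[\xA0-\xBF][\x80-\xBF]'
        . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'
        . '|\xED[\x80-\x9F][\x80-\xBF]'
        . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'
        . '|[\xF1-\xF3][\x80-\xBF]{3}'
        . '|\xF4[\x80-\x8F][\x80-\xBF]{2}';
    // Match the longest prefix made only of well-formed sequences.
    if (preg_match('/\A(?:' . $wellFormed . ')*+/', $s, $m) !== 1) {
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }
    $validPrefix = strlen($m[0]);
    return $validPrefix === strlen($s) ? null : $validPrefix;
}

$sample = "caf\xC3\xA9 ok, caf\xE9 broken";
$offset = first_invalid_utf8_offset($sample);
echo $offset, ': ', bin2hex(substr($sample, $offset, 1)), "\n"; // 13: e9
```

Running it on $utf8_string from the report should point at the first byte that needs attention.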

About Japanese chars

Hello neitanod! I appreciate your work! But now I'm experiencing an issue: I can't display Japanese characters. Can you tell me how to display them?

EDIT: Here is the code and the comparison:
fixUTF8() Example 5 not working. -> FAILED 17 tests passed. 1 tests failed.
Test::identical("fixUTF8() Example 5 not working.",
Encoding::fixUTF8("Á0ë0¿0ê0¹0a\n"),
"チルタリス\n");

Thanks in advance!
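The input in that test looks less like UTF-8 mojibake than like UTF-16LE bytes read as Latin-1: each katakana character (U+30xx) appears as one Latin-1 letter followed by "0", which is byte 0x30, the high byte of U+30xx. If that reading is right, fixUTF8 cannot recover it, and the bytes need decoding as UTF-16LE instead. A minimal sketch with two hypothetical helpers, applied to the first ten characters of the input (the trailing "a" is left aside); with mbstring available, mb_convert_encoding($bytes, 'UTF-8', 'UTF-16LE') would do the decoding step:

```php
<?php
// Hypothetical helper: turn characters in U+0000..U+00FF back into the single
// bytes they were decoded from (assumes no character above U+00FF is present).
function latin1_chars_to_bytes(string $utf8): string
{
    $bytes = '';
    foreach (preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $bytes .= strlen($ch) === 1
            ? $ch
            : chr(((ord($ch[0]) & 0x1F) << 6) | (ord($ch[1]) & 0x3F));
    }
    return $bytes;
}

// Hypothetical helper: decode UTF-16LE code units to UTF-8 (BMP only; this
// sketch does not handle surrogate pairs).
function utf16le_bmp_to_utf8(string $bytes): string
{
    $out = '';
    for ($i = 0; $i + 1 < strlen($bytes); $i += 2) {
        $cp = ord($bytes[$i]) | (ord($bytes[$i + 1]) << 8);
        if ($cp < 0x80) {
            $out .= chr($cp);
        } elseif ($cp < 0x800) {
            $out .= chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
        } else {
            $out .= chr(0xE0 | ($cp >> 12))
                . chr(0x80 | (($cp >> 6) & 0x3F))
                . chr(0x80 | ($cp & 0x3F));
        }
    }
    return $out;
}

// First ten characters of the failing test input: Á0ë0¿0ê0¹0
$garbled = "\u{C1}0\u{EB}0\u{BF}0\u{EA}0\u{B9}0";
echo utf16le_bmp_to_utf8(latin1_chars_to_bytes($garbled)), "\n"; // チルタリス
```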

gb2312 encoding not supported

Hey there, great class! But I have problems with a GB2312-encoded document.
Is it right that this isn't supported?
If so, could you implement it?

Thanks
Julian

Special chars

Hello, I'm using this class and I noticed a problem when I try to convert UTF-8 with special chars.
The words are converted correctly, but special chars like • are not.

Please help..

Hello,
I received this error:
XML Parsing Error at line 1:
PCDATA invalid Char value 4.

I'm using Notepad++ for XML files.

How can I find that error? The XML file is as big as the universe... :( I guess it is some character written in a bad encoding, but how do I find it in the whole file? :/

Thank you for your help
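"Char value 4" is the control character U+0004, which XML 1.0 does not allow anywhere in a document, not even as a character reference. A minimal sketch of a hypothetical helper that lists every such character with its line number and byte offset, so it can be located in the editor; it assumes the file has been read into a string (e.g. with file_get_contents()) and is valid UTF-8:

```php
<?php
// Hypothetical helper: report every character that XML 1.0 does not allow
// (e.g. "Char value 4" is U+0004, "Char value 31" is U+001F) with its line
// number and byte offset. Requires the input to be valid UTF-8.
function find_invalid_xml_chars(string $xml): array
{
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
    if (preg_match_all($pattern, $xml, $matches, PREG_OFFSET_CAPTURE) === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $found = [];
    foreach ($matches[0] as [$char, $offset]) {
        $found[] = [
            'line'   => substr_count(substr($xml, 0, $offset), "\n") + 1,
            'offset' => $offset,        // byte offset from the start of the file
            'bytes'  => bin2hex($char), // e.g. "04" or "1f"
        ];
    }
    return $found;
}

$xml = "<a>ok</a>\n<b>bad\x04</b>\n<c>\x1F</c>";
foreach (find_invalid_xml_chars($xml) as $hit) {
    printf("line %d, byte %d: 0x%s\n", $hit['line'], $hit['offset'], $hit['bytes']);
}
// line 2, byte 16: 0x04
// line 3, byte 25: 0x1f
```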

Fatal error: Class 'Encoding' not found

hello,
Can you help me with this error?

require_once('config/encoding.php');

function convertUtf8($str)
{
return Encoding::toUTF8($str);
}
echo convertUtf8($field);
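The error usually means the class name is not being resolved in its namespace: the library declares Encoding inside the ForceUTF8 namespace, so a bare Encoding refers to a global \Encoding class that does not exist. Assuming config/encoding.php really is the library's Encoding.php, either call \ForceUTF8\Encoding::toUTF8() or import the class with a use statement at the top level of the file (a use import inside a function body is a parse error). A minimal self-contained sketch; the ForceUTF8\Encoding class here is a stand-in so the file runs without the library, and the App namespace is hypothetical:

```php
<?php
// Stand-in for the library so this sketch runs on its own; in a real project
// the class comes from the require_once'd Encoding.php.
namespace ForceUTF8;

class Encoding
{
    public static function toUTF8($text)
    {
        return $text; // placeholder body, NOT the library's implementation
    }
}

namespace App;

// Import at file scope, not inside the function.
use ForceUTF8\Encoding;

function convertUtf8($str)
{
    return Encoding::toUTF8($str); // resolves to \ForceUTF8\Encoding
}

echo convertUtf8('hello'), "\n"; // hello
```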

License

Hi,

It seems to work just fine, but what license does your library use?

Some strings that failed to be fixed

I have some strings with broken encoding:

$testStr1 = <<<TEXT
China. In 1953, Max̥s parents decided
by ÌÕnew-ageÌÒ meditative
Peter Max (American, b.1937) ÌÕFour KennedysÌÒ, screenprint in colors, 1989, signed in white pencil
TEXT;

echo Encoding::UTF8FixWin1252Chars($testStr1), "\n\n", Encoding::fixUTF8($testStr1), "\n\n";

None of them were fixed:

China. In 1953, Max̥s parents decided
by ÌÕnew-ageÌÒ meditative
Peter Max (American, b.1937) ÌÕFour KennedysÌÒ, screenprint in colors, 1989, signed in white pencil

China. In 1953, Max?s parents decided
by ÌÕnew-ageÌÒ meditative
Peter Max (American, b.1937) ÌÕFour KennedysÌÒ, screenprint in colors, 1989, signed in white pencil

Any idea?

Some strings failed to convert

This package was very helpful for me, but there are still some rare cases where it fails to convert to proper UTF-8. I just want to post them here in case the author or anyone else is someday interested in further perfecting this package.
Here are some sample strings I got while reading RSS feeds from some Thai sources and inserting them into a MySQL database.


<p>หลังจากที่ภรรยา ฮารุ คลอดลูกคนที่สาม น้องเฮเดน คุณพ่อลู […]</p>
<p>The post <a rel="nofollow" href="http://www.tvpoolonline.com/content/357922">(คลิป) ฟังไปเสียวไป!!! เมื่อ “กาย รัชชานนท์” เล่าวินาทีทำหมันให้เห็นเป็นภาพ…ปิดอู่ลูก 3 เป็นที่เรียบร้อย</a> appeared first on <a rel="nofollow" href="http://www.tvpoolonline.com/">TV Pool</a>.</p>

and


<p>หลังจากที่ภรรยา <strong>ฮารุ</strong> คลอดลูกคนที่สาม <strong>น้องเฮเดน</strong> คุณพ่อลูกดก<strong> กาย รัชชานนท์</strong> มีแพลนว่าจะทำหมัน และแล้ววันนี้ก็มาถึงเพราะ ล่าสุด<strong> ฮารุ </strong>ได้โพสต์ภาพและคลิปวีดีโอ หลังจากที่ <strong>กาย รัชชานนท์</strong> ทำหมันเสร็วว่าอย่างไรไปฟัง</p>
<p> </p>
<blockquote class="instagram-media" data-instgrm-captioned data-instgrm-version="7" style=" background:#FFF; border:0; border-radius:3px; box-shadow:0 0 1px 0 rgba(0,0,0,0.5),0 1px 10px 0 rgba(0,0,0,0.15); margin: 1px; max-width:658px; padding:0; width:99.375%; width:-webkit-calc(100% - 2px); width:calc(100% - 2px);">
<div style="padding:8px;">
<div style=" background:#F8F8F8; line-height:0; margin-top:40px; padding:50.0% 0; text-align:center; width:100%;">
<div style=" background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACwAAAAsCAMAAAApWqozAAAABGdBTUEAALGPC/xhBQAAAAFzUkdCAK7OHOkAAAAMUExURczMzPf399fX1+bm5mzY9AMAAADiSURBVDjLvZXbEsMgCES5/P8/t9FuRVCRmU73JWlzosgSIIZURCjo/ad+EQJJB4Hv8BFt+IDpQoCx1wjOSBFhh2XssxEIYn3ulI/6MNReE07UIWJEv8UEOWDS88LY97kqyTliJKKtuYBbruAyVh5wOHiXmpi5we58Ek028czwyuQdLKPG1Bkb4NnM+VeAnfHqn1k4+GPT6uGQcvu2h2OVuIf/gWUFyy8OWEpdyZSa3aVCqpVoVvzZZ2VTnn2wU8qzVjDDetO90GSy9mVLqtgYSy231MxrY6I2gGqjrTY0L8fxCxfCBbhWrsYYAAAAAElFTkSuQmCC); display:block; height:44px; margin:0 auto -44px; position:relative; top:-22px; width:44px;"></div>
</div>
<p style=" margin:8px 0 0 0; padding:0 4px;"> <a href="https://www.instagram.com/p/BQxBd0_jmHX/" style=" color:#000; font-family:Arial,sans-serif; font-size:14px; font-style:normal; font-weight:normal; line-height:17px; text-decoration:none; word-wrap:break-word;" target="_blank">นางเล่าเรื่องการทำหมันซะเห็นภาพเลย 5555 part1 (เดี๋ยวมาต่อpart2คลิปต่อไปนะ) <img src="https://s.w.org/images/core/emoji/2.2.1/72x72/1f602.png" alt="😂" class="wp-smiley" style="height: 1em; max-height: 1em;"><img src="https://s.w.org/images/core/emoji/2.2.1/72x72/1f602.png" alt="😂" class="wp-smiley" style="height: 1em; max-height: 1em;"> #ไม่ฝากร้านฝากงานนะจ๊ะ</a></p>
<p style=" color:#c9c8cd; font-family:Arial,sans-serif; font-size:14px; line-height:17px; margin-bottom:0; margin-top:8px; overflow:hidden; padding:8px 0 7px; text-align:center; text-overflow:ellipsis; white-space:nowrap;">A post shared by HARU SUPRAKOB (YAMAGUCHI) ❣ (@haruyamaguchi) on <time style=" font-family:Arial,sans-serif; font-size:14px; line-height:17px;" datetime="2017-02-21T07:54:09+00:00">Feb 20, 2017 at 11:54pm PST</time></p>
</div>
</blockquote>
<p></p>
<p> </p>
<p> </p>
<blockquote class="instagram-media" data-instgrm-captioned data-instgrm-version="7" style=" background:#FFF; border:0; border-radius:3px; box-shadow:0 0 1px 0 rgba(0,0,0,0.5),0 1px 10px 0 rgba(0,0,0,0.15); margin: 1px; max-width:658px; padding:0; width:99.375%; width:-webkit-calc(100% - 2px); width:calc(100% - 2px);">
<div style="padding:8px;">
<div style=" background:#F8F8F8; line-height:0; margin-top:40px; padding:50.0% 0; text-align:center; width:100%;">
<div style=" background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACwAAAAsCAMAAAApWqozAAAABGdBTUEAALGPC/xhBQAAAAFzUkdCAK7OHOkAAAAMUExURczMzPf399fX1+bm5mzY9AMAAADiSURBVDjLvZXbEsMgCES5/P8/t9FuRVCRmU73JWlzosgSIIZURCjo/ad+EQJJB4Hv8BFt+IDpQoCx1wjOSBFhh2XssxEIYn3ulI/6MNReE07UIWJEv8UEOWDS88LY97kqyTliJKKtuYBbruAyVh5wOHiXmpi5we58Ek028czwyuQdLKPG1Bkb4NnM+VeAnfHqn1k4+GPT6uGQcvu2h2OVuIf/gWUFyy8OWEpdyZSa3aVCqpVoVvzZZ2VTnn2wU8qzVjDDetO90GSy9mVLqtgYSy231MxrY6I2gGqjrTY0L8fxCxfCBbhWrsYYAAAAAElFTkSuQmCC); display:block; height:44px; margin:0 auto -44px; position:relative; top:-22px; width:44px;"></div>
</div>
<p style=" margin:8px 0 0 0; padding:0 4px;"> <a href="https://www.instagram.com/p/BQxCJ9vjWt0/" style=" color:#000; font-family:Arial,sans-serif; font-size:14px; font-style:normal; font-weight:normal; line-height:17px; text-decoration:none; word-wrap:break-word;" target="_blank">มาฟังต่อ part2 <img src="https://s.w.org/images/core/emoji/2.2.1/72x72/1f4aa.png" alt="💪" class="wp-smiley" style="height: 1em; max-height: 1em;"><img src="https://s.w.org/images/core/emoji/2.2.1/72x72/1f3fb.png" alt="🏻" class="wp-smiley" style="height: 1em; max-height: 1em;">✌<img src="https://s.w.org/images/core/emoji/2.2.1/72x72/1f3fb.png" alt="🏻" class="wp-smiley" style="height: 1em; max-height: 1em;"> ขอบคุณด่าด๊านะที่เสียสละทำหมันให้ แต้งกิ้วนะ love you <img src="https://s.w.org/images/core/emoji/2.2.1/72x72/1f495.png" alt="💕" class="wp-smiley" style="height: 1em; max-height: 1em;"> #ไม่ฝากร้านฝากงานนะจ๊ะ @guyratchanont</a></p>
<p style=" color:#c9c8cd; font-family:Arial,sans-serif; font-size:14px; line-height:17px; margin-bottom:0; margin-top:8px; overflow:hidden; padding:8px 0 7px; text-align:center; text-overflow:ellipsis; white-space:nowrap;">A post shared by HARU SUPRAKOB (YAMAGUCHI) ❣ (@haruyamaguchi) on <time style=" font-family:Arial,sans-serif; font-size:14px; line-height:17px;" datetime="2017-02-21T08:00:10+00:00">Feb 21, 2017 at 12:00am PST</time></p>
</div>
</blockquote>
<p></p>
<p> </p>
<p>The post <a rel="nofollow" href="http://www.tvpoolonline.com/content/357922">(คลิป) ฟังไปเสียวไป!!! เมื่อ “กาย รัชชานนท์” เล่าวินาทีทำหมันให้เห็นเป็นภาพ…ปิดอู่ลูก 3 เป็นที่เรียบร้อย</a> appeared first on <a rel="nofollow" href="http://www.tvpoolonline.com/">TV Pool</a>.</p>
  
[2017-02-22 09:48:11] local.ERROR: description: 
<p>หลังจากที่ภรรยา ฮารุ คลอดลูกคนที่สาม น้องเฮเดน คุณพ่อลู […]</p>
<p>The post <a rel="nofollow" href="http://www.tvpoolonline.com/content/357922">(คลิป) ฟังไปเสียวไป!!! เมื่อ “กาย รัชชานนท์” เล่าวินาทีทำหมันให้เห็นเป็นภาพ…ปิดอู่ลูก 3 เป็นที่เรียบร้อย</a> appeared first on <a rel="nofollow" href="http://www.tvpoolonline.com/">TV Pool</a>.</p>

The example from the readme does not work anymore

The example from the readme

echo Encoding::fixUTF8("Fédération Camerounaise de Football");
echo Encoding::fixUTF8("FÃédÃération Camerounaise de Football");
echo Encoding::fixUTF8("FÃÃédÃÃération Camerounaise de Football");
echo Encoding::fixUTF8("FÃÃÃédÃÃÃération Camerounaise de Football");

outputs the following for me:

Fédération Camerounaise de Football
FÃédÃération Camerounaise de Football
FÃÃédÃÃération Camerounaise de Football
FÃÃÃédÃÃÃération Camerounaise de Football

Basically, it only seems to run one pass over the string; the first string looks OK, though.
Am I doing something wrong? Perhaps there should be a tiny test suite?

I'm on Debian 7, and this is the relevant PHP data:

PHP 5.4.4-14+deb7u8 (cli) (built: Feb 17 2014 09:18:47)
Copyright (c) 1997-2012 The PHP Group
Zend Engine v2.4.0, Copyright (c) 1998-2012 Zend Technologies
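The readme examples stack several layers of the same damage, so one pass is not enough. A minimal sketch of the general idea (peel one "UTF-8 read as Latin-1" layer at a time and stop as soon as peeling would produce invalid UTF-8), using hypothetical helpers. It is not forceutf8's implementation, and it uses plain Latin-1 rather than Windows-1252 to stay short:

```php
<?php
// Hypothetical helper: encode each byte as if it were a Latin-1 character
// (this ADDS one layer of damage; used here to build test input).
function latin1_to_utf8(string $s): string
{
    $out = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $b = ord($s[$i]);
        $out .= $b < 0x80 ? $s[$i] : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

// Hypothetical helper: remove one layer by mapping U+0000..U+00FF back to single
// bytes; null if the text is not valid UTF-8 or has a character above U+00FF.
function utf8_to_latin1(string $s): ?string
{
    if (preg_match('//u', $s) !== 1) {
        return null;
    }
    $out = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $out .= $s[$i];
        } elseif ($b === 0xC2 || $b === 0xC3) {
            $out .= chr((($b & 0x03) << 6) | (ord($s[++$i]) & 0x3F));
        } else {
            return null; // a character above U+00FF: not a Latin-1 layer
        }
    }
    return $out;
}

// Keep peeling while the result changes and is still valid UTF-8.
function undo_repeated_latin1_damage(string $s): string
{
    while (($peeled = utf8_to_latin1($s)) !== null
        && $peeled !== $s
        && preg_match('//u', $peeled) === 1) {
        $s = $peeled;
    }
    return $s;
}

$clean = "F\u{E9}d\u{E9}ration";
$triple = latin1_to_utf8(latin1_to_utf8(latin1_to_utf8($clean)));
echo undo_repeated_latin1_damage($triple), "\n"; // Fédération
```

Because it stops on the first invalid result, it leaves genuinely non-Latin text alone; the price is that text which legitimately contains sequences such as "Ã©" would be "repaired" too.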

PCDATA invalid Char value 31

First of all, thanks!
Your tool helps me a lot with UTF-8 string conversion.

I want to save a string in XML, formatted as UTF-8,
but I get this error when I try to load the XML after saving it:

DOMDocument::load(): PCDATA invalid Char value 31

I looked for an answer on Google and found this:

http://stackoverflow.com/questions/14463573/php-simplexml-load-file-invalid-character-error

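// Replace every run of characters that XML 1.0 does not allow with a space.
// The \x{10000}-\x{10FFFF} range keeps characters outside the BMP (e.g. emoji),
// which XML 1.0 allows but the original pattern stripped.
function utf8_for_xml($string)
{
    return preg_replace('/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u', ' ', $string);
}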

Can you tell me if there is another way to solve it?

Mathieu

Does not handle illegal UTF-8 chars

The Wikipedia article on UTF-8 (as well as other documents around the web) mention a handful of situations where a structurally-valid UTF-8 character should actually be rejected or modified. One example of this is overlong characters.
Your script doesn't do anything about such cases.
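PHP's PCRE /u check already rejects such forms (an overlong encoding of "/" is shown below), so preg_match('//u', ...) can serve as a strict validator. If ill-formed bytes should be replaced rather than rejected, here is a minimal sketch of a hypothetical helper. It emits one U+FFFD per rejected byte, which is a simplification of Unicode's recommended "maximal subpart" substitution:

```php
<?php
// Hypothetical helper: keep well-formed UTF-8 sequences (RFC 3629) and replace
// every other byte with U+FFFD. Runs without /u, so it works on raw bytes.
function replace_ill_formed_utf8(string $s): string
{
    $wellFormed = '[\x00-\x7F]'
        . '|[\xC2-\xDF][\x80-\xBF]'          // C0 and C1 leads (overlong) excluded
        . '|\xE0[\xA0-\xBF][\x80-\xBF]'      // excludes overlong 3-byte forms
        . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'
        . '|\xED[\x80-\x9F][\x80-\xBF]'      // excludes surrogates D800..DFFF
        . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'   // excludes overlong 4-byte forms
        . '|[\xF1-\xF3][\x80-\xBF]{3}'
        . '|\xF4[\x80-\x8F][\x80-\xBF]{2}';  // excludes code points above U+10FFFF
    return preg_replace_callback(
        '/(' . $wellFormed . ')|./s',
        fn(array $m): string => isset($m[1]) && $m[1] !== '' ? $m[1] : "\u{FFFD}",
        $s
    );
}

var_dump(preg_match('//u', "\xC0\xAF"));        // bool(false): overlong "/" rejected
echo replace_ill_formed_utf8("a\xC0\xAFb"), "\n"; // a��b
```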

W3C Test strings

I tried some of the strings from http://www.w3.org/2001/06/utf-8-test/UTF-8-demo.html and they failed to convert.

Specifically

Mathematics and Sciences:

  ∮ E⋅da = Q,  n → ∞, ∑ f(i) = ∏ g(i), ∀x∈ℝ: ⌈x⌉ = −⌊−x⌋, α ∧ ¬β = ¬(¬α ∨ β),

  ℕ ⊆ ℕ₀ ⊂ ℤ ⊂ ℚ ⊂ ℝ ⊂ ℂ, ⊥ < a ≠ b ≡ c ≤ d ≪ ⊤ ⇒ (A ⇔ B),

Not working

I tried it on this word, but it doesn't work at all. I also tried the example, and it doesn't work either.

Medveđa

Unused local variable $enc

In Encoding.php on line 297 my debugger says "Unused local variable $enc"

$enc = preg_replace('/[^a-zA-Z0-9\s]/', '', $encoding);

Should this be changed to:

$encoding = preg_replace('/[^a-zA-Z0-9\s]/', '', $encoding);

?

Not converting ISO-8859-2

As the title says, the toUTF8 method does not properly convert Latin special characters from ISO-8859-2.
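Bytes above 0x7F mean different letters in ISO-8859-2 than in Latin-1/Windows-1252, which toUTF8 appears to assume, so a converter has to be told the real source encoding. A minimal sketch with mbstring's mb_convert_encoding(); it assumes the mbstring extension is installed and does not use forceutf8 at all:

```php
<?php
// Assumes the mbstring extension is installed.
$iso88592 = "\xB9\xBE\xE8"; // "šžč" as raw ISO-8859-2 bytes

// Decoding with the real source encoding gives the intended letters...
$right = mb_convert_encoding($iso88592, 'UTF-8', 'ISO-8859-2');
// ...while treating the same bytes as Latin-1 gives different characters.
$wrong = mb_convert_encoding($iso88592, 'UTF-8', 'ISO-8859-1');

echo $right, "\n"; // šžč
echo $wrong, "\n"; // ¹¾è
```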
