Comments (6)
Hm, GitHub is not showing me this issue.
The original library, https://github.com/lexborisov/myhtml, works only with UTF-8.
Support encodings for output
The program works in UTF-8 and returns everything in UTF-8.
Simple logic:
- If encoding != UTF-8 => the lib recodes everything to UTF-8.
- If encoding == "UTF-8" => it doesn't touch the original bytes, because the lib expects the Perl code to be in UTF-8.
But you use windows-1251 in your code. Unexpected, but there is a workaround.
In case 2, UTF-8 really means raw bytes. If you force the encoding to "UTF-8", the myhtml lib works fine with any ASCII-compatible encoding.
In this case the internal encoding recoder is not involved; only the original bytes are used.
In the future, I will add a "raw"/ENCODING_RAW encoding for this case (working with raw bytes) and document it.
But for now you can use this hack. Example:
use strict;
use HTML5::DOM;
use Encode;
my $cp1251_input_string = "<p>Если заголовок заполнен, а подзаголовка нет – для материала все остается так же, как раньше.</p><a></a>\n";
my $tree = HTML5::DOM->new({
    encoding => "UTF-8" # UTF-8 really means work with raw bytes
})->parse($cp1251_input_string);
my $cp1251_input_string2 = "<div>Вставляемый фрагмент кода</div>\n";
$tree->at('a')->innerHTML($cp1251_input_string2);
print $tree;
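Why the hack can work for an ASCII-compatible encoding such as windows-1251: the markup characters stay plain ASCII bytes, while every Cyrillic letter becomes a single byte at or above 0x80, so the text bytes cannot be mistaken for tags. Below is a minimal sketch of that byte split. It uses only the core Encode module and does not call HTML5::DOM; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "Если" written with code point escapes, so the result does not
# depend on the encoding of this source file.
my $text  = "\x{415}\x{441}\x{43B}\x{438}";
my $bytes = encode('cp1251', "<p>$text</p>");

# Split the encoded bytes into the ASCII range and the high range.
my @ascii = grep { $_ <  0x80 } unpack('C*', $bytes);
my @high  = grep { $_ >= 0x80 } unpack('C*', $bytes);

printf "%d ASCII markup bytes, %d high text bytes\n",
    scalar(@ascii), scalar(@high);
```

This only illustrates the byte layout. Whether the parser leaves the high bytes untouched is what the comment above describes, not something this sketch verifies.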
from perl-html5-dom.
Hi, @Azq2!
Thanks for the answer!
I'll try this hack.
Right now I use encode(decode()) to switch the source encoding before and after processing.
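For reference, the encode(decode()) round trip mentioned here might look roughly like the sketch below. The helper names cp1251_to_utf8 and utf8_to_cp1251 are hypothetical (they are not part of HTML5::DOM or Encode), and the parsing step is left as a placeholder.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helpers: convert byte strings between cp1251 and UTF-8.
sub cp1251_to_utf8 {
    my ($bytes) = @_;
    return encode('UTF-8', decode('cp1251', $bytes));
}

sub utf8_to_cp1251 {
    my ($bytes) = @_;
    return encode('cp1251', decode('UTF-8', $bytes));
}

# "Привет" as raw cp1251 bytes.
my $cp1251 = "\xCF\xF0\xE8\xE2\xE5\xF2";

my $utf8 = cp1251_to_utf8($cp1251);
# ... parse and modify $utf8 here ...
my $back = utf8_to_cp1251($utf8);

print $back eq $cp1251
    ? "round trip ok\n"
    : "round trip changed the bytes\n";
```

The cost of this approach is two full recoding passes over the document, which is what the raw-bytes hack avoids.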
from perl-html5-dom.
Okay. But my way is more efficient and simpler. Please write if it works. :)
from perl-html5-dom.
@Azq2 sure! :)
I'll try it in the project on the weekend. Can you keep this issue open for 2-3 days?
from perl-html5-dom.
You still alive? :)
from perl-html5-dom.
Oh my!
Sorry, my fault!
For our purposes we use this:
my $text = "кириллический текст статьи";
my $parser = HTML5::DOM->new();
my $DOM_tree = $parser->parse($text, {
    encoding => "windows-1251",
    encoding_use_bom => 0
});
my $textNodes = $DOM_tree->querySelectorAll('body > p');
my $importText = encode('utf8', decode('cp1251', 'кириллица'));
# any code...
$intextWrapper->innerHTML($importText);
$textNodes->[$index]->after($intextWrapper);
$t = $DOM_tree->body->innerHTML;
$t = encode('cp1251', decode('utf8', $t));
# a little more hell...
Thank you for your answer! Sorry again! I think the issue is resolved.
from perl-html5-dom.