Comments (11)

MartinPacker commented on August 17, 2024

That's a good question - to which I don't know the answer. It looks to me like it's actually a problem in lxml - which python-pptx relies on (and hence md2pptx relies on).

I'd love to fix this - if it's possible (even if I would rely on you to test it).

Does anyone know how to get python-pptx or lxml to accept Chinese characters?

One possibility worth testing is whether entity references can be used to insert Chinese characters. That might provide a workaround - if it worked.
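
To make that experiment concrete, here is a minimal sketch of producing numeric character references for a test string. The helper name `to_char_refs` is hypothetical, and whether md2pptx or python-pptx resolves such references rather than showing them literally is exactly what would need testing:

```python
import html


def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a numeric reference such as &#24050;."""
    # "xmlcharrefreplace" is a standard Python codec error handler.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


sample = "已下架"
encoded = to_char_refs(sample)
print(encoded)

# html.unescape reverses the transformation, so the round trip is lossless.
assert html.unescape(encoded) == sample
```

Pasting the printed string into a bullet in the Markdown input would show whether the references come out as Chinese characters on the slide.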

from md2pptx.

zhuwentao2150 commented on August 17, 2024

I'd be more than happy to help test it. I've tried generating slides containing Chinese text with python-pptx directly, with good results, and now wonder if the problem is with lxml.

MartinPacker commented on August 17, 2024

Thanks for the offer to help - by testing. If python-pptx works, it's either not an lxml problem or else it's the way md2pptx is directly using lxml.

What happens if you try encoding a string (a character or two) using entity references?

I'll note that md2pptx only uses lxml directly for fancy things - manipulating XML beyond what python-pptx does.

MartinPacker commented on August 17, 2024

Looks like it's the string addFormattedText passes to python-pptx that's the problem.

MartinPacker commented on August 17, 2024

[Image: excerpt from the python-pptx docs on adding text to a run]

The above - from the python-pptx docs for adding text to a run - might have some bearing on the problem. Particularly the "assumed to be UTF-8" bit.

MartinPacker commented on August 17, 2024

I wonder if making md2pptx UTF-16, instead of UTF-8, would help.

MartinPacker commented on August 17, 2024

Now I'm confused. I have a UTF-8 file with the following in and md2pptx has no trouble with it:

### String Test

* ディアボリックラヴァーズ or バッテリ
* 已下架
* لحضور المؤتمر الدولي العاشر
* 类, 有 优 先 选 择 的 权 利

(Note the right-to-left Arabic text on the third line.)

zhuwentao2150 commented on August 17, 2024

It's very strange indeed. I can't get it to work with the file in UTF-8 format, but when I use the GB2312 format it generates text with Chinese characters.
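
For anyone following along, a small sketch of why the two formats are not interchangeable: the same characters are stored as different bytes, and the GB2312 bytes are not valid UTF-8. This only illustrates the encodings themselves. It says nothing about how md2pptx reads its input, and the helper `can_decode` is hypothetical:

```python
text = "已下架"

utf8_bytes = text.encode("utf-8")   # 3 bytes per character here
gb_bytes = text.encode("gb2312")    # 2 bytes per character here


def can_decode(data: bytes, encoding: str) -> bool:
    """Return True if `data` is a valid byte sequence in `encoding`."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


print("UTF-8 bytes: ", utf8_bytes.hex(" "))
print("GB2312 bytes:", gb_bytes.hex(" "))
print("GB2312 bytes valid as UTF-8:", can_decode(gb_bytes, "utf-8"))
```

So a program that reads a file with the "wrong" assumed encoding either fails outright or produces different characters, which may be relevant if the platform's default encoding is not UTF-8.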

MartinPacker commented on August 17, 2024

Thanks for the feedback. I don't know where we go from here. BTW I've not heard of GB2312 before.

MartinPacker commented on August 17, 2024

I'm inclined to close this Issue. I don't normally do this without a fix (or similarly valid resolution).

The reason is I don't see a "fix" short of copying the input data to a temporary file with a different encoding. That seems plain wrong to me.
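
For what it's worth, a transcoding step need not involve a temporary file. A minimal sketch, assuming the input is available as raw bytes; `decode_with_fallback` and its default list of encodings are hypothetical and not part of md2pptx:

```python
def decode_with_fallback(data: bytes, encodings=("utf-8-sig", "gb18030")):
    """Decode `data` with the first encoding that accepts it.

    Returns a (text, encoding_used) tuple. UTF-8 is tried first because it
    rejects most non-UTF-8 input; GB18030 is a superset of GB2312.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError(f"input is not valid in any of: {', '.join(encodings)}")


# The same text arrives as different bytes depending on how the file was saved.
for raw in ("已下架".encode("utf-8"), "已下架".encode("gb2312")):
    text, used = decode_with_fallback(raw)
    print(used, text)
```

The bytes could come from something like `pathlib.Path(name).read_bytes()`, after which the rest of the processing would work on an ordinary Python string. Guessing encodings this way is a heuristic, so it could still pick the wrong one for some inputs.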

Anyhow I won't close this for a few days. And in any case it can always be reopened.

MartinPacker commented on August 17, 2024

I'm closing now as I don't see a meaningful way to progress it. It can always be re-opened.
