Comments (12)

eerimoq commented on June 7, 2024

Sounds like a good idea, but how should it be implemented? Choose encoding based on the file extension?

from cantools.

juleq commented on June 7, 2024

I would say:
Encoding defaults to None.

If encoding is None, select the default encoding based on the type argument. If that is None too, select the encoding by file name extension. If that does not help either, fall back to UTF-8.

The selections are in a dict [type, default_encoding].

The described selection process could be a local function named e.g. guess_encoding(filename, type).
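
A minimal sketch of that selection order, not the actual cantools implementation. The `DEFAULT_ENCODINGS` table and the parameter name `database_format` (standing in for the "type" argument) are hypothetical, and the two table entries are only illustrative: CP1252 for dbc is what this thread proposes, UTF-8 for kcd is an assumption.

```python
import os

# Hypothetical table of per-format defaults; the entries are illustrative.
DEFAULT_ENCODINGS = {
    'dbc': 'cp1252',
    'kcd': 'utf-8',
}


def guess_encoding(filename, database_format=None, encoding=None):
    """Pick an encoding: explicit argument, then format, then extension."""
    if encoding is not None:
        return encoding

    # File extension without the leading dot, e.g. 'dbc' for 'foo.DBC'.
    extension = os.path.splitext(filename)[1].lstrip('.').lower()

    for key in (database_format, extension):
        if key in DEFAULT_ENCODINGS:
            return DEFAULT_ENCODINGS[key]

    # Nothing matched: fall back to UTF-8.
    return 'utf-8'
```

With this table, `guess_encoding('foo.dbc')` picks CP1252 from the extension, and an explicitly passed `encoding` always wins.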

eerimoq commented on June 7, 2024

What if the user wants to pass encoding as None to open() to use the platform-dependent encoding?

https://docs.python.org/3/library/functions.html#open

Maybe add a special default encoding, like 'auto', that does what you suggested.

juleq commented on June 7, 2024

I would suggest using a special option 'platform', because I would argue that selecting the encoding based on the platform is the edgiest of cases. The file formats are associated with specific tools that use one specific encoding. If I deviate from that, I should know what I am doing and make that explicit by passing the respective arguments to cantools.

I know that this strategy is different from the one that open() uses, but open() is general purpose. That is not necessarily the most convenient, even though it is the most consistent with Python. But still, this is cantools :-).

eerimoq commented on June 7, 2024

All I know is that file encoding is much harder to get right than one can imagine. There is always some use case you didn't think about. When I'm in unknown territory I tend to implement as few restrictions as possible in the API. An additional platform argument might work, but it would be nice if it's not needed.

I totally agree that DBC-files should have the same default encoding as CANdb++, we just have to figure out how to do it in a good way =)

eerimoq commented on June 7, 2024

Btw, how do you know that CP1252 is the default encoding?

juleq commented on June 7, 2024

At first I assumed it. Then I noticed that the canmatrix project uses iso-8859-1 in one of its examples. So I verified it by creating a dbc with a € char in it. I read that file with an editor in 1252 mode and it came out fine.

juleq commented on June 7, 2024

ISO 8859-1 and CP1252 are basically the same, but M$ replaced most of the control chars in the 0x80-0x9F range with printables like €.
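
The difference can be checked with Python's built-in codecs. This is only an illustration of the two code pages, not something taken from cantools:

```python
# The euro sign (U+20AC) exists in CP1252, where it is the single byte 0x80.
euro_cp1252 = '\u20ac'.encode('cp1252')

# ISO 8859-1 maps every byte to a code point, so decoding never fails;
# byte 0x80 simply becomes the invisible control character U+0080.
as_latin1 = euro_cp1252.decode('iso-8859-1')

# Going the other way, ISO 8859-1 has no byte for the euro sign at all.
try:
    '\u20ac'.encode('iso-8859-1')
    euro_in_latin1 = True
except UnicodeEncodeError:
    euro_in_latin1 = False
```

So a dbc containing € that is read as ISO 8859-1 loads without an error, but the character is silently turned into a control char.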

eerimoq commented on June 7, 2024

Let's implement it as you first suggested. If someone wants to use the platform encoding, they can always use load() instead of load_file().

eerimoq commented on June 7, 2024

I implemented the suggested behavior on master, not yet released. Please give it a try. Consult the documentation for details.

juleq commented on June 7, 2024

I have updated to master and removed all the encoding arguments to load_file(). Works like a charm. It even fixes an issue I had in my early days with cantools: some clever customer worked around regular quotes not being allowed in the comment field of CANdb++ by using fancy quotes, which also happen to be in the char range that CP1252 adds to the ISO charset. Since I did not get the encoding right back then, I got broken dbc files when saving with cantools (CANdb++ would refuse to open them).

By the way, the last potential hurdle for this use case is that a cantools user needs to pass the correct encoding to write() when saving the db string. A wrapper save_file that uses the appropriate encoding could work around that. Otherwise I would expect to find an increasing number of dbc files written with the wrong encoding in the wild.
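
A rough sketch of what such a wrapper might look like. The name `save_file` comes from the suggestion above, but its signature, the CP1252 default, and the `newline=''` choice are assumptions made for illustration; it takes an already generated database string rather than a database object.

```python
import os
import tempfile


def save_file(filename, text, encoding='cp1252'):
    """Write a database string to disk with an explicit encoding."""
    # newline='' writes the line endings in text unchanged.
    with open(filename, 'w', encoding=encoding, newline='') as fout:
        fout.write(text)


# Usage: curly quotes (U+201C, U+201D) are in the range CP1252 adds.
text = 'some \u201cfancy\u201d quotes\n'
path = os.path.join(tempfile.mkdtemp(), 'example.dbc')
save_file(path, text)

# Read the raw bytes back to see what actually landed on disk.
with open(path, 'rb') as fin:
    raw = fin.read()
```

Here the two curly quotes end up as the single bytes 0x93 and 0x94, which is what a tool expecting CP1252 would read back correctly.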

I also verified that e.g. a degC survives dbc to kcd translation (the latter being written in UTF-8).

The sym default encoding also seems fine. I have checked one of the Peak tools, and it uses UTF-8 with BOM.
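
For files that do start with a BOM, Python's 'utf-8-sig' codec strips it on read, while plain 'utf-8' keeps it as a leading U+FEFF. Which codec cantools uses for sym is not stated here; the snippet only shows the difference between the two:

```python
import codecs

# Three payload bytes preceded by the UTF-8 byte order mark.
raw = codecs.BOM_UTF8 + b'abc'

# Plain UTF-8 keeps the BOM as the first character of the string.
with_bom = raw.decode('utf-8')

# 'utf-8-sig' removes a leading BOM if there is one.
without_bom = raw.decode('utf-8-sig')
```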

Great. Thanks.

eerimoq commented on June 7, 2024

Great that it works!

Yeah, feel free to add a dump (or write) function.
