
Comments (7)

tfeldmann commented on July 20, 2024

Thank you for the minimal reproducer! I found the issue - the strings are both unicode but are composed differently:

>>> s1 = 'Erträgnisaufstellung'  # copied from config.yaml
>>> s2 = 'Erträgnisaufstellung'  # copied from filename
>>> s1.encode('utf-8')
b'Ertr\xc3\xa4gnisaufstellung'
>>> s2.encode('utf-8')
b'Ertra\xcc\x88gnisaufstellung'

To be honest, this sent me down a rabbit hole. It seems that the HFS+ filesystem stores filenames in NFD (decomposed) form, whereas in your config you wrote the NFC (precomposed) form.
The form on NFS filesystems is not specified, and filenames on Samba shares and on Linux seem to be in NFC. APFS does not enforce a normalization as far as I know.

I guess using the NFKD form internally for all comparisons would behave as expected for most use cases. This means rolling my own Unicode-normalized glob implementation and normalizing the config before parsing. This is now on my to-do list but might take a while because it needs to be tested on different platforms. I'm really wondering why Python doesn't simplify these things for you :/
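
As a rough sketch of that idea (not the implementation that ended up in organize), both the filename and the pattern can be normalized to the same form before matching. The helper names `normalize` and `normalized_match` are made up for this example, and the choice of NFKD as the default simply follows the paragraph above; any single form works as long as both sides use it.

```python
import fnmatch
import unicodedata


def normalize(text: str, form: str = "NFKD") -> str:
    """Return text in the given Unicode normalization form."""
    return unicodedata.normalize(form, text)


def normalized_match(filename: str, pattern: str, form: str = "NFKD") -> bool:
    """Case-sensitive glob match after normalizing both sides to the same form."""
    return fnmatch.fnmatchcase(normalize(filename, form), normalize(pattern, form))


nfc = "Ertr\u00e4gnisaufstellung"   # precomposed ä, as typed in config.yaml
nfd = "Ertra\u0308gnisaufstellung"  # a + combining diaeresis, as in the filename

print(nfc == nfd)                                 # False
print(normalize(nfc) == normalize(nfd))           # True
print(normalized_match(nfd + ".pdf", nfc + "*"))  # True
```

Note that NFKD also folds compatibility characters (for example ligatures), which may or may not be wanted when comparing filenames.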

from organize.

tfeldmann commented on July 20, 2024

Hi, thank you for the detailed report. Unfortunately I'm not able to reproduce this - it works for me on both Windows 10 and macOS :/

  • Can you check whether your config.yaml is UTF-8 encoded? Or, even better, send the original file via email (the address is in my profile).
  • Can you try retyping the filename by hand and check again? Unicode has some identical-looking, confusable characters; maybe your ä is really an 𝚊̈ (https://unicode.org/cldr/utility/confusables.jsp?a=ä&r=None)
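
One way to check for look-alike characters without relying on the eye is to print the code points of the string. This is only a small sketch; the helper name `describe` is made up for the example.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """List each character as 'U+XXXX NAME' so look-alikes become visible."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


print(describe("\u00e4"))
# ['U+00E4 LATIN SMALL LETTER A WITH DIAERESIS']
print(describe("a\u0308"))
# ['U+0061 LATIN SMALL LETTER A', 'U+0308 COMBINING DIAERESIS']
```

Running `describe` on the name copied from config.yaml and on the name returned by `os.listdir` would show whether the two strings really contain the same characters.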


kpprt commented on July 20, 2024

I think the encoding of the config file is correct and I tried both versions of the confusables again, but with no luck. I will send you an email with an example file and config.


tfeldmann commented on July 20, 2024

In the meantime you can copy your filename into your config file and everything should work 👌


kpprt commented on July 20, 2024

Hi Thomas, sorry for the ultra late reply!

This is indeed a rabbit hole! Never heard of the confusables before, but it is good to know. I remember that I tried to copy the filename, but that did not work either.

Now that I have tested it again, it works, though. I guess this has something to do with these confusables. The original 'ä' in the PDF from my bank was probably a different one than the 'ä' I type on my Mac, and after fiddling around with renaming and adjusting the config a couple of times, the original 'ä' probably disappeared.

Thanks for clearing that up and thanks for providing organize!


tfeldmann commented on July 20, 2024

Hey, thank you so much for the kind words and for the donation! It's very much appreciated!


tfeldmann commented on July 20, 2024

This is now fixed and will be available in the next version 👍
