Comments (3)
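U+00E4 is the precomposed (NFC) code point for ä. U+0061 + U+0308 is the decomposed (NFD) sequence for the same character. Both are valid Unicode and either one can be encoded as UTF-8, so this is a normalization difference rather than a UTF-16 vs. UTF-8 one.
Basically your filesystem is using the precomposed form where the decomposed one is expected, so you should figure out why and fix that.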
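A quick way to compare the two representations of ä mentioned above is a minimal Python sketch using only the standard library. The names `precomposed`, `decomposed`, and `describe` are purely illustrative:

```python
import unicodedata

precomposed = "\u00e4"   # ä as a single code point
decomposed = "a\u0308"   # a followed by COMBINING DIAERESIS


def describe(s):
    """Return the code points and the UTF-8 / UTF-16 bytes (hex) of a string."""
    return {
        "code_points": [f"U+{ord(c):04X}" for c in s],
        "utf8": s.encode("utf-8").hex(),
        "utf16be": s.encode("utf-16-be").hex(),
    }


print(describe(precomposed))  # utf8: c3a4,   utf16be: 00e4
print(describe(decomposed))   # utf8: 61cc88, utf16be: 00610308

# The two strings render the same but do not compare equal ...
print(precomposed == decomposed)  # False
# ... until both are normalized to the same form.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```

If a file name and the stored path differ only in this way, a byte-for-byte comparison fails even though both look identical on screen.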
from paperless-ngx.
I must openly admit: my encoding knowledge is very limited. But maybe you can make sense out of my thoughts / wild guesses:
- To my understanding the code points for 8859-1 and UTF-16 for äöüÄÖÜ are the same. So the encoding could come from there.
- I sometimes just copy & paste text from the PDF to the file name - maybe from there
- The newest affected file is from around 2020-03-20.
- I checked `locale` on both server and container, and both return UTF-8
- Checking with a new file I can confirm that both database and file system use the UTF-8 encoding
So maybe it is some historical error.
The correct way to resolve it, if I understand it right, is to rename all files to the UTF-8 convention and update the database to reflect that?
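A rough sketch of what the rename step could look like in Python, under some assumptions: `plan_renames` and `apply_renames` are hypothetical helper names, not part of paperless-ngx, and the target normalization form (`"NFC"` or `"NFD"`) has to be chosen by the caller. It only touches the file system; the matching database update is not covered here.

```python
import os
import unicodedata


def plan_renames(root, form):
    """Return (old_path, new_path) pairs for names not already in `form`."""
    renames = []
    # Walk bottom-up so entries inside a directory come before the directory.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            normalized = unicodedata.normalize(form, name)
            if normalized != name:
                renames.append(
                    (os.path.join(dirpath, name),
                     os.path.join(dirpath, normalized))
                )
    return renames


def apply_renames(renames, dry_run=True):
    """Print the planned renames; perform them only when dry_run is False."""
    for old, new in renames:
        if os.path.exists(new):
            # Never overwrite an existing file with the target name.
            print(f"skip (target exists): {old!r}")
            continue
        print(f"{old!r} -> {new!r}")
        if not dry_run:
            os.rename(old, new)
```

Running `apply_renames(plan_renames("/path/to/media", "NFC"))` first as a dry run (the path is a placeholder) lists what would change without touching anything. Note that some file systems treat both forms as the same name, in which case the `exists` check will skip those entries.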
from paperless-ngx.
That's what I would do. Strings in Python are Unicode, and we're not doing anything odd with encoding. Values copied from a PDF might not be UTF-8; I guess an OCR program wouldn't know what encoding was used, though I would assume everything at this point is UTF-8.
from paperless-ngx.