Comments (7)
Note that if I use alg = (natsort.ns.LOCALE | natsort.ns.GROUPLETTERS), uppercase groups with lowercase, but the accented versions still sort last; that is, 'a' is far away from 'å'.
from natsort.
Do you have PyICU installed? I have found that Python's built-in locale library (which does the work of understanding locale-dependent sorting) does not work properly on some systems (specifically on Mac, which is what you are on). If you have PyICU installed, natsort will use that under the hood, and it gives better results. Can you try that?
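For anyone unsure whether PyICU is already present, here is a minimal sketch of a check. It assumes Python 3 and relies only on the fact that PyICU installs a module named `icu`; the helper name `pyicu_available` is made up for illustration and is not part of natsort.

```python
import importlib.util


def pyicu_available():
    """Return True if the `icu` module (installed by the PyICU package) can be found."""
    # find_spec returns None for a top-level module that is not installed.
    return importlib.util.find_spec("icu") is not None


print("PyICU importable:", pyicu_available())
```

If this prints False, the locale-aware sort falls back to whatever the system's locale library provides.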
I saw the note about PyICU in the docs, and that it is specifically recommended for OS X. Before I install that rather large package: (a) what sequence would you expect the above code to print if everything is working as you expect (e.g. on your own test system)? And (b) would you expect changing the locale from en_US to fr_FR or de_DE to make a difference?
a. I would expect the sequence that Qt printed to be the correct sequence.
b. In my tests it makes no difference which locale was used.
I can confirm that using Mac OS X's locale library (Python uses the system's C locale library), I get the (incorrect) results that you see. Below is the test file I used.
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals
import locale
from natsort import natsort_keygen, ns
words = ['apple', 'åpple', 'Apple', 'Äpple', 'Epple', 'Èpple', 'épple', 'epple']
locale.setlocale(locale.LC_ALL, str('de_DE.UTF-8'))
key_func_L = natsort_keygen(alg=ns.LOCALE)
print(' '.join(sorted(words, key=key_func_L)))
When I disable PyICU, I get:
Apple Epple apple epple Äpple Èpple åpple épple
When I turn on PyICU, I get:
apple Apple åpple Äpple epple Epple épple Èpple
This is identical to what Qt is reporting.
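To see whether a given system shows the same symptom without involving natsort at all, something like the following can be tried. It is only a sketch under a few assumptions: it sorts with the standard library's `locale.strxfrm` (which may not be exactly the code path natsort takes), it assumes Python 3, and the helper names `locale_sorted` and `accents_sort_last` are invented for this example. The two hard-coded orderings are the outputs quoted above.

```python
import locale

WORDS = ['apple', 'åpple', 'Apple', 'Äpple', 'Epple', 'Èpple', 'épple', 'epple']


def locale_sorted(words, locale_name='de_DE.UTF-8'):
    """Sort words with the C library's collation; return None if the locale is missing."""
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore the old setting


def accents_sort_last(ordered):
    """True if every accented word sorts after every unaccented word."""
    def is_plain(word):
        return all(ord(ch) < 128 for ch in word)
    plain = [i for i, w in enumerate(ordered) if is_plain(w)]
    accented = [i for i, w in enumerate(ordered) if not is_plain(w)]
    return max(plain) < min(accented)


# The two orderings reported in this thread.
without_icu = 'Apple Epple apple epple Äpple Èpple åpple épple'.split()
with_icu = 'apple Apple åpple Äpple epple Epple épple Èpple'.split()
print(accents_sort_last(without_icu))  # True: the symptom
print(accents_sort_last(with_icu))     # False: accented words are interleaved

result = locale_sorted(WORDS)
if result is None:
    print('de_DE.UTF-8 is not available on this system')
else:
    print(' '.join(result), '| accents last:', accents_sort_last(result))
```

If the last line reports that accents sort last, the system's locale library is probably showing the behavior described here.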
Unfortunately, this is not something I can fix... it is a bug in the BSD locale implementation. There is a recent Python bug report on this... check it out: http://bugs.python.org/issue23195 (also check this out: http://stackoverflow.com/questions/3412933/python-not-sorting-unicode-properly-strcoll-doesnt-help). I'll definitely keep an eye on the bug report, but notice one of the solutions suggested is to install PyICU. Incidentally, it seems like the only affected locales are en_US, fr_FR and de_DE, which are the three you tried.
I'll make sure to update the docs in the next release to indicate that PyICU should only be needed on Mac OS X and BSD.
BTW, if you use HomeBrew (and I recommend it!), you can easily install ICU and PyICU with the following commands:
brew install icu4c
CFLAGS=-I/usr/local/opt/icu4c/include LDFLAGS=-L/usr/local/opt/icu4c/lib pip install pyicu
HomeBrew does not link icu4c to the system to avoid conflicts, so you need to tell python where to find it when installing PyICU.
Yes, good. I had to add exports; pip didn't pick up the flags otherwise. Putting this here for reference for anybody else:
brew install icu4c
CFLAGS=-I/usr/local/opt/icu4c/include
export CFLAGS
LDFLAGS=-L/usr/local/opt/icu4c/lib
export LDFLAGS
pip install pyicu
After which natsort did behave as you say.
Thank you for your prompt & detailed help.