Comments (12)
Yes, the TYPESAFE method was added to prevent exactly this problem. If you don't need to account for signed numbers, you can use UNSIGNED and you shouldn't need to use TYPESAFE; you won't get any performance penalty. In fact, if you move to UNSIGNED | INT, the number searching is the fastest.
Note that in natsort >= 4.0.0, the default sorting algorithm will be UNSIGNED | INT, so this problem will go away (unless a user wants to sort signed numbers specifically).
Also, in natsort >= 5.0.0, TYPESAFE is no longer needed and is a no-op because the under-the-hood algorithm was changed.
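To make the signed versus unsigned distinction concrete, here is a minimal standard-library sketch. It does not use natsort and is not how natsort is implemented; `split_numbers` is a hypothetical helper, and the two regular expressions are only assumed stand-ins for "treat a leading sign as part of the number" and "ignore signs".

```python
import re

def split_numbers(text, signed):
    # Hypothetical helper (not part of natsort): split text into
    # alternating string and integer chunks.
    pattern = r'([-+]?\d+)' if signed else r'(\d+)'
    parts = re.split(pattern, text)
    # With one capturing group, odd indices hold the matched numbers
    # and even indices hold the text between them.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

names = ['item-5', 'item-10', 'item-1']

# Unsigned: the '-' stays with the text, so the numbers are 5, 10, 1.
unsigned_order = sorted(names, key=lambda s: split_numbers(s, signed=False))

# Signed: the '-' is read as a minus sign, so the numbers are -5, -10, -1.
signed_order = sorted(names, key=lambda s: split_numbers(s, signed=True))

print(unsigned_order)  # ['item-1', 'item-5', 'item-10']
print(signed_order)    # ['item-10', 'item-5', 'item-1']
```

If the hyphens in your data are separators rather than minus signs, the unsigned interpretation is usually the one you want.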
Also, if you are concerned about speed, you should install https://pypi.python.org/pypi/fastnumbers.
BTW, is there a reason you aren't just using natsorted rather than generating the keys with natsort_keygen? I ask because natsorted takes care of the TYPESAFE problem for you, and only turns it on if needed.
Thanks for being so prompt & helpful! Without going into too much depth, I am using the keyfunc with a sortedcontainers.SortedDict, to control the implicit ordering of the SortedDict. The base SortedDict is the data model behind a Qt TableView. I always want alg=LOCALE because some users are working in French and want their Ás sorted next to their As, which native Python doesn't do. Also, when the user clicks on a table heading to sort ascending or descending on that column, I query a "Respect Case" checkbox and make a new SortedDict using a key_func with or without IGNORECASE.
Unfortunately, this is not possible because the locale/PyICU library dictates the order when you use LOCALE, so I am not sure it would be possible to force it not to group the letters in that manner.
You could try sorting twice. The first time, use natsort with LOCALE, and then the second time use Python's built-in sort with the itemgetter function from the operator module to sort on the first letter using ordinal sorting. Because sorting is guaranteed to be stable in Python, within each letter your order will be maintained, and the order of the first letters will be what you want.
>>> from natsort import natsorted, ns
>>> from operator import itemgetter
>>> a = ['Apple', 'c1', 'Banana', 'apple', 'c14', 'banana', 'C4', 'Apples', 'C30', 'Applu','c2']
>>> sorted(natsorted(a, alg=ns.LOCALE), key=itemgetter(0))
['Apple', 'Apples', 'Applu', 'Banana', 'C4', 'C30', 'apple', 'banana', 'c1', 'c2', 'c14']
I realize this may not be desirable since you will have to sort twice, but I don't believe I could support this within the natsort library.
Thanks for your as-always prompt & helpful reply.
I would like to highlight that you will run into issues if the first character is a digit, because the second sort will ruin the order that natsort created for those entries. If your data never starts with a digit, this should be a safe method.
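The digit problem can be reproduced without natsort. The sketch below uses a simplified, hypothetical `natural_key` built on the standard library as a stand-in for the first (natural) sort, then applies the same `itemgetter(0)` second pass. It is only an illustration of the pitfall, not natsort's behaviour under LOCALE.

```python
import re
from operator import itemgetter

def natural_key(text):
    # Simplified stand-in for a natural sort key (hypothetical helper):
    # digit runs become ints, everything else stays as text.
    parts = re.split(r'(\d+)', text)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

items = ['10 apples', '9 apples', 'Banana', '2 pears']

# First pass: natural order, so 2 < 9 < 10.
first_pass = sorted(items, key=natural_key)

# Second pass: ordinal sort on the first character only.
# '1' sorts before '2' and '9', so '10 apples' jumps to the front.
second_pass = sorted(first_pass, key=itemgetter(0))

print(first_pass)   # ['2 pears', '9 apples', '10 apples', 'Banana']
print(second_pass)  # ['10 apples', '2 pears', '9 apples', 'Banana']
```

The second pass compares only a single character, so any multi-digit number at the start of a string is reordered by its first digit.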
I was mistaken before when saying that I could not support this in the natsort library. It would require adding an additional option, probably UNGROUPLETTERS, which would only mean anything if LOCALE was enabled.
I'm not sure if I will get a chance to do it for a few days, so if you want, here is a diff that adds the ability to do what you want without sorting twice; you can apply it to your local copy for now. This implementation forces LOCALE to always sort the way you want, so it is less flexible than a real implementation would be. It also gracefully handles numbers at the start of a string, so my previous warning no longer applies.
diff --git a/natsort/utils.py b/natsort/utils.py
index 0eb5302..a9d6759 100644
--- a/natsort/utils.py
+++ b/natsort/utils.py
@@ -275,6 +275,8 @@ def _natsort_key(val, key, alg):
             val = val.swapcase()
         if alg & _ns['IGNORECASE']:
             val = val.lower()
+        if use_locale and val[0] == val[0].upper():
+            val = b' ' + val if isinstance(val, bytes) else ' ' + val
         return tuple(_number_extracter(val,
                                        regex,
                                        num_function,
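The idea behind the two added lines can be sketched in isolation: prefixing a space to any value whose first character equals its own uppercase form pushes those values ahead of the rest in a single pass, because a space sorts before letters and digits. In the sketch below, `str.casefold()` is only an assumed stand-in for locale-aware collation, and `collation_key` is a hypothetical helper, not natsort code.

```python
def collation_key(text):
    # str.casefold() is an assumed stand-in for a locale-aware transform
    # that groups upper and lower case letters together.
    folded = text.casefold()
    # Same test as in the diff: true for capital letters, and also for
    # digits and other characters that have no separate uppercase form.
    if text and text[0] == text[0].upper():
        return ' ' + folded
    return folded

words = ['apple', 'Banana', 'Apple', 'banana', '2nd']
ordered = sorted(words, key=collation_key)

print(ordered)  # ['2nd', 'Apple', 'Banana', 'apple', 'banana']
```

Because digits also pass the uppercase test, entries starting with a number receive the same prefix and keep their relative order, which is why no second sort is needed.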
UPDATE: I'll probably get to it tonight, actually, since I'm thinking about it.
Please check out issue #26 and the latest version on PyPI for this fix.
Hello from the future. TYPESAFE is no longer required as of version 5.0.0 because this functionality is now always on without speed penalty due to a new internal structure of the code. If you add this it will now do nothing.