
SethMMorton commented on August 13, 2024

Yes, the TYPESAFE option was added to prevent exactly this problem. If you don't need to account for signed numbers, you can use UNSIGNED instead, and then you shouldn't need TYPESAFE at all. You won't pay any performance penalty; in fact, UNSIGNED | INT gives the fastest number searching.
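To see why signed-number parsing can need TYPESAFE while UNSIGNED does not, here is a simplified natural-sort key built on the standard library `re` module. It is a hypothetical illustration of the failure mode, not natsort's actual implementation; `SIGNED`, `UNSIGNED` and `naive_key` are made-up names. When the empty strings between matches are dropped, a signed pattern can put two numbers side by side (`'1-2'` becomes `('', 1, -2)`), and Python 3 then ends up comparing an `int` with a `str`.

```python
import re

# Hypothetical, simplified keys for illustration only (not natsort's code).
SIGNED = re.compile(r'([-+]?\d+)')   # '-' or '+' may belong to the number
UNSIGNED = re.compile(r'(\d+)')      # '-' always stays in the text

def naive_key(s, regex):
    """Split s around numbers and turn the numeric pieces into ints.
    A leading '' is kept (so every key starts with a str), but any other
    empty strings produced by re.split are dropped."""
    head, *rest = regex.split(s)
    return (head,) + tuple(int(p) if regex.fullmatch(p) else p
                           for p in rest if p)

print(naive_key('1-2', SIGNED))    # ('', 1, -2): two numbers in a row
print(naive_key('1a', SIGNED))     # ('', 1, 'a')
try:
    sorted(['1-2', '1a'], key=lambda s: naive_key(s, SIGNED))
except TypeError as exc:           # -2 gets compared with 'a'
    print('signed key failed:', exc)

# With UNSIGNED the '-' is kept as text, so numbers are never adjacent.
print(sorted(['a1', '1a', '1-2'], key=lambda s: naive_key(s, UNSIGNED)))
# ['1-2', '1a', 'a1']
```

With an unsigned pattern, two digit runs are always separated by at least one non-digit character, so text and numbers keep alternating even without any extra bookkeeping.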

Note that in natsort >= 4.0.0, the default sorting algorithm will be UNSIGNED | INT, so this problem will go away (unless a user wants to sort signed numbers specifically).

Also, in natsort >= 5.0.0, TYPESAFE is no longer needed and is a no-op because the under-the-hood algorithm was changed.
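One way a key can be type-safe even with signed parsing is sketched below with the standard library. This is an assumption about the general approach, not a copy of natsort 5.0's internals, and `alternating_key` is a hypothetical name. The trick is to keep every piece `re.split` returns, empty strings included: with a single capturing group the result always alternates text, number, text, and so on, so even positions are always `str`, odd positions are always `int`, and two such tuples never compare a `str` with an `int`.

```python
import re

SIGNED = re.compile(r'([-+]?\d+)')  # hypothetical signed-number pattern

def alternating_key(s):
    """Keep all pieces from re.split, empty strings included. With one
    capturing group the result alternates text, number, text, ..., so
    index i holds a str when i is even and an int when i is odd."""
    return tuple(int(p) if i % 2 else p
                 for i, p in enumerate(SIGNED.split(s)))

print(alternating_key('1-2'))   # ('', 1, '', -2, '')
print(sorted(['a10', '1a', 'a2', '1-2'], key=alternating_key))
# ['1-2', '1a', 'a2', 'a10']
```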

from natsort.

SethMMorton commented on August 13, 2024

Also, if you are concerned about speed, you should install https://pypi.python.org/pypi/fastnumbers.

SethMMorton commented on August 13, 2024

BTW, is there a reason you aren't just using natsorted rather than generating the keys with natsort_keygen? I ask because natsorted takes care of the TYPESAFE problem for you, and only turns it on if needed.

tallforasmurf commented on August 13, 2024

Thanks for being so prompt & helpful! Without going into too much depth: I am using the key function with a sortedcontainers.SortedDict to control the SortedDict's implicit ordering. That SortedDict is the data model behind a Qt TableView. I always want alg=LOCALE, because some users work in French and want their Ás sorted next to their As, which native Python sorting doesn't do. Also, when the user clicks a table heading to sort that column ascending or descending, I query a "Respect Case" checkbox and make a new SortedDict using a key_func with or without IGNORECASE.
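The rebuild-on-toggle pattern described here can be sketched with the standard library alone. `make_key` is a hypothetical stand-in for calling `natsort_keygen` with or without `ns.IGNORECASE`; it does natural (numeric) ordering and optional case folding only, and applies no locale rules, so it will not place Á next to A the way LOCALE does.

```python
import re

_NUM = re.compile(r'(\d+)')

def make_key(respect_case):
    """Hypothetical key factory: natural ordering (digit runs compared as
    ints), case-folded unless respect_case is True. No locale handling."""
    def key(s):
        if not respect_case:
            s = s.casefold()
        # Even indices are text, odd indices are digit runs (see re.split).
        return tuple(int(p) if i % 2 else p
                     for i, p in enumerate(_NUM.split(s)))
    return key

items = ['b2', 'B10', 'a1', 'A3']
print(sorted(items, key=make_key(respect_case=False)))  # ['a1', 'A3', 'b2', 'B10']
print(sorted(items, key=make_key(respect_case=True)))   # ['A3', 'B10', 'a1', 'b2']
```

The returned function can be passed anywhere a key callable is accepted; sortedcontainers.SortedDict, for instance, takes its key function as the first positional argument, so building a new container with the other key is all the checkbox handler needs to do.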

tallforasmurf commented on August 13, 2024

SethMMorton commented on August 13, 2024

Unfortunately, I don't think this is possible: when you use LOCALE, it is the locale/PyICU library that dictates the order, so I am not sure natsort could force it not to group the letters that way.

SethMMorton commented on August 13, 2024

You could try sorting twice. First, use natsorted with ns.LOCALE; then use Python's built-in sorted() with operator.itemgetter(0) as the key, which sorts on the first character by ordinal (code point) value. Because Python's sort is guaranteed to be stable, the order within each first letter will be preserved, and the first letters themselves will be in the order you want.

>>> from natsort import natsorted, ns
>>> from operator import itemgetter
>>> a = ['Apple',  'c1', 'Banana', 'apple',  'c14', 'banana', 'C4', 'Apples', 'C30', 'Applu','c2']
>>> sorted(natsorted(a, alg=ns.LOCALE), key=itemgetter(0))
['Apple', 'Apples', 'Applu', 'Banana', 'C4', 'C30', 'apple', 'banana', 'c1', 'c2', 'c14']

I realize this may not be desirable since you will have to sort twice, but I don't believe I could support this within the natsort library.
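The two-pass idea relies only on sorted() being stable, so it can be checked without natsort or a configured locale. In this sketch the hypothetical `natural_ci_key` (case-insensitive natural ordering) stands in for the `natsorted(..., alg=ns.LOCALE)` pass; it ignores accents and other locale rules. On this particular input it reproduces the output shown above.

```python
import re
from operator import itemgetter

_NUM = re.compile(r'(\d+)')

def natural_ci_key(s):
    """Hypothetical stand-in for the LOCALE pass: case-insensitive natural
    key (digit runs compared as ints). No real locale rules applied."""
    parts = _NUM.split(s.casefold())
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))

a = ['Apple', 'c1', 'Banana', 'apple', 'c14', 'banana', 'C4', 'Apples', 'C30', 'Applu', 'c2']
first_pass = sorted(a, key=natural_ci_key)
# Second pass: order only by the first character's code point. Because
# sorted() is stable, items sharing a first character keep first-pass order.
result = sorted(first_pass, key=itemgetter(0))
print(result)
# ['Apple', 'Apples', 'Applu', 'Banana', 'C4', 'C30', 'apple', 'banana', 'c1', 'c2', 'c14']
```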

tallforasmurf commented on August 13, 2024

Thanks for your as-always prompt & helpful reply.

On Mon, Apr 6, 2015 at 2:57 PM, Seth Morton wrote:

You could try sorting twice. [...]

SethMMorton commented on August 13, 2024

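I would like to highlight that this will break if a string's first character is a digit: the second sort compares only that first character, so a string starting with '9' would land after one starting with '1', ruining the order natsort created for those entries. If your strings can never start with a digit, this should be a safe method.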
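A small stand-alone demonstration of this caveat, using a hypothetical `natural_key` in place of natsort: the first pass correctly puts '9 apples' before '10 apples', but the second pass compares only the first characters '9' and '1', and '1' has the smaller code point.

```python
import re
from operator import itemgetter

_NUM = re.compile(r'(\d+)')

def natural_key(s):
    """Hypothetical natural key: digit runs compared as ints."""
    return tuple(int(p) if i % 2 else p for i, p in enumerate(_NUM.split(s)))

items = ['10 apples', '9 apples']
first = sorted(items, key=natural_key)    # ['9 apples', '10 apples']
second = sorted(first, key=itemgetter(0)) # ['10 apples', '9 apples']: order lost
print(first, second)
```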

SethMMorton commented on August 13, 2024

I was mistaken before when I said I could not support this in the natsort library. It would require adding an additional option, probably UNGROUPLETTERS, which would only have an effect when LOCALE is enabled.

I may not get a chance to do it for a few days, so in the meantime, here is a diff that adds this ability without sorting twice; you can apply it to your local copy for now. This implementation always forces LOCALE to sort this way, so it is less flexible than a real implementation will be. It also handles strings that start with numbers gracefully, so my earlier warning about that no longer applies.

diff --git a/natsort/utils.py b/natsort/utils.py
index 0eb5302..a9d6759 100644
--- a/natsort/utils.py
+++ b/natsort/utils.py
@@ -275,6 +275,8 @@ def _natsort_key(val, key, alg):
                 val = val.swapcase()
             if alg & _ns['IGNORECASE']:
                 val = val.lower()
+            if use_locale and val[:1] == val[:1].upper():
+                val = b' ' + val if isinstance(val, bytes) else ' ' + val
             return tuple(_number_extracter(val,
                                            regex,
                                            num_function,

UPDATE: I'll probably get to it tonight, actually, since I'm thinking about it.

SethMMorton commented on August 13, 2024

Please check out issue #26 and the latest version on PyPI for this fix.

SethMMorton commented on August 13, 2024

Hello from the future. TYPESAFE is no longer required as of version 5.0.0, because this behavior is now always on, with no speed penalty, thanks to a new internal structure of the code. Adding it now does nothing.
