
Comments (12)

SethMMorton commented on August 13, 2024

I'm not particularly interested in implementing this myself. Any takers?

from natsort.

thebigmunch commented on August 13, 2024

I'm somewhat interested in this. But there's a much easier solution for SemVer specifically using the semver package:

>>> a = ['1.0.0-alpha', '1.0.0-alpha.1', '1.0.0-alpha.beta', '1.0.0-beta', '1.0.0-beta.2', '1.0.0-beta.11', '1.0.0-rc.1', '1.0.0']
>>> natsorted(a, key=semver.parse_version_info)
[
    '1.0.0-alpha',
    '1.0.0-alpha.1',
    '1.0.0-alpha.beta',
    '1.0.0-beta',
    '1.0.0-beta.2',
    '1.0.0-beta.11',
    '1.0.0-rc.1',
    '1.0.0',
]

Perhaps this should just be documented instead? There are many different versioning systems, so you'd likely end up making the API and/or code ugly trying to specifically support them all (or even just the most popular) from within natsort.

It could also be supported by creating an algorithm for SemVer and others if needed. Users would still need to specify the algorithm in the call, but natsort would handle any package imports, etc.

Edit: Should have used semver.parse_version_info instead of semver.parse.
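
For anyone who would rather not add a dependency, the SemVer precedence rules can be approximated with the standard library alone. This is only a sketch, not the semver package's implementation: `semver_key` is a hypothetical helper name, and it assumes every input is a well-formed SemVer string.

```python
def semver_key(version):
    """Sort key that follows SemVer 2.0.0 precedence (build metadata is ignored)."""
    version = version.split("+", 1)[0]           # drop build metadata, if any
    core, _, prerelease = version.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    if not prerelease:
        # A release outranks every pre-release of the same MAJOR.MINOR.PATCH.
        return (major, minor, patch, 1, ())
    identifiers = tuple(
        # Numeric identifiers compare as integers and rank below alphanumeric ones.
        (0, int(ident), "") if ident.isascii() and ident.isdigit() else (1, 0, ident)
        for ident in prerelease.split(".")
    )
    return (major, minor, patch, 0, identifiers)
```

With this key, `sorted(a, key=semver_key)` orders the list `a` from the example above in the same way as the output shown there.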


SethMMorton commented on August 13, 2024

I like this idea, but to be successful I think it needs to handle input that contains versions (e.g. package names with versions), like the list below:

a = [
    "package-1.0.0.tar.gz",
    "package-1.0.0-alpha.tar.gz",
    "package-1.0.0-rc.gz",
    "package-1.0.0-alpha.1.tar.gz",
    "package-1.0.0-beta.tar.gz",
]

Can the semver package handle this? (The documentation's pretty light, so it's not immediately obvious to me if it does.)

Alternatively, users could just be advised to remove "package-" and ".tar.gz" from their input as part of the key.
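
A minimal sketch of that second option, assuming the prefix and the set of suffixes are known ahead of time. `strip_affixes` is a hypothetical helper, not something natsort or semver provides, and ".gz" is listed only because one entry in the list above ends in ".gz" rather than ".tar.gz".

```python
def strip_affixes(name, prefix="package-", suffixes=(".tar.gz", ".gz")):
    """Return what is left of `name` after removing a known prefix and suffix."""
    if name.startswith(prefix):
        name = name[len(prefix):]
    # Try longer suffixes first so ".tar.gz" wins over ".gz".
    for suffix in sorted(suffixes, key=len, reverse=True):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name
```

The result is a bare version string, which could then be handed to a version parser such as semver.parse_version_info inside the key function.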


thebigmunch commented on August 13, 2024

The semver package only handles version strings.

So, it might be possible to find semantic version strings within package and file names (if algorithm specified) to help determine the sorting key in some way. The only tricky part might be separating extensions from the end of the version string in some cases. The semver package has a regular expression that might be modified a bit to get anything preceding and following the version string as well as the version string. It really depends on how generalized you want to get. Should it be limited to only things that look like package and file names? Should it be able to support any string as long as there is a semantic version string in it?
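
To make the extension problem concrete, here is a rough sketch that splits a string into whatever precedes the version, the version itself, and whatever follows it. The pattern is a simplified stand-in written for this example, not the regular expression shipped with the semver package, and `split_around_version` is a hypothetical name.

```python
import re

# Simplified: MAJOR.MINOR.PATCH plus an optional dotted pre-release section.
VERSION_RE = re.compile(r"\d+\.\d+\.\d+(?:-[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?")

def split_around_version(text):
    """Return (prefix, version, suffix); version is None when nothing matches."""
    match = VERSION_RE.search(text)
    if match is None:
        return text, None, ""
    return text[: match.start()], match.group(), text[match.end():]

print(split_around_version("package-1.0.0.tar.gz"))
# ('package-', '1.0.0', '.tar.gz')
print(split_around_version("package-1.0.0-alpha.1.tar.gz"))
# ('package-', '1.0.0-alpha.1.tar.gz', '')
```

The second call shows the tricky part: once a pre-release section is present, ".tar.gz" is indistinguishable from further dotted pre-release identifiers.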


thebigmunch commented on August 13, 2024

Yeah, I don't think there's going to be a reliable way to separate the file extension from dotted pre-release or build sections short of whitelisting extensions.


SethMMorton commented on August 13, 2024

Yeah, that's why I had lost interest in implementing it 😄

I think this can be done using a factory function given to the user so that they can make a custom key. I'll give it some thought and respond later today with an idea of what I am thinking.


SethMMorton commented on August 13, 2024

What if natsort provided a key-generation function for semver that optionally accepted a regular expression that matches possible suffixes (like file extensions)? This way, the user defines where the semantic version ends. (Instead of a key-generation function, if this were implemented as part of versorted then that function could just take an extra parameter for the possible suffixes.)
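
A rough sketch of what such a key-generation function might look like, written against the standard library only. Nothing here exists in natsort: `semver_keygen`, its `suffix` parameter, and the simplified version pattern are all assumptions made for illustration, and any text that follows the version is ignored.

```python
import re

# Simplified SemVer pattern: MAJOR.MINOR.PATCH with an optional pre-release.
VERSION_RE = re.compile(
    r"(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?"
)

def semver_keygen(suffix=None):
    """Build a sort key; `suffix` is a regex for trailing text to discard."""
    suffix_re = re.compile(f"(?:{suffix})$") if suffix else None

    def key(text):
        stem = suffix_re.sub("", text) if suffix_re else text
        match = VERSION_RE.search(stem)
        if match is None:
            # No version found: sort on the text alone, below any version.
            return (stem, -1, -1, -1, 0, ())
        prefix = stem[: match.start()]
        major, minor, patch = (int(g) for g in match.group(1, 2, 3))
        prerelease = match.group(4)
        if prerelease is None:
            return (prefix, major, minor, patch, 1, ())  # release ranks last
        ids = tuple(
            (0, int(p), "") if p.isdigit() else (1, 0, p)
            for p in prerelease.split(".")
        )
        return (prefix, major, minor, patch, 0, ids)

    return key
```

For the package list earlier in the thread, `sorted(a, key=semver_keygen(r"\.tar\.gz|\.gz"))` places the alpha, alpha.1, beta, and rc archives before the 1.0.0 release, because the user-supplied pattern tells the key where the version ends.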

This is a really hard problem. I think that if it is implemented with known limitations, and those limitations are documented clearly, it will be a win.


thebigmunch commented on August 13, 2024

I think that this is really a bigger change. If versorted is going to be taken out of deprecation and made the canonical way of sorting version strings, package names with version strings, and file names with version strings (which it should be for what is being proposed), this is a major, breaking change. Not only would the deprecation be undone, the semantics of versorted would be changed. Also, it should then support at least the version scheme natsort currently (mostly) supports, SemVer, and CalVer from the start. This leads to some more questions:

  • Should sorting by version be taken out of natsorted in favor of using versorted?
  • Should natsorted be made to support other version schemes instead?
  • Should this workaround be implemented in versorted for the default version scheme?
  • How to/Should versorted and/or natsorted handle sorting mixed input?
    • Strings without versions and strings with versions.
    • Version strings, package names with version strings, file names with version strings.
  • How many/what variations of package/file names to support? Leave it to the user in some way?

I'm sure I've probably forgotten some of the questions/ideas I came up with last night in bed about your idea. But here are my thoughts on these:

  • I think versorted should at least be strongly encouraged over using natsorted directly for version string sorting, whatever the version scheme, if sorting by version scheme isn't limited to versorted outright.
  • I think the workaround for the default version scheme should be implemented in some way with versorted.
  • I haven't thought about mixed input enough yet to have a solid opinion. I'm leaning towards supporting the 2nd case, but not the 1st. And possibly making the 2nd case configurable, so it could be done by just the version or by prefix->version->suffix.
  • I'd have to look at a more exhaustive list of package naming for programming languages, etc to have a good idea of what is possible and necessary.

Note: Restrictions in my opinions generally include a "when a version algorithm or the versorted function is used" caveat. But, if I'm not mistaken, the currently supported version algorithm is on by default in natsorted, correct?


So, here are some different questions I have: are there people who actually want file names sorted by versioning rather than as file names are sorted by <insert OS/file manager>? Is this a problem we should be worrying about? My gut feeling says that people are looking for OS/file manager sorting for file names; at the very least, the other case would be quite rare.

I also think we're conflating many different ideas/features into one here. I think the idea of supporting version-based sorting on anything other than version strings is a separate idea from supporting sorting strings with versions based on a specific versioning scheme. Frankly, the current version scheme support isn't technically sorting by the version scheme anyway, hence the documented workaround. I think supporting sorting of version strings based on a version scheme through the use of algorithms is what should be done right now. Maybe it should still be done by taking versorted out of mothballs. I think this could even include versioned package names (which is a more likely case for versioning-based sorting) but not file names or arbitrary strings. If there's really a strong desire for sorting versioned file names in the future, it will be brought up and discussed at that time. I don't think we need to swallow the whole thing at once (and maybe not at all).


SethMMorton commented on August 13, 2024

I think you have many good points. There's a lot to sift through - apologies if I miss something you felt is important.


I think that many of the points you made can be addressed if I give some history of natsort. When I originally released natsort, the default algorithm for sorting was using signed floats instead of unsigned ints. At the time this was my major use case so I made it the default.

In retrospect, this was a terrible idea. I had many issues filed where natsort did not give results meeting users' expectations. Out of fear of breaking backwards compatibility, instead of changing the default to what most people actually want and expect, I added the number_type and signed keyword options (this was before alg was available), and users could get their expected behavior with number_type=int, signed=False, or just number_type=None.

In retrospect, this was a terrible idea. Discoverability of this was low, and it is a lot to type. Again, instead of changing the default behavior to what people want 99% of the time, I decided to make it easier to use that algorithm by providing a function called versorted, because at the time I believed that the only reason you wouldn't want to sort by signed floats was to sort strings with versions in them.

In retrospect, this was a terrible idea. Now there was a function with a name that implies that it treated version numbers specially in some manner, when in fact it was just using a run-of-the-mill algorithm that just happens to work for most version numbers.

So, in natsort version 4 I made the default use unsigned integers instead of signed floats. Finally, a good idea. The only problem was that now there was this crusty old function versorted that I couldn't remove for backwards-compatibility reasons.
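
The difference between the two defaults is easiest to see on a version-like string. The sketch below is illustrative only and is not natsort's actual implementation: the two regular expressions and the `make_key` helper are assumptions made for this example.

```python
import re

SIGNED_FLOAT = re.compile(r"([-+]?\d+(?:\.\d+)?)")  # e.g. "-1.5", "1.10"
UNSIGNED_INT = re.compile(r"(\d+)")                  # e.g. "1", "10"

def make_key(number_re, convert):
    """Build a key that splits text on numbers and converts the numeric parts."""
    def key(text):
        # re.split with one capturing group puts the numbers at odd indices.
        parts = number_re.split(text)
        return [convert(p) if i % 2 else p for i, p in enumerate(parts)]
    return key

float_key = make_key(SIGNED_FLOAT, float)
int_key = make_key(UNSIGNED_INT, int)

versions = ["1.9.0", "1.10.0", "1.2.0"]
print(sorted(versions, key=float_key))  # ['1.10.0', '1.2.0', '1.9.0']
print(sorted(versions, key=int_key))    # ['1.2.0', '1.9.0', '1.10.0']
```

Reading "1.10" as the float 1.1 is what puts 1.10.0 ahead of 1.2.0 in the first result, which is rarely what someone sorting versions expects.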

I don't really like the presence of versorted because it doesn't actually comprehend versions. It is ultra-misleading. The reason I created this issue was that if there is a versorted function, it probably should actually comprehend version numbers. Otherwise, it should be removed in the next major release.

Every other function within the natsort package can handle any type of input given to it - it just returns different results depending on which function was called. This is the reason I was not excited about making versorted only handle version strings without anything before or after the version itself - it would not behave like the rest of the functions in the natsort suite.


> Should sorting by version be taken out of natsorted in favor of using versorted?

natsorted has never actually comprehended versions. It separates out the numbers in a string then passes that result to sorted. Sorting versions cannot be taken out of natsorted because being able to sort most versions is just a natural consequence of this mechanism.
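
A stripped-down sketch of that mechanism, for illustration. It is not natsort's real key function, which handles far more cases, and `natural_key` is a name made up for this example.

```python
import re

def natural_key(text):
    """Split text into alternating string and integer chunks."""
    # The capturing group keeps the digit runs; they land at odd indices.
    parts = re.split(r"(\d+)", text)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

names = ["foo-1.10.0.zip", "foo-1.9.0.zip", "foo-1.2.0.zip"]
print(sorted(names))                   # plain string sort puts 1.10.0 first
print(sorted(names, key=natural_key))  # 1.2.0, 1.9.0, 1.10.0
```

Nothing in the key knows what a version is. Numeric MAJOR.MINOR.PATCH strings come out in the right order simply because each dotted field is compared as an integer.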

> Should this workaround be implemented in versorted for the default version scheme?

I don't think so, because that only works if what a user is sorting is only the version, and if that is the limitation then semver.parse_version_info would do everything the workaround does.

> How many/what variations of package/file names to support? Leave it to the user in some way?

I think this is getting a bit too specific. I really don't like the idea of tailoring the algorithm to assume the input data conforms to a particular "shape". Many of the problems I faced early on with this library were because I made assumptions about how the input data looked. So, rather than supporting packages/file names, the way I want to approach the problem is handling arbitrary input where the definition of the number is a version rather than a signed/unsigned float/int.

> I think the workaround for the default version scheme should be implemented in some way with versorted.

I think that an optimal solution to finding versions in an arbitrary string would not need a workaround in order to give the correct results.

> So, here are some different questions I have: are there people who actually want file names sorted by versioning rather than as file names are sorted by <insert OS/file manager>? Is this a problem we should be worrying about? My gut feeling says that people are looking for OS/file manager sorting for file names; at the very least, the other case would be quite rare.

This. I think these are the correct types of questions to be asking.

Consider that you have a folder of distributions of a package, e.g. "foo-1.0.0.zip", "foo-2.0.0.zip", etc. And you want to present them to a user to indicate the available packages they can use, starting from the latest. In this case the sorting would be on more than just the version.

Did natsorted work for me as-is? Yes. Do I think that people really need SemVer support for this? Maybe. No one has asked yet, so maybe it's not worth it.

Perhaps the whole idea of supporting SemVer natively and completely is me looking for a problem where there isn't one. Your suggestion of just using semver.parse_version_info as a key to natsorted would probably be a fine solution for most cases, and in that case no change would need to be made to natsort, just maybe an additional section in the documentation. It could probably even replace the workaround you mentioned because it handles cases the workaround does not.


thebigmunch commented on August 13, 2024

Just some quick clarifications and conclusion.

> Should sorting by version be taken out of natsorted in favor of using versorted?
>
> natsorted has never actually comprehended versions. It separates out the numbers in a string then passes that result to sorted. Sorting versions cannot be taken out of natsorted because being able to sort most versions is just a natural consequence of this mechanism.

Technically, as shown in the existence of that workaround, natsorted doesn't actually sort versions correctly by coincidence or otherwise. It only works properly when all versions are nothing but numbers (and separators).
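
A small demonstration of that limitation, using a simplified stand-in for natural sorting rather than natsort itself (`split_numbers` is a hypothetical helper written for this example).

```python
import re

def split_numbers(text):
    """Simplified natural-sort key: digit runs become ints, the rest stay strings."""
    parts = re.split(r"(\d+)", text)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

versions = ["1.0.0-beta.11", "1.0.0-alpha", "1.0.0", "1.0.0-beta.2"]
result = sorted(versions, key=split_numbers)
print(result)
# ['1.0.0', '1.0.0-alpha', '1.0.0-beta.2', '1.0.0-beta.11']
```

The numeric parts are handled well (beta.2 sorts before beta.11), but the release 1.0.0 lands first, whereas SemVer ranks a release after all of its pre-releases.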

> Should this workaround be implemented in versorted for the default version scheme?
>
> I don't think so, because that only works if what a user is sorting is only the version, and if that is the limitation then semver.parse_version_info would do everything the workaround does.
>
> snip
>
> Perhaps the whole idea of supporting SemVer natively and completely is me looking for a problem where there isn't one. Your suggestion of just using semver.parse_version_info as a key to natsorted would probably be a fine solution for most cases, and in that case no change would need to be made to natsort, just maybe an additional section in the documentation. It could probably even replace the workaround you mentioned because it handles cases the workaround does not.

The versions in the workaround example are not valid semantic versions, so it couldn't replace the workaround for non-SemVer version strings. I'm not sure that workaround works properly for all semantic versions (or how many cases it actually does solve). I really do (and always did) think this should be a documented example using semver.parse_version_info as the key. That being said, I always enjoy thinking out and discussing things like this, and I find it helpful when someone else does so with me when I'm working on API/usability ideas and issues.


SethMMorton commented on August 13, 2024

Regarding your first point, I think we are both in agreement. There is no handling within natsort at all for versions. It just happens to work for versions following MAJOR.MINOR.PATCH, which was the whole point of that convention in the first place. For the vast majority of cases this is enough, which is where the statement "being able to sort most versions is just a natural consequence of this mechanism" came from, emphasis on most.

The real issue is that this is not called out explicitly in the documentation. I will make sure to do that.

As for your second point, I hadn't given it too much thought. The real problem is (as you pointed out in an earlier comment) that there are simply too many version number conventions to be able to reliably handle all of them. The best case scenario is to show users examples of how to handle various version schemes (like the workaround or semver.parse_version_info) and then keep the API general.

To avoid confusion, in the next major release I think versorted should simply be removed from the API.


SethMMorton commented on August 13, 2024

Resolution:

  • Add more info in documentation about what version sorting will work out-of-the-box and what will not
  • Direct users to use third-party modules to handle specific versioning schemes.
  • Handle anything more complicated when it arrives.

@thebigmunch Thanks for the discussion!
