morwenn / cpp-sort Goto Github PK
View Code? Open in Web Editor NEWSorting algorithms & related tools for C++14
License: MIT License
Sorting algorithms & related tools for C++14
License: MIT License
The Ranges TS introduces the notion of sentinels. A sentinel is an end iterator with a different type than the actual iterators; it shall be comparable to the other iterators and may be used to help the compiler generate better code.
We probably want to have such sentinel iterators at some point in cpp-sort. However, it will probably require lots of tricky SFINAE and modifying most of the available algorithms to get things right.
It is worth noting that the Ranges TS makes sort
and stable_sort
return an iterator to the last element, which makes sense in the context of sentinels: when a couting/size sentinel is passed, the last iterator isn't known beforehand, so the algorithm returns it since it might be useful. This adds a new question: how do we handle adapters such as counting_adapter
that already override the return type? Do we still override that return type, or do we return a tuple? The latter solution allows not to throw away information, so it might be preferred.
While specializations of fixed-size sorters are actual sorters, fixed-size sorters are not sorters and sorter_traits
cannot always provide enough information about an unspecialized fixed-sized sorter. For example, it could be interesting to know for which sizes a fixed-size sorter is specialized.
We could add a specializable fixed_sorter_traits
to the library. It would at least contain an integer sequence corresponding to the values of N
for which it has specializations, which could help small_array_adapter
when no integer sequence is actually provided. How would we tell that a fixed-size sorter like low_moves_sorter
works with any size (a SFINAE check could use the size information only when available)? In which header would the feature live (sorter_traits.h
)?
Since small_array_adapter
makes a sorter out of a fixed-size sorter, it would be great if it could provide information about the fixed-size sorter as a whole to create its own sorter traits. The fixed-size sorter trait could contain the stability (stable if every specialization is stable) and the iterator category (most restrictive one among the specializations).
cpp-sort currently provides compile-time information about the stability of a sorting algorithm via sorter_traits
even though this information is currently unused. However, stability may be important and we could provide tools to work with it. Here are some ideas for future designs:
stable_adapter
template to make any algorithm stable. It would do so by comparing (value, index)
pairs instead of simply comparing the values.stable_adapter
would probably call the original sorting algorithm directly if sorter_traits<Sorter>::is_stable == true
.stable_adapter
for selection_sorter
to achieve that, but is it a good idea?std_stable_sorter
by stable_adapter<std_sorter>
?default_sorter
should be unstable but there should also be a stable_adapter<default_sorter>
specialization made only of sorting algorithms.stable_adapter
generates an index-based stable sorting algorithm for forward iterators?As always, lots of question and the proposed design could probably be improved.
N4475 and a few related papers discuss the idea of generating the operators ==
, !=
, <
, >
, <=
and >=
be defaults for classes, with "usual" semantics (too long to explain here) and under some specific conditions.
As every paper discussing comparison, it might have an influence on sorting operations, but how? It seems that a generated operator<
would perform a lexicographic comparison over every member of the class to compare, in declaration order. If every member of a class is taken into account, then a basic sort over the structure is stable under some conditions:
Provided the standard library adds type traits such as is_trivially_less_than_comparable
to help identify such classes, it may be possible to make stable sorts use equivalent unstable sorts when they exist since the latter should be more performant. It relates to issue #21 but more work has to be done to identify when such a trick is usable and whether it is worth it.
The current buffer providers fixed_buffer
and dynamic_buffer
are fine for simple tasks, but they can't handle types that don't have a default constructor, and I guess they can't handle proxy iterators either. We need a new dynamic buffer type that allows that. It probably means that a chunk a raw memory would be allocated, and every element should be associated to a boolean to know whether a type has already been constructed or not. Also, it should properly handle cleaning of resources when destructed.
Such a buffer could also be optimized thanks to traits such as std::is_trivially_destructible
and friends once the basic features are there. It would allow to lower the otherwise rather expensive cost of the buffer provider when possible.
I gave it some more thought and tried to implement the thing. However, it seems that such a feature would need something akin to operator.
to work properly and handle all the use cases in the library. The buffered sorters would also have to be modified so that they can handle arbitrary iterator types from the buffer providers rather than only pointers.
More than once, Matt Austern raised awareness of segmented data structures such as std::deque
. The standard algorithms are currently suboptimal for these data structures, while they could be way better if they new that the given iterators correspond to a segmented structure (some standard library implementations already have specific algorithm overloads for std::deque
iterators).
Sorting algorithms could take advantage of segmented iterators if they knew about it. For example, if an std::deque
uses fixed arrays of only 32 elements, then some kind of bottom-up mergesort could start by sorting each subarray, then it would merge them. It would probably increase data locality and thus performance.
The Meeting C++ 2015 presentation contains some ideas about how the mechanism could be implemented. While it would probably require some standard library support, it is highly unlikely that standard support for segmented iterators will make it into the standard in the next few years (there aren't even official proposals about them), so we might as well start reimplementing those ideas locally.
A CppCon2016 presentation by Patrick Niedzielski gives more insights and ideas for segmented iterators. Moreover, it also defines the SegmentedIterator
concept and a few other things that could serve as a base for a library.
Yet another fancy idea that could be thrown into the library: some kind of automatic partial application for comparison and projection functions. Let's say we always want to sort a collection of wrapper
objects on their value
field in descending order. We could acquire a sorter object that already knows about these comparison and projection functions by the time it's given the collection to sort:
auto sorter = cool_sorter(std::greater<>{}, &wrapper::value);
sorter(collection);
sorter(begin, end);
It sounds cool on the paper, but it actually means that there would be different sorter templates and that we need factory functions everywhere, that can hardly be generated simply (maybe some help from P0091 if it ever gets accepted?). It also means even more SFINAE madness and probably no conversions to function pointers anymore if we decide to store the comparison and projection functions into the sorter. Probably too many problems to solve at once for a little added benefit. I may start to work on such an issue when I really don't have anything else to do.
Update: On second thoughts, it could be one step among the ones required to handle mutable sorters. In its current state, the library only has explicit support for non-mutable sorters, which seems fine, but there are probably valid examples of mutable sorters somewhere, and the library might want to make them first-class citizens too (but not today, we're already busy).
The current sorting networks as well as the ones considered in #54 are "optimal" sorting networks but only take into account dedicated networks for some sizes. While it is probably enough most of the time, there might be use cases for sorting networks bigger than 32 elements.
We could add generic sorting networks to the library implementing the following algorithms:
We could maybe add some kind of generic sorting sorter network that generates size-optimal sorting networks by dividing the network in half, sorting each half with a size-optimal sorting network, then merging both halves with Batcher's merge network.
Comparing strings by their ASCII values is often enough for programmer's purposes, but other kinds of string sorting are often needed for user interaction:
"I have 99 problems"
should be "smaller" than "I have 100 problems"
."Whatever"
and "whatever"
should compare equal. This comparator should convert the letters at every comparison since the alternative using more memory can simply be realized with schwartz_adapter
and a simple upper
or lower
projection.Both might be needed at once, so a general string_compare
could be provided, with a bitfield-like structure describing which properties should be taken into account. Not sure whether it's better to pass the value as a template parameter at compile-time, or as a parameter to the constructor at runtime. If it is passed at compile-time, it allows to have dedicated case_insensitive_compare
and natural_compare
specialization type aliases implementing more efficient algorithms. There might be way to force string_compare
to accept runtime parameters when no compile-time parameter is explicitly given.
Concerning the names, natural_less
may be better than natural_compare
; morevoer, it might allow for a corresponding natural_greater
if needed. Making them customization points might let the door open for other people to reimplement the functions for custom string types.
This comparator would only handle ASCII, not unicode since handling unicode would require more code than the library has in its current state.
This could be the birth of a new cpp-sort/comparators
subdirectory and the linked cpp-sort/comparators.h
general include file. It could be interesting to expose some of the comparators in detail
such as projection_compare
, since that specific one could potentially be used at more places in the library and thus might need some documentation (see issue #53). The *_less
and *_greater
comparators described in #18 can also be included in this module.
Issue #67 introduced the basic container_aware_adapter
understanding a sort
ADL-found function and added algorithms for std::list
and std::forward_list
with a few sorters. However, it lets a lot of questions unanswered, mainly about stability:
container_aware_adapter
should be able to understand a stable_sort
ADL-found function with the same rules as the ones used to find sort
. All the easy additional overloads should also be generated to properly handle comparisons, projections and the lack thereof.stable_adapter<container_aware_adapter<Sorter>>
. If such a sorter is given a suitable collection to sort, it would look for stable_sort
instead of sort
. Should we also take container_aware_adapter<stable_adapter<Sorter>>
into account?container_aware_adapter
should be able to use an ADL-found stable_sort
for an unstable sort when ADL doesn't find any suitable sort
function in the container to sort's namespace.is_stable
is implement (see issue #75), we need to find a way to document whether container_aware_adapter<Sorter>
is stable when called with Args...
. It could probably be done by checking whether it finds a sort
function with ADL or whether it finds a stable_sort
function instead. New SFINAE madness incoming, but where is the fun without those?Currently the Travis process only checks that the tests run with clang++ in Debug/Release mode with the default standard library (apparently libstdc++). We should update the relevant CMakeLists.txt and the .travis.yml to check that the tests pass with clang++ with both libc++ and libstdc++.
After the clang++-3.8 whitelist thingy is fixed that is... [Update: solved]
Projections are an interesting tool to transform elements on-the-fly when sorting them, but algorithms may be slower than needed if a projection is expensive to compute. A simple solution to handle that case would be to introduce a schwartz_adapter
that would create a new array of elements which would store the iterator to the corresponding element in the original array and a the projected element to make sure that every element is projected only once. It may help to implement the generic solution for stable_adapter
(issue #21).
A small design issue though: the adapter could either take the transformation as a template parameter only store it. The first design would probably prevent the use of function pointers as projections, but the second one would prevent the transformation of a schwartz_adapter
into a function pointer, and I am not sure which of these properties is the most desirable. Adding a function pointer wrapper utility::function_constant<>
could solve half of the problem by making it possible to pass function pointers as template parameters:
using sorter = schwartz_adapter<
verge_sorter,
utility::function_constant<int(*)(int), &std::abs>
>;
Just like with #40, the adoption of P0091 would perhaps make it possible to hide two similars classes under a same factory function: one that converts to a function pointer and one that stores an invocable object instance.
Second possible design, which could have the best of both worlds:
using sorter = schwartz_adapter<verge_sorter>;
In this case, schwartz_adapter
would look like a normal sorter and would only change the way the projection parameter is handled. That way it could really be used as a drop-in replacement for the sorter it wraps whenever a Schwartzian transform is needed.
Sadly, it feels like this adapter needs the proxy iterator mechanism of #30 (iter_swap
and iter_move
tricks) in order to work, which means that every adapted sorter shall make perform an ADL lookup for the two aforementioned functions in order for it to work with the adapter. In the case of std_sorter
and std_stable_sorter
, it means that we will have to wait for a standardization of the Ranges TS or at least of the proxy iterators proposal if it ever gets accepted separately.
In their current form, type-specific algorithms are designed to sort collections of specific types, and that's pretty much it. However, it would be more interesting if these algorithms accepted to sort a collection if the projected elements correspond to the accepted types.
While it is probably hard for some of them (e.g.: some incarnations of counting sort discard the original values and overwrite them later), I believe that some of these algorithms already move the elements to sort, in which case the original elements are preserved, and it should be possible to make them work with projected elements.
For example, it is currently possible to write this:
std::vector<int> vec = { /* ... */ };
integer_sort(vec);
But the following fails:
struct wrapper { int value; };
std::vector<wrapper> vec = { /* ... */ };
integer_sort(vec, &wrapper::value);
Which is a shame: I guess that most real-world data to be sorted are not raw integers, but it sorting collections of objects based on an integer key is probably common enough. If possible, integer-specific algorithms should strive to sort objects that return an integer when projected. More generally, type-specific sorters should accept to sort a collection when the projected elements return an appropriate type. It would make them far more usable than they are now.
Apparently the idea is sane and actually implementable. As always, a small TODO list:
integer_spread_sorter
.float_spread_sorter
.string_spread_sorter
.schwartz_adapter
.It should be possible to make a full-fledged projection sorter given a sorter implementation that supports generic comparison by transforming the comparison operation into a new operation that performs compare(proj(lhs), proj(rhs))
instead of compare(lhs, rhs)
. Such a trick would allow to give out-of-the-box projection support to std_sorter
and std_stable_sorter
without having to wait for the Ranges TS to provide the appropriate operations.
As an added bonus, it would simplify projection handling for custom sorters. That said, introducing all the quirks of getting a sorter right is important, so the tutorial should still introduce projection parameters and then end that part with "lol, actually you don't need that". Honestly, manual handling of projections seems unneeded if not for potential optimizations (e.g. lower_bound
or upper_bound
where the value only has to be projected once. The compiler is likely to optimize that too, but still...).
projection_compare
class to bake projection into comparison.sorter_facade
to handle projections as described above.std_sorter
and std_stable_sorter
.Then maybe remove the projections from most algorithms when they apparently don't optimize anything.
self_sort_adapter
should be enhanced to understand the method stable_sort
as well as the method sort
. For any Sorter
, self_sort_adapter<Sorter>
should call the container to sort's sort
method if there one, or stable_sort
if there if no sort
method.
stable_adapter<self_sort_adapter<Sorter>>
should call the container to sort's stable_sort
method if there is one.
self_sort_adapter
aware of stable_sort
.stable_adapter<self_sort_adapter<Sorter>>
.The original design of the Ranges TS already proposes some extensions to the standard library algorithms (see issues #13 and #20), but there are also proposed extensions to this TS that may be incroporated into the future. P0022: Proxy Iterators for the Ranges Extensions is one of them and is somewhat likely to be accepted at some point.
The proposal introduces the function std::iter_move
and makes it a customization point as well as std::iter_swap
. These functions are meant to solve the problem of proxy iterators that can be encountered when using classes like std::vector<bool>
. If this proposal makes its way to the standard, we might want to update cpp-sort to support that model as well. It shouldn't be much harder than adding some using
declarations and replacing some std::move(*it)
calls by the corresponding iter_move(it)
when moving values out of the collections.
On second thought, this change will likely cause a problem to algorithms using swap_if
, whose optimizations are required for sorting networks to actually be fast when sorting integers. There may be a way to tweak the design to get the best of both worlds; currently my best bet is that cpp-sort could use an iter_swap_if
function that detects whether an unqualified iter_swap(a, b)
is a valid expression, use it if it is and, fall back to swap_if
otherwise.
Now that it isn't a future issue anymore, a small roadmap:
utility::iter_swap
and utility::iter_move
.std_sorter
and std_stable_sorter
).swap_if
problem.utility::iter_move
and utility::iter_swap
.bubble_sorter
tutorial an example.As of today, the latest version of clang++ is unable to compile cpp-sort. While some bugs were actually due to g++ accepting code it shoudn't have accepted, some other errors are less obvious and it is harder to tell who is wrong. I would really like cpp-sort to compile with clang++ but it seems that it still won't be possible before a while. Most notably, it seems that clang++ isn't able to use the select_overload
idiom as described in this article, which is one of the most important building bocks of hybrid_adapter
.
The following commits shoud fix issues that block the compilation with clang++ 3.8:
This issue will be updated to keep track of the commits made to improve clang++ compatibility.
In one of his blog posts, Eric Nieber introduces generic counted iterators, which might come along with the Ranges TS. These iterators allow to use the same algorithms for usual iterables and counted iterables instead of having to provide counted versions of the algorithms (e.g. copy_n
and friends). The proposed mechanism relies on sentinel parameters, so this issue logically comes after issue #20, and would also need a small update on the standard library side in order to work properly, making it a future issue.
Anyway, we should take into account the fact that counting iterables will be proposed for standardization at some point, and think of how it might impact cpp-sort. It shouldn't make things really different, and even things like counted
and uncounted
should not be a problem since we don't have counted algorithms in the library.
A new problem arises: in the new iterations of the work related to the Ranges TS, std::sort
is specified to return the end iterator. This is because when passed a normal iterator and a counted sentinel, the actual end iterator of the algorithm is unknown even though it is valuable information that is most probably computed by the sorting algorithm. The first obvious problem is that we will probably have to add that to every sorting algorithm of the library.
The second and most subtle problem has to do with adapters: today counting_adapter
returns an integer type. When given a sorter that returns an iterator, should it return the integer type instead, or build a tuple with the integer type and the iterator type? I have currently no elegant solution to this problem.
cpp-sort already uses sorting algorithms from all around the web (pdqsort, spreadsort, timsort, etc...); adding more of them in the long run can't be that harmful. Smart people created many interesting sorting algorithms and it would be a waste not to try them all. Here is a list of interesting projects that we could use to complete the library:
flat_stable_sort
from Boost.Sort.spinsort
from Boost.Sort.parallel_stable_sort
, sample_sort
, etc.unsigned char
strings.There are probably even more interesting sorting algorithms around. However, adapting them is often not trivial. For example SqrtSort uses 3-way comparisons and C-isms all over the place, Exact-Sort mixes IO operations with the sorting logic, some of the algorithms simply don't have an implementation in C++... Adapting these algorithms implies that we have to adapt them to use an iterator interface, to add support for arbitrary comparison functions (or even projections) and to make them work with move-only types when possible. The sort::parallel
library will probably be an interesting way to test the integration of parallel sorting algorithms in cpp-sort; floki would be a way to integrate SIMD algorithms in the library.
Reminder: ask the authors for the license when needed.
It saddens me bit since it is was the original motivation for cpp-sort, but small_array_adapter
has to go: it's too specific and too too hard to maintain. For example, for an array of size 10, one sorting algorithm was faster for int
and long int
, another one for float
, another one for double
and yet another one for long long int
and long double
. And we're still only talking about a small subset of the built-in types, and those results might change depending on the architecture. In other words, maintaining a sorter whose goal is raw speed is impossible. Therefore, small_array_adapter
has to go. However, it should provide tools to let developers create an equivalent adapter if needed, here is what is needed:
sorting_network_adapter
: a small arrays adapter that implements sorting network algorithms with possibly the smallest number of conditional swaps. Sorting networks seem to be among the fastest algorithms when it comes to sorting small arrays of integers.Basically, we should provide to the users simple tools that will allow them to find the most suitable small-arrays sorting algorithm depending on their needs.
It saddens me but the library design is definitely not standard-compliant. Even though I managed to work around that for both g++ and clang++, the handling of projection parameters creates even more problems that I'm not able to solve and it will probably only get worse with time. Uglifying the creation of sorters seems to be a necessary step for standard compliance. Here is the proposed new design:
struct foo_sorter_impl
{
// regular sorter code
};
// strong typedef for better error messages
struct foo_sorter:
cppsort::sorter_facade<foo_sorter_impl>
{};
Exit the CRTP and the tricky non-standard code. However, it means that sorter writers have to introduce a new meaningless name. I hate that, but I don't think it can sanely be worked around. At least, sorter_facade
should be easier to reason about once the changes are done. Fortunately, it shouldn't change the way sorter are used, but writing them is suddenly not as cool as it used to be.
cpp-sort currently only cares about in-place sorting algorithms: algorithms that mutate the passed collection and don't return anything (except when they do), which is pretty much enough for people most of the time. However, I believe that there is some value in having algorithms that actually copy a sorted equivalent of the passed collection into the library. It once again leads to maaaaany design decisions depending on how we want to implement that:
cppsort::sort_copy
function in the library taking a sorter (or defaulting to default_sorter
) and an additional OutputIterator
parameter?cppsort::sort_move
function?cppsort::sorted
function that takes an iterable and returns a new instance of the same iterable? Make it work with an OutputIterator
instead? Have it take any sorter like cppsort::sort
?cppsort::sorted_view
returning a range with sorted iterators which, when iterated, corresponds to a sorted view of the original collection? It's pretty much what indirect_adapter
does in the first place, but decoupling it from the adapter might also be interesting.is_online
property in cppsort::sorter_traits
is easy, but how to use it properly?An interesting algorithm about incremental sort that might provide some idea and/or insights about online/incremental sorting: http://larshagencpp.github.io/blog/2016/04/23/fast-incremental-sort
Currently, because of the way hybrid_adapter
and sorter_traits
work, the following does not pick the most suitable algorithm possible:
using sorter = cppsort::hybrid_adapter<
cppsort::hybrid_adapter<
forward_sorter,
random_access_sorter
>,
bidirectional_sorter
>;
If we feed a random-access iterable to sorter
, it will pick bidirectional_sorter
instead of random_access_sorter
because the iterator category of the nested hybrid_adapter
is forward_sorter
. To overcome this problem, hybrid_adapter
should unwrap so that the sorter above is equivalent to:
using sorter = cppsort::hybrid_adapter<
forward_sorter,
random_access_sorter
bidirectional_sorter
>;
Unfortunately, this is not trivial since hybrid_adapter
already performs some heavy wizardry and even a bit of back magic. Basically, it would be nice to be able to write the following specialization:
template<typename... TT, typename... UU, typename... VV>
struct hybrid_adapter<TT..., hybrid_adapter<UU...>, VV...>:
hybrid_adapter<TT..., UU..., VV...>
{};
Unfortunately, it's not possible to expand a parameter pack on the left like that so it needs to be done differently. Implementing that proved to be complex and I still did not manage to tackle the problem, be it from the inside of hybrid_adapter
or from the outside. The closest thing I got is this:
template<typename... Sorters1, typename... Sorters2>
struct hybrid_adapter<hybrid_adapter<Sorters1...>, Sorters2...>:
hybrid_adapter<Sorters1..., Sorters2...>
{};
Unfortunately, it only unwraps nested hybrid_adapter
if it appears at the beginning of the parameter pack, so it only solves half of the problem. Having a half-fix is worse than no fix at all in this case though, so I won't implement it in master.
While having sorter_traits
as a separate entity is fine by itself and might be specialized for classes that we don't maintain in case we need it, opening the cppsort
namespace to write a specialization of sorter_traits
is a bit bothersome since it introduces some unneeded boilerplate. Moreover, when specializing sorter_traits
for a sorter in the global namespace, one has to make sure that a component with the same name doesn't already exist in the namespace cppsort
, and that's not always obvious.
To simplify the implementation of sorters, it might be an interesting idea to allow sorter writers to embed iterator_category
and is_stable
directly into the sorter and have sorter_traits
use those when they exist. Then we could do follow the standard library containers model: declare the traits in the sorter when possible, in the sorter traits otherwise, and always access them from the sorter traits.
One problem I can think of is the potential case where the traits declared in the sorter do not correspond to the ones defined in sorter_traits
. I hope that such cases are rare though. Good idea or not? We'll probably see soon enough.
We could also enhance sorter_traits
so that it defines types only if they exist in the sorter. That would decrease the coupling between traits. For example, it would be sad if hybrid_adapter
didn't work because one of the sorters doesn't define is_stable
even though hybrid_adapter
doesn't use it.
sorter_traits
to detect traits in sorters, without imposing the presence of any of them unless required.#48 added a few measures of presortedness into the library, and there still hasn't been a problem having them around. The original measures of presortedness were the eight ones described in Right invariant metrics and measures of presortedness, but there are actually more of them in the litterature. We could complete the module with the measures of presortedness described in the following papers:
The following to-be-implemented measures of presortedness can be found in the aforementioned articles:
At the time of writing, all of the sorting algorithms provided by cpp-sort are sequential. Having parallel sorting algorithms too would be great, but how do we want to implement them? No definitive design but several ideas:
std::is_execution_policy
should ease the implementation a bit: for example cppsort::sort
could simply use tag dispatch to forward the parameter appropriately and do the current checks and layer deeper. Hope it can work well with hybrid_adapter
too.merge_sorter
or std_sorter
would probably keep the same name and provide both sequential and parallel versions of the algorithm. Other sorters will probably be parallel-only and implement some specific sorting algorithms.sorter_facade
could probably make some default overloads when no execution policy is provided or when std::vec
is passed. Not sure in which sense.Annnnnnd, once again, many design questions that won't have an answer right now. Anyway, these algorithms probably won't see the light before the parallelism TS is widely supported. The work will probably start in an experimental branch and won't be merged before long.
New thoughts for a user-friendly yet flexible design: make sorters take execution policy and allow the to call different but related overloads depending on the policy (for example merge_sorter
has obvious sequencial and parallel versions). However, do not default the execution policy to std::sequential_execution_policy
: instead add a default_policy
trait to sorters and make sorter_facade
call the overload for that default execution policy when none is provided; it should still be std::sequential_execution_policy
for most sorters, but some sorters could implement algorithms that are parallel by nature and don't have a sequencial counterpart, making std::parallel_execution_policy
the obvious default for them. When a sorter does not provide a default execution policy, make sorter_facade
call an operator()
overload that does not take an execution policy if such an overload exists, and trigger an error otherwise (soft or hard, TBD).
Such a mechanism should ensure that the sorters mostly do what we expect them to do while still remaining flexible. Moreover, the last point should also allow to use old sorters that are not meant to work with execution policies without having to modify them.
A small problem: sorter_facade
won't be able to able to take advantage of a specialized sorter_traits
to find the implementation since it is often specialized for the sorter and not the sorter implementation. One woud have to specialize it for the sorter implementation instead. It could probably be done (I'll have to analyze it first) but then it would require to change the tutorials to advise to specialize sorter_traits
for the sorter implementation and not for the sorter itself. Then, it should be possible to use sorter_traits
in sorter_facade
to find the default execution policy.
A sorter implementation would then look like that:
struct spam_sorter_impl
{
template<typename Iterator>
auto operator()(std::sequential_execution_policy,
Iterator first, Iterator last) const
{ /* ... */ }
template<typename Iterator>
auto operator()(std::parallel_execution_policy,
Iterator first, Iterator last) const
{ /* ... */ }
using default_policy = std::sequential_execution_policy;
};
The proposed executor_traits
uses execution_category
, so default_policy
could be named default_execution_category
instead to follow the same naming scheme. I doubt the name would be used that often, so we might as well have consistent names even if they are a bit long.
How to handle the type-erasing container std::execution_policy
? Every implementation I have seen implements algorithms as virtual members of an execution policy so that polymorphism works, but since we do not implement the execution policies, it seems hard to extend. Open methods could be a solution, but if they ever get standardized, it will likely not be before years and years. Update: apparently std::execution_policy
was not voted into C++17, which means that we won't have to handle the problem before a while.
Shell sort and comb sort are common sorting algorithms that have in common the fact that their complexity depends on a series of integer values that can either be generated by a formula or given by hand. Comb sort generally takes a single ratio used to compute the successive gaps. These algorithms could be interesting additions to the library but - as usual - open a new can of worms:
std::index_sequence
instead?dynamic_buffer
?Anyway, the bufferef sorters made it pretty clear that the configuration of a sorter would be in the type and not at call time. I think that we can at least agree on that point.
P0100, Comparison in C++, is a proposal to enhance comparisons in C++, which may directly affect sorting algorithms and related features. Among the future improvements are the following ones:
Since cpp-sort is all about sorting, these ideas may be interesting to improve the library. That said, I don't want to change anything before an actual design is proposed to alter the algorithms in the standard library to handle three-way comparators. Once such a design has been proposed and validated, I'll start updating cpp-sort so that it matches the new design.
Among the changes proposed or implied by P0100, here is what could be used to improve cpp-sort, and a bunch of interesting questions:
std::qsort
(mostly used to show how other algorithms are better though).not compare(x, y)
used to either make sorts stable or make them more efficient with equivalent values.grail_sort
already uses some kind of three-way comparator, but one that almost always perform more operations than it should. Standard three-way comparators would be better for sure.total_order
, if possible so that the results are a bit more deterministic?total_order
).Lots of things to think through, but it may be interesting in the end :)
As always, here we go with a list of features to implement:
partial_ordering
, weak_ordering
and total_ordering
.partial_order
, weak_order
and total_order
.partial_less
, partial_unordered
, partial_greater
, weak_less
, weak_equivalence
, weak_greater
, total_less
, total_equal
and total_greater
.std::string::compare
.three_way_compare
so that it can adapt both binary and trinary comparators. Make it take an order and 4 functions objects corresponding to <
, >
, <=
and >=
, which could possibly be optimized. If these function objects are not provided by the user, default them to generic function generating the comparison from the order function.three_way_compare
and transform everything with it).bubble_sorter
example (it might get tricky).C++17 is around the corner, and the plan is to move on as soon as possible. Things get standardized by the day (except when they don't), and even though I can't miss some of them (like the execution policies required for #22), it's still good to have a list of the small things that can easily be changed (strikethrough text means that the feature won't/can't be used):
destroy_at
with std::destroy_at
.utility::void_t
by std::void_t
.utility::is_invocable
by std::is_invocable
.utility::conjunction
, utility::disjunction
and utility::negation
by std::conjunction
, std::disjunction
and std::negation
.std::bool_constant
where possible.std::invoke_result
instead of the now deprecated std::result_of
.any
and all
by fold expressions, and get rid of the functions.static_assert
when there isn't any message to start with.typename
instead of class
in template template parameters.[[fallthrough]]
in the loop unrolling function in ska_sorter
.utility::size
could surely use the non-member std::size
.static_const
+ anonymous namespace tricks by inline
variables.__has_include
can be neat to avoid checking against random macros. It could be used to detect standard library extensions: for example libstdc++'s bitmap_allocator
seems to significantly improve merge_insertion_sort
almost for free.constexpr
lambdas can be used to make constexpr
the conversion of a sorter into a function pointer in sorter_facade
.make_*
factories all over the place, and to reduce the amount of diamond operators with std::less
.auto sorter = schwartz_adapter(quick_sorter);
.rotate
by std::gcd
.std::string_view
.noexcept
specification of iter_swap
and friendsnoexcept
conditions and I would rather wait for the Ranges TS functions).operator new[]
could be used when temporary buffers are needed. A simple 16-byte alignment could ensure better performance for some operations when SIMD instructions are available.[[maybe_unused]]
can be used.[[nodiscard]]
is used in the wild and check whether some functions could be marked with the attributestd::not_fn
may be used in detail::inplace_merge
's implementation to replace detail::negate
in a more generic fashionif constexpr
might help to rewrite sorter_facade
and get rid of the whole SFINAE madness (maybe). In which case I'd expect a more readable class, and maybe even faster compile times and overload resolution.if constexpr
can also be used to reintroduce the old tests of spread_sorter
for std::wstring
, which were dropped because the whole dispatching was too complex for the tests.sizeof(wchar_t) == 2
is not enough to discard the branch)if
with initializer, template<auto>
, or guaranteed copy elision.hybrid_adapter
's sorters_merger
by using operator()...
.std::invoke
to call the functions passed to the algorithms instead of using utility::as_function
(P0312 would be awesome, but I guess it wasn't accepted).utility::as_function
in some places, so we can't get rid of it.While most of these changes are mere implementation details which improve only the quality of the implementation, and sometimes its correctness or performance, a few of them also impact the documentation and the examples:
std::invoke
can be used instead of utility::as_function
if one wants to use less cpp-sort and more standard library. Also, change the tutorial accordingly.std::
equivalents or core language counterparts instead.merge_insertion_sort
might be faster with libstdc++ thanks to GNU allocators.I will start working on a C++17 branch as soon as GCC 6 is available on Travis. I plan to first create a C++17 branch, then make it the master branch when C++17 is standardized, and maintain a C++14 branch meanwhile. I plan to drop support for C++14 once C++19/C++20 is standardized. The documentation should reflect both the maintained legacy branch, the current branch and the future standard branch. Once the support for a branch has been totally dropped, create a wiki branch with the current state of the legacy documentation to make sure that it stays around.
Some sorting algorithms such as block sort and grail sort optionally use a buffer, either fixed or dynamic (in which case it often depends on the size of the input) to speed up the sorting process and exhibit somewhat significant performance changes depending on how the buffer is allocated. As things are right now, block sort uses a fixed-size 512 elements buffer and I plan to implement the version of grail sort that uses no buffer at all (SqrtSort is a variation of grail sort using a buffer whose size corresponds to the square root of the size of the input). I wish there was a simple way for users to use different buffering strategies for the same algorithms depending on their needs, but really don't see how: adding new parameters to the functions would be a mess (named parameters would be great though) and adding a template parameter does not feel like the greatest idea in the world either.
Another solution would be to give sorters a template parameter corresponding to the buffering strategy, but it wouldn't be as nice syntax-wise because of the mandatory diamond <>
even for the default buffering strategy. I'm a bit short on ideas for this one. New ideas are welcome.
Examples of usage:
using sorter1 = block_sorter<fixed_buffer<512>>;
using sorter2 = grail_sorter<dynamic_buffer<sqrt>>;
Apparenty, the code above needs something akin to partial application of template parameters or template template alias in order to work. There is a workaround though: even though it makes writing buffer providers harder, it makes writing and using buffered sorters easier:
template<std::size_t N>
struct fixed_buffer
{
template<typename T>
struct implementation
{
implementation(std::size_t size) { /* discard size */ }
auto begin() const { return std::begin(data); }
auto end() const { return std::end(data); }
T data[N]; // actual buffering memory
};
};
The buffered sorting algorithms would then use Buffer::template implementation<T>
to get an instance of the actual buffer provider. That said, implementation
is an ugly name, and we can probably do better. The function object types log
and sqrt
could live in a <cpp-sort/utility/functional.h>
header where we could also move utility::identity
for consistency. The templates fixed_buffer
and dynamic_buffer
could live in the header <cpp-sort/utility/memory.h>
.
Things are becoming clearer. We can probably go on with the following roadmap:
fixed_buffer
, taking a size.dynamic_buffer
, taking a size policy.log
and sqrt
.block_sorter
and give it a default fixed_buffer<512>
template parameter.block_sorter
, library concepts and how to write sorters).Open questions: what about allocators? Do we want them as an additional template parameter for dynamic_buffer
? Is there a way to wrap std::get_temporary_buffer
to allocate some memory on the fly? Should it take a function object parameter like dynamic_buffer
?
constexpr
algorithmsP0202 proposes to add constepxr
to many functions from <cstring>
and to basically every algorithm from <algorithm>
. Apparently, the proposal was rather well-received and we might have constexpr
algorithms in the standard at some point in the future.
So, what about cpp-sort? Since we're already using C++14, we can use the improved constexpr
but I feel that it won't solve every problem:
std::swap
and a few other utility functions are not constexpr
, but we could probably reimplement a few of them. Update: actually, some parts of <functional>
such as std::mem_fn
and a good part of <iterator>
would have to be reimplemented too in order to handle everything, and some parts are rather huge.constexpr
either, but since we ship altered versions of many of them, adding constexpr
shouldn't be a problem.constexpr
version of the standard algorithms such as Bolero Murakami's Sprout don't use algorithms that allocate heap memory; these projects are generally only meant to be used at compile-time.I would be trivial to change some algorithms to be constexpr
but it's probably not the case for most of them. A partial constexpr
support could be a starting point.
The first revision of P0202 takes several design choices into consideration and gives the following set of rules to make standard algorithms constexpr
(<cstring>
has been disqualified from constexpr
enhancements):
constexpr
should be enough.std::memcpy
and friends, so compilers are encouraged to use their own equivalent constexpr
intrinsics of these functions instead. It concerns everything that relies on the std::copy
and std::move
family of algorithms.std::stable_partition
and std::stable_sort
are not marked constexpr
.Basically, we will wait for a constexpr
-ified standard library (at least for std::copy
and std::move
), then we can start to constexpr
-ify cpp-sort. The whole iter_move
thing might make things harder to do though. Anyway, this is definitely a future issue.
I often wondered whether there were use cases for constexpr
sorting algorithms. Here is a list of situations where sorting at compile-time might be useful:
std::set
which sorts its input and performs a binary search on lookup, which is more performant than the tree implementation (e.g. Frozen).Let's face it: in its current form, insertion_sorter
is a clusterfuck of bad design. The forward iterators version implements a binary insertion sort while the bidirectional (and thus random-access iterator) version(s) implement a linear insertion sort.
Both are useful: the linear one is used as a fallback when there are few elements to sort, and the binary one is useful when there are only a few misplaced elements. Different use cases. The obvious solution would be to have linear_insertion_sorter
and binary_insertion_sorter
, but I'd like to have one of them keep the name insertion_sorter
for brevity (it would probably be the binary version).
Having an implicit and an explicit name is probably a bad idea too.
wat am the do
After having given it some thoughts again, it's pretty obvious that binary insertion sort is actually useless. Not sure whether it didn't occured to me before. The solution is obvious: kill binary insertion sort and totally remove the insertion sort support for forward iterators; it's not worth it.
Currently, cpp-sort define two « concepts » for concepts: Sorter
and ComparisonSorter
. Generally speaking, a ComparisonSorter
can take any kind of comparison function to sort the collection, and other sorters simply don't take such a parameter. After a collection has been sorted with a non-comparison sorter, the following is assumed:
assert( std::is_sorted(std::begin(collection), std::end(collection), std::less<>{}) );
However, some non-comparison sorting algorithms also exist to sort a collection in reverse order for example. After any collection has been sorted in reverse order, the following is assumed:
assert( std::is_sorted(std::begin(collection), std::end(collection), std::greater<>{}) );
When a non-comparison sorting algorithm exists in several flavours (generally to sort a collection in different orders), would it be a good idea to provide several overloads for the corresponding sorter's operator()
? The overloads would not take a generic type for the comparison sorter, but use specific comparison functors to ensure that the following relations always holds:
cppsort::sort(collection, some_sorter{}, some_compare{});
assert( std::is_sorted(std::begin(collection), std::end(collection), some_compare{}) );
Real-life example: boost::sort::spreadsort
seems to offer a spreadsort to sort strings in reverse order. The current cpp-sort wrapper - string_spread_sorter
- does not provide any way to use this algorithm even though it's probably useful. The idea would be to add the following overload to string_spread_sorter::operator()
:
template<typename RandomAccessIterator>
auto operator()(RandomAccessIterator first, RandomAccessIterator last, std::greater<>) const
-> std::enable_if_t<
std::is_same<typename std::iterator_traits<RandomAccessIterator>::value_type,
typename std::string>::value
>;
Such an overload would allow the following sorter to use reverse_string_sort
when it is passed a collection of str::string
and an instance of std::greater<>
while falling back to pdq_sorter
any other type or sorter, transparently choosing the best algorithm:
using sorter = cppsort::hybrid_adapter<
cppsort::string_spread_sorter,
cppsort::pdq_sorter
>;
Of course, to be consistent, we would also need to provide overloads for std::greater<T>
, std::less<>
and std::less<T>
. While this seems like boilerplate, the job could partly be done in sorter_base
, making every sorter a ComparisonSorter
when called with std::less<>
. While it might introduce some incompatibilities, I doubt it will if implemented with care. Anyway, it would allow to hide even more algorithms under a common generic interface for free, so... it might be worth it.
Sometimes, cpp-sort tries to be too smart, for example when it tries to discriminate comparisons from projections to call the expected overload. When a sorter is given a single function that could be both a suitable comparison and projection function, the call will probably be considered ambiguous. Here are a few ways to solve this:
as_comparison
and as_projection
to help users to manually disambiguate the call.as_projection
. This would avoid transforming valid std::sort
calls into invalid cpp-sort calls in some corner cases.as_comparison
so that users can make their will explicit.We're going with option 4, here is the TODO list:
as_projection
as_comparison
cppsort::sort
and cppsort::stable_sort
While Valgrind is pretty cool, there can't be too many tools checking for potential undefined behaviour. We could add the undefined behaviour sanitizer to the Debug mode:
Unfortunately, it doesnt "just" work with Clang. hen I try to add -fsanitize=undefined
, I get the following error:
Linking CXX executable cpp-sort-testsuite
/usr/bin/ld: cannot find /usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/lib/linux/libclang_rt.ubsan_standalone-x86_64.a: No such file or directory
/usr/bin/ld: cannot find /usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/lib/linux/libclang_rt.ubsan_standalone_cxx-x86_64.a: No such file or directory
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [testsuite/cpp-sort-testsuite] Error 1
make[1]: *** [testsuite/CMakeFiles/cpp-sort-testsuite.dir/all] Error 2
make: *** [all] Error 2
It feels like some files are missing from the Linux VM on Travis, but I can't confirm for sure... Any help to solve this issue is welcome :)
There are several measures of presortedness in the literature to estimate how much sorted a collection already is. These measures are interesting because adaptative sorting algorithm may be optimal with respect to some measure of presortedness (for example, some algorithms runs in linear time when the collection is sorted or reverse-sorted).
Checking how much presorted a collection is before sorting is almost always a waste of time. However, having functions to measure presortedness may still be useful: for example, one might want to run such an algorithm on actual data for a period of time and choose a more appropriate sorting algorithm depending how presorted the data is on average. Basically, these functions can be useful probe tools for people trying to improve their programs. We could even imagine a small program anayzing actual data samples and recommending some specific sorting algorithm depending on the results of the measures.
Long story short: measures of presortedness can be a useful tool and we want some in the library. Now, we still need a list of useful measures to implement and a nice design.
The following measures of presortedness exist in the computer science litterature (from Right invariant metrics and measures of presortedness):
The easiest design is probably to have a function per measure of sortedness whose name corresponds to the measure name; this would make short names, including common ones (max
for example), so these functions probably belong to a nested namespace (e.g. cppsort::probe
). Of course, custom comparison functions as well as projections need to be handled too. Unlike sorters, I don't think there is any value in exposing presorter objects (and the name would be bad since they don't presort anything), but sorter_facade
could probably be reused to create function objects whose only one named instance would be exposed.
In its current state, the library produces rather unreadable error messages and can spawn several hundreds of lines of cryptic error messages for a simple missing const
without a single hint to what may be wrong. Honestly, this is a major drawback and the crazy amount of SFINAE insanity doesn't help.
We can't realistically put static assertions all over the place because many mechanisms in the library rely on SFINAE. There may be other ways to handle errors.
Some elements of answer:
typedef
s via inheritance instead of templates aliases to define sorters.using Sorters::operator()...;
makes its way to C++17, it might remove some annoying errors dues to recursion.operator()
to ensure that the iterator category of the parameter is correct. By the time the operator is called, all of the SFINAE checks should already have passed and the only remaining thing before the actual sort is an eventual tag dispatch. Unfortunately, I doubt that it is possible to automate this mechanism.hybrid_adapter
which would act as a fallback if none of the sorters are picked and trigger an explicit error. Getting the mechanism to work with nested hybrid_adapter
seems crazy hard though.static_assert
to the sorter in the simplest case in order to reduce the amount of SFINAE logs (see #113).To be honest the errors occuring when a const
is missing are the worst kind. There might be ways to require less const
everywhere while still providing it when needed.
A few resources about getting better error messages:
std::sort
works with move-only types. Actually, it requires the iterators to be ValueSwappable
, and the type of the dereferenced iterators to be MoveAssignable
and MoveConstructible
. We would realy like all of our algorithms to work with move-only types. The following sorters fail to handle move-only types:
quick_sorter
: it makes a copy of the pivot. There is probably a way to move the pivot to the beginning or the end of the partition and make sure it is never moved during this step. See pdq_sorter
for an equivalent mechanism.tim_sorter
: that one is trickier. It would need to make the original code work with move-only types. While adding the moves operations is pretty trivial, I'm not sure that doing it without precautions won't cause reads from moved-from values, which would be bad.block_sorter
: trying to trivially convert copy operations to move operations resulted in reads from moved-from objects. I guess that the algorithm actually needs to copy things in the cache; if so, making it work with move-only types will be tricky.Additionally, std::sort
is also required to work even if the type is not default-constructible. The following sorters currently do not work with such types:
block_sorter
: the cache is a fixed-size C array of size 512; initializing it default-initializes every element. We can probably get rid of that by using unitialized memory and std::raw_storage_iterator
. The latter has been updated for C++17 to be more move-friendly.Lists are interesting data structures because they can move elements by relinking nodes, which means that some operations become much cheaper: for example a rotate operation becomes O(1) while it is O(n) with the usual iterator abstraction. However, cpp-sort gives them no credit...
Currently the sorting algorithms in the library take either a pair of iterators or a full container, but even when they take a full container, they actually fall back on a container-agnostic iterator-based algorithm (there is a notable exception for self_adapter
). This follows the design of the algorithms in the standard library, and ensures that containers are not modified. Is the non-modification really desirable when you explicitly want to sort a container in place?
Dedicated list sorting algorithm exist, and the library already includes algorithms that could use the list structure to be more efficient, such as mergesort, insertion sort, stable insertion sort or melsort (some of them aren't quite there yet but in experimental branches). Efficiency does not always mean performance, but it generally does for big objects, and it also means that some sorters could explicitly sort lists of unmovable objects while maintaining stronger iterator invalidation guarantees; both might be desirable properties.
Now comes the real question: how do we design a proper interface for list sorting algorithms?
Since lists don't expose their nodes in any standard way, we obviously can't make iterator-based algorithms special wrt list iterators since most of the interesting operations rely on splice
, and splice
needs to know the actual list instance(s). Lists splicing will be the poor man's node relinking.
We could simply add overloads to operator()
for std::list<T>&
and/or std::forward_list<T>&
in sorters that are designed to handle lists, but that would create a dichotomy between the iterator overloads and the range overloads, which isn't desirable. Moreover the benchmarks revealed that some some of the iterator-based algorithms such as merge_sort
are actually faster than std::list::sort
for small types such as int
, so one might want to have both.
Basically, list-aware algorithms should be opt-in, which screams for an adapter. A simple design would be to add a container_aware_adapter
that simply provides overloads for some specific containers (a simple list_adapter
would be a bit specific; user-defined containers can benefit from it). That said, for such an adapter to be sexy, it would need to take into account user-defined overloads of the sorting algorithms for their own containers when needed; there might be an interesting mix of ADL and some other tricks to hook into there. For example, we could use sort
as a customization point when it takes a sorter and a container (we would need sort
to be in in the container's namespace though), and someone could define their specific algorithm like this:
namespace cool_lib
{
struct cool_list { /* ... */ };
// ADL-found function
void sort(cppsort::insertion_sorter, cool_list& list, /* [compare, projection] */)
{
/* ... */
}
}
In the example above, container_aware_adapter<insertion_sorter>
would find the sort
function taking a cool_list&
and use it to sort the list. Of course this design has to be validated through experience, but it sounds rather cool.
If we follow the direction above, it's either the container or the sorter's author responsibility to write the sort
function. I guess it doesn't play well with other adapters too; we could make an equivalent ADL-found stable_sort
for stable_adapter
, but that's pretty much it. It could solve the current problem of explicitly specializing cppsort::stable_adapter
though, which is a rather ugly solution.
container_aware_adapter
would probably have to detect/generate overloads depending on the presence of comparison and projection function, making it a sorter_facade
nightmare bis.
On a side note, while finding sort
(and potentially stable_sort
) in other namespaces via ADL can be done, it is far too hard to find such functions in the cppsort
namespace because of the generic cppsort::sort
. We have to specialize container_aware_adapter
for the provided sorters, with direct overloads of operator()
for std::list
and std::forward_list
. As long as other users don't try to find the specific sort
functions via ADL by themselves, it shouldn't be a problem.
The usual TODO list:
container_aware_adapter
.sort
for classes in other namespaces.std::list
and selection_sorter
.std::forward_list
and selection_sorter
.std::list
and merge_sorter
.std::forward_list
and merge_sorter
.std::list
and insertion_sorter
.std::forward_list
and insertion_sorter
.The intersection between container-aware algorithms and stable algorithms will probably be for another issue.
Sorters are a useful abstraction but users might not like to use them directly. Would it be a good idea to provide a default instance for every sorter?
auto selection_sort = selection_sorter{};
auto pdq_sort = pdq_sorter{};
auto tim_sort = tim_sorter{};
// etc...
It would reduce the cognitive load for the simplest use case and hide the sorters abstraction while still allowing potential users to use sorters just like they can today. A few additional questions:
Do we provide default_sort
? I guess that cppsort::sort
is already enough to invoke the default sorter.
How do we handle buffered sorters and future sorters that take template parameters? We could use variable templates but the buffer provider abstraction is arguably harder to understand than the sorter abstraction:
template<typename BufferProvider = /* ... */>
auto block_sort = block_sorter<BufferProvider>{};
We could only use the default buffer provider to hide the abstraction from users:
auto block_sort = block_sorter<>{};
Or we could make it a function with a default template parameter that simply forwards everything to the buffered sorter:
template<typename BufferProvider=/*...*/, typename... Args>
auto block_sort(Args&&... args)
-> decltype()
{
return blocke_sorter<BufferProvider>{}(std::forward<Args>(args)...);
}
This solution allows to keep the flexibility while hiding it, but then block_sorter
is not an object anymore, which might be desirable (really? why? does it impact parameter passing to functions?). This last problem could be mitigated by the adoption of P0119, even though it still wouldn't be perfect.
Last but not least: we could add a template parameter to a buffered sorter's operator()
defaulted to the sorter's one if not provided. That way, an instantiated bufferred sorter would look like a regular sorter but we could pass a buffer provider to it at the call point; we can have the cake and eat it too. Shall we simply move the class-level template parameter to operator()
? Not sure: one might want to specify the buffer provider only once, but we then hurt problems related to issues #40 and #44... Except apparently it does not work: calling operator()
with the regular function call syntax doesn't allow to pass explicit template parameter.
What should we do about the files hierarchy? The simple solution would be to provide those helper functions in the sorter file, but that might expose the sorter abstraction directly in the file name, which might not be desirable if the goal is to make the library simpler to use. Should we then rename the sorter
directory to sort
and sorters.h
to sorts.h
even though the name is really close to sort.h
? Naming really is difficult :/
Everything should probably borrow the customization points design to avoid ODR problems, but that shouldn't be a real problem (might still not possible/needed depending on how we solve the issue for buffered sorters):
namespace
{
constexpr const auto& quick_sort =
detail::static_const<quick_sorter>;
}
We could provide a global cppsort::stable_sort
akin to cppsort::sort
, which uses stable_adapter<default_sorter>
when no sorter is provided, and which uses stable_adapter<Sorter>
whenever a specific sorter is given.
I would be great if one could chain projections, for example with operator|
since it's already used to chain transformations in range-v3 and Boost.Range:
cppsort::sort(collection, proj1 | proj2);
Several questions that can be answered as we go:
proj1
first then proj2
or the opposite? IIRC if we follow range-v3, it will be proj1
first then proj1
.then
, hence proj1
is applied before proj2
.sorter_facade
or projection
base class to provide the mechanism?operator|
accept something like &user::name | lower
with bidirectional overloads?After having forgotten to think about it for the last 3 years, the operator|
design sounds good enough. The following set of features is probably enough to get something working:
projection_interface
from which projections inherit, which implements operator|
.operator|
works in both directions with any suitable callable.utility::as_projection
so that the returned function object inherits from projection_interface
and can benefit from operator|
.While is_stable
is okayish, some components of the library don't play well with it. For example, self_sorter_adapter
always considers that an algorithm is unstable because it has no way tell whether the sort
function of a class is stable or not. Whether the algorithm called is stable or not will actually depend on what arguments it is called with.
Therefore, is_stable
should be changed to look more like std::is_callable
(breaking change, but whatever). Namely, the trait should have the form is_stable<Sorter(Args...)>
and tell whether Sorter
called with Args...
will call a stable sorting algorithm.
For self_sort_adapter
, it means that when the class to sort doesn't have a sort
method, the stability is known: it is that of the wrapped sorter, and the new form of is_stable
can provide this information. When a sort
method exists, is_stable
would still be std::false_type
until a better way to solve the problem is found. A partial solution for the latter problem would be to specialize is_stable<self_sort_adapter<Sorter>(Args...)>
to mark it stable for std::list&
and std::forward_list&
. There shouldn't any cv-ref qualification to handle for those ones considering that only lvalue references to them shall be sortable.
Ideally, sorter's authors should still be able to write using is_stable = std::true_type;
without having to specify more parameters when they want their sorter to be unconditionally stable or unstable. Not sure whether it can be done.
We need to check how it would interact with the rest of the library (the implementation of stable_adapter
may have to change), but in the grand scheme of things, it certainly makes things better.
Swiggity sweety lovely TODO listy:
is_stable
to is_always_stable
.is_stable
"functional" type trait.container_aware_adapter
counting_adapter
hybrid_adapter
indirect_adapter
schwartz_adapter
self_sort_adapter
std::list
and std::forward_list
small_array_adapter
stable_adapter
The Ranges TS (and the Adobe Source Libraries before it) adds a projection parameter to standard algorithms where a projection is an invocable object that allows to trivially transform input data on the fly during the execution of the algorithm.
It would be interesting to conditionally support projections in cpp-sort. Just like comparison objects, projections support would be optional in an algorithm. I still know whether non-comparison sorts can support projections but I think they can't in any reliable way, so the idea would be that sorters supporting custom comparisons could also optionally support projections:
struct spam_sorter_impl
{
template<
typename Iterator,
typename Compare = std::less<>,
typename Projection = /* some identity projection */
>
auto operator()(Iterator first, Iterator last,
Compare comp={}, Projection proj={})
-> void
{
return spam_sort(first, last, comp, proj);
}
};
struct spam_sorter:
cppsort::sorter_facade<spam_sorter_impl>
{};
Is it a good idea? How to support and discriminate everything between Sorter
, Compare
and Projection
from cppsort::sort
? Looks like another dirty layer of SFINAE might solve the problem. Moreover, sorter_facade
should probably add default overloads of operator()
to handle the utility::identity
function object. Getting everything right looks like an exponential problem that concepts would only partly solve.
TODO list to get projection parameters into the library (see the projections
branch):
utility::identity
function object to the library.sorter_facade
to handle projection parameters.sorter_facade
generate overloads to handle utility::identity
out of the box even when the underlying algorithm doesn't handle projection parameters (the same way it currently handles std::less<>
).sorter_facade
accordingly.cppsort::sort
to handle projection parameters.bubble_sorter
tutorial to show how to handle projection parameters.The testsuite currently always randomizes the data to sort, which is arguably bad, but it sometimes helped to find bugs. However, some algorithms are known to depend on the data distribution, and some of them fall back to other sorting algorithms when the data doesn't have specific patterns, making it hard to test their specific behaviour.
The testsuite needs more tests to test the sorters with specific distributions to make sure that the algorithms are not only safe for random data.
The current fixed-size sorting_network_sorter
uses size-optimal sorting networks: sorting networks that use the smallest known number of comparators. However; other sorting networks may be interesting:
A first step would be to rename sorting_network_sorter
into size_optimal_network_sorter
and to introduce the new fixed-size depth_optimal_network_sorter
and swap_optimal_network_sorter
. However, the names are a bit long and typing them might be a bit tedious. An open question would be: while every sorter has an optimality criterion, what should be the second criterion? For example, size_optimal_network_sorter
uses size-optimal networks, but when there are several size-optimal networks, should it use the most depth-optimal one or the most swap-optimal one?
A more evolved solution would be to keep sorting_network_sorter
and give it an additional template parameter Optimality
corresponding to an enum struct
with the values size
, depth
and swaps
. Even more complete: make it take two parameters for optimality (variadic in a far future?) to choose the most important criterions for optimality. From a back-end point of view, there would be a collection of sorting networks with associated size
, depth
and swaps
properties and we could plug an insane template system to dispatch a call to the sorting network whose properties are the closest to those wanted (inb4 madness). Unfortunately, passing more template parameters to a fixed-size sorter means that it doesn't compose well with small_array_adapter
(how do you pass the non-size parameters? No matter how you see it it seems ugly), so a clean solution should be found to mitigate everything.
Now that issue #36 is closed, sorters implementers don't need to manually handle projections: sorter_facade
will aggregate a comparison and a projection function in a projection_compare
class template. While this is great per se, it actually raises new interesting questions: is it better to manually handle projections or to let projection_compare
handle them? A few numbers to take into account:
std::greater<>
and std::negate<>
at once, merge_sorter
was clearly faster when projection_compare
handled the projection than when the projection was manually handled. It probably has to do with the fact that mergesort is recursive and a lot of comparison and projection objects are passed down the call stack. std::greater<>
and std::negate<>
are empty classes, so projection_compare<std::greater<>, std::negate<>>
probably takes advantage of EBCO and we end up passing fewer objects around. Moreover, utility::as_function
is only applied once. I would expect the compiler to optimize the transformation away but maybe it actually has a cost. Update: apparently it is not the case anymore.quick_sorter
was globally faster when projections were handled manually (it's also a good sorter to perform such tests since it's recursive). It probably has to do with the fact that we only need to project a value only once before each partitioning operation when manually handling projections, while this very projection is done again and again when projection_compare
handles it.insertion_sorter
didn't seem to be better with manual projection handling, but wasn't better with automagic projection handling either.Now, before making a decision, several things need to be considered: we need to check how an algorithm performs when given nothing, a comparison, a projection, or both at once. If there is a clear winner, then go for it, otherwise we're probably fine letting projection_compare
handle the projections.
However, there are other creatures lurking in the dark and even more questions: what to do when an algorithm that performs better with automagic projections needs to call another algorithm that performs better with manual handling? Can we get the best of both?
I have a few ideas, but more tests are needed before a clear answer: projection_compare
only stores the transformed projection, but assuming that either the non-transformed or the transformed projection weights nothing (I believe it's often the case), then storing it too shouldn't cost anything more, and the non-transformed function could then be retrieved thanks to a getter. A good heuristic would be to create another class to pack the additional function, let's call it projection_compare_full
and make make_projection_compare
return a projection_compare_full
instance instead of a projection_compare
one if the additional information weights nothing.
Going further, we could then add overloads to sorter_facade::operator()
that take a projection_compare_full
instance and dispatch the comparison and the projection functions to the sorter implementation iff it manually handles projections (if it accepts a dedicated projection parameter). That way, we could call a sorter that performs better with a projection_compare
object but that can still call a another sorter that performs better with manual projection handling, while still getting the best of both.
Also, that design can be considerably simpler if as_function(as_function(projection))
is always guaranteed to be equivalent to as_function(projection)
. In such a case, we could just store as_function(projection)
in projection_compare
, returning it when a projection is needed, and totally get rid of projection_compare_full
.
It seems that as_function(as_function(projection))
always has the same type than as_function(projection)
, so we can just go ahead and already give accessors to projection_compare
and drop a part of the overly complicated design.
Apparently CMake 3.2+ is available in george-edison55-precise-backports which is whitelisted for the container-based infrastructure. It could be an interesting idea to use it to benefit from some of the recent additions such as the super-cool CMAKE_CXX_STANDARD
.
For some reason the changes work perfectly with GCC, but CMake isn't able to find clang++-3.8 anymore, which is a bit bothersome... [Update: solved]
To sort an iterable, one currently needs to pass a pair of iterators or an lvalue of a container type. However, there are more and more view and span classes around in popular libraries (GSL, Library Fundamentals TS, Ranges v3), and cpp-sort can't handle them in the following scenario:
int arr[] = { /* ... */ };
cppsort::sort(gsl::span<int>(arr + 5, arr + 10));
This example is a bit artificial, but it highlights the problem: one might want to create a span on-the-fly and feed it to a sorting function. While we would expect it should be legal and devoid of surprises, it currently does not compile because cpp-sort only handles container lvalues and rejects temporaries (which is normal considering it was designed to sort standard library containers).
Just like Ranges v3 and the latest iteration of the Ranges TS, we should be able to take spans and sort the corresponding iterable. That said, it will probably introduce new classes of error harder to spot since it will probably accept temporary standard library containers and try to sort them.
Now, the usual TODO list:
utility::begin
and utility::end
functions to handle rvalues.utility::size
to handle rvalues.sort
and stable_sort
.merge_sorter
and quick_sorter
.sorter_facade
.bubble_sorter
example.utility::begin
and utility::end
.It's a bit tedious to implement, but more low_comparisons_sorter
specializations could use the Ford-Johnson merge insertion sort algorithm to reduce the number of comparisons performed. A research paper from 2007 says it is still unbeaten under 47 elements, which leaves some room for new specializations.
The algorithm variant described by M Ayala-Rincón in 2006 is more space-efficient and apparently easier to implement. It should be preferred to the original version.
It would be great if there was some algorithm to generate the code, but I don't know whether it's really feasible. Anyway, it would probably be better that the current binary insertion sort used for almost every specialization.
Update: currently low_comparisons_sorter
performs an optimal number of comparisons for the sizes 0 thru 8. Apparently, merge insertion sort is only needed for some sizes and the recursive binary insertion using smaller low_comparisons_sorter
specialization is enough for some sizes.
When using hybrid_adapter
, the most influential parameter in the overload resolution is the iterator category. However, in some cases, it might be interesting to rebind the iterator category of a sorter to a more specific category. Let's say that we have two sorters foo_sorter
and bar_sorter
whose iterator category is std::forward_iterator_tag
; foo_sorter
is better for forward iterators than bar_sorter
but bar_sorter
performs tag dispatch and has a more efficient sort for bidirectional iterators. Therefore, we would like to use foo_sorter
for forward iterators but bar_sorter
for bidirectional iterators. However, since both of them have the same iterator category, the first one to appear in the hybrid_adapter
will be picked for both forward and bidirectional iterators. One solution would be to have an utility class to rebind the iterator category of a sorter on the fly:
using sorter = hybrid_adapter<
foo_sorter,
rebind_iterator_category<
bar_sorter,
std::bidirectional_iterator_tag
>
>;
Of course, the rebind_iterator_category
would have to statically check that the required iterator category is compatible with the existing one. A static_assert
might be fine for this use case.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.