@dbjorge commented on Tue Dec 06 2016
The documentation for IComparer
(@docs.microsoft.com, @msdn) and IComparer<T>
(@docs.microsoft.com, @msdn) do not note whether any particular ordering properties are expected of a valid implementation, but at least List<T>.Sort
assumes some (antisymmetry, per this Connect feedback circa .NET 4.5).
The interface documentation should note what sort of ordering properties the rest of the standard library will be assuming/demanding of a Compare
implementation. I'd expect this to be the usual properties of a total order.
@CIPop commented on Tue Dec 06 2016
cc: @rpetrusha, @AArnott
@karelz commented on Tue Dec 06 2016
cc: @mairaw
@JonHanna commented on Tue Dec 06 2016
Is that a responsibility of IComparer<T>
or of the caller of Sort
(or OrderBy
etc.). If one was going to implement an order that wasn't total (e.g. XmlSchema's rules on comparing datetimes without time zones, would doing so in an IComaprer<T>
be disallowed?
@dbjorge commented on Tue Dec 06 2016
I would consider it reasonable to either:
- Explicitly demand a well-defined set of ordering properties as a contract of the
IComparer
interfaces
- Explicitly disclaim requiring any particular ordering guarantees as an interface contract, but include an informational warning that
IComparer
use sites may require certain ordering properties (with examples).
I think either of those would be a big improvement over not documenting it either way. I could be convinced that the former would be too big a breaking change to be acceptable.
In either case, I think it would improve clarity to also document it on the individual IComparer
-based framework methods.
@JonHanna commented on Tue Dec 06 2016
I think the warning is a good idea.
I think the idea of documenting the relevant methods, even more so.
@mairaw commented on Tue Dec 06 2016
I own these docs so I'd be the one making the updates. Just let me know who I should be working with to figure out what's the guidance we want to provide there. /cc @karelz
@karelz commented on Wed Dec 07 2016
@dbjorge @JonHanna can you please suggest which text to change to what? (ideally mention the diff or old & new text) Thanks!
@JonHanna commented on Wed Dec 07 2016
Could an extra page be added describing total ordering? A concise statement linking to it would probably be clearer and with less text added to the individual pages affected.
@mairaw commented on Wed Dec 07 2016
@JonHanna yes, it's possible to add a separate article explaining the concepts and then link to it from the affected pages.
@JonHanna commented on Mon Jan 09 2017
Total Order
A total order is a binary relation that has the following properties:
- If
a <= b && b <= a
then a == b
- If
a <= b && b <= c
then a <= c
a <= b || b <= a
In other words:
- For any two elements x and y, either we can decide that x should be sorted before y, that y can be sorted before x, or that they are equivalent in sort order.
- For any three elements, x, y and z. If we would sort x before y and y before z we would also always sort x before z. The same holds in that if we would consider x to sort equivalently to y and y to sort equivalently to z we would always consider x to sort equivalently to z.
Exceptions to Total Order
It can perhaps be easier to think of total order by considering cases that don’t have total order.
Floating point operators.
Because double.NaN <= double.NaN
returns false
, double.NaN < 0
returns false and 0 < double.NaN
returns false, we cannot use the arithmetic operators to determine a total order on a set of floating point numbers that includes NaN
.
XML Schema Datetimes without Time Zones
XML Schema defines some datatypes that are only partially ordered. One example is dateTime
which represents a date and time without a timezone (comparable to DateTime
in .NET). By the rules of XML Schema, 2001-03-29T12:00:57 is always sorted before 2001-04-23T09:17:39 and 2001-04-23T11:31:09 is always sorted before 2001-04-29T23:09:10 but 2001-04-23T09:17:39 can be considered neither before nor after 2001-04-23T11:31:09.
Not knowing the time zone, it could be e.g. 2001-04-23T09:17:39 in Copenhagen compared to 2001-04-23T11:31:09 in Dublin (earlier), or 2001-04-23T09:17:39 in Montreal compared to 2001-04-23T11:31:09 in Bangalore (later).
To offer a dependable ordering within the international context XML Schema time-related data types consider, one dateTime
can only be considered as sorting before another if that holds for any of a possible range of timezones from -14:00 to +14:00 to have full confidence that the ordering would hold.
Importance or Total Order to Sort operations.
Many methods, especially sorting methods, that take an IComparer
or IComparer<T>
require that the Compare()
methods of those objects give a total order. Failure to do so can result in the results not being sorted at all, exceptions, or even infinite loops. Examples include List<T>.Sort()
, Array.Sort()
and Enumerable.OrderBy()
.
Other methods may have similar issues, such as those that find minima and maxima, such as Linq's Min()
and Max()
extension methods.
Providing a Total Order on Partially Ordered values.
If you want to sort values that are not normally totally ordered, you need to define a total ordering for it. For example, as mentioned above double
is not totally ordered because NaN
cannot be placed into any meaningful position by the <
, <=
, =
, >
and >=
operators. Comparer<double>.Default
, Enumerable.Min()
and Enumerable.Max()
all solve this problem by adding their own rule that sorts NaN
before any other value. This results in the three properties mentioned at the start of this article applying to the ordering.
An approach that works for many cases, is to add a tie-breaker rule onto a partial ordering. This is used here, with most comparisons being the same as with <
but the "NaN equal to NaN and before everything else" providing that tie-breaking rule that makes it a total order.
@karelz commented on Wed Dec 14 2016
@ianhays @Priya91 can you please review the text as area experts? (or pull in appropriate folks please)
@ianhays commented on Thu Dec 15 2016
@JonHanna that's a great doc on total order. 👍very concise and informative. The DateTime section is a bit verbose but explains the reasoning behind the exception to total order well.
@JonHanna commented on Thu Dec 15 2016
I split up the sentences and hopefully clarified a bit (sometimes attempts at concision end up reading more verbose than otherwise) and added a sentence at the end about using tie-breaking rules in creating total orders out of parial.
@JonHanna commented on Thu Dec 15 2016
Also fixed the copy-pasting in the example dates, as I had Montreal and Bangalore in time zones over a week apart 😁
@svick commented on Fri Dec 16 2016
@JonHanna
EqualityComparer<double>.Default
, Enumerable.Min()
and Enumerable.Max()
all solve this problem by adding their own rule that sorts NaN
before any other value.
Did you mean Comparer<double>.Default
? EqualityComparer
does not define ordering.
@JonHanna commented on Fri Dec 16 2016
@svick I did indeed. Fixed.
@ianhays commented on Mon Feb 13 2017
@mairaw did you get a chance to take a look at this?
@mairaw commented on Mon Feb 13 2017
Not yet @ianhays. I'm completely swamped at the moment with the .NET Core release.
@ianhays commented on Mon Feb 13 2017
Gotcha, thanks for the update :)