Comments (25)
I wont have time to look into this any time soon, but patches are welcome. :)
from containers.
I second that. Patches are welcome :-)
from containers.
What's the story with all those #if __GLASGOW_HASKELL__ >= ...
? Is that preferred over #if MIN_VERSION_BASE(...)
(although I saw that as well)?
from containers.
I think MIN_VERSION_base
is more correct.
from containers.
The members of Foldable
class are probably dependent on base
version, not on GHC
version (even in little people use different base versions) -- would it be possible to change the __GLASGOW_HASKELL__
to __MIN_VERSION_base__
? We use __MIN_VERSION_base__
for example in IntSet/Base.hs
, in the following way:
-- We want to be able to compile without cabal. Nevertheless
-- #if defined(MIN_VERSION_base) && MIN_VERSION_base(4,5,0)
-- does not work, because if MIN_VERSION_base is undefined,
-- the last condition is syntactically wrong.
#define MIN_VERSION_base_4_5_0 0
#ifdef MIN_VERSION_base
#if MIN_VERSION_base(4,5,0)
#undef MIN_VERSION_base_4_5_0
#define MIN_VERSION_base_4_5_0 1
#endif
#endif
Also, could you add {-# INLINE #-}
pragma for the new class methods? We have it for the other Foldable methods so that when the Foldable
dictionary is unknown, the method is replaced at compile time.
from containers.
I can do the __MIN_VERSION_base__
change, but that would effectively mean adding significant amount of CPP boilerplate in additional modules for the base-4.6 and base-4.8 boundaries. Can't we just use a commonly included .h
file, and then simply just define a MIN_VERSION_base(x,y,z)
macro manually there?
To be more specific:
#ifndef MIN_VERSION_base
#include "../../macros.h"
#endif
whereas macros.h
would construct a proper MIN_VERSION_base(a,b,c)
macro based on the __GLASGOW_HASKELL__
value.
As for the INLINE
, I'm happy to do that if it's useful, but I'm totally confused about
We have it for the other Foldable methods so that when the Foldable dictionary is unknown, the method is replaced at compile time.
Did you really mean "unknown"? Why doesn't GHC automatically inline such a trivial function? can you give a simple code example where I can see the effect you're describing?
from containers.
Can't we just use a commonly included .h file, and then simply just define a MIN_VERSION_base(x,y,z) macro manually there?
Yes, that would probably be best. The base_version.h
would look something like
#ifndef MIN_VERSION_base
#ifdef __GLASGOW_HASKELL__
... try defining MIN_VERSION_base according to __GLASGOW_HASKELL__
#else
... define MIN_VERSION_base to be 0 for every version (or maybe return 1 for minimal base version specified in cabal)
#endif
And then do #include base_version.h
every needed module (no need to include it conditionally). BTW, we have already include-dir: include
in the cabal file.
(When you are at it, could you please also use this file in IntSet/Base
and remove the MIN_VERSION_base definitions from that file? Thanks.)
Did you really mean "unknown"? Why doesn't GHC automatically inline such a trivial function? can you give a simple code example where I can see the effect you're describing?
Sorry, I was rewriting the sentence and left the negation there. What I really ment was known. I understand that GHC may inline it automatically but I would still prefer to explicitly spell out the {-# INLINE #-}
pragma -- one of the issues I am not sure about is what will happen if the right side is itself {-# INLINE #-}
(for example foldl
or foldl'
) -- then it may look like non-trivial function, but we still want to inline it.
from containers.
Is it really worth being able to compile without cabal? If you really want to you can use
ghc -O2 Data/Map -optP-include -optPdist/build/autogen/cabal_macros.h
and only run cabal once to generate that file.
(Although, if this makes the containers package not work with the Emacs haskell-mode and/or ghc-mod, that would be a shame.)
from containers.
@tibbe, Fwiw, ghc-mod
mentions explicit support for cabal_macros.h
in its changelog, and haskell-mode
does support Cabal via cabal repl
from containers.
I'll leave it up to @foxik.
from containers.
I frequently load the containers files directly to ghci, so I would be happy if I could continue doing that.
But maybe it would be enough just to add
#ifndef MIN_VERSION_base
#define MIN_VERSION_base(x,y,z) 0
#endif
which I can do myself.
So just use MIN_VERSION_base(x,y,z)
please and I will add and comment these no-min-version-base-what-should-I-do-oh-my-oh-my defaults.
from containers.
ignore that updated commit, it's still missing the INLINE
s
from containers.
It is a shame we cannot implement logarithmic elem
for Set
. Nevertheless, is there any reason why we could not implement minimum
and maximum
for Map
and IntMap
?
Also, could we implement sum
and product
using foldl'
(instead of foldMap
which is used in the default definition)?
Moreover, even if we cannot implement logarithmic elem
, maybe we could implement optimized linear version for Set
.
Thoughts?
from containers.
- As for
elem
onSet
: we'd need to capture theOrd
dict. It's possible but I'm not sure if it's worth it. minimum
/maximum
onMap
/IntMap
does not refer to the keys but rather the valuessum
/product
: I think you're right; we should usefoldl'
/foldr'
instead (I wonder why the default impl usesfoldMap
from containers.
@hvr: The default definition is set up to maximally avoid bottoms. You have to fear infinite-snoc-list-like structures when infinitely left associating trees. You don't have that problem here, so going foldl
or foldr
works fine -- all maps are finite.
Strictifying sum
is one of those design decisions that putting it in the class leaves open to implementors.
Changing sum
to be strict by default everywhere would be a big jump and ran into resistance last time it was suggested as I recall, but folks can chip away at it a project at a time. ;)
As a thought exercise, if not a practical approach, there /is/ an option to get a log time elem
for Set, but you'd probably have to make every other operation pay a tax.
Basically what you'd need to do is differentiate the root of the Set
, so we can know it is either
data Set a = Empty | NonEmpty (Set' a)
And in the NonEmpty case you'd store
data Set a = Empty | NonEmpty (a -> a -> Ordering) (Set' a)
or
data Set a where
Empty :: Set a
NonEmpty :: Ordering a => Set' a -> Set a
and have the existing Set (probably with the Empty case stripped out) moved to Set'
. The problem is removing the Empty case as i recall might require you to change the way some of the tree algorithms fold back up, and probably incurs a non-trivial cost.
Once you have that it is actually possible to do the elem
search in log time, because you have access to the dictionary.
The cost is constant factors everywhere and that a couple of mapMonotone like functions need to pick up a previously missing Ord
constraint.
from containers.
This does not look like a clean approach. I missed the missing-Ord
complication when I suggested adding elem
to the Foldable
class. I think that was an error on my part. elem
should probably just be a function (the cases where it can be optimized cleanly seem ... unlikely). What the Foldable
class should really have are methods elemOrd :: Ord a => a -> f a -> Bool
and (perhaps, with difficulty) elemHash :: Hashable a => a -> f a -> Bool
.
from containers.
I don't think that having elem<Class> :: Class a => a -> f a -> Bool
for each class(combination) would scale. Also it would leak implementation details making it more difficult to write generic code, as you can't know whether elemHash
or elemOrd
will be more efficient.
from containers.
@ekmett I assume you pull the Empty
out in
data Set a where
Empty :: Set a
NonEmpty :: Ordering a => Set' a -> Set a
in order to accommodate the
empty :: Set a
constructor w/o having to add an Ord a =>
constraint there. But what about the existing
singleton :: a -> Set a
constructor?
from containers.
Ah. In the version of this I built I had that one separated as well. That is what I get for doing this from memory. =)
from containers.
This is a general issue -- for all containers, we pass the dictionary to every function, instead of storing it in the structure. If we stored it in the structure, it would also allow us to use user-specific instance (i.e., user specified hash
or reversed ordering for toList
return decreasing list).
We might face this issue again in the future when we have Modifiable
or Insertible
or similar.
Nevertheless, if we changed it for Set
, we should maybe change it for all containers, which could result in quite different API :-)
BTW, I am thinking for some time that we should separate the singleton case for Set
and Map
. I have some benchmarks showing both memory reduction (20-30%) and speedup (5% for most operations, 10% for folds). But the code gets more complicated, especially in union
-like methods and in rebalancing, where we decompose two trees, so we have up to 9 instead of 4 cases.
BTW2, we could create specialized versions of Foldable.elem
which would kick in if the Foldable
dictionary is known. I am not that familiar with rules and specialization, so I am not sure if it is possible to say for everything with known Ord instance do this or if we had to explicitly spell out basic types like Int
etc.
@hvr Oh, thanks for the minimum
/maximum
, I did not realize it. I believe we should create specialized versions using foldl'
for all these methods (including elem
), but I can do it myself if you want.
from containers.
I don't think we should worry so much about elem
. It's a very uninteresting method in Foldable
as it cannot be guaranteed to be better than O(n). We ought to have a FiniteMap
or similar class which contains an elem
that can be constrained by some class constraint (using constraint kinds) and have map-like types implement that type class.
Nevertheless, if we changed it for Set, we should maybe change it for all containers, which could result in quite different API :-)
Also in quite different performance as GHC will no longer be able to do call-site specialization (i.e. INLINABLE
specialization.)
from containers.
Oh, thanks for the minimum/maximum, I did not realize it. I believe we should create specialized versions using foldl' for all these methods (including elem), but I can do it myself if you want.
That'd be great; ...so are you taking it from here?
from containers.
@hvr Yes, I am. But please do not remove your commit until I merge it and close this issue :-)
from containers.
I got myself confused last night. Now I realize that the problem implementing elem
for Set
is entirely the fault of Set
(well, the fault of Haskell 98, really, but in a GHC context, Set
), and Set
should bear the blame for that problem. In the GHC world, it would be much better to give Set
a more constrained type with a GADT.
from containers.
Done. I merged the original proposal and added specializations to other base-4.8 foldable methods. The elem
could be probably implemented using Foldable.fold
, but I spelled out the specialization.
from containers.
Related Issues (20)
- Set difference and union in one HOT 4
- foldTree does not optimize well HOT 2
- Tree fusion HOT 16
- Fusible Set.fromDistinctAscList definition HOT 10
- Fusible IntSet.fromDistinctAscList definition HOT 3
- NonEmpty for CyclicSCC HOT 11
- better instance Hashable IntSet? HOT 8
- Unusual definition of foldrBits and foldlBits HOT 3
- Unnecessary CPP and C header in `Data.Map.Internal.Debug.html`?
- Release for GHC 9.8.1 HOT 17
- feat request: Add `popLeftWithValue` and `popWithValue` in `Data.Sequence` HOT 5
- Data.Graph: detect cycles utility functions HOT 2
- Data.Map.mergeWithKey doesn't match documentation
- Flag to introduce pedantic invariant checks? HOT 2
- Map.unionWith is over specialized and not consistent with intersectionWith HOT 10
- Add `flattenSCC1 : SCC vertex -> Data.List.NonEmpty.NonEmpty vertex` HOT 2
- Data.Map.Internal does not export insertMin
- Repo: remove merged branches?
- Contribution guide outdated?
- Potential memory optimization for IntMap and IntSet HOT 8
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from containers.