nvictus / pqdict Goto Github PK
View Code? Open in Web Editor NEWA Pythonic indexed priority queue
License: MIT License
A Pythonic indexed priority queue
License: MIT License
Hey,
I noticed that when I use pqdict.copy()
to copy over a pqdict
, it's abnormally slow.
I was able to track it down to the fact that copy()
calls __init__
, and it does this:
if data is not None:
self.update(data)
self.heapify()
Which re-heapify everything, causing it to go through the whole data structure. But we already have a heapified pqdict, so it's not necessary.
Instead you should just copy over the state (_heap
and _position
).
Here is some benchmarking/test script I created to show the problem:
import pqdict
import random
import timeit
import copy
def get_dict():
a = pqdict.pqdict()
for i in range(10_000):
a[random.random()] = random.random()
return a
def copy_dict(a):
return a.copy()
def copy_dict_test(a):
new_dict = pqdict.pqdict()
new_dict._heap = a._heap[:]
new_dict._position = a._position.copy()
return new_dict
def consume_dict(a):
while a:
assert a.popitem()[1]
a = get_dict()
print("Time it takes to copy using pqdict.copy():")
print(timeit.timeit(lambda: copy_dict(a), number=100))
print("Time it takes to copy by copying _heap and _position:")
print(timeit.timeit(lambda: copy_dict_test(a), number=100))
print("Time it takes to create the pqdict:")
print(timeit.timeit(lambda: get_dict(), number=100))
print("Time it takes to consume the pqdict after pqdict.copy():")
print(timeit.timeit(lambda: consume_dict(copy_dict(a)), number=100))
print("Time it takes to consume the pqdict after copy of _heap and _position:")
print(timeit.timeit(lambda: consume_dict(copy_dict_test(a)), number=100))
Would it be possible to re-implement the copy()
method? I can make a PR if you would like.
I'm curious if you have any ideas of the best way of finding the relative position of an element in the queue that you wrote here.
A naive way would be to use the function nlargest, iterate through that list and keep track of the position as you iterate through the list. This would be problematic if n items is big. Can you think of better approach?
I really like this package! When I use it from 3.7+ I get the Decprecation message:
...site-packages/pqdict/__init__.py:37: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
from collections import MutableMapping as _MutableMapping
I'll put in a PR for this soon.
Hi,
Thanks a lot for pqdict! It's very helpful!
I'm using mypy in strict mode, and created this stub file. Would you like to add it?
I guess that an even better exercise would be to add typing, but then you'd lose support for python 2.
Thanks!
Noam
from typing import TypeVar, Callable, Any, Iterable, Self, Iterator, Mapping
from collections.abc import MutableMapping
K = TypeVar("K")
V = TypeVar("V")
class pqdict(MutableMapping[K, V]):
def __init__(
self,
data: Any = ...,
key: Callable[[V], Any] | None = ...,
reverse: bool = ...,
precedes: Callable[[K, K], Any] = ...,
) -> None: ...
@property
def precedes(self) -> Callable[[K, K], Any]: ...
@property
def keyfn(self) -> Callable[[V], Any] | None: ...
@classmethod
def fromkeys(cls, iterable: Iterable[Any], value: Any, **kwargs: Any) -> Self: ...
def __len__(self) -> int: ...
def __contains__(self, key: object) -> bool: ...
def __iter__(self) -> Iterator[K]: ...
def __getitem__(self, key: K) -> V: ...
def __setitem__(self, key: K, value: V) -> None: ...
def __delitem__(self, key: K) -> None: ...
def copy(self) -> Self: ...
def top(self) -> K: ...
def popitem(self) -> tuple[K, V]: ...
def topitem(self) -> tuple[K, V]: ...
def additem(self, key: K, value: V) -> None: ...
def pushpopitem(self, key: K, value: V) -> tuple[K, V]: ...
def updateitem(self, key: K, new_val: V) -> None: ...
def replace_key(self, key: K, new_key: K) -> None: ...
def swap_priority(self, key1: K, key2: K) -> None: ...
def popkeys(self) -> Iterator[K]: ...
def popvalues(self) -> Iterator[V]: ...
def popitems(self) -> Iterator[tuple[K, V]]: ...
def heapify(self, key: K = ...) -> None: ...
def topvalue(self, default: V = ...) -> V: ...
PQDict = pqdict
def minpq(*args: Any, **kwargs: Any) -> pqdict[Any, Any]: ...
def maxpq(*args: Any, **kwargs: Any) -> pqdict[Any, Any]: ...
def nlargest(n: int, mapping: Mapping[K, V], key: Callable[[V], Any] | None = ...) -> list[K]: ...
def nsmallest(n: int, mapping: Mapping[K, V], key: Callable[[V], Any] | None = ...) -> list[K]: ...
Is it possible that in the 'key' parameter, I specify the keys of the dictionary as the priority function?
Example:
a_dict = {'b': 1, 'a': 3 , 'c': 2}
Instead of:
my_pqdict = {'b': 1, 'c': 2, 'a': 3} ... ordered by value
I obtain this:
my_pqdict = {'a': 3, 'b': 1, 'c': 2} ... ordered by key
It would really be helpful if the priority queue dictionary can do so.
The CI only covers versions 3.8 and above, and 3.6 reached EOL a while ago.
I didn't actually go through the code but I noticed that the package has the classifiers for the old versions.
Hi,
Thanks for creating this simple yet useful library.
While using the lib, I find that sometimes we need to find the value of pq[pq.top()]
or pq.topitem()[1]
where pq
is non-empty maxpq
.
In general, I think, It will be useful to add a function .topvalue()
in pqdict
API.
Let me know your thoughts, and if you are okay, I will create a PR for this.
Thanks
Hi, I'm considering using this data struct for a project. The exposed interfaces (heap queue + dictionary access) and implementation are appealing, but it looks like it may not be thread-safe.
Any suggestions on what it would take to implement thread-safe behavior for the pqdict object? I plan to have mostly multiple writers at any given time. I might have to introduce a lock variable and add it to the main methods in the class whenever the items are added or removed.
Traceback:
Traceback (most recent call last):
File " ", line 4, in
from pqdict import PQDict
File " ", line 98, in
class PQDict(MutableMapping):
File " ", line 85, in new
cls = super().new(mcls, name, bases, namespace, **kwargs)
ValueError: 'create_entry' in slots conflicts with class variable
I am getting these failures when used from Python 3.8.3:
$ nosetests test
EE......E.FE.E.EE.E.EEEEEEEEEEEEE.EEEEEEEEEEEEEEEEEEEE.EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE.EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE.EEEEEEEEEEEEEEEEEE.....SSSSSSSSSSSSSSSS.....................................S.............. C-c C-cS
======================================================================
ERROR: test.support.testresult.get_test_runner_class
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/rocky/.pyenv/versions/3.8.3/lib/python3.8/site-packages/nose/case.py", line 198, in runTest
self.test(*self.arg)
TypeError: get_test_runner_class() missing 1 required positional argument: 'verbosity'
======================================================================
ERROR: test.support.testresult.get_test_runner
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/rocky/.pyenv/versions/3.8.3/lib/python3.8/site-packages/nose/case.py", line 198, in runTest
self.test(*self.arg)
TypeError: get_test_runner() missing 2 required positional arguments: 'stream' and 'verbosity'
======================================================================
ERROR: test.audit-tests.test_open
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/rocky/.pyenv/versions/3.8.3/lib/python3.8/site-packages/nose/case.py", line 198, in runTest
self.test(*self.arg)
File "/home/rocky/.pyenv/versions/3.8.3/lib/python3.8/test/audit-tests.py", line 201, in test_open
(open, sys.argv[2], "r"),
IndexError: list index out of range
-------------------- >> begin captured stdout << ---------------------
sys.addaudithook 139704741433248
... [hundreds of other lines]
I note however that when i use this package from 3.8.3 things work fine.
If you'd like me to investigate and suggest a fix/PR, let me know.
It seems to me we need to have "priority key function" argument in nsmallest
/nlargest
functions.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.