nvictus / pqdict Goto Github PK

View Code? Open in Web Editor NEW

81.0 2.0 27.0 986 KB

A Pythonic indexed priority queue

License: MIT License

Python 98.94% Makefile 1.06%

priority-queue dictionary indexed python sorted-dict

pqdict's People

Contributors

Stargazers

Watchers

pqdict's Issues

pqdict.copy() causes slow/un-necessary heapsort of all the data.

Hey,

I noticed that when I use pqdict.copy() to copy over a pqdict, it's abnormally slow.

I was able to track it down to the fact that copy() calls __init__, and it does this:

        if data is not None:
            self.update(data)
        self.heapify()

Which re-heapify everything, causing it to go through the whole data structure. But we already have a heapified pqdict, so it's not necessary.

Instead you should just copy over the state (_heap and _position).

Here is some benchmarking/test script I created to show the problem:

import pqdict
import random
import timeit
import copy

def get_dict():
    a = pqdict.pqdict()
    for i in range(10_000):
        a[random.random()] = random.random()
    return a

def copy_dict(a):
    return a.copy()

def copy_dict_test(a):
    new_dict = pqdict.pqdict()
    new_dict._heap = a._heap[:]
    new_dict._position = a._position.copy()
    return new_dict

def consume_dict(a):
    while a:
        assert a.popitem()[1]

a = get_dict()

print("Time it takes to copy using pqdict.copy():")
print(timeit.timeit(lambda: copy_dict(a), number=100))
print("Time it takes to copy by copying _heap and _position:")
print(timeit.timeit(lambda: copy_dict_test(a), number=100))

print("Time it takes to create the pqdict:")
print(timeit.timeit(lambda: get_dict(), number=100))
print("Time it takes to consume the pqdict after pqdict.copy():")
print(timeit.timeit(lambda: consume_dict(copy_dict(a)), number=100))
print("Time it takes to consume the pqdict after copy of _heap and _position:")
print(timeit.timeit(lambda: consume_dict(copy_dict_test(a)), number=100))

Would it be possible to re-implement the copy() method? I can make a PR if you would like.

Position of element in queue

I'm curious if you have any ideas of the best way of finding the relative position of an element in the queue that you wrote here.

A naive way would be to use the function nlargest, iterate through that list and keep track of the position as you iterate through the list. This would be problematic if n items is big. Can you think of better approach?

Squelch a 3.7+ Decprecation warning (and breakage in 3.9)

I really like this package! When I use it from 3.7+ I get the Decprecation message:

 ...site-packages/pqdict/__init__.py:37: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
    from collections import MutableMapping as _MutableMapping

I'll put in a PR for this soon.

Add typing support?

Hi,

Thanks a lot for pqdict! It's very helpful!

I'm using mypy in strict mode, and created this stub file. Would you like to add it?

I guess that an even better exercise would be to add typing, but then you'd lose support for python 2.

Thanks!
Noam

from typing import TypeVar, Callable, Any, Iterable, Self, Iterator, Mapping
from collections.abc import MutableMapping

K = TypeVar("K")
V = TypeVar("V")

class pqdict(MutableMapping[K, V]):
    def __init__(
        self,
        data: Any = ...,
        key: Callable[[V], Any] | None = ...,
        reverse: bool = ...,
        precedes: Callable[[K, K], Any] = ...,
    ) -> None: ...
    @property
    def precedes(self) -> Callable[[K, K], Any]: ...
    @property
    def keyfn(self) -> Callable[[V], Any] | None: ...
    @classmethod
    def fromkeys(cls, iterable: Iterable[Any], value: Any, **kwargs: Any) -> Self: ...
    def __len__(self) -> int: ...
    def __contains__(self, key: object) -> bool: ...
    def __iter__(self) -> Iterator[K]: ...
    def __getitem__(self, key: K) -> V: ...
    def __setitem__(self, key: K, value: V) -> None: ...
    def __delitem__(self, key: K) -> None: ...
    def copy(self) -> Self: ...
    def top(self) -> K: ...
    def popitem(self) -> tuple[K, V]: ...
    def topitem(self) -> tuple[K, V]: ...
    def additem(self, key: K, value: V) -> None: ...
    def pushpopitem(self, key: K, value: V) -> tuple[K, V]: ...
    def updateitem(self, key: K, new_val: V) -> None: ...
    def replace_key(self, key: K, new_key: K) -> None: ...
    def swap_priority(self, key1: K, key2: K) -> None: ...
    def popkeys(self) -> Iterator[K]: ...
    def popvalues(self) -> Iterator[V]: ...
    def popitems(self) -> Iterator[tuple[K, V]]: ...
    def heapify(self, key: K = ...) -> None: ...
    def topvalue(self, default: V = ...) -> V: ...

PQDict = pqdict

def minpq(*args: Any, **kwargs: Any) -> pqdict[Any, Any]: ...
def maxpq(*args: Any, **kwargs: Any) -> pqdict[Any, Any]: ...
def nlargest(n: int, mapping: Mapping[K, V], key: Callable[[V], Any] | None = ...) -> list[K]: ...
def nsmallest(n: int, mapping: Mapping[K, V], key: Callable[[V], Any] | None = ...) -> list[K]: ...

Key as Priority Element Instead of Value

Is it possible that in the 'key' parameter, I specify the keys of the dictionary as the priority function?

Example:
a_dict = {'b': 1, 'a': 3 , 'c': 2}

Instead of:
my_pqdict = {'b': 1, 'c': 2, 'a': 3} ... ordered by value

I obtain this:
my_pqdict = {'a': 3, 'b': 1, 'c': 2} ... ordered by key

It would really be helpful if the priority queue dictionary can do so.

Suggestion: Drop support for python versions < 3.7

The CI only covers versions 3.8 and above, and 3.6 reached EOL a while ago.
I didn't actually go through the code but I noticed that the package has the classifiers for the old versions.

Adding a function topvalue() in pqdict

Hi,

Thanks for creating this simple yet useful library.

While using the lib, I find that sometimes we need to find the value of pq[pq.top()] or pq.topitem()[1] where pq
is non-empty maxpq.

In general, I think, It will be useful to add a function .topvalue() in pqdict API.

Let me know your thoughts, and if you are okay, I will create a PR for this.

Thanks

Tips for Thread Safety?

Hi, I'm considering using this data struct for a project. The exposed interfaces (heap queue + dictionary access) and implementation are appealing, but it looks like it may not be thread-safe.

Any suggestions on what it would take to implement thread-safe behavior for the pqdict object? I plan to have mostly multiple writers at any given time. I might have to introduce a lock variable and add it to the main methods in the class whenever the items are added or removed.

Getting error : ValueError: 'create_entry' in slots conflicts with class variable.

Traceback:

Traceback (most recent call last):
File " ", line 4, in
from pqdict import PQDict
File " ", line 98, in
class PQDict(MutableMapping):
File " ", line 85, in new
cls = super().new(mcls, name, bases, namespace, **kwargs)
ValueError: 'create_entry' in slots conflicts with class variable

Massive test failures when used from Python 3.8

I am getting these failures when used from Python 3.8.3:

$ nosetests test
EE......E.FE.E.EE.E.EEEEEEEEEEEEE.EEEEEEEEEEEEEEEEEEEE.EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE.EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE.EEEEEEEEEEEEEEEEEE.....SSSSSSSSSSSSSSSS.....................................S..............  C-c C-cS
======================================================================
ERROR: test.support.testresult.get_test_runner_class
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/rocky/.pyenv/versions/3.8.3/lib/python3.8/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
TypeError: get_test_runner_class() missing 1 required positional argument: 'verbosity'

======================================================================
ERROR: test.support.testresult.get_test_runner
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/rocky/.pyenv/versions/3.8.3/lib/python3.8/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
TypeError: get_test_runner() missing 2 required positional arguments: 'stream' and 'verbosity'

======================================================================
ERROR: test.audit-tests.test_open
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/rocky/.pyenv/versions/3.8.3/lib/python3.8/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/rocky/.pyenv/versions/3.8.3/lib/python3.8/test/audit-tests.py", line 201, in test_open
    (open, sys.argv[2], "r"),
IndexError: list index out of range
-------------------- >> begin captured stdout << ---------------------
sys.addaudithook 139704741433248
... [hundreds of other lines]

I note however that when i use this package from 3.8.3 things work fine.

If you'd like me to investigate and suggest a fix/PR, let me know.

Missing priority key function argument in nsmallest/nlargest functions

It seems to me we need to have "priority key function" argument in nsmallest/nlargest functions.

nvictus / pqdict Goto Github PK

pqdict's People

Contributors

Stargazers

Watchers

Forkers

pqdict's Issues

pqdict.copy() causes slow/un-necessary heapsort of all the data.

Position of element in queue

Squelch a 3.7+ Decprecation warning (and breakage in 3.9)

Add typing support?

Key as Priority Element Instead of Value

Suggestion: Drop support for python versions < 3.7

Adding a function topvalue() in pqdict

Tips for Thread Safety?

Getting error : ValueError: 'create_entry' in slots conflicts with class variable.

Massive test failures when used from Python 3.8

Missing priority key function argument in nsmallest/nlargest functions

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent