Code Monkey home page Code Monkey logo

pecl-teds's Introduction

Teds

Introduction

Build Status Build Status (Windows)

Teds is a another collection of data structures. (Tentative Extra Data Structures)

Installation

This extension requires php 8.0 or newer.

phpize
./configure
make install

On Windows, see https://wiki.php.net/internals/windows/stepbystepbuild_sdk_2 instead

Overview

Teds contains the following types of Teds\Collection instances as well as various iterable functionality:

Teds\Sequence - A Collection of values with keys 0..n-1

Teds\Sequence is an interface representing a Collection of values with keys 0..n-1 and no gaps. (also known as a list in other languages) It is implemented by the following classes.

  • Teds\Vector - a memory-efficient representation of a list of values that can easily grow/shrink.
  • Teds\Deque - a memory-efficient representation of a list of values with amortized constant time push/pop/pushFront/popFront.

Teds also includes several specializations of Teds\Sequence for memory-efficiency:

  • Teds\IntVector - a memory-efficient specialization of lists of integers, typically with faster serialization requiring less memory.
  • Teds\LowMemoryVector - exposes the same API as Teds\Vector but uses less memory for specializations (exclusively bool/null, exclusively integers, exclusively floats)
  • Teds\BitVector - exposes the same API as Teds\Vector but uses significantly less memory for booleans. Provides ways to read/write raw bytes/words.

And immutable versions:

  • Teds\ImmutableSequence - represents an immutable sequence.
  • Teds\EmptySequence::INSTANCE (PHP 8.1+, uses enum) - represents an immutable empty sequence.

Teds\Set - a Collection of unique values

The Teds\Set interface is implemented by the following classes:

  • Teds\StrictHashSet - a hash table based set of unique values providing efficient average-time lookup/insertion/removal. Uses Teds\strict_hash.
  • Teds\StrictTreeSet - a binary tree-based set of sorted unique values providing good worst-case time. Uses Teds\stable_compare for stable ordering.
  • Teds\StrictSortedVectorSet - provides a similar API to Teds\StrictTreeSet but is represented internally as a Vector. This has reduced memory usage and faster construction time, and can be useful in cases where modification is infrequent (e.g. more common to unserialize without modifying)

Low-memory Sets

Some specializations are provided for reduced memory usage:

  • Teds\SortedIntVectorSet - a mutable sorted set of integers, represented internally as a Vector. This has reduced memory usage and can be useful in cases where modification is infrequent (e.g. more common to unserialize without modifying)

Teds also provides immutable specializations of common data types offering faster serialization/unserialization:

  • Teds\ImmutableSortedStringSet - a sorted set of strings sorted by strcmp. Note that internally, this is backed by a single string with the data that would be used for __unserialize/__serialize.

    As a result, this is faster to unserialize and uses less memory than an array (see benchmark), but when iterating over the string, temporary copies of the strings that are members of the set are returned.

  • Teds\ImmutableSortedIntSet - a sorted set of integers.

  • Teds\EmptySet::INSTANCE (PHP 8.1+, uses enum) - represents an immutable empty set.

Teds\Map - a Collection mapping unique keys to values

The Teds\Map interface is implemented by the following classes:

  • Teds\StrictHashMap - a hash table based map of unique keys to values providing efficient average-time lookup/insertion/removal. Uses Teds\strict_hash.
  • Teds\StrictTreeMap - a binary tree-based map of sorted unique keys to values providing slower average-case time but good worst-case time. Uses Teds\stable_compare for stable ordering.
  • Teds\StrictSortedVectorMap - provides a similar API to Teds\StrictTreeMap but is represented internally like a Vector. This has reduced memory usage and faster construction time and can be useful in cases where modification is infrequent (e.g. more common to unserialize without modifying)

Teds also provides the following map types:

  • Teds\EmptyMap::INSTANCE (PHP 8.1+, uses enum) - represents an immutable empty map.

Teds\Collections - others

  • Teds\StrictMinHeap and Teds\StrictMaxHeap are heaps where elements can be added (allowing duplicates) and are removed in order (or reverse order) of Teds\stable_compare.
  • Teds\CachedIterable is a Collection that lazily evaluates iterables such as Generators and stores the exact keys and values to allow future retrieval and iteration.
  • Teds\ImmutableIterable is a Collection that eagerly evaluates iterables such as Generators and stores the exact keys and values to allow future retrieval and iteration.
  • Teds\MutableIterable is a Collection that allows construction and modification of iterables, setting keys and values and allowing duplicate keys in any order.

Functionality

Teds\ImmutableIterable

Teds\ImmutableIterable API

Currently, PHP does not provide a built-in way to store the state of an arbitrary iterable for reuse later (when the iterable has arbitrary keys, or when keys might be repeated). There are use cases for this, such as:

  1. Creating a rewindable copy of a non-rewindable Traversable (e.g. a Generator) before passing that copy to a function that consumes an iterable/Traversable. (new ImmutableIterable(my_generator()))
  2. Generating an IteratorAggregate from a class still implementing Iterator (e.g. SplObjectStorage) so that code can independently iterate over the key-value sequences. (e.g. foreach ($immutableImmutableIterable as $k1 => $v1) { foreach ($immutableImmutableIterable as $k2 => $v2) { /* process pairs */ } })
  3. Providing helpers such as iterable_flip(iterable $iterable), iterable_take(iterable $iterable, int $limit), iterable_chunk(iterable $iterable, int $chunk_size) that act on iterables with arbitrary key/value sequences and have return values including iterables with arbitrary key/value sequences
  4. Providing constant time access to both keys and values of arbitrary key-value sequences at any offset (for binary searching on keys and/or values, etc.)

Having this implemented as a native class allows it to be much more efficient than a userland solution (in terms of time to create, time to iterate over the result, and total memory usage, depending on the representation).

Objects within this data structure or references in arrays in this data structure can still be mutated.

Traversables are eagerly iterated over in the constructor.

Teds\CachedIterable

Teds\CachedIterable API

This is similar to Teds\ImmutableIterable but lazily evaluates Traversables instead of eagerly evaluating Traversables. (E.g. Generators will only run until the last offset used from a CachedIterable. See tests/CachedIterable/lazy.phpt and tests/CachedIterable/selfReferential.phpt for examples.)

This can be used to cache results of generators without fetching more elements than needed (e.g. Generators that repeatedly call databases or services to paginate).

Teds\ImmutableSequence

Teds\ImmutableSequence API

Similar to SplFixedArray or Ds\Sequence, but immutable. This stores a sequence of values with the keys 0, 1, 2....

Teds\LowMemoryVector

Teds\LowMemoryVector API

This exposes the same API as a Vector, but with a more memory-efficient representation if the LowMemoryVector has only ever included a type that it could optimize.

Benefits:

  • For collections of exclusively int, exclusively floats, or exclusively null/true/false, this uses less memory when serialized compared to vectors/arrays.

  • For collections of exclusively int, exclusively floats, or exclusively null/true/false, this uses less memory when serialized compared to vectors/arrays.

    Note that adding other types will make this use as much memory as a Vector, e.g. adding any non-float (including int) to a collection of floats.

  • Has faster checks for contains/indexOf (if values can have an optimized representation)

  • Has faster garbage collection (if values can have an optimized representation due to int/float/bool/null not needing reference counting).

  • Interchangeable with Vector or other collections without configuration - this will silently fall back to the default mixed representation if a more efficient representation is not supported.

Drawbacks:

  • Slightly more overhead when types aren't specialized.
  • Adding a different type to the collection will permanently make it used the less efficient mixed representation.

In 64-bit builds, the following types are supported, with the following amounts of memory (plus constant overhead to represent the LowMemoryVector itself, and extra capacity for growing the LowMemoryVector):

  1. null/bool : 1 byte per value.

    (In serialize(), this is even less, adding 1 to 2 bits per value (2 bits if null is included)).

  2. signed 8-bit int (-128..127): 1 byte per value. Adding a larger int to any of these n-bit types will convert them to the collection of that larger int type.

  3. signed 16-bit int: 2 bytes per value.

  4. signed 32-bit int: 4 bytes per value.

  5. signed 64-bit int: 8 bytes per value. (64-bit php builds only)

  6. signed PHP float: 8 bytes per value. (C double)

  7. mixed or combinations of the above: 16 bytes per value. (Same as Vector)

In comparison, in 64-bit builds of PHP, PHP's arrays take at least 16 bytes per value in php 8.2, and at least 32 bytes per value before php 8.1, at the time of writing.

Example benchmarks: benchmarks/benchmark_vector_bool.php and benchmarks/benchmark_vector_unserialize.phpt.

Teds\IntVector

Teds\IntVector API

Similar to Teds\LowMemoryVector but throws a TypeError on attempts to add non-integers.

Teds\BitVector

Teds\BitVector API

Similar to Teds\LowMemoryVector/Teds\IntVector but throws a TypeError on attempts to add non-booleans. This can be used as a memory-efficient vector of booleans.

This uses only a single bit per value for large bit sets in memory and when serializing (around 128 times less memory than arrays, for large arrays of booleans)

Teds\Vector

Teds\Vector API

Similar to SplFixedArray or Ds\Vector. This stores a mutable sequence of values with the keys 0, 1, 2... It can be appended to with push(), and elements can be removed from the end with pop()

This is implemented based on SplFixedArray/ImmutableSequence. There are plans to add more methods.

Teds\MutableIterable

Teds\MutableIterable API

Similar to Teds\Vector and Teds\ImmutableIterable. This stores a mutable vector of keys and values with the keys 0, 1, 2... It can be resized with setSize().

Teds\Deque

Teds\Deque API

Similar to SplDoublyLinkedList but backed by an array instead of a linked list. Much more efficient in memory usage and random access than SplDoublyLinkedList.

(Also similar to Ds\Deque)

Teds\StrictTreeMap and Teds\StrictTreeSet

Teds\StrictTreeMap API

This is a map where entries for keys of any type can be inserted if Teds\stable_compare !== 0.

This currently uses a balanced red-black tree to ensure logarithmic time is needed for insertions/removals/lookups.

Removing a Teds\StrictTreeMap/Teds\StrictTreeSet entry will move iterators pointing to that entries to the entry before the removed entry (as of 1.2.1).

This uses Teds\stable_compare internally.

The Teds\StrictTreeSet API implementation is similar, but does not associate values with keys. Also, StrictTreeSet does not implement ArrayAccess and uses different method names.

Teds\StrictHashMap and Teds\StrictHashSet

Teds\StrictHashMap API

This is a map where entries for keys of any type can be inserted if they are !== to other keys. This uses Teds\strict_hash internally.

The Teds\StrictHashSet API implementation is similar, but does not associate values with keys and does not implement ArrayAccess and uses different method names.

NOTE: The floats 0.0 and -0.0 (negative zero) have the same hashes and are treated as the same entries, because 0.0 === -0.0 in php. NOTE: The float NAN (Not a Number) is deliberately treated as equivalent to itself by Teds\strict_hash and StrictHashSet/StrictHashMap, despite having NAN !== $x in php for any $x, including NAN. This is done to avoid duplicate or unremovable entries.

Removing an entry from a hash map/set will move iterators pointing to that entry to the entry prior to the removed entry.

Teds\StrictMinHeap and Teds\StrictMaxHeap

Teds\Strict*Heap API

This uses Teds\stable_compare instead of PHP's unstable default comparisons. Sorting logic can be customized by inserting [$priority, $value] instead of $value. (Or by subclassing SplMinHeap/SplMaxHeap and overriding compare manually).

php > $x = new SplMinHeap();
php > foreach (['19', '9', '2b', '2'] as $v) { $x->insert($v); }
php > foreach ($x as $value) { echo json_encode($value).","; } echo "\n"; // unpredictable order
"2","19","2b","9",

php > $x = new Teds\StrictMinHeap();
php > foreach (['19', '9', '2b', '2'] as $v) { $x->add($v); }
php > foreach ($x as $value) { echo json_encode($value).","; } echo "\n"; // lexicographically sorted
"19","2","2b","9",

Teds\EmptySequence and Teds\EmptySet and Teds\EmptyMap

This provides empty immutable collections for php 8.1+ based on single-case enums.

Empty Immutable Collection API

Common interfaces

teds_interfaces.stub.php

NOTE: This is currently being revised, and new methods may be added to these interfaces in 0.x releases or new major releases. More methods are currently being added.

These provide common interfaces for accessing the lists, sorted/hash sets, sorted/hash maps, sequences, and key value sequences that are provided by Teds\.

<?php
namespace Teds;

/**
 * Collection is a common interface for an object with values that can be iterated over and counted,
 * but not addressed with ArrayAccess (e.g. Sets, Heaps, objects that yield values in an order in their iterators, etc.)
 */
interface Collection extends \Traversable, \Countable {
    /** @return list<values> the list of values in the collection */
    public function values(): array {}

    /**
     * Returns a list or associative array,
     * typically attempting to cast keys to array key types (int/string) to insert elements of the collection into the array, or throwing.
     *
     * When this is impossible for the class in general,
     * the behavior depends on the class implementation (e.g. throws \Teds\UnsupportedOperationException, returns an array with representations of key/value entries of the Collection)
     */
    public function toArray(): array {}

    /** Returns true if count() would be 0 */
    public function isEmpty(): bool {}

    /** Returns true if this contains a value identical to $value. */
    public function contains(mixed $value): bool {}
}

/**
 * This represents a Collection that can be used like a list without gaps.
 * E.g. get()/set() will work for is_int($offset) && 0 <= $offset and $offset < $list->count().
 */
interface Sequence extends Collection, \ArrayAccess {
    public function get(int $offset): mixed {}
    public function set(int $offset, mixed $value): void {}

    public function push(mixed ...$values): void {}
    public function pop(): mixed {}
}

/**
 * A Map is a type of Collection mapping unique keys to values.
 *
 * Implementations should either coerce unsupported key types or throw TypeError when using keys.
 *
 * Implementations include
 *
 * 1. Teds\StrictHashMap, a hash table with amortized constant time operations
 * 2. Teds\StrictTreeMap, a sorted binary tree
 */
interface Map extends Collection, \ArrayAccess {
    /**
     * Returns true if there exists a key identical to $key according to the semantics of the implementing collection.
     * Typically, this is close to the definition of `===`, but may be stricter or weaker in some implementations, e.g. for NAN, negative zero, etc.
     *
     * containsKey differs from offsetExists, where implementations of offsetExists usually return false if the key was found but the corresponding value was null.
     * (This is analogous to the difference between array_key_exists($key, $array) and isset($array[$key]))
     */
    public function containsKey(mixed $value): bool {}
}

/**
 * A Set is a type of Collection representing a set of unique values.
 * Implementations include Teds\StrictHashSet and Teds\StrictTreeSet.
 *
 * 1. Teds\StrictHashSet, a hash table with amortized constant time operations
 * 2. Teds\StrictTreeSet, a sorted binary tree
 */
interface Set extends Collection {
    /**
     * Returns true if $value was added to this Set and was not previously in this Set.
     */
    public function add(mixed $value): bool {}

    /**
     * Returns true if $value was found in this Set before being removed from this Set.
     */
    public function remove(mixed $value): bool {}
}

iterable functions

This PECL contains a library of native implementations of various functions acting on iterables. See teds.stub.php for function signatures.

The behavior is equivalent to the following polyfill (similarly to array_filter, the native implementation is likely faster than the polyfill with no callback, and slower with a callback)

namespace Teds;

/**
 * Determines whether any element of the iterable satisfies the predicate.
 *
 * If the value returned by the callback is truthy
 * (e.g. true, non-zero number, non-empty array, truthy object, etc.),
 * this is treated as satisfying the predicate.
 *
 * @param iterable $iterable
 * @param null|callable(mixed):mixed $callback
 */
function any(iterable $iterable, ?callable $callback = null): bool {
    foreach ($iterable as $v) {
        if ($callback !== null ? $callback($v) : $v) {
            return true;
        }
    }
    return false;
}
/**
 * Determines whether all elements of the iterable satisfy the predicate.
 *
 * If the value returned by the callback is truthy
 * (e.g. true, non-zero number, non-empty array, truthy object, etc.),
 * this is treated as satisfying the predicate.
 *
 * @param iterable $iterable
 * @param null|callable(mixed):mixed $callback
 */
function all(iterable $iterable, ?callable $callback = null): bool {
    foreach ($iterable as $v) {
        if (!($callback !== null ? $callback($v) : $v)) {
            return false;
        }
    }
    return true;
}

/**
 * Determines whether no element of the iterable satisfies the predicate.
 *
 * If the value returned by the callback is truthy
 * (e.g. true, non-zero number, non-empty array, truthy object, etc.),
 * this is treated as satisfying the predicate.
 *
 * @param iterable $iterable
 * @param null|callable(mixed):mixed $callback
 */
function none(iterable $iterable, ?callable $callback = null): bool {
	return !any($iterable, $callback);
}

// Analogous to array_reduce but with mandatory default
function fold(iterable $iterable, callable $callback, mixed $default): bool {
	foreach ($iterable as $value) {
		$default = $callback($default, $value);
	}
	return $default;
}

/**
 * Returns the first value for which $callback($value) is truthy.
 */
function find(iterable $iterable, callable $callback, mixed $default = null): bool {
	foreach ($iterable as $value) {
		if ($callback($value)) {
			return $value;
		}
	}
	return $default;
}

/**
 * Similar to in_array($value, $array, true) but also works on Traversables.
 */
function includes_value(iterable $iterable, mixed $value): bool {
	foreach ($iterable as $other) {
		if ($other === $value) {
			return true;
		}
	}
	return false;
}

/**
 * Returns a list of unique values in order of occurrence,
 * using a hash table with `Teds\strict_hash` to deduplicate values.
 */
function unique_values(iterable $iterable): array {
	// Without Teds installed, this takes quadratic time instead of linear time.
	$result = [];
	foreach ($iterable as $value) {
		if (!in_array($value, $result, true)) {
			$result[] = $value;
		}
	}
	return $result;
}

Stable comparison

Teds\stable_compare is a function that can be used to compare arbitrary values in a stable order.

This exists because php's < operator is not stable. '10' < '0a' < '1b' < '9' < '10'. Teds\stable_compare fixes that by strictly ordering:

  • null < false < true < int,float < string < array < object < resource.
  • objects are compared by class name with strcmp, then by spl_object_id.
  • resources are compared by id.
  • arrays are compared recursively. Smaller arrays are less than larger arrays.
  • int/float are compared numerically. If an int is equal to a float, then the int is first.
  • strings are compared with strcmp.

Strict hashing

Teds\strict_hash provides a hash based on value identity. Before a final step to improve accidental hash collisions:

  • Objects are hashed based only on spl_object id. Different objects will have different hashes for the lifetime of the hash.
  • Resources are hashed based on get_resource_id.
  • Strings are hashed
  • References are dereferenced and hashed the same way as the value.
  • Integers are used directly.
  • Floats are hashed in a possibly platform-specific way.
  • Arrays are hashed recursively. If $a1 === $a2 then they will have the same hash.

This may vary based on php release, OS, CPU architecture, or Teds release and should not be saved/loaded outside of a given php process. (and spl_object_id/get_resource_id are unpredictable)

Binary search

Teds\binary_search(array $values, mixed $target, callable $comparer = null, bool $useKey=false) can be used to binary search on arrays that are sorted by key (ksort, uksort) or value (sort, usort, uasort). (even if keys were unset).

This will have unpredictable results if the array is out of order. See Teds\stable_sort for ways to sort even arbitrary values in a stable order.

This is faster for very large sorted arrays. See benchmarks.

This returns the key and value of the first entry <= $needle according to the comparer, and whether an entry comparing equal was found. By default, php's default comparison behavior (<=>) is used.

php > $values = [1 => 100, 3 => 200, 4 => 1000];
php > echo json_encode(Teds\binary_search($values, 1));
{"found":false,"key":null,"value":null}
php > echo json_encode(Teds\binary_search($values, 100));
{"found":true,"key":1,"value":100}
php > echo json_encode(Teds\binary_search($values, 201));
{"found":false,"key":3,"value":200}
php > echo json_encode(Teds\binary_search($values, 99));
{"found":false,"key":null,"value":null}
php > echo json_encode(Teds\binary_search($values, 1, useKey: true));
{"found":true,"key":1,"value":100}

Array functionality

teds.stub.php

  • Teds\is_same_array_handle(array $array1, array $array2) - check if two arrays have the same handle, for infinite recursion detection.
  • Teds\array_value_first(array $array), Teds\array_value_last(array $array) - Return the first/last value of an array without creating references or moving the internal array pointer. Similar to $array[array_key_first($array)] ?? null.

Motivation

This contains functionality and data structures that may be proposed for inclusion into PHP itself (under a different namespace) at a future date, reimplemented using SPL's source code as a starting point.

Providing this as a PECL first makes this functionality easier to validate for correctness, and make it more practical to change APIs before proposing including them in PHP if needed.

License

See COPYING

Changelog

See package.xml

Related Projects

pecl-teds's People

Contributors

andypost avatar gemorroj avatar remicollet avatar tysonandre avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

pecl-teds's Issues

Refactor SortedStrictMap to use skip list/balanced tree(e.g. red-black tree) for iteration/lookup/insertion

https://en.wikipedia.org/wiki/Skip_list seems the easiest to implement and likely uses less space.

https://en.wikipedia.org/wiki/Red%E2%80%93black_tree

  • The zval's uint32_t extra can be used to store the red/black flag and whether the node was removed from the list
  • In addition to the key and value, a pointer to the active iterators on each node should be tracked (doubly linked list?). Removing an iterator should remove the entry from that list.
  • See tables/pseudocode.

This is similar to php-src/ext/spl/spl_dllist.c (associate iterators with tree nodes rather than linked list nodes and have reference counts on tree nodes)

/* define an overloaded iterator structure */
struct _spl_dllist_it {
	zend_object_iterator   intern;
	spl_ptr_llist_element *traverse_pointer;
	int                    traverse_position;
	int                    flags;
};

Idea: Add a 2DMatrix/1DMatrix class

offsetGet can return a 1DMatrix slice, etc.

This may be useful for bounds checking or for reducing memory needs of algorithms (e.g. dynamic programming) implemented in php

Assert SIZEOF_ZEND_LONG <= SIZEOF_SIZE_T

Furthermore, by the current design, size_t can always overflow zend_long.

Make that assumption explicit, and #error if it ever changes

// php-src/Zend/zend_range_check.h
/* Flag macros for basic range recognition. Notable is that
   always sizeof(signed) == sizeof(unsigned), so no need to
   overcomplicate things. */
#if SIZEOF_INT < SIZEOF_ZEND_LONG
# define ZEND_LONG_CAN_OVFL_INT 1
# define ZEND_LONG_CAN_OVFL_UINT 1
#endif

#if SIZEOF_INT < SIZEOF_SIZE_T
/* size_t can always overflow signed int on the same platform.
   Furthermore, by the current design, size_t can always
   overflow zend_long. */
# define ZEND_SIZE_T_CAN_OVFL_UINT 1
#endif

Clarify expected stability of individual datastructures/methods

These are Tentative Extra data structures

E.g. Vector and Deque are evolving/unstable as issues are discovered while writing/discussing the RFC proposals.

Most APIs should currently be considered unstable and applications should pin patch version ranges.

Idea: StrictEqualityMap, StrictEqualityDefaultMap

Reimplement a map data structure based on zend_array with (almost) arbitrary keys (non-references, non-circular arrays?)

For DefaultMap, offsetExists would always return true, and offsetGet(mixed $key) would return $someProvidedCallable() and create an entry

Idea: SortedIntMap/SortedStringMap/SortedMap

(e.g. with a skip list, splay tree, or red black tree, etc)

Use integers and strings as keys

Possibly invalidate iterators or throw if a node is removed during iteration (if invalidating, keep removed nodes in an associated buffer if they have active iterators?)

Idea: Data structures for manipulating raw bytes as bits/words (e.g. uint32, bitset, etc)

E.g. similar to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint32Array but smaller in scope

Possible uses:

  • working with binary file data
  • storing/accessing lists of integers (e.g. to memcache) in a memory efficient way
  • Possibly working with FFI (may be redundant)
  • Floating point operations?

This may be useful for efficiently applying operations to arrays of integers in C (e.g. adding, multiplying, bitwise operations, exponentiation, etc)

https://numpy.org/ is a similar project of a vastly larger scope in another programming language.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array exists in JavaScript.

strict_hash should not (solely) use GC_PROTECT_RECURSION to check for self-reference

keep a linked list of arrays that were already processed on the stack? E.g. if var_export on arrays of arrays of objects calls __debugInfo and debugInfo calls strict_hash on a parent array (e.g. through global variables, static properties, etc), the hash would unexpectedly be different.

Also, the depth could be distinguished, e.g. $y = []; $x = [&$y]; $y[1] = [&$y]; is not the same as $y = []; $x = [&$y]; $y[1] = &$x;

Add AbstractSortedMap?

Concurrent modification is possible through side effects of method overrides and would need to be prevented in many places.

StableMinHeap/StableMaxHeap extends SplMinHeap/SplMaxHeap?

And MinHeap/MaxHeap/user-defined subclasses?

This would be useful for comparing strings or strings grouped with other types, or arrays (e.g. [$priority, $actualValue])

<=> isn't stable for non-numbers (and NAN) but is used by SplMinHeap.

php > $x = new SplMinHeap();
php > foreach (['19', '9', '6', '1a', '2b', '3', '2'] as $v) { $x->insert($v); }
php > foreach ($x as $value) { var_dump($value); }
string(1) "2"
string(1) "3"
string(1) "6"
string(2) "2b"
string(2) "1a"
string(1) "9"
string(2) "19"

Teds\Deque iteration should account for pushFront/popFront calls adjusting the front of the deque

e.g. if modifying the deque during iteration, don't just use an offset, use two offsets - one for the Deque as a whole, another for the InternalIterator, and stop once the index=offset(InternalIterator)-offset(Deque) becomes invalid (before the start or after the end).

This ensures that elements won't be processed twice or skipped if values are pushed/popped onto the front of the deque during iteration

Implement a Comparator class that can be used for customizing stable/strictly typed sorting

Related to Teds\stable_sort - the base class can call that if it isn't subclassed

making a note of an idea I've had for how php could expose object comparisons safely in arrays of objects - Implementing an extensible Comparator class seems doable but would require much more familiarity with internals to guard against comparison callbacks mutating parent objects/arrays while recursing on zvals, and possibly detecting infinite recursion (adding and decrementing reference counts with GC_ADDREF/GC_DELREF)

  • A Comparator may be of use for implementing things like a extensible SortedSet internally, to handle arrays of objects and other values.

Related past/current RFCs tried to change objects themselves:

// Extensible internal class - user-defined classes can override compare, compareObjects, compareStrings
// $c = Closure::fromCallable(new ComparatorOrSubclass()); can be used to create closures to pass elsewhere, e.g. array_unique($values, $c);
// Allows tracking state and customizations while comparing.
class Comparator {
    /** Compares two top-level values but not inner values, in an overridable way (e.g. to allow ReverseComparator, custom behavior only for top-level objects, etc.) */
    public function __invoke(mixed $a, mixed $b): int {
    }
    /** Compares two objects. Can be extended to customize for specific use cases. Defaults to .... */
    public function compareObjects(object $a, object $b): int {
    }
    /** Compares two strings. Defaults to same ordering as strcmp. */
    public function compareStrings(string $a, string $b): int {
    }
    // I expect php to remove resources at some point.
}

Idea: KeyValueVector?

This can be used when

  • Building an iterable of keys and values with unknown length (more memory efficient than fromPairs)
  • Mutation of keys/values is needed (e.g. sorting, updating in place)
  • Memory usage being reduced is a goal

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.