rust-itertools / itertools Goto Github PK

View Code? Open in Web Editor NEW

2.5K 21.0 291.0 2.54 MB

Extra iterator adaptors, iterator methods, free functions, and macros.

Home Page: https://docs.rs/itertools/

License: Apache License 2.0

Rust 100.00%

rust iterators

itertools's Introduction

Itertools

Extra iterator adaptors, functions and macros.

Please read the API documentation here.

How to use with Cargo:

[dependencies]
itertools = "0.13.0"

How to use in your crate:

use itertools::Itertools;

How to contribute

If you're not sure what to work on, try checking the help wanted label.

See our CONTRIBUTING.md for a detailed guide.

License

Dual-licensed to be compatible with the Rust project.

Licensed under the Apache License, Version 2.0 https://www.apache.org/licenses/LICENSE-2.0 or the MIT license https://opensource.org/licenses/MIT, at your option. This file may not be copied, modified, or distributed except according to those terms.

itertools's People

Contributors

Stargazers

Watchers

Forkers

awelkie ema-fox steveklabnik milibopp sbward zsiciarz stebalien shepmaster oli-obk flo-l simonsapin erickt angelofsol wilfred danielkeep stepancheg milancio42 popog tempbottle tikue johanneskoester zummenix aonorin arcnmx 1uks birkenfeld panicbit salty-horse mitchmindtree gereeter bsteinb lud4ik imp jaredr tbu- kindlychung zhangf911 chicker jethrogb artemban dirkyjerky phimuemue malbarbo pthariensflame luser weiznich theemathas rlugojr burdges jameysharp qthree sgrif fiedzia rkarp srijs krdln cramertj centril spease tmccombs phlosioneer matematikaadit tobz1000 behnam matklad ignatenkobrain pazzaz nicopap xion williamyaoh gnzlbg scottmcm torkleyy dwijnand kamilaborowska happy-forks jturner314 emerentius davidcornu beatbutton ethanthemaster hcpl aertoria johnheitmann zayenz avi-d-coder sheganinans fyrchik burtonageo willcrozi timotree3 tommilligan peterjoel timvermeulen devonhollowood aschampion svenknobloch ia0 jswrenn mesalock-linux

itertools's Issues

Wouldn't it be more flexible to use `F: FnMut(&Option<I::Item>, &Option<I::Item>) -> bool` instead?

https://github.com/bluss/rust-itertools/blob/master/src/adaptors.rs#L556

Wouldn't it be more flexible to use F: FnMut(&Option<I::Item>, &Option<I::Item>) -> bool instead?

Breaking changes in 0.5

Is there a function to do the SQL equivalent of COUNT(*) GROUP BY

for example i have a vector of value

[ 'a'  , 'a' ,  'b' , 'c', 'c' 'c']

and it returns me a hashmap

[
   'a' => 2
  'b' => 1 
  'c' =>  3
]

Group by that merges same key elements

After a quick glance I couldn't find a function like that. I believe it would be useful to have for unsorted Vecs, instead of creating multiple groups only for consecutive elements.

Thoughts?

cannot use .rev() on (i..j).combinations()

Would be nice to have that

fn main() {
    let v: Vec<_> = (0..5).combinations().rev().collect();
}

cargo build --release
   Compiling palindrome v0.1.0 (file:///Users/SumProxy/Projects/rust/palindrome)
src/main.rs:43:43: 43:48 error: the trait `core::iter::DoubleEndedIterator` is not implemented for the type `itertools::adaptors::Combinations<core::ops::Range<_>>` [E0277]
src/main.rs:43     let v: Vec<_> = (0..5).combinations().rev().collect();
                                                         ^~~~~
src/main.rs:43:43: 43:48 help: run `rustc --explain E0277` to see a detailed explanation
src/main.rs:43:49: 43:58 error: no method named `collect` found for type `core::iter::Rev<itertools::adaptors::Combinations<core::ops::Range<_>>>` in the current scope
src/main.rs:43     let v: Vec<_> = (0..5).combinations().rev().collect();
                                                               ^~~~~~~~~
src/main.rs:43:49: 43:58 note: the method `collect` exists but the following trait bounds were not satisfied: `itertools::adaptors::Combinations<core::ops::Range<_>> : core::iter::DoubleEndedIterator`, `core::iter::Rev<itertools::adaptors::Combinations<core::ops::Range<_>>> : core::iter::Iterator`
error: aborting due to 2 previous errors
Could not compile `palindrome`.

To learn more, run the command again with --verbose.

Add #[must_use] where appropriate

Add #[must_use] just like libstd.

Why not derive Clone for PushBackN

Sorry if I am asking something obvious, but why not derive Clone here? https://github.com/bluss/rust-itertools/blob/master/src/adaptors.rs#L282

Index trait error in stride.rs compliation.

When compiling src/lib.rs, I get the following error:

src/stride.rs:141:5: 148:6 error: method `index` has an incompatible type for trait: expected type parameter but found &-ptr
src/stride.rs:141     fn index<'b>(&'b self, i: &uint) -> &'b A
src/stride.rs:142     {
src/stride.rs:143         assert!(*i < self.size_hint().val0());
src/stride.rs:144         unsafe {
src/stride.rs:145             let ptr = self.begin.offset(self.stride * (*i as int));
src/stride.rs:146             mem::transmute(ptr)

This is compiling with rustc version 0.11.0, on Arch Linux x86_64.

Add minmax/minmax_by_key

Sometimes I'd like to keep track of minimum and maximum without iterating twice (if that's even possible). Itertools might be a good place for such adapters?

Improve the fast case for .group_by(), .chunks()

We could keep some kind of flag for non-buffering, so that the simplest use case of .group_by_lazy() is as fast as possible.

The simplest case is: For each group, the group is iterated until its end, before the next iteration of the group's iterator.

Add function to convert Option<IntoIterator> into Iterator

I've run into a situation where I could use a function that takes an Option<T> where T: IntoIterator and return a struct that implements the Iterator trait. The behavior would be the same as the contained iterator, or std::iter::Empty if there is no contained iterator.

First of all, is there any way to do this without defining a new struct? If not, would this function be a useful addition to itertools? I'm willing to do the implementation, I just wanted to get the go-ahead first.

Find a good example for .while_some()

This new adaptor .while_some() (commit, not yet merged) needs a nice example + test.

@DanielKeep Is this the iterator we decided was missing? The missing part to turn an infinite iteration of None's into a sensible iterator.

prepend/append convenience methods

I have quite some instances of combining iter::once and Iterator::chain in my code, do you think it would make sense to add two convenience methods for this? Looking at Vec, push may be a better name for append, but I'm not sure what to call prepend then. The argument order of prepend may be confusing, either way.

Add .get(usize) method to Stride

Use #[must_use]

Mimic libstd and put #[must_use] on adaptors where it makes sense.

Add a `skip_every` iterator

The name and semantics can be changed if necessary, but I'd like an iterator that skips every nth element, and I wasn't able to find an existing one in reading through the docs. For example:

let items = vec![1, 2, 3, 4, 5, 6, 7, 8];
assert_eq!(vec![1, 2, 4, 5, 7, 8], items.iter().skip_every(3).collect());

Stop using function pointers

Function pointers are useful in place of closures, because we can actually return them without boxing; however, the virtual calls are hard for the compiler to inline, leading to degradation, see #47.

For itertools 0.4.0, remove function pointers. Possible workarounds: Wrapper types. Unit structs as statically compiled function values.

Add reductions()

Unless I missed it, there doesn't seem to currently exist a function to get an iterator of the intermediate values of a reduction function. Here is an example of how this can be used to easily create a cumulative sum iterator:

#[test]
fn test_cumsum() {
    let expected = vec![1, 3, 6, 10, 15];
    let result: Vec<_> = (1..6).reductions(|a,b| a+b).collect();
    assert_eq!(expected, result);
}

For me this use-case up when I needed a cumulative sum over an addition operator without inverse, to optimize sums of certain structure.

Does this seem like something that could be part of this library? (The name "reductions" is not great, as rust uses "fold" instead of "reduce", there's probably some other term for this.)

I came up with the following implementation for my own use, though me being new to Rust it likely does everything wrong:

pub struct Reductions<I, R> where I : Iterator {
    iter: I,
    reductor: R,
    prev: Option<I::Item>
}

impl<I, R> Iterator for Reductions<I, R>
        where I: Iterator, I::Item: Clone,
              R: FnMut(I::Item, I::Item) -> I::Item {
    type Item = I::Item;

    fn next(&mut self) -> Option<Self::Item> {
        self.iter.next().map(|item| {
            let value = match self.prev.take() {
                Some(prev) => (self.reductor)(prev, item),
                None => item
            };
            self.prev = Some(value.clone());
            value
        })
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        self.iter.size_hint()
    }
}

impl<I, R> ExactSizeIterator for Reductions<I, R>
    where I: Iterator, I::Item: Clone,
          R: FnMut(I::Item, I::Item) -> I::Item {}

pub trait IteratorExt : Iterator {
    fn reductions<R>(self, reductor: R) -> Reductions<Self, R>
            where R: FnMut(Self::Item, Self::Item) -> Self::Item,
                  Self: Sized {
        Reductions {
            iter: self,
            reductor: reductor,
            prev: None
        }
    }
}

impl<T> IteratorExt for T where T: Iterator { }

Add a .windows(n) iterator adapter

Similar to .windows() in std, just not only for slices but for arbitrary iterators. I saw that .chunks() is already implemented as chunks_lazy(). Would you find eg. windows_lazy() implemented in a similar fashion useful?

I've read the code of chunks_lazy(), it's quite complicated. But with a few hints from somebody who knows the code I could probably implement windows_lazy() myself.

bound function not correct

See here: https://github.com/bluss/rust-itertools/blob/master/src/adaptors.rs#L142

Isn't it missing the case self.phase && a < b?

step() slowness

I encountered a case where the step() method leads to incredibly slow code:

#![feature(step_by)]
extern crate time;
use time::PreciseTime;
extern crate itertools;
use itertools::Itertools;

fn calc2(n: usize) -> Vec<usize> {
    let mut t = vec![0; n] ;
    for i in 2..n {
        if t[i] == 0 {
            t[i] = i - 1;
            for j in (2*i..n).step_by(i) {
                if t[j] == 0 {
                    t[j] = j;
                }
                t[j] = t[j] * (i - 1) / i;
            }
        }
    }
    t
}

fn calc1(n: usize) -> Vec<usize> {
    let mut t = vec![0; n] ;
    for i in 2..n {
        if t[i] == 0 {
            t[i] = i - 1;
            for j in (2*i..n).step(i) {
                if t[j] == 0 {
                    t[j] = j;
                }
                t[j] = t[j] * (i - 1) / i;
            }
        }
    }
    t
}

fn main() {
    let nb = 100_001;
    let start = PreciseTime::now();
    let s = calc2(nb).into_iter().fold(0, |a, c| a + c);
    println!("sum (step_by): {} {:?}", s, start.to(PreciseTime::now()));

    let start = PreciseTime::now();
    let s = calc1(nb).into_iter().fold(0, |a, c| a + c);
    println!("sum (step): {} {:?}", s, start.to(PreciseTime::now()));
}

Replacing step() with the unstable step_by() gives a 100 fold improvement in execution time, as can be observed on the playground here.

0.4.0 checklist / upcoming breaking changes

ticks are for my WIP

Remove deprecated items (icompr, fn_map, etc.)
Mark Slice trait unsafe
~~Rename MergeAscend to MergeFn (?)~~
~~Generalize .format to debug / display(?)~~
Merge / MergeBy without fn() #49
Dedup without fn() #49
Unique without fn() #49
MendSlices without fn() #49
Rename sort_by → sorted_by
Remove times()
interleave_shortest not fused

Add quickcheck tests for minmax methods

Add iterate() function

To replace the deprecated one from the std:
https://doc.rust-lang.org/std/iter/fn.iterate.html

Put common usage /use cases/ examples that this crate helps to fix

Some examples in the readme or the doc will be greatly appreciable , as it improve the discoverabiltiy of your lib , especially for guys like me that are just wondering "hmmmm should i use it ?"

thanks for the lib.

Crate Issue for Itertools 0.4.11 for Rust 1.0 & 1.1

Running travis builds here and seem to see an issue for itertools. Is there some breaking change that prevents 1.0 & 1.1 from working with the new version? It seems to work fine for rustc > 1.1

 Downloading itertools v0.4.11
Unable to get packages from source
Caused by:
  failed to parse manifest at `/home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/itertools-0.4.11/Cargo.toml`
Caused by:
  expected a value of type `string` for the key `lib.name`

take(n).collect() behind a mutex on combinations iterator

When compiling gist https://gist.github.com/sumproxy/5ddd66b69e5c2cba6932 I have problems with borrow checker, probably not an itertools issue, but can you please take a look?

Add a simpler version of .format()

Idea here: https://users.rust-lang.org/t/enumerate-a-vector-of-results/5237/13

Document fusing and other iterator protocol details

Adaptor is fused: guaranteed to return None indefinitely after end of iterator.
Adaptor is well behaved: Does not call .next() on non-fused iterator after it has returned None once.

All of our adaptors must be well behaved, some guarantee being fused.

with_value and into_parts for PushBackN

Would be nice to have these implemented for PushBackN:

    /// Create a `PutBackN` along with the `value` to put back.
    #[inline]
    pub fn with_value(value: Vec<I::Item>, it: I) -> Self {
        PutBackN {
            top: value,
            iter: it,
        }
    }

    /// Split the `PutBackN` into its parts.
    #[inline]
    pub fn into_parts(self) -> (Vec<I::Item>, I) {
        let PutBackN{top, iter} = self;
        (top, iter)
    }

Why the extra impl for ZipSlices?

I was looking at the ZipSlices impl and noticed that you have the signatures impl<'a, 'b, A, B> ZipSlices<&'a [A], &'b [B]> and impl<T, U> ZipSlices<T, U> where T: Slice, U: Slice. The latter seems a lot more general, and I was wondering why the former is still in the library? Are there some non-trivial reasons for this?

Support no_std

'sorted()' method

It would be useful to have sorted() method for regular sorting that calls sorted_by(Ordering::cmp) internally. What do you think?

Enhancement: Coalesce-style functions

I've used the following functions in two completely different unrelated projects already. They basically take an Iterator<Result> (or Iterator<Option>) and return the first Ok Result or the last Err.

trait RetryIterator: Iterator {
    fn retry_results<T, E>(&mut self) -> std::result::Result<T, E> where Self: Iterator<Item=std::result::Result<T, E>> {
        let mut last=None;
        loop {
            let cur=self.next();
            match cur {
                Some(v @ Ok(_)) => { return v; },
                Some(e @ Err(_)) => { last=Some(e); },
                None => if let Some(e) = last { return e; },
            }
        }
    }

    fn retry_options<T>(&mut self) -> Option<T> where Self: Iterator<Item=Option<T>> {
        let mut last=None;
        loop {
            let cur=self.next();
            match cur {
                Some(v @ Some(_)) => { return v; },
                Some(e @ None) => { last=Some(e); },
                None => if let Some(e) = last { return e; },
            }
        }
    }
}

impl<T: Iterator> RetryIterator for T {}

I'm happy to turn this into PR but first wanted to discuss what would be the best way to implement this.

Kmerge fails

This simple test crashes itertools:

#[test]
fn kmerge_reverse() {
    let its = vec![3, 2, 1, 0].into_iter().map(|s| (s..10).step(4));
    it::assert_equal(its.kmerge(), (0..10));
}

The error is:

        thread 'kmerge_reverse' panicked at 'Failed assertion Some(1) == Some(0) for iteration 0', src/lib.rs:1585

If I print the sequences I'm getting the following:

---- kmerge_reverse stdout ----
        [[3, 7], [2, 6], [1, 5, 9], [0, 4, 8]]

So nothing wrong with the sequences.

I also have an issue with getting the wrong (unsorted) data. But I have not found a simple example to reproduce.

Breakage with recent nightly

   Compiling itertools v0.4.13
/Users/alexbool/.cargo/registry/src/github.com-1ecc6299db9ec823/itertools-0.4.13/src/free.rs:218:1: 224:2 error: type mismatch resolving `<<J as std::iter::IntoIterator>::IntoIter as std::iter::Iterator>::Item == <I as std::iter::IntoIterator>::Item`:
 expected type parameter,
    found associated type [E0271]
/Users/alexbool/.cargo/registry/src/github.com-1ecc6299db9ec823/itertools-0.4.13/src/free.rs:218 pub fn merge<I, J>(i: I, j: J) -> Merge<I::IntoIter, J::IntoIter>
                                                                                                 ^
/Users/alexbool/.cargo/registry/src/github.com-1ecc6299db9ec823/itertools-0.4.13/src/free.rs:218:1: 224:2 help: run `rustc --explain E0271` to see a detailed explanation
/Users/alexbool/.cargo/registry/src/github.com-1ecc6299db9ec823/itertools-0.4.13/src/free.rs:218:1: 224:2 note: required by `adaptors::Merge`
error: aborting due to previous error
Build failed, waiting for other jobs to finish...
error: Could not compile `itertools`.

alexbool@alexbool-osx ~/D/I/i/r/portfolio> rustc -vV
rustc 1.10.0-nightly (22ac88f1a 2016-05-11)
binary: rustc
commit-hash: 22ac88f1a47a82195a49fbff3cf24a2c395d7a81
commit-date: 2016-05-11
host: x86_64-apple-darwin
release: 1.10.0-nightly

Add an iterator that peeks the future from MultiPeek.

starmap

It'd be nice to have a starmap method. The main aim is to skip having to write closures after things like zip. If you're not familiar with it from Python, a starmap function unpacks the tuples in the input iterable into the function being mapped.

A simple but unstable implementation is

fn starmap<F>(self, f: F) -> iter::Map<Self, FnZipper<F>>
    where Self: Sized, F: Fn<Self::Item>
{
    self.map(FnZipper(f))
}

where

pub struct FnZipper<F>(F);

impl<F, Args> FnOnce<(Args,)> for FnZipper<F>
    where F: FnOnce<Args>
{
    type Output = F::Output;
    extern "rust-call" fn call_once(self, (args,): (Args,)) -> Self::Output {
        self.0.call_once(args)
    }
}

impl<F, Args> FnMut<(Args,)> for FnZipper<F>
    where F: FnMut<Args>
{
    extern "rust-call" fn call_mut(&mut self, (args,): (Args,)) -> Self::Output {
        self.0.call_mut(args)
    }
}

// For completeness
impl<F, Args> Fn<(Args,)> for FnZipper<F>
    where F: Fn<Args>
{
    extern "rust-call" fn call(&self, (args,): (Args,)) -> Self::Output {
        self.0.call(args)
    }
}

A stable implementation would probably requires macros to handle all the wanted cases, and macros would also let us also support more sophisticated flattening like for

xs.zip(ys).zip(zs).starmap(|x, y, z| ...)

Thoughts?

Iterator adapter like std::iter::Inspect, but which passes a mutable reference to the closure

I don't know how common of a use-case this is, but I can see some utility in an iterator adapter which can mutate items via a closure before yielding them. I was going to open this as an issue on rust-lang/rfcs but I figured it would be better to incubate the idea here first.

Basically a more succinct version of this:

my_iter.map(|mut val| {
    val.mutate();
    val
})

As for what to call it, I'm not sure. Perhaps as a parallel to .inspect() but implying the possibility of change. Here's some of my ideas so far:

.mutate()
.edit() (as a reference to sed)
.update()
.probe()
.modify()
.tweak()

fold_options?

Would it make sense to add an Option<T> equivalent to fold_results?

Integrate with either

It would be nice if itertools integrated with either to provide:

left, right -- Return iterators mapping over Either::Left/Either::Right.
interleave_either, interleave_either_shortest, intersperse_either, and chain_either -- Same as the existing adapters but these would work with distinct types.

If the above rust PR were merged, it would be possible to write: some_string.extend(items.intersperse_either(", ")).

Make Stride use RandomAccessIterators

As it is now, the Stride struct only operates on slices. We can instead make the Stride struct use any iterator that implements the RandomAccessIterator trait. The benefits of doing this are that we can stride over more than just slices, the Stride struct should be simpler, and we won't need to use unsafe blocks.

I thought there might be a performance decrease in switching to RandomAccessIterator-based strides, especially when composing strides. I implemented the RandomAccessIterator-based strides, and there is indeed a performance drop when compiling without optimization. However, when compiling with optimization (level 2), the RandomAccessIterator-based stride actually performs better than the slice-based stride, often by at least a factor of 2. I have the tests at awelkie/rust-itertools on the "stride_performance" branch.

Would you mind checking my performance comparison? If the RandomAccessIterator-based strides are indeed faster with optimization, and you think it's a good idea, then I would be happy to re-implement the Stride struct.

RepeatCall shouldn't implement DoubleEndedIterator

For example, the following iterator is definitely not reservable:

RepeatCall::new({
    let mut counter = 0;
    move || {
        counter += 1;
        counter
    }
});

Optimize kmerge

See the PR discussion in #97 for different ideas of improvement

Add Windows-like Iterator that returns static-sized tuple elements instead of slices.

The Windows iterator in the std lib returns its values as slices of elements.
This makes it unpleasant to work with since pattern matching and borrowing of slices is more restricted than (for example) for tuples.

Besides that a tuple-based Windows iterator should not involve runtime overhead and should make it possible to make compile time assumptions about the correctness of the implementation. For example, it can be checked at compiletime if accesses to iterator elements are "out-of-bounds".

E.g.:

    let primes = &[2, 3, 5, 7, 11, 13, 17, ...];
    for (fst, snd) in primes.iter().windows(2) {
        println!("({}, {})", fst, snd);
    }

Add Debug implementations where possible

I would like to use MultiPeek as part of a type which must be Debug. Currently I have to write a custom Debug implementation for this.

Investigate .merge()

@frankmcsherry reported a big runtime improvement by switching from .merge(j) to .merge_by(j, Ord::cmp), with merge::wrapper showing up in the profile.

.partial_ord() call being harmful?
function pointer vs inlined closure issue?

(Profile says: It's both)

Is there a function equivalent to SQL inner join?

As a rust newbie coming from C#, I was wondering if there's any function equivalent to this.
I was looking through the docs of itertools and std::iter and couldn't find anything related.

Thanks in advance

Add a generic Chain iterator

It would be nice to have a Chain iterator in this library that could chain several iterators together instead of just too as it is supported in the std lib at the moment.
This should behave similar to the currently implemented Zip::new((..)).

Simple-silly example code:

let v = vec![4,5,6];
let words: Vec<String> = vec!["Hello", "World", "!"];
for x in Chain::new((&[1,2,3], v.iter(), words.iter().map(|word| word.len()))) {
    println!("{}", x);
}

Important would be a possibility to also create a way to chain a bunch of iterators over mutable elements to obtain yet another iterator over mutable data.

A (possibly buggy) example code:

let (ax, bx, cx) = (&[1,2,3], &[4,5,6], &[7,8,9]);
for mut x in Chain::new((ax.iter_mut(), bx.iter_mut(), cx.iter_mut())) {
    x = x*x + x;
}

Make a Product adapter for the Itertools trait

In addition to iproduct!(iter_a, iter_b), Itertools should have something like iter_a.times(iter_b) or iter_a.cartesian_product(iter_b). This would let you take the product of iterators inline, functional style.

Maybe a method like this:

pub trait Itertools: Iterator {
    ...
    fn cartesian_product<U: Iterator>(self, other: U) -> Product<Item, Self, U>

This would allow you to do much better inline operations, like:

oranges.iter().map(apple_for_orange).filter(is_granny_smith).cartesian_product(
    guests.iter().map(favorite_desert)
).map(more_nonsense).etc()

This shouldn't require higher-kinded types either.