
pipes's Introduction

pipes

pipes is a clean and powerful stream processing library that lets you build and connect reusable streaming components.

Quick start

First install pipes (for example, cabal install pipes), then fire up ghci:

$ ghci
Prelude> import Pipes
Prelude Pipes> import qualified Pipes.Prelude as P

... and echo standard input to standard output until you enter quit.

Prelude Pipes P> runEffect $ P.stdinLn >-> P.takeWhile (/= "quit") >-> P.stdoutLn
Test[Enter]
Test
Apple[Enter]
Apple
quit[Enter]
Prelude Pipes P> -- Done!

To learn more, read the pipes tutorial.

Features

  • Concise API: Use simple commands like for, (>->), await, and yield (see the short sketch after this list)

  • Blazing fast: Implementation tuned for speed, including shortcut fusion

  • Lightweight Dependency: pipes is small and compiles very rapidly, including dependencies

  • Elegant semantics: Use practical category theory

  • ListT: Correct implementation of ListT that interconverts with pipes

  • Bidirectionality: Implement duplex channels

  • Extensive Documentation: Second to none!
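
For a quick taste of that API, here is a minimal sketch (not taken from the official tutorial): each turns a list into a Producer, for loops over it, and runEffect runs the result.

import Pipes

main :: IO ()
main = runEffect $ for (each [1 .. 3 :: Int]) (lift . print)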

Philosophy

The pipes library emphasizes the following three design precepts:

  • Emphasize elegance - Elegant libraries replace inelegant libraries

  • Theory everywhere - Principled design promotes intuitive behavior

  • Minimize dependencies - Small dependency profiles maximize portability

Outline

The core pipes ecosystem consists of the following four libraries:

  • pipes: The elegant, theoretically-principled core.

  • pipes-concurrency: Message passing and reactive programming

  • pipes-parse: Utilities for parsing streams

  • pipes-safe: Resource management and exception safety

These represent the core areas that I envision for pipes. The latter three libraries represent the more pragmatic face of the ecosystem and make design tradeoffs where there is no known elegant solution.

Derived libraries that build on top of these include:

  • pipes-network and pipes-network-tls: Networking support

  • pipes-attoparsec: High-efficiency streaming parsing

  • pipes-zlib: Compression and decompression

  • pipes-binary: Streaming serialization and deserialization

  • pipes-aeson: Streaming JSON encoding and decoding

Development Status

pipes is stable, and current work focuses on packaging pipes for various package managers. The long-term goal is to get pipes into the Haskell Platform so that it becomes the basic building block for streaming APIs.

Community Resources

How to contribute

  • Contribute code

  • Build derived libraries

  • Write pipes tutorials

License (BSD 3-clause)

Copyright (c) 2012-2014 Gabriella Gonzalez. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  • Neither the name of Gabriella Gonzalez nor the names of other contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

pipes's People

Contributors

archblob, bens, bgamari, bosu, chris-martin, cronokirby, dag, davorak, drphil, duairc, errge, gabriella439, gwils, hiratara, infinity0, jcbelanger, jwiegley, k0001, klao, manykey, merijn, michaelt, mitchellwrosen, quchen, ryanglscott, shimuuar, sjakobi, sjoerdvisscher, timjb, tonyday567


pipes's Issues

pipe composition types aren't general enough!

I think that the composition operator >+> should have the type

(Pipe a b m r1) -> (Pipe b c m r2) -> (Pipe a c m r2)

and likewise for the other composition operators. Otherwise the pipe types seem too rigid, and many "obvious" programs become more complicated than they need to be. Is there a way of modifying pipes to support this?

(I'm currently using the GitHub 1.1 version, but I think this problem applies to both versions uniformly.)

Fully link tutorial haddock documentation

The haddocks and the tutorial code segments should have hyperlinked identifiers like the haddocks in the "Pipes" module. This requires transitioning code from this haddock code block style:

> example = foo >>= bar

... to this style:

@
'example' = 'foo' '>>=' 'bar'
@

This is not only more visually pleasing, but allows users to click on linked identifiers to navigate to their documentation.

Behavior of stdin, stdout, readHandle and toHandle

I think there are some issues with the default behavior of stdin, stdout, readHandle and toHandle, from the Pipes.Prelude module. I'll mention them here so that we can discuss them.

  • readHandle and stdin read an entire line at once, but the produced line doesn't keep the '\n' byte. I don't think this is a sensible default behavior given the names of these functions, and the fact that readHandle and stdin from Pipes.ByteString behave differently. Perhaps it would be better to rename Pipes.Prelude.readHandle to Pipes.Prelude.hGetLine and Pipes.Prelude.stdin to Pipes.Prelude.getLine, since they behave exactly like the identically named functions in System.IO.

    If we perform such renaming, then perhaps we don't need to provide functions that behave similarly to Pipes.ByteString.stdin and Pipes.ByteString.readHandle, since those would be quite inefficient anyway, and Pipes.Prelude.getLine and Pipes.Prelude.hGetLine should be enough for simple purposes.

    In any case, even though I suggest renaming these functions, I would prefer if we just removed these two functions from Pipes.Prelude, since lifting System.IO.getLine and System.IO.hGetLine as Effects should be more than enough.

  • toHandle and stdout share similar problems to the ones mentioned above with regard to line endings. Perhaps renaming them to hPutStrLn and putStrLn, or just dropping them altogether, is better. However, …

  • There is a subtle difference between the expected behavior when writing to standard output and when writing to other Handles. Currently, toHandle swallows IOError{ioe_type=ResourceVanished} exceptions that might happen when trying to write to a Handle, gracefully stopping the Consumer by returning (). However, I think such behavior is appropriate only when the Handle is standard output. Consider a command-line program invocation like the following:

    foo | head -3

    The head program will consume 3 lines of input from foo's output and die, and a SIGPIPE signal is sent to the foo program. The default behavior in this scenario would be for foo to terminate. However, if one decides to ignore the SIGPIPE signal (which I think is the case for GHC, but I'm not sure), then the next time foo tries to write to standard output it will fail with an exception that matches the following pattern, meaning that we got an EPIPE error:

    IOError { ioe_type = ResourceVanished
            , ioe_errno = Just ioe }
      | Errno ioe == ePIPE -> ...

    I think it is perfectly fine to swallow such an exception and gracefully end a Consumer that writes to standard output in that scenario, because:

    • If we are getting that exception it means that the whole program is ignoring SIGPIPE, so we are supposed to handle EPIPE errors ourselves.
    • Getting an EPIPE error when writing to standard output means that the “downstream” program has stopped consuming and that we should terminate our program as soon as possible (as witnessed by the default behavior of SIGPIPE, which is to terminate the program).
    • If we don't swallow such an exception, the user executing such a program invocation on the command line will see an extraneous “resource vanished” message printed to the screen.

    In any case, I think that the GHC RTS might have already solved this issue for us, since in some example code I wrote just now I couldn't manage to reproduce this scenario. I think this issue might be related: http://ghc.haskell.org/trac/ghc/ticket/2699

    However, EPIPE errors might also happen when writing to other kinds of Handles that don't behave like standard output. For example, when writing to the Handle that underlies a TCP connection, you would receive the same kind of exception if the remote end closes its side of the connection, and it is OK to treat that scenario as extraordinary by letting the exception happen, instead of gracefully ending the consumer.

    So, in summary: I think Pipes.Prelude.toHandle should propagate all exceptions, and Pipes.Prelude.stdout should ignore the EPIPE errors. This would give different behaviors between Pipes.Prelude.toHandle System.IO.stdout and Pipes.Prelude.stdout, but I think that's OK and can be communicated in the documentation. However, I still think we should remove these functions altogether if possible, or at least rename them to hPutStrLn and putStrLn as I mentioned before.

Note that everything I've mentioned here regarding EPIPE also applies to the use of output Handles in Pipes.ByteString.
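
To make the proposal concrete, here is a minimal sketch of an EPIPE-swallowing stdout consumer (an illustration only; the name stdoutLn' is made up and this is not necessarily the library's actual code):

import Control.Exception (throwIO, try)
import Foreign.C.Error (Errno (Errno), ePIPE)
import qualified GHC.IO.Exception as G
import Pipes

-- Write each received String to standard output, but treat EPIPE
-- (downstream, e.g. `head`, went away) as graceful termination.
stdoutLn' :: Consumer String IO ()
stdoutLn' = go
  where
    go = do
        str <- await
        x   <- lift $ try (putStrLn str)
        case x of
            Left (G.IOError { G.ioe_type  = G.ResourceVanished
                            , G.ioe_errno = Just ioe })
                | Errno ioe == ePIPE -> return ()  -- swallow EPIPE and stop
            Left  e  -> lift (throwIO e)           -- re-throw anything else
            Right () -> go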

possible document typo

bracketOnAbort 
  Analogous to bracketOnAbort from Control.Exception

There is no bracketOnAbort in base's Control.Exception. Do you mean bracketOnError?

travis-ci

Got a failure on Travis with pipes:

Builds fine locally for me, but Travis is a GHC 7.4 environment and I'm on 7.6. Any ideas?

src/Pipes/Internal.hs:149:5:
    `reader' is not a (visible) method of class `MonadReader'

src/Pipes/Internal.hs:149:21:
    Not in scope: `reader'
    Perhaps you meant one of these:
      `read' (imported from Prelude), `readIO' (imported from Prelude),
      `readLn' (imported from Prelude)

src/Pipes/Internal.hs:154:5:
    `state' is not a (visible) method of class `MonadState'

src/Pipes/Internal.hs:154:20: Not in scope: `state'

src/Pipes/Internal.hs:157:5:
    `writer' is not a (visible) method of class `MonadWriter'

src/Pipes/Internal.hs:157:21: Not in scope: `writer'

Add `toList` to the "folds" section of `Pipes.Prelude`

Desired type signature:

toList :: (Monad m) => Producer a m r -> m [a]

Elements should be in the same order as the original Producer, not reversed. The most efficient way to do this from my experience is to use a difference list, not reverse.
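
A minimal sketch of what such a toList could look like (assuming the Proxy constructors exported by Pipes.Internal), accumulating with a difference list so the elements stay in their original order:

import Pipes
import Pipes.Internal (Proxy (..), closed)

toList :: Monad m => Producer a m r -> m [a]
toList = go id
  where
    go diff p = case p of
        Request v _  -> closed v                    -- a Producer never requests
        Respond a fu -> go (diff . (a :)) (fu ())   -- snoc via the difference list
        M         m  -> m >>= go diff
        Pure      _  -> return (diff [])            -- apply to the empty list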

Incorrect examples in pipes 3.0 tutorial

Hi,

I'm getting into pipes. Thank you for all the hard work, the way you've managed to reduce complexity over the previous version is very impressive.

There are a couple of issues I noticed in the tutorial:

  • the signature for lines' says:
lines' :: (Proxy p) => Handle -> () -> Producer p String IO r

However, I get the following error when trying to compile it:

test_pipes.hs:6:28:
    Could not deduce (r ~ ())
    from the context (Proxy p)
      bound by the type signature for
                 lines' :: Proxy p => Handle -> () -> Producer p String IO r
      at test_pipes.hs:(6,1)-(14,16)
      `r' is a rigid type variable bound by
          the type signature for
            lines' :: Proxy p => Handle -> () -> Producer p String IO r
          at test_pipes.hs:6:1
    Expected type: IdentityP p C () () String IO r
      Actual type: IdentityP p C () () String IO ()
    In the first argument of `runIdentityP', namely `loop'
    In the expression: runIdentityP loop

Replacing r with () makes it work.

  • Additionally, this code does not work:
withFile "test.txt" $ \h -> runProxy $ lines' h >-> printer

It should be:

withFile "test.txt" ReadMode $ \h -> runProxy $ lines' h >-> printer

`MFunctor` instance for `EitherT e`

In order to not have an orphan instance lying around, it may make more sense to declare the instance in the pipes package than in the either package. However, I'm assuming it's not worth adding an extra dependency just for that instance.

I believe I read somewhere that you were thinking about having the MFunctor module in a different package. Would it make sense to have that new package incur extra dependencies in order to provide more instances?

Another solution would be to wrap EitherT in a newtype in every package where it is needed, or to have a specific package just for the newtype, but I think EitherT is too common for both cases.

What are your thoughts on this?

FWIW: This is related to this SO question
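
For the record, the orphan instance under discussion is tiny; a sketch, assuming EitherT from the either package and MFunctor from mmorph:

{-# OPTIONS_GHC -fno-warn-orphans #-}
import Control.Monad.Morph (MFunctor (..))
import Control.Monad.Trans.Either (EitherT (EitherT), runEitherT)

-- hoist applies the monad morphism underneath the EitherT wrapper
instance MFunctor (EitherT e) where
    hoist nat = EitherT . nat . runEitherT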

Which library should I use, pipes or pipes-core?

Hello,

I love pipes much more than other similar libraries because it's simple and embraces Haskell's type polymorphism and lazy evaluation.

BTW, there is another library using the same module name, Control.Pipe, which you and @pcapriotti are working on:
https://github.com/pcapriotti/pipes-core

Is there any guideline on which to choose? It seems pipes-core is the successor of pipes and the latter is going to become obsolete. Or do you intend to keep evolving this one in a different direction?

Examples from documentation do not type check

Some of the specialized type signatures for `for` don't actually type check:

for3 :: Monad m => Pipe   x b m r -> (b -> Effect       m ()) -> Consumer x   m r
for3 = for

Couldn't match type `()' with `Void'
Expected type: Pipe x b m r -> (b -> Effect m ()) -> Consumer x m r
  Actual type: Proxy () x () b m r
               -> (b -> Proxy () x () Void m ()) -> Proxy () x () Void m r
In the expression: for
In an equation for `for3': for3 = for
for4 :: Monad m => Pipe   x b m r -> (b -> Producer   c m ()) -> Pipe     x c m r
for4 = for

Couldn't match type `()' with `Void'
Expected type: Pipe x b m r
               -> (b -> Producer c m ()) -> Pipe x c m r
  Actual type: Proxy () x () b m r
               -> (b -> Proxy () x () c m ()) -> Proxy () x () c m r
In the expression: for
In an equation for `for4': for4 = for

Add `MMonad` instances

This is a note to myself to add MMonad instances to the base proxy types and extensions.

RequestT and RespondT can be indexed monads

This isn't really a bug or feature request, just a thought. If the types of (>>=) for RespondT and RequestT are left unconstrained (especially for the newtype wrappers), they are actually more general. This may be an interesting thing to explore. My first thought was whether you could write reset and shift, but it turns out you can only write the former (although it's still a fairly interesting operation). I have not explored anything else and probably will not do so.

>+> operator could not compose Producers

Here is an example:

import Control.Proxy
import Control.Proxy.Pipe ((>+>))

prod :: (Monad m,Proxy p) => Producer p () m r
prod = undefined

pipe :: (Monad m,Proxy p) => Pipe p () () m r
pipe = undefined

cons :: (Monad m,Proxy p) => Consumer p () m r
cons = undefined

All four expressions should type check, but only two do:

*Main> :t pipe >+> pipe
pipe >+> pipe :: (Monad m, Proxy p) => Pipe p () () m r
*Main> :t pipe >+> cons
pipe >+> cons :: (Monad m, Proxy p) => Pipe p () C m r
*Main> :t prod >+> pipe
<interactive>:1:1:
    Couldn't match type `C' with `()'
    Expected type: Pipe p0 () () m0 r0
      Actual type: Producer p0 () m0 r0
    In the first argument of `(>+>)', namely `prod'
    In the expression: prod >+> pipe
*Main> :t prod >+> cons
<interactive>:1:1:
    Couldn't match type `C' with `()'
    Expected type: Pipe p0 () () m0 r0
      Actual type: Producer p0 () m0 r0
    In the first argument of `(>+>)', namely `prod'
    In the expression: prod >+> cons

incorrect example in documentation

I can't figure out where the error is, but with the definitions from the documentation, printer seems to go wrong by continuing to request input when it should have stopped:

{-#LANGUAGE ScopedTypeVariables#-}

import Control.Pipe
import Control.Monad
import Control.Monad.Trans
import Control.Applicative


prompt :: Producer Int IO a
prompt = forever $ do
    lift $ putStrLn "Enter a number: "
    n <- read <$> lift getLine
    yield n

take' :: Int -> Pipe a a IO ()
take' n = do
    replicateM_ n $ do
        x <- await
        yield x
    lift $ putStrLn "You shall not pass!"

fromList :: (Monad m) => [a] -> Pipe Zero a m ()
fromList = mapM_ yield     

printer :: (Show a) => Pipe a Zero IO b
printer = forever $ do
    x <- await
    lift $ print x


pipeline :: Pipeline IO ()
pipeline = printer <+< take' 3 <+< fromList [1..]

badpipeline =  printer <+< take' 3 <+< prompt
right = runPipe pipeline
wrong = runPipe badpipeline

-- 
-- *Main> right
-- 1
-- 2
-- 3
-- You shall not pass!
-- *Main> wrong
-- Enter a number: 
-- 1
-- 1
-- Enter a number: 
-- 2
-- 2
-- Enter a number: 
-- 3
-- 3
-- You shall not pass!
-- Enter a number: 
-- 4
-- *Main> 

Provide a fold function for the foldl package Fold

GHC wouldn't let me deconstruct a Fold into a step, begin, and done in a where clause. It said it couldn't pattern match on the existential quantification.

Synopsis:

aggregateHeadFold = (,) <$> fld <*> L.last
-- (L.Fold step begin done) =  (,) <$> fld <*> L.last   <--- GHC's head assploded here      
foldL :: Monad m => L.Fold t t1 -> Producer t m () -> m t1
foldL (L.Fold step begin done) = P.fold step begin done

I'm now calling foldL aggregateHeadFold and all is well again. Is it worth providing this?
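
A self-contained version of the helper, as a sketch (pattern matching on the existential is fine at the top level of a function definition, just not in a let or where pattern binding):

import qualified Control.Foldl as L
import Pipes (Producer)
import qualified Pipes.Prelude as P

-- Run a pure Fold from the foldl package over a Producer.
foldL :: Monad m => L.Fold a b -> Producer a m () -> m b
foldL (L.Fold step begin done) = P.fold step begin done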

Space leaks don't come easy

A new space leak (or maybe the same one as in #15) shows up again. Here is a test case. takeF is only there to ensure termination; the program leaks space without it. The NOINLINE on enumerateList is crucial; without it, GHC is able to eliminate the leak somehow.

import Control.Frame
import Control.IMonad     (foreverR,(!>),(!>=),mapMR_)

takeF :: Monad m => Int -> Frame a m (M a) C ()
takeF 0 = close
takeF n = await !>= yield !> takeF (n - 1)

enumerate :: FilePath -> Frame () IO C C ()
enumerate _
  = ({-# SCC "#1" #-} finallyD) (print ())
  $ foreverR (yield ())

enumerateList :: [FilePath] -> Frame () IO (M ()) C ()
{-# NOINLINE enumerateList #-}
enumerateList xs = close !> mapMR_ enumerate xs

main :: IO ()
main = do
  runFrame $  foreverR await
          <-< takeF 100000
          <-< enumerateList [""]

Heap profiling points towards finallyD. Removing it in the original program plugged the space leak.

Adding maybeP, errorP

When working on pipes-attoparsec, I came across the need for these functions. I think they might be an interesting addition to the Pipes.Lift module:

-- Wrap the base monad in 'ErrorT'.
--
-- @
-- runErrorP . errorP = id
--
-- errorP . runErrorP = id
-- @
errorP -- or whatever its name might be!
  :: (Monad m, Error e)
  => Proxy a' a b' b m (Either e r)
  -> Proxy a' a b' b (ErrorT e m) r
errorP = lift . ErrorT . return <=< hoist lift


-- Wrap the base monad in 'MaybeT'.
--
-- @
-- runMaybeP . maybeP = id
--
-- maybeP . runMaybeP = id
-- @
maybeP -- or whatever its name might be!
  :: (Monad m)
  => Proxy a' a b' b m (Maybe r)
  -> Proxy a' a b' b (MaybeT m) r
maybeP = lift . MaybeT . return <=< hoist lift

You can also do similar things for the other base monads, but I find that promoting Maybe and Either values to MaybeT and ErrorT transformers is particularly useful, so I propose these two first. If you agree that these are useful, then we can add the others too.

Even while these functions are not hard to write by hand, perhaps it's not obvious how to do it. And since the Pipes.Lift module already exports many functions to deal with Proxy base monads, we could also export these there so that users don't go reinventing them.

In any case, errorP and maybeP can be abstracted using a third function wrapP, which seems it can be useful for building other monad transformers too:

wrapP -- or whatever its name might be!
  :: (Monad m, Monad (t m), MonadTrans t) 
  => (a ->            t m  b)
  -> Proxy x' x y' y    m  a
  -> Proxy x' x y' y (t m) b
wrapP f = lift . f <=< hoist lift

errorP = wrapP (ErrorT . return)
maybeP = wrapP (MaybeT . return)

Quadratic slowdown with sequence, mapM, replicateM and co.

Summary

The Proxy monad has bad asymptotic behavior with monadic constructs that do not "properly lean to the right". In particular: sequence, mapM and replicateM are quadratic in the length of the lists they produce.

Demonstration

Consider the following two programs:

import Control.Monad (replicateM)
import Pipes

numbers :: Producer Int IO r
numbers = go 0
  where
    go k = yield k >> go (k+1)

main :: IO ()
main = do
  l <- runEffect $ numbers >-> replicateM 20000 await
  print $ last l

And:

import Control.Monad (mapM)
import Pipes

main :: IO ()
main = do
  l <- runEffect $ mapM (lift . return) [1..5000 :: Int]
  print $ last l

Both run in about a second and demonstrate clear quadratic slowdown when you change the corresponding number.

Analysis

We have discovered the issue because we have something like the first use-case in our production code. We have a pipe that parses messages from a byte stream, and one particular message type contains a list of elements. The lists are of nontrivial but bounded length: a few thousand elements each. The code looks something like this:

parseMessage :: Pipe ByteString Message m r
parseMessage = do
  t <- parseMessageTag
  parseMessageBasedOnTag t

...

parseThatOneParticularMessage = do
  header <- parseHeader
  numElts <- parseInt
  elts <- replicateM numElts parseElement
  return $ ThatOneParticularMessage header elts
    where
      parseElement = do
        a <- parseInt
        ...
        return $ Element a b c

When we introduced this message type we noticed the issue immediately because the whole thing stopped for a few seconds to parse one message with less than 10k elements.

The issue was quite tricky to debug, because the profiling output looked unhelpful: the cpu time is spent in parseOneParticularMessage in the replicateM line, but not within the parseElement function. Which seems impossible if you think about replicateM as a loop. The problem is, replicateM is not a loop, it's a recursive monadic expression, and in particular: one that does not "lean to the right". The standard implementation is equivalent to this:

replicateM :: Monad m => Int -> m a -> m [a]
replicateM 0 _ = return []
replicateM n op = do
  x <- op
  xs <- replicateM (n-1) op
  return (x:xs)

Once we have realized that this is the issue, we could work-around it with an accumulator based replicateM:

replicateM' :: Monad m => Int -> m a -> m [a]
replicateM' n op = go n []
  where
    go 0 acc = return $ reverse acc
    go n acc = do
      x <- op
      go (n-1) (x:acc)

And the run time dropped to a millisecond from a few seconds.

Source of the problem

So, why is this happening? Actually, this is the same issue why the Free monad has a quadratic slow down in similar cases. (If you change the runEffect in the second example to retract from Control.Monad.Free you'll see the same effect.) If the binds do not associate to the right you have to traverse the structure over and over again. Very similarly to lists and appends: (...(([1] ++ [2]) ++ [3]) ++ ...) ++ [n] is O(n^2), while [1] ++ ([2] ++ ... ([n-1] ++ [n])) is O(n).

I'd like to emphasize that this issue is not specific to pipes. Anything that is structurally similar to Proxy and wants to be a monad will have the same problem. So, for example, Conduit has the same issue.

Evaluation

So, how serious is this issue? I'd like to argue that this is a pretty serious flaw.

  • It's easy to unknowingly step into this trap.
  • It's not obvious where the problem is: if our lists were a bit smaller we wouldn't have noticed the issue. But even with a list of a hundred elements you pay a 100x overhead. And you only notice it once the overall performance of your code becomes a problem, but then:
  • It's very tricky to debug. I only realized what the problem was because I had read about the same thing with Free before and spent some time understanding it.
  • It can be worked around, but only on a case-by-case basis. For example, I don't think it would be possible to change the implementation of replicateM and its relatives in base to the one above, as it has different laziness properties. And it's not restricted to these standard functions: this is the natural thing to do when you write recursion in monadic code.

Possible solution

I know of only one way of dealing with this in general. It's the solution to the "efficient free" monad, which you can see here: http://hackage.haskell.org/package/free-4.1/docs/Control-Monad-Free-Church.html or here: http://hackage.haskell.org/package/kan-extensions-3.7/docs/Control-Monad-Codensity.html . You need to change the representation of your basic data structure to "continuation passing" one.

It sounds scarier than it is, with all the fancy category theory stuff around it. Well, it did break my brain for a day or so, but you get used to it. And in the end it's surprisingly little work to change the code from one to the other. I have a prototype that shows that it works for a simplified Proxy. See: https://gist.github.com/klao/7237705. I will do it for the actual Proxy type and then update this bug with it.

The approach probably has its drawbacks, most notably we'll need to thoroughly check how it affects performance. (My little prototype looks promising in that too.) But, I think that even a little slow-down is acceptable for an asymptotic speed up. ;)
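
As a concrete illustration of the continuation-passing work-around, here is a sketch that reuses Codensity from kan-extensions instead of changing the Proxy representation itself: wrapping the monadic code in Codensity re-associates the binds to the right before lowering back, so replicateM runs in linear time. (awaitN is a made-up helper name.)

import Control.Monad (replicateM)
import Control.Monad.Codensity (lowerCodensity)
import Pipes

-- Await n values; the left-nested binds produced by replicateM are
-- re-associated inside Codensity, avoiding the quadratic behavior.
awaitN :: Monad m => Int -> Consumer a m [a]
awaitN n = lowerCodensity (replicateM n (lift await))

In the first demonstration above, replacing replicateM 20000 await with awaitN 20000 sidesteps the slowdown without touching pipes.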

Acknowledgments

The bug was found by @errge and we also debugged it together.

Test suite

pipes needs a QuickCheck test suite for those who don't trust my equational reasoning skills. The ideal things to test are laws such as the category laws and functor laws, which are documented throughout the code base, although other miscellaneous tests would be useful, too.

This will require bumping the minimum cabal version to >= 1.9.2, but that's okay.
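
A sketch of the kind of property such a suite could start with, assuming the pure helpers in Pipes.Prelude (two pipelines are compared by draining them over the same input):

import Pipes
import qualified Pipes.Prelude as P
import Test.QuickCheck

-- P.map id should be indistinguishable from the identity pipe cat
-- (a functor-law-style property), observed via the pure drain P.toList.
prop_mapId :: [Int] -> Bool
prop_mapId xs =
    P.toList (each xs >-> P.map id) == P.toList (each xs >-> cat)

main :: IO ()
main = quickCheck prop_mapId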

v4.0.0 tutorial failure ?

The latest v4.0.0 tutorial says that:

Haskell pipes are simple to write. For example, here's the code for 'P.stdout':

import Control.Monad
import Pipes
stdout :: () -> Consumer String IO r
stdout () = forever $ do
    str <- request ()
    lift $ putStrLn str

But trying this using GHC 7.6.3 gives me:

Illegal polymorphic or qualified type: Consumer String IO r
Perhaps you intended to use -XRankNTypes or -XRank2Types
In the type signature for stdout':
stdout' :: () -> Consumer String IO r
Main.hs /haskell-test/src   line 70 Haskell Problem

Code:

module Main where

import           Control.Monad
import           Pipes
import qualified Pipes.Prelude      as P
import qualified System.IO as IO

stdin' :: () -> Producer String IO ()
stdin' () = loop
  where
    loop = do
        eof <- lift $ IO.hIsEOF IO.stdin
        unless eof $ do
            str <- lift getLine
            respond str
            loop

stdout' :: () -> Consumer String IO r
stdout' () = forever $ do
    str <- request ()
    lift $ putStrLn str


main :: IO ()
main = runEffect $ (stdin' >-> stdout') ()

Is this switch really necessary, or did I make a mistake?

Add mini-tutorial to Pipes.Lift

The main tutorial doesn't explain what Pipes.Lift is supposed to be used for, so this issue is to remind myself to add a short tutorial to Pipes.Lift explaining the motivation behind that module.

Frames leak memory

Here is a minimal test case. Memory consumption steadily increases. Note that the NOINLINE pragma is crucial; without it, enumerateGZFileList gets inlined and the leak disappears.

module Main(main) where
import Control.Frame
import Control.IMonad

enumerateGZFileList :: [FilePath] -> Frame () IO (M ()) C ()
enumerateGZFileList nms
  = close !> mapMR_ (const $ foreverR $ yield ()) nms
{-# NOINLINE enumerateGZFileList #-}

takeF :: Monad m => Int -> Frame a m (M a) C ()
takeF 0 = close
takeF n = await !>= yield !> takeF (n - 1)

main :: IO ()
main = 
  runFrame
    $   foreverR await
    <-< takeF 100000
    <-< enumerateGZFileList ["A"]

Migrate constructors to Control.Proxy.Internal

There's tension between whether, in the presence of smart constructors, the actual constructors in a library should be exposed or hidden. The easy solution is to put them in an .Internal namespace, so that the user is forced into awareness that code written in terms of the internals may break in the future (or right now, if typeclass laws can be broken by fiddling with constructors).

Outdated cabal description

The description in the .cabal file says:
Lightweight Dependency: pipes depends only on transformers and compiles rapidly
But pipes-3.2.0 depends on mmorph too.

Disable rewrite rules when running tests

The tests will trivially pass if the pipes rewrite rules fire. However, the whole point of the tests is to independently validate the rewrite rules, so they should be compiled with rewrite rules disabled.

Safe looping - keep tab on the lookahead in a Pipe?

Below, I'm using the <+> combinator that you implemented in Issue #9

I wonder if it would be possible to ensure that loops work in the type system.
Below are some examples of simple loops. Without the 'yield 0' it doesn't work, of course, but in a larger system, keeping track of this when using <+> etc. can be difficult.

Is there a solution to this?

counter = runPipe $ printer <+< net
  where
    adder = yield 0 >> (forever $ do
        x <- await
        yield (x + 1))
    net = fromList [1..10] <+> (adder <+< net)

counter2 = runPipe $ printer <+< net
  where
    adder = yield 0 >> (forever $ do
        x <- await
        yield (x + 1))
    net = adder <+< net

More involved `pipes` examples in an `examples` directory`

Kai Wang requested this on the haskell-pipes mailing list:

One suggestion I have is to include concrete, complete, and non-trivial example programs to complement the tutorial.

attoparsec comes with a simple HTTP parser. The example program does several things: 1) it showcases idiomatic usage, without which I would have had to learn (<) and (>) the hard way; 2) it provides a comparison to alternatives (the example program also includes implementations in C and Parsec).

I think pipes should do a similar thing. The example programs don't have to be long or complicated, but they should demonstrate common usage patterns.

I'm just stashing this here to remind myself to include this later.

Possibly add `mtl` instances

I'm opening this issue to remind myself to consider adding mtl instances for the Proxy type. Any discussion is welcome.

Documentation wording: “blocked on respond/request”

I was reading the new documentation and I had to stop for a while to reason about the documentation for (>->). It says:

Compose two proxies blocked on a respond, creating a new proxy blocked on a respond

Perhaps I'm seeing this from the wrong perspective, but I interpret that sentence as the execution of the composed proxies “getting stuck” once they reach a respond action. However, what happens is quite the opposite; these proxies “get stuck” once they execute a request action, waiting for a respond from upstream.

Do you think that saying something like the following could be clearer?

… blocked waiting for a respond from upstream.

The opposite applies to (>~>).

Benchmarks/ reason for new perf claim?

I see that in the new 2.5 release some interesting performance claims are made. What has changed in the internals to enable this, and what are the associated benchmarks? Thanks!

Category axioms do not hold

The category axioms do not hold for either the Lazy or Strict category instances, even if we identify observationally equal pipes. In particular, composition is not associative:

runPipe $ lift (print 1) >+> yield () >+> lift (print 2)

prints 2, while

runPipe $ lift (print 1) >+> (yield () >+> lift (print 2))

prints 2 and 1.

Also, id is not a right identity for terminated pipes. For example, given

feed = yield () >> lift (print 0)
p1 = feed >+> return ()
p2 = feed >+> (idP >+> return ())

the pipe p2 prints 0 while p1 doesn't print anything.

I initially thought it was just a matter of reordering some clauses in the definition of composition, but now I'm not so sure this could be fixed so easily. In fact, x >+> Yield should always evaluate to Yield (as it does currently), otherwise something of the form Yield >+> Await >+> Yield doesn't associate. But then it seems that my first counterexample is unavoidable.

The issue with identity is less of a concern, since it's always possible to introduce a formal identity (for example, by adding a Transform (a -> b) alternative to the Pipe declaration).

"Proxy" name clashes with Data.Proxy

This clash is unfortunate as the two types have nothing to do with each other, and it seems like Proxy might be included in base as part of the new-typeable work (or at least a type variable proxy, used like proxy a -> ...).

Since 4.0 seems to be breaking backwards compatibility and removing all the older forms of pipes, how about renaming Proxy to Pipe, to match the library name?

Pipes.Internal.X is not actually empty

It is inhabited by fix X, instead of just ⊥ like you'd expect. The correct way to define it without EmptyDataDecls would be newtype X = X X, but note that some older versions of GHC have trouble with this definition.

Since the void package solves these issues and has useful functions and instances, I propose simply depending on it and defining type X = Void (or using Void directly).
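
A small sketch of the distinction being made, using hypothetical stand-in types rather than the library's own X: with a data declaration the recursive constructor yields a genuine (infinite) inhabitant, while with a newtype the constructor is erased and the same definition is just a loop.

import Data.Function (fix)

data    DataX    = DataX    DataX     -- analogue of `data X = X X`
newtype NewtypeX = NewtypeX NewtypeX  -- analogue of `newtype X = X X`

inhabitant :: DataX
inhabitant = fix DataX     -- a productive infinite tower of constructors

bottomish :: NewtypeX
bottomish = fix NewtypeX   -- the constructor is erased, so this is fix id = ⊥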

pipes -> void -> hashable -> text

I was just having trouble compiling pipes material together with a modified text; it seems that the void dependency entails a hashable dependency, which requires text (on ghc). My impression was that this was something you didn't want. I'm not sure why I hadn't bumped into this before.

Unknown symbol error

I upgraded to the latest pipes libraries but pipes-4.0.0 is no longer loading correctly. See ghci session below.

λ> import Pipes
λ> import qualified Pipes.Prelude as P
λ> runEffect $ P.stdinLn >-> P.takeWhile (/= "quit") >-> P.stdoutLn
Loading package transformers-0.3.0.0 ... linking ... done.
Loading package array-0.4.0.1 ... linking ... done.
Loading package deepseq-1.3.0.1 ... linking ... done.
Loading package bytestring-0.10.0.2 ... linking ... done.
Loading package text-0.11.3.1 ... linking ... done.
Loading package mtl-2.1.2 ... linking ... done.
Loading package hashable-1.1.2.5 ... linking ... done.
Loading package containers-0.5.0.0 ... linking ... done.
Loading package mmorph-1.0.0 ... linking ... done.
Loading package nats-0.1 ... linking ... done.
Loading package semigroups-0.9.2 ... linking ... done.
Loading package void-0.6.1 ... linking ... done.
Loading package pipes-4.0.0 ... linking ... :
lookupSymbol failed in relocateSection (relocate external)
/Users/Chris/Library/Haskell/ghc-7.6.3/lib/pipes-4.0.0/lib/HSpipes-4.0.0.o: unknown symbol `_foldlzm1zi0zi0_ControlziFoldl_genericLengthzuzdsgenericLength_closure'
ghc: unable to load package `pipes-4.0.0'

newtype FreeT instead of data FreeT?

Replacing "data FreeT" with "newtype FreeT" in Control.Monad.Trans.Free probably improves efficiency.

Is there some reason to keep it as data instead of newtype? Is the whole thing getting replaced in the next version anyway?
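
For reference, the proposed change is a one-liner, sketched here assuming the FreeF shape used by the free package (FreeT wraps a single field, so newtype suffices):

-- before: data    FreeT f m a = FreeT { runFreeT :: m (FreeF f a (FreeT f m a)) }
newtype FreeT f m a = FreeT { runFreeT :: m (FreeF f a (FreeT f m a)) }

-- the underlying base functor, for completeness:
data FreeF f a b = Pure a | Free (f b)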

Small bug in pipes 3.3.0 tutorial

I think

rangeS n1 n2 = RespondT (enumFromS n1 n2 ())

Should be

rangeS n1 n2 = RespondT (enumFromToS n1 n2 ())

Will send a proper pull request later.

Switch `runStateP` and friends to put arguments after the proxy

Right now, runStateP, execStateP, evalStateP, runWriterP, and execWriterP differ from their transformers counterparts by taking all arguments before the body of the monad to run.

The original rationale for doing this dates back to when everything was a Kleisli arrow in the pipes-3.* series, which made it possible to write code like this:

(runStateP someState) . someProxy

If the arguments went after, it would have required the following even more unintelligible code:

(`runStateP` someState) . someProxy

However, now that the unidirectional API does not use Kleisli arrows it may be worth switching them back to place the arguments afterwards because it is much cleaner now:

-- What things would look like after the change
runStateP someProxy someState

This would make the API more consistent with transformers.

The other reason I would like to do this is because of issue #72, which adds support for the constructor analogs of these run functions (i.e. maybeP / stateP). If I modified runStateP to take the arguments afterwards, then I could treat the constructor functions and run functions as simple isomorphisms:

runStateP . stateP = id
stateP . runStateP = id

I'll probably make the switch in a couple of days but I wanted to invite discussion on this first.

Debianize `pipes`

This is a reminder to myself to create a Debian package for pipes. If anybody has experience with this, I'd appreciate any guidance.

Benchmark suite

pipes needs a few pure Criterion benchmarks that exercise the library so that the pragmas and function definitions can be more easily tuned for speed. This should preferably be integrated with cabal bench to make it easy for anybody to install and contribute performance enhancements to pipes.
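
A sketch of a possible starting point, assuming Criterion.Main and the pure helpers in Pipes.Prelude (the pipeline and size are illustrative, not an agreed benchmark set):

import Criterion.Main
import Data.Functor.Identity (runIdentity)
import Pipes
import qualified Pipes.Prelude as P

-- A pure pipeline benchmarked at a single size; a real suite would sweep
-- sizes and cover more combinators.
mapSum :: Int -> Int
mapSum n = runIdentity $ P.sum (each [1 .. n] >-> P.map (+ 1))

main :: IO ()
main = defaultMain
    [ bench "each >-> map/sum" $ whnf mapSum 10000 ]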
