mrkkrp / parser-combinators Goto Github PK

Lightweight package providing commonly useful parser combinators

License: Other

Haskell 100.00%

parser-combinators's Introduction

Parser combinators

The package provides common parser combinators defined in terms of Applicative and Alternative without any dependencies but base. There are also more efficient versions of the combinators defined in terms of Monad and MonadPlus.

Contribution

Issues, bugs, and questions may be reported in the GitHub issue tracker for this project.

Pull requests are also welcome.

License

Distributed under BSD 3 clause license.

parser-combinators's People

Contributors

Stargazers

Watchers

Forkers

ylh int-index kquick treeowl tweag kindaro faucct leifmetcalf acerempel strake recursion-ninja turion advancedalpaka t-ray123 luc-saccoccio

parser-combinators's Issues

many clashes with Control.Applicative.many

There's Control.Monad.Combinators.many defined in this library and Control.Applicative.many:

λ> :m Control.Applicative Control.Monad.Combinators
λ> :i many
Control.Monad.Combinators.many ::
  GHC.Base.MonadPlus m => m a -> m [a]
        -- Defined in ‘Control.Monad.Combinators’

class Applicative f => Alternative (f :: * -> *) where
  ...
  Control.Applicative.many :: f a -> f [a]
        -- Defined in ‘GHC.Base’

When importing unqualified, say, Control.Applicative and Text.Megaparsec (which re-exports from parser-combinators), many becomes ambiguous.

Can Control.Monad.Combinators.many be made simply a reexport of Control.Applicative.many, or is there a difference so that it needs to be a different function?

A partially-matching operator breaks makeExprParser

I'm trying to use makeExprParser with Megaparsec. I found its behavior surprising when an operator matches the string partially.

Take this example:

parseTerm = dbg "term" $ some alphaNumChar

op1 = space *> "+" *> space
op2 = "+"
op3 = "." *> "+" *> space

parseExpr = dbg "expr" $ makeExprParser parseTerm [
        [ InfixL ((++) <$ dbg "plus" op1) ]
        ]

parseMain str = parse (space *> parseExpr <* space <* (optional ".") <* eof) "" $ fromString str

When you use op1, the string "o + a" parses correctly, but the string "o + a " doesn't.
When you use op2, both the strings "o+a" and "o+a " work correctly.
When you use op3, "o.+ a ." works, but "o.+ a." doesn't.

I believe that what's happening in the cases that fail, is that the operator's first sub-combinator matches, and somehow that breaks everything in case the operator doesn't fully match.

Permutation parsing is more lenient than I expect

I'm attempting to parse permutations of flags. The behavior I want is "one or more flags in any order, without repetition".

The code I have is outputting what I want, but is too lenient on inputs. I don't understand why it's accepting multiples of the same flags. What am I doing wrong here?

pFlags :: Parser [Flag]
pFlags = runPermutation $ f <$> 
    toPermutation (optional (GroupFlag <$ char '\'')) <*> 
    toPermutation (optional (LeftJustifyFlag <$ char '-'))
    where f a b = catMaybes [a, b]

Example inputs / outputs:

"'-" = [GroupFlag, LeftJustifyFlag] -- CORRECT
"-'" = [LeftJustifyFlag, GroupFlag] -- CORRECT
"''''-" = [GroupFlag, LeftJustifyFlag] -- INCORRECT, should fail if there's more than one of the same flag.

Make it possible for operators in makeExprParser to fail conditionally by analyzing their operands

I'm running in to trouble with makeExprParser. I would like to conditionally return an error if something in an operator is wrong. I am making an assign operator, and the left side can only be an identifier. I'm finding that it's impossible to do any kind of error without fully traversing my Expr. I think it would be helpful if there was a version of makeExprParser that allows operators to return Either e a, and when Left e is returned, return that error if no alternative is found.

Like manyTIll but does not throw away the end?

Hi. Thank you for your great work on this project.

I notice manyTill will consume and throw away the result of
end.

λ parseTest @Void (someTill anySingle (single '$') <* eof) ("mika$" :: Text)
"mika"

Notice there is no $ sign in the output.

This is the relevant code from the latest Hackage release:

manyTill :: MonadPlus m => m a -> m end -> m [a]
manyTill p end = go id
  where
    go f = do
      done <- option False (re True end)  -- `re` for "replace", right?
      if done
        then return (f [])
        else do
          x <- p
          go (f . (x:))
{-# INLINE manyTill #-}

I would like there to be a function like this one, but preserving every bit of input, including
the end. Can we add it, under, say, the name manyTill_ ? Or is there a better way to achieve
what I want?

many and some for permutation parsers

Would be nice to have something like this. @recursion-ninja do you think this makes sense?

manyPerm :: Functor m => m a -> Permutation m [a]
manyPerm parser = go []
 where
  go acc = P (Just (reverse acc)) (go . (: acc) <$> parser)

somePerm :: Functor m => m a -> Permutation m [a]
somePerm parser = P Nothing (go . pure <$> parser)
 where
  go acc = P (Just (reverse acc)) (go . (: acc) <$> parser)

[Proposal] Add manyEndingWith

Hi! may you consider adding manyEndingWith (name subject to change)? The code would be like:

-- copy paste from manyTill_. It takes a parser p and a finalizer end. It returns the list of 
-- all parsed elements with p and(!) the element parsed with end
manyEndingWith :: MonadPlus m => m a -> m a -> m [a]
manyEndingWith p end = go id
  where
    go f = do
      done <- optional end
      case done of
        Just done' -> return $ f [done']
        Nothing -> do
          x <- p
          go (f . (x :))

This is particulary usefull when parsing the eof. For example in megaparsec this code will hang forever

-- This hangs forever. But I don't know why.
my_parser = (True <$ symbol ";") <|> (True <$ symbol "|") <|> (False <$ eof)
parse (many my_parser) "" ";|"
> hangs forever...

Ideally the last example could return Right [True, True, False], but I think it isn't possible with the current combinators. With the new combinator the above example could be rewritten as

my_parser = (True <$ symbol ";") <|> (True <$ symbol "|")
parse (manyEnding my_parser (False <$ eof)) "" ";|"
> Right [True, True, False]

I know manyTill_ exists but, It returns m ([a], end), forcing you to append end at the end of the list (if a ~ end), which is inefficient for linked lists.

count' is algorithmically inefficient

Looking at the definition of count' m n, it appears that if we successfully parsed m+k elements but then failed, there will also be n-k futile attempts:

count' :: Alternative m => Int -> Int -> m a -> m [a]
count' m' n' p = go m' n'
  where
    go !m !n
      | n <= 0 || m > n = pure []
      | m > 0           = (:) <$> p <*> go (m - 1) (n - 1)
      | otherwise       =
          let f t ts = maybe [] (:ts) t
          in f <$> optional p <*> go 0 (pred n)

For instance, if I have count' 50 150, and succesfully parse 51 times, then there will be 99 more attempts to parse. I think this combinator must be defined with MonadPlus for efficiency, to avoid these futile attempts.

Provide Control.Applicative.Combinators.Expr

The definitions in Control.Monad.Combinators.Expr are the only ones that don't exist as an Alternative version. Is it possible to create those as well?

[Proposal] Adding (back ?) chainl

Hi !

Is it possible to add the chainl, chainl1, chainr, chainr1 combinators ? Although Control.Monad.Combinators.Expr does exists, it feels overkill to use when there's only one layer to the chain; and I don't really understand why it was removed from megaparsec

The implementation would be pretty straightforward :

chainl1 parser op = parser >>= rest
  where rest x = do { f <- op; y <- parser; rest (f x y) } <|> return x

chainl parser op x = chainl1 parser op <|> return x

and so on...

Thanks in advance :) !

Permutation default field should be strict

Currently,

data Permutation m a = P (Maybe a) (m (Permutation m a))

It's actually impossible, given the exported functions, for the default field to be bottom or expensive to calculate, but fmap and <*> will each fill that field with a thunk. So it seems strictly better to write

data Permutation m a = P !(Maybe a) (m (Permutation m a))

I don't think it's possible to make the other field strict without changing the semantics if the monad is some weird lazy thing.

Have non-empty combinators return a NonEmpty result

When constructing the generalized parser combinators for the separate library, it would be nice for "non-empty" combinators to return NonEmpty typed values. Since Data.List.NonEmpty has been moved into base this provides much more type safe code.

Example combinators: endBy1, sepBy1, someTill, sepEndBy1, etc.

I have mixed feelings about some. It is probably more acceptable to re-export the Alternative instance and also some1 from Data.List.NonEmpty. In princliple some should be changed to return a NonEmpty type, but it is glued into a core type in the Haskell language so it's unlikely to be changed any time soon and the corrected function some1 from Data.List.NonEmpty is fine.

Hopefully providing the most "correct" interface should help the library adoption and reuse with other parsers.

Prioritize a parser over another (handling overlapping parsers)

Is it possible to prioritize parsers in the case two or more parsers "overlap"?

For example, if I have a parser that can parse any word, and a parser that may parse only the word "a".
Then if the string is "a word", and the any-word parser parses "a", it will make the other parser fail, (because "a" has been already consumed, and "word" is not valid for the a-word parser, but yes the other way around).

This is a simplified case, mine is a bit more complex, which I might be able to fix by changing the order in which I list the parsers in the applicative chain (i.e. f <$> p1 <*> p2 <*> ... <*> pn), but that is in a sense reverse engineering from whatever the implementation is under the hood. I'm fine if that ends up being the solution, but it would be nice having it at least documented.

Behaviour of someTill is confusing.

manyTill p end applies parser p zero or more times until parser end succeeds. ... someTill p end works similarly to manyTill p end, but p should succeed at least once.

(From the haddock)

Nothing in the above prepares me to this:

λ parseTest @Void (manyTill (anySingle) (single '!')) "any chars ! wrong chars !"
"any chars "
λ parseTest @Void (someTill (anySingle) (single '!')) "any chars ! wrong chars !"
"any chars "
λ parseTest @Void (manyTill (anySingle) (single '!')) "! wrong chars !"
""
λ parseTest @Void (someTill (anySingle) (single '!')) "! wrong chars !
"! wrong chars "

What I read is this:

someTill p end applies parser p zero or more times until parser end succeeds, but p should succeed at least once.
What I see is this:

someTill p end applies parser p one or more times until parser end succeeds.

Indeed, that is how it is defined:

someTill p end = liftM2 (:) p (manyTill p end)

I am not sure if it is not too late to change this behaviour, but I at least wish for it to be spelled out.

Control.Applicative.Permutations doesn't work with Applicative-only parsers

Control.Applicative.Permutations.runPermutation requires that the parser type be a Monad for performance. This is highly surprising when Control.Applicative.Combinators, in contrast, is willing to make some performance sacrifices in order to stay compatible with parsers with no Monad instance.

This isn't a purely theoretical concern. Unless I really need context-sensitivity, I like the Earley package because I don't need to work around the expressiveness limits in the parsec family of libraries. Because the Earley algorithm is restricted to context-free grammars, it doesn't provide a Monad instance for its Prod type. (It provides a Grammar type with a Monad instance, but don't let that distract. That type doesn't have an Alternative instance, as it's not actually a parsing type. It's only there as a tool for observable sharing.)

The original paper the module documentation links to provides an algorithm that doesn't need a Monad instance. It would be consistent with the way the other Applicative/Monad modules split if this module didn't require the Monad instance either. And it would make Control.Applicative.Permutations work with Earley.

The Applicative versions of sepEndBy and sepEndBy1 cause infinite loops with Earley

Main.hs:

import Text.Earley
import Control.Applicative.Combinators

main :: IO ()
main = print $ fullParses (parser grammar) "a;a"

grammar :: Grammar r (Prod r () Char String)
grammar = return $ sepEndBy (token 'a') (token ';')

sepEndBy-bugreport.cabal:

cabal-version:      3.4
name:               sepEndBy-bugreport
version:            0.1.0.0

executable sepEndBy-bugreport
    hs-source-dirs:   .
    main-is:          Main.hs

    build-depends:    base >=4 && <5,
                      Earley ==0.13.0.1,
                      parser-combinators ==1.3.0

This program loops forever. Replacing sepEndBy with sepBy makes it terminate. sepEndBy1 loops, sepBy1 does not. Note that the actual input to parse doesn't matter - the issue happens whenever sepEndBy or sepEndBy1 occurs within the parser.

After consulting with some other people, our best guess is that the problem is that generating an Earley parser requires exploring the entire parse tree, broken only by explicit recursion markers. The Prod type in Earley provides implementations of some and many which creates those explicit markers, so working in terms of them will succeed. sepBy and sepBy1 are implemented in terms of many, so they work with Earley. But sepEndBy and sepEndBy1 do not use some or many, and as a result they create infinite parse trees. At least, that's the guess.

[proposal] sepByN

Thanks for the library!

I was looking for something like sepBy but requiring an exact number of separated elements.

If it's not here yet, maybe it could be added:

sepByN :: Alternative m => m a -> m sep -> Int -> m [a]
sepByN x sep n = (:) <$> x <*> count (n-1) (sep *> x)

It can be used in this "semantic" way:

n & x `sepByN` sep