subttle / regular Goto Github PK

Finite Automata and Regular Expressions for Regular Languages in Haskell

License: BSD 3-Clause "New" or "Revised" License

Haskell 100.00%

regular-languages finite-state-automata deterministic-finite-automata non-deterministic-finite-automaton regular-expressions regular-expression regexp dfa-to-regex haskell

regular's Introduction

regular (WIP)

Type-safe formalisms for representing Regular Languages (Deterministic Finite Automata, Nondeterministic Finite Automata, Regular Expressions, etc.) in Haskell.

Example

Here is a small example of what FizzBuzz looks like with DFA:

-- A DFA which accepts numbers (as a string of digits) only when
-- they are evenly divisible by 5.
by5 ∷ DFA Bool Fin₁₀
by5 = DFA δ q₀ f
  where
    -- Theorem: A number is divisible by 5 iff its last digit is 0 or 5.
    δ ∷ (Bool, Fin₁₀) → Bool
    δ (_, 0) = True
    δ (_, 5) = True
    δ _      = False
    q₀ ∷ Bool
    q₀ = False
    f  ∷ Set Bool
    f = singleton True

-- A DFA which accepts numbers (as a string of digits) only when
-- they are evenly divisible by 3.
-- The transition function effectively keeps a running total modulo three by
-- totaling the numeric value of its current state and the numeric value of the
-- incoming digit, performing the modulo, and then converting that value back to a state.
-- There is a bit of overhead complexity added by the fact that an extra state, `Left ()`,
-- is introduced only to avoid accepting the empty string.
by3 ∷ DFA (Either () Fin₃) Fin₁₀
by3 = DFA δ (Left ()) (singleton (Right 0))
  where
    -- Theorem: A number is divisible by 3 iff the sum of its digits is divisible by 3.
    δ ∷ (Either () Fin₃, Fin₁₀) → Either () Fin₃
    δ = Right . toEnum . (`mod` 3) . \(q, digit) → fromEnum (fromRight 0 q) + fromEnum digit

-- FizzBuzz using DFA
main ∷ IO ()
main = mapM_ (putStrLn . fizzbuzz . toDigits) [1 .. 100]
  where
    fizz ∷ [Fin₁₀] → Bool
    fizz = accepts  by3
    buzz ∷ [Fin₁₀] → Bool
    buzz = accepts                         by5
    fbzz ∷ [Fin₁₀] → Bool
    fbzz = accepts (by3 `DFA.intersection` by5)
    fizzbuzz ∷ [Fin₁₀] → String
    fizzbuzz n | fbzz n    = "FizzBuzz"
               | fizz n    = "Fizz"
               | buzz n    = "Buzz"
               | otherwise = n >>= show

Type Safety

Consider, for example, the GNFA q s type defined in src/GNFA.hs which is the type used to represent Generalized Nondeterministic Finite Automaton in this library.

data GNFA q s where
  -- δ : (Q \ {qᶠ}) × (Q \ {qᵢ}) → Regular Expression
  GNFA ∷ { delta ∷ (Either Init q, Either Final q) → RegExp s } → GNFA q s

Striving for correctness by construction, the type alone enforces many required invariants:

A GNFA must have only one transition between any two states
A GNFA must have no arcs coming into its start state
A GNFA must have no arcs leaving its final state
A GNFA must have only one start state and one accept state, and these cannot be the same state

That means, any time you have some GNFA (and we assume pure functional programming), those invariants hold, without any extra work or runtime checks.

Try

This project currently uses stack to build. Because the project currently depends on some older packages (algebra and containers-unicode-symbols), you may have to set the allow-newer: true option in ~/.stack/config.yaml before proceeding. Then you can try out some of the code in the REPL:

git clone https://github.com/subttle/regular
cd regular
stack build
stack ghci

λ> Examples.by5'
(0∣1∣2∣3∣4∣5∣6∣7∣8∣9)★·(0∣5)
λ> by5' `matches` [1,2,3,4,5]
True
λ>

WIP status

For now it should be clear that I value correctness and simplicity over speed. Some of the code here is experimental and some is not yet structured properly, so expect major refactoring and restructuring. Once I have everything correct I can start to worry about speed. For now this code is slow.

I'm patiently (and gratefully!) waiting on a few things from some of the best projects out there right now:

Labelled graphs in alga
Linear types in Haskell
Better dependent type support in Haskell

I haven't proven all class instances to be lawful yet.

regular's People

Contributors

Stargazers

Watchers

Forkers

stjordanis

regular's Issues

Diagrams for Drawing Automata

Here is an example under the Haskell Diagrams library which shows a nice way said library could be used for drawing automata: https://diagrams.github.io/doc/quickstart.html#diagrams-as-a-monoid

Please, publish to hackage

Hi. Thanks for good lib. Please, publish it to Hackage, I want to use it

Restructuring

I need to restructure the project to separate the definitions of the data types and the code which uses those data types so that I do not create import cycles when trying to write conversions for each type, for example.

RE and ERE axiomization

For now what I've convinced myself I want is Kozen axioms [1] for RE and Salomaa axioms [2] for ERE.

[1] A Completeness Theorem for Kleene Algebras and the Algebra of Regular Events, Dexter Kozen
https://www.cs.cornell.edu/~kozen/Papers/ka.pdf

[2] Two Complete Axiom Systems for the Extended Language of Regular Expressions, Salomaa and Tixier (01687431.pdf)
Available free through https://ieeexplore.ieee.org/Xplore/home.jsp ("IEEE Transactions on Computers" C-17) with some school proxies.

Transition Monoid for DFA?

Is there a good way to represent the transition semigroup/monoid for DFA? I can think of a way to put all the induced functions q → q into a list for a given DFA q s but I'm not sure that would be helpful just yet.

This could come in handy for that later:

transition ∷ (Finite q, Finite s) ⇒ DFA q s → s → (q → q)
transition (DFA δ _ _) σ = \q → δ (q, σ)

transitions ∷ (Finite q, Finite s) ⇒ DFA q s → [s] → (q → q)
transitions m w = \q → delta' m (q, w)

Interactions with non-regular language formalisms

There are some cool ways of using some regular language formalisms in combination with other (non regular) language formalisms.
For example, Push Down Automata (PDA):

-- Take a DFA, m, and convert it to an PDA, p, such that ℒ(m) = ℒ(p)
toPDA ∷ DFA q s → PDA.PDA q () s
toPDA (DFA δ q₀ f) = PDA.PDA { PDA.delta = δₚ
                             , PDA.q0    = q₀
                             , PDA.fs    = f
                             , PDA.z0    = ()
                             } where δₚ (q, Just  σ, _) = singleton (δ (q, σ), [()])
                                     δₚ (_, Nothing, _) = (∅)

-- http://infolab.stanford.edu/~ullman/ialc/spr10/slides/cfl5.pdf 34
-- Stanford Automata, 4 - 3 - 15. Decision and closure properties for CFL-'s (35 min.).mp4 @ 34:00
-- Intuitively this runs a PDA and DFA in parallel and accepts a word if both accept individually.
toPDAIntersection ∷ ∀ q p s z . (Ord p, Ord q, Ord z) ⇒ DFA q s → PDA.PDA p z s → PDA.PDA (q, p) z s
toPDAIntersection (DFA δ q₀ f) (PDA.PDA δₚ p₀ z₀ fₚ) = PDA.PDA { PDA.delta = δ₁
                                                               , PDA.q0    = (q₀, p₀)
                                                               , PDA.z0    = z₀
                                                               -- The final states are (q, p) such that q ∈ f and p ∈ fₚ
                                                               , PDA.fs    = f × fₚ
                                                               } where δ₁ ∷ ((q, p), Maybe s, z) → Set ((q, p), [z])
                                                                       δ₁ ((q, p), Nothing, z) = Set.map (\(p', zs) → ((q,        p'), zs)) (δₚ (p, Nothing, z))
                                                                       δ₁ ((q, p), Just  σ, z) = Set.map (\(p', zs) → ((δ (q, σ), p'), zs)) (δₚ (p, Just  σ, z))

A decision on how to organize this should be made, ideally CFL/PDA for example would be in another repo. Also there is a useful paper[1] for that particular idea.

[1] Context-Free Languages, Coalgebraically
Joost Winter, Marcello M. Bonsangue, and Jan Rutten

Changes to Config.hs

Ideally this class should require only toGraph, initial, and final functions.

Contravariant hierarchy

I think it makes more sense to have a more precise hierarchy for contravariant functors.

Separate divide and conquer into two classes?
Seperate lose and choose into two classes?
Add class to handle These?
What about analog for Selective functors?

I need to figure out some laws that should give better guidance. For now, here is some spit-balling. All names here are subject to change.

class (Contravariant f) ⇒ Losable f where
  emptied ∷ f Void
  emptied' ∷ (Decidable f) ⇒ f Void
  emptied' = lost
  -- `covacuous`?
  empty ∷ f a → f Void
  empty = contramap absurd

class (Decidable f) ⇒ RenameMe f where
  renameme ∷ (a → These b c) → f b → f c → f a

renamed ∷ (RenameMe f) ⇒ f b → f c → f (These b c)
renamed = renameme id

renamed' ∷ (RenameMe f) ⇒ f a → f a → f a
renamed' = renameme (\s → These s s)

Non-Empty set from Contravariant.Adjuction use cases

Writing this idea down before I forget:
Data.Functor.Contravariant.Adjunction has an instance for Predicate Predicate, for which the unit (and also counit) which can be thought of, IIRC, as giving all the possible non-empty "sets" with respect to some unit element. I wonder if there is a way to use that in this library, for example in the synchronizing function below, I have a need for non empty sets. Would it make sense to use the aformention instance here or even perhaps other places?

-- Theorem (Cerny, 1964): A DFA M is (directable) synchronizing iff ∀q ∈ Q, ∃p ∈ Q, ∃w ∈ Σ★: δ(q, w) = δ(p, w)
-- That is, there exists a word w, such that evaluation of w from from any state, q, always ends up in the same state, p.
-- "A DFA is synchronizing if there exists a word that sends all states of the automaton to the same state." - https://arxiv.org/abs/1507.06070
synchronizing ∷ (Finite q, Finite s) ⇒                  DFA q s → Bool
synchronizing = not . isZero . power
          where power ∷ (Finite q) ⇒ DFA q s → DFA (Set q) s -- FIXME supposed to be a non-empty set -- TODO alter this to check for shortest path to get shortest reset word?
                power m@(DFA δ _ _) = DFA { delta = \(states, σ) → map (\q → δ (q, σ)) states
                                          , q0    = qs m
                                          , fs    = map singleton (qs m)
                                          }

The other place I think it could potentially be useful is in the next variant of the co-inductive automata, PA/NA. Will have to look into it.

Explicit testing of generated sequences which can be checked using external resource

There are many numerical sequences generated throughout this library, it would be nice to organize the testing of said sequences (in a dedicated scope label, perhaps Sequences.* or something) against some trusted third party sources (e.g. OEIS®)
For example:

-- Fibonacci numbers (as a non-empty list)
-- http://oeis.org/A000045
fibonacci ∷ NonEmpty ℕ
fibonacci = fix ((⊲) 0 . NE.scanl (+) 1)

Could easily be checked to have the same initial segment that is listed under OEIS®:

testFibInit ∷ Test ()
testFibInit = expect (((==) `on` NE.take 40) expected fibonacci)
  where
    -- http://oeis.org/A000045
    expected ∷ NonEmpty ℕ
    expected = 0
       NE.:| [ 1
             , 1
             , 2
             , 3
             , 5
             , 8
             , 13
             , 21
             , 34
             , 55
             , 89
             , 144
             , 233
             , 377
             , 610
             , 987
             , 1597
             , 2584
             , 4181
             , 6765
             , 10946
             , 17711
             , 28657
             , 46368
             , 75025
             , 121393
             , 196418
             , 317811
             , 514229
             , 832040
             , 1346269
             , 2178309
             , 3524578
             , 5702887
             , 9227465
             , 14930352
             , 24157817
             , 39088169
             , 63245986
             , 102334155
             ]

And have a scoped test added to the main run suite, e.g.:

              , scope "Sequences.Fib"        testFibInit

Not exactly a substitute for a proof of correctness but then again that is the nature of testing :D

Benchmarking

I am considering use of criterion for benchmarking a few things. The tutorial makes it seem easy enough and hopefully even fun. :)

Testing Suite

Easytest is available on https://www.stackage.org/lts but it has been a while since I have compared different testing suites, so it wouldn't hurt to look again before committing to one. That said, I have a lot of trust in Paul Chuisano to make great code, so I will most likely go with Easytest unless I discover there is something specifically I need which it is missing.

Agda RE proofs

It may be useful to add some of the Agda code directly to this project (perhaps changing the folder structure a bit).

So far the "structural" part of Regular Expressions is mostly if not all complete (I just finished making this compatible with the latest agda-stdlib 0.17).

It might be worthwhile to create the ERE equivalent of Structural.agda.

But the Kleene Algebra part needs to be done from scratch.

If it is worth keeping separate then just close this ticket.

Divisible/Decidable instances for DA

I can write code which type checks for these instances (see below) but I need to explore what the correlations are and what the interpretation should be to operations on regular languages. For example, contramap is quite close to the definition of inverse homomorphism of a regular language (a constructive proof is typically given for DFA), except that the morphism function is usually given in textbooks as (s → [g]) or sometimes equivalently ([s] → [g]). I'm not sure it's okay to say contramap would suffice for invhom despite being polymorphic because morphisms which "erase" might be troublesome. Even if we let the co-domain of h be some finite list type for example, I think it would still need a way to concat, or perhaps it would not matter, I'll have to think more about that.

instance Contravariant (DA q) where
    contramap ∷ (s → g) → DA q g → DA q s
    contramap h m@(DA _ t) = m { transition = \q → t q . h }

-- some ideas to consider (non-exhaustive):
-- https://en.wikipedia.org/wiki/Krohn%E2%80%93Rhodes_theory
-- https://liacs.leidenuniv.nl/~hoogeboomhj/second/secondcourse.pdf
-- https://is.muni.cz/th/jz845/bachelor-thesis.pdf
-- https://drona.csa.iisc.ac.in/~deepakd/atc-2015/algebraic-automata.pdf
-- http://www.cs.nott.ac.uk/~psarb2/MPC/FactorGraphsFailureFunctionsBiTrees.pdf
-- https://cstheory.stackexchange.com/questions/40920/generalisation-of-the-statement-that-a-monoid-recognizes-language-iff-syntactic
instance Divisible (DA q) where
    divide ∷ (s → (g₁, g₂)) → DA q g₁ → DA q g₂ → DA q s
    divide f (DA o₁ t₁) (DA o₂ t₂) = DA { output     = undefined  -- \q → o₁ q ∧ o₂ q -- or even something way more complicated!
                                        , transition = undefined --  \q s → case f s of (b, c) → t₂ (t₁ q b) c  -- remember that the types also allow composition in the other direction too!
                                        -- , transition = \q s → uncurry (t₂ . t₁ q) (f s)
                                        --, transition = \q → uncurry (t₂ . t₁ q) . f
                                        }

    conquer ∷ DA q a
    conquer = DA { output     = const True
                 , transition = const
                 }

instance Decidable (DA q) where
    lose ∷ (a → Void) → DA q a
    lose _ = empty

    choose ∷ (s → Either g₁ g₂) → DA q g₁ → DA q g₂ → DA q s
    choose f (DA o₁ t₁) (DA o₂ t₂) = DA { output     = undefined -- \q → o₁ q ∨ o₂ q
                                        , transition = undefined -- \q → either (t₁ q) (t₂ q) . f
                                        }

May also want to consider making a data type for semi automata in case there are multiple good interpretations each with different output definitions. But that is just speculation for now.