Code Monkey home page Code Monkey logo

niemaattarian / thompsons-construction-on-nfas Goto Github PK

View Code? Open in Web Editor NEW
5.0 1.0 1.0 13 KB

I was asked to write a Python program to execute regular expressions on strings using an algorithm known as Thompson's construction. In this, I have to build a non-deterministic finite automaton (NFA) from a regular expression, and can use this to check if the regular expression matches any given string text

Python 100.00%
regular-expression postfix-expressions nfa thompsons-construction

thompsons-construction-on-nfas's Introduction

Niema Attarian

[email protected]

OVERVIEW I created this GitHub repository for my Graph Theory project 2019. For my project, I was asked to write a Python program to execute regular expressions on strings using an algorithm known as Thompson's construction. In this, I have to build a non-deterministic finite automaton (NFA) from a regular expression, and can use this to check if the regular expression matches any given string text.

This project can be cloned by navigating to the clone button on GitHub. From here, you can 'cd' into the project and run the project via that 'python Project.py' command in the command window.

SHUNTING YARD ALGORITHM

The purpose of the Shunting Yard Algorithm is to parse expressions from infix notations to postfix expressions or notation strings using stacks.

Firstly, special characters such as '*', '.', '?', '+', '|' are added to a dictionary and assigned a value which gives them an order of precedence. Empty 'pofix' and 'stack' strings are initialized. The stack string is where the operators are push in and out and the pofix string is where the result stored.

The 'specials' dictionary is then passed through a loop containing a series of statements that will in turn determine the output for the postfix expression.

The expression evaluation begins at the far left. If an opening bracket '(' is encountered, it is pushed to the stack.

NOTE: Remember that the stack is Last In First Out (LIFO).

If the following encountered is a closing bracket ')' then we look at the stack. We look at the character at the top of the stack that isn't the opening bracket and begin pushing the characters of the stack to the postfix expression. We do this for characters up to but not including the last character as it is the closing bracket. Once pushed to the postfix regular expression we remove the character.

If the character is in the special dictionary, we want to push the following from the stack to the postfix regular expression.

If it isn't an open bracket, closing bracket or a special character, we append what has already been read into the postfix regular expression.

We double check at the end of the function that everything in the stack has been pushed.

Thompson's Construction Thompson's Construction is a method in which we transform a regular expression into a (NFA) non-deterministic finite automaton. This can further be used to match strings against the expression. In this algorithm we created methods which will help execute what we intend to do.

#Compile Method In this method we initially create an NFA stack which is empty. This will contain instances of the NFA class already created. Firstly, we loop through the postfix regular expression one character at a time. We created an instance of the NFA class that has an accept state and an initial state. We join these via arrows labelled by the character. We join the new initial state to nfa's initial state and new accept state and then join old accept state to the new accept state and nfa's initial state Finally we append the new NFA to the NFA stack. We continue to do this with each of our characters in the dictionary.

When you run this, the user is shown the memory location of the NFA object.

Matching Regular Expressions

Followes Function This helper function returns the set of states that can be reached from the state following 'e' arrows. We do this by creating a new set with state as its only member. We check if the state has arrows labeled e from it and we do this by looking for states where the label is 'None'. If the states are labelled 'None' then we use nested if statement to follow the edges, 1 and 2.

Match Function This function takes the regular expression, changes it to infix and compiles the postfix regular expression to an NFA. We then create empty sets, 'current' and 'nexts' and loop through each character in the string and inside that loop we loop through the current set of states that we have. We check if the state is labelled 's', if so, we add it into the 'next' set. That next state has now been set as a current state and the next state is cleared and is now a empty set. We then check if the accept state is in the set of current states

Finally, we have a nested for loop which checks if the regular expression matches the string giving out a 'true' or 'false' output.

Research In doing this project, I had no previous experience with Python so I knew that my research must be in depth. I completed this project over the course of a few weeks we continual practise of Python.

What I found most helpful were the videos uploaded to LearnOnline. My approach was to first understand the Algorithms on paper before coding. This would give me an understanding of the structure of the code and not seem clueless as to method of what is going on. Once I coded the algorithms, I went back to the on paper explanation to compare and walk-through step by step, side by side to try and get a full understanding of the code.

Furthermore, I researched online about each algorithm and found Wikipedia tobe very useful with it's diagrams and examples.

Reference links: https://en.wikipedia.org/wiki/Thompson%27s_construction#Rules

https://en.wikipedia.org/wiki/Shunting-yard_algorithm#The_algorithm_in_detail

https://en.wikipedia.org/wiki/Shunting-yard_algorithm#Detailed_example

http://www.oxfordmathcenter.com/drupal7/node/628

https://swtch.com/~rsc/regexp/regexp1.html

https://web.microsoftstream.com/video/5e2a482a-b1c9-48a3-b183-19eb8362abc9

https://web.microsoftstream.com/video/cfc9f4a2-d34f-4cde-afba-063797493a90

https://web.microsoftstream.com/video/6b4ba6a4-01b7-4bde-8f85-b4b96abc902a

https://web.microsoftstream.com/video/946a7826-e536-4295-b050-857975162e6c

https://web.microsoftstream.com/video/1b3e7f4f-69e0-4316-853f-c63b14f9c36a

thompsons-construction-on-nfas's People

Contributors

niemaattarian avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

Forkers

emito-k

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.