The Python Automated Marker (PAM)

PAM provides a way to test Python hangman games automatically. It was created so that a Python course without time to mark an end-of-course test by hand can mark it automatically instead.

Student instructions

Your assignment is to code a game of hangman in Python 3. Specifically, the program should choose a word from a predefined bank of words (word_list.txt is provided in this repository) and show the user how many letters the word has (e.g. ********). The user should then be able to guess one letter at a time. If the guess is incorrect, the program takes a "life" from them. If it is correct, the program shows where the letter appears in the word (e.g. hello becomes *e*** after guessing ‘e’). The game ends when the user runs out of guesses (7 incorrect guesses) or has filled in all the letters of the word. Submit the code as a single file named in the convention "Surname_Initial_Hangman.py". We expect the project to take a beginner around 10-20 hours. The solution can easily be found on Google, but PLEASE try to figure it out on your own. Even if that only means sketching the logic of a solution before looking for help, you will be doing yourself a lot of favours in the long run!

We require the program to do some things in a very specific way. Please follow the points below to the letter; otherwise your program will automatically fail.

- The program must run in Python 3 without error.
- The script must ONLY stop either to ask for the next guess or because the game has been won or lost (i.e. no menus or other user input).
- When asking the user for the next guess, the program must use exactly the following text (case sensitive; I believe the space after the colon is necessary): Please enter your next guess:
- The text printed before Please enter your next guess: must END with the word to be guessed, with the unknown letters starred out (e.g. hello would start as ***** and change to *e*** after ‘e’ was guessed). This text must not contain any other stars.
- On exit, the program must print either congratulations you win or you lose (not case sensitive).
- Each time a game is played, the word must be picked at random (from a uniform distribution) from the word_list.txt file provided in this repository.
- The word_list.txt file must be stored locally, and the program must open it without a path (i.e. open('word_list.txt', ...)).
- If the user makes 7 wrong guesses, they lose the game.
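The marker is strictest about two output rules: the starred word and the exact prompt text. The sketch below shows one way to produce both, plus a uniform random pick from word_list.txt. It is not a complete game. The helper names (pick_word, starred, status_line) are hypothetical, and the sketch assumes word_list.txt holds one word per line and that guesses are compared exactly as typed.

```python
import random

# Exact prompt text required by the rules above (note the trailing space).
PROMPT = "Please enter your next guess: "


def pick_word(path="word_list.txt"):
    """Return one word chosen uniformly at random from a one-word-per-line file."""
    with open(path) as f:
        words = [line.strip() for line in f if line.strip()]  # skip blank lines
    return random.choice(words)


def starred(word, guessed):
    """Replace every letter of word that is not in guessed with '*'."""
    return "".join(letter if letter in guessed else "*" for letter in word)


def status_line(word, guessed):
    """Build the line printed before the prompt: it must END with the starred
    word and contain no other '*' characters."""
    return "Word: " + starred(word, guessed)


# Inside a game loop these would be used roughly as:
#     print(status_line(word, guessed))
#     guess = input(PROMPT)
```

For example, status_line("hello", {"e"}) returns "Word: *e***".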

Running automatic tests

There are two ways to test students' submissions. The first is strongly preferable because the second does not run the analysis.

NOTE: the line pexpect.spawn('python ' + file_path) in the solveGame function of the python_module3_marker.py module runs the student's script with whatever the default python command is on the computer. The student's script must therefore be able to run on that version of Python. If it cannot, create a virtual environment (see Anaconda or virtualenv if you are unsure what that is) with the required version of Python and the necessary libraries installed. Then activate it before running analyse_multiple_files.py.
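The marker launches scripts with the bare python command, so it is worth checking which version that command resolves to before a marking run. The sketch below does this using only the standard library. The function name python_version is hypothetical.

```python
import shutil
import subprocess
import sys


def python_version(command="python"):
    """Return the version string reported by `command --version`,
    or None if the command cannot be found on PATH."""
    path = shutil.which(command)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Python 2 prints its version to stderr; Python 3.4+ prints it to stdout.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    default = python_version("python")
    print("`python` runs:", default)
    print("this checker runs on:", python_version(sys.executable))
    if default is None or not default.startswith("Python 3"):
        print("Activate a Python 3 environment before running analyse_multiple_files.py.")
```

If the first line reports Python 2 or nothing at all, activate the appropriate environment and run the check again.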

Copy analyse_multiple_files.py, python_module3_marker.py, and all the students' scripts into a single directory. It is best to check first that no two students have a script with the same name. Then run analyse_multiple_files.py in Python 3+.
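The check for clashing file names is easy to automate. The sketch below assumes the scripts currently live in several folders. The helpers copy_into and find_duplicate_names are hypothetical, and the marker's own two files still need to be copied in separately.

```python
import shutil
from collections import defaultdict
from pathlib import Path


def find_duplicate_names(script_paths):
    """Map each file name that occurs more than once to all paths that share it."""
    by_name = defaultdict(list)
    for p in map(Path, script_paths):
        by_name[p.name].append(p)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}


def copy_into(script_paths, target_dir):
    """Copy every script into target_dir, refusing if any two share a file name."""
    duplicates = find_duplicate_names(script_paths)
    if duplicates:
        raise ValueError(f"rename these before copying: {sorted(duplicates)}")
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for p in map(Path, script_paths):
        shutil.copy2(p, target / p.name)  # copy2 also preserves timestamps
```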
If you would rather keep the students' scripts in different directories, create a Python list of all the file paths and pass it to the listOfFilesToTest function in the analyse_multiple_files.py module. Unfortunately, this only runs the tests and does not do the analysis. To do the analysis, you will also need to copy and run the code that follows the line if __name__ == "__main__": in analyse_multiple_files.py.
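The list of paths can be built with pathlib. The sketch below assumes the scripts follow the naming convention from the student instructions, and collect_scripts is a hypothetical helper. The call to listOfFilesToTest is left commented out because its exact arguments should be checked in analyse_multiple_files.py first.

```python
from pathlib import Path


def collect_scripts(directories, pattern="*_Hangman.py"):
    """Return a sorted list of paths (as strings) to scripts matching pattern.

    The default pattern assumes students followed the Surname_Initial_Hangman.py
    naming convention; pass pattern="*.py" if some did not.
    """
    paths = []
    for directory in directories:
        paths.extend(str(p) for p in Path(directory).glob(pattern))
    return sorted(paths)


# Hypothetical usage; check listOfFilesToTest's parameters before relying on it:
# from analyse_multiple_files import listOfFilesToTest
# listOfFilesToTest(collect_scripts(["group_a", "group_b"]))
```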

Interpreting results

The analyse_multiple_files.py script works by automatically playing hangman 20 times against each student's script. It writes a summary of the results to a CSV file, whose columns are described here.
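Once the CSV exists, a few lines of standard-library Python are enough to count the results and list the submissions to review by hand. The sketch below assumes the file is called results.csv and that its header row uses the column names file_name and Result exactly as written below. The function name summarise_results is hypothetical.

```python
import csv
from collections import Counter


def summarise_results(csv_path="results.csv"):
    """Count each Result value and list the file_name of every non-pass row."""
    counts = Counter()
    to_review = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            result = row["Result"].strip().lower()
            counts[result] += 1
            if result != "pass":
                to_review.append(row["file_name"])
    return counts, to_review


# Example:
#     counts, to_review = summarise_results()
#     print(dict(counts), to_review)
```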

Column file_name is the name of the student's Python script being tested.
Column No. wins is the number of times (out of 20) that my algorithm guessed the correct word before being hanged.
Column No. losses is the number of times (out of 20) that my algorithm failed to guess the correct word before being hanged.
Column List of errors gives the error returned for each of the 20 tests (if an error occurred). Note that these are my own custom errors: they are Python strings, not actual Python exceptions. If a Python exception actually occurs during testing, everything will stop and the results.csv file is unlikely to be created (as far as I know, this should never happen). There are 7 types of returned error. To see how they are returned, find the return statements in the solveGame function of the python_module3_marker.py module. Here is a written description:
- A ‘Spawn error’ occurs when our Python program attempts to run the student's script but cannot. Running python student_script.py should reproduce the problem. In essence, it probably means that the script isn't valid Python, so it is probably an auto-fail.
- An ‘Expect error’ occurs if the student's script stops without producing either Please enter your next guess: or pexpect.EOF (the end-of-file indicator in the pexpect library). This should not occur if the student has correctly followed our description of how the script should run. However, the script could still be completely functional, so this could be viewed as a harsh auto-fail.
- A ‘bfr decode error’ is likely to be caused if the last thing printed to the screen before stopping was not a string. I expect this to be caused only by a significant error, so it is probably an auto-fail. It should still be watched closely in alpha testing, just to be sure.
- An ‘afr decode error’ is similar to a ‘bfr decode error’. The difference is that it refers to the script's output after stopping, which is neither a string nor a pexpect.EOF object. Again, I expect this to be caused only by a significant error, so it is probably an auto-fail. It should still be watched closely in alpha testing, just to be sure.
- A ‘Return error’ is caused if the output returned after a stop is a pexpect.EOF object but the output before the stop is not a string containing either congratulations you win or you lose. This could be caused by something for which an auto-fail would be harsh (e.g. misspelling congratulations) or by something for which an auto-fail is acceptable. Alpha testing should reveal our best next steps.
- A ‘Vowel error’ should only be returned if my algorithm has tried all the vowels without correctly guessing a letter. This means that either the student is using the wrong word list or the student's script isn't checking the user input correctly. This should be an easy auto-fail but should be confirmed in alpha testing.
- An ‘EOF/input error’ occurs if the output after a stop in the student's program is neither a pexpect.EOF object nor a string containing please enter your next guess. This is probably an auto-fail but should be checked in alpha testing.
Column Result is the result returned by the algorithm: ‘pass’, ‘fail’, or ‘error’. Pass means that the student's submission passed every test carried out. Error means that the script did not behave as expected (i.e. as described in the instructions). Fail means that the student's script behaved as expected except for the rate at which my algorithm solved the games. This could be because the student used the wrong word list or was not picking words uniformly at random. There is also a small chance that the result happened purely by chance. The Probability that this result occurred by chance not error column (described below) gives that probability. A good way to check is to run the test again, since it would be very unlikely to happen twice in a row by chance. The simplest interpretation is to treat pass as a pass and both fail and error as a fail. However, this could lead to some harsh decisions, so these results do not have to be taken at face value. For example, pass could be read as a good submission that still needs some subtler things checked manually before a final decision. Fail or error could be read as a flag that strange behaviour occurred and needs to be investigated manually. What these mean for the final result needs to be discussed and tested during alpha and beta testing.
Column Probability that this result occurred by chance not error gives the probability that a fail result occurred purely by chance (i.e. that the student's submission is correct and is being unfairly failed).
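The rules behind four of the errors (bfr decode, afr decode, Return, and EOF/input) can be summarised as one small function. This sketch is reconstructed from the descriptions above and is not the actual solveGame code. EOF here is a stand-in object for pexpect.EOF, and the function name and return labels are hypothetical.

```python
# Stand-in for pexpect.EOF so the sketch runs without pexpect installed.
EOF = object()

PROMPT = "please enter your next guess"


def classify_stop(before, after):
    """Classify one stop of a student's script, following the error descriptions.

    before: what the script printed before the stop.
    after:  the text matched at the stop, or EOF if the script exited.
    Returns 'continue', 'win', 'lose', or the name of the matching error.
    """
    if not isinstance(before, str):
        return "bfr decode error"
    if after is EOF:
        text = before.lower()  # end-of-game messages are not case sensitive
        if "congratulations you win" in text:
            return "win"
        if "you lose" in text:
            return "lose"
        return "Return error"
    if not isinstance(after, str):
        return "afr decode error"
    if PROMPT in after.lower():
        return "continue"
    return "EOF/input error"
```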
