auto-gfqg

This Automatic Gap-Fill Question Generation system creates multiple-choice, fill-in-the-blank questions from text corpora. Textbooks, factoid archives, news articles, reports, lecture notes, legal proceedings -- the minimum viable input is a small to moderately sized collection of coherent, well-formed English.

This work is a proof-of-concept reimplementation of the ideas behind RevUp. The ideas implemented here are largely the same as those in the paper, with two notable differences. First, we use a biterm topic model instead of the deep autoencoder topic model. Second, we use topic-weighted word vectors to perform the gap-phrase selection, whereas RevUp uses a supervised model trained on human judgments collected via Mechanical Turk.
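To make the idea of topic-weighted word vectors more concrete, here is a minimal sketch of one plausible reading. It is not taken from this codebase: the object name, the function names, and the weighting scheme itself (a topic vector built as the P(word | topic)-weighted sum of word vectors, with the gap word chosen as the token closest to that vector by cosine similarity) are all assumptions made for illustration.

```scala
// Hypothetical sketch: none of these names come from the auto-gfqg codebase.
object TopicWeightedVectors {
  type Vec = Vector[Double]

  def dot(a: Vec, b: Vec): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Cosine similarity; returns 0.0 when either vector has zero length.
  def cosine(a: Vec, b: Vec): Double = {
    val denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    if (denom == 0.0) 0.0 else dot(a, b) / denom
  }

  // Assumed definition: sum of word vectors, each weighted by P(word | topic).
  // Words without a vector are skipped.
  def topicVector(
      wordGivenTopic: Map[String, Double],
      wordVecs: Map[String, Vec],
      dim: Int
  ): Vec =
    wordGivenTopic.foldLeft(Vector.fill(dim)(0.0)) { case (acc, (word, p)) =>
      wordVecs.get(word) match {
        case Some(v) => acc.zip(v).map { case (a, x) => a + p * x }
        case None    => acc
      }
    }

  // Pick the token whose vector is most similar to the topic vector.
  // Returns None when no token has a known vector.
  def chooseGapWord(
      tokens: Seq[String],
      topicVec: Vec,
      wordVecs: Map[String, Vec]
  ): Option[String] = {
    val scored =
      tokens.flatMap(t => wordVecs.get(t).map(v => t -> cosine(v, topicVec)))
    if (scored.isEmpty) None else Some(scored.maxBy(_._2)._1)
  }
}
```

With toy two-dimensional vectors, a topic that puts most of its mass on "cell" yields a topic vector pointing in roughly the same direction as the vector for "cell", so that token is selected over a function word such as "the".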

Setup

This project uses sbt for build management. If you're unfamiliar with sbt, see the "How to use sbt" section below for some pointers.

Build

To download all dependencies and compile code, run sbt compile.

Test

To run all tests, execute sbt test.

Command Line Applications

To produce bash scripts that will execute each individual command-line application within this codebase, execute sbt pack. The output bash scripts will be located under target/pack/bin/: their names correspond to filenames for executable Scala programs within the project.

How to use sbt

When using sbt, it is best to start it in the "interactive shell mode". To do this, simply execute from the command line:

$ sbt

After starting up (give it a few seconds), you can execute the following commands:

compile // compiles code
pack // creates executable scripts
test // runs tests
coverage // initializes the code-coverage system, use right before 'test'
reload // re-loads the sbt build definition, including plugin definitions
update // grabs all dependencies

There are many more sbt commands, as well as a large number of community plugins that extend sbt's functionality.

Final results

The conclusions, results, and future work file summarizes the thoughts and findings of this proof-of-concept (PoC). If you are interested in viewing the generated gap-fill questions and distractors, read that file.

Overview of Information Flow

This gap-fill question generation system consists of a series of different programs and data resources. It is hacked-together research code that, in its current form, is unsuitable for production work. It does, however, demonstrate a question generation system from end to end.

Before attempting to run any programs here, please read through the documentation and ensure that your machine has the necessary prerequisites.

The following numbered list roughly describes the system's sequential operation:

  1. Use NLP tools to pre-process text. Includes sentence splitting, tokenization, and word stemming over all corpus text. See NLP process with CoreNLP for more.

  2. Use word2vec to create word vectors over a larger, different corpus of text. See create word vectors for more.

  3. Use biterm topic modelling (BTM) to discover latent topics that are expressed on a per-sentence basis within the corpus. See train BTM for more.

  4. Use the learned BTM word-topic conditional probabilities and intuitive heuristics to score all sentences from the corpus. Then, threshold and eliminate low-scoring sentences, creating gap-fill question candidates. See score and generate gap fill question candidates for more.

  5. For each candidate sentence, choose a gap word. Removing the gap word from the sentence creates the fill-in-the-blank question (i.e. the gap word is the correct answer). Additionally, discover appropriate distractors for the chosen gap word. Distractors are semantically related, but ultimately different from the gap phrase (i.e. these are the incorrect answers). See finding gap words and distractors for more.
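Steps 4 and 5 can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the project's implementation: the object, function, and parameter names are hypothetical, the sentence score (mean P(word | topic) over the tokens) stands in for the unspecified "intuitive heuristics", and ranking distractors by cosine similarity to the gap word is only one way to find semantically related words.

```scala
// Hypothetical sketch: names, scoring formula, and threshold are assumptions.
object GapFillSketch {
  type Vec = Vector[Double]

  final case class Question(text: String, answer: String, distractors: Seq[String])

  // Step 4 (assumed heuristic): mean P(word | topic) over a sentence's tokens.
  def scoreSentence(tokens: Seq[String], wordGivenTopic: Map[String, Double]): Double =
    if (tokens.isEmpty) 0.0
    else tokens.map(t => wordGivenTopic.getOrElse(t, 0.0)).sum / tokens.size

  // Step 4: keep only sentences that score at or above the threshold.
  def candidates(
      sentences: Seq[Seq[String]],
      wordGivenTopic: Map[String, Double],
      threshold: Double
  ): Seq[Seq[String]] =
    sentences.filter(s => scoreSentence(s, wordGivenTopic) >= threshold)

  def cosine(a: Vec, b: Vec): Double = {
    def dot(x: Vec, y: Vec): Double = x.zip(y).map { case (p, q) => p * q }.sum
    val denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    if (denom == 0.0) 0.0 else dot(a, b) / denom
  }

  // Step 5: the k words most similar to the gap word, excluding the gap word.
  def distractors(gap: String, wordVecs: Map[String, Vec], k: Int): Seq[String] =
    wordVecs.get(gap) match {
      case None => Seq.empty
      case Some(g) =>
        wordVecs.toSeq
          .collect { case (w, v) if w != gap => w -> cosine(g, v) }
          .sortBy { case (_, sim) => -sim }
          .take(k)
          .map(_._1)
    }

  // Step 5: blank out the gap word; it becomes the correct answer.
  def makeQuestion(
      tokens: Seq[String],
      gap: String,
      wordVecs: Map[String, Vec],
      k: Int
  ): Question = {
    val blanked = tokens.map(t => if (t == gap) "_____" else t).mkString(" ")
    Question(blanked, gap, distractors(gap, wordVecs, k))
  }
}
```

For example, calling makeQuestion on the tokens of "the cell has a membrane" with "membrane" as the gap word produces the text "the cell has a _____", with "membrane" as the answer and the nearest other words in the vector table as distractors.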

All of the Scala programs have built-in help support. Invoke them with "-h" or "--help" to see information about how to use each program.

auto-gfqg's People

Contributors

malcolmgreaves

auto-gfqg's Issues

what is cmd?

Hello, I read your auto-gfqg project on GitHub and I understand its steps, but I have never used Scala before. When I run ScoreSentences.scala, an error occurs: "error: not found: value cmd, import cmd.RunnerHelpers". What is cmd?

Hey, this is great but

Hi, this project looks really interesting - but!

When I run it with "run", I am prompted to choose among 9 items. If I just want to try this out (i.e. I have a block of text and would like to see the questions and answers produced), what do I have to do?

Also, when I select a number at the prompt I get an error (note that all I do is hit "1" and then "enter"; I don't provide any other input):

Multiple main classes detected, select one to run:

 [1] agfqg.ConllToTextLine
 [2] agfqg.CreateTopicWordVecs
 [3] agfqg.FinalQuizQuestionsJson
 [4] agfqg.GapWordAndDistractorSelection
 [5] agfqg.NlpAwareGapWordAndDistractorSelection
 [6] agfqg.ScoreSentences
 [7] agfqg.SelectSentencesForQuestions
 [8] agfqg.VocabFilterGapWordAndDistractorSelection
 [9] edu.stanford.nlp.process.Stemmer
Enter number: 1

[info] Running agfqg.ConllToTextLine
[HELP] Need 2 arguments:
1st: INPUT processed text, in CoNLL format
2nd: OUTPUT plain text, each line is a complete sentence


Exception: sbt.TrapExitSecurityException thrown from the UncaughtExceptionHandler in thread "run-main-0"
java.lang.RuntimeException: Nonzero exit code: 1
        at scala.sys.package$.error(package.scala:27)
[trace] Stack trace suppressed: run last compile:run for the full output.
[error] (compile:run) Nonzero exit code: 1
[error] Total time: 8 s, completed Nov 7, 2018 11:58:45 PM
