INLG 2022: Generation of Student Questions for Inquiry-based Learning

Data used in paper

The data can be found at data/text_pairs.pkl. It consists of lecture window-question pairs.
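A minimal sketch for loading the pairs in Python, assuming the pickle holds a sequence of (lecture window, question) tuples. The exact object type is not documented here, so inspect it before relying on this; the function name load_text_pairs is hypothetical.

```python
import pickle
from pathlib import Path


def load_text_pairs(path="data/text_pairs.pkl"):
    """Load lecture window-question pairs from the pickle file.

    Assumption: the pickle stores an iterable of (window, question) pairs.
    Only unpickle files you trust; pickle can execute arbitrary code.
    """
    with Path(path).open("rb") as f:
        return list(pickle.load(f))
```

Run from the repository root, pairs = load_text_pairs() followed by window, question = pairs[0] would give the first pair under that assumption.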

Code used in paper

transformers_mod/src/prefix.ipynb is the main notebook used for the experiments. Some values are hard-coded (depending on docTTTTTquery, t5-base, and the hyperparameters), so please refer to the notebook for details on reproducing the results.

Run results

The results can be found at transformers_mod/src/runs.

Raw Data

  • data/mooc_data contains the raw MOOC transcripts in both srt and txt formats. We use the srt format for the timestamps.
  • data/questions contains the raw questions. Most are formatted as ?L[LECTURE NUMBER]: [START TIME]-[END TIME]: [QUESTION]. The files also contain quizzes, which are ignored in this project. The lecture numbers need to be mapped to the transcript names, as described in the "Lecture number - transcript name mapping" section below.
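The question format above can be matched with a regular expression. This is a hypothetical sketch, not the parser used for the paper: it treats the leading ? as a literal character and assumes the times are colon-separated digits such as 02:15. Lines that do not match (for example quiz lines) return None.

```python
import re
from typing import Optional

# Assumed pattern for "?L[LECTURE NUMBER]: [START TIME]-[END TIME]: [QUESTION]".
# The time format (digits separated by colons, e.g. 02:15) is an assumption.
QUESTION_RE = re.compile(
    r"^\?L(?P<lecture>\d+\.\d+):\s*"
    r"(?P<start>\d+(?::\d+)*)\s*-\s*(?P<end>\d+(?::\d+)*):\s*"
    r"(?P<text>.+?)\s*$"
)


def parse_question_line(line: str) -> Optional[dict]:
    """Return lecture, start, end and text, or None if the line does not match."""
    match = QUESTION_RE.match(line.strip())
    if match is None:
        return None  # e.g. quiz lines or irregular formatting
    return match.groupdict()
```

For example, parse_question_line("?L1.3: 02:15-03:40: Why is ranking preferred over selection?") returns {'lecture': '1.3', 'start': '02:15', 'end': '03:40', 'text': 'Why is ranking preferred over selection?'}.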

Data preprocessing

Questions

  • The parsed questions are saved in data/parsed/questions; they are then cleaned manually and saved in data/questions_cleaned. They are stored as .csv files with the following fieldnames:
    • name(str): question file name
    • lecture(str): lecture name
    • time(str): timespan containing both start and end time
    • text(str): question text
  • Questions that do not follow the format described above are not parsed. Instead, their file name and line number are saved in ./data/parsed/questions/messy_data.csv, which contains the following fieldnames:
    • name(str): name of the file the messy data comes from
    • line(int): line number of the messy data
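A sketch of how parsed rows and messy lines could be written with the fieldnames listed above, using Python's csv.DictWriter. The helper names, the 1-based line numbers, and skipping blank lines are assumptions; parse stands for any line parser that returns a dict with lecture, start, end and text keys, or None.

```python
import csv
from pathlib import Path

QUESTION_FIELDS = ["name", "lecture", "time", "text"]
MESSY_FIELDS = ["name", "line"]


def write_rows(path, fieldnames, rows):
    """Write dict rows to a CSV file with a header, creating parent dirs."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def split_parsed_and_messy(name, lines, parse):
    """Route each line through `parse`; unparsable lines become messy records."""
    parsed, messy = [], []
    for line_no, line in enumerate(lines, start=1):  # assumed 1-based numbering
        if not line.strip():
            continue  # blank lines are neither questions nor messy data
        result = parse(line)
        if result is None:
            messy.append({"name": name, "line": line_no})
        else:
            parsed.append({
                "name": name,
                "lecture": result["lecture"],
                "time": f"{result['start']}-{result['end']}",
                "text": result["text"],
            })
    return parsed, messy
```

The parsed rows would go to a file under data/parsed/questions with QUESTION_FIELDS, and the messy records to messy_data.csv with MESSY_FIELDS.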

Transcripts

  • The transcripts and related information for each lecture are saved in <lecture-name>.csv. For example, 1 - 1 - Course Welcome (00-03-11).srt will be processed into 1 - 1 - Course Welcome (00-03-11).csv. These files are saved in data/parsed/transcripts.
  • Each .csv file contains the following fieldnames:
    • name(str): transcript file name
    • id(int): transcript id
    • from(float): start time of the transcript
    • to(float): end time of the transcript
    • text(str): transcript text
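The SRT-to-CSV step can be sketched as follows. This assumes standard SRT blocks (an index line, a timing line such as 00:01:02,250 --> 00:01:05,000, then one or more text lines) and that from and to are stored in seconds; the helper names are hypothetical.

```python
import re

# SRT timestamp, e.g. 00:01:02,250 (a period is also accepted before the ms).
TIMESTAMP = r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
TIMING_RE = re.compile(TIMESTAMP + r"\s*-->\s*" + TIMESTAMP)


def to_seconds(h, m, s, ms):
    """Convert SRT timestamp parts to seconds as a float."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(name, content):
    """Convert SRT text into rows with the name/id/from/to/text fields."""
    rows = []
    content = content.lstrip("\ufeff").strip()  # drop a UTF-8 BOM if present
    for block in re.split(r"\n\s*\n", content):
        lines = block.strip().splitlines()
        if len(lines) < 2 or not lines[0].strip().isdigit():
            continue  # skip malformed blocks
        timing = TIMING_RE.search(lines[1])
        if timing is None:
            continue
        g = timing.groups()
        rows.append({
            "name": name,
            "id": int(lines[0]),
            "from": to_seconds(*g[:4]),
            "to": to_seconds(*g[4:]),
            "text": " ".join(l.strip() for l in lines[2:]),  # join wrapped lines
        })
    return rows
```

The resulting rows could then be written with csv.DictWriter using the fieldnames ["name", "id", "from", "to", "text"].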

Lecture number - transcript name mapping

Week 1

  • 2 - 1 - 1.1 Natural Language Content Analysis (00-21-05).csv --> 1.1
  • 2 - 2 - 1.2 Text Access (00-09-24).csv --> 1.2
  • 2 - 3 - 1.3 Text Retrieval Problem (00-26-18).csv --> 1.3
  • 2 - 4 - 1.4 Overview of Text Retrieval Methods (00-10-10).csv --> 1.4
  • 2 - 5 - 1.5 Vector Space Model- Basic Idea (00-09-44).csv --> 1.5
  • 2 - 6 - 1.6 Vector Space Model- Simplest Instantiation (00-17-30).csv --> 1.6

Week 2

  • 2 - 7 - 1.7 Vector Space Model- Improved Instantiation (00-16-52).csv --> 2.1
  • 2 - 8 - 1.8 TF Transformation (00-09-31).csv --> 2.2
  • 2 - 9 - 1.9 Doc Length Normalization (00-18-56).csv --> 2.3
  • 3 - 1 - 2.1 Implementation of TR Systems (00-21-27).csv --> 2.4
  • 3 - 2 - 2.2 System Implementation- Inverted Index Construction (00-18-21).csv --> 2.5
  • 3 - 3 - 2.3 System Implementation- Fast Search (00-17-11).csv --> 2.6

Week 3

  • 3 - 4 - 2.4 Evaluation of TR Systems (00-10-10).csv --> 3.1
  • 3 - 5 - 2.5 Evaluation of TR Systems- Basic Measures (00-12-54).csv --> 3.2
  • 3 - 6 - 2.6 Evaluation of TR Systems- Evaluating Ranked Lists Part 1 (00-12-51).csv --> 3.3
  • 3 - 7 - 2.6 Evaluation of TR Systems- Evaluating Ranked Lists Part 2 (00-10-01) .csv --> 3.4
  • 3 - 8 - 2.7 Evaluation of TR Systems- Multi-Level Judgements (00-10-48).csv --> 3.5
  • 3 - 9 - 2.8 Evaluation of TR Systems- Practical Issues (00-15-14).csv --> 3.6

Week 4

  • 4 - 1 - 3.1 Probabilistic Retrieval Model- Basic Idea (00-12-44).csv --> 4.1
  • 4 - 2 - 3.2 Statistical Language Models (00-17-53).csv --> 4.2
  • 4 - 3 - 3.3 Query Likelihood Retrieval Function (00-12-07).csv --> 4.3
  • 4 - 4 - 3.4 Smoothing of Language Model - Part 1 (00-12-15).csv --> 4.4
  • 4 - 5 - 3.4 Smoothing of Language Model - Part 2 (00-09-36).csv --> 4.5
  • 4 - 6 - 3.5 Smoothing Methods Part - 1 (00-09-54).csv --> 4.6
  • 4 - 7 - 3.5 Smoothing Methods Part - 2 (00-13-17).csv --> 4.7

Week 5

  • 4 - 8 - 3.6 Feedback in Text Retrieval (00-06-49).csv --> 5.1
  • 4 - 9 - 3.7 Feedback in Vector Space Model- Rocchio (00-12-05).csv --> 5.2
  • 4 - 10 - 3.8 Feedback in Text Retrieval- Feedback in LM (00-19-11).csv --> 5.3
  • 5 - 1 - 4.1 Web Search- Introduction & Web Crawler (00-11-05).csv --> 5.4
  • 5 - 2 - 4.2 Web Indexing (00-17-19).csv --> 5.5
  • 5 - 3 - 4.3 Link Analysis - Part 1 (00-09-16).csv --> 5.6
  • 5 - 4 - 4.3 Link Analysis - Part 2 (00-17-30).csv --> 5.7
  • 5 - 5 - 4.3 Link Analysis - Part 3 (00-05-59).csv --> 5.8

Week 6

  • 5 - 6 - 4.4 Learning to Rank Part 1 (00-13-09).csv --> 6.1
  • 5 - 7 - 4.4 Learning to Rank - Part 2 (00-05-54).csv --> 6.2
  • 5 - 8 - 4.4 Learning to Rank - Part 3 (00-04-58).csv --> 6.3
  • 5 - 9 - 4.5 Future of Web Search (00-13-09).csv --> 6.4
  • 5 - 10 - 4.6 Recommender Systems- Content-based Filtering - Part 1 (00-12-55).csv --> 6.5
  • 5 - 11 - 4.6 Recommender Systems- Content-based Filtering - Part 2 (00-10-42).csv --> 6.6
  • 5 - 12 - 4.7 Recommender Systems- Collaborative Filtering - Part 1 (00-06-20).csv --> 6.7
  • 5 - 13 - 4.7 Recommender Systems- Collaborative Filtering - Part 2 (00-12-09).csv --> 6.8
  • 5 - 14 - 4.7 Recommender Systems- Collaborative Filtering - Part 3 (00-04-45).csv --> 6.9

Week 7

  • 2 - 1 - 1.1 Overview Text Mining and Analytics- Part 1 (00-11-43).csv --> 7.1
  • 2 - 2 - 1.2 Overview Text Mining and Analytics- Part 2 (00-11-44).csv --> 7.2
  • 2 - 3 - 1.3 Natural Language Content Analysis- Part 1 (00-12-48).csv --> 7.3
  • 2 - 4 - 1.4 Natural Language Content Analysis- Part 2 (00-04-25).csv --> 7.4
  • 2 - 5 - 1.5 Text Representation- Part 1 (00-10-46).csv --> 7.5
  • 2 - 6 - 1.6 Text Representation- Part 2 (00-09-29).csv --> 7.6
  • 2 - 7 - 1.7 Word Association Mining and Analysis (00-15-39).csv --> 7.7
  • 2 - 8 - 1.8 Paradigmatic Relation Discovery Part 1 (00-14-31).csv --> 7.8
  • 2 - 9 - 1.9 Paradigmatic Relation Discovery Part 2 (00-17-53).csv --> 7.9

Week 8

  • 2 - 10 - 1.10 Syntagmatic Relation Discovery- Entropy (00-11-00).csv --> 8.1
  • 2 - 11 - 1.11 Syntagmatic Relation Discovery- Conditional Entropy (00-11-57).csv --> 8.2
  • 2 - 12 - 1.12 Syntagmatic Relation Discovery- Mutual Information- Part 1 (00-13-55).csv --> 8.3
  • 2 - 13 - 1.13 Syntagmatic Relation Discovery- Mutual Information- Part 2 (00-09-42).csv --> 8.4
  • 3 - 1 - 2.1 Topic Mining and Analysis- Motivation and Task Definition (00-07-36).csv --> 8.5
  • 3 - 2 - 2.2 Topic Mining and Analysis- Term as Topic (00-11-31).csv --> 8.6
  • 3 - 3 - 2.3 Topic Mining and Analysis- Probabilistic Topic Models (00-14-17).csv --> 8.7
  • 3 - 4 - 2.4 Probabilistic Topic Models- Overview of Statistical Language Models- Part 1 (00-10-25).csv --> 8.8
  • 3 - 5 - 2.5 Probabilistic Topic Models- Overview of Statistical Language Models- Part 2 (00-13-11).csv --> 8.9
  • 3 - 6 - 2.6 Probabilistic Topic Models- Mining One Topic (00-12-21).csv --> 8.10

Week 9

  • 3 - 7 - 2.7 Probabilistic Topic Models- Mixture of Unigram Language Models (00-12-39).csv --> 9.1
  • 3 - 8 - 2.8 Probabilistic Topic Models- Mixture Model Estimation- Part 1 (00-10-16).csv --> 9.2
  • 3 - 9 - 2.9 Probabilistic Topic Models- Mixture Model Estimation- Part 2 (00-08-15).csv --> 9.3
  • 3 - 10 - 2.10 Probabilistic Topic Models- Expectation-Maximization Algorithm- Part 1 (00-11-05).csv --> 9.4
  • 3 - 11 - 2.11 Probabilistic Topic Models- Expectation-Maximization Algorithm- Part 2 (00-10-39).csv --> 9.5
  • 3 - 12 - 2.12 Probabilistic Topic Models- Expectation-Maximization Algorithm- Part 3 (00-06-25).csv --> 9.6
  • 3 - 13 - 2.13 Probabilistic Latent Semantic Analysis (PLSA)- Part 1 (00-10-38).csv --> 9.7
  • 3 - 14 - 2.14 Probabilistic Latent Semantic Analysis (PLSA)- Part 2 (00-10-15).csv --> 9.8
  • 3 - 15 - 2.15 Latent Dirichlet Allocation (LDA)- Part 1 (00-10-20).csv --> 9.9
  • 3 - 16 - 2.16 Latent Dirichlet Allocation (LDA)- Part 2 (00-12-03).csv --> 9.10

Week 10

  • 4 - 1 - 3.1 Text Clustering- Motivation (00-15-52).csv --> 10.1
  • 4 - 2 - 3.2 Text Clustering- Generative Probabilistic Models Part 1 (00-16-18).csv --> 10.2
  • 4 - 3 - 3.3 Text Clustering- Generative Probabilistic Models Part 2 (00-08-37).csv --> 10.3
  • 4 - 4 - 3.4 Text Clustering- Generative Probabilistic Models Part 3 (00-14-55).csv --> 10.4
  • 4 - 5 - 3.5 Text Clustering- Similarity-based Approaches (00-17-48).csv --> 10.5
  • 4 - 6 - 3.6 Text Clustering- Evaluation (00-10-11).csv --> 10.6
  • 4 - 7 - 3.7 Text Categorization- Motivation (00-14-37).csv --> 10.7
  • 4 - 8 - 3.8 Text Categorization- Methods (00-11-50).csv --> 10.8
  • 4 - 9 - 3.9 Text Categorization- Generative Probabilistic Models (00-31-18).csv --> 10.9

Week 11

  • 4 - 10 - 3.10 Text Categorization- Discriminative Classifier Part 1 (00-20-34).csv --> 11.1
  • 4 - 11 - 3.11 Text Categorization- Discriminative Classifier Part 2 (00-31-46).csv --> 11.2
  • 4 - 12 - 3.12 Text Categorization- Evaluation Part 1 (00-14-12).csv --> 11.3
  • 4 - 13 - 3.13 Text Categorization- Evaluation Part 2 (00-10-51).csv --> 11.4
  • 5 - 1 - 4.1 Opinion Mining and Sentiment Analysis- Motivation (00-17-51).csv --> 11.5
  • 5 - 2 - 4.2 Opinion Mining and Sentiment Analysis- Sentiment Classification (00-11-47).csv --> 11.6
  • 5 - 3 - 4.3 Opinion Mining and Sentiment Analysis- Ordinal Logistic Regression (00-13-43).csv --> 11.7

Week 12

  • 5 - 4 - 4.4 Opinion Mining and Sentiment Analysis- Latent Aspect Rating Analysis Part 1 (00-15-17).csv --> 12.1
  • 5 - 5 - 4.5 Opinion Mining and Sentiment Analysis- Latent Aspect Rating Analysis Part 2 (00-14-43).csv --> 12.2
  • 5 - 6 - 4.6 Text-Based Prediction (00-12-08).csv --> 12.3
  • 5 - 7 - 4.7 Contextual Text Mining- Motivation (00-06-47).csv --> 12.4
  • 5 - 8 - 4.8 Contextual Text Mining- Contextual Probabilistic Latent Semantic Analysis (00-17-59).csv --> 12.5
  • 5 - 9 - 4.9 Contextual Text Mining- Mining Topics with Social Network Context (00-14-43).csv --> 12.6
  • 5 - 10 - 4.10 Contextual Text Mining- Mining Casual Topics with Time Series Supervision (00-19-37).csv --> 12.7
  • 5 - 11 - 4.11 Course Summary (00-18-36).csv --> 12.8
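Because the course numbering in the file names (e.g. 1.7) differs from the lecture numbers used in the question files (e.g. 2.1), questions have to be resolved through the table above. A hypothetical helper that parses lines of the form <transcript>.csv --> <lecture> into a dictionary could look like this; the function names and the default root directory (taken from the Transcripts section) are assumptions. The pattern tolerates bullet characters and spacing around the arrow.

```python
import re
from pathlib import Path

# One mapping entry per line: "<transcript>.csv --> <week>.<index>".
ENTRY_RE = re.compile(r"^(?P<file>.+?\.csv)\s*--\s*>\s*(?P<lecture>\d+\.\d+)$")


def parse_mapping(text):
    """Build {lecture number: transcript CSV name} from mapping lines."""
    mapping = {}
    for line in text.splitlines():
        entry = line.strip().lstrip("•*").strip()  # drop list bullets
        match = ENTRY_RE.match(entry)
        if match:
            mapping[match.group("lecture")] = match.group("file")
    return mapping


def transcript_path(lecture, mapping, root="data/parsed/transcripts"):
    """Resolve a question's lecture number (e.g. "2.1") to its transcript CSV."""
    return Path(root) / mapping[lecture]
```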
