EduData

A convenient interface for downloading and preprocessing datasets in education.

The dataset includes:

You can also visit our datashop BaseData to get most of the datasets mentioned above.

Besides the datasets mentioned above, we also provide some benchmark datasets for specific tasks, which are listed as follows:

Tutorial

Installation

Clone the repository with git and install it with pip:

pip install -e .

CLI

edudata $subcommand $parameters1 $parameters2

To see the help information:

edudata -- --help
edudata $subcommand --help

The CLI tool is built on fire. Refer to the fire documentation for detailed usage.

Download Dataset

Before downloading a dataset, first check which datasets are available:

edudata ls

and get:

assistment-2009-2010-skill
assistment-2012-2013-non-skill
assistment-2015
junyi
KDD-CUP-2010
slepemapy.cz

Download a dataset by specifying its name:

edudata download assistment-2009-2010-skill

To change the storage directory, pass it as an additional argument:

edudata download assistment-2009-2010-skill $dir

Task Specified Tools

Knowledge Tracing
Format converter

In the Knowledge Tracing task, there is a popular format (which we call the triple line (tl) format) for representing interaction sequence records:

5
419,419,419,665,665
1,1,1,0,0

which can be found in Deep Knowledge Tracing. In this format, every three lines make up one interaction sequence. The first line gives the length of the interaction sequence, the second line lists the exercise ids, and the third line lists the responses, where each element stands for a correct answer (i.e., 1) or a wrong answer (i.e., 0).
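
As a rough illustration of how the tl format can be read, the sketch below parses triple-line records from a list of text lines. The function name `parse_tl` is hypothetical and is not part of edudata; it assumes comma-separated integer exercise ids and 0/1 responses, as in the example above.

```python
def parse_tl(lines):
    """Parse triple-line (tl) records into (exercise_ids, responses) pairs.

    Hypothetical helper for illustration only; not the edudata implementation.
    """
    # Drop blank lines so that trailing newlines do not break the grouping.
    rows = [line.strip() for line in lines if line.strip()]
    if len(rows) % 3 != 0:
        raise ValueError("tl data must contain a multiple of three lines")

    records = []
    for i in range(0, len(rows), 3):
        length = int(rows[i])  # first line: sequence length
        exercises = [int(x) for x in rows[i + 1].split(",")]  # second line: exercise ids
        responses = [int(x) for x in rows[i + 2].split(",")]  # third line: 1 correct, 0 wrong
        if not (len(exercises) == len(responses) == length):
            raise ValueError("declared length does not match the sequence")
        records.append((exercises, responses))
    return records


example = ["5", "419,419,419,665,665", "1,1,1,0,0"]
print(parse_tl(example))
# → [([419, 419, 419, 665, 665], [1, 1, 1, 0, 0])]
```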

Because some special symbols are hard to store in the format above, we offer another format, named json sequence, to represent the interaction sequence records:

[[419, 1], [419, 1], [419, 1], [665, 0], [665, 0]]

Each item in the sequence represents one interaction. The first element of the item is the exercise id (in some works, the exercise id is not mapped one-to-one to a knowledge unit (ku)/concept, but in junyi, one exercise contains one ku), and the second element indicates whether the learner answered the exercise correctly: 0 for a wrong answer and 1 for a correct one.
Each line holds one json record, which corresponds to one learner's interaction sequence.
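
The relation between the two formats can be sketched as follows. This is a minimal example, not the code behind the edudata converters: `tl_to_json_line` is a hypothetical helper, and it assumes integer exercise ids (the json sequence format itself is not limited to integers).

```python
import json


def tl_to_json_line(exercise_line, response_line):
    """Turn the second and third lines of a tl record into one json sequence line.

    Hypothetical helper for illustration only; not the edudata implementation.
    """
    exercises = [int(x) for x in exercise_line.strip().split(",")]
    responses = [int(x) for x in response_line.strip().split(",")]
    if len(exercises) != len(responses):
        raise ValueError("exercise and response lines differ in length")
    # One [exercise_id, response] pair per interaction, serialized on one line.
    return json.dumps([[e, r] for e, r in zip(exercises, responses)])


line = tl_to_json_line("419,419,419,665,665", "1,1,1,0,0")
print(line)
# → [[419, 1], [419, 1], [419, 1], [665, 0], [665, 0]]

# Reading a record back: each line of a json sequence file is one json.loads call.
sequence = json.loads(line)
```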

We provide tools for converting between the two formats:

# convert tl sequence to json sequence
edudata tl2json $src $tar
# convert json sequence to tl sequence
edudata json2tl $src $tar

Dataset Preprocess

The CLI tools quickly convert the "raw" data of a dataset into "mature" data for the knowledge tracing task. The "mature" data is in json sequence format and can be modeled by XKT and TKT (TBA).

junyi
# download junyi dataset to junyi/
>>> edudata download junyi
# build the knowledge graph
>>> edudata dataset junyi kt extract_relations junyi/ junyi/data/
# prepare the dataset for the knowledge tracing task, represented in json sequence format
>>> edudata dataset junyi kt build_json_sequence junyi/ junyi/data/ junyi/data/graph_vertex.json 1000
# after preprocessing, a json sequence file named student_log_kt_1000 can be found in junyi/data/
# further preprocessing, such as splitting the dataset into train, valid and test sets, can then be performed
>>> edudata train_valid_test junyi/data/student_log_kt_1000 -- --train_ratio 0.8 --valid_ratio 0.1 --test_ratio 0.1

Analysis Dataset

This tool only supports the json sequence format. It reports the following statistics of the dataset:

  • the number of knowledge units
  • the number of correct records
  • the number of sequences

edudata kt_stat $filename
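
For intuition, the three statistics listed above can be computed from json sequence lines roughly as follows. This is a sketch under stated assumptions, not the implementation of `kt_stat`: the function name `kt_statistics` is hypothetical, and each distinct exercise id is counted as one knowledge unit.

```python
import json


def kt_statistics(json_lines):
    """Count knowledge units, correct records and sequences in json sequence data.

    Hypothetical helper for illustration only; not the edudata implementation.
    Assumes every distinct exercise id stands for one knowledge unit.
    """
    knowledge_units = set()
    correct_records = 0
    sequences = 0
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        sequences += 1
        for exercise_id, response in json.loads(line):
            knowledge_units.add(exercise_id)
            if response == 1:
                correct_records += 1
    return {
        "knowledge_units": len(knowledge_units),
        "correct_records": correct_records,
        "sequences": sequences,
    }


sample = [
    "[[419, 1], [419, 1], [419, 1], [665, 0], [665, 0]]",
    "[[665, 1], [700, 0]]",
]
print(kt_statistics(sample))
# → {'knowledge_units': 3, 'correct_records': 4, 'sequences': 2}
```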

Evaluation

To better verify the effectiveness of a model, the dataset is usually divided into train/valid/test sets or split with the k-fold method.

edudata train_valid_test $filename1 $filename2 -- --train_ratio 0.8 --valid_ratio 0.1 --test_ratio 0.1
edudata kfold $filename1 $filename2 -- --n_splits 5
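
The idea behind a ratio-based split can be sketched in a few lines. The example below is illustrative only and makes its own assumptions: the function name, the `seed` parameter and the shuffling step are hypothetical and may not match how the `train_valid_test` command behaves.

```python
import random


def train_valid_test_split(records, train_ratio=0.8, valid_ratio=0.1,
                           test_ratio=0.1, seed=0):
    """Split records into train/valid/test lists by the given ratios.

    Hypothetical helper for illustration only; not the edudata implementation.
    """
    if abs(train_ratio + valid_ratio + test_ratio - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(records)
    # Assumption: shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(shuffled)
    train_end = int(len(shuffled) * train_ratio)
    valid_end = train_end + int(len(shuffled) * valid_ratio)
    # Whatever remains after train and valid goes to the test set.
    return shuffled[:train_end], shuffled[train_end:valid_end], shuffled[valid_end:]


train, valid, test = train_valid_test_split(range(10))
print(len(train), len(valid), len(test))
# → 8 1 1
```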

Refer to longling for more tools and detailed information.

More works

Refer to our website and GitHub for our publications and more projects.
