yale-lily / lecturebank

LectureBank Dataset

Python 100.00%

lecturebank's Introduction

LILY LectureBank

This is the repo for the LectureBank Corpus, with all batches and updates.

Note that several of our papers use parts of the corpus; you can find more details in the LB-Paper folder.

Meta Data

data-versions

lb*.tsv: data with different versions.

ID, Instructor, Title, Topic, URL, Venue, Year

ID: ID of each line.
Instructor: The author name(s).
Title: File title.
Topic: The topic number; check taxonomy.csv for the topic name.
URL: Online URL.
Venue: Name of the university, or GitHub.
Year: Year of the course.
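
As a minimal sketch of reading one of these files, the snippet below parses tab-separated rows into dictionaries keyed by the column names above. It assumes that each lb*.tsv file starts with a header row carrying exactly those names, which this README does not confirm. The `read_lecturebank` helper and the sample rows are hypothetical and only illustrate the layout.

```python
import csv
import io

# Column names as listed in this README.
COLUMNS = ["ID", "Instructor", "Title", "Topic", "URL", "Venue", "Year"]


def read_lecturebank(handle):
    """Read tab-separated rows from an open text handle into a list of dicts.

    Assumption: the first line is a header row with the columns above.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Hypothetical sample data, not taken from the corpus.
sample = (
    "ID\tInstructor\tTitle\tTopic\tURL\tVenue\tYear\n"
    "1\tJane Doe\tIntro to Parsing\t23\thttps://example.edu/a.pdf\tExample University\t2018\n"
    "2\tJohn Roe\tLanguage Models\t2\thttps://example.edu/b.pdf\tGitHub\t2019\n"
)
rows = read_lecturebank(io.StringIO(sample))
print(rows[0]["Title"], rows[0]["Topic"])
```

For a real file, pass a handle opened with `open(path, newline="", encoding="utf-8")` instead of the in-memory sample. If the files turn out to have no header row, `csv.DictReader` accepts the column list through its `fieldnames` argument.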

We ran a URL check in May 2022. The number of valid resources in each file is:

1020 lb1.tsv
308 lb2.tsv
3564 lb3.tsv
3136 lb4.tsv
1321 lb5.tsv
397 lb6.tsv

NOTE: We combined all five batches of LectureBank and removed duplicates and invalid URLs. All data can be found in alldata.tsv, which contains 7499 entries in total.
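
The README does not say how duplicates were detected. One plausible reading, sketched below, treats two rows as duplicates when their URL fields match after trimming whitespace. The `merge_batches` helper and the sample rows are hypothetical, and this is not necessarily how alldata.tsv was built.

```python
def merge_batches(batches):
    """Concatenate batches of row dicts, keeping the first row seen per URL.

    Assumption: duplicates are identified by the URL field alone.
    """
    seen = set()
    merged = []
    for batch in batches:
        for row in batch:
            key = row["URL"].strip()
            if key in seen:
                continue
            seen.add(key)
            merged.append(row)
    return merged


# Hypothetical rows; only the URL field matters for this sketch.
batch_a = [{"ID": "1", "URL": "https://example.edu/a.pdf"},
           {"ID": "2", "URL": "https://example.edu/b.pdf"}]
batch_b = [{"ID": "3", "URL": "https://example.edu/b.pdf "},
           {"ID": "4", "URL": "https://example.edu/c.pdf"}]

merged = merge_batches([batch_a, batch_b])
print([row["ID"] for row in merged])  # ['1', '2', '4']
```

Keeping the first occurrence means that earlier batches take precedence when the same URL appears more than once.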

Taxonomy

NLP taxonomy release. The file taxonomy.csv contains a taxonomy of 320 topics in a tree structure. Each topic's ID indicates its parent node. For example, topic 233 (Relation Extraction) has parent node 23 (Part of Speech Tagging), and topic 23 has parent node 2 (Language Modeling, Syntax, Parsing).

Topic ID: ID of the topic.
Topic: Topic name.
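
The 233, 23, 2 example suggests that a topic's parent ID is its own ID with the last digit dropped. The sketch below walks that chain up to the root. This prefix rule is an inference from that single example, not a documented guarantee, and `parent_id` and `ancestors` are hypothetical helper names. If taxonomy.csv contains IDs that do not follow the rule, the file itself is authoritative.

```python
def parent_id(topic_id):
    """Return the parent topic ID, or None for a top-level topic.

    Assumption: the parent ID is the topic ID without its last digit.
    """
    text = str(topic_id)
    if len(text) <= 1:
        return None
    return int(text[:-1])


def ancestors(topic_id):
    """Return the chain of ancestor IDs from nearest parent up to the root."""
    chain = []
    current = parent_id(topic_id)
    while current is not None:
        chain.append(current)
        current = parent_id(current)
    return chain


print(ancestors(233))  # [23, 2]
```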

You can find how this was created in our paper CLICKER: A Computational LInguistics Classification Scheme for Educational Resources.

Other resources

Please visit our website AAN.how.

lecturebank's Issues

LectureBank2 dataset

Could the plain text of LectureBank2 be provided? Most of the URLs are no longer valid.

How to get the concept representation by Doc2Vec?

Hi,
I read your paper, and I'm wondering how you obtain the concept representations with Doc2Vec. First, each slide may contain multiple concepts. Second, each concept may appear in multiple slides. How do you deal with these issues?

Thank you!

R-VGAE data

Hi, your R-VGAE is awesome!
I'm trying to reproduce your work, but the training data does not seem to have been released. If it is not convenient to disclose the data, could you please share the script used to process it?

Public Code

When will the code be available?

datasets and code

Hi, thanks for the code.
In the R-VGAE paper, the semi-supervised setting uses concept-concept edges, but the code only contains the document-concept and document-document adjacency matrices.
    if args.ds.startswith('tf'):
        if args.labels == 'y':
            adj_cd, adj_dd, features, tags_nodes = my_load_data_tfidf_semi(args.wmd)
        else:
            adj_cd, adj_dd, features = my_load_data_tfidf(args.wmd)

Where are the concept-concept edges used?
When TF-IDF is used as the embedding feature, how are the features of the concepts obtained?
I know the tags of the concepts are 0-321. What are the tags of the documents?
