
CLUTRR

Compositional Language Understanding with Text-based Relational Reasoning

A benchmark dataset generator to test relational reasoning on text

Code for generating data for our paper "CLUTRR: A Diagnostic Benchmark for Inductive Reasoning from Text" at EMNLP 2019

Dependencies

  • pandas - to store and retrieve data in CSV
  • names - to generate fancy names
  • tqdm - for fancy progress bars

Install

python setup.py develop

Tasks

CLUTRR is highly modular and can therefore be used for various probing tasks. Here we document the available task types and the corresponding config arguments to generate them. To run a task, refer to the following table and run:

python main.py --train_tasks <> --test_tasks <> <args>

where train_tasks takes the form <task_id>.<relation_length> and test_tasks is a comma-separated list of the same form. For example:

python main.py --train_tasks 1.3 --test_tasks 1.3,1.4

You can provide general arguments as well, which are defined in the next section.
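As a quick illustration of the spec format, here is a tiny parser for the <task_id>.<relation_length> strings; parse_task_spec is a hypothetical helper for exposition, not part of the CLUTRR codebase.

```python
# Illustrative only: parse a comma-separated spec like "1.3,1.4"
# (as accepted by --train_tasks / --test_tasks) into (task_id, k) pairs.
def parse_task_spec(spec):
    pairs = []
    for item in spec.split(","):
        task_id, k = item.split(".")
        pairs.append((int(task_id), int(k)))
    return pairs

print(parse_task_spec("1.3,1.4"))  # [(1, 3), (1, 4)]
```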

Task Description
1 Basic family relations, free of noise
2 Family relations with supporting facts
3 Family relations with irrelevant facts
4 Family relations with disconnected facts
5 Family relations with all facts (2-4)
6 Family relations - Memory task: retrieve the relations already defined in the text
7 Family relations - Mix of Memory and Reasoning - 1 & 6

Generated data is stored in the data/ folder.

Generalizability

Each task mentioned above can be used with a different relation length k. For example, Task 1 can have a train set with k=3 and test sets with k=4,5,6, etc. See the section above on how to provide such arguments.
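Building the comma-separated task specs for such a length-generalization split can be sketched as follows; task_specs is a hypothetical helper, not part of the repo.

```python
# Hypothetical helper: build the comma-separated task spec for one task id
# and a list of relation lengths k.
def task_specs(task_id, ks):
    return ",".join(f"{task_id}.{k}" for k in ks)

# Train on k=3, test generalization to k=4,5,6:
cmd = (f"python main.py --train_tasks {task_specs(1, [3])}"
       f" --test_tasks {task_specs(1, [4, 5, 6])}")
print(cmd)  # python main.py --train_tasks 1.3 --test_tasks 1.4,1.5,1.6
```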

AMT Paraphrasing

We collected paraphrases for relations of length k=1,2,3 from Amazon Mechanical Turk using the ParlAI MTurk interface. The collected paraphrases can be re-used as templates to generate arbitrarily large datasets in the above configurations. We will release the templates shortly here.

To use the templates, pass the --use_mturk_template flag and the template location via the --template_file argument. The optional --template_length flag governs the maximum length k up to which sentences are replaced with templates. The script auto-downloads our collected and cleaned template files from the server using the setup() method in main.py.
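The templates use entity slots such as ENT_0_female (the convention visible in the AMT templates quoted in the issues below). A minimal sketch of how such a template could be instantiated, assuming plain string substitution; fill_template is illustrative, not the repo's code.

```python
# Illustrative sketch: substitute entity placeholders in an AMT template
# with bracketed names, as they appear in the generated stories.
def fill_template(template, entities):
    """entities maps placeholder -> name, e.g. {'ENT_0_female': 'Laura'}."""
    text = template
    for placeholder, name in entities.items():
        text = text.replace(placeholder, f"[{name}]")
    return text

s = fill_template(
    "ENT_1_female is a girl with a grandmother named ENT_0_female .",
    {"ENT_0_female": "Laura", "ENT_1_female": "Penny"},
)
print(s)  # [Penny] is a girl with a grandmother named [Laura] .
```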

Transductive and Inductive Setting

CLUTRR provides both transductive and inductive settings for relational reasoning. In the transductive setting, the relation patterns encountered in the training set are the same as in the test set. While this setup is not very interesting, it can be used for basic sanity checks of a model. In the inductive setting, the relation patterns are split 80-20 between training and testing. Combined with the ability to split AMT placeholders, CLUTRR provides 4 scenarios, selected with the following flags:

Setup Flags Description
(1) (default): same patterns in train & test, same AMT placeholders. Easy, due to data leakage.
(2) --template_split: same patterns in train & test, different AMT placeholders. Transductive, medium difficulty.
(3) --holdout: different patterns in train & test, same AMT placeholders. Inductive, but language models could still exploit the syntax.
(4) --template_split --holdout: different patterns in train & test, different AMT placeholders. Inductive, the hardest setup.
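The inductive --holdout behavior amounts to splitting the set of relation patterns disjointly between train and test. A minimal sketch of that 80-20 idea follows; the repo's actual split logic may differ, and holdout_split is a made-up name.

```python
import random

# Sketch of an 80-20 disjoint split over relation patterns: patterns seen
# at test time are never seen at train time (the inductive setting).
def holdout_split(patterns, test_frac=0.2, seed=0):
    patterns = list(patterns)
    random.Random(seed).shuffle(patterns)
    cut = int(len(patterns) * (1 - test_frac))
    return patterns[:cut], patterns[cut:]

train_pats, test_pats = holdout_split(
    ["son-aunt", "father-sister", "mother-brother",
     "daughter-uncle", "son-grandfather"])
assert not set(train_pats) & set(test_pats)  # disjoint patterns
```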

Thanks to @NicolasAG for adding this information in the README.

Rules

We create an idealized, simple kinship world, derived from a set of clauses or rules. The rules are defined in the rules_store.yaml file.
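To make the clause idea concrete, here is a toy composition over relation chains. The two rules shown are only examples (the son-grandfather chain composing to father appears in an issue quoted below); the full rule set lives in rules_store.yaml, and RULES/compose are illustrative names.

```python
# Toy illustration: clauses that fold a chain of relations into one.
# (X, 'son', Y) is read as "Y is X's son", matching the dataset's triples.
RULES = {
    ("son", "son"): "grandson",           # X's son's son is X's grandson
    ("son", "grandfather"): "father",     # X's son's grandfather is X's father
}

def compose(chain):
    """Fold a chain of relations into a single relation via RULES."""
    rel = chain[0]
    for nxt in chain[1:]:
        rel = RULES[(rel, nxt)]
    return rel

print(compose(["son", "grandfather"]))  # father
```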

Usage

To generate the simple setup for task 1, first cd into the clutrr/clutrr folder and run:

python main.py --train_tasks 1.2 --test_tasks 1.2 --train_rows 500 --test_rows 10 --equal --holdout --use_mturk_template --data_name "Robust Reasoning - clean - AMT" --unique_test_pattern

Pre-generated datasets used in our paper can be found here.

CLI Usage

usage: main.py [-h] [--max_levels MAX_LEVELS] [--min_child MIN_CHILD]
               [--max_child MAX_CHILD] [--p_marry P_MARRY] [--boundary]
               [--output OUTPUT] [--rules_store RULES_STORE]
               [--relations_store RELATIONS_STORE]
               [--attribute_store ATTRIBUTE_STORE] [--train_tasks TRAIN_TASKS]
               [--test_tasks TEST_TASKS] [--train_rows TRAIN_ROWS]
               [--test_rows TEST_ROWS] [--memory MEMORY]
               [--data_type DATA_TYPE] [--question QUESTION] [-v]
               [-t TEST_SPLIT] [--equal] [--analyze] [--mturk] [--holdout]
               [--data_name DATA_NAME] [--use_mturk_template]
               [--template_length TEMPLATE_LENGTH]
               [--template_file TEMPLATE_FILE] [--template_split]
               [--combination_length COMBINATION_LENGTH]
               [--output_dir OUTPUT_DIR] [--store_full_puzzles]
               [--unique_test_pattern]

optional arguments:
  -h, --help            show this help message and exit
  --max_levels MAX_LEVELS
                        max number of levels
  --min_child MIN_CHILD
                        min number of children per node
  --max_child MAX_CHILD
                        max number of children per node
  --p_marry P_MARRY     Probability of marriage among nodes
  --boundary            Boundary in entities
  --output OUTPUT       Prefix of the output file
  --rules_store RULES_STORE
                        Rules store
  --relations_store RELATIONS_STORE
                        Relations store
  --attribute_store ATTRIBUTE_STORE
                        Attributes store
  --train_tasks TRAIN_TASKS
                        Define which task to create dataset for, including the
                        relationship length, comma separated
  --test_tasks TEST_TASKS
                        Define which tasks, including the relation lengths, to
                        test for, comma separated
  --train_rows TRAIN_ROWS
                        number of train rows
  --test_rows TEST_ROWS
                        number of test rows
  --memory MEMORY       Percentage of tasks which are just memory retrieval
  --data_type DATA_TYPE
                        train/test
  --question QUESTION   Question type. 0 -> relational, 1 -> yes/no
  -v, --verbose         print the paths
  -t TEST_SPLIT, --test_split TEST_SPLIT
                        Testing split
  --equal               Make sure each pattern is equal. Warning: Time
                        complexity of generation increases if this flag is
                        set.
  --analyze             Analyze generated files
  --mturk               prepare data for mturk
  --holdout             if true, then hold out unique patterns in the test set
  --data_name DATA_NAME
                        Dataset name
  --use_mturk_template  use the templating data for mturk
  --template_length TEMPLATE_LENGTH
                        Max Length of the template to substitute
  --template_file TEMPLATE_FILE
                        location of placeholders
  --template_split      Split on template level
  --combination_length COMBINATION_LENGTH
                        number of relations to combine together
  --output_dir OUTPUT_DIR
                        output_dir
  --store_full_puzzles  store the full puzzle data in puzzles.pkl file.
                        Warning: may take considerable amount of disk space!
  --unique_test_pattern
                        If true, have unique patterns generated in the first
                        gen, and then choose from it.

Citation

If our work is useful for your research, consider citing it using the following bibtex:

@article{sinha2019clutrr,
  Author = {Koustuv Sinha and Shagun Sodhani and Jin Dong and Joelle Pineau and William L. Hamilton},
  Title = {CLUTRR: A Diagnostic Benchmark for Inductive Reasoning from Text},
  Year = {2019},
  journal = {Empirical Methods in Natural Language Processing (EMNLP)},
  arxiv = {1908.06177}
}

Papers using CLUTRR

  • Nicolas Gontier, Koustuv Sinha, Siva Reddy, Chris Pal, Measuring Systematic Generalization in Neural Proof Generation with Transformers (NeurIPS 2020) Paper Code & Data
  • Pasquale Minervini, Sebastian Riedel, Pontus Stenetorp, Edward Grefenstette, Tim Rocktäschel, Learning Reasoning Strategies in End-to-End Differentiable Proving (ICML 2020) Paper Code & Data

Join the CLUTRR community

See the CONTRIBUTING file for how to help out.

License

CLUTRR is licensed under CC-BY-NC 4.0 (Attribution-NonCommercial 4.0 International), as found in the LICENSE file.

clutrr's People

Contributors

koustuvsinha, pminervini, shagunsodhani


clutrr's Issues

yaml syntax

In the store there is currently yaml.load; this returns an error and should be changed to yaml.safe_load.

Erroneous rule?

Hi, thanks for your great work!

After generating data with

python main.py --train_tasks 1.2,1.3 --test_tasks 1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,1.10 --train_rows 5000 --test_rows 500 --holdout

in 1.10_test.csv, I found the following story:

[Laura] has a daughter called [Penny]. The husband of [Penny] is [Robert]. [Craig] is a brother of [Robert]. [Robert] is the father of [William]. [Ruthann] is a sister of [Robert]. [Eugenia] is [Craig]'s daughter. [Alicia] is the aunt of [Eugenia]. [William] is a brother of [Gary]. [Robert] has a son called [Gary]. [Robert] is [Ruthann]'s brother.

Where the target is:

['[Laura] has a daughter called [Alicia]. ']

After convincing myself that this cannot hold, I looked into the proof state also provided in the csv:

[{('Laura', 'daughter', 'Alicia'): [('Laura', 'son', 'Craig'), ('Craig', 'sister', 'Alicia')]},
{('Laura', 'son', 'Craig'): [('Laura', 'son', 'Robert'), ('Robert', 'brother', 'Craig')]},
{('Laura', 'son', 'Robert'): [('Laura', 'daughter', 'Ruthann'), ('Ruthann', 'brother', 'Robert')]},
{('Laura', 'daughter', 'Ruthann'): [('Laura', 'daughter', 'Penny'), ('Penny', 'sister', 'Ruthann')]},
{('Penny', 'sister', 'Ruthann'): [('Penny', 'son', 'Gary'), ('Gary', 'aunt', 'Ruthann')]},
{('Penny', 'son', 'Gary'): [('Penny', 'husband', 'Robert'), ('Robert', 'son', 'Gary')]},
{('Gary', 'aunt', 'Ruthann'): [('Gary', 'father', 'Robert'), ('Robert', 'sister', 'Ruthann')]},
{('Craig', 'sister', 'Alicia'): [('Craig', 'daughter', 'Eugenia'), ('Eugenia', 'aunt', 'Alicia')]},
{('Gary', 'father', 'Robert'): [('Gary', 'brother', 'William'), ('William', 'father', 'Robert')]}]

The spicy part here is line 5, "Penny sister Ruthann": if this were true, Penny would be both the sister and the wife of Robert.
However, it is proven with [('Penny', 'son', 'Gary'), ('Gary', 'aunt', 'Ruthann')].

It seems this arises from one of the rules in rules-store.yaml:

child:inv-un:sibling

If I understand correctly this says if A has child B and B has aunt/uncle C then A and C are siblings.
Note, however, this does not hold in general as C could be sibling of the wife/husband of A but not of A.
In our example, exactly this is the case as Ruthann is the sister of Robert but not of Penny.

Sorry if I have a misunderstanding until here, have I overlooked something?

Would it be enough to simply delete this rule from rules-store.yaml and re-generate the data?

Thanks a lot
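For reference, the reporter's counterexample can be checked mechanically. The rule encoding below is an assumption based on the reported reading of child:inv-un:sibling ("A has child B, B has aunt C, therefore A and C are siblings"), not the repo's actual implementation.

```python
# Facts from the quoted story (parent -> child, person -> aunt).
child = {("Penny", "Gary"), ("Robert", "Gary")}
aunt = {("Gary", "Ruthann")}

# Apply the (assumed) rule: for (A, B) in child and (B, C) in aunt,
# conclude sibling(A, C).
derived = {(a, c) for (a, b) in child for (b2, c) in aunt if b == b2}

print(derived)
# ('Robert', 'Ruthann') is correct, but ('Penny', 'Ruthann') is not:
# Ruthann is the sister of Penny's husband, not of Penny.
```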

Robust reasoning dataset for cycle noise (supporting facts) doesn't have edge types for the cycle's edges

Hello!

Thanks for making the CLUTRR dataset available. I have been using it to benchmark compositional reasoning in ML models. I think it is a useful benchmark and have come across multiple instances of it being used in recent papers that present models that tackle reasoning type problems in NLP.

Now, coming to the issue:

I was using the dataset from your EMNLP paper (provided here) to test out some graph models. It seems that there is no edge information for task 2.k (where the noise adds nodes that form cycles with the original chain in the story graph). For the other noise types (3.k, 4.k) it is easy to randomly sample edge types, since the noise additions are independent/terminal and don't feed back into the same logic graph. But that's not possible for 2.k tasks.

For example, for the following story:

'[Mary] and her mother [Nettie] went to the mall to try on new clothes. [Mary] has a daughter named [Jennifer] [Cecilia] took her sister, [Mary], out to dinner for her birthday. [Cecilia] bought her mother, [Nettie], a puppy for her birthday. [Ryan] bought a new dress for his daughter [Jennifer].'

whose corresponding edge representation is:

[(0, 1), (1, 2), (2, 3), (2, 4), (4, 3)]

The edge types for only the first three nodes are provided:

['daughter', 'mother', 'mother']

whereas presumably edge (2,4) should have the edge type 'sister' and (4,3) should have an edge type 'mother' for the noise node Cecilia. Looking through the robust reasoning dataset, there is no info on the edge types of noisy nodes.
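To make the gap concrete, pairing story_edges with edge_types from the example above leaves the two cycle edges unlabeled; this pairing is an assumption about how the two fields correspond, shown here with None for the missing labels.

```python
# Dataset fields from the example above: 5 edges but only 3 edge labels.
story_edges = [(0, 1), (1, 2), (2, 3), (2, 4), (4, 3)]
edge_types = ["daughter", "mother", "mother"]

# Pad the label list so unlabeled (cycle) edges show up as None.
padding = [None] * (len(story_edges) - len(edge_types))
labeled = list(zip(story_edges, edge_types + padding))
for edge, rel in labeled:
    print(edge, rel)
# (2, 4) and (4, 3) come out as None; presumably 'sister' and 'mother'.
```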

Can you please provide the corresponding datasets

  • data_7c5b0e70
  • data_06b8f2a1
  • data_523348e6
  • data_d83ecc3e

with the noisy edge types?

If not, can you please help me understand how the GAT results in Table 2 of your paper were obtained, since the graph formulation of the task requires an adjacency matrix with edge-type entries, right?

Only the first dataset seems important as far as the paper is concerned, so the rest are not super important. I believe (please correct me if I'm wrong) that the first one is used to report the GAT results in Table 2, since it is the only one with k=2,3 as reported in Section 4.2 of the paper.

Thanks!

--test_rows not always accurate

Hello,
I noticed something while generating CLUTRR data.
This is the exact command I ran: python main.py --train_tasks 4.2,4.3,4.4 --test_tasks 4.2,4.3,4.4,4.5,4.6,4.7,4.8,4.9,4.10 --train_rows 100000 --test_rows 10000 --equal --data_name 'r3-disco_l234', and these are the line counts of my csv test files:

$ wc -l data/data_r3-disco_l234_*/*_test.csv

10011 data/data_r3-disco_l234_1571563154.7491844/4.10_test.csv  ---> 10k : ok
 3100 data/data_r3-disco_l234_1571563154.7491844/4.2_test.csv   ---> much less than 10k... 
 2941 data/data_r3-disco_l234_1571563154.7491844/4.3_test.csv   ---> much less than 10k... 
 3007 data/data_r3-disco_l234_1571563154.7491844/4.4_test.csv   ---> much less than 10k... 
10025 data/data_r3-disco_l234_1571563154.7491844/4.5_test.csv  ---> 10k : ok
10014 data/data_r3-disco_l234_1571563154.7491844/4.6_test.csv  ---> 10k : ok
10038 data/data_r3-disco_l234_1571563154.7491844/4.7_test.csv  ---> 10k : ok
10044 data/data_r3-disco_l234_1571563154.7491844/4.8_test.csv  ---> 10k : ok
10023 data/data_r3-disco_l234_1571563154.7491844/4.9_test.csv  ---> 10k : ok

I trained on tasks 4.2,4.3,4.4 and it seems like when generating the test sets, it combined all these together as one task: note that 3100+2941+3007~10k

--train_tasks 1.# always generates a few stories from task 1.3

Hi,

I noticed that when running python main.py with --train_tasks 1.2,1.4,1.6 it also generated a few stories from task 1.3.

This is the detailed command I ran to reproduce the bug: python main.py --train_tasks 1.2,1.4,1.6 --test_tasks 1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,1.10 --train_rows 100000 --test_rows 10000 --equal --template_split
which generated the following training lines:

  • 96,134 lines for relations of length 2
  • 96,326 lines for relations of length 4
  • 94,374 lines for relations of length 6
  • 7,231 lines for relations of length 3

These numbers can be found with the following command: cat data/.../1.2,1.4,1.6_train.csv | grep task_1.3 | wc -l

A similar behavior is observed with --train_tasks 1.2,1.4,1.8.

I didn't try tasks other than 1.#.

Thanks :)

AMT templates issues

This is a list of issues on the AMT templates:

Known issues:

  • Some templates are not in the correct relation/gender category.
  • Some templates have the wrong pronouns, eg: ENT_0_female with he/him/his.
  • Some templates have Worker: as prefix.
  • Some templates are not complete sentences.
template | train/test file | listed as | should be corrected to
"ENT_1_female and ENT_0_male ." | train | wife/male-female | removed?
"ENT_1_female and ENT_0_male ." | train | granddaughter/male-female | removed?
"ENT_1_male and his son ENT_0_male went to look at cars. ENT_1_male ended up buying the Mustang." | test | son/male-male | father/male-male
"ENT_1_female is a girl with a grandmother named ENT_0_female ." | train | grandmother/female-female | granddaughter/female-female
"ENT_1_female went to visit her grandmother , ENT_0_female , in the retirement home ." | train | grandmother/female-female | granddaughter/female-female
"ENT_0_female was sick. He stayed home from school and his grandmother, ENT_1_female, watched him. She made him chicken soup to feel better." | train | grandmother/female-female | wrong pronouns: replace he/his/him with she/her/her in the template
"ENT_1_female took her grandson ENT_0_female to the zoo. He loved feeding the monkeys." | train | grandmother/female-female | replace grandson/he with granddaughter/she in the template
"ENT_0_female went to visit his grandmother, ENT_1_female, at the nursing home. She was grateful for the company, she had n't had a family visit in months." | train | grandmother/female-female | wrong pronouns: replace he/his/him with she/her/her in the template
"ENT_0_female stayed with his grandmother ENT_1_female last summer on her farm. He had a great time." | train | grandmother/female-female | wrong pronouns: replace he/his/him with she/her/her in the template
"Worker: ENT_0_female looks just like her grandmother, ENT_1_female did as a child." | train | grandmother/female-female | remove "Worker:" from the template

data quality issue?

Hi, I found that data_06b8f2a1 1.3_test.csv appears to include lots of wrong annotations. For example:

{"Unnamed: 0": 2, "id": "fe81eae5-c860-417f-8272-fbea0585d016", "story": "[Kathleen] was excited because she was meeting her father, [Henry], for lunch. [Howard] and his son [Wayne] went to look at cars. [Howard] ended up buying the Mustang. [Howard] likes to spend time with his aunt, [Kathleen], who was excellent at cooking chicken.", "query": ["Wayne", "Henry"], "text_query": NaN, "target": "father", "text_target": ["[Henry] was so proud of his son, [Wayne]. he received a great scholarship to college."], "clean_story": "[Howard] and his son [Wayne] went to look at cars. [Howard] ended up buying the Mustang. [Howard] likes to spend time with his aunt, [Kathleen], who was excellent at cooking chicken. [Kathleen] was excited because she was meeting her father, [Henry], for lunch.", "proof_state": [{"('Wayne', 'father', 'Henry')": [["Wayne", "sister", "Kathleen"], ["Kathleen", "father", "Henry"]]}, {"('Wayne', 'sister', 'Kathleen')": [["Wayne", "son", "Howard"], ["Howard", "aunt", "Kathleen"]]}], "f_comb": "son-aunt-father", "task_name": "task_1.3", "story_edges": [[0, 1], [1, 2], [2, 3]], "edge_types": ["son", "aunt", "father"], "query_edge": [0, 3], "genders": "Wayne:male,Howard:male,Kathleen:female,Henry:male", "syn_story": NaN, "node_mapping": {"16": 0, "17": 1, "3": 2, "0": 3}, "task_split": "test"}
Henry is definitely NOT Wayne's father?!
In general, many mother/father-in-law relations were incorrectly annotated as mother/father.
Did I miss anything here?

[META] Revamp - move graph generation logic to GLC

Confession time! I have not been able to keep up with the issues in this repository owing to my commitments to other projects, covid isolation, among other things. However, I don't want to bore you with excuses anymore! In the last cycle, we developed GraphLog, which follows the same graph generation pipeline and is in use by our lab for several projects. Drawing insights from GraphLog's generation and the follow-up works, I have been able to make the core graph generation logic faster, provable, and reproducible.

I have released the core logic in a separate repo, GLC, to continue its development separately. In the coming week, I plan to integrate GLC with CLUTRR, which would hopefully resolve several issues I have received both through Github and mail about the slow and unreliable generation pipeline. Thank you for your patience and please let me know any features you want through the issues!

Task 6 -- Family relations - Memory task: retrieve the relations already defined in the text

Hi, I think there is something wrong with generating datasets for "Task 6 -- Family relations - Memory task: retrieve the relations already defined in the text" -- If I do e.g.:

$ PYTHONPATH=. python3 main.py --train_tasks 6.2,6.3 --test_tasks 6.2,6.3

I get instances such as this one:

$ tail -n 1 ~/workspace/clutrr/data/data_08aa323e/6.2_test.csv
84,f40ac862-0ba6-4f70-8c3a-9e060edd8bed,[Calvin] is [Henry]'s grandfather.  [Travis] has a son called [Henry]. ,"('Travis', 'Calvin')",Who is [Calvin] from the point of relation of [Travis] ? ,father,['[Calvin] is the father of [Travis]. '],[Travis] has a son called [Henry].  [Calvin] is [Henry]'s grandfather. ,"[{('Travis', 'father', 'Calvin'): [('Travis', 'son', 'Henry'), ('Henry', 'grandfather', 'Calvin')]}]",son-grandfather,task_6.2,"[(0, 1), (1, 2)]","['son', 'grandfather']","(0, 2)","Travis:male,Henry:male,Calvin:male",,"{2: 0, 10: 1, 0: 2}",test

which does not reduce to retrieving relations already defined in the text, but requires some reasoning.

Am I doing anything wrong here?

Releasing v1.3 data

Hi authors, thanks for creating this great dataset!
Would it be possible to share the "GPT3 cleaned data: CLUTRR v1.3" as mentioned in this blog post?
This will save a lot of time for users to generate the data themselves and enable fair comparison of different methods on the same data. Thanks!

Issues in the AMT templates and how to mitigate them

Dear authors, @koustuvsinha @pminervini @shagunsodhani

Thanks for the great work!

I downloaded the dataset from the provided link https://drive.google.com/file/d/1SEq_e1IVCDDzsBIBhoUQ5pOVH5kxRoZF/view and found a few mistakes in the test dataset. Below are 4 mistakes that I found from the first 10 data instances in file data_06b8f2a1/1.3_test.csv in the dataset. It seems to me that a big portion of the data may not be correct.

Index 2. Story: [Kathleen] was excited because she was meeting her father, [Henry], for lunch. [Howard] and his son [Wayne] went to look at cars. [Howard] ended up buying the Mustang. [Howard] likes to spend time with his aunt, [Kathleen], who was excellent at cooking chicken. Query: ('Wayne', 'Henry'). Target: father. Comment: the target should be greatgrandfather.
Index 5. Story: [Johanna] spent a great day shopping with her daughter, [Vickie]. [Vickie] wanted to visit her grandmother [Donna], but [Donna] was asleep. [Johanna] and [Philip] left that evening to go bowling. Query: ('Philip', 'Donna'). Target: mother. Comment: we cannot tell any relationship for Philip.
Index 6. Story: [Johanna] enjoyed a homemade dinner with her son [Cedric] [Wayne] and his son, [Cedric], went over to [Donna]'s house for the holidays. [Wayne] loved seeing his mother, but [Cedric] was less enthusiastic. Query: ('Johanna', 'Donna'). Target: mother. Comment: the target should be mother_in_law.
Index 9. Story: [Devin] and his Aunt [Kathleen] flew first class [Devin] has a few children, [Philip], Bradley and Claire [Kathleen] vowed to never trust her father, [Henry] with her debit card again. Query: ('Philip', 'Henry'). Target: father. Comment: the target should be greatgrandfather.

Since other users already submitted issues to report errors in the dataset a year ago, is there any update to the dataset (e.g., a cleaner version with fewer mistakes)?
Thanks a lot!
