albertqjiang / int
Official code for paper: INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving
License: MIT License
(This and the following issues are mainly a reminder for myself, as well as for future readers, to easily search through changes, etc.)
When I started my first open-source library, someone told me that "no license means all code is copyrighted" (see: fzyzcjy/flutter_rust_bridge#1). I would therefore appreciate it if this repository could have an explicit license (e.g. https://github.com/facebookresearch/miniF2F/blob/main/LICENSE), so that I can use it in my research.
e.g.
Traceback (most recent call last):
File "/home/jychen/.conda/envs/jychen_main/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3433, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "/tmp/ipykernel_21573/176925322.py", line 1, in <module>
evaluate.interact_with_prover_sequentially(
File "/data/jychen/research/code/research_mono/neural_theorem_prover/evaluate.py", line 186, in interact_with_prover_sequentially
print('obs_after_action', prover.get_observation_string())
File "/data/jychen/research/code/research_mono/neural_theorem_prover/wrapped_prover.py", line 25, in get_observation_string
return self.inner.parser.observation_to_source(self.inner.get_observation())
File "/data/jychen/research/code/third_party/INT/int_environment/proof_system/graph_seq_conversion.py", line 101, in observation_to_source
source = source + logic_statement_to_seq_string(observation["objectives"][0])
IndexError: list index out of range
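The crash comes from indexing observation["objectives"][0] when the objectives list is empty. A defensive guard along these lines (a hypothetical sketch reusing names from the traceback, not the actual INT code) would turn it into a clearer, domain-specific failure:

```python
# Hypothetical sketch based on the traceback above, NOT the actual INT code:
# guard against an empty "objectives" list before indexing into it.
def observation_to_source(observation, logic_statement_to_seq_string=str):
    objectives = observation.get("objectives", [])
    if not objectives:
        # Fail loudly with a descriptive message instead of a bare IndexError.
        raise ValueError("observation has no objectives; was the proof already closed?")
    return logic_statement_to_seq_string(objectives[0])

print(observation_to_source({"objectives": ["a >= b"]}))  # -> a >= b
```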
@albertqjiang Hi, thanks for the paper with code! I wonder what the correct command is to trigger training that reproduces the accuracy numbers in the figures (e.g. Figure 2, left)?
I have tried the command in the training section of the README, but the results look quite different from those reported in the paper. For example, training accuracy soon becomes very high (~94%), as can be seen from the following (collapsed) log, in contrast to the roughly 75% in Figure 2, left.
...
generate datasets
#Generated problems: 50
#Generated problems: 100
#Generated problems: 150
#Generated problems: 200
#Generated problems: 250
#Generated problems: 300
#Generated problems: 350
#Generated problems: 400
#Generated problems: 450
#Generated problems: 500
#Generated problems: 550
#Generated problems: 600
#Generated problems: 650
#Generated problems: 700
#Generated problems: 750
#Generated problems: 800
#Generated problems: 850
#Generated problems: 900
#Generated problems: 950
#Generated problems: 1000
prob 1 fail!
prob 2 success!
prob 3 success!
prob 4 success!
prob 5 success!
prob 6 success!
prob 7 success!
prob 8 success!
prob 9 success!
prob 10 success!
prob 11 success!
prob 12 success!
prob 13 success!
prob 14 success!
prob 15 success!
prob 16 success!
prob 17 success!
prob 18 success!
prob 19 success!
prob 20 success!
prob 21 success!
prob 22 success!
prob 23 success!
prob 24 success!
prob 25 success!
prob 26 success!
prob 27 success!
prob 28 success!
prob 29 success!
prob 30 success!
prob 31 success!
prob 32 success!
prob 33 success!
prob 34 success!
prob 35 success!
prob 36 success!
prob 37 success!
prob 38 success!
prob 39 success!
prob 40 success!
prob 41 success!
prob 42 fail!
prob 43 success!
prob 44 success!
prob 45 success!
prob 46 success!
prob 47 success!
prob 48 success!
prob 49 success!
prob 50 success!
prob 51 success!
prob 52 success!
prob 53 success!
prob 54 success!
prob 55 success!
prob 56 success!
prob 57 success!
prob 58 success!
prob 59 success!
prob 60 success!
prob 61 success!
prob 62 success!
prob 63 success!
prob 64 success!
prob 65 success!
prob 66 success!
prob 67 success!
prob 68 success!
prob 69 success!
prob 70 success!
prob 71 success!
prob 72 success!
prob 73 success!
prob 74 success!
prob 75 success!
prob 76 success!
prob 77 fail!
prob 78 success!
prob 79 success!
prob 80 success!
prob 81 success!
prob 82 success!
prob 83 success!
prob 84 success!
prob 85 fail!
prob 86 fail!
prob 87 success!
prob 88 success!
prob 89 success!
prob 90 success!
prob 91 success!
prob 92 success!
prob 93 success!
prob 94 success!
prob 95 fail!
prob 96 success!
prob 97 success!
prob 98 success!
prob 99 success!
prob 100 success!
...
I will try to play with the code a little more, but wanted to create this issue first as quick feedback that the README (or the code?) may not be fully consistent with the paper :/
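For what it's worth, the ~94% figure can be sanity-checked by counting the success/fail lines in a captured log. A throwaway script (assuming the "prob N success!/fail!" format shown in the snippet above) might look like:

```python
import re

# Throwaway sanity check, assuming the "prob N success!/fail!" log format
# shown above: compute the success rate from a captured log fragment.
log = """\
prob 1 fail!
prob 2 success!
prob 3 success!
prob 4 success!
"""

outcomes = re.findall(r"prob \d+ (success|fail)!", log)
accuracy = outcomes.count("success") / len(outcomes)
print(f"accuracy: {accuracy:.0%}")  # -> accuracy: 75%
```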
e.g. configuring the linter and docs for a single package is easier
(will do others first)
Hi, before submitting PRs for new features, I would first like to submit a PR for refactoring, since, from a software-engineering perspective, clean code is more maintainable and easier to extend with new features.
If I make the PR, will you merge it (given good-quality code, etc.)? :)
Sample areas that I may work on:
Code cleanup example: in the code below, four items are returned but unpacked into only three variables
e.g.
File /data/jychen/research/code/research_mono/neural_theorem_prover/evaluate.py:182, in _interact_with_prover_sequentially_single(initial_step, max_steps, debug_name, verbose)
180 for action in actions:
181 lemma, input_entities = prover.find_action(action)
--> 182 result = prover.inner.apply_theorem(lemma, input_entities)
183 result_info = prover.interpret_result(result)
185 if verbose:
File /data/jychen/research/code/third_party/INT/int_environment/proof_system/prover.py:169, in Prover.apply_theorem(self, theorem, operands)
167 def apply_theorem(self, theorem, operands):
168 # Apply a theorem with operands
--> 169 results = theorem.execute_th(operands, mode=self.mode_theorem)
170 assumptions, conclusions = results["Assumptions"], results["Conclusions"]
172 # Prevent the scenario [0, 1] -> [0]
File /data/jychen/research/code/third_party/INT/int_environment/proof_system/field_axioms.py:1364, in EquMoveTerm.execute_th(self, operands, mode)
1359 elif mode == "prove":
1360 """
1361 a + b = c, ls(a) => ls(c + (-b))
1362 2 inputs: [a, c + (-b)]
1363 """
-> 1364 a, c_minus_b, = [deepcopy(op) for op in operands]
1365 if is_entity(a) and is_entity(c_minus_b) and is_structured(c_minus_b, "add") \
1366 and is_structured(c_minus_b.operands[1], "opp"):
1367 c, minus_b, = c_minus_b.operands
ValueError: not enough values to unpack (expected 2, got 1)
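The ValueError here is Python's generic unpacking failure: the left-hand side of the assignment in execute_th expects two names, but the caller supplied a one-element operands list. A minimal standalone repro (hypothetical, independent of the INT codebase):

```python
from copy import deepcopy

# Minimal standalone repro of the failure mode seen in EquMoveTerm.execute_th:
# unpacking two names from a one-element list raises ValueError.
def execute_th(operands):
    # Expects exactly two operands: [a, c + (-b)]
    a, c_minus_b = [deepcopy(op) for op in operands]
    return a, c_minus_b

try:
    execute_th(["a"])  # caller supplied only one operand
except ValueError as e:
    print(e)  # -> not enough values to unpack (expected 2, got 1)
```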
Hi, thank you for the wonderful INT dataset and environment! There is a recent paper, https://arxiv.org/abs/2205.11491, which also contains a simple environment, Equations
(see mainly Chapter 3, Section 7.1.3, and Appendix B of the paper). (I guess I do not need to introduce the paper further, since you are supervised by Lample and he seems to be an author of that paper as well ;))
Since the Equations environment is not open-sourced, and INT looks quite promising and extensible, I wonder whether it would be possible to extend INT with more of the features mentioned in Equations?
I am willing to make a PR! (As can be seen from my homepage, I have created several open-source libraries and made dozens of PRs to Google Flutter.)