albertqjiang / int
Official code for paper: INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving
License: MIT License
(This and the following issues are mainly a reminder for myself, as well as for future readers, to easily search through changes, etc.)
When I started my first open-source library, someone told me that "no license means all code is copyrighted" (see: fzyzcjy/flutter_rust_bridge#1). I would therefore appreciate it if this repository could have an explicit license (e.g. https://github.com/facebookresearch/miniF2F/blob/main/LICENSE), so that I can use it in my research.
e.g.
Traceback (most recent call last):
File "/home/jychen/.conda/envs/jychen_main/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3433, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "/tmp/ipykernel_21573/176925322.py", line 1, in <module>
evaluate.interact_with_prover_sequentially(
File "/data/jychen/research/code/research_mono/neural_theorem_prover/evaluate.py", line 186, in interact_with_prover_sequentially
print('obs_after_action', prover.get_observation_string())
File "/data/jychen/research/code/research_mono/neural_theorem_prover/wrapped_prover.py", line 25, in get_observation_string
return self.inner.parser.observation_to_source(self.inner.get_observation())
File "/data/jychen/research/code/third_party/INT/int_environment/proof_system/graph_seq_conversion.py", line 101, in observation_to_source
source = source + logic_statement_to_seq_string(observation["objectives"][0])
IndexError: list index out of range
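The crash comes from indexing observation["objectives"][0] when the objectives list is empty. A defensive guard along these lines (a hypothetical sketch reusing names from the traceback, not the actual INT code) would turn it into a clearer, domain-specific failure:

```python
# Hypothetical sketch based on the traceback above, NOT the actual INT code:
# guard against an empty "objectives" list before indexing into it.
def observation_to_source(observation, logic_statement_to_seq_string=str):
    objectives = observation.get("objectives", [])
    if not objectives:
        # Fail loudly with a descriptive message instead of a bare IndexError.
        raise ValueError("observation has no objectives; was the proof already closed?")
    return logic_statement_to_seq_string(objectives[0])

print(observation_to_source({"objectives": ["a >= b"]}))  # -> a >= b
```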
@albertqjiang Hi, thanks for the paper with code! I wonder what the correct command is to trigger training that reproduces the accuracy numbers in the figures (e.g. Figure 2, left)?
I have tried the command in the training section of the README, but the results look quite different from those reported in the paper. For example, training accuracy soon becomes very high (~94%), as can be seen from the following (collapsed) log, in contrast to the roughly 75% in Figure 2, left.
...
generate datasets
#Generated problems: 50
#Generated problems: 100
#Generated problems: 150
#Generated problems: 200
#Generated problems: 250
#Generated problems: 300
#Generated problems: 350
#Generated problems: 400
#Generated problems: 450
#Generated problems: 500
#Generated problems: 550
#Generated problems: 600
#Generated problems: 650
#Generated problems: 700
#Generated problems: 750
#Generated problems: 800
#Generated problems: 850
#Generated problems: 900
#Generated problems: 950
#Generated problems: 1000
prob 1 fail!
prob 2 success!
prob 3 success!
prob 4 success!
prob 5 success!
prob 6 success!
prob 7 success!
prob 8 success!
prob 9 success!
prob 10 success!
prob 11 success!
prob 12 success!
prob 13 success!
prob 14 success!
prob 15 success!
prob 16 success!
prob 17 success!
prob 18 success!
prob 19 success!
prob 20 success!
prob 21 success!
prob 22 success!
prob 23 success!
prob 24 success!
prob 25 success!
prob 26 success!
prob 27 success!
prob 28 success!
prob 29 success!
prob 30 success!
prob 31 success!
prob 32 success!
prob 33 success!
prob 34 success!
prob 35 success!
prob 36 success!
prob 37 success!
prob 38 success!
prob 39 success!
prob 40 success!
prob 41 success!
prob 42 fail!
prob 43 success!
prob 44 success!
prob 45 success!
prob 46 success!
prob 47 success!
prob 48 success!
prob 49 success!
prob 50 success!
prob 51 success!
prob 52 success!
prob 53 success!
prob 54 success!
prob 55 success!
prob 56 success!
prob 57 success!
prob 58 success!
prob 59 success!
prob 60 success!
prob 61 success!
prob 62 success!
prob 63 success!
prob 64 success!
prob 65 success!
prob 66 success!
prob 67 success!
prob 68 success!
prob 69 success!
prob 70 success!
prob 71 success!
prob 72 success!
prob 73 success!
prob 74 success!
prob 75 success!
prob 76 success!
prob 77 fail!
prob 78 success!
prob 79 success!
prob 80 success!
prob 81 success!
prob 82 success!
prob 83 success!
prob 84 success!
prob 85 fail!
prob 86 fail!
prob 87 success!
prob 88 success!
prob 89 success!
prob 90 success!
prob 91 success!
prob 92 success!
prob 93 success!
prob 94 success!
prob 95 fail!
prob 96 success!
prob 97 success!
prob 98 success!
prob 99 success!
prob 100 success!
...
I will try to play with the code a little more, but wanted to create this issue first as quick feedback that the README (or the code?) may not be fully consistent with the paper :/
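For what it's worth, the ~94% figure can be sanity-checked by counting the success/fail lines in a captured log. A throwaway script (assuming the "prob N success!/fail!" format shown in the snippet above) might look like:

```python
import re

# Throwaway sanity check, assuming the "prob N success!/fail!" log format
# shown above: compute the success rate from a captured log fragment.
log = """\
prob 1 fail!
prob 2 success!
prob 3 success!
prob 4 success!
"""

outcomes = re.findall(r"prob \d+ (success|fail)!", log)
accuracy = outcomes.count("success") / len(outcomes)
print(f"accuracy: {accuracy:.0%}")  # -> accuracy: 75%
```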
e.g. configuring the linter and docs for a single package is easier
(will do others first)
Hi, before submitting PRs for new features, I would first like to submit a PR for refactoring, since, from a software-engineering perspective, clean code is more maintainable and easier to extend with new features.
If I make the PR, will you merge it (given good-quality code, etc.)? :)
Sample areas that I may work on:
Code cleanup example: in the code below, four items are returned but unpacked into only three variables
e.g.
File /data/jychen/research/code/research_mono/neural_theorem_prover/evaluate.py:182, in _interact_with_prover_sequentially_single(initial_step, max_steps, debug_name, verbose)
180 for action in actions:
181 lemma, input_entities = prover.find_action(action)
--> 182 result = prover.inner.apply_theorem(lemma, input_entities)
183 result_info = prover.interpret_result(result)
185 if verbose:
File /data/jychen/research/code/third_party/INT/int_environment/proof_system/prover.py:169, in Prover.apply_theorem(self, theorem, operands)
167 def apply_theorem(self, theorem, operands):
168 # Apply a theorem with operands
--> 169 results = theorem.execute_th(operands, mode=self.mode_theorem)
170 assumptions, conclusions = results["Assumptions"], results["Conclusions"]
172 # Prevent the scenario [0, 1] -> [0]
File /data/jychen/research/code/third_party/INT/int_environment/proof_system/field_axioms.py:1364, in EquMoveTerm.execute_th(self, operands, mode)
1359 elif mode == "prove":
1360 """
1361 a + b = c, ls(a) => ls(c + (-b))
1362 2 inputs: [a, c + (-b)]
1363 """
-> 1364 a, c_minus_b, = [deepcopy(op) for op in operands]
1365 if is_entity(a) and is_entity(c_minus_b) and is_structured(c_minus_b, "add") \
1366 and is_structured(c_minus_b.operands[1], "opp"):
1367 c, minus_b, = c_minus_b.operands
ValueError: not enough values to unpack (expected 2, got 1)
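The ValueError here is Python's generic unpacking failure: the left-hand side of the assignment in execute_th expects two names, but the caller supplied a one-element operands list. A minimal standalone repro (hypothetical, independent of the INT codebase):

```python
from copy import deepcopy

# Minimal standalone repro of the failure mode seen in EquMoveTerm.execute_th:
# unpacking two names from a one-element list raises ValueError.
def execute_th(operands):
    # Expects exactly two operands: [a, c + (-b)]
    a, c_minus_b = [deepcopy(op) for op in operands]
    return a, c_minus_b

try:
    execute_th(["a"])  # caller supplied only one operand
except ValueError as e:
    print(e)  # -> not enough values to unpack (expected 2, got 1)
```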
Hi, thank you for the wonderful INT dataset and environment! There is a recent paper, https://arxiv.org/abs/2205.11491, which also contains a simple environment, Equations
(see mainly Chapter 3, Section 7.1.3, and Appendix B of the paper). (I guess I do not need to introduce the paper further, since you are supervised by Lample and he seems to be an author of that paper as well ;))
Since the Equations environment is not open-sourced, and INT looks quite promising and extensible, I wonder whether it would be possible to extend INT with more of the features mentioned in Equations?
I am willing to make a PR! (As can be seen from my homepage, I have created several open-source libraries and made dozens of PRs to Google Flutter.)