dselsam / neurosat

NeuroSAT: Learning a SAT Solver from Single-Bit Supervision

License: Apache License 2.0

Python 97.81% Shell 2.19%

neurosat's Introduction

NeuroSAT

NeuroSAT is an experimental SAT solver that is learned using single-bit supervision only. We train it as a classifier to predict satisfiability of random SAT problems and it learns to search for satisfying assignments to explain that bit of supervision. When it guesses sat, we can almost always decode the satisfying assignment it has found from its activations. It can often find solutions to problems that are bigger, harder, and from entirely different domains than those it saw during training.

Specifically, we train it as a classifier to predict satisfiability on random problems that look like this:

When making a prediction about a new problem, it guesses unsat with low confidence (light blue) until it finds a satisfying assignment, at which point it guesses sat with very high confidence (red) and converges:

[Figure omitted; horizontal axis: iteration →]

At convergence, the literal embeddings cluster according to the solution it finds:

We can almost always recover the solution by clustering the literal embeddings, thus making NeuroSAT an end-to-end SAT solver.
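To make the decoding idea concrete, here is a minimal sketch of clustering-based decoding. It is a simplified illustration, not the code used for the paper: the embedding layout (rows 0..n-1 for the positive literals, rows n..2n-1 for their negations), the helper names `two_means`, `satisfies`, and `decode`, and the hand-made example embeddings are all assumptions made for this example.

```python
import numpy as np

def two_means(points, n_iters=20):
    """Plain Lloyd's algorithm with k=2; returns a 0/1 label per row."""
    # Deterministic init: the first point and the point farthest from it.
    c0 = points[0]
    c1 = points[np.argmax(np.linalg.norm(points - c0, axis=1))]
    centers = np.stack([c0, c1]).astype(float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = points[labels == k].mean(axis=0)
    return labels

def satisfies(clauses, assignment):
    """Clauses use DIMACS-style ints: 3 means x3, -3 means NOT x3."""
    return all(any(assignment[abs(l) - 1] == (l > 0) for l in c) for c in clauses)

def decode(literal_embeddings, clauses):
    """Cluster the literals in two groups and try both true/false readings."""
    n_vars = len(literal_embeddings) // 2
    labels = two_means(np.asarray(literal_embeddings, dtype=float))
    for true_cluster in (0, 1):
        # Variable i is true when its positive literal is in the "true" cluster.
        assignment = [bool(labels[i] == true_cluster) for i in range(n_vars)]
        if satisfies(clauses, assignment):
            return assignment
    return None

# Hypothetical 2-D embeddings for x1, x2, x3, NOT x1, NOT x2, NOT x3.
example_clauses = [[1, -2], [2, 3], [-1, 3]]
example_embeddings = [[1.0, 0.9], [-1.0, -1.1], [0.9, 1.0],
                      [-0.9, -1.0], [1.1, 1.0], [-1.0, -0.9]]
print(decode(example_embeddings, example_clauses))  # [True, False, True]
```

Because the clusters carry no label saying which one means "true", the sketch checks both readings against the clauses and returns whichever satisfies them.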

At test time it can often find solutions to bigger random problems, graph coloring problems, clique detection problems, dominating set problems, and vertex cover problems.

Caveats

The graph problems are derived from small random graphs (~10 nodes, ~17 edges on average).
NeuroSAT is a research prototype and is still vastly less reliable than traditional SAT solvers.

Reproducibility

As many readers know too well, facilitating exact reproducibility in machine learning can require a lot of work. NeuroSAT is no exception. We regret that we do not currently provide a push-button way to retrain our exact model on the exact same training data we used in our experiments, though we may provide such functionality in the future depending on the level of interest. For now, we settle for providing our model code, a generator for the distribution of problems we trained on, and enough scaffolding to easily train and test it on small datasets. More utilities will be added in the coming weeks. We hope users will adapt our code to their own infrastructures, improve upon our model, and train it on a greater variety of problems.

Playing with NeuroSAT

The scripts/ directory includes a few scripts to get started.

setup.sh installs dependencies.
toy_gen_data.sh generates toy train and test data.
toy_train.sh trains a model for a few iterations on the toy training data.
toy_test.sh evaluates the trained model on the toy test data.
toy_solve.sh tries to solve the toy test problems.
toy_pipeline.sh runs toy_gen_data.sh, toy_train.sh, toy_test.sh, and toy_solve.sh in sequence.

These scripts can be easily modified to train and test on larger datasets.

Resources

More information about NeuroSAT can be found in the paper https://arxiv.org/abs/1802.03685.

Team

Daniel Selsam, Stanford University
Matthew Lamm, Stanford University
Benedikt Bünz, Stanford University
Percy Liang, Stanford University
Leonardo de Moura, Microsoft Research
David L. Dill, Stanford University

Acknowledgments

This work was supported by Future of Life Institute grant 2017-158712.

neurosat's Issues

Provide data set

Hi
It'd be useful if you could provide an example dataset at least for the toy example. The training script for this one is referring to data/train/sr5.

Thanks
Ben.

`python/testate.py` does not exist

scripts/toy_test.sh references the file python/testate.py which does not exist -- is this supposed to be python/validate.py?

Thanks
Ben

Can't get normal results and I am confused.

Sorry to bother you. I am trying to modify this network for some experiments, but I can't get sensible results. I have changed the problem size by modifying gen_data.py, but the results are always the same: the loss is always 0.6931 and the matrix always shows 50% accuracy. How can I get normal results?

Training
Loading data/train/sr5/data_dir=grp1_npb=60000_nb=8.pkl...
[0] 0.6932 (0.30, 0.20, 0.30, 0.20) [42s]
Start Trian
Loading data/train/sr5/data_dir=grp2_npb=60000_nb=10.pkl...
[1] 0.6932 (0.25, 0.25, 0.25, 0.25) [43s]
Start Trian
Loading data/train/sr5/data_dir=grp3_npb=60000_nb=6.pkl...
[2] 0.6932 (0.20, 0.30, 0.20, 0.30) [43s]
Start Trian
Loading data/train/sr5/data_dir=grp8_npb=60000_nb=7.pkl...
[3] 0.6932 (0.30, 0.20, 0.30, 0.20) [53s]
Start Trian
Loading data/train/sr5/data_dir=grp9_npb=60000_nb=7.pkl...
[4] 0.6932 (0.20, 0.30, 0.20, 0.30) [44s]

Test:
data/test/sr5/data_dir=grp8_npb=60000_nb=9.pkl 0.6932 (0.50, 0.00, 0.50, 0.00)
data/test/sr5/data_dir=grp2_npb=60000_nb=8.pkl 0.6932 (0.50, 0.00, 0.50, 0.00)
data/test/sr5/data_dir=grp10_npb=60000_nb=8.pkl 0.6932 (0.50, 0.00, 0.50, 0.00)
data/test/sr5/data_dir=grp5_npb=60000_nb=8.pkl 0.6932 (0.50, 0.00, 0.50, 0.00)
data/test/sr5/data_dir=grp1_npb=60000_nb=8.pkl 0.6932 (0.50, 0.00, 0.50, 0.00)
data/test/sr5/data_dir=grp9_npb=60000_nb=10.pkl 0.6932 (0.50, 0.00, 0.50, 0.00)
data/test/sr5/data_dir=grp3_npb=60000_nb=8.pkl 0.6932 (0.50, 0.00, 0.50, 0.00)
data/test/sr5/data_dir=grp6_npb=60000_nb=8.pkl 0.6932 (0.50, 0.00, 0.50, 0.00)
data/test/sr5/data_dir=grp4_npb=60000_nb=8.pkl 0.6932 (0.50, 0.00, 0.50, 0.00)
data/test/sr5/data_dir=grp7_npb=60000_nb=9.pkl 0.6932 (0.50, 0.00, 0.50, 0.00)
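
A note for readers who hit the same symptom: 0.6931 is ln 2, which is the cross-entropy of a binary classifier that outputs probability 0.5 for every input. The short check below assumes the reported loss is a per-problem sigmoid cross-entropy on the sat/unsat label (not confirmed in this thread); the function name `binary_cross_entropy` is made up for the example. A loss pinned at this value, or marginally above it as in the 0.6932 lines, is consistent with a model whose predictions carry no information yet.

```python
import math

def binary_cross_entropy(p_sat, label):
    """Cross-entropy loss for one problem; label is 1 for sat, 0 for unsat."""
    return -(label * math.log(p_sat) + (1 - label) * math.log(1 - p_sat))

# A model that always outputs probability 0.5 pays ln(2) on every example,
# whatever the true label is.
loss_sat = binary_cross_entropy(0.5, 1)
loss_unsat = binary_cross_entropy(0.5, 0)
print(round(loss_sat, 4), round(loss_unsat, 4), round(math.log(2), 4))
# 0.6931 0.6931 0.6931
```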

Guidance on reproducing experiments in paper

Hi --

I see the disclaimer about how this repo doesn't include the code to reproduce the experiments in the paper, but are you able to sketch out what I'd have to do to reproduce some of those experiments? (In particular, I'm most interested in reproducing the results in Table 2, where you show that the learned solver can be applied to SAT-encoded versions of other NP problems, but also in the SR(U(40)) experiments.)

EDIT: More specifically, a couple of things that could help get me off the ground -- For the experiments described in Table 1, how many problem instances did you train on? How many epochs of training?

Thanks
Ben

Confused about the max_nodes_per_batch parameter

In the toy examples this hyper-parameter is set to 60000, but in the paper it is set to 12000, which is smaller.

My understanding is that more nodes mean a more expressive representation. Is that correct? And why is the hyper-parameter set to this value in the toy examples?
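
One way to picture the parameter, read literally from its name and from the `npb=60000` in the data file names: a cap on how many graph nodes are packed into one batch. The sketch below works under that assumption only. The node count of two nodes per variable plus one per clause, and the helpers `n_nodes` and `pack_batches`, are hypothetical and not taken from the repository. Under this reading the value changes how many problems share a batch, not the size of the model.

```python
def n_nodes(n_vars, n_clauses):
    # Assumption: one node per literal (two per variable) plus one per clause.
    return 2 * n_vars + n_clauses

def pack_batches(problems, max_nodes_per_batch):
    """Greedily group (n_vars, n_clauses) problems under a node budget."""
    batches, current, used = [], [], 0
    for n_vars, n_clauses in problems:
        size = n_nodes(n_vars, n_clauses)
        if size > max_nodes_per_batch:
            continue  # a single problem larger than the budget is skipped
        if current and used + size > max_nodes_per_batch:
            # The next problem would overflow the budget: close this batch.
            batches.append(current)
            current, used = [], 0
        current.append((n_vars, n_clauses))
        used += size
    if current:
        batches.append(current)
    return batches

problems = [(5, 20)] * 10                  # ten hypothetical problems, 30 nodes each
print(len(pack_batches(problems, 60)))     # 5 batches of 2 problems
print(len(pack_batches(problems, 120)))    # 3 batches (4, 4, and 2 problems)
```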

Thanks!

About PCA

Hello, I noticed that you used PCA to visualize what happens during the iterations. Is it necessary to apply PCA to the embeddings first and then decode with k-means, or can k-means be applied directly?

I am new to this field, so apologies if this is a naive question.

Procedure to generate `different problems` in NeuroSAT

Hi,

Is there any script that illustrates the procedure for generating the different problems, such as the six different random graph distributions and the graph coloring problems (3 ≤ k ≤ 5), dominating-set problems (2 ≤ k ≤ 4), clique-detection problems (3 ≤ k ≤ 5), and vertex cover problems (4 ≤ k ≤ 6)?
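
For readers looking for a starting point, the following is a minimal sketch of one such pipeline: sample a small random graph and encode k-colorability as CNF with the textbook reduction. It is not the authors' generator. The single Erdos-Renyi style distribution, the edge probability 0.37 (chosen so that 10 nodes give roughly 17 edges on average, matching the caveat in the README), and the function names are all assumptions for illustration.

```python
import itertools
import random

def random_graph(n_nodes, edge_prob, rng):
    """Erdos-Renyi style graph; returns a list of (u, v) edges with u < v."""
    return [(u, v) for u, v in itertools.combinations(range(n_nodes), 2)
            if rng.random() < edge_prob]

def coloring_to_cnf(n_nodes, edges, k):
    """Encode 'the graph is k-colorable' as DIMACS-style integer clauses."""
    def var(node, color):
        return node * k + color + 1  # 1-based id for "node has this color"

    clauses = []
    for node in range(n_nodes):
        # Every node receives at least one color.
        clauses.append([var(node, c) for c in range(k)])
        # No node receives two different colors.
        for c1, c2 in itertools.combinations(range(k), 2):
            clauses.append([-var(node, c1), -var(node, c2)])
    for u, v in edges:
        # Adjacent nodes never share a color.
        for c in range(k):
            clauses.append([-var(u, c), -var(v, c)])
    return n_nodes * k, clauses

def to_dimacs(n_vars, clauses):
    """Render the clauses in DIMACS CNF text format."""
    lines = ["p cnf %d %d" % (n_vars, len(clauses))]
    lines += [" ".join(map(str, clause)) + " 0" for clause in clauses]
    return "\n".join(lines)

# Deterministic example: a triangle with k = 3 colors.
triangle = [(0, 1), (1, 2), (0, 2)]
n_vars, clauses = coloring_to_cnf(3, triangle, 3)
print(to_dimacs(n_vars, clauses).splitlines()[0])  # p cnf 9 21

# Random example in the spirit of the README caveat (about 10 nodes, 17 edges).
rng = random.Random(0)
edges = random_graph(10, 0.37, rng)
n_vars, clauses = coloring_to_cnf(10, edges, 4)
```

Clique detection, dominating set, and vertex cover can be encoded in the same style, with one Boolean variable per (node, slot) choice and clauses ruling out invalid choices.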