Sudoku is a popular number puzzle that requires you to fill blanks in a 9x9 grid with digits so that each column, each row, and each of the nine 3x3 subgrids contains all of the digits from 1 to 9. There have been various approaches to solving it, including computational ones. In this project, we show that simple convolutional neural networks have the potential to crack Sudoku without any rule-based postprocessing.
- NumPy >= 1.11.1
- TensorFlow == 1.1
- To see what Sudoku is, check the Wikipedia article.
- To investigate this task comprehensively, read through McGuire et al. 2013.
- 1M games were generated using `generate_sudoku.py` for training. I've uploaded them on the Kaggle dataset storage. They are available here. (A loading sketch follows this list.)
- 30 authentic games were collected from http://1sudoku.com.
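A minimal loading sketch in NumPy, assuming each game is stored as a pair of 81-character digit strings (quiz and solution, with 0 for blanks), one game per CSV row with a header; the project's actual loading logic lives in `data_load.py`:

```python
import csv

import numpy as np

def load_games(path):
    """Load (quiz, solution) pairs from a CSV of 81-character digit strings."""
    quizzes, solutions = [], []
    with open(path) as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for quiz, solution in reader:
            quizzes.append([int(c) for c in quiz])
            solutions.append([int(c) for c in solution])
    # reshape each flat 81-digit game into a 9x9 board
    return (np.array(quizzes).reshape(-1, 9, 9),
            np.array(solutions).reshape(-1, 9, 9))
```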
- 10 blocks of convolutional layers of kernel size 3 (a minimal sketch follows).
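A rough sketch of that architecture in TensorFlow 1.x; the filter count (512) is illustrative, and the real settings are in `hyperparams.py`. A final 1x1 convolution produces per-cell logits over the 9 digits:

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 9, 9, 1])  # the puzzle; blanks are 0
h = x
for i in range(10):  # 10 blocks of 3x3 convolutions, as described above
    h = tf.layers.conv2d(h, filters=512, kernel_size=3, padding="same",
                         activation=tf.nn.relu, name="conv%d" % i)
logits = tf.layers.conv2d(h, filters=9, kernel_size=1, padding="same")
preds = tf.argmax(logits, axis=-1) + 1  # [None, 9, 9]; digits are 1-9, not 0-8
```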
- `generate_sudoku.py` creates Sudoku games. You don't have to run this; instead, download the pre-generated games.
- `hyperparams.py` includes all adjustable hyperparameters.
- `data_load.py` loads data and puts it in queues so multiple mini-batches are generated in parallel.
- `modules.py` contains some wrapper functions.
- `train.py` is for training.
- `test.py` is for testing.
- STEP 1. Download and extract training data.
- STEP 2. Run `python train.py`, or download the pretrained file.
- Run `python test.py`.
Accuracy is defined as: Number of blanks where the prediction matched the solution / Number of blanks.
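In NumPy terms, that definition is simply (hypothetical helper, not part of the repo):

```python
import numpy as np

def blank_accuracy(puzzle, prediction, solution):
    """Accuracy over blanks only; all arrays are 9x9 ints, blanks are 0."""
    blanks = puzzle == 0
    return (prediction[blanks] == solution[blanks]).mean()
```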
// Improve CNN
- epoch nums
- filter size
- regularization
- dropout
- more training data (medium, hard, expert, evil)
// Search Problem
- Use the network's probability distributions in heuristic search (sketched below)
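One possible shape for that idea, as a sketch: always fill the blank the network is most confident about, and fall back to the next-most-probable digit when a choice violates the Sudoku constraints. `predict_probs` is a hypothetical wrapper around the trained model that returns a (9, 9, 9) array of digit probabilities for a board.

```python
import numpy as np

def is_valid(board, r, c, d):
    # check row, column, and 3x3 box for a conflicting digit
    box = board[r // 3 * 3:r // 3 * 3 + 3, c // 3 * 3:c // 3 * 3 + 3]
    return d not in board[r] and d not in board[:, c] and d not in box

def solve_greedy(board, predict_probs):
    board = board.copy()
    while (board == 0).any():
        probs = predict_probs(board)   # (9, 9, 9) digit probabilities
        probs[board != 0] = -1         # ignore cells that are already filled
        confidence = probs.max(axis=-1)
        r, c = np.unravel_index(confidence.argmax(), (9, 9))
        for digit in np.argsort(probs[r, c])[::-1]:  # most probable first
            if is_valid(board, r, c, digit + 1):     # digits are 1-9
                board[r, c] = digit + 1
                break
        else:
            return None  # no legal digit: a full search would backtrack here
    return board
```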
// RNN (Bidirectional with LSTM cells)
- list of tuples (blank_cell, num_hints) where num_hints = sum of hints in row,col,square
- sort list by num_hints
- feed in row, col, square of cell with highest num_hints to the RNN (loop sketched after this list)
- take joint prob for cell as solution
- update board with prediction
- recompute list with newly predicted cell
- choose new largest num_hints cell and continue
- (build structure)
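A sketch of the bookkeeping loop above; `predict_cell` is a hypothetical wrapper around the RNN that returns a digit (1-9) for one cell given its row, column, and square.

```python
import numpy as np

def num_hints(board, r, c):
    # sum of given/predicted digits in the cell's row, column, and 3x3 square
    box = board[r // 3 * 3:r // 3 * 3 + 3, c // 3 * 3:c // 3 * 3 + 3]
    return int((board[r] != 0).sum() + (board[:, c] != 0).sum()
               + (box != 0).sum())

def solve_rnn(board, predict_cell):
    board = board.copy()
    while True:
        blanks = [(r, c) for r in range(9) for c in range(9)
                  if board[r, c] == 0]
        if not blanks:
            return board
        # pick the blank whose row/col/square carry the most hints
        r, c = max(blanks, key=lambda rc: num_hints(board, *rc))
        box = board[r // 3 * 3:r // 3 * 3 + 3, c // 3 * 3:c // 3 * 3 + 3]
        # assumes predict_cell returns a digit 1-9 for the chosen cell
        board[r, c] = predict_cell(board[r], board[:, c], box)
        # blanks and hint counts are recomputed on the next pass
```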
// Deep RL
- very difficult
- we already have the structure for it, though
// Visualization
- expected board --> our board
// Input should now be probability distribution
- let's make it all one-hot
- back to 9 labels
- 0 (blank) has equal probability for all labels (encoding sketched below)
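A sketch of that encoding: each cell becomes a distribution over the 9 digit labels, and a blank (0) gets equal probability for every label.

```python
import numpy as np

def to_distributions(board):
    """Turn a 9x9 board (0 = blank) into per-cell digit distributions."""
    probs = np.full((9, 9, 9), 1.0 / 9)          # blanks: uniform over digits
    filled = board != 0
    probs[filled] = np.eye(9)[board[filled] - 1]  # one-hot for given digits
    return probs
```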
- bi-directional LSTMs go across the row, column, and square as individual inputs to the LSTM (shape: each pos in row x batch x ...)
- only do softmax at the end
- a feed-forward layer translates the outputs of row, col, square (features) into softmax probabilities
- combine the 3 outputs, then feed forward, then compute the loss; the outputs become probabilities only at the loss (rough sketch below)
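A rough sketch of that wiring in TensorFlow 1.x (hidden sizes are illustrative): three bidirectional LSTMs encode the target cell's row, column, and square, their outputs are concatenated, a feed-forward layer produces the logits, and softmax is applied only inside the loss at the end.

```python
import tensorflow as tf

def encode(seq, scope, units=64):
    """Run a bidirectional LSTM over one 9-step sequence of digit distributions."""
    with tf.variable_scope(scope):
        fw = tf.contrib.rnn.BasicLSTMCell(units)
        bw = tf.contrib.rnn.BasicLSTMCell(units)
        outputs, _ = tf.nn.bidirectional_dynamic_rnn(fw, bw, seq,
                                                     dtype=tf.float32)
        # concatenate forward/backward outputs; use the last step as features
        return tf.concat(outputs, axis=-1)[:, -1, :]

row = tf.placeholder(tf.float32, [None, 9, 9])  # batch x position x label probs
col = tf.placeholder(tf.float32, [None, 9, 9])
sqr = tf.placeholder(tf.float32, [None, 9, 9])
labels = tf.placeholder(tf.int32, [None])       # solution digit index (0-8)

# combine the 3 encodings, then feed forward
features = tf.concat([encode(row, "row"), encode(col, "col"),
                      encode(sqr, "sqr")], axis=-1)
hidden = tf.layers.dense(features, 128, activation=tf.nn.relu)
logits = tf.layers.dense(hidden, 9)
# softmax happens only here, at the end, inside the loss
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits))
```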