crisbal / docker-torch-rnn
Docker images for using torch-rnn
Home Page: https://hub.docker.com/r/crisbal/torch-rnn/
Hi all!
I am wondering if there is an easy way to read in the parameter weights from the .t7 files saved during checkpointing. I want to perform some purely mathematical operations on the weights for a research project, but am unsure of an easy way to access them in raw format as stated. Please let me know!
-Matt
After installing the NVIDIA docker image, and loading the Torch RNN docker via:
nvidia-docker run --rm -ti crisbal/torch-rnn:cuda7.5 bash
and preprocessing via
root@3da15ad69af8:~/torch-rnn# python scripts/preprocess.py --input_txt data/library.txt --output_h5 data/library.h5 --output_json data/library.json
Attempting to train the system results in the following:
root@3da15ad69af8:~/torch-rnn# th train.lua -input_h5 data/library.h5 -input_json data/library.json
Running with CUDA on GPU 0
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-9234/cutorch/lib/THC/THCGeneral.c line=608 error=8 : invalid device function
/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/nn/Container.lua:67:
In 2 module of nn.Sequential:
./LSTM.lua:128: cuda runtime error (8) : invalid device function at /tmp/luarocks_cutorch-scm-1-9234/cutorch/lib/THC/THCGeneral.c:608
stack traceback:
[C]: in function 'resize'
./LSTM.lua:128: in function <./LSTM.lua:118>
[C]: in function 'xpcall'
/root/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/root/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
train.lua:130: in function 'opfunc'
/root/torch/install/share/lua/5.1/optim/adam.lua:33: in function 'adam'
train.lua:187: in main chunk
[C]: in function 'dofile'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
WARNING: If you see a stack trace below, it doesn't point to the place where this error occured. Please use only the one above.
stack traceback:
[C]: in function 'error'
/root/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
/root/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
train.lua:130: in function 'opfunc'
/root/torch/install/share/lua/5.1/optim/adam.lua:33: in function 'adam'
train.lua:187: in main chunk
[C]: in function 'dofile'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
When I'm trying to run the training in the base image, I get the following issue:
/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/trepl/init.lua:384: module 'cutorch' not found:No LuaRocks module found for cutorch
no field package.preload['cutorch']
no file '/root/.luarocks/share/lua/5.1/cutorch.lua'
no file '/root/.luarocks/share/lua/5.1/cutorch/init.lua'
no file '/root/torch/install/share/lua/5.1/cutorch.lua'
no file '/root/torch/install/share/lua/5.1/cutorch/init.lua'
no file './cutorch.lua'
no file '/root/torch/install/share/luajit-2.1.0-beta1/cutorch.lua'
no file '/usr/local/share/lua/5.1/cutorch.lua'
no file '/usr/local/share/lua/5.1/cutorch/init.lua'
no file '/root/.luarocks/lib/lua/5.1/cutorch.so'
no file '/root/torch/install/lib/lua/5.1/cutorch.so'
no file '/root/torch/install/lib/cutorch.so'
no file './cutorch.so'
no file '/usr/local/lib/lua/5.1/cutorch.so'
no file '/usr/local/lib/lua/5.1/loadall.so'
stack traceback:
[C]: in function 'error'
/root/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
train.lua:55: in main chunk
[C]: in function 'dofile'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
docker run -ti crisbal/torch-rnn:base bash
Error response from daemon: Cannot start container 5f199861ca44c53040ed2fb58b704d8d1a0325a8ed8f61497570ab6ae190f3f9: no such file or directory
Does anyone know this bug?
I'm currently running the training by using docker exec -it <container_name> bash to enter the container, then pressing Ctrl+P followed by Ctrl+Q to escape from the container after starting the training script.
Using docker logs <container_name> only shows the logs once the training is finished, so I haven't been able to track the training progress and plan other tasks on the machine accordingly.
Is there a way to inspect training progress with docker?
Meanwhile, it would be great to have the time/batch printout, as in the char-rnn repo.
I just tested your Docker image (I run Docker 1.12.1), and when executing your example th train.lua -input_h5 data/tiny-shakespeare.h5 -input_json data/tiny-shakespeare.json
I got this error:
/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/trepl/init.lua:384: module 'cutorch' not found:No LuaRocks module found for cutorch
no field package.preload['cutorch']
no file '/root/.luarocks/share/lua/5.1/cutorch.lua'
no file '/root/.luarocks/share/lua/5.1/cutorch/init.lua'
no file '/root/torch/install/share/lua/5.1/cutorch.lua'
no file '/root/torch/install/share/lua/5.1/cutorch/init.lua'
no file './cutorch.lua'
no file '/root/torch/install/share/luajit-2.1.0-beta1/cutorch.lua'
no file '/usr/local/share/lua/5.1/cutorch.lua'
no file '/usr/local/share/lua/5.1/cutorch/init.lua'
no file '/root/.luarocks/lib/lua/5.1/cutorch.so'
no file '/root/torch/install/lib/lua/5.1/cutorch.so'
no file '/root/torch/install/lib/cutorch.so'
no file './cutorch.so'
no file '/usr/local/lib/lua/5.1/cutorch.so'
no file '/usr/local/lib/lua/5.1/loadall.so'
stack traceback:
[C]: in function 'error'
/root/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
train.lua:55: in main chunk
[C]: in function 'dofile'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
Hi,
I'm using the Docker file and it works great with the example shakespeare text.
I'm trying to run:
docker run -it -v ~/data2:/data2 crisbal/torch-rnn:base bash
to mount my data folder and use it, but it doesn't show up. It works for other docker files, so are there any permissions I need to change? Or is there another way I can run it using my data?
I'm new to this and appreciate the help. Thank you
https://hub.docker.com/r/crisbal/torch-rnn/tags/ lists only base and cuda7.5
Hi, my input file is very big, about 500 MB, and train.lua stops printing information after running for a while. The last output it printed is:
Epoch 1.02 / 50, i = 994 / 3145800, loss = 4.990288
Epoch 1.02 / 50, i = 995 / 3145800, loss = 5.104537
Epoch 1.02 / 50, i = 996 / 3145800, loss = 4.961758
Epoch 1.02 / 50, i = 997 / 3145800, loss = 4.969568
Epoch 1.02 / 50, i = 998 / 3145800, loss = 5.046015
Epoch 1.02 / 50, i = 999 / 3145800, loss = 4.955519
Epoch 1.02 / 50, i = 1000 / 3145800, loss = 4.886581
and the process's CPU usage is at 100%:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31256 root 20 0 2333920 2.0g 1916 R 108.0 54.5 172:40.70 luajit
Is this normal? Should I just wait?
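For anyone who wants to put numbers on how far a run like this has got, here is a minimal Python sketch that parses progress lines in the format quoted above. The names parse_progress and LOG_LINE are hypothetical helpers, not part of torch-rnn, and the pattern assumes the log format is exactly what the lines above show.

```python
import re

# Pattern for lines such as:
#   Epoch 1.02 / 50, i = 1000 / 3145800, loss = 4.886581
# The format is assumed from the output quoted in this issue.
LOG_LINE = re.compile(
    r"Epoch\s+(?P<epoch>[\d.]+)\s*/\s*(?P<max_epochs>\d+),\s*"
    r"i\s*=\s*(?P<i>\d+)\s*/\s*(?P<total>\d+),\s*"
    r"loss\s*=\s*(?P<loss>[\d.]+)"
)


def parse_progress(line):
    """Return a dict describing one progress line, or None if it does not match."""
    match = LOG_LINE.search(line)
    if match is None:
        return None
    iteration = int(match.group("i"))
    total = int(match.group("total"))
    return {
        "epoch": float(match.group("epoch")),
        "max_epochs": int(match.group("max_epochs")),
        "iteration": iteration,
        "total_iterations": total,
        "loss": float(match.group("loss")),
        # Share of all planned iterations completed so far.
        "fraction_done": iteration / total,
    }


if __name__ == "__main__":
    sample = "Epoch 1.02 / 50, i = 1000 / 3145800, loss = 4.886581"
    info = parse_progress(sample)
    print("{iteration} of {total_iterations} iterations".format(**info))
    print("fraction done: {:.6f}".format(info["fraction_done"]))
```

Lines that are not progress lines (for example a stack trace) return None, so the function can be applied to every line of captured output and the non-matching ones skipped.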
Following on from #8, I also note that the CUDA 8.0 build isn't tagged in the registry. Would it be possible to do so?