soumith / dcgan.torch
A torch implementation of http://arxiv.org/abs/1511.06434
License: Other
Real images have pixel values in the range 0 to 1, while fake images have values in the range -1 to 1 because the generator ends in a tanh function. As a result, the discriminator is trained on real and fake images with different scales.
I wonder whether this design is on purpose, and whether there is any benefit to it.
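If the real images do reach the discriminator in the range 0 to 1, one common way to put both inputs on the same scale is an affine rescale to the tanh range. The following is a minimal sketch in plain Python, not code from this repository; the helper names to_tanh_range and from_tanh_range are hypothetical.

```python
def to_tanh_range(pixels):
    """Map pixel values from [0, 1] to [-1, 1], the output range of tanh."""
    return [2.0 * p - 1.0 for p in pixels]


def from_tanh_range(pixels):
    """Inverse mapping, from [-1, 1] back to [0, 1], e.g. for display."""
    return [(p + 1.0) / 2.0 for p in pixels]


real = [0.0, 0.25, 0.5, 1.0]
scaled = to_tanh_range(real)
print(scaled)                   # [-1.0, -0.5, 0.0, 1.0]
print(from_tanh_range(scaled))  # [0.0, 0.25, 0.5, 1.0]
```

With this mapping, real and generated images share the same value range before they are passed to the discriminator.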
After running
'DATA_ROOT=/media/envy/data1t/os_prj/github/DCGAN-tensorflow/data/celebA/img_align_celeba th main.lua'
to completion, what is the next step?
Hi,
I have been playing around with the DCGAN architecture and I have a question on the similarity among the generated images.
I trained the network on 140-dimensional vectors sampled from normal(0,1). The results were good after a few epochs, and they look varied too. The generator's output looks like the following:
I modified the above network to take a table of inputs (one 100-dimensional vector and one 40-dimensional vector, both sampled from normal(0,1)) through a parallel table and joined them to make a 140-dimensional vector. The following are some results:
The two networks are essentially the same, because parameter learning happens only in the layers following the join table in the 2nd network, and this part of the architecture is identical in both. But the results are more varied in the first network, and there are lots of similar pictures in the 2nd network's output.
I have observed this on other datasets too. According to my understanding, after training the DCGAN, the generator learns a mapping from Z-vector space to images. Is there a possibility of the generator learning only a "certain set" of images (which are not necessarily in training set, so there's no overfitting) for the whole Z-vector distribution, and output only those images for various Z vector inputs? It would be great if anyone can shed some light on why this might happen.
Thanks!
Exactly as the title says, while training, the display updates every ten counter steps for the first epoch, but for every epoch following, only the real images update. The err_g also seems to change from being quite erratic in the first epoch to mostly uniform in epochs > 1.
I didn't have this problem on my mac training with cpu, but on ubuntu training with either cpu or gpu this happens every time.
FWIW I'm working with a clean clone of the repo.
~/dcgan.torch$ gpu=1 display_id=40 DATA_ROOT=/media/aferriss/SHARED/myImages dataset=folder th main.lua
Full log: https://gist.github.com/soumith/d8861ada490c53ea666b
{
fineSize : 64
dataset : "folder"
batchSize : 64
nThreads : 4
noise : "normal"
niter : 25
nz : 100
gpu : 1
name : "experiment1"
display_id : 10
display : 1
lr : 0.0002
ngf : 64
ndf : 64
beta1 : 0.5
loadSize : 96
ntrain : inf
}
Random Seed: 2937
Starting donkey with id: 4 seed: 2941
table: 0x87f62f0
Starting donkey with id: 3 seed: 2940
table: 0xb3e9f4c0
Starting donkey with id: 1 seed: 2938
table: 0xb3d8a240
Starting donkey with id: 2 seed: 2939
table: 0xb3c42e78
Creating train metadata
table: 0x89ee6d8
Creating train metadata
table: 0xb38abcb8
Creating train metadata
table: 0xb342b968
Creating train metadata
table: 0xb3c9de38
/mnt/hgfs/vmpublic/DCGAN/torch/install/bin/lua: ...ic/DCGAN/torch/install/share/lua/5.2/threads/threads.lua:183: [thread 4 callback] .../hgfs/vmpublic/DCGAN/dcgan.torch-master/data/dataset.lua:139: attempt to index global 'jit' (a nil value)
stack traceback:
.../hgfs/vmpublic/DCGAN/dcgan.torch-master/data/dataset.lua:139: in function '__init'
...mpublic/DCGAN/torch/install/share/lua/5.2/torch/init.lua:91: in function <...mpublic/DCGAN/torch/install/share/lua/5.2/torch/init.lua:87>
[C]: in function 'dataLoader'
...vmpublic/DCGAN/dcgan.torch-master/data/donkey_folder.lua:82: in main chunk
[C]: in function 'dofile'
...mpublic/DCGAN/torch/install/share/lua/5.2/paths/init.lua:84: in function 'dofile'
/mnt/hgfs/vmpublic/DCGAN/dcgan.torch-master/data/data.lua:42: in function </mnt/hgfs/vmpublic/DCGAN/dcgan.torch-master/data/data.lua:32>
(...tail calls...)
[C]: in function 'xpcall'
...ic/DCGAN/torch/install/share/lua/5.2/threads/threads.lua:234: in function 'callback'
...blic/DCGAN/torch/install/share/lua/5.2/threads/queue.lua:65: in function <...blic/DCGAN/torch/install/share/lua/5.2/threads/queue.lua:41>
[C]: in function 'pcall'
...blic/DCGAN/torch/install/share/lua/5.2/threads/queue.lua:40: in function 'dojob'
[string " local Queue = require 'threads.queue'..."]:13: in main chunk
stack traceback:
[C]: in function 'error'
...ic/DCGAN/torch/install/share/lua/5.2/threads/threads.lua:183: in function 'dojob'
...ic/DCGAN/torch/install/share/lua/5.2/threads/threads.lua:264: in function 'synchronize'
...ic/DCGAN/torch/install/share/lua/5.2/threads/threads.lua:142: in function 'specific'
...ic/DCGAN/torch/install/share/lua/5.2/threads/threads.lua:125: in function <...ic/DCGAN/torch/install/share/lua/5.2/threads/threads.lua:36>
(...tail calls...)
/mnt/hgfs/vmpublic/DCGAN/dcgan.torch-master/data/data.lua:30: in function 'new'
main.lua:38: in main chunk
[C]: in function 'dofile'
...CGAN/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: in ?
$:/dcgan.torch$ ls
arithmetic.lua cache cohnExtent data generate.lua images INSTALL.md LICENSE.md main.lua PATENTS README.md
$:/dcgan.torch$ DATA_ROOT=cohnExtent dataset=folder th main.lua
In the root directory, I created the directory cohnExtent and put some images in it. But when I run the command, I get the following error:
$/torch/install/bin/luajit: ...e/$/torch/install/share/lua/5.1/threads/threads.lua:255:
[thread 2 callback] $/dcgan.torch/data/dataset.lua:202: Could not find any image file in the given input paths
[thread 4 callback] $/dcgan.torch/data/dataset.lua:202: Could not find any image file in the given input paths
[thread 1 callback] $/dcgan.torch/data/dataset.lua:202: Could not find any image file in the given input paths
[thread 3 callback]$/dcgan.torch/data/dataset.lua:202: Could not find any image file in the given input paths
Can anyone please explain what the two environment variables loadSize and fineSize are used for?
Hi,
I'm trying to train on an imagenet dataset of "computer rooms." For some reason, training is going by really quickly [about 20 minutes] and is only able to result in noisy shapes. Here's an example:
I'm sure I'm missing something obvious? Do I just need a bigger dataset? The err_D rates look different from other examples I've seen.
Here's a sample of what's going on:
$ DATA_ROOT=computer dataset=folder th main.lua
{
ntrain : inf
beta1 : 0.5
name : "experiment3"
niter : 25
batchSize : 64
ndf : 64
fineSize : 64
nz : 100
loadSize : 96
gpu : 1
ngf : 64
dataset : "folder"
lr : 0.0002
noise : "normal"
nThreads : 4
display_id : 10
display : 1
}
Random Seed: 2726
Starting donkey with id: 3 seed: 2729
table: 0x0dceea28
Starting donkey with id: 1 seed: 2727
table: 0x0dcadda0
Starting donkey with id: 4 seed: 2730
table: 0x0dccde18
Starting donkey with id: 2 seed: 2728
table: 0x0dd4ea78
Loading train metadata from cache
Loading train metadata from cache
Loading train metadata from cache
Loading train metadata from cache
Dataset: folder Size: 960
Epoch: [1][ 0 / 15] Time: 1.973 DataTime: 0.000 Err_G: 0.5680 Err_D: 1.8186
Epoch: [1][ 1 / 15] Time: 1.785 DataTime: 0.001 Err_G: 1.7695 Err_D: 0.9822
Epoch: [1][ 2 / 15] Time: 1.765 DataTime: 0.000 Err_G: 0.4781 Err_D: 1.5481
Epoch: [1][ 3 / 15] Time: 1.806 DataTime: 0.000 Err_G: 1.6676 Err_D: 0.7404
Epoch: [1][ 4 / 15] Time: 1.752 DataTime: 0.000 Err_G: 1.3097 Err_D: 1.0967
Epoch: [1][ 5 / 15] Time: 1.785 DataTime: 0.000 Err_G: 1.0771 Err_D: 1.1696
Epoch: [1][ 6 / 15] Time: 1.740 DataTime: 0.000 Err_G: 1.5180 Err_D: 1.0296
Epoch: [1][ 7 / 15] Time: 1.769 DataTime: 0.001 Err_G: 0.7151 Err_D: 1.2693
Epoch: [1][ 8 / 15] Time: 1.738 DataTime: 0.000 Err_G: 2.8160 Err_D: 0.6199
Epoch: [1][ 9 / 15] Time: 2.265 DataTime: 0.000 Err_G: 0.3205 Err_D: 1.9448
Epoch: [1][ 10 / 15] Time: 1.464 DataTime: 0.000 Err_G: 5.2249 Err_D: 0.7500
Epoch: [1][ 11 / 15] Time: 1.739 DataTime: 0.000 Err_G: 0.8537 Err_D: 1.0718
Epoch: [1][ 12 / 15] Time: 1.742 DataTime: 0.000 Err_G: 1.7960 Err_D: 0.6485
Epoch: [1][ 13 / 15] Time: 1.758 DataTime: 0.000 Err_G: 0.8574 Err_D: 0.9329
Epoch: [1][ 14 / 15] Time: 1.754 DataTime: 0.001 Err_G: 5.5307 Err_D: 0.7156
End of epoch 1 / 25 Time Taken: 28.003
Epoch: [2][ 0 / 15] Time: 1.430 DataTime: 0.000 Err_G: 1.0214 Err_D: 0.8263
Epoch: [2][ 1 / 15] Time: 1.743 DataTime: 0.000 Err_G: 3.3267 Err_D: 0.3615
Epoch: [2][ 2 / 15] Time: 1.744 DataTime: 0.000 Err_G: 0.3200 Err_D: 1.8885
Epoch: [2][ 3 / 15] Time: 1.740 DataTime: 0.000 Err_G: 9.9051 Err_D: 0.5392
Epoch: [2][ 4 / 15] Time: 1.741 DataTime: 0.000 Err_G: 5.0082 Err_D: 0.2205
Epoch: [2][ 5 / 15] Time: 1.739 DataTime: 0.001 Err_G: 0.0109 Err_D: 5.1230
Epoch: [2][ 6 / 15] Time: 1.741 DataTime: 0.000 Err_G: 11.8632 Err_D: 0.3344
Epoch: [2][ 7 / 15] Time: 1.762 DataTime: 0.000 Err_G: 11.7923 Err_D: 0.4543
Epoch: [2][ 8 / 15] Time: 1.744 DataTime: 0.000 Err_G: 3.1376 Err_D: 0.1725
Epoch: [2][ 9 / 15] Time: 2.322 DataTime: 0.000 Err_G: 0.0095 Err_D: 5.2187
Epoch: [2][ 10 / 15] Time: 1.430 DataTime: 0.001 Err_G: 11.9288 Err_D: 0.3458
Epoch: [2][ 11 / 15] Time: 1.743 DataTime: 0.000 Err_G: 12.4909 Err_D: 0.4700
Epoch: [2][ 12 / 15] Time: 1.743 DataTime: 0.000 Err_G: 4.0857 Err_D: 0.1086
Epoch: [2][ 13 / 15] Time: 1.747 DataTime: 0.000 Err_G: 0.0036 Err_D: 6.3310
Epoch: [2][ 14 / 15] Time: 1.745 DataTime: 0.000 Err_G: 12.2344 Err_D: 0.1794
End of epoch 2 / 25 Time Taken: 27.318
Epoch: [3][ 0 / 15] Time: 1.425 DataTime: 0.000 Err_G: 13.8863 Err_D: 0.4748
Epoch: [3][ 1 / 15] Time: 1.739 DataTime: 0.000 Err_G: 7.3155 Err_D: 0.1225
Epoch: [3][ 2 / 15] Time: 1.737 DataTime: 0.000 Err_G: 0.1615 Err_D: 2.2792
Epoch: [3][ 3 / 15] Time: 1.738 DataTime: 0.000 Err_G: 13.2177 Err_D: 0.5127
Epoch: [3][ 4 / 15] Time: 1.739 DataTime: 0.000 Err_G: 12.9756 Err_D: 0.1219
Epoch: [3][ 5 / 15] Time: 1.741 DataTime: 0.000 Err_G: 6.0634 Err_D: 0.1051
Epoch: [3][ 6 / 15] Time: 1.738 DataTime: 0.000 Err_G: 0.0873 Err_D: 2.9455
Epoch: [3][ 7 / 15] Time: 1.737 DataTime: 0.000 Err_G: 13.9099 Err_D: 0.2702
Epoch: [3][ 8 / 15] Time: 1.740 DataTime: 0.000 Err_G: 15.4008 Err_D: 0.2829
Epoch: [3][ 9 / 15] Time: 2.298 DataTime: 0.000 Err_G: 8.0461 Err_D: 0.2743
Epoch: [3][ 10 / 15] Time: 1.427 DataTime: 0.000 Err_G: 0.3507 Err_D: 1.7363
Epoch: [3][ 11 / 15] Time: 1.741 DataTime: 0.000 Err_G: 11.6363 Err_D: 0.3731
Epoch: [3][ 12 / 15] Time: 1.741 DataTime: 0.000 Err_G: 11.4138 Err_D: 0.2604
Epoch: [3][ 13 / 15] Time: 1.736 DataTime: 0.000 Err_G: 4.7800 Err_D: 0.3057
Epoch: [3][ 14 / 15] Time: 1.743 DataTime: 0.000 Err_G: 0.0164 Err_D: 4.5180
End of epoch 3 / 25 Time Taken: 27.280
Hi,
The faces generated with "celebA_25_net_G.t7" have structural artifacts (they are not centered, and sometimes multiple faces are "stitched" together).
This does not seem to be a problem in a different repo (https://github.com/Newmu/dcgan_code).
I wonder why this happens. Was the model on this GitHub repo perhaps trained with unaligned face images?
Thanks.
Not all file corpuses are flawless; sometimes files are empty or the suffix doesn't match the format or they get deleted during the run etc. Since dcgan.torch assumes file reads will succeed without any problem, it will crash too if anything is amiss with any of the thousands or millions of files it may read.
This can be fixed by checking the read error status and skipping images that fail, with a warning message (plus an additional verbose option to get the exact filename of the offending file, since getByClass doesn't propagate its randomly chosen file upwards).
FeepingCreature provided a patch implementing that in data/dataset.lua, which we've been using without any problem for several days now:
diff --git a/data/dataset.lua b/data/dataset.lua
index 0d39e27..a9d28eb 100644
--- a/data/dataset.lua
+++ b/data/dataset.lua
@@ -232,7 +232,6 @@ function dataset:__init(...)
end
runningIndex = runningIndex + length
end
-
--==========================================================================
-- clean up temporary files
print('Cleaning up temporary files')
@@ -313,6 +312,7 @@ end
function dataset:getByClass(class)
local index = math.ceil(torch.uniform() * self.classListSample[class]:nElement())
local imgpath = ffi.string(torch.data(self.imagePath[self.classListSample[class][index]]))
+ if self.verbose then print('Image path: ' .. imgpath) end
return self:sampleHookTrain(imgpath)
end
@@ -322,7 +322,7 @@ local function tableToOutput(self, dataTable, scalarTable)
local quantity = #scalarTable
assert(dataTable[1]:dim() == 3)
data = torch.Tensor(quantity,
- self.sampleSize[1], self.sampleSize[2], self.sampleSize[3])
+ self.sampleSize[1], self.sampleSize[2], self.sampleSize[3])
scalarLabels = torch.LongTensor(quantity):fill(-1111)
for i=1,#dataTable do
data[i]:copy(dataTable[i])
@@ -336,11 +336,15 @@ function dataset:sample(quantity)
assert(quantity)
local dataTable = {}
local scalarTable = {}
- for i=1,quantity do
+ while table.getn(dataTable)<quantity do
local class = torch.random(1, #self.classes)
- local out = self:getByClass(class)
- table.insert(dataTable, out)
- table.insert(scalarTable, class)
+ local success, out = pcall(function() return self:getByClass(class) end)
+ if success then
+ table.insert(dataTable, out)
+ table.insert(scalarTable, class)
+ else
+ print("failed to get an instance of "..class)
+ end
end
local data, scalarLabels = tableToOutput(self, dataTable, scalarTable)
return data, scalarLabels
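The control flow of the patched dataset:sample can be restated as a language-neutral sketch. This is an illustration in plain Python, not the Lua code above: sample, load_one, and flaky_loader are hypothetical names, and the try/except plays the role of pcall. Like the patch, the loop keeps drawing until it has enough samples, so it would never terminate if every file were unreadable.

```python
import random


def sample(quantity, classes, load_one, rng=random):
    """Collect `quantity` samples, skipping any load that raises an error."""
    data, labels = [], []
    while len(data) < quantity:
        label = rng.randrange(len(classes))
        try:
            item = load_one(classes[label])
        except Exception as err:
            # Warn and draw again instead of crashing the whole run.
            print("failed to get an instance of %s: %s" % (classes[label], err))
            continue
        data.append(item)
        labels.append(label)
    return data, labels


# Hypothetical loader that fails on every second call, to exercise the retry.
calls = {"n": 0}


def flaky_loader(name):
    calls["n"] += 1
    if calls["n"] % 2 == 0:
        raise IOError("unreadable file for class " + name)
    return name + "-image"


batch, batch_labels = sample(4, ["cat", "dog"], flaky_loader, random.Random(0))
print(len(batch), calls["n"])  # 4 7
```

Four successful loads take seven attempts here because three of them fail and are retried.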
I am reimplementing this paper in TensorFlow, but I want to use a word-based method instead of the char-based one mentioned in the original paper, and I got a high D loss.
Can word-based embeddings be used in principle? If yes, is there anything I need to consider?
Many thanks.
Hello, I'm getting a tedious error running the cropping script for preprocessing the Celeb-A dataset,
ajay@ajay-h8-1170uk:~/TorchProjects/dcgan$ DATA_ROOT=celebA th data/crop_celebA.lua
/usr/local/bin/luajit: /usr/local/share/lua/5.1/image/init.lua:339: attempt to concatenate local 'ext' (a nil value)
stack traceback:
/usr/local/share/lua/5.1/image/init.lua:339: in function 'load'
data/crop_celebA.lua:7: in main chunk
I tried to run the script line by line from TREPL using,
data = '/home/ajay/TorchProjects/dcgan/celebA/img_align_celeba'
for f in paths.files(data, function(nm) return nm:find('.jpg') end) do
f2 = paths.concat(data, f)
print(f2)
im = image.load(f2)
end
and got a similar error,
/home/ajay/TorchProjects/dcgan/celebA/img_align_celeba
/usr/local/share/lua/5.1/image/init.lua:339: attempt to concatenate local 'ext' (a nil value)
stack traceback:
/usr/local/share/lua/5.1/image/init.lua:339: in function 'load'
[string "for f in paths.files(data, function(nm) retur..."]:4: in main chunk
Sorry about this; I've been away from coding for a while.
Hey there,
This seems like a really interesting library, nice work.
I am currently training on my own set of images. The total number of epochs is 37.
Dumb question, but how do I continue training from a saved model without overwriting it or starting from scratch?
For example, in the torch-rnn library, the training script takes a 'init_from' parameter where you specify the last cached training t7 file.
Also, seems like a couple other people are using smaller data sets like me, e.g. 4000 images. I wondered if you had any tips for getting the best kind of results by tweaking the training parameters?
I get the error below on running DATA_ROOT=celebA dataset=folder th main.lua. Please suggest what could possibly be wrong.
ubuntu@tegra-ubuntu:~/work/dcgan.torch$ DATA_ROOT=celebA dataset=folder th main.lua
{
ntrain : inf
beta1 : 0.5
name : "experiment1"
niter : 25
batchSize : 64
ndf : 64
fineSize : 64
nz : 100
loadSize : 96
gpu : 1
ngf : 64
dataset : "folder"
lr : 0.0002
noise : "normal"
nThreads : 4
display_id : 10
display : 1
}
Random Seed: 7694
/usr/local/bin/luajit: /usr/local/share/lua/5.1/trepl/init.lua:384: module 'threads' not found:No LuaRocks module found for threads
no field package.preload['threads']
no file '/home/ubuntu/.luarocks/share/lua/5.1/threads.lua'
no file '/home/ubuntu/.luarocks/share/lua/5.1/threads/init.lua'
no file '/usr/local/share/lua/5.1/threads.lua'
no file '/usr/local/share/lua/5.1/threads/init.lua'
no file './threads.lua'
no file '/usr/local/share/luajit-2.0.4/threads.lua'
no file '/home/ubuntu/.luarocks/lib/lua/5.1/threads.so'
no file '/usr/local/lib/lua/5.1/threads.so'
no file './threads.so'
no file '/usr/local/lib/lua/5.1/loadall.so'
stack traceback:
[C]: in function 'error'
/usr/local/share/lua/5.1/trepl/init.lua:384: in function 'require'
/home/ubuntu/work/dcgan.torch/data/data.lua:1: in main chunk
[C]: in function 'dofile'
main.lua:37: in main chunk
[C]: in function 'dofile'
/usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x0000d055
Hello - I've been working to get the face examples running on OSX. I've used Torch before, but I am new to DCGANs.
After running through the full install a couple of times, including completely re-installing Torch (and trying cltorch, before reverting to Torch) I am still having problems.
Running DATA_ROOT=celebA dataset=folder th main.lua results in the following error:
/Users/james/torch/install/bin/luajit: /Users/james/torch/install/share/lua/5.1/trepl/init.lua:384: module 'cunn' not found:No LuaRocks module found for cunn
no field package.preload['cunn']
no file '/Users/james/.luarocks/share/lua/5.1/cunn.lua'
no file '/Users/james/.luarocks/share/lua/5.1/cunn/init.lua'
no file '/Users/james/torch/install/share/lua/5.1/cunn.lua'
no file '/Users/james/torch/install/share/lua/5.1/cunn/init.lua'
no file '/Users/james/torch-cl/install/share/lua/5.1/cunn.lua'
no file '/Users/james/torch-cl/install/share/lua/5.1/cunn/init.lua'
no file './cunn.lua'
no file '/Users/james/torch/install/share/luajit-2.1.0-beta1/cunn.lua'
no file '/usr/local/share/lua/5.1/cunn.lua'
no file '/usr/local/share/lua/5.1/cunn/init.lua'
no file '/Users/james/.luarocks/lib/lua/5.1/cunn.so'
no file '/Users/james/torch/install/lib/lua/5.1/cunn.so'
no file '/Users/james/torch/install/lib/cunn.dylib'
no file '/Users/james/torch-cl/install/lib/cunn.dylib'
no file '/Users/james/torch-cl/install/lib/lua/5.1/cunn.so'
no file './cunn.so'
no file '/usr/local/lib/lua/5.1/cunn.so'
no file '/usr/local/lib/lua/5.1/loadall.so'
stack traceback:
[C]: in function 'error'
/Users/james/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
main.lua:126: in main chunk
[C]: in function 'dofile'
...ames/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x0109892d10
Running luarocks install cunn fails when it tries to install cutorch:
CMake Error at /usr/local/Cellar/cmake/3.6.0/share/cmake/Modules/FindCUDA.cmake:619 (message):
Specify CUDA_TOOLKIT_ROOT_DIR
Call Stack (most recent call first):
CMakeLists.txt:7 (FIND_PACKAGE)
-- Configuring incomplete, errors occurred!
See also "/tmp/luarocks_cutorch-scm-1-5945/cutorch/build/CMakeFiles/CMakeOutput.log".
Error: Failed installing dependency: https://raw.githubusercontent.com/torch/rocks/master/cutorch-scm-1.rockspec - Build error: Failed building.
As I understand it, CUDA requires a GPU, which I don't have - but the instructions for dcgan.torch suggest that a GPU isn't necessary, so I believe there should be a way around this?
Many thanks.
Running
gpu=1 batchSize=1 imsize=10 noisemode=linefull net=bedrooms_4_net_G.t7 th generate.lua
Gives error
{
gpu : 1
noisemode : "linefull"
name : "generation1"
noisetype : "normal"
batchSize : 1
net : "bedrooms_4_net_G.t7"
imsize : 10
nz : 100
display : 1
}
/home/ubuntu/torch-distro/install/bin/luajit: /home/ubuntu/dcgan.torch/util.lua:61: attempt to call method 'apply' (a nil value)
stack traceback:
/home/ubuntu/dcgan.torch/util.lua:61: in function 'load'
generate.lua:24: in main chunk
[C]: in function 'dofile'
...rch-distro/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
[C]: at 0x00406670
Likewise, running
DATA_ROOT=FireLoop dataset=folder th main.lua
Gives similar error
/home/ubuntu/torch-distro/install/bin/luajit: main.lua:82: attempt to call method 'apply' (a nil value)
stack traceback:
main.lua:82: in main chunk
[C]: in function 'dofile'
...rch-distro/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
[C]: at 0x00406670
On an AWS g2.2xlarge machine, with cuDNN.
Lua 5.2.3
I understand that sampleHookTrain takes an image path and reads the image into a tensor. But I don't see any definition for it in the dataset module. I think I am missing something here. Can someone please explain?
Thanks in advance.
Hi,
I am trying to understand how the distribution of Z vector affects the training and the subsequent generation of images from the trained generator. The paper hasn't mentioned any significant effects of using different kinds of distributions to sample Z vectors from. From my experiments, I found that it matters a lot for the quality and type of images generated. For example, the following are some images generated after training the DCGAN on Celeb dataset for 25 iterations using uniform(0,1) distribution for sampling the Z vectors.
Also, after training the DCGAN on a normal(0,1) distribution, the corresponding trained generator's results on a Z vector not sampled from this normal distribution weren't good.
Can anyone give any tips on choosing the right kind of distribution for Z vector sampling based on the kind of training data we use?
Like the title says, I get the error below when running main.lua. I definitely have cuDNN installed (works with Theano just fine), and the code works with gpu=0. Any ideas?
$ DATA_ROOT=myimages dataset=folder th main.lua
{
ntrain : inf
beta1 : 0.5
name : "experiment1"
niter : 25
batchSize : 64
ndf : 64
fineSize : 64
nz : 100
loadSize : 96
gpu : 1
ngf : 64
dataset : "folder"
lr : 0.0002
noise : "normal"
nThreads : 4
display_id : 10
display : 1
}
Random Seed: 2675
Starting donkey with id: 4 seed: 2679
table: 0x0fa4c738
Starting donkey with id: 1 seed: 2676
table: 0x0fa6c4d8
Starting donkey with id: 3 seed: 2678
table: 0x0fa8c110
Starting donkey with id: 2 seed: 2677
table: 0x0facc670
Loading train metadata from cache
Loading train metadata from cache
Loading train metadata from cache
Loading train metadata from cache
Dataset: folder Size: 498
/Users/adamferriss/torch/install/bin/luajit: ...adamferriss/torch/install/share/lua/5.1/nn/LeakyReLU.lua:24: attempt to call field 'LeakyReLU_updateOutput' (a nil value)
stack traceback:
...adamferriss/torch/install/share/lua/5.1/nn/LeakyReLU.lua:24: in function 'updateOutput'
...damferriss/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
main.lua:160: in function 'opfunc'
...s/adamferriss/torch/install/share/lua/5.1/optim/adam.lua:33: in function 'adam'
main.lua:214: in main chunk
[C]: in function 'dofile'
...riss/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
[C]: at 0x010ef19770
I found npy format here: https://github.com/Newmu/dcgan_code/tree/master/models. But this model's generator generates 32x32 images. I was wondering if there's a dcgan model trained on ImageNet with a fineSize=64, available in torch.
Thanks!
Error: Failed installing dependency: http://luarocks.org/repositories/rocks/async-1.1-1.rockspec - Could not satisfy dependency: love ~> 0.9
How do I install love?
In lines 69-73 of main.lua, the up-convolutions use a 4x4 kernel:
netG:add(SpatialFullConvolution(ngf * 8, ngf * 4, 4, 4, 2, 2, 1, 1))
netG:add(SpatialBatchNormalization(ngf * 4)):add(nn.ReLU(true))
-- state size: (ngf*4) x 8 x 8
netG:add(SpatialFullConvolution(ngf * 4, ngf * 2, 4, 4, 2, 2, 1, 1))
netG:add(SpatialBatchNormalization(ngf * 2)):add(nn.ReLU(true))
However, Figure 1 of the paper (http://arxiv.org/pdf/1511.06434v2.pdf) shows a kernel width of 5 for the up-convolutions.
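The spatial sizes in the snippet can be checked with the output-size formula for a transposed ("full") convolution, out = (in - 1) * stride - 2 * pad + kernel + adj. The sketch below is only an illustration in Python: full_conv_out is a hypothetical helper, the 4x4 / stride 2 / padding 1 parameters are taken from the lines quoted above, and the first generator layer is assumed to be a 4x4 kernel with stride 1 and no padding applied to a 1x1 input.

```python
def full_conv_out(size, kernel, stride, pad, adj=0):
    """Output width/height of a transposed convolution for one spatial axis."""
    return (size - 1) * stride - 2 * pad + kernel + adj


# Assumed first layer: 1x1 noise input, 4x4 kernel, stride 1, no padding.
sizes = [full_conv_out(1, kernel=4, stride=1, pad=0)]
# Four up-convolutions with the quoted parameters (4, 4, 2, 2, 1, 1).
for _ in range(4):
    sizes.append(full_conv_out(sizes[-1], kernel=4, stride=2, pad=1))
print(sizes)  # [4, 8, 16, 32, 64]

# A 5x5 kernel can also double the size, but needs padding 2 and adj 1.
print(full_conv_out(4, kernel=5, stride=2, pad=2, adj=1))  # 8
```

So both kernel widths can produce the same sequence of feature-map sizes; the 4x4 choice simply avoids the extra output adjustment.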
Hi,
I am currently using cuDNN 5 with the code in main.lua unchanged, and here is the error I get.
All tests on cudnn pass. I also updated cutorch, and torch, ended up with the same error.
Any help is much appreciated !!
Thanks,
Hi,
After training the network with Celeb faces dataset, a forward pass on 10 noise vectors into the generator gives decent results like below:
But when I passed same vector replicated as a batch of 10 to the trained generator it gave the following:
It looks like low frequencies are not present. Similar results were obtained when a single vector was passed.
I think I am missing something. Is the forward pass on trained generator affected by the number and kind of Z vectors passed?
Any insight is appreciated. Thanks!
Hello, thanks a lot for helping the community with your code!
I'm training the GAN with ~4000 grey images of faces for 250 epochs and saving the network every 10 epochs. I am, however, having trouble figuring out how to select the best network that I should use for generating new images.
Would the sum of all errG within one epoch be a good score for how good the generative network is performing at that point during training?
Thanks a lot again!
Hi,
The description in the code says that it is applied during testing. Can anyone please clarify what test setting it means?
Thanks!
Hi - I've been doing a lot of work lately with interpolation in latent space, and I think linear interpolation might not be the best interpolation operator for high dimensional spaces. Though admittedly this is common practice, this seemed as good a place as any to discuss this, since the dcgan code seems to do exactly that here:
noiseL = torch.FloatTensor(opt.nz):uniform(-1, 1)
noiseR = torch.FloatTensor(opt.nz):uniform(-1, 1)
if opt.noisemode == 'line' then
-- do a linear interpolation in Z space between point A and point B
-- each sample in the mini-batch is a point on the line
line = torch.linspace(0, 1, opt.batchSize)
for i = 1, opt.batchSize do
noise:select(1, i):copy(noiseL * line[i] + noiseR * (1 - line[i]))
end
I'm starting with the assumption that torch.FloatTensor(opt.nz):uniform(-1, 1) is a valid way to uniformly sample from the prior in the latent space. In the examples below, I'll leave the nz dimension at the default of 100. Let's do an experiment and see what the expected lengths of these vectors are.
I see a gaussian with mean about 5.76 and with 0.25 standard deviation. I believe this means that >99% of vectors would be expected to have a length between 4.8 and 6.8 (4 standard deviations out). This result should not be a big surprise if we think about taking 100 independent random numbers and then running them through the distance formula.
But now let's think about the effects of linear interpolation between these random vectors. At an extreme, we have the linearly interpolated midpoints halfway between any two of these vectors - let's see what the expected lengths of these are.
So now we have a gaussian with a mean length of 4.06 and a standard deviation of 0.24. Needless to say, these are not the same distribution, and in fact they are effectively disjoint - the probability of an item from the second appearing in the first is vanishingly small. In other words, the points on the linearly interpolated path are many standard deviations away from points expected in the prior distribution.
If my premise is correct that torch.FloatTensor(opt.nz):uniform(-1, 1) performs a uniform sampling across the latent space (a big if, and I'd like to verify this!), then the prior is more shaped like a hypersphere. In that case, spherical interpolation makes a lot more sense, and in my own experiments I've had good qualitative results with this approach. Curious what others think. Also note that this reasoning could be extended beyond just interpolation since this would also affect other interpretable operations - such as finding the average in a subset of labeled data (eg: average man or woman in faces).
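The length experiment and the spherical alternative can be reproduced with a short sketch. This is plain Python rather than the Torch code quoted above; norm, lerp, slerp, and sample_z are hypothetical helper names, and the slerp formula is the standard one based on the angle between the two vectors. For nz = 100 and uniform(-1, 1) samples, the root-mean-square length is sqrt(100/3), about 5.77, and for linear midpoints it is sqrt(100/6), about 4.08, which matches the means reported above.

```python
import math
import random


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def lerp(a, b, t):
    """Linear interpolation between vectors a and b."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def slerp(a, b, t):
    """Spherical interpolation: move along the arc between a and b."""
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    if omega < 1e-8:  # nearly parallel vectors: fall back to lerp
        return lerp(a, b, t)
    wa = math.sin((1 - t) * omega) / math.sin(omega)
    wb = math.sin(t * omega) / math.sin(omega)
    return [wa * x + wb * y for x, y in zip(a, b)]


rng = random.Random(0)
nz = 100


def sample_z():
    return [rng.uniform(-1, 1) for _ in range(nz)]


pairs = [(sample_z(), sample_z()) for _ in range(2000)]
mean_end = sum(norm(a) for a, _ in pairs) / len(pairs)
mean_lerp = sum(norm(lerp(a, b, 0.5)) for a, b in pairs) / len(pairs)
mean_slerp = sum(norm(slerp(a, b, 0.5)) for a, b in pairs) / len(pairs)
print("endpoints: %.2f  lerp midpoint: %.2f  slerp midpoint: %.2f"
      % (mean_end, mean_lerp, mean_slerp))
```

The linear midpoints come out much shorter than the sampled endpoints, while the spherical midpoints keep roughly the same length as the endpoints.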
Is there any significance to scaling the values of training images to the range [-1,1] other than just normalizing them to remove any inherent bias among the dimensions?
Thanks!
Why can't the model I trained by following "1.1 Train a face generator using the Celeb-A dataset" be used for Vector Arithmetic?
Hi,
I'm attempting to use the GAN framework in a different setting than images and I was using this code as a reference. I noticed something odd: When we call optim.adam on fDx, the parameters in netD get updated, but we use the output with respect to the original parameters when we call it on fGx. Shouldn't we call forward on netD inside fGx rather than recycle the previous output so that both output and gradInput are computed with respect to the same parameters?
Thanks,
Shawn
Hi Soumith, thank you very much for sharing the code.
Recently, I have been trying to use your code to train DCGAN on SVHN. I tried many network architectures and hyperparameters, but failed to reproduce performance comparable to that of the model shared by Alec Radford. The best I can get is ~69% using 32x32 images and 1000 training labels. Unfortunately, Alec did not release the training code for that model, so there is no reference for setting up the network architecture and choosing the hyperparameters. I would like to know whether you or anyone else has obtained a model with performance similar to Alec's, say, 75+% accuracy on SVHN based on 1000 labels.
Thanks in advance!
Probably not the wisest decision to upgrade to 16.04 already but it's (basically) too late now. The error message below persists after completely reinstalling torch and all deps. I'm running Cuda 7.5 and have various cudnn primitive binding libs installed in /usr/local/cuda-7.5/lib64. Cuda is in my PATH and I also have LD_LIBRARY_PATH, CUDA_HOME, CUDA_LIB, CUDA_BIN, CPATH, LIBRARY_PATH set up. I'm so lost haha
jamis@jamis:~/src/dcgan.torch$ gpu=1 net=checkpoints/celebA_25_net_G.t7 th generate.lua
{
gpu : 1
noisemode : "random"
name : "generation1"
noisetype : "normal"
batchSize : 32
net : "checkpoints/celebA_25_net_G.t7"
imsize : 1
nz : 100
display : 1
}
nn.Sequential {
[input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> output]
(1): nn.SpatialFullConvolution(100 -> 512, 4x4)
(2): nn.SpatialBatchNormalization
(3): nn.ReLU
(4): nn.SpatialFullConvolution(512 -> 256, 4x4, 2,2, 1,1)
(5): nn.SpatialBatchNormalization
(6): nn.ReLU
(7): nn.SpatialFullConvolution(256 -> 128, 4x4, 2,2, 1,1)
(8): nn.SpatialBatchNormalization
(9): nn.ReLU
(10): nn.SpatialFullConvolution(128 -> 64, 4x4, 2,2, 1,1)
(11): nn.SpatialBatchNormalization
(12): nn.ReLU
(13): nn.SpatialFullConvolution(64 -> 3, 4x4, 2,2, 1,1)
(14): nn.Tanh
}
/home/jamis/torch/install/bin/luajit: /home/jamis/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
...h/install/share/lua/5.1/cudnn/SpatialFullConvolution.lua:114: attempt to perform arithmetic on field 'adjW' (a nil value)
stack traceback:
...h/install/share/lua/5.1/cudnn/SpatialFullConvolution.lua:114: in function 'createIODescriptors'
...h/install/share/lua/5.1/cudnn/SpatialFullConvolution.lua:312: in function <...h/install/share/lua/5.1/cudnn/SpatialFullConvolution.lua:310>
[C]: in function 'xpcall'
/home/jamis/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/jamis/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
/home/jamis/torch/install/share/lua/5.1/optnet/init.lua:376: in function 'optimizeMemory'
generate.lua:82: in main chunk
[C]: in function 'dofile'
...amis/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00405d50
WARNING: If you see a stack trace below, it doesn't point to the place where this error occured. Please use only the one above.
stack traceback:
[C]: in function 'error'
/home/jamis/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
/home/jamis/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
/home/jamis/torch/install/share/lua/5.1/optnet/init.lua:376: in function 'optimizeMemory'
generate.lua:82: in main chunk
[C]: in function 'dofile'
...amis/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00405d50
In the loadImage function of the donkey_folder.lua script, input images are scaled so that the aspect ratio is maintained. I understand that objects in the images don't look distorted (compared to their original versions) after scaling if the aspect ratio is maintained. Are there any other reasons for doing this?
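To make the scaling concrete, here is a minimal sketch in plain Lua. It assumes the smaller side is resized to loadSize and the other side is scaled by the same factor; scaledSize is a hypothetical helper written for illustration, not a function from donkey_folder.lua.

```lua
-- Sketch only: computes the target size for an aspect-preserving resize.
-- scaledSize is a hypothetical helper, not part of donkey_folder.lua.
local function scaledSize(iW, iH, loadSize)
  -- Resize the smaller side to loadSize and scale the other side by the same factor.
  if iW < iH then
    return loadSize, math.floor(loadSize * iH / iW + 0.5)
  else
    return math.floor(loadSize * iW / iH + 0.5), loadSize
  end
end

-- A 178x218 portrait image with loadSize=96.
local w, h = scaledSize(178, 218, 96)
print(w, h)

-- Aspect ratio before the resize, after it, and after squashing straight to 96x96.
print(178 / 218, w / h, 96 / 96)
```

Under that assumption both sides end up at least loadSize pixels long, so a later fineSize crop always fits inside the scaled image as long as fineSize does not exceed loadSize.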
How would one go about visualizing the filters of this network? I tried doing something like
net = util.load(opt.net, opt.gpu)
filters = net:get(13).weight
image.save("filters.png", image.toDisplayTensor{input=filters})
The output is super tiny on this layer though, 24x44 pixels.
How can I visualize more of the learned filters from other layers of the network? It looks like the conv layers are 1, 4, 7, 10, and 13.
Sampling from the other layers seems like it just outputs noise. Also looks like most of the other examples for visualizing conv layer filters are using the itorch image function, and I'd like to do it outside the interactive notebook if possible.
OSX 10.11.5.
$ DATA_ROOT=myimages dataset=folder th main.lua
{
ntrain : inf
beta1 : 0.5
name : "experiment1"
niter : 25
batchSize : 64
ndf : 64
fineSize : 64
nz : 100
loadSize : 96
gpu : 1
ngf : 64
dataset : "folder"
lr : 0.0002
noise : "normal"
nThreads : 4
display_id : 10
display : 1
}
Random Seed: 251
Starting donkey with id: 1 seed: 252
table: 0x02b80fc8
Starting donkey with id: 2 seed: 253
table: 0x02ba0ca8
Starting donkey with id: 3 seed: 254
table: 0x02bc0f40
Starting donkey with id: 4 seed: 255
table: 0x02be0d88
Creating train metadata
table: 0x02ef7ee0
running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
Creating train metadata
table: 0x02bf1af0
running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
Creating train metadata
table: 0x02e00c70
running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
Creating train metadata
table: 0x02d971a8
running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
/tmp/lua_2AEOyi: line 1: gfind: command not found
/tmp/lua_9DgkXe: line 1: gfind: command not found
/tmp/lua_psrQx3: line 1: gfind: command not found
/tmp/lua_aZfoE8: line 1: gfind: command not found
now combine all the files to a single large file
now combine all the files to a single large file
now combine all the files to a single large file
now combine all the files to a single large file
load the large concatenated list of sample paths to self.imagePath
load the large concatenated list of sample paths to self.imagePath
load the large concatenated list of sample paths to self.imagePath
load the large concatenated list of sample paths to self.imagePath
sh: gwc: command not found
sh: gwc: command not found
sh: gwc: command not found
sh: gwc: command not found
/Users/WS18/torch/install/bin/luajit: /Users/WS18/torch/install/share/lua/5.1/threads/threads.lua:183: [thread 1 callback] /Users/WS18/dcgan.torch/data/dataset.lua:198: attempt to perform arithmetic on a nil value
stack traceback:
/Users/WS18/dcgan.torch/data/dataset.lua:198: in function '__init'
/Users/WS18/torch/install/share/lua/5.1/torch/init.lua:91: in function </Users/WS18/torch/install/share/lua/5.1/torch/init.lua:87>
[C]: in function 'dataLoader'
/Users/WS18/dcgan.torch/data/donkey_folder.lua:82: in main chunk
[C]: in function 'dofile'
/Users/WS18/dcgan.torch/data/data.lua:42: in function </Users/WS18/dcgan.torch/data/data.lua:32>
[C]: in function 'xpcall'
/Users/WS18/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
/Users/WS18/torch/install/share/lua/5.1/threads/queue.lua:65: in function </Users/WS18/torch/install/share/lua/5.1/threads/queue.lua:41>
[C]: in function 'pcall'
/Users/WS18/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
[string " local Queue = require 'threads.queue'..."]:13: in main chunk
stack traceback:
[C]: in function 'error'
/Users/WS18/torch/install/share/lua/5.1/threads/threads.lua:183: in function 'dojob'
/Users/WS18/torch/install/share/lua/5.1/threads/threads.lua:264: in function 'synchronize'
/Users/WS18/torch/install/share/lua/5.1/threads/threads.lua:142: in function 'specific'
/Users/WS18/torch/install/share/lua/5.1/threads/threads.lua:125: in function 'Threads'
/Users/WS18/dcgan.torch/data/data.lua:30: in function 'new'
main.lua:38: in main chunk
[C]: in function 'dofile'
...WS18/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x0101e83d10
(This is related to but not identical to issue #2.)
Cropping is particularly undesirable on very small images like 64x64, where it may delete a lot of the image (especially when the images already come pre-centered and cropped). Currently, you cannot run dcgan.torch with no cropping, even though configurable arguments like loadSize=64 fineSize=64 suggest that this should be possible. This seems to be due not to design but to a bug in the cropping code in data/donkey_folder.lua; as said in issue #2:
Right now, loadSize has to be greater than fineSize (because of a bug in the cropping logic). So it's okay to have loadSize=65 fineSize=64 th main.lua
I messed around some with the responsible trainHook, and I think the bug can be fixed by simply checking whether the original H/W are greater than the fineSize value and, if they aren't, feeding 0s into the crop function. The new version would look like this:
-- do random crop if fineSize/sampleSize is configured to be smaller than NN's input dimensions, loadSize
local iW = input:size(3)
local iH = input:size(2)
local oW = sampleSize[2]
local oH = sampleSize[2]
if (iW > oW) then
   w1 = math.ceil(torch.uniform(1e-2, iW-oW))
else
   w1 = 0
end
if (iH > oH) then
   h1 = math.ceil(torch.uniform(1e-2, iH-oH))
else
   h1 = 0
end
local out = image.crop(input, w1, h1, w1 + oW, h1 + oH)
assert(out:size(2) == oW)
assert(out:size(3) == oH)
Or to diff it:
diff --git a/data/donkey_folder.lua b/data/donkey_folder.lua
index 3a82393..5248f4e 100644
--- a/data/donkey_folder.lua
+++ b/data/donkey_folder.lua
@@ -52,17 +52,26 @@ local mean,std
local trainHook = function(self, path)
collectgarbage()
local input = loadImage(path)
+
+ -- do random crop if fineSize/sampleSize is configured to be smaller than NN's input dimensions, loadSize
local iW = input:size(3)
local iH = input:size(2)
-
- -- do random crop
- local oW = sampleSize[2];
+ local oW = sampleSize[2]
local oH = sampleSize[2]
- local h1 = math.ceil(torch.uniform(1e-2, iH-oH))
- local w1 = math.ceil(torch.uniform(1e-2, iW-oW))
+ if (iW > oW) then
+ w1 = math.ceil(torch.uniform(1e-2, iW-oW))
+ else
+ w1 = 0
+ end
+ if (iH > oH) then
+ h1 = math.ceil(torch.uniform(1e-2, iH-oH))
+ else
+ h1 = 0
+ end
local out = image.crop(input, w1, h1, w1 + oW, h1 + oH)
assert(out:size(2) == oW)
assert(out:size(3) == oH)
+
-- do hflip with probability 0.5
if torch.uniform() > 0.5 then out = image.hflip(out); end
out:mul(2):add(-1) -- make it [0, 1] -> [-1, 1]
This seems to work both in the 64x64px default version and in the 128x128px fork, e.g.:
$ nThreads=1 DATA_ROOT=myimages dataset=folder batchSize=2 loadSize=128 fineSize=128 nz=75 ngf=106 ndf=48 gpu=0 th main-128.lua
{
ntrain : inf
beta1 : 0.5
name : "experiment1"
niter : 25
batchSize : 2
ndf : 48
fineSize : 128
nz : 75
loadSize : 128
gpu : 0
ngf : 106
dataset : "folder"
lr : 0.0002
noise : "normal"
nThreads : 1
display_id : 10
display : 1
}
Random Seed: 5143
Starting donkey with id: 1 seed: 5144
table: 0x406af6b8
Loading train metadata from cache
Dataset: folder Size: 442215
Epoch: [1][ 0 / 221107] Time: 8.181 DataTime: 0.003 Err_G: 1.1998 Err_D: 1.1637
Epoch: [1][ 1 / 221107] Time: 5.115 DataTime: 0.001 Err_G: 0.3660 Err_D: 1.5839
Epoch: [1][ 2 / 221107] Time: 5.965 DataTime: 0.001 Err_G: 2.8597 Err_D: 1.7219
Epoch: [1][ 3 / 221107] Time: 6.163 DataTime: 0.001 Err_G: 0.1956 Err_D: 2.2080
Epoch: [1][ 4 / 221107] Time: 5.537 DataTime: 0.001 Err_G: 0.7360 Err_D: 1.9527
Epoch: [1][ 5 / 221107] Time: 6.300 DataTime: 0.001 Err_G: 6.8542 Err_D: 3.6255
...
And looking at the displayed training sample images in the display server, they don't look cropped like before. So although I haven't run anything to completion, I think that fix works.
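The offset logic from the patch above can be exercised on its own. The following is a minimal sketch in plain Lua: cropOffset is a hypothetical helper, and math.random stands in for torch.uniform so that the snippet runs without Torch.

```lua
-- Sketch only: cropOffset is a hypothetical helper, and math.random stands in
-- for torch.uniform so the snippet runs without Torch.
local function cropOffset(inputSize, outputSize)
  if inputSize > outputSize then
    -- Same expression as in the patch: a uniform draw between 1e-2 and
    -- inputSize - outputSize, rounded up to an integer in 1 .. inputSize - outputSize.
    local lo, hi = 1e-2, inputSize - outputSize
    return math.ceil(lo + math.random() * (hi - lo))
  end
  -- No room to crop: start at 0 so the crop window covers the whole side.
  return 0
end

math.randomseed(42)
for _, size in ipairs({64, 65, 96}) do
  local off = cropOffset(size, 64)
  -- The crop window [off, off + 64] must stay inside [0, size].
  assert(off >= 0 and off + 64 <= size)
  print(size, off)
end
```

With an input side of 64 the offset is always 0, which is the loadSize=64 fineSize=64 case the patch is meant to allow; with 65 it is always 1, which matches the loadSize=65 fineSize=64 workaround quoted from issue #2.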
After commit e057802, a net in net:evaluate() mode no longer produces correct images, only noise.
I have two trained nets, one from before this commit and one from after it. In net:training() mode they both behave normally, but in net:evaluate() mode the first one correctly generates the same images as in training mode, while the latter just outputs noise.
Here's an example of an MNIST-trained model from after the commit. The left one is in training mode, the right one in evaluate mode.
After some tests I can say that the cause is not the way nets are loaded: a network trained before the commit but loaded after it works correctly.
Also, I ran some batch iterations before saving the model in order to update running_mean and running_var in the BN layers.
When I run DATA_ROOT=xxx dataset=folder th main.sh
I get the following output:
[C]: in function 'xpcall'
/home/xhs/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
/home/xhs/torch/install/share/lua/5.1/threads/queue.lua:65: in function </home/xhs/torch/install/share/lua/5.1/threads/queue.lua:41>
[C]: in function 'pcall'
/home/xhs/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
[string " local Queue = require 'threads.queue'..."]:13: in main chunk
stack traceback:
[C]: in function 'error'
/home/xhs/torch/install/share/lua/5.1/threads/threads.lua:183: in function 'dojob'
/home/xhs/torch/install/share/lua/5.1/threads/threads.lua:264: in function 'synchronize'
/home/xhs/torch/install/share/lua/5.1/threads/threads.lua:142: in function 'specific'
/home/xhs/torch/install/share/lua/5.1/threads/threads.lua:125: in function 'Threads'
/home/xhs/dcgan.torch/data/data.lua:30: in function 'new'
main.lua:38: in main chunk
[C]: in function 'dofile'
.../xhs/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
From searching on Google, it seems this may be caused by mismatched versions. How can I solve this? Thank you!
The paper "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks" presents visualizations of the discriminator features. How can this be done using Torch or TensorFlow?
Feel free to discuss this with me.
Hi, is there a torch code incorporating the training improvements described in the paper "Improved Techniques for Training GANs"?
How can I use my own picture as c in arithmetic.lua?
Hi,
I was doing some experiments with DCGAN on the MNIST dataset. Once the GAN is trained, I noticed that a generated image depends not only on the noise vector that produced it, but also on the other noise vectors that were given as input alongside it to create other images.
Here is a simple example. The generator's input Z (9x100x1x1) is made of 9 noise vectors of dimensionality 1x100x1x1. The generator then outputs these nine 32x32 generated images:
Let's say that, for whatever reason, I'm interested in replicating only the middle left image. If I input just that 1x100x1x1 vector instead of the full 9x100x1x1 input, I obtain this generated image instead:
This is far from identical to the previous middle left image. Why is this happening? Shouldn't the generated images be the same regardless of how many input vectors you are using?
This matters if you want to replicate results (for encoding purposes, for example), because you need to input exactly the same set of noise vectors, not only the one you are interested in.
Thanks.
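One possible explanation, offered as an assumption rather than a confirmed diagnosis: the generator contains batch normalization layers, and in training mode such a layer normalizes each activation with the mean and variance of the current batch. The plain-Lua sketch below uses a hypothetical scalar batchNorm (a stand-in, not nn.SpatialBatchNormalization) to show how the same input value maps to different outputs depending on its batch-mates.

```lua
-- Sketch only: scalar batch normalization using batch statistics (training-mode behaviour).
-- batchNorm is a hypothetical stand-in, not the Torch layer.
local function batchNorm(batch, eps)
  eps = eps or 1e-5
  local n, mean, var = #batch, 0, 0
  for _, x in ipairs(batch) do mean = mean + x end
  mean = mean / n
  for _, x in ipairs(batch) do var = var + (x - mean) ^ 2 end
  var = var / n  -- biased variance over the batch
  local out = {}
  for i, x in ipairs(batch) do
    out[i] = (x - mean) / math.sqrt(var + eps)
  end
  return out
end

-- The value 0.5 is normalized together with two different sets of batch-mates.
local a = batchNorm({0.5, 1.0, 1.5})
local b = batchNorm({0.5, -2.0, 3.0})
print(a[1], b[1])  -- the results differ although the input 0.5 is the same

-- With a batch of one, the batch mean equals the sample itself.
local c = batchNorm({0.5})
print(c[1])
```

If this is indeed the cause, a net in evaluate mode would use the stored running statistics instead, so each output would depend only on its own noise vector.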
I am a bit confused about whether there is any difference between the min_G max_D and max_D min_G formulations of the GAN objective.
Or do these two actually become equal after iteratively updating G and D?
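As a general remark on the two orderings, and not something specific to this repository: for any value function V(D, G) whose extrema are attained, the max-min inequality below holds, so the two formulations need not coincide. A short derivation in LaTeX notation:

```latex
% For any fixed D' and G':
\min_G V(D', G) \;\le\; V(D', G') \;\le\; \max_D V(D, G')
% The left side does not depend on G' and the right side does not depend on D',
% so taking the maximum over D' on the left and the minimum over G' on the right gives
\max_D \min_G V(D, G) \;\le\; \min_G \max_D V(D, G)
```

Equality needs extra structure, such as the existence of a saddle point of V; alternating updates of G and D do not guarantee it by themselves.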
I tried to run main.lua using the command "DATA_ROOT=celebA dataset=folder th main.lua", but encountered the following error:
{
ntrain : inf
beta1 : 0.5
name : "experiment1"
niter : 25
batchSize : 64
ndf : 64
fineSize : 64
nz : 100
loadSize : 96
gpu : 1
ngf : 64
dataset : "folder"
lr : 0.0002
noise : "normal"
nThreads : 4
display_id : 10
display : 1
}
Random Seed: 8565
Starting donkey with id: 1 seed: 8566
table: 0x40928780
Starting donkey with id: 2 seed: 8567
table: 0x417e0f70
Starting donkey with id: 3 seed: 8568
table: 0x40945ec0
Starting donkey with id: 4 seed: 8569
table: 0x414d2728
Loading train metadata from cache
Loading train metadata from cache
Loading train metadata from cache
Loading train metadata from cache
Dataset: folder Size: 33436
/usr/local/torch7/install/bin/luajit: main.lua:82: attempt to call method 'apply' (a nil value)
stack traceback:
main.lua:82: in main chunk
[C]: in function 'dofile'
...cal/torch7/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
[C]: at 0x00406640
I am sure Torch is installed correctly. What is the problem? I am looking forward to your answer. Thanks a lot.
I followed the first part of the tutorial steps in the README file.
After downloading the celebA files, I ran the pre-processing and the training.
I didn't change the main.lua or crop_celebA.lua scripts.
After training for all 25 epochs, I ran the generate.lua script with the last model created in checkpoints, named 'experiment1_25_net_G.t7' (the one with the largest number after 'experiment1').
It generated a single PNG image file, but it does not look like the result image in the tutorial.
The result looks more like a generated feature map.
Is it supposed to generate a feature image, or should I run a further step?
How can I view this generated image as a normal image?
I already checked ldisplay, but no image has shown up so far.
Hey - having some issues getting this running, would greatly appreciate some other eyes on this as I'm still young in my deep learning understanding.
It appears that I'm successfully able to generate checkpoints using
DATA_ROOT=500 dataset=folder gpu=1 th main.lua
I'm running a GTX 1080 with CUDA 8.0, and I get a folder full of checkpoints.
But when I try to run generate.lua, I get "unknown Torch class <torch.CudaTensor>" errors.
Any thoughts? Here is what I'm seeing. Any tips would be majorly appreciated!
gpu=1 net=checkpoints/experiment1_10_net_G.t7 th generate.lua
{
gpu : 1
noisemode : "random"
name : "generation1"
noisetype : "normal"
batchSize : 32
net : "checkpoints/experiment1_10_net_G.t7"
imsize : 1
nz : 100
display : 1
}
/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/torch/File.lua:343: unknown Torch class <torch.CudaTensor>
stack traceback:
[C]: in function 'error'
/root/torch/install/share/lua/5.1/torch/File.lua:343: in function 'readObject'
/root/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/root/torch/install/share/lua/5.1/nn/Module.lua:158: in function 'read'
/root/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
/root/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
generate.lua:24: in main chunk
[C]: in function 'dofile'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
Using a GPU-trained model, running generate.lua fails as follows:
gpu=1 net=experiment1_432_net_G.t7 th generate.lua
{
gpu : 1
noisemode : "random"
name : "generation1"
noisetype : "normal"
batchSize : 32
net : "experiment1_432_net_G.t7"
imsize : 1
nz : 100
display : 1
}
/home/hannu/torch/install/bin/luajit: /home/hannu/torch/install/share/lua/5.1/torch/File.lua:343: unknown Torch class <torch.CudaTensor>
stack traceback:
[C]: in function 'error'
/home/hannu/torch/install/share/lua/5.1/torch/File.lua:343: in function 'readObject'
/home/hannu/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/home/hannu/torch/install/share/lua/5.1/nn/Module.lua:158: in function 'read'
/home/hannu/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
/home/hannu/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
generate.lua:24: in main chunk
[C]: in function 'dofile'
...annu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x004065d0
Looking at the code, I noticed that the model is loaded before cunn and cudnn are required. I moved/placed
if opt.gpu > 0 then
require 'cunn'
require 'cudnn'
end
at the beginning and the code now runs without problems.
Newbie question! (Yet again.)
What does the function optimizeInferenceMemory in the utils.lua do exactly? As in, how does it reduce memory overhead/consumption?
So another person and I were trying out dcgan.torch to see how well it would work on image sets more complicated than faces (kudos on writing an implementation much easier to get up and running than the original dcgan-theano, BTW; we really weren't looking forward to figuring out how to get HDF5 image input working, although some details could use work - like, why is nThreads=1 by default?), and I became concerned that 64x64 images were just too small to convey all the details and would lead to a poorly-trained NN.
Experimenting with the options, it seems that one can get dcgan.torch to work with almost the whole image by setting the full image size to be very similar to the crop size: loadSize=65 fineSize=64. Or one could downscale all the images on disk with a command like ls *.jpg | parallel mogrify -resize 65536@. (I am still trying it out, but dcgan appears to make much faster progress when trained on almost-full images at 65x65 than when trained on 64x64 crops of full-resolution images.)
The full image still winds up being extremely low resolution, though. Reading through main.lua and donkey_folder.lua is a little confusing. It looks as if we're supposed to be able to increase the size of trained images by increasing fineSize and also the two parameters governing the size of the base layer of the generator & discriminator NNs, so we thought that using better images would be as simple as loadSize=256 fineSize=255 ngf=255 ndf=255 - load a decent-resolution image, crop it minimally, and feed it into NNs of the same size.
But that doesn't work. In fact, we can't find a setting of fineSize other than 64 which doesn't immediately crash dcgan.torch, regardless of what we set the other options to. Are we misunderstanding the config options' intent, or is there a bug somewhere?
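A possible reading, sketched under assumptions rather than confirmed from the code: the generator printout shown in another issue on this page has a fixed stack of five nn.SpatialFullConvolution layers, and its spatial output size follows from the layer parameters alone. The plain-Lua snippet below applies the usual transposed-convolution size formula with no output adjustment; fullConvOut is a hypothetical helper, and stride 1 and pad 0 are assumed for the first layer because the printout lists only its 4x4 kernel.

```lua
-- Sketch only: output side length of a transposed convolution with no output adjustment.
-- fullConvOut is a hypothetical helper; the layer list is copied from the generator printout.
local function fullConvOut(inSize, kernel, stride, pad)
  return (inSize - 1) * stride - 2 * pad + kernel
end

-- {kernel, stride, pad} for layers (1), (4), (7), (10), (13) of the generator.
local layers = {
  {4, 1, 0},  -- 100 -> 512, 4x4 (stride 1 and pad 0 assumed as defaults)
  {4, 2, 1},  -- 512 -> 256
  {4, 2, 1},  -- 256 -> 128
  {4, 2, 1},  -- 128 -> 64
  {4, 2, 1},  -- 64 -> 3
}

local size = 1  -- the noise vector enters as a 1x1 spatial map
for i, l in ipairs(layers) do
  size = fullConvOut(size, l[1], l[2], l[3])
  print(i, size)
end
-- size is now 64: this stack emits 64x64 images regardless of fineSize.
```

Under that reading, ngf and ndf change the number of feature maps per layer rather than the spatial size, so a different fineSize would need a different number of layers, which may be what the 128x128px fork mentioned in another issue does.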