Comments (4)
When you create a ParallelDatasetIterator, the dataset (in your case, the BatchDataset) is serialized so that it can be copied to the threads. Data that lives in Torch tensors or in tds objects uses shared serialization, so it should not lead to high additional memory usage. Lua objects (strings, tables, etc.) are actually copied, which costs memory.
If you are using LuaJIT, the memory used for these Lua objects counts toward LuaJIT's 1-2 GB memory limit, depending on the platform you're compiling for; see https://kvitajakub.github.io/2016/03/08/luajit-memory-limitations/
Your error message makes me think that this is the memory limit you're hitting here. The best way to resolve this is to find out which Lua tables in your dataset take up most of the memory, and copy those into tds.Vec / tds.Hash objects.
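The advice above (find the large Lua tables, then move them into tds containers) can be sketched roughly as follows. This is a minimal illustration, not code from torchnet: `luaHeapCostKB` and `buildPaths` are hypothetical helpers, the file paths are made up, and the measurement relies on `collectgarbage('count')`, which only reports memory on the Lua heap. The tds part runs only if tds is installed.

```lua
-- Estimate how many KB of Lua-heap memory a constructor function allocates.
-- collectgarbage('count') reports the Lua heap size in KB.
local function luaHeapCostKB(build)
   collectgarbage('collect')
   local before = collectgarbage('count')
   local obj = build()
   collectgarbage('collect')
   local after = collectgarbage('count')
   return after - before, obj
end

-- Hypothetical stand-in for a large list of file paths kept in a Lua table.
local function buildPaths()
   local paths = {}
   for i = 1, 100000 do
      paths[i] = string.format('/data/sample_%06d.t7', i)
   end
   return paths
end

local costKB, paths = luaHeapCostKB(buildPaths)
print(string.format('Lua table: about %.0f KB on the Lua heap', costKB))

-- If tds is available, copy the entries into a tds.Vec, which stores its
-- contents outside the Lua heap.
local ok, tds = pcall(require, 'tds')
if ok then
   local vec = tds.Vec()
   for i = 1, #paths do
      vec:insert(paths[i])
   end
   print(string.format('tds.Vec holds %d entries', #vec))
end
```

Running the same measurement on each table your dataset holds should show which ones are worth converting; once the copy exists, the original Lua table has to be dropped (set to nil) for the memory to be reclaimed.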
from torchnet.
@lvdmaaten Thanks for explaining. I followed the usage of ParallelDatasetIterator in the MNIST example of torchnet:
local function getIterator(mode)
   return tnt.ParallelDatasetIterator{
      nthread = 1,
      init = function() require 'torchnet' end,
      closure = function()
         return tnt.BatchDataset{
            batchsize = 128,
            dataset = tnt.ListDataset(file_list_text, function(fp)
               local data, gt = torch.load(fp)
               return {input=data, target=gt}
            end)
         }
      end,
   }
end
Each file listed in "file_list_text" is a Torch serialization file of 100-200 KB.
When the BatchDataset gets serialized, I don't think it actually loads these data files, so I don't see what could cause the memory consumption here. I tried using tds.Hash instead of "{input=data,target=gt}", but the code won't run:
FATAL THREAD PANIC: (write) /root/torch/install/share/lua/5.1/torch/File.lua:141: Unwritable object at <?>.callback.closure.tds.C
Add require 'tds' to the init closure to resolve those kinds of serialization issues. Note that ParallelDatasetIterator is a relatively thin wrapper on top of torch-threads, so it may be instructive to read up on how torch-threads works: https://github.com/torch/threads
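A rough sketch of that suggestion, under stated assumptions: the path in the error above (`callback.closure.tds.C`) reads as though `tds` was captured as an upvalue of the closure and then serialized, so this version also requires the modules inside the closure rather than referencing outer locals. `fileListPath` is a hypothetical argument, and the layout of each saved file (a table with `data` and `gt` fields) is assumed purely for illustration.

```lua
-- Sketch only: fileListPath and the layout of each saved file are assumptions.
local function getIterator(fileListPath)
   local tnt = require 'torchnet'
   return tnt.ParallelDatasetIterator{
      nthread = 1,
      init = function()
         -- Runs once in each worker thread before the closure.
         require 'torchnet'
         require 'tds'
      end,
      closure = function()
         -- Required here, inside the worker, so that none of these modules
         -- is an upvalue of this closure that would have to be serialized.
         local torch = require 'torch'
         local tnt = require 'torchnet'
         local tds = require 'tds'
         return tnt.BatchDataset{
            batchsize = 128,
            dataset = tnt.ListDataset{
               filename = fileListPath,   -- text file with one path per line
               load = function(fp)
                  local loaded = torch.load(fp)   -- assumed: {data=..., gt=...}
                  local sample = tds.Hash()
                  sample.input = loaded.data
                  sample.target = loaded.gt
                  return sample
               end,
            },
         }
      end,
   }
end
```

Only the string `fileListPath` is captured from the enclosing scope here, which serializes without trouble. Whether BatchDataset batches tds.Hash samples correctly is not confirmed by anything in this thread, so returning a plain `{input=..., target=...}` table from `load` may be the safer choice.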
Adding require 'tds' to the init closure makes no difference.
The ParallelDatasetIterator usage in the MNIST example should be the standard way to work with it, but I just can't get it to work. In my case, once the number of samples increases to 220,000, it fails even with nthread=1. I guess I have to look at other ways, as you suggest. Thanks, @lvdmaaten.