
Comments (4)

lvdmaaten commented on August 13, 2024

So when you create a ParallelDatasetIterator, the dataset (in your case, the BatchDataset) gets serialized in order for it to be copied to the threads. For data that lives in Torch tensors or in tds objects, shared serialization is used; so this should not lead to high additional memory usage. For Lua objects (strings, tables, etc.), an actual copy is performed, which costs memory.

If you are using LuaJIT, the memory used for these Lua objects counts toward LuaJIT's 1-2 GB memory limit; the exact limit depends on the platform you are compiling for. See https://kvitajakub.github.io/2016/03/08/luajit-memory-limitations/
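To get a feel for how much of that limit plain Lua objects consume, one option is to compare `collectgarbage("count")` before and after building a structure. The sketch below is only a rough estimate: the helper name `luaHeapDeltaKB` and the generated file names are made up for illustration, and the 220,000 entries simply mirror the sample count mentioned later in this thread.

```lua
-- Rough estimate of how much Lua-heap memory a constructor allocates.
-- collectgarbage("count") returns the size of the Lua heap in kilobytes;
-- tensor storage and tds containers live outside this heap and are not counted.
local function luaHeapDeltaKB(build)
  collectgarbage("collect")
  local before = collectgarbage("count")
  local obj = build()            -- keep a reference so the result is not collected
  collectgarbage("collect")
  local after = collectgarbage("count")
  return after - before, obj
end

-- Hypothetical stand-in for a large list of file names held in a Lua table.
local function buildFileList()
  local list = {}
  for i = 1, 220000 do
    list[i] = string.format("sample_%06d.t7", i)
  end
  return list
end

local deltaKB, fileList = luaHeapDeltaKB(buildFileList)
print(string.format("%d entries use roughly %.1f MB of Lua heap",
                    #fileList, deltaKB / 1024))
```

Running the same measurement around each large table in a dataset is one way to see which of them contribute most to the Lua heap.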

Your error message makes me think that is the memory limit you're hitting here. The best way to resolve this is to try and find out which Lua tables in your dataset take up most of the memory, and copy those into tds.Vec / tds.Hash objects.
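A minimal sketch of such a copy is shown below, assuming the `tds` package is installed. The helper `toTds` is hypothetical and not part of tds or torchnet; it treats tables with only consecutive integer keys as sequences and everything else as maps, which is an assumption that may not suit every dataset.

```lua
local tds = require 'tds'

-- Recursively copy a plain Lua table into tds containers.
-- Assumption: a non-empty table whose keys are exactly 1..n becomes a tds.Vec;
-- any other table becomes a tds.Hash. Non-table values (numbers, strings,
-- booleans, and userdata such as Torch tensors) are returned unchanged.
local function toTds(value)
  if type(value) ~= 'table' then
    return value
  end
  local n = #value
  local count = 0
  for _ in pairs(value) do count = count + 1 end
  if n > 0 and count == n then
    local vec = tds.Vec()
    for i = 1, n do vec:insert(toTds(value[i])) end
    return vec
  end
  local hash = tds.Hash()
  for k, v in pairs(value) do hash[k] = toTds(v) end
  return hash
end

-- Usage with made-up metadata:
local meta = toTds({ files = {'a.t7', 'b.t7'}, labels = {cat = 1, dog = 2} })
print(#meta.files, meta.labels.dog)
```

The original Lua table has to be released (no remaining references) before the garbage collector can reclaim its memory, so the copy only helps if the dataset keeps the tds version alone.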

from torchnet.

zhouyong64 commented on August 13, 2024

@lvdmaaten Thanks for explaining. I followed the usage of ParallelDatasetIterator in torchnet's mnist example:

local function getIterator(mode)
  return tnt.ParallelDatasetIterator{
    nthread = 1,
    init = function() require 'torchnet' end,
    closure = function()
      return tnt.BatchDataset{
        batchsize = 128,
        dataset = tnt.ListDataset(file_list_text, function(fp)
          local data, gt = torch.load(fp)
          return {input = data, target = gt}
        end)
      }
    end
  }
end

Each file listed in "file_list_text" is a torch serialization file of 100-200 KB.
When the BatchDataset gets serialized, I don't think it actually loads these data files, so I don't see what could cause the memory consumption here. I tried using tds.Hash instead of "{input=data,target=gt}", but then the code won't run:
FATAL THREAD PANIC: (write) /root/torch/install/share/lua/5.1/torch/File.lua:141: Unwritable object at <?>.callback.closure.tds.C


lvdmaaten commented on August 13, 2024

Add require 'tds' to the init closure to resolve those kinds of serialization issues. Note that ParallelDatasetIterator is a relatively thin wrapper on top of torch-threads, so it may be instructive to read up on how torch-threads works: https://github.com/torch/threads


zhouyong64 commented on August 13, 2024

Adding require 'tds' to the init closure makes no difference.
The ParallelDatasetIterator usage in the mnist example should be the standard way to work with it, but I just can't get it to work. In my case, once the number of samples increases to 220,000, it fails even with nthread=1. I guess I will have to look at other approaches, as you suggest. Thanks, lvdmaaten.

