
Comments (9)

andrew-bydlon commented on August 14, 2024

@ejguan: Do you have any suggestions for properly resetting DataLoader2 after each epoch, e.g. with worker_reset_fn?

from data.

Adenialzz commented on August 14, 2024

Hello, I have also seen DL2 memory usage skyrocket, so I have temporarily switched back to DL1. How do I set up datapipe + DL1 for multi-process, multi-GPU training? Do I need to set up distributed sampling in DL1?

andrew-bydlon commented on August 14, 2024

@Adenialzz: To get what I showed above, the setup is more or less the same as for a torch dataset. Replace the dataset with a datapipe.

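from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler
sampler = DistributedSampler(datapipe) if distributed else None
return DataLoader(datapipe, sampler=sampler, num_workers=num_workers, pin_memory=pin_memory, batch_size=batch_size)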

Adenialzz commented on August 14, 2024

DistributedSampler requires my dataset (datapipe) to have a __len__ method, but the length of my datapipe cannot be computed because it is an iterable datapipe. Have you ever run into a problem like this?
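
The failure described here can be illustrated without torch. An index-based sampler has to know how many items exist before it can split indices across ranks, and a stream-style iterable exposes no length. The following is only a sketch in plain Python: `stream` and `indices_for_rank` are hypothetical names made up for illustration, not torch APIs, and the index split is a simplification of what a distributed sampler does.

```python
from typing import Iterator, List


def stream() -> Iterator[int]:
    """Stand-in for an iterable datapipe: yields items, exposes no length."""
    yield from range(10)


def indices_for_rank(dataset_len: int, world_size: int, rank: int) -> List[int]:
    """Simplified index split: rank r takes indices r, r + world_size, ..."""
    return list(range(rank, dataset_len, world_size))


try:
    # len() on a generator fails, so the index split cannot even start.
    indices_for_rank(len(stream()), world_size=2, rank=0)
    length_error = None
except TypeError as exc:
    length_error = exc
```

With a known length the split is trivial, e.g. `indices_for_rank(10, world_size=2, rank=1)` covers the odd indices; without one, the work has to be divided on the stream itself.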

andrew-bydlon commented on August 14, 2024

I'll give it a try today.

Adenialzz commented on August 14, 2024

Thanks, please let me know when you make progress.

andrew-bydlon commented on August 14, 2024

Sorry for the delay @Adenialzz. You are correct that it doesn't work with DDP when the iterable datapipe has no length. I reverted to DL2 despite its notably slower performance, as the slowdown only really occurs at the start of the epoch.

Adenialzz commented on August 14, 2024

Calling torch.utils.data.graph_settings.apply_sharding(datapipe, world_size, rank) seems to solve the problem in my case.
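
For readers unfamiliar with sharding an iterable stream by rank, the idea can be sketched in plain Python. This is not how `apply_sharding` is implemented and it does not call torch at all; `shard_stream` is a hypothetical helper that only shows the round-robin principle of giving each rank every `world_size`-th item, which needs no length.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def shard_stream(items: Iterable[T], world_size: int, rank: int) -> Iterator[T]:
    """Keep every world_size-th item of the stream, starting at offset rank."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    return islice(items, rank, None, world_size)


# Each rank consumes its own copy of the stream and keeps a disjoint slice.
shards: List[List[int]] = [
    list(shard_stream(iter(range(10)), world_size=3, rank=r)) for r in range(3)
]
```

In this sketch rank 0 ends up with four items and the other ranks with three, so shards can be uneven when the stream length is not a multiple of `world_size`; in distributed training that means ranks may run different numbers of steps unless the stream is padded or truncated.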

keunwoochoi commented on August 14, 2024

@Adenialzz Hi, could I ask you for a clarification? How exactly was it used, and which problem did it fix? I'd appreciate it very much.
