dl_hpc_starter_pack's Issues

Having a weird issue where the current working directory is not in the path for dlhpcstarter:

(simplified_22) bracewell-i1 simplified_22$ dlhpcstarter -t cifar10 -c baseline
Initial utilisation on GPU:0 is 0.057.
Initial utilisation on GPU:1 is 0.990.
Initial utilisation on GPU:2 is 0.288.
Initial utilisation on GPU:3 is 0.411.
Traceback (most recent call last):
  File "/scratch2/nic261/environments/simplified_22/bin/dlhpcstarter", line 8, in <module>
    sys.exit(main())
  File "/scratch2/nic261/environments/simplified_22/lib/python3.9/site-packages/dlhpcstarter/__main__.py", line 37, in main
    stages_fnc = importer(definition='stages', module='.'.join(['task', args.task, 'stages']))
  File "/scratch2/nic261/environments/simplified_22/lib/python3.9/site-packages/dlhpcstarter/utils.py", line 46, in importer
    module = importlib.import_module(module)
  File "/apps/python/3.9.4/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'task'
(simplified_22) bracewell-i1 simplified_22$

I.e., it cannot see the task directory, even though it is in the current working directory:

(simplified_22) bracewell-i1 simplified_22$ ls
README.md  jobs.sh  main.py  notes.txt  requirements.txt  task  tools
(simplified_22) bracewell-i1 simplified_22$

@ashgillman thoughts?
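
One possible explanation, offered as an assumption rather than a confirmed diagnosis: when Python runs an installed console script such as bin/dlhpcstarter, the first entry of sys.path is the script's own directory, not the current working directory, whereas `python -m` does add the current directory. The sketch below reproduces the layout with a throwaway task package and shows one workaround. `import_from_cwd` is a hypothetical helper, not part of dlhpcstarter.

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path


def import_from_cwd(module_name):
    """Import a module after making sure the working directory is searched.

    Hypothetical helper for illustration; not part of dlhpcstarter.
    """
    cwd = os.getcwd()
    if cwd not in sys.path:
        # Console scripts do not add the working directory automatically.
        sys.path.insert(0, cwd)
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Build a throwaway project containing task/cifar10/stages.py.
project = Path(tempfile.mkdtemp())
stages_dir = project / "task" / "cifar10"
stages_dir.mkdir(parents=True)
(stages_dir / "stages.py").write_text(
    "def stages(args):\n    return 'ran stages'\n"
)

# Run from the project root, as in the shell session above.
os.chdir(project)
module = import_from_cwd("task.cifar10.stages")
print(module.stages(None))
```

Setting PYTHONPATH to the project root before calling the entry point would likely have the same effect without touching the code.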

Create directory structure for a task

E.g., an entry point could do the following:

dlhpcstarter.create_task cifar10

would create the following:

task
task/cifar10
task/cifar10/model
task/cifar10/config
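
A minimal sketch of what such an entry point might do, assuming the layout listed above. The function name `create_task` and its `root` parameter are illustrative assumptions, not an existing dlhpcstarter API.

```python
import tempfile
from pathlib import Path


def create_task(name, root="."):
    """Create task/<name>/model and task/<name>/config under `root`.

    Hypothetical sketch of the proposed entry point. Returns the
    directories it ensured exist.
    """
    task_dir = Path(root) / "task" / name
    created = []
    for sub in ("model", "config"):
        path = task_dir / sub
        # parents=True also creates task/ and task/<name>;
        # exist_ok=True makes repeated calls harmless.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Demonstrate in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    for path in create_task("cifar10", root=tmp):
        print(path.relative_to(tmp))
```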

More flexibility for the location of `module`, `definition`, and `config`

Allow the user to give a relative or absolute path to the module and definition of a model, so that they do not have to live in task/TASK/model or task/TASK/config.

Maybe have something like the following alternative variables:

module_path
definition_path
config_path

that can be used instead of

module
definition
config
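
One way a path-based variant could work is to load the module directly from a file with the standard library's importlib.util. The sketch below is only an illustration of that idea: `load_definition` is a hypothetical helper, and pairing a `module_path` with a definition name is an assumption about how the proposed variables might be used.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_definition(module_path, definition):
    """Load the object named `definition` from the file at `module_path`.

    Hypothetical helper; accepts relative or absolute paths.
    """
    path = Path(module_path).expanduser().resolve()
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load a module from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return getattr(module, definition)


# Demonstrate with a model file that lives outside any task/ directory.
with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "baseline.py"
    model_file.write_text("class Baseline:\n    name = 'baseline'\n")
    Baseline = load_definition(model_file, "Baseline")
    print(Baseline.name)
```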

Question on Configuring Custom Learning Rate Schedulers and Alternative Optimizers

Hello,

Firstly, I would like to thank the development team for creating the dl_hpc_starter_pack. It has been immensely helpful for kickstarting projects in a high-performance computing context, and the integration with PyTorch Lightning and Hydra for configuration has notably streamlined the process.

I have successfully run the CIFAR10 Baseline model as described in the documentation and have started customizing it for my project's requirements. While I appreciate how easy configuration is, I ran into difficulties when trying to change the learning rate schedule and to use optimizers beyond those in the provided examples.

Learning Rate Scheduling:

I understand from the provided examples that optimizers are configured within the model's configure_optimizers method as seen in the Inheritance and Baseline model scripts. However, I am looking for guidance on integrating custom learning rate schedulers (e.g., cosine annealing) within this setup. Could you provide some insights or a template on how to correctly implement this?

Configuring Other Optimizers:

Similarly, I am interested in trying out different optimizers like RMSprop or Adamax. Is the process as straightforward as substituting the optimizer class in the configure_optimizers method, or are there other considerations (especially with respect to the configuration files or scheduler integration)?

Lastly, I wanted to ensure that my approach aligns with the package's aim to promote rapid development via configuration files and class inheritance. Therefore, any advice on maintaining or enhancing this aspect while introducing the customizations mentioned above would be greatly appreciated.

Thank you for your support and looking forward to your response.

Best regards, Xiwei Deng
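
As a rough illustration of both questions, and not the package's documented approach, the sketch below picks an optimizer class from torch.optim by name and attaches a cosine annealing scheduler. It returns a dictionary in the form that PyTorch Lightning accepts from configure_optimizers. The helper `build_optimizer_and_scheduler` and all hyperparameter values are assumptions made for the example.

```python
import torch
from torch import nn


def build_optimizer_and_scheduler(params, optimizer_name="Adamax",
                                  lr=1e-3, t_max=10, eta_min=0.0):
    """Build an optimizer by name plus a cosine annealing scheduler.

    Hypothetical helper; the name and defaults are placeholders that
    could instead be read from a configuration file.
    """
    optimizer_cls = getattr(torch.optim, optimizer_name)  # e.g. RMSprop
    optimizer = optimizer_cls(params, lr=lr)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=t_max, eta_min=eta_min
    )
    # Dictionary layout accepted by LightningModule.configure_optimizers.
    return {
        "optimizer": optimizer,
        "lr_scheduler": {"scheduler": scheduler, "interval": "epoch"},
    }


# Stand-alone demonstration with a tiny model and no training data.
model = nn.Linear(4, 2)
cfg = build_optimizer_and_scheduler(
    model.parameters(), optimizer_name="RMSprop", lr=0.01, t_max=10
)
optimizer = cfg["optimizer"]
scheduler = cfg["lr_scheduler"]["scheduler"]

for epoch in range(10):
    optimizer.step()   # a real loop would compute a loss and backpropagate
    scheduler.step()   # anneal the learning rate once per epoch
print(optimizer.param_groups[0]["lr"])
```

Inside a LightningModule, configure_optimizers could call such a helper with self.parameters() and return the result, with the optimizer name and schedule length supplied by the configuration file.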
