hpc-utils's Introduction

hpc-utils's People

Contributors

namiyousef

hpc-utils's Issues

Worker Process for cleaning up job outputs

  • Messages to be stored in each project
  • Needs to get project metadata, for each project needs to get the jobs
  • For each job saves a 'message' (e.g. publisher/subscriber) as a text file
  • Acknowledged messages are deleted, logs are saved on the worker
  • Needs to update the job-complete field
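
The bullets above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's implementation: the `metadata.json` file, its `jobs` mapping and `complete` flag, the `messages/` directory, and the `.ack` marker files are all hypothetical names chosen for illustration.

```python
import json
import shutil
from pathlib import Path
from typing import List


def clean_project(project_dir: Path, worker_log_dir: Path) -> List[str]:
    """Process acknowledged job messages for one project.

    Assumed (hypothetical) layout:
      project_dir/metadata.json          {"jobs": {"<job_id>": {"complete": false}}}
      project_dir/messages/<job_id>.txt  message written for the job
      project_dir/messages/<job_id>.ack  empty marker: message acknowledged
    """
    meta_path = project_dir / "metadata.json"
    metadata = json.loads(meta_path.read_text())
    worker_log_dir.mkdir(parents=True, exist_ok=True)

    cleaned = []
    for job_id, job in metadata["jobs"].items():
        message = project_dir / "messages" / f"{job_id}.txt"
        ack = project_dir / "messages" / f"{job_id}.ack"
        if not (message.exists() and ack.exists()):
            continue  # not acknowledged yet: leave the message in place

        # Save a copy on the worker, then delete the message from the project.
        shutil.copy(message, worker_log_dir / f"{project_dir.name}_{job_id}.log")
        message.unlink()
        ack.unlink()

        job["complete"] = True  # update the job-complete field
        cleaned.append(job_id)

    meta_path.write_text(json.dumps(metadata, indent=2))
    return cleaned
```

Using plain files for both messages and acknowledgements keeps the worker stateless: it can be re-run at any time and will only touch jobs whose message has been acknowledged.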

Instructions for using UCL Basic Cluster

Logging In

  1. If working remotely, connect to the UCL VPN
  2. ssh into the relevant node using your College credentials. For GPU jobs, the relevant cluster is Myriad

Setting Up Environments for PyTorch Jobs

To use virtual environments that are compatible with GPU jobs, you must run the following commands before creating your environment and installing packages into it.

module unload compilers mpi
module load compilers/gpu/4.9.2
module load python/3.7.4
module load cuda/10.1.243/gnu-4.9.2
module load cudnn/7.5.0.56/cuda-10.1

This is to ensure that your libraries are compiled correctly to work with the GPU.
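
Once the environment has been built, a quick check along the following lines can confirm whether it sees the GPU. This is only a sketch: it assumes PyTorch is installed in the environment, and the function name `gpu_report` is made up for illustration. On a login node without a GPU it is expected to report that CUDA is not available.

```python
import importlib.util


def gpu_report() -> str:
    """Return a one-line summary of whether PyTorch can use CUDA."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"

    import torch  # imported lazily so the check above can fail gracefully

    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: CUDA not available"
    return (
        f"torch {torch.__version__}: CUDA {torch.version.cuda}, "
        f"device 0 = {torch.cuda.get_device_name(0)}"
    )


if __name__ == "__main__":
    print(gpu_report())
```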

Submitting Jobs

Testing Jobs

Instructions for using UCL CS Cluster

Logging In

  1. If working remotely, make sure that you are on the College VPN
  2. ssh into the relevant node. You can find a full list of these nodes here: https://hpc.cs.ucl.ac.uk/quickstart/

Note: your CS username is different from your College username. It is typically a shortened version of your name, for example 'yousnami'.

Note: logging into the gateway requires two passwords in succession. The first is your College password; the second is your Computer Science account password.

  3. It is recommended that you first ssh into knuckles using the method above, and then into your desired compute node with a second ssh command (this should no longer require any authentication)

Setting Up Environment

Submitting Jobs

Testing

Useful Links

  • CS Cluster Home Page: any of the options here may require further authentication. The username and password are not related to your user account; they can be found in the first email you are sent when you activate the account

Don't like dependency on projects

Currently, the strategy uses the scripts template to fill out the run_job function based on the env vars provided. This script template is tied to the project, which may be stored in a different place. Would like to remove this dependency.

Is there an automatic way of adding environment variables to run.py without causing issues, where run.py lives in the project that you're trying to run on the GPU?

Not looking to fix this for now, but adding the issue in case it arises later. Also need to think about integration tests to make debugging easier.
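
One possible direction, sketched here purely as an illustration and not as the current strategy, is to leave the project's run.py untouched and pass the variables through the process environment when launching it. The helper name `run_with_env` and the `JOB_ID` variable in the usage note are hypothetical.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import Dict


def run_with_env(script: Path, extra_env: Dict[str, str]) -> subprocess.CompletedProcess:
    """Run `script` with `extra_env` layered on top of the current environment."""
    script = script.resolve()          # absolute path, so changing cwd is safe
    env = {**os.environ, **extra_env}  # job-specific values override inherited ones
    return subprocess.run(
        [sys.executable, str(script)],
        env=env,
        cwd=script.parent,    # run from the project directory
        capture_output=True,
        text=True,
        check=True,           # raise CalledProcessError if the script exits non-zero
    )
```

For example, `run_with_env(Path("my_project/run.py"), {"JOB_ID": "42"})` would let run.py read `os.environ["JOB_ID"]` without any template living inside the project. A helper like this is also straightforward to cover with an integration test that runs a tiny throwaway script.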

Troubleshooting

  1. Unable to connect through ssh because of an error in .bashrc

If you add broken code to your .bashrc, you will be unable to connect over ssh: each time you try, .bashrc fails during login and takes the ssh session down with it.

Luckily, there is a hacky solution. Press Ctrl+C after you've ssh'd into the machine, but before the session is able to run .bashrc (with your broken code). Doing so will drop you into the system at an intermediate stage where not everything will be available to you. At this stage, you should fix your .bashrc.

Cluster installation issue?

  • Installations using prepare_virtual_env sometimes fail; need to examine why
  • At the moment there is a known bug with charset-normalizer==2.1.0 not working with requests. Manually downgrade to charset-normalizer==2.0.0
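
A small check like the one below could flag the affected version before a job is submitted. It is a sketch only: the helper names are invented for illustration, and it builds the pip command without running it. On Python 3.7 it assumes the `importlib-metadata` backport is installed.

```python
import sys
from typing import List, Optional

try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7: requires the importlib-metadata backport
    import importlib_metadata as metadata


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def downgrade_command(
    package: str = "charset-normalizer",
    bad: str = "2.1.0",
    good: str = "2.0.0",
) -> Optional[List[str]]:
    """Return a pip command pinning `package` to `good` if `bad` is installed."""
    if installed_version(package) != bad:
        return None  # nothing to do
    return [sys.executable, "-m", "pip", "install", f"{package}=={good}"]


if __name__ == "__main__":
    cmd = downgrade_command()
    print("OK" if cmd is None else "Run: " + " ".join(cmd))
```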
