Ansible playbooks for distributed DGL

Ansible playbooks for the deployment of a cluster for distributed DGL.

Only the following setup is supported and tested:

  • Ubuntu 20.04 LTS on all nodes
  • PyTorch as DGL backend (only CPU)
  • NFS for sharing files between nodes

Make sure that all nodes have Ubuntu installed and are reachable via SSH.

Usage

Define the inventory file

Create a new ansible inventory containing the IP addresses of the master and worker instances:

$ cat > hosts << EOL
[master]
<YOUR_MASTER_IP>

[worker]
<YOUR_WORKER_IP_1>
<YOUR_WORKER_IP_2>
<YOUR_WORKER_IP_3>
EOL
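As a minimal sketch, the same inventory can also be generated from shell variables, which may be convenient when the addresses come from another script. The IP addresses below are placeholders from the 192.0.2.0/24 documentation range and are assumptions, not values used by this project.

```shell
# Placeholder addresses (assumption); replace them with the real IPs of your nodes.
MASTER_IP="192.0.2.10"
WORKER_IPS="192.0.2.11 192.0.2.12 192.0.2.13"

# Write the inventory with the same [master] and [worker] groups as above.
{
  echo "[master]"
  echo "$MASTER_IP"
  echo
  echo "[worker]"
  for ip in $WORKER_IPS; do
    echo "$ip"
  done
} > hosts

# Show the generated inventory for review.
cat hosts

# To confirm that every node is reachable via SSH before running a playbook,
# Ansible's built-in ping module can be used (not executed here):
#   ansible -i hosts all -m ping
```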

Execute the playbooks

To set up the DGL cluster, execute:

$ ansible-playbook -i hosts setup-cluster.yml

To restart the DGL cluster, execute:

$ ansible-playbook -i hosts restart-cluster.yml

To update apt packages or change the DGL/PyTorch version of the DGL cluster, execute:

$ ansible-playbook -i hosts update-cluster.yml

Configuration variables

Variable            Description                                                   Default value
dgl_version         Version of Deep Graph Library (DGL)                           0.9.0
pytorch_version     Version of PyTorch                                            1.12.1
python_version      Version of the Python interpreter                             3.9
workspace           Path of DGL workspace (shared via NFS with all nodes)         /home/ubuntu/workspace
extra_pip_packages  List of Python packages to be additionally installed          ogb, networkx
ansible_user        System username where DGL is installed (SSH access required)  ubuntu
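One way to change these values is an extra-variables file, since variables passed with -e take the highest precedence in Ansible. The sketch below is an illustration only: the file name cluster-vars.yml and the alternative workspace path are hypothetical, and the versions shown are simply the defaults from the table above.

```shell
# Hypothetical override file; the file name and the workspace path are assumptions.
# Version numbers are quoted so that YAML keeps them as strings
# (an unquoted 3.10 would be read as the number 3.1).
cat > cluster-vars.yml << EOL
dgl_version: "0.9.0"
pytorch_version: "1.12.1"
python_version: "3.9"
workspace: /home/ubuntu/dgl-workspace
ansible_user: ubuntu
EOL
```

The file can then be passed to a playbook with the @ prefix, for example: ansible-playbook -i hosts setup-cluster.yml -e "@cluster-vars.yml"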

Advanced setup with SMB/CIFS share

By default, the playbooks set up an NFS share hosted on the master node and share its storage with all workers. To use an external storage provider via SMB/CIFS instead, set the variable use_cifs to true. This option requires the following variables:

Variable        Description
mount_username  Username for SMB/CIFS share
mount_password  Password for SMB/CIFS share
mount_src       Mount source path of SMB/CIFS share

You can define a local secret file to store the variables:

$ cat > .secrets.yml << EOL
use_cifs: true
mount_username: <YOUR_USERNAME>
mount_password: <YOUR_PASSWORD>
mount_src: <YOUR_MOUNT_SOURCE>
EOL

and then launch the playbook with an additional variable file:

$ ansible-playbook -i hosts setup-cluster.yml -e "@.secrets.yml"

Because the username and password are confidential, the playbook does not store them in the fstab file, so you have to remount the share after every reboot.
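A manual remount could look roughly like the sketch below. It rests on two assumptions that the playbooks do not confirm here: that the share is mounted at the default workspace path, and that cifs-utils is installed on the node. The server name and username are placeholders. No password option is given, so mount.cifs prompts for the password and nothing confidential is written to disk.

```shell
# Assumptions: the share is mounted at the default workspace path, and the
# values below are placeholders for your own mount_src and mount_username.
MOUNT_SRC="//fileserver.example.com/dgl"
MOUNT_USERNAME="alice"
WORKSPACE="/home/ubuntu/workspace"

# Build the command without a password option, so that mount.cifs asks for
# the password interactively when the command is run.
remount_cmd() {
  echo "sudo mount -t cifs $MOUNT_SRC $WORKSPACE -o username=$MOUNT_USERNAME"
}

# Print the command for review; run it on each affected node after a reboot.
remount_cmd
```

Depending on your setup, further mount options (for example uid and gid) may be needed so that the DGL user can write to the share.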

Contributors

d-stoll
