
distributed_systems_lab's Introduction

Distributed Systems Lab

Structure of the project

The project is composed of multiple folders to separate code, data and jobs. Each folder can contain sub-folders to further categorize its files.
The main structure of the project is divided into:

Folder   Description
code     Contains all files that can be executed as part of a simulation, or the simulation itself.
data     Contains all input files required by the simulations.
figures  Contains all generated figures that need to be stored.
jobs     Contains all the SLURM jobs that execute simulations on the DAS-5.

Installation guide

The project supports local execution as well as execution on the DAS-5. Other supercomputers or clusters that use the SLURM workload manager might also be compatible, but there are no guarantees. All the code is written with Python 3 in mind, so make sure that is the version you are using.
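
You can check which Python version is currently active with:

python3 --version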

Installing the necessary packages for a local setup is different from installing the packages on the DAS-5, so both are described in their own sections below.

Local instructions

To install all packages locally, you can choose to create a virtual environment to keep everything separate from other environments. Once in the desired environment, install all required packages with pip3:

pip3 install -r requirements.txt
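
For example, a minimal way to create and use such a virtual environment with Python's built-in venv module (the environment name venv is arbitrary):

python3 -m venv venv              # create the virtual environment
source venv/bin/activate          # activate it in the current shell
pip3 install -r requirements.txt  # install the packages inside the environment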

The project makes use of the mpi4py package, which relies on the MPI header files. If you come across an error loading the MPI header files, please make sure that libopenmpi-dev is installed on your system, or simply install the header files with:

sudo apt install libopenmpi-dev

DAS-5 instructions

When installing the packages on the DAS-5, make sure that the Python 3.6.0 and Intel MPI modules are loaded:

module load python/3.6.0
module load intel-mpi/64/5.1.2/150
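
You can verify that both modules are active with the standard environment-modules command:

module list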

Then install all required packages in userspace with pip3:

pip3 install --user -r requirements.txt

It is also possible to partition datasets on the DAS-5. The system uses the KaHIP partitioner, which can easily be installed with: ./manage.sh get_KaHIP followed by ./manage.sh build_KaHIP.

Workflow

Multiple steps are required before simulations can be tested. This section gives a quick overview of the steps to take to get your first results.

  1. Add the requested dataset to the DATASETS array in ./manage.sh
  2. Fetch the datasets via: ./manage.sh get_data
  3. Extract the datasets via: ./manage.sh extract_data
  4. Create a job via: ./manage.sh create_job <variables>
  5. Partition the dataset according to the job (when on the DAS-5) via: ./manage.sh create_partitions <variables>
  6. Run the job via: ./manage.sh run_job <variables>
  7. Compute resulting properties via: ./manage.sh compute_properties <variables>

The sections below will explain each step in more detail.

Fetching datasets

In order to add the requested dataset to the list of available datasets, open manage.sh and add the dataset name to the DATASETS array at the top of the file. All datasets are fetched from the Graphalytics website, so you only need to enter the name of a dataset from that website.
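
As an illustration, the entry could look roughly like the following (the exact syntax of the array in manage.sh may differ; this sketch assumes a plain bash array, with kgs as an example Graphalytics dataset name):

DATASETS=(
    "kgs"    # dataset name as listed on the Graphalytics website
)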

After the datasets are added, the zip archives can be downloaded via:

./manage.sh get_data

and extracted via:

./manage.sh extract_data

Creating jobs

A job can easily be created via the manage.sh script. This is done with the create_job command, which needs multiple extra arguments:

./manage.sh create_job <job_name> <simulation_name> <scale_factor> \
    <dataset_name> <number_of_nodes> <time_in_minutes> <do_stitch> \
    <ring_stitch> <connectivity>

Here is a quick overview of the possible values for these variables:

Variable          Valid type  Description
simulation_name   str         Path to the simulation file in /code/simulations/ that ends in .py.
scale_factor      float       Factor to scale the graph with.
dataset_name      str         Name of the dataset to use.
number_of_nodes   int         The total number of nodes to use during execution.
time_in_minutes   int         How long the job may take at most, in minutes.
do_stitch         bool        Whether the resulting samples should be stitched together.
ring_stitch       bool        Whether the resulting samples should be stitched together using a ring topology. If set to false, a random topology is used.
connectivity      float       The fraction of edges that are added during the stitching phase.

The create_job script checks the validity of all these variables and gives errors accordingly. The script only checks whether the dataset name is provided, not whether the dataset is present on the machine. This is because jobs could be created locally while the dataset is only downloaded on the DAS-5.
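
As an illustration, a hypothetical invocation could look like the following (all values are placeholders; forest_fire.py stands in for an actual simulation file in /code/simulations/, and the exact boolean format expected by the script may differ):

./manage.sh create_job my_first_job forest_fire.py 0.5 kgs 16 30 true true 0.3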

If all tests are passed with no errors, a new job folder is created containing a default SLURM script. The default script is composed of a header and a body. The header contains:

#!/usr/bin/env bash
#SBATCH -J <job_name>
#SBATCH -o jobs/<job_name>/<job_name>.out
#SBATCH --partition=defq
#SBATCH -n <number_of_nodes>
#SBATCH -N <number_of_nodes>
#SBATCH -t <time_in_minutes>
SIMPATH="code/simulations/"
SIMFILE="<simulation_name>"
DATASET="<dataset_name>"
JOBNAME="<job_name>"
SCALE="<scale_factor>"
DO_STITCH="<do_stitch>"
RING_STITCH="<ring_stitch>"
CONN="<connectivity>"

As you can see, the default script header contains placeholder lines for specifying some SLURM variables. These are:

  • -J for the name of the job that will be shown in the queue
  • -o for the output path of the .out file from the SBATCH script
  • -n for the number of tasks that will be spawned. This system uses one task per node, so this value is equal to the number of nodes.
  • -N for the number of machines you want to use for the tasks
  • -t for the time after which the job is shut down by the DAS-5.

--partition=defq makes sure that all reserved nodes are in one cluster. Beneath the SLURM commands, there is a short list of variables that were specified during the create_job process. The default body uses these variables to make sure every simulation on the DAS-5 runs in its own environment. The body of the default script contains:

# Check if the dataset is partitioned correctly for the requested job.
COMP_NODES=$(( SLURM_NTASKS - 1 ))
if [ ! -d "${PWD}/data/${DATASET}/${DATASET}-${COMP_NODES}-partitions" ]; then
    echo "Dataset '${DATASET}' is not partitioned for ${COMP_NODES} Compute Nodes."
    exit 1
fi

# Load modules.
module load python/3.6.0
module load intel-mpi/64/5.1.2/150

# Define paths for the job to work with.
RUNDIR="/var/scratch/${USER}/${JOBNAME}"
TMP_DATA="${RUNDIR}/data"
TMP_RES="${RUNDIR}/results"
TMP_PLAY="${RUNDIR}/playground"

# Create directories for the playground, data and results on the TMP partition.
mkdir -p "${RUNDIR}"
mkdir -p "${TMP_DATA}"
mkdir -p "${TMP_RES}"
mkdir -p "${TMP_PLAY}"

# Copy Vertex and Partitions data to TMP partition.
mkdir -p "${TMP_DATA}/${DATASET}/"
cp "${PWD}/data/${DATASET}/${DATASET}.v" -t "${TMP_DATA}/${DATASET}/"
cp -r "${PWD}/data/${DATASET}/${DATASET}-${COMP_NODES}-partitions/" \
    -t "${TMP_DATA}/${DATASET}/"

#  Copy existing results to TMP partition.
cp -r "${PWD}/jobs/${JOBNAME}/results/." -t "${TMP_RES}/."

# Run simulation.
srun -n "${SLURM_NTASKS}" --mpi=pmi2 python3 "code/run_simulation.py" \
    "${SIMPATH}${SIMFILE}" "${SCALE}" "${DATASET}" "${DO_STITCH}" \
    "${RING_STITCH}" "${CONN}" "${TMP_PLAY}" "${TMP_DATA}" "${TMP_RES}"

# Compute properties of the resulting graph and copy those to HOME partition.
./manage.sh compute_properties "${JOBNAME}"
cp "${TMP_RES}/scaled_graph_properties.json" -t "${PWD}/jobs/${JOBNAME}/results/."

First, the job checks if the dataset is partitioned correctly for the use of this job. The number of partitions should be equal to the number of nodes minus one, as one node is used as the head node during execution.

Second, the Python and MPI modules are loaded, which are needed for working with mpi4py. When running a job on the DAS-5, all work is done on a SCRATCH partition, which is mounted under /var/scratch/$USER/. After the modules are loaded, three folders are created within a folder that is used solely by the job. The folders are:

  • /data for storing all .v and .e files from the specified dataset.
  • /results for storing the simulation's results.
  • /playground for saving temporary files during execution.

After these directories are created, the data files and any existing results on the HOME partition are copied over. Then the simulation is executed. In order to run a program on multiple nodes, the srun command is used. This command utilizes the variables set within the header of the script. When the job is done, the statistics are computed and copied to the results folder on the HOME partition.

Creating partitions

Partitions can easily be created using the manage.sh script via:

./manage.sh create_partitions <dataset> <number_of_partitions>

The <dataset> variable takes the name of a dataset that needs to be partitioned. The <number_of_partitions> variable is the number of partitions that are created in the end. Note that this has to be equal to the number of compute nodes used by the created job.
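
For example, a job that uses 16 nodes has 15 compute nodes (one node acts as the head node), so the kgs example dataset would be partitioned with a call along these lines:

./manage.sh create_partitions kgs 15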

In order to run the create_partitions script, the KaHIP partitioning algorithm needs to be installed. The code for KaHIP can be fetched via:

./manage.sh get_KaHIP

and built via:

./manage.sh build_KaHIP

Running jobs

A job can be run locally or on the DAS-5. Both execute a previously created job file that is located in the jobs folder. The execution is triggered by the manage.sh script. This is done with the run_job command for running on the DAS-5 or run_local for running locally. Both commands need the job name as the second argument:

./manage.sh run_job <job_name>
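
or locally via:

./manage.sh run_local <job_name>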

The script checks if the specified job exists and gives an error if this is not the case. Otherwise, the specified job is executed. If the job is executed on the DAS-5, it is placed in the job queue.

Your current queued jobs can be listed with:

squeue -u $USER

The whole queue is presented if the -u $USER flag is not provided.

Computing properties

The simulations result in a scaled graph, which is represented by the scaled_graph.v and scaled_graph.e files in the results folder. These files are used as input when computing properties of the resulting graph. This is done via the compute_properties command of the manage.sh script. The command takes paths to the two files as arguments:

./manage.sh compute_properties <path_to_v_file> <path_to_e_file>

The output of the properties script is a file with the properties in JSON format. This file is called scaled_graph_properties.json and is stored in the same directory as the vertex and edge files.
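
For example, for a finished job the scaled graph typically ends up in the job's results folder, so a call could look like this (my_first_job is a placeholder job name, and the paths follow the two-file form shown above):

./manage.sh compute_properties jobs/my_first_job/results/scaled_graph.v \
    jobs/my_first_job/results/scaled_graph.e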

distributed_systems_lab's People

Contributors: bigdgw, estsaon, okkevaneck, tmonster

Forkers: estsaon

distributed_systems_lab's Issues

Make a Head Node Class

The Head Node is responsible for:

  • Stitching together all graphs sampled by the compute nodes
  • Listening for heartbeats
    • Sending a kill signal to the compute nodes to stop burning
  • Writing the resulting graph to a file

Fix graph interpretation.

We should read two graph files: a .v file and a .e file.

This requires changes to the graph interpreter and Graph.add_vertex_and_neighbor.

Implement MPI communication in compute node classes

Issue to implement MPI communication in the compute node classes:

  • Implement a thread to listen for communication from other compute nodes
  • Implement a thread to listen for communication from the HEAD node telling the compute node to stop
  • Implement a function to send heartbeats to the head node that communicate how many nodes have been sampled by the compute node
  • Implement a function to listen for restart calls

Compute Node might not close correctly

When a ComputeNode is blocking on a send (heartbeat) after the head node has sent a kill request, the head will never accept that send, so the ComputeNode won't stop by itself.

Implement functionality to run simulations locally

  • Manage differences in environment variables (i.e. a "LOCAL" variable)
  • Create all the compute node classes for each simulated node
  • Run each node instance in its own thread
  • Handle communication between compute nodes with MPI?

Add a decorator to measure the execution time of certain functions

Multiple functions need the decorator.

graph.spread should probably be measured.

  • List how many nodes have been burned per X time units.

  • The compute node should measure how many messages are sent and how many are received.

  • The head node should keep track of when heartbeats are received and how many nodes are burned per heartbeat.

OSError: [Errno 122] Disk quota exceeded

Current data storage structure:

data/
| ---- kgs/
|        | ---- kgs-2-partitions/
|        |        | ---- node1.e
|        |        | ---- node1.p
|        |        | ---- node2.e
|        |        | ---- node2.p
...
|        | ---- kgs-16-partitions/
|        |        | ---- node1.e
|        |        | ---- node1.p
...
|        |        | ---- node16.e
|        |        | ---- node16.p
Update:
Memory is still a problem, especially when partitioning a large graph (like the S scale in the LDBC datasets).

develop upscaling algorithm

To upscale x2:

Run the downscaling-to-0.5 algorithm 4 times, then stitch together all 0.5 samples.

The head node sends a restart command when a downscaling iteration is done (i.e. 0.5 of the nodes are collected).

Halted forest fire script vs wild fire script

Compute nodes are responsible for sending messages to other compute nodes.

In a halted forest fire there are a few ideas that need to be implemented:

  1. Don't pass messages
  2. Read from assigned partition files <- might be its own task
  3. Run the algorithm on the assigned graph
  4. Parse scale-up vs. scale-down calls, and the percentage of the scale up/down call
