
unicorn's Introduction

DOI

Artifact evaluation (EuroSys 2022)

This artifact was awarded the Available, Functional, and Reproducible Badges.

For detailed instructions to reproduce our results, please see the functionality and reproducibility guides.

Reviews and Rebuttal

EuroSys'21 (first submission, rejected) -> FSE'21 (second submission, rejected) -> EuroSys'22 (third submission, accepted)

We benefited greatly from the previous rejections of this work; therefore, to help other researchers in the systems community, we release all reviews and rebuttals for the Unicorn paper and its associated artifact:

Unicorn

EuroSys 2022 Title: Reasoning about Configurable System Performance through the lens of Causality
Md Shahriar Iqbal, Rahul Krishna, Mohammad Ali Javidian, Baishakhi Ray, and Pooyan Jamshidi

Unicorn is a performance analysis, debugging, and optimization tool designed for highly configurable systems with causal reasoning and inference. Users or developers can query Unicorn to resolve a performance issue or optimize performance.

Overview

overview

Abstract

Modern computer systems are highly configurable, with the total variability space sometimes larger than the number of atoms in the universe. Understanding and reasoning about the performance behavior of highly configurable systems due to a vast variability space is challenging. State-of-the-art methods for performance modeling and analyses rely on predictive machine learning models; therefore, they become (i) unreliable in unseen environments (e.g., different hardware, workloads) and (ii) produce incorrect explanations. To this end, we propose a new method, called Unicorn, which (i) captures intricate interactions between configuration options across the software-hardware stack and (ii) describes how such interactions impact performance variations via causal inference. We evaluated Unicorn on six highly configurable systems, including three on-device machine learning systems, a video encoder, a database management system, and a data analytics pipeline. The experimental results indicate that Unicorn outperforms state-of-the-art performance optimization and debugging methods. Furthermore, unlike the existing methods, the learned causal performance models reliably predict performance for new environments.

How to use Unicorn

Unicorn is used for performing tasks such as performance optimization and performance debugging in offline and online modes.

  • Offline mode: Unicorn can run on any device, using previously measured configurations.
  • Online mode: Measurements are collected directly from NVIDIA Jetson Xavier, NVIDIA Jetson TX2, and NVIDIA Jetson TX1 devices while the experiments are running. Collecting measurements from these devices requires sudo privileges, since the device must be set to a new configuration before each measurement.

Unicorn can be used for debugging and optimization of objectives such as latency (inference_time) and energy (total_energy_consumption) in both offline and online modes. Unicorn has been implemented on six software systems: DEEPSTREAM (Deepstream), XCEPTION (Image), BERT (NLP), DEEPSPEECH (Speech), X264 (x264), and SQLITE (sqlite), where the name in parentheses is the value passed to the -s option.

Setup

To get started, you'll need docker and docker-compose. On desktop systems such as Docker Desktop for Mac and Windows, Docker Compose is included as part of the desktop install. You can get it here: https://docs.docker.com/desktop/mac/install/.

NOTE: We'll be using docker-compose, and all docker-compose commands must be run from within the repository's root folder.

  1. First clone this repository, and cd into the repository:

    git clone [email protected]:softsys4ai/unicorn.git
    cd unicorn
  2. Next, build the artifact with docker-compose. From the repository root, run:

    docker-compose up --build --detach

    You'll see the following output:

    โฏ docker-compose up --build --detach
     Building unicorn
     [+] Building 1.6s (16/16) FINISHED
     => [internal] load build definition from Dockerfile                                                          0.0s
     => => transferring dockerfile: 609B                                                                          0.0s
     => [internal] load .dockerignore                                                                             0.0s
     => => transferring context: 2B                                                                               0.0s
     => [internal] load metadata for docker.io/library/python:3.6.2                                               0.0s
     => [ 1/12] FROM docker.io/library/python:3.6.2                                                               0.0s
     => CACHED [ 2/12] RUN pip install --upgrade pip                                                              0.0s
     => CACHED [ 3/12] RUN pip install -U numpy                                                                   0.0s
     => CACHED [ 4/12] RUN pip install -U pandas                                                                  0.0s
     => CACHED [ 5/12] RUN pip install -U javabridge                                                              0.0s
     => CACHED [ 6/12] RUN pip install -U pydot                                                                   0.0s
     => CACHED [ 7/12] RUN pip install -U graphviz                                                                0.0s
     => CACHED [ 8/12] RUN pip install git+git://github.com/bd2kccd/py-causal                                     0.0s
     => CACHED [ 9/12] RUN pip install git+git://github.com/fmfn/BayesianOptimization                             0.0s
     => CACHED [10/12] RUN pip install scipy matplotlib     seaborn networkx causalgraphicalmodels     causalnex  0.0s
     => [11/12] RUN pip install pyyaml                                                                            1.4s
     => [12/12] WORKDIR /root                                                                                     0.0s
     => exporting to image                                                                                        0.1s
     => => exporting layers                                                                                       0.0s
     => => writing image sha256:6c803cd540fc03ac0535a24571769063651c6c0ddd0e1f24fb1241fb9277dc56                  0.0s
     => => naming to docker.io/library/unicorn_unicorn                                                            0.0s
    
     Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them
     Creating unicorn ... done
    

Debugging

Unicorn supports debugging and fixing single-objective and multi-objective performance faults in offline and online modes. It also supports root cause analysis of these fixes using metrics such as accuracy, precision, recall, and gain.
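As a rough sketch of how two of these metrics could be computed for a recommended fix (a hypothetical helper, not Unicorn's actual evaluation code; it assumes the ground-truth and recommended fixes are sets of configuration options):

```python
def debug_metrics(true_fixes, recommended_fixes):
    """Precision and recall of a recommended fix against the ground truth
    (hypothetical helper; Unicorn's own evaluation may differ)."""
    true_fixes = set(true_fixes)
    recommended_fixes = set(recommended_fixes)
    tp = len(true_fixes & recommended_fixes)   # options correctly recommended
    fp = len(recommended_fixes - true_fixes)   # options recommended in error
    fn = len(true_fixes - recommended_fixes)   # faulty options that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```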

Single-objective debugging

To debug single-objective faults using Unicorn, please use the following command:

docker-compose exec unicorn python ./tests/run_unicorn_debug.py  -o objective -s softwaresystem -k hardwaresystem -m mode

Example

To debug single-objective latency faults for Xception in JETSON TX2 in the offline mode using Unicorn, please use the following command:

docker-compose exec unicorn python ./tests/run_unicorn_debug.py  -o inference_time -s Image -k TX2 -m offline

To debug single-objective energy faults for Bert in JETSON Xavier in the online mode using Unicorn, please use the following command:

docker-compose exec unicorn python ./tests/run_unicorn_debug.py  -o total_energy_consumption -s NLP -k Xavier -m online

Multi-objective debugging

To debug multi-objective faults using Unicorn, please use the following command:

docker-compose exec unicorn python ./tests/run_unicorn_debug.py  -o objective1 -o objective2 -s softwaresystem -k hardwaresystem -m mode

Example

To debug multi-objective latency and energy faults for Deepspeech in JETSON TX2 in the offline mode using Unicorn, please use the following command:

docker-compose exec unicorn python ./tests/run_unicorn_debug.py  -o inference_time -o total_energy_consumption -s Speech  -k TX2 -m offline

Optimization

Unicorn supports single-objective and multi-objective optimization in offline and online modes.

Single-objective optimization

To run single-objective optimization using Unicorn, please use the following command:

docker-compose exec unicorn python ./tests/run_unicorn_optimization.py  -o objective -s softwaresystem -k hardwaresystem -m mode

Example

To run single-objective latency optimization for Xception in JETSON TX2 in the offline mode using Unicorn, please use the following command:

docker-compose exec unicorn python ./tests/run_unicorn_optimization.py  -o inference_time -s Image -k TX2 -m offline

To run single-objective energy optimization for Bert in JETSON Xavier in the online mode using Unicorn, please use the following command:

docker-compose exec unicorn python ./tests/run_unicorn_optimization.py  -o total_energy_consumption -s NLP -k Xavier -m online

Multi-objective optimization

To run multi-objective optimization using Unicorn, please use the following command:

docker-compose exec unicorn python ./tests/run_unicorn_optimization.py  -o objective1 -o objective2 -s softwaresystem -k hardwaresystem -m mode

Example

To run multi-objective latency and energy optimization for Deepspeech in JETSON TX2 in the offline mode using Unicorn, please use the following command:

docker-compose exec unicorn python ./tests/run_unicorn_optimization.py  -o inference_time -o total_energy_consumption -s Speech  -k TX2 -m offline

Transferability

Unicorn supports both single-objective and multi-objective transferability in online and offline modes. However, the current version has not been tested for multi-objective transferability. To determine the single-objective transferability of Unicorn, please use the following command:

docker-compose exec unicorn python ./tests/run_unicorn_transferability.py  -o objective -s softwaresystem -k hardwaresystem -m mode

Example

To run single-objective latency transferability for Xception in JETSON TX2 in the offline mode using Unicorn, please use the following command:

docker-compose exec unicorn python ./tests/run_unicorn_transferability.py  -o inference_time -s Image -k TX2 -m offline

To run single-objective energy transferability for Bert in JETSON Xavier in the offline mode using Unicorn, please use the following command:

docker-compose exec unicorn python ./tests/run_unicorn_transferability.py  -o total_energy_consumption -s NLP -k Xavier -m offline

Data generation

To run experiments on NVIDIA Jetson Xavier, NVIDIA Jetson TX2, and NVIDIA Jetson TX1 devices for a particular software system, a flask app must first be launched. Please use the first command to start the app on localhost. Once the app is up and running, please use the second command to start measuring configurations.

docker-compose exec unicorn python ./services/run_service.py softwaresystem
docker-compose exec unicorn python ./services/run_params.py softwaresystem

Example

To initialize a flask app with Xception software system, please use:

docker-compose exec unicorn python ./services/run_service.py Image

Once the flask app is running and the modelserver is ready, then please use the following command to collect performance measurements for different configurations:

docker-compose exec unicorn python ./services/run_params.py Image

Baselines

Instructions to run the debugging and optimization baselines used in Unicorn are described in baselines.

Unicorn usage with different datasets

Instructions to use Unicorn with a different dataset are described in others.

Docker teardown

After experimentation, consider stopping and removing any docker-related caches.

echo "Stops any running docker compose services, and removes related caches"
docker-compose rm -fsv

How to cite

If you use Unicorn in your research or the dataset in this repository please cite the following:

@inproceedings{iqbal2022unicorn,
  title={Unicorn: Reasoning about Configurable System Performance through the lens of Causality},
  author={Iqbal, Md Shahriar and Krishna, Rahul and Javidian, Mohammad Ali and Ray, Baishakhi and Jamshidi, Pooyan},
  booktitle={EuroSys '22: Proceedings of the Seventeenth European Conference on Computer Systems},
  year={2022}
}

Contacts

Please feel free to contact us via email if you find any issues or have any feedback. Thank you for using Unicorn.

Name Email
Md Shahriar Iqbal [email protected]
Rahul Krishna [email protected]
Pooyan Jamshidi [email protected]

📘 License

Unicorn is released under the terms of the MIT License.


unicorn's Issues

Questions regarding offline mode and entropy-based orientation

Dear experts of Unicorn,

Thanks a lot for open-sourcing this excellent research. I have read your EuroSys paper and learned a lot! I just have two questions regarding the codebase due to my lack of knowledge:

  1. If I understand correctly, the offline mode of Unicorn debugging experiments cannot be reproduced for all the test scenarios (i.e., hardware+software). In particular, I can only find the measurement.json file under the Single Objective, Image folders of TX2 and Xavier. In the other debug directories, there is only the data.csv file. Could you give me some hints to reproduce the offline experiments for the other scenarios?
  2. I am really interested in the entropy-based edge orientation approach, detailed in Sec. 4 of the EuroSys paper (i.e., resolving partially directed edges). However, I failed to find the corresponding algorithm in the codebase. For instance, in the resolve_edges method of causal_model.py, the edges seem to be oriented based on fixed rules without involving entropies.
    # replace trail and undirected edges with single edges using entropic policy
    for i in range (len(PAG)):
        if trail_edge in PAG[i]:
            PAG[i]=PAG[i].replace(trail_edge, directed_edge)
        elif undirected_edge in PAG[i]:
                PAG[i]=PAG[i].replace(undirected_edge, directed_edge)
        else:
            continue

    for edge in PAG:
        cur = edge.split(" ")
        if cur[1]==directed_edge:
            node_one = self.colmap[int(cur[0].replace("X", ""))-1]               
            node_two = self.colmap[int(cur[2].replace("X", ""))-1]
            options[node_one][directed_edge].append(node_two)
        elif cur[1]==bi_edge:
            node_one = self.colmap[int(cur[0].replace("X", ""))-1]
            node_two = self.colmap[int(cur[2].replace("X", ""))-1]
            
            options[node_one][bi_edge].append(node_two)
        else: print ("[ERROR]: unexpected edges")

I did find a function that computed the entropy for the EnCore method in the debugging_based.py. But maybe it is not the same. Could you point me to the right location of the entropy-based method that orients the undetermined edges of FCI? Thanks in advance!

Best regards,
Tianzhu

Update Causal Structure Learning Algorithm.

-- Use FCI with the entropic approach to resolving edges.
-- Breakdown computation efforts required for causal structure discovery, computing path causal effects, computing individual treatment effect, and measuring recommended configurations.

Run MLPerf benchmark with Facebook DLRM.

Run MLPerf Benchmark with Facebook DLRM on different hardware (Jetson Xavier and TX2, Possibly on GPU cloud). Change software (RMC1, RMC2, and RMC3) and change workload (single stream, multi-stream and offline, varying number of queries for inference.)

Policies for handling edge-type mismatches

What are the policies?

  • Bi-directed & no-edge → take a confidence score; whichever edge direction has the highest confidence, use that direction.
  • Un-directed edge & no-edge → no edge
  • Tail has a bubble and head has an arrow → keep the directed edge and remove the bubble
  • No-edge & edge → edge
  • No-edge & no-edge → no-edge

What do the edge types mean?

Bubble/un-directed edge - selection variables
Bi-directed edge - hidden variables

When are the policies applied?

  1. Case 1: Greedy: apply the above rules at every step.
    • At each iteration there is a DAG (say DAG_t, DAG_t-1, ...)
    • If there are conflicts, keep counts of how many times each edge a->b, b->a, or a--b appears, and use the one with the max count.
  2. Case 2: Apply the rules once at the end.
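A minimal sketch of how these merge policies and the greedy count-based conflict resolution might look in code (the edge encodings, function names, and the `confidence` argument are illustrative assumptions, not Unicorn's actual implementation):

```python
from collections import Counter

# Illustrative edge encodings: "->" directed, "<->" bi-directed,
# "--" undirected, None for no edge.
def merge_edge(edge_a, edge_b, confidence=None):
    """Resolve a mismatch between two candidate edge types using the
    policies listed above (illustrative only)."""
    if edge_a is None and edge_b is None:        # no-edge & no-edge -> no-edge
        return None
    if edge_a is None or edge_b is None:         # exactly one side has an edge
        present = edge_a or edge_b
        if present == "--":                      # un-directed & no-edge -> no edge
            return None
        if present == "<->" and confidence:      # bi-directed & no-edge: pick the
            return max(confidence, key=confidence.get)  # highest-confidence direction
        return present                           # no-edge & edge -> edge
    # Both sides have an edge (e.g. bubble tail, arrow head): keep it directed.
    return edge_a if edge_a == edge_b else "->"

def resolve_conflicts(edge_votes):
    """Greedy policy (Case 1): for each node pair, keep the orientation
    that occurred most often across iterations."""
    return {pair: votes.most_common(1)[0][0] for pair, votes in edge_votes.items()}
```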

Run Scalability experiments with Facebook DLRM systems.

--- Performance analysis of the Facebook DLRM systems with different configurations. Show how difficult it is to debug for misconfigurations in real-world production systems and discuss challenges. Discuss the richness in performance landscape (more complex behavior).
--- Run CAUPER, BugDoc, SMAC, DeltaDebugging, Encore, and CBI on the DLRM fault dataset and evaluate using the ground truth dataset for both single and multi-objective performance faults.
--- Show proof of scalability of CAUPER in Facebook DLRM system with a high number of allowable values taken by different configuration options.
--- Write about the evaluation of Facebook DLRM systems. Analyze by 3 slices of latency, energy and heat.

Evaluation of Source Environments

Need to determine the transfer learning pipeline. Determine the following:
--- How good is the source modeling?
--- How much update is needed?
--- Explainability (what are the changes across environments)
--- Experiments with different source budgets

Run MLPerf benchmark for NLP.

Run MLPerf Benchmark with Google BERT + SQuAD 1.1 dataset on different hardware (Jetson Xavier and TX2, Possibly on GPU cloud)

Structure Learning

Enrich the causal models with Functional Causal Model (FCM) using CGNN and work with visualization for FCM
Update causal model with Causal Interaction model and compare with CGNN.
Comparison of CGNN, FCI (entropic calculation), and Causal Interaction model.
If we use CGNN need to find the correct strategy -
--- how to find the initial skeleton?

Bootstrap sampling

  1. Sample with replacement X% of the data
  2. Build a causal model
  3. Sample another X% (with replacement) and build a causal model
    - How to update the edges' weight thresholds?
    - Keep edges between two nodes when they appear > Y% of the times and don't violate constraints
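The steps above can be sketched roughly as follows (a simplified illustration; `learn_edges` is a stand-in for the actual structure learner, e.g. FCI, and the constraint check from the last step is omitted):

```python
import random
from collections import Counter

def bootstrap_edges(data, learn_edges, rounds=10, frac=0.8, threshold=0.5, seed=0):
    """Bootstrap causal-structure sampling (illustrative sketch):
    repeatedly resample frac*len(data) rows with replacement, learn a
    causal model on each sample, and keep only the edges that appear in
    more than `threshold` of the rounds."""
    rng = random.Random(seed)
    counts = Counter()
    n = max(1, int(frac * len(data)))
    for _ in range(rounds):
        sample = [rng.choice(data) for _ in range(n)]  # sample with replacement
        counts.update(learn_edges(sample))             # edges from this round's model
    return {edge for edge, c in counts.items() if c / rounds > threshold}
```

Here `threshold` plays the role of the Y% cutoff; constraint violations would additionally be filtered out before an edge is kept.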

