Intro for OpenFederatedLearning
- FATE (Federated AI Technology Enabler), WeBank FinTech, https://github.com/OpenFederatedLearning/FATE https://github.com/OpenFederatedLearning/eggroll
- FedML, USC, https://github.com/OpenFederatedLearning/FedML
- Fedlearner, Bytedance, https://github.com/OpenFederatedLearning/fedlearner
- Harmonia, AI Labs Taiwan, https://github.com/OpenFederatedLearning/harmonia
- PaddleFL, Baidu, https://github.com/OpenFederatedLearning/PaddleFL
- PySyft, OpenMined, https://github.com/OpenFederatedLearning/PySyft
- TensorFlow Federated, Google, https://github.com/OpenFederatedLearning/federated
- 9NFL, Jingdong, https://github.com/OpenFederatedLearning/9nfl
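The frameworks above all build on some variant of federated averaging (FedAvg): clients train on their private data and send only model updates to a server, which aggregates them. A minimal NumPy sketch of the idea (illustrative least-squares loss, one local step per round for brevity; `local_update` and `fed_avg` are hypothetical names, not the API of any listed framework):

```python
import numpy as np

def local_update(weights, data, lr=0.1):
    # One gradient step on a client's private data
    # (least-squares loss, purely illustrative).
    X, y = data
    grad = X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def fed_avg(global_weights, client_datasets, rounds=10):
    # Each round: clients train locally, the server averages the
    # resulting weights; raw data never leaves the clients.
    w = global_weights
    for _ in range(rounds):
        client_weights = [local_update(w, d) for d in client_datasets]
        w = np.mean(client_weights, axis=0)
    return w
```

Real systems add multiple local epochs, client sampling, and secure aggregation on top of this loop.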
Privacy-preserving techniques allow deriving value from sensitive personal information while protecting users' privacy. Below are the techniques commonly used in federated learning:
- Anonymization (weakest)
- Differential Privacy
- Fully Homomorphic Encryption (FHE)
- Secure Multi-Party Computation (MPC)
- Zero-Knowledge Proofs (e.g. as used in blockchains)
- Confidential Computing through TEE (e.g. SGX and TrustZone)
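As a small illustration of differential privacy from the list above: the classic Laplace mechanism releases a numeric query result with noise scaled to the query's sensitivity divided by the privacy budget ε. A minimal sketch (the function name `laplace_mechanism` is illustrative, not from any particular library):

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    # Adds Laplace noise with scale sensitivity/epsilon: the classic
    # epsilon-differentially-private release of a numeric query.
    if rng is None:
        rng = np.random.default_rng()
    scale = sensitivity / epsilon
    return true_value + rng.laplace(0.0, scale)

# Example: privately release a count query. A counting query has
# sensitivity 1, since adding or removing one record changes the
# count by at most 1.
ages = [23, 35, 41, 29, 52]
noisy_count = laplace_mechanism(len(ages), sensitivity=1, epsilon=0.5)
```

Smaller ε means stronger privacy but more noise; in federated learning this mechanism is typically applied to model updates rather than raw queries.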
- Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs NSDI'23
- TopoOpt: Optimizing the Network Topology for Distributed DNN Training NSDI'23
- Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression ASPLOS'23
- Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training arXiv
- Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training NSDI'23
- ModelKeeper: Accelerating DNN Training via Automated Training Warmup NSDI'23
- HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework VLDB'22
- Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning OSDI'22
- FastMoE: A Fast Mixture-of-Expert Training System arXiv:2103.13262
- λDNN: Achieving Predictable Distributed DNN Training with Serverless Architectures TC'21
- STRONGHOLD: Fast and Affordable Billion-scale Deep Learning Model Training SC'22
- AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness NeurIPS'22
- Whale: Efficient Giant Model Training over Heterogeneous GPUs ATC'22
- Out-of-order Backprop: An Effective Scheduling Technique for Deep Learning EuroSys'22
- Varuna: Scalable, Low-cost Training of Massive Deep Learning Models EuroSys'22
- Megatron-LM SC'21
- Chimera: Efficiently Training Large-scale Neural Networks with Bidirectional Pipelines SC'21
- Piper: Multidimensional Planner for DNN Parallelization NeurIPS'21
- Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training
- PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models ICML'21
- DAPPLE: An Efficient Pipelined Data Parallel Approach for Large Models Training PPoPP'21
- TeraPipe: Large-Scale Language Modeling with Pipeline Parallelism ICML'21
- PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications OSDI'20
- KungFu OSDI'20
- A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters OSDI'20
- GeePS: Scalable Deep Learning on Distributed GPUs with a GPU-Specialized Parameter Server EuroSys'16
- SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
- Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access EuroSys'23
- Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs ASPLOS'23
- MPCFormer: Fast, Performant, and Private Transformer Inference with MPC ICLR'23
- High-throughput Generative Inference of Large Language Models with a Single GPU
- Cocktail: A Multidimensional Optimization for Model Serving in Cloud NSDI'22
- Serving Heterogeneous Machine Learning Models on Multi-GPU Servers with Spatio-Temporal Sharing ATC'22
- Abacus SC'21
- Serving DNNs like Clockwork: Performance Predictability from the Bottom Up OSDI'20
- Exploiting Cloud Services for Cost-Effective, SLO-Aware Machine Learning Inference Serving ATC'19
- Nexus: A GPU Cluster Engine for Accelerating DNN-based Video Analysis SOSP'19
- MegaBlocks: Efficient Sparse Training with Mixture-of-Experts MLSys'23
- AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers
- ElasticFlow: An Elastic Serverless Training Platform for Distributed Deep Learning ASPLOS'23
- Lucid: A Non-Intrusive, Scalable and Interpretable Scheduler for Deep Learning Training Jobs ASPLOS'23
- Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning NSDI'23
- Multi-Resource Interleaving for Deep Learning Training SIGCOMM'22
- Synergy: Looking Beyond GPUs for DNN Scheduling on Multi-Tenant Clusters OSDI'22
- Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning OSDI'21
- Chronus: A Novel Deadline-aware Scheduler for Deep Learning Training Jobs SoCC'21
- Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads OSDI'20
- Spada: Accelerating Sparse Matrix Multiplication with Adaptive Dataflow ASPLOS'23
- MISO: Exploiting Multi-Instance GPU Capability on Multi-Tenant GPU Clusters SoCC'22
- AccPar: Tensor Partitioning for Heterogeneous Deep Learning Accelerators HPCA'20
- iGniter: Interference-Aware GPU Resource Provisioning for Predictable DNN Inference in the Cloud TPDS'22
- Efficient Quantized Sparse Matrix Operations on Tensor Cores SC'22
- PetS ATC'22
- PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections OSDI'21
- APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores SC'21
- iGUARD SOSP'21
- Baechi: Fast Device Placement on Machine Learning Graphs SoCC'20
- Data Movement Is All You Need: A Case Study on Optimizing Transformers
- CoGNN SC'22
- TC-GNN: Accelerating Sparse Graph Neural Network Computation Via Dense Tensor Core on GPUs
- GNNAdvisor: An Efficient Runtime System for GNN Acceleration on GPUs OSDI'21
- Marius: Learning Massive Graph Embeddings on a Single Machine OSDI'21
- Accelerating Large Scale Real-Time GNN Inference Using Channel Pruning VLDB'21
- Reducing Communication in Graph Neural Network Training SC'20