
fedma's Introduction

Federated Learning with Matched Averaging

This is the code accompanying the ICLR 2020 paper "Federated Learning with Matched Averaging". Paper link: https://openreview.net/forum?id=BkluqlSFDS

Overview


The FedMA algorithm is designed for federated learning of modern neural network architectures, e.g. convolutional neural networks (CNNs) and LSTMs. FedMA constructs the shared global model layer by layer, matching and averaging hidden elements (i.e. channels for convolutional layers, hidden states for LSTMs, and neurons for fully connected layers) that have similar feature extraction signatures.
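The following deliberately simplified sketch shows why matching matters before averaging. It assumes two clients with the same number of neurons in one fully connected layer, represents each neuron by its incoming weight vector, picks the permutation of client B's neurons that minimizes the total squared distance to client A's, and then averages the matched pairs. This is not FedMA's actual matching procedure, which is more general (for instance, the matched global model can end up larger than each local model) and relies on a linear assignment solver such as lapsolver; the brute-force permutation search and all function names here are illustrative only.

```python
from itertools import permutations
from typing import List, Tuple

Neuron = List[float]  # incoming weight vector of one hidden neuron


def sq_dist(u: Neuron, v: Neuron) -> float:
    """Squared Euclidean distance between two weight vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def match_neurons(layer_a: List[Neuron], layer_b: List[Neuron]) -> Tuple[int, ...]:
    """Return perm such that layer_b[perm[i]] is matched to layer_a[i].

    Brute force over all permutations: O(n!), only usable for tiny layers.
    """
    n = len(layer_a)
    return min(
        permutations(range(n)),
        key=lambda perm: sum(sq_dist(layer_a[i], layer_b[perm[i]]) for i in range(n)),
    )


def matched_average(layer_a: List[Neuron], layer_b: List[Neuron]) -> List[Neuron]:
    """Average each neuron of A with the neuron of B it was matched to."""
    perm = match_neurons(layer_a, layer_b)
    return [
        [(a + b) / 2 for a, b in zip(layer_a[i], layer_b[perm[i]])]
        for i in range(len(layer_a))
    ]


# Client B learned roughly the same two features as client A, in swapped order.
client_a = [[1.0, 0.0], [0.0, 1.0]]
client_b = [[0.1, 0.9], [0.9, 0.1]]

# Coordinate-wise averaging blurs the two features together (about 0.55 / 0.45).
naive = [[(a + b) / 2 for a, b in zip(na, nb)] for na, nb in zip(client_a, client_b)]
print(naive)
# Matching first keeps the features distinct (about 0.95 / 0.05).
print(matched_average(client_a, client_b))
```

In a full network, permuting a layer's neurons also means permuting the corresponding input dimension of the next layer, a bookkeeping step that a layer-wise scheme has to carry through the network.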

Dependencies


Tested stable dependencies:

  • python 3.6.5 (Anaconda)
  • PyTorch 1.1.0
  • torchvision 0.2.2
  • CUDA 10.0.130
  • cuDNN 7.5.1
  • lapsolver 1.0.2

Data Preparation


Language Models:

For the language model experiments, we use the Shakespeare dataset provided by the LEAF project. Following LEAF's instructions for preparing the Shakespeare dataset, we use the non-i.i.d., full-size dataset and split 80% of the data points into the training set. Moreover, we set the minimum number of samples per user at 9K. Thus, the following command produces our data partitioning:

```shell
./preprocess.sh -s niid --sf 1.0 -k 0 -t sample -tf 0.8 -k 9
```
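The two partitioning choices described above (drop users with too few samples, then split each remaining user's samples 80/20 into train and test) can be sketched in plain Python. The sketch reads `-t sample` as a per-user split of samples and `-tf 0.8` as the training fraction; the dictionary-of-lists input, the function name, and the fixed seed are assumptions for illustration. LEAF's own scripts operate on its JSON files and may differ in details such as rounding.

```python
import random
from typing import Dict, List, Tuple


def filter_and_split(
    user_samples: Dict[str, List[str]],
    min_samples: int,
    train_frac: float = 0.8,
    seed: int = 0,
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Keep users with at least `min_samples` samples, then split each
    remaining user's samples into train/test (a per-sample split)."""
    rng = random.Random(seed)
    train: Dict[str, List[str]] = {}
    test: Dict[str, List[str]] = {}
    for user, samples in user_samples.items():
        if len(samples) < min_samples:
            continue  # the user is dropped entirely
        shuffled = samples[:]
        rng.shuffle(shuffled)
        cut = int(train_frac * len(shuffled))
        train[user], test[user] = shuffled[:cut], shuffled[cut:]
    return train, test


users = {"alice": [f"a{i}" for i in range(10)], "bob": ["b0", "b1", "b2"]}
train, test = filter_and_split(users, min_samples=9)
print({u: (len(train[u]), len(test[u])) for u in train})  # {'alice': (8, 2)}
```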

Image Classification:

We simulate a heterogeneous partition in which local dataset sizes and class proportions are unbalanced across clients, by sampling the proportion of each class's data points assigned to each participating client from a Dirichlet distribution. Because the concentration parameter of the Dirichlet distribution is small (0.5), some clients may receive no examples of certain classes. Details of this partition can be found in the partition_data function in ./utils.py.
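The Dirichlet-based partition described above can be sketched in plain Python. For each class, the clients' shares are drawn from a symmetric Dirichlet(0.5) distribution (via normalized Gamma draws) and the class's shuffled indices are cut accordingly. This is a simplified illustration with a hypothetical function name; partition_data in ./utils.py may differ, for example in how it treats clients that end up with very few samples.

```python
import random
from typing import Dict, List


def dirichlet_partition(labels: List[int], n_clients: int, alpha: float = 0.5,
                        seed: int = 0) -> Dict[int, List[int]]:
    """Split sample indices across clients; for every class, the share each
    client receives is drawn from Dirichlet(alpha, ..., alpha)."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    parts: Dict[int, List[int]] = {k: [] for k in range(n_clients)}
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # A Dirichlet sample is a vector of independent Gamma(alpha, 1) draws, normalized.
        g = [rng.gammavariate(alpha, 1.0) for _ in range(n_clients)]
        total = sum(g)
        props = [x / total for x in g]
        # Turn proportions into cut points; the last client takes the remainder,
        # so every index of the class is assigned exactly once.
        start = 0
        for k in range(n_clients):
            end = len(idxs) if k == n_clients - 1 else start + int(props[k] * len(idxs))
            parts[k].extend(idxs[start:end])
            start = end
    return parts


labels = [i % 10 for i in range(1000)]  # toy 10-class label vector
parts = dirichlet_partition(labels, n_clients=4, alpha=0.5)
print({k: len(v) for k, v in parts.items()})  # sizes are typically very unequal
```

Smaller values of alpha concentrate each class on fewer clients, which is why some clients can end up with no examples of a class.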

Experiments on the Language Task:


The source code for the language task experiments, i.e. an LSTM on the Shakespeare dataset, is located in the folder FedMA/language_modeling. The functionality of each script is summarized below.

| Script | Functionality |
| --- | --- |
| ensemble_accuracy_calculator.py | Evaluates the performance of an ensemble of the local models trained on participating clients. |
| language_main.py | Runs the FedAvg and FedProx experiments, which serve as baseline methods. |
| language_oneshot_matching.py | Evaluates the performance of one-shot matching, i.e. PFNM-style model fusion. |
| language_whole_training.py | Centralized training on one device, i.e. the local datasets are combined and trained on centrally. This is the strongest possible baseline for any federated learning method. |
| lstm_fedma_with_comm.py | Our proposed "FedMA with communication" algorithm. |
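The table describes ensemble_accuracy_calculator.py only at a high level. One common ensembling rule is to average the local models' predicted class probabilities and take the argmax; the sketch below assumes that rule and pre-computed probabilities, and the script itself may combine predictions differently (for example, by majority vote).

```python
from typing import Sequence


def ensemble_accuracy(client_probs: Sequence[Sequence[Sequence[float]]],
                      targets: Sequence[int]) -> float:
    """client_probs[c][i][k]: probability that client c's model assigns class k
    to example i. Probabilities are averaged over clients before the argmax."""
    n_clients = len(client_probs)
    correct = 0
    for i, y in enumerate(targets):
        n_classes = len(client_probs[0][i])
        avg = [sum(client_probs[c][i][k] for c in range(n_clients)) / n_clients
               for k in range(n_classes)]
        pred = max(range(n_classes), key=avg.__getitem__)
        correct += int(pred == y)
    return correct / len(targets)


# Two clients, three examples, two classes; each client alone scores 2/3.
probs = [
    [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]],  # client 0
    [[0.6, 0.4], [0.7, 0.3], [0.6, 0.4]],  # client 1
]
print(ensemble_accuracy(probs, targets=[0, 0, 1]))  # 1.0
```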

Experiments on the Image Classification Task:


The main result for the image classification task, i.e. VGG-9 on CIFAR-10, can be reproduced by running ./run.sh. The following arguments to ./main.py control the important parameters of the experiment.

| Argument | Description |
| --- | --- |
| model | The CNN architecture that each client trains locally. |
| dataset | Dataset to use. We use CIFAR-10 to study FedMA. |
| lr | Initial learning rate. |
| retrain_lr | Learning rate for the local retraining process. Usually set to the same value as lr. |
| batch-size | Batch size for the optimizers, e.g. SGD or Adam. |
| epochs | Number of local training epochs. |
| retrain_epochs | Number of local retraining epochs. |
| n_nets | Number of participating local clients. |
| partition | Data partitioning strategy. Set to hetero-dir for the simulated heterogeneous CIFAR-10 dataset. |
| comm_type | Federated learning method: fedavg, fedprox, or fedma. |
| comm_round | Number of communication rounds for fedavg, fedprox, and fedma. |
| retrain | Flag: retrain the model, or load it from a checkpoint. |
| rematching | Flag: redo the matching process, or load it from a checkpoint. |
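For the baselines selected by comm_type, the server step of FedAvg is an average of client parameters weighted by local sample counts, and FedProx keeps averaging on the server while adding a proximal term (mu/2)·||w - w_global||² to each client's local loss. The sketch below shows both on flat parameter lists. The function names are hypothetical, and main.py, which works on per-layer weight arrays, may weight or structure these steps differently.

```python
from typing import List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]],
           n_samples: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighting each client by its sample count."""
    total = sum(n_samples)
    dim = len(client_weights[0])
    return [sum(w[j] * n for w, n in zip(client_weights, n_samples)) / total
            for j in range(dim)]


def fedprox_penalty(local: Sequence[float], global_: Sequence[float], mu: float) -> float:
    """Proximal term (mu / 2) * ||w - w_global||^2 added to a client's local loss."""
    return 0.5 * mu * sum((a - b) ** 2 for a, b in zip(local, global_))


print(fedavg([[1.0, 2.0], [3.0, 6.0]], n_samples=[100, 300]))  # [2.5, 5.0]
print(fedprox_penalty([1.0, 2.0], [0.0, 0.0], mu=0.01))
```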

Sample command

```shell
python main.py --model=moderate-cnn \
--dataset=cifar10 \
--lr=0.01 \
--retrain_lr=0.01 \
--batch-size=64 \
--epochs=20 \
--retrain_epochs=20 \
--n_nets=16 \
--partition=hetero-dir \
--comm_type=fedma \
--comm_round=50 \
--retrain=True \
--rematching=True
```
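The sample command passes boolean flags as strings (--retrain=True, --rematching=True). If you write your own driver with Python's argparse, note that type=bool does not parse such strings the way one might expect: bool("False") is True because any non-empty string is truthy. A common workaround is an explicit converter, sketched below. Whether main.py itself uses this pattern is not stated here, so check its parser before passing False; the --naive_retrain flag is hypothetical and only demonstrates the pitfall.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse common spellings of booleans; reject anything else."""
    lowered = value.lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


parser = argparse.ArgumentParser()
parser.add_argument("--retrain", type=str2bool, default=True)
parser.add_argument("--rematching", type=str2bool, default=True)
parser.add_argument("--naive_retrain", type=bool, default=True)  # hypothetical: the pitfall

args = parser.parse_args(["--retrain=False", "--rematching=True", "--naive_retrain=False"])
print(args.retrain, args.rematching, args.naive_retrain)  # False True True
```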

Interpretability of FedMA:


The interpretability results presented in the FedMA paper are summarized in a Jupyter notebook, ./jupyter_notebook/Interpretability_fedma.ipynb.

Handling Data Bias Experiments:


The data bias experiments presented in the FedMA paper are implemented in the script ./dist_skew_main.py. To reproduce them, simply run:

```shell
bash run_dist_skew.sh
```

Sample command

```shell
python dist_skew_main.py --model=moderate-cnn \
--dataset=cifar10 \
--lr=0.01 \
--retrain_lr=0.01 \
--batch-size=64 \
--epochs=10 \
--retrain_epochs=20 \
--n_nets=2 \
--partition=homo \
--comm_type=fedma \
--retrain=True \
--rematching=True
```

Citing FedMA:


```bibtex
@inproceedings{
Wang2020Federated,
title={Federated Learning with Matched Averaging},
author={Hongyi Wang and Mikhail Yurochkin and Yuekai Sun and Dimitris Papailiopoulos and Yasaman Khazaeni},
booktitle={International Conference on Learning Representations},
year={2020},
url={https://openreview.net/forum?id=BkluqlSFDS}
}
```


fedma's Issues

Provide more details about the experiments?

Good job! I am very interested in this work and tried to run the experiments mentioned in the paper. My questions are:

  • How do I run the experiments (CIFAR-10, MNIST, Shakespeare)? It seems that only the CIFAR-10 experiment is available now.
  • How does FedMA work? The terms retrain and rematching confuse me.

Thank you.

Does the hyperparameter "retrain_epoch" lead to extra training for FedMA?

Dear authors,

Your work is very impressive and thanks for open-sourcing the code!

Please correct me if I got it wrong: in each round of FedMA, you need to retrain for (num_layers * retrain_epoch) epochs in total, not including the (retrain_epoch) epochs for updating the whole model. So the extra computation will be very intensive if you set a relatively large "retrain_epoch" (say 20, as in the sample command). Could you share the specific value of this hyperparameter needed to reproduce your results on CIFAR-10?

Thanks in advance and looking forward to your reply.
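The accounting described in this question can be made concrete. Taking the question's own assumption (one retraining pass of retrain_epochs per matched layer, plus one pass over the whole model), which is not confirmed in this thread, local epochs per round grow linearly with depth. The function name is hypothetical, and the 9-layer figure simply follows the VGG-9 naming.

```python
def local_epochs_per_round(num_layers: int, retrain_epochs: int) -> int:
    """Per-round local epochs under the accounting assumed in the question:
    one retrain pass per matched layer plus one pass for the whole model."""
    return num_layers * retrain_epochs + retrain_epochs


# A 9-layer network with retrain_epochs=20, as in the sample command.
print(local_epochs_per_round(num_layers=9, retrain_epochs=20))  # 200
```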

I want to run this on heterogeneous data, but there are some errors

When I use the simple-cnn model with heterogeneous data, I get some errors. My command is:

```shell
python dist_skew_main.py --model=simple-cnn --dataset=cifar10 --lr=0.01 --retrain_lr=0.01 --batch-size=64 --epochs=10 --retrain_epochs=10 --n_nets=10 --partition=hetero-dir --comm_type=fedma --comm_round=10 --retrain=True --rematching=True
```

and the error is:

```
Traceback (most recent call last):
File "dist_skew_main.py", line 1181, in
args.partition, args.n_nets, args_alpha, args=args)
File "/home/wjj/three/FedMA-master/utils.py", line 274, in partition_data_dist_skew
traindata_cls_counts = record_net_data_stats(y_train, net_dataidx_map, logdir)
UnboundLocalError: local variable 'net_dataidx_map' referenced before assignment
```


Can you tell me how to solve this problem?
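This UnboundLocalError is what Python raises when a function reads a local variable that was assigned only inside branches that did not run. One plausible cause, not verified against utils.py, is that partition_data_dist_skew assigns net_dataidx_map only for some partition values and has no branch for hetero-dir. The functions below are hypothetical stand-ins that reproduce the pattern and show the usual fix of failing loudly on unsupported values; the exact error wording varies between Python versions.

```python
from typing import Dict, List


def partition_buggy(strategy: str, n_items: int, n_nets: int) -> Dict[int, List[int]]:
    if strategy == "homo":
        net_dataidx_map = {k: list(range(k, n_items, n_nets)) for k in range(n_nets)}
    # No branch for other strategies, so net_dataidx_map may never be assigned.
    return net_dataidx_map


def partition_fixed(strategy: str, n_items: int, n_nets: int) -> Dict[int, List[int]]:
    if strategy == "homo":
        net_dataidx_map = {k: list(range(k, n_items, n_nets)) for k in range(n_nets)}
    else:
        # Fail with a message that names the actual problem.
        raise ValueError(f"unsupported partition strategy: {strategy!r}")
    return net_dataidx_map


try:
    partition_buggy("hetero-dir", n_items=10, n_nets=2)
except UnboundLocalError as exc:
    print("buggy:", exc)

try:
    partition_fixed("hetero-dir", n_items=10, n_nets=2)
except ValueError as exc:
    print("fixed:", exc)
```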

Use CNN matching as a black box

I want to implement FedMA in an FML framework. Is there a function in this repo that I can use as a black box?
For example, it would take as input two layers from client CNNs and one from the global model, and return the matched output.

I want to change the hyperparameters of the language model

When I change NUM_LAYERS in lstm_fedma_with_comm.py, I run into problems, so I also changed the RNNModel hyperparameter nlayers, but I still can't get it to work.
For example, when I set NUM_LAYERS = 4 (a 2-layer LSTM, i.e. 4 layers: encoder | hidden LSTM1 | hidden LSTM2 | decoder), I set nlayers=2.

```
Traceback (most recent call last):
File "language_oneshot_matching.py", line 504, in
matching_shapes=matching_shapes)
File "/home/hx/github/FedMA/language_modeling/language_fedma.py", line 258, in layerwise_fedma
reconstructed_bias = [split_bias(batch_weights[j][layer_index+3+2]) for j in range(J)]
File "/home/hx/github/FedMA/language_modeling/language_fedma.py", line 258, in
reconstructed_bias = [split_bias(batch_weights[j][layer_index+3+2]) for j in range(J)]
IndexError: list index out of range
```
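Hard-coded offsets such as batch_weights[j][layer_index+3+2] only work for one particular parameter layout, so it helps to see how the layout changes with nlayers. Assuming, without confirmation from this thread, that RNNModel is an nn.Embedding encoder, an nn.LSTM, and an nn.Linear decoder with biases enabled, the helper below lists parameter names in the order PyTorch registers them (nn.LSTM adds weight_ih, weight_hh, bias_ih, and bias_hh per layer). The helper is hypothetical; on a real model, `[name for name, _ in model.named_parameters()]` gives the actual list to compare against.

```python
from typing import List


def rnn_param_names(nlayers: int) -> List[str]:
    """Parameter names, in registration order, for a hypothetical
    Embedding -> LSTM(nlayers) -> Linear model with all biases enabled."""
    names = ["encoder.weight"]
    for k in range(nlayers):
        names += [f"rnn.weight_ih_l{k}", f"rnn.weight_hh_l{k}",
                  f"rnn.bias_ih_l{k}", f"rnn.bias_hh_l{k}"]
    names += ["decoder.weight", "decoder.bias"]
    return names


# 7 names for nlayers=1, 11 names for nlayers=2.
for n in (1, 2):
    names = rnn_param_names(n)
    print(n, len(names), names)
```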

Unable to run lstm_fedma_with_comm.py file

I tried to run the lstm_fedma_with_comm.py file to reproduce the paper's results, but I got a file-not-found error for the following files:
lstm_matching_assignments, lstm_matching_shapes and matched_global_weights.


Some questions about the oneshot_matching experiment

Hi, a few things confuse me:

  1. What is the difference between oneshot_matching and BBP_MAP?
  2. The retraining step in matching actually introduces additional information from the original data, so does the matching really reflect its ability to aggregate in a single communication round?

Some questions about initialization and retraining

Hi, thank you for sharing your outstanding work! I have some questions about the settings in the paper. Could you please provide more details about the experimental settings?

  1. In the code, the J clients do not share the same initialization, and they are trained before the first global round. Is there any difference in settings between this initial retraining and the later local retraining (as in FedAvg local training)?
  2. What batch size was used for the results in the paper?
  3. Does the experiment in the paper use the following command?

```shell
python main.py --model=moderate-cnn \
--dataset=cifar10 \
--lr=0.01 \
--retrain_lr=0.01 \
--batch-size=64 \
--epochs=150 \
--retrain_epochs=150 \
--n_nets=16 \
--partition=hetero-dir \
--comm_type=fedma \
--comm_round=10 \
--retrain=True \
--rematching=True
```

Thanks very much

Running FedMA with simple cnn

Hi,
Very good job! I love your method.
I tried to run FedMA with the "simple-cnn" model, but there is a mismatch between the size of matched_cnn (which is bigger than the original model) and the weights after alignment.

The error is:

```
File "../main.py", line 97, in trans_next_conv_layer_backward
reshaped = layer_weight.reshape(reconstructed_next_layer_shape).transpose(1, 0, 2, 3).reshape(next_layer_shape[0], -1)
ValueError: cannot reshape array of size 3750 into shape (15,25,5,5)
```
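The numbers in this error can be checked directly: a reshape must preserve the element count, and (15, 25, 5, 5) needs 9,375 elements while the array holds 3,750. Holding the other three dimensions (15, 5, 5) fixed, 3,750 elements correspond to a size of 10 in the position where the code expects 25, which suggests (though it does not prove) that the matched shapes of consecutive layers disagree. The helper name is hypothetical.

```python
import math
from typing import Tuple


def infer_missing_dim(n_elements: int, shape: Tuple[int, ...], free_axis: int) -> float:
    """Size that axis `free_axis` would need for `shape` to hold `n_elements`."""
    others = math.prod(d for i, d in enumerate(shape) if i != free_axis)
    return n_elements / others


target = (15, 25, 5, 5)
print(math.prod(target))                             # 9375, but the array has 3750
print(infer_missing_dim(3750, target, free_axis=1))  # 10.0
```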

Question on code in language_fedma.py

Hi,

Thank you for your wonderful work and making the code public.

I have a question about the code in language_fedma.py. In lines 302-303, why is there a 'pass' when, for example, layer_index == 2 (please refer to the attached screenshot)? What happens when there are more layers?

Can I reuse the code from lines 309-313 for layer_index == 2 instead of a pass?

Thank you very much!

Reproduce results from run_dist_skew.sh

Lots of interesting ideas in the paper.
I am trying to reproduce the results for FedAvg using "run_dist_skew.sh". In Fig. 4 of the paper, the quoted accuracy is around 66%, but in my runs accuracy hardly increases beyond 50%. Could you please help with the settings needed to reproduce the FedAvg results in Fig. 4?
PC

Running FedMA with large input data shape

Hi @hwang595, a few weeks ago I asked some questions in another issue thread about a problem I had when training a model with an input image shape of 224x224 or larger. Since then, I reduced the dimensions of my problem to the default size, i.e. 32x32, and it worked well! But when I run with 224x224, I'm still stuck in this training step.

So I'm gonna ask my questions here again:

  • Is there a relationship between the training input size and the FedMA communication process? If so, what can we do about it?
  • When adding a different model, which parts of the code do I need to take care of, besides changing, for example, the input dimensions to 1x224x224?

Note: As I'm working with medical images, resizing them is problematic.

Thanks for the great work!

Question about reproducing results in the paper (LSTM on Shakespeare dataset)

Hello,
Thank you for the great work. I am studying federated learning in NLP. I tried to reproduce the results in the paper (mainly the LSTM on the Shakespeare dataset), but my results are far from what they should be. Please help me check what I missed in my experiments.

(1) The Shakespeare data preprocessing is described in the paper as follows:

[Screenshot of the paper's description of the Shakespeare preprocessing]

So I use the command like this to preprocess the data:

```shell
./preprocess.sh -s niid --sf 1.0 -k 0 -t sample -tf 0.8 -k 10000
```

(2) The paper indicates that the experiments were done with a 1-layer LSTM.

Anyway, reading the code, I believe this corresponds to setting:

```python
NUM_LAYERS=3
```

since the model then has one input layer, one output layer, and one hidden LSTM layer (where FedMA addresses the permutation invariance problem).

(3) The paper notes that FedAvg and FedProx were trained for 33 communication rounds, while FedMA was trained for 11 communication rounds (because each FedMA round requires 3 communication rounds, corresponding to the number of layers). I actually used 30 rounds for FedAvg and FedProx and 10 for FedMA, as follows:

For FedAvg:
```shell
python language_main.py --mode=fedavg --comm-round=30
```

For FedProx:
```shell
python language_main.py --mode=fedprox --comm-round=30
```

For FedMA:
```shell
python language_main.py --mode=fedma --comm-round=10
```
(I do not think --comm-round has any effect for FedMA anyway, because the code performs a single round of FedMA.)
Then I performed the remaining FedMA communication rounds by running:
```shell
python lstm_fedma_with_comm.py
```
(lstm_fedma_with_comm.py has 10 communication rounds hard-coded.)
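The round bookkeeping in (3) follows from each FedMA round costing one communication round per layer: with NUM_LAYERS=3, 11 FedMA rounds match the paper's 33 FedAvg/FedProx rounds, and the 10 rounds used here match 30. The sketch only encodes that arithmetic, under the per-layer cost stated in the issue; the function name is hypothetical.

```python
def equivalent_rounds(fedma_rounds: int, num_layers: int) -> int:
    """FedAvg/FedProx rounds with the same communication budget as
    `fedma_rounds` FedMA rounds, if each FedMA round costs one
    communication round per layer."""
    return fedma_rounds * num_layers


print(equivalent_rounds(11, num_layers=3))  # 33, the paper's budget
print(equivalent_rounds(10, num_layers=3))  # 30, the budget used in this issue
```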

(4) The results do not seem to match those in the paper. FedProx got lower test accuracy than FedAvg, but FedMA also got lower accuracy than FedAvg.

[Screenshots: test accuracy for FedAvg; for FedProx; for FedMA after the first step (language_main.py) and after the second step (lstm_fedma_with_comm.py); and the results reported in the paper.]

  • Actually, my FedAvg got substantially higher accuracy than in the paper: it reaches 0.5 test accuracy, while none of the three approaches reaches that accuracy in the paper.
    • I did not tune E (local training epochs) and used the default value (5), but the results still do not match those reported in the paper for E=5.

Thank you in advance for your help.
