
Comments (15)

hwang595 commented on June 26, 2024

Hi @ddayzzz, glad to hear your interest!

To answer your questions:

  1. Actually, all of the CIFAR-10, MNIST, and Shakespeare experiments are available; sorry for not clarifying this in more detail. To run the CIFAR-10 and MNIST experiments, refer to FedMA/run.sh: setting --model=moderate-cnn --dataset=cifar10 gives you CIFAR-10, and --model=lenet --dataset=mnist gives you MNIST (see the example commands after this list). We didn't try the iterative FedMA experiments on MNIST, though, since PFNM already gives good performance there. To run the Shakespeare experiments, refer to FedMA/language_modeling/run_language_main.sh and FedMA/language_modeling/run_fedma_with_comm.sh. Running run_language_main.sh completes the initial local training process and one pass of FedMA, then saves three intermediate results, i.e. lstm_matching_assignments, lstm_matching_shapes, and matched_global_weights. With those, you can run run_fedma_with_comm.sh, which loads the intermediate results as a starting point and begins the iterative FedMA process. The scripts are organized this way to avoid repeatedly training and matching the local models, which also relates to your second question.
  2. To run FedMA, just set --comm_type=fedma. The retrain flag exists because, at the very beginning of federated learning, the participating users need to train their local models first (so it must be set to --retrain=True the first time you run the experiment). In our simulated environment, all trained local models are saved after local training, so on subsequent runs you can set --retrain= (leaving it blank means False here) to avoid repeating the local training.
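
To make this concrete, here are example invocations using only the flags discussed in this thread; treat them as a sketch (the remaining flags, e.g. learning rates and epochs, follow FedMA/run.sh), not the exact commands from the repo:

python main.py --model=moderate-cnn --dataset=cifar10 --comm_type=fedma --retrain=True  # CIFAR-10, first run: trains the local models
python main.py --model=lenet --dataset=mnist --comm_type=fedma --retrain=True  # MNIST, first run
python main.py --model=moderate-cnn --dataset=cifar10 --comm_type=fedma --retrain=  # later runs: reuse the saved local models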

Please feel free to report any issues and to create PRs. We will be happy to help and work with you.


hwang595 commented on June 26, 2024

Hi @joaolcaas, using --comm_round greater than 1 will work for any experiment other than MNIST+LeNet. For LeNet we will need to adjust the code a little to make it work, since we didn't run that experiment previously. But please let me know if you're interested in running multi-round FedMA on LeNet, and I will make it work.


joaolcaas commented on June 26, 2024

Hi @hwang595. I'm looking into your repo/paper, and I saw that you reported results on the MNIST dataset using LeNet in the paper published at ICLR. I'm trying to reproduce them by changing some lines of the code, but some tricky problems are happening. How hard is it to reproduce the results shown in the Matched Averaging paper?


hwang595 commented on June 26, 2024

Hi @joaolcaas, thanks a lot for your interest in our work! Can you please provide more details on the issue you encountered in the MNIST experiments? Based on our tests, PFNM already provides good accuracy on MNIST in both homogeneous and heterogeneous settings, so this repo focuses more on the CIFAR-10 and Shakespeare experiments.

But I'm happy to help resolve the issues in the MNIST experiment. Please also feel free to take a look at the PFNM GitHub repo.

Thanks!


joaolcaas commented on June 26, 2024

Thanks, @hwang595. As you requested, more details below.

First I tried to run the experiment using this command to check how the algorithm behaves:
python main.py --model=lenet --dataset=mnist --lr=0.01 --retrain_lr=0.01 --batch-size=64 --epochs=10 --retrain_epochs=10 --n_nets=3 --partition=hetero-dir --comm_type=fedma --comm_round=10 --retrain=True --rematching=True

The output was the following:
[screenshot of the error output]

I saw that block_patching does not support the LeNet architecture, so I added SimpleLenetContainerConvBlocks, a half-model with just the convolutional layers and their operations, taken from your LeNet implementation. Then I ran it again and got a different error: in the BB_MAP loop, when layer_index reaches 3, the following occurs:
[screenshot of the error traceback]

That was as far as I could get.


wildphoton commented on June 26, 2024


@hwang595 Hi Hongyi, I met the same issue. I think in the released code, the shape estimator is not defined for LeNet.


hwang595 commented on June 26, 2024

Hi @joaolcaas @wildphoton, thanks for providing the detailed error messages; I can replicate your issues. As I mentioned, this repository focuses more on the CIFAR-10 and Shakespeare experiments, and when I made the first commit I didn't realize the MNIST+LeNet part of the code was not up to date. Sorry about that!

I made the fixes, and it should run without problems now. But please keep in mind that we still don't support the multi-round version of MNIST+LeNet, since one round of FedMA already gives >97% accuracy and matches the accuracy we can expect from the ensemble method. I'm happy to make further improvements to support multi-round FedMA if you are interested.

I'm happy to help further with your experiments and provide more detailed information!


joaolcaas commented on June 26, 2024

That was quicker than I expected, huge thanks @hwang595.

I'm testing it now, but are you saying that if I use --comm_round greater than 1 the experiment will not work?


joaolcaas commented on June 26, 2024

@hwang595 yeah, it would be really amazing if you can do that!


jefersonf commented on June 26, 2024

Hi @hwang595, I understand that the experiments focus on well-known models and small datasets. What if we try larger datasets and models with different architectures? My attempts at such modifications failed, e.g. changing the model to DenseNet or MobileNet, and adding another dataset with input shape greater than 32x32.

I've been trying the following scenario: use ModerateCNNContainer and a dataset with input shape reduced to 224x224, configured as follows.

...
--model=moderate-cnn \
--dataset=<large-dataset> \
--epochs=10 \
--retrain_epochs=5 \
--n_nets=3 \
--partition=hetero-dir \
--comm_type=fedma \
--comm_round=10

Note: this dataset has only 10 examples per class, across 5 classes in total. This setting is used just to test the training pipeline.

But the training process takes too long, and it also gets stuck at the part below.

[screenshot of the stalled training output]

Was that expected? Maybe I'm doing something wrong.

I've been concerned about the relationship between the input dimensions and the overall complexity that Matched Averaging adds during communication. So, my questions are:

  1. Is there such a relationship between training input size and FedMA communication time? If there is, what can we do about it? (See the sketch after this list for my rough intuition.)
  2. When adding a different model, which parts of the code should I take care of, besides changing, for example, the input dimensions to 1x224x224? (I confess I sometimes get lost when it comes to the data transformations/reshapes and everything else.)
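
For rough intuition on question 1, here is a back-of-envelope sketch (my own illustration, not code from the FedMA repo) of why larger inputs blow up the first fully connected layer's weight matrix, one of the matrices FedMA has to match. The pooling count and channel width below are hypothetical stand-ins for a VGG-style conv stack:

def flat_dim(input_side, n_pools=3, channels=128):
    # Each 2x2 max-pool halves the spatial side; the flattened feature
    # dimension sets the column count of the first FC weight matrix.
    side = input_side
    for _ in range(n_pools):
        side //= 2
    return channels * side * side

print(flat_dim(32))   # 2048   (CIFAR-10-sized input)
print(flat_dim(224))  # 100352 (224x224 input: ~49x more weights per neuron to match)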


wildphoton commented on June 26, 2024


@hwang595 Thanks for your quick response! I tried the updated code, but it gave me the following error when running

python main.py --model=lenet --dataset=mnist --lr=0.01 --retrain_lr=0.01 --batch-size=64 --epochs=2 --retrain_epochs=2 --n_nets=2 --partition=homo --comm_type=fedma --comm_round=1 --retrain=True

[screenshot of the error traceback]


joaolcaas commented on June 26, 2024


Yep, the same error here, unfortunately.


hwang595 commented on June 26, 2024

Hi @wildphoton @joaolcaas, thanks a lot for trying it out! Yes, that error is expected, since you already entered the fedma_comm function, which runs multi-round FedMA. I will update the code base to make LeNet+MNIST compatible with multi-round FedMA soon. Please stay tuned!

But even before fedma_comm, you should have already finished one round of FedMA, right? E.g., the code finishes merging the two locally trained models?


joaolcaas commented on June 26, 2024

@hwang595 oh, I got it. Basically, FedMA with LeNet + MNIST will work only up to this line, right? We have to pre-train the models and then do one comm_round of FedMA without entering fedma_comm, as sketched below.

I think that's what happened here; it broke only inside fedma_comm.
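
To summarize the flow described above, here is an illustrative sketch; fedma_comm is the only real name from the repo, and everything else is a hypothetical stand-in:

def local_training(n_nets):
    # Stand-in for the initial local training pass (--retrain=True).
    return [f"local_model_{i}" for i in range(n_nets)]

def one_shot_fedma(models):
    # Stand-in for the single matching pass that merges the local models;
    # per this thread, this part works for LeNet+MNIST.
    return "matched_global_model"

def fedma_comm(global_model, comm_round):
    # Stand-in for the repo's multi-round FedMA loop, which is where
    # LeNet+MNIST currently breaks.
    for _ in range(comm_round):
        pass  # re-train locally, re-match, broadcast

models = local_training(n_nets=2)
global_model = one_shot_fedma(models)    # one-round FedMA: fine for LeNet+MNIST
fedma_comm(global_model, comm_round=10)  # multi-round: not yet supported for LeNet+MNIST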


wildphoton commented on June 26, 2024


So --comm_round=0 actually means round 1? If I understand it correctly, PFNM is basically one-round FedMA?

