securefederatedai / openfl
An open framework for Federated Learning.
Home Page: https://openfl.readthedocs.io/en/latest/index.html
License: Apache License 2.0
Is your feature request related to a problem? Please describe.
This is more of an enhancement. Some environments don't allow you to use specific ports (for example 8888, which seems to be the default port used by the Jupyter notebook included in fx). It would be good to allow users to specify the port they want.
Describe the solution you'd like
It would be good to allow users to specify the port they want. There is already an --ip option in fx tutorial start. Adding --port would solve the problem.
Additional context
Logs from starting the notebook
[I 11:27:36.940 NotebookApp] Serving notebooks from local directory: /home/ubuntu/anaconda3/envs/test/lib/python3.6/site-packages/openfl-tutorials
[I 11:27:36.940 NotebookApp] Jupyter Notebook 6.4.0 is running at:
[I 11:27:36.940 NotebookApp] http://aggregator:8888/?token=f78c90d7649f1166bf83df9f0f6f69c6e605494b9b4a3a23
[I 11:27:36.940 NotebookApp] or http://127.0.0.1:8888/?token=f78c90d7649f1166bf83df9f0f6f69c6e605494b9b4a3a23
[I 11:27:36.940 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 11:27:36.945 NotebookApp] No web browser found: could not locate runnable browser.
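A minimal sketch of what the requested option could look like, assuming a hypothetical helper that assembles the Jupyter launch command (build_notebook_command is not an actual OpenFL function):

```python
# Hypothetical sketch: build the Jupyter launch command with a
# user-supplied port instead of the hard-coded default 8888.
def build_notebook_command(ip='127.0.0.1', port=8888):
    """Return the argv list used to launch the tutorial notebook server."""
    return [
        'jupyter', 'notebook',
        f'--ip={ip}',
        f'--port={port}',
        '--no-browser',  # useful on headless aggregator nodes
    ]

cmd = build_notebook_command(ip='0.0.0.0', port=8889)
print(' '.join(cmd))
```

A `--port` CLI option on `fx tutorial start` would simply forward its value into such a helper.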
Bug while creating CA:
fx pki install -p </path/to/ca/dir> --ca-url <host:port>
Got an exception:
Password:
Repeat for confirmation:
[16:23:46] INFO Creating CA ca.py:157
CA binaries from github will be downloaded now [Y/n]: y
[16:23:51] INFO Downloading step-ca_linux_0.17.2_amd64.tar.gz.sig ca.py:57
EXCEPTION : Unknown archive format './step-ca_linux_0.17.2_amd64.tar.gz.sig'
Traceback (most recent call last):
...
File "/home/akhorkin/.virtualenvs/openfl/bin/fx", line 8, in <module>
sys.exit(entry())
File "/home/akhorkin/.virtualenvs/openfl/lib/python3.8/site-packages/openfl/interface/cli.py", line 214, in entry
error_handler(e)
File "/home/akhorkin/.virtualenvs/openfl/lib/python3.8/site-packages/openfl/interface/cli.py", line 173, in error_handler
raise error
File "/home/akhorkin/.virtualenvs/openfl/lib/python3.8/site-packages/openfl/interface/cli.py", line 212, in entry
cli()
File "/home/akhorkin/.virtualenvs/openfl/lib/python3.8/site-packages/click/core.py", line 1137, in __call__
return self.main(*args, **kwargs)
File "/home/akhorkin/.virtualenvs/openfl/lib/python3.8/site-packages/click/core.py", line 1062, in main
rv = self.invoke(ctx)
File "/home/akhorkin/.virtualenvs/openfl/lib/python3.8/site-packages/click/core.py", line 1668, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/akhorkin/.virtualenvs/openfl/lib/python3.8/site-packages/click/core.py", line 1668, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/akhorkin/.virtualenvs/openfl/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/akhorkin/.virtualenvs/openfl/lib/python3.8/site-packages/click/core.py", line 763, in invoke
return __callback(*args, **kwargs)
File "/home/akhorkin/.virtualenvs/openfl/lib/python3.8/site-packages/openfl/interface/pki.py", line 64, in install_
install(ca_path, ca_url, password)
File "/home/akhorkin/.virtualenvs/openfl/lib/python3.8/site-packages/openfl/component/ca/ca.py", line 168, in install
download_step_bin(url, 'step-ca_linux', 'amd', prefix=ca_path, confirmation=False)
File "/home/akhorkin/.virtualenvs/openfl/lib/python3.8/site-packages/openfl/component/ca/ca.py", line 59, in download_step_bin
shutil.unpack_archive(f'{prefix}/{name}', f'{prefix}/step')
File "/usr/local/lib/python3.8/shutil.py", line 1223, in unpack_archive
raise ReadError("Unknown archive format '{0}'".format(filename))
shutil.ReadError: Unknown archive format './step-ca_linux_0.17.2_amd64.tar.gz.sig'
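One plausible guard for the unpacking step, sketched with hypothetical helpers (this is not the actual OpenFL fix): hand only real archives to shutil.unpack_archive and skip detached signature files such as *.sig.

```python
import shutil

# Suffixes shutil.unpack_archive understands out of the box.
ARCHIVE_SUFFIXES = ('.tar.gz', '.tgz', '.zip', '.tar')

def is_archive(filename):
    """True for real archives, False for e.g. *.tar.gz.sig signature files."""
    return filename.endswith(ARCHIVE_SUFFIXES)

def unpack_if_archive(path, dest):
    """Unpack path into dest only when it is actually an archive."""
    if not is_archive(path):
        return False  # skip step-ca_linux_0.17.2_amd64.tar.gz.sig and friends
    shutil.unpack_archive(path, dest)
    return True
```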
Desktop:
Expected behavior:
A valid .zip archive is downloaded and unpacked.
Describe the bug
When following the setup instructions using conda 4.6, openfl failed to install in the resulting conda environment. While not an openfl bug, we may want to determine how to make these instructions work on older conda versions.
NOTE: I am using the 'develop' branch.
To Reproduce
Steps to reproduce the behavior:
There are no tests for openfl-tutorials/interactive_api.
These notebooks cover core OpenFL scenarios, and if some change breaks this functionality it should be fixed as soon as possible, because they are the entry point for new users: if something is broken here, a user may decide that the whole library isn't working.
It would be great to create an environment for these notebooks and run them on CI.
https://arxiv.org/pdf/1910.07796.pdf
We'd like to support FedCurv for robust aggregation.
I'm a student trying to understand this FL framework. I've started with the notebook tutorials, like the one called "new_python_api_Tensorflow_MNIST.ipynb". I've run the scenario with 2 collaborators, each one in its own container. I noticed that one collaborator always (in more than 10 runs) has better accuracy metrics than the other, even if:
Could this be a feature of this FL framework that I don't know about, or maybe I haven't understood well how the learning phases work? Thanks for your patience and support.
On some machines, when I run a federation (even if both the collaborators and the aggregator are on the same machine), they fail to establish a connection.
I have started facing this problem specifically in the interactive API, I have been unsuccessful in running a federation as collaborator is unable to connect to the port where aggregator has started the gRPC server.
Reproducing the error:
I did not do anything differently than what is already mentioned in the tutorial.
I created a fresh conda environment, installed the openfl library and finally tried to replicate the experiment.
We tried to debug this error, and in the process we found out that the gRPC server from the aggregator runs exclusively on IPv6, whereas the collaborator tries to connect over IPv4. We even tried to hardcode the server address and port numbers, but we were unable to make it work. We suspect the error has something to do with the way the gRPC server and client are started in https://github.com/intel/openfl/blob/c3c0886aefeb09f426fc3726be0f65de2b344e22/openfl/transport/grpc/server.py and https://github.com/intel/openfl/blob/c3c0886aefeb09f426fc3726be0f65de2b344e22/openfl/transport/grpc/client.py
I think this error can pose a potentially big problem in the future. Therefore, please look into it.
Thanks
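A small stdlib diagnostic, independent of OpenFL, for checking which address family an endpoint actually accepts (all names here are illustrative):

```python
import socket

def can_connect(host, port, family):
    """Probe a TCP endpoint over one address family (AF_INET or AF_INET6)."""
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return False  # no address of this family for the host
    for af, socktype, proto, _, addr in infos:
        try:
            with socket.socket(af, socktype, proto) as s:
                s.settimeout(2.0)
                s.connect(addr)
                return True
        except OSError:
            continue  # this candidate address did not accept the connection
    return False
```

If the probe succeeds over IPv6 but fails over IPv4, the server is effectively IPv6-only; binding the gRPC server to '[::]:port' on a dual-stack host, or explicitly to '0.0.0.0:port', usually resolves this class of mismatch.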
OpenFL should provide an option to save a log of the TensorDB after a failure or when a user hits Ctrl-C. This will make debugging aggregated values significantly easier and allow users to submit more informative issues.
Hi there,
I was trying to do fx plan initialize but encountered the following error message from the data loader:
fx workspace create --prefix ${HOME}/2dunet --template tf_2dunet
pip install -r requirements.txt
fx plan initialize
FileNotFoundError: [Errno 2] No such file or directory: "'/raid/datasets/BraTS17/by_institution_NIfTY/1'"
Nevertheless, we also tried with files and directories that are valid and must exist, such as '/home/', but the error message is still there, saying No such file or directory: "'/home/'"
Describe the bug
Following the OpenFL documentation instructions here, I get the following error when running pip install openfl:
ERROR: Could not find a version that satisfies the requirement openfl
ERROR: No matching distribution found for openfl
OS: macOS Big Sur
Is your feature request related to a problem? Please describe.
During the initialization of the federation (federated environment), the fx
command is very useful. However, it performs some "additional tasks" that are typically not required (or may be problematic in the future) and need to be rolled back manually.
A list of non-necessary actions:
Do not call pip install -r requirements.txt inside fx workspace create
fx workspace create creates a workspace from a template, but it also calls pip install -r requirements.txt inside. For tf_2dunet, it installs TensorFlow 2.3.1, which is not the current version as of 05/2021 (the current version is 2.4.1). So typically there is a big chance it will roll back an already installed (and working) TensorFlow version in the user's Python environment to a previous version, which may not work for them. Moreover, the user may want to change/modify the model and/or supply their own pretrained one.
Do not check data folders inside fx plan initialize
fx plan initialize also takes into consideration the data paths set in the <workspace>/plan/data.yaml file. But since fx plan initialize is called on the aggregator, and the data folders for individual clients are on completely different computers, it must not be assumed they are accessible from the aggregator.
Describe the solution you'd like
Remove the pip ... calls from the fx tool.
step_config, cert, pass_file, config - @dmitryagapov - #150
Set a default path (i.e. ~/.local/workspace/) with a standard naming convention ('director.crt', 'envoy_one.crt', 'envoy_two.crt', etc.) so long-living entities can start without always providing paths for root_cert, cert, and private_key (defaults can still be overridden) - a separate issue was created: #161
Describe the bug
When launching the federated training in the Jupyter notebooks of the openfl-tutorials folder, I noticed that different collaborators achieve the same validation metric values. That would only be possible if the collaborators had the same data, but the data is randomly split. It looks like there is an issue in how the Data Loader is defined for each collaborator.
To Reproduce
Steps to reproduce the behavior:
Run openfl-tutorials/Federated_PyTorch_UNET_Tutorial.ipynb in Jupyter.
Expected behavior
Collaborators have different metrics due to random data split.
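For reference, a disjoint split can be as simple as the round-robin sketch below (illustrative only, not the tutorial's actual loader); each collaborator gets non-overlapping indices, so identical validation metrics would be a red flag:

```python
# Illustrative shard assignment: collaborator k of n gets samples
# k, k+n, k+2n, ... so no two collaborators share a sample.
def shard_indices(num_samples, shard_num, num_shards):
    """Return the sample indices belonging to one collaborator's shard."""
    return list(range(shard_num, num_samples, num_shards))
```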
Screenshot
It would be great to have an fx + TAB combination (like standard Linux autocomplete) for the current CLI.
Describe the bug
To run an envoy we need to install extra requirements that are not installed with openfl.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The envoy works correctly.
I propose to create a general mechanism for receiving configurable data and to exclude the default values from the code, setting them in a default config file instead. For example, the director would take params from the CLI if they are passed, otherwise from director.yaml in the director workspace, otherwise from openfl-workspace/default/director.yaml.
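The proposed precedence can be sketched with the stdlib ChainMap (the keys below are illustrative, not the real director.yaml schema):

```python
from collections import ChainMap

# Lookup order: CLI args override the workspace director.yaml,
# which overrides the packaged default director.yaml.
packaged_defaults = {'listen_port': 50051, 'sample_shape': ['300', '400']}
workspace_cfg = {'listen_port': 50052}   # from <workspace>/director.yaml
cli_args = {}                            # nothing passed on the command line

settings = ChainMap(cli_args, workspace_cfg, packaged_defaults)
print(settings['listen_port'])   # taken from the workspace config
print(settings['sample_shape'])  # falls through to the packaged default
```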
Recently, OpenFL has expanded the types of custom aggregation functions that can be computed on collaborator models. Some participants from the FeTS Challenge had complex aggregation strategies they wished to implement that were based on novel calculated metrics that could be used for a future round.
Now we are using ['100-200','300-400'] to describe height and width. Maybe there is a better alternative, such as ((100, 200), (300, 400)), or we could even create a special type for it.
https://github.com/intel/openfl/pull/151/files#r692170647
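A tiny parser sketch showing the conversion from the current string form to the proposed tuple form (helper names are hypothetical):

```python
# Convert '100-200' style range strings into the proposed tuple form.
def parse_range(spec):
    low, high = spec.split('-')
    return int(low), int(high)

def parse_shape(specs):
    # ['100-200', '300-400'] -> ((100, 200), (300, 400))
    return tuple(parse_range(s) for s in specs)
```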
Describe the bug
The "model_states" dictionary in the run_experiment function appears to be unused. Perhaps a hold-over from graph sharing?
I am able to comment out all lines involving "model_states" without any impact to the experiment that I can find.
Hi,
If we want to use privacy-preserving technologies such as differential privacy in secure aggregation in openfl, are there any tutorials or class interfaces we can override to include the added security?
Are there any tutorials or class interfaces in openfl in which custom aggregation algorithms can be included, other than federated averaging? Edit: I just realized there is new documentation on custom aggregation at https://openfl.readthedocs.io/en/latest/overriding_agg_fn.html
Thanks.
As a user I would like to keep track of my Federated Learning experiments and plot statistics of the model performance.
One implementation may be using a Model Database, such as ModelDB https://github.com/VertaAI/modeldb
This could simply plug into our current code via the Python API. There are some nice features such as model and data versioning (Git-like) and dashboards.
As of now, the envoy command asks for the director's URI, whereas the director asks for an IP and port. It would be great if we could pass in the director's URI at both nodes.
Describe the bug
I am trying to set up a federation based on the '' template, following the documentation written here.
The problem is that the command fx plan initialize (as mentioned in point 7) fails due to checks for non-existing data folders. In the default setup, it looks for a path (which seems to be some 'leftovers' from your development environment), and even after specifying the local paths, it tries to look for them somewhere else.
To Reproduce
Steps to reproduce the behavior:
tf_2dunet
export WORKSPACE_TEMPLATE=tf_2dunet
export WORKSPACE_PATH=${HOME}/projects/my-work/openfl-federations/federation_0.2
cd ${WORKSPACE_PATH}
fx workspace create --prefix ${WORKSPACE_PATH} --template ${WORKSPACE_TEMPLATE}
Requirements from requirements.txt are installed via pip. Running pip install -r requirements.txt manually, as mentioned in point 6 of the tutorial, is not necessary; the fx command will not update pip requirements.
fx plan initialize ends with the error:
EXCEPTION : [Errno 2] No such file or directory: "'/raid/datasets/BraTS17/by_institution_NIfTY/1'"
Traceback (most recent call last):
File "/home/rstoklas/miniconda3/envs/open-fl/bin/fx", line 8, in <module>
sys.exit(entry())
File "/home/rstoklas/miniconda3/envs/open-fl/lib/python3.8/site-packages/openfl/interface/cli.py", line 194, in entry
error_handler(e)
File "/home/rstoklas/miniconda3/envs/open-fl/lib/python3.8/site-packages/openfl/interface/cli.py", line 155, in error_handler
raise error
File "/home/rstoklas/miniconda3/envs/open-fl/lib/python3.8/site-packages/openfl/interface/cli.py", line 192, in entry
cli()
File "/home/rstoklas/miniconda3/envs/open-fl/lib/python3.8/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/home/rstoklas/miniconda3/envs/open-fl/lib/python3.8/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/home/rstoklas/miniconda3/envs/open-fl/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/rstoklas/miniconda3/envs/open-fl/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/rstoklas/miniconda3/envs/open-fl/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/rstoklas/miniconda3/envs/open-fl/lib/python3.8/site-packages/click/core.py", line 610, in invoke
return __callback(*args, **kwargs)
File "/home/rstoklas/miniconda3/envs/open-fl/lib/python3.8/site-packages/click/decorators.py", line 21, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/rstoklas/miniconda3/envs/open-fl/lib/python3.8/site-packages/openfl/interface/plan.py", line 78, in initialize
task_runner = plan.get_task_runner(collaborator_cname)
File "/home/rstoklas/miniconda3/envs/open-fl/lib/python3.8/site-packages/openfl/federated/plan/plan.py", line 298, in get_task_runner
defaults[SETTINGS]['data_loader'] = self.get_data_loader(
File "/home/rstoklas/miniconda3/envs/open-fl/lib/python3.8/site-packages/openfl/federated/plan/plan.py", line 286, in get_data_loader
self.loader = Plan.Build(**defaults)
File "/home/rstoklas/miniconda3/envs/open-fl/lib/python3.8/site-packages/openfl/federated/plan/plan.py", line 173, in Build
instance = getattr(module, class_name)(**settings)
File "/home/rstoklas/projects/my-work/openfl-federations/federation_0.2/code/tfbrats_inmemory.py", line 29, in __init__
X_train, y_train, X_valid, y_valid = load_from_NIfTI(parent_dir=data_path,
File "/home/rstoklas/projects/my-work/openfl-federations/federation_0.2/code/brats_utils.py", line 93, in load_from_NIfTI
subdirs = os.listdir(path)
FileNotFoundError: [Errno 2] No such file or directory: "'/raid/datasets/BraTS17/by_institution_NIfTY/1'"
After modifying plan/data.yaml to point to the existing directories, it still fails:
{'01-win': 'data/client-01', '02-pegas': 'data/client-02', '03-pegas': 'data/client-03'}
INFO Building 🡆 Object TensorFlowBratsInMemory from code.tfbrats_inmemory Module. plan.py:168
INFO Settings 🡆 {'batch_size': 64, 'percent_train': 0.8, 'collaborator_count': 2, 'data_group_name': 'brats', 'data_path': plan.py:171
'data/client-01'}
INFO Override 🡆 {'defaults': 'plan/defaults/data_loader.yaml'} plan.py:173
EXCEPTION : need at least one array to concatenate
Traceback (most recent call last):
File "c:\anaconda3\envs\open-fl\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\anaconda3\envs\open-fl\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Anaconda3\envs\open-fl\Scripts\fx.exe\__main__.py", line 7, in <module>
File "c:\anaconda3\envs\open-fl\lib\site-packages\openfl\interface\cli.py", line 194, in entry
error_handler(e)
File "c:\anaconda3\envs\open-fl\lib\site-packages\openfl\interface\cli.py", line 155, in error_handler
raise error
File "c:\anaconda3\envs\open-fl\lib\site-packages\openfl\interface\cli.py", line 192, in entry
cli()
File "c:\anaconda3\envs\open-fl\lib\site-packages\click\core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "c:\anaconda3\envs\open-fl\lib\site-packages\click\core.py", line 782, in main
rv = self.invoke(ctx)
File "c:\anaconda3\envs\open-fl\lib\site-packages\click\core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "c:\anaconda3\envs\open-fl\lib\site-packages\click\core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "c:\anaconda3\envs\open-fl\lib\site-packages\click\core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "c:\anaconda3\envs\open-fl\lib\site-packages\click\core.py", line 610, in invoke
return __callback(*args, **kwargs)
File "c:\anaconda3\envs\open-fl\lib\site-packages\click\decorators.py", line 21, in new_func
return f(get_current_context(), *args, **kwargs)
File "C:\Anaconda3\envs\open-fl\Lib\site-packages\openfl\interface\plan.py", line 77, in initialize
data_loader = plan.get_data_loader(collaborator_cname)
File "c:\anaconda3\envs\open-fl\lib\site-packages\openfl\federated\plan\plan.py", line 293, in get_data_loader
self.loader = Plan.Build(**defaults)
File "c:\anaconda3\envs\open-fl\lib\site-packages\openfl\federated\plan\plan.py", line 179, in Build
instance = getattr(module, class_name)(**settings)
File "C:\Users\rstoklas\cernbox\work\my-projects\FL-phase-3_network\federation-0.1\code\tfbrats_inmemory.py", line 29, in __init__
X_train, y_train, X_valid, y_valid = load_from_NIfTI(parent_dir=data_path,
File "C:\Users\rstoklas\cernbox\work\my-projects\FL-phase-3_network\federation-0.1\code\brats_utils.py", line 125, in load_from_NIfTI
imgs_train = np.concatenate(imgs_all_train, axis=0)
File "<__array_function__ internals>", line 5, in concatenate
ValueError: need at least one array to concatenate
Expected behavior
Screenshots
If applicable, add screenshots to help explain your problem.
Error with modified and correct paths:
Desktop (please complete the following information):
Describe the bug
After a clean install of openfl (develop branch) in a new conda environment, when running the pytorch MNIST tutorial, cell 2 fails due to "PIL" not found. Fixed by installing 'pillow' in the conda env.
Hi all, I built the openfl Docker image (with the current master), and I'm trying the Keras MNIST tutorial in a Docker container. However, I currently get the following error:
final_fl_model = fx.run_experiment(collaborators,override_config={'aggregator.settings.rounds_to_train':5})
File "/usr/local/lib/python3.8/dist-packages/openfl/native/native.py", line 297, in run_experiment
collaborator.run_simulation()
File "/usr/local/lib/python3.8/dist-packages/openfl/component/collaborator/collaborator.py", line 147, in run_simulation
self.do_task(task, round_number)
File "/usr/local/lib/python3.8/dist-packages/openfl/component/collaborator/collaborator.py", line 192, in do_task
input_tensor_dict = self.get_numpy_dict_for_tensorkeys(
File "/usr/local/lib/python3.8/dist-packages/openfl/component/collaborator/collaborator.py", line 214, in get_numpy_dict_for_tensorkeys
return {k.tensor_name: self.get_data_for_tensorkey(k) for k in tensor_keys}
File "/usr/local/lib/python3.8/dist-packages/openfl/component/collaborator/collaborator.py", line 214, in <dictcomp>
return {k.tensor_name: self.get_data_for_tensorkey(k) for k in tensor_keys}
File "/usr/local/lib/python3.8/dist-packages/openfl/component/collaborator/collaborator.py", line 290, in get_data_for_tensorkey
nparray = self.get_aggregated_tensor_from_aggregator(
File "/usr/local/lib/python3.8/dist-packages/openfl/component/collaborator/collaborator.py", line 328, in get_aggregated_tensor_from_aggregator
tensor = self.client.get_aggregated_tensor(
File "/usr/local/lib/python3.8/dist-packages/openfl/component/aggregator/aggregator.py", line 334, in get_aggregated_tensor
raise ValueError("Aggregator does not have an aggregated tensor"
ValueError: Aggregator does not have an aggregated tensor for TensorKey(tensor_name='dense_3/kernel:0', origin='aggregator_plan.yaml_a379411e', round_number=0, report=False, tags=('model',))
Working environment in docker container
To Reproduce
I think there is an issue here: https://github.com/intel/openfl/blob/1aa2b16509a1a9a97983760a45aa1e5f133e9e30/openfl/native/native.py#L288
since the model of the first collaborator was not initialized the same way as the last collaborator's: https://github.com/intel/openfl/blob/1aa2b16509a1a9a97983760a45aa1e5f133e9e30/openfl/native/native.py#L259
In addition, there is a shallow copy of the plan for each collaborator: https://github.com/intel/openfl/blob/develop/openfl/native/native.py#L206
Could you please help to check?
Thank you!
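The shallow-copy pitfall can be reproduced in isolation with the stdlib copy module (an illustrative dict, not the real Plan object):

```python
import copy

# A shallow copy shares nested dicts, so mutating one "plan"
# silently mutates them all; a deep copy does not.
base_plan = {'aggregator': {'settings': {'rounds_to_train': 5}}}

shallow = copy.copy(base_plan)
shallow['aggregator']['settings']['rounds_to_train'] = 10
print(base_plan['aggregator']['settings']['rounds_to_train'])  # 10: base mutated!

base_plan['aggregator']['settings']['rounds_to_train'] = 5
deep = copy.deepcopy(base_plan)
deep['aggregator']['settings']['rounds_to_train'] = 10
print(base_plan['aggregator']['settings']['rounds_to_train'])  # still 5
```

If per-collaborator plans are built with a shallow copy, overrides applied for one collaborator can leak into the others, which would match the symptom described above.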
Describe the bug
On the Configuring the Federation page there is a "Syntax error in graph mermaid version 8.9.1" error
Screenshots:
Describe the bug
Trying to install docker image gives the following error:
$ sudo docker pull intel/openfl
Using default tag: latest
Error response from daemon: manifest for intel/openfl:latest not found: manifest unknown: manifest unknown
To Reproduce
sudo docker pull intel/openfl
Expected behavior
The intel/openfl image will be successfully installed
Desktop (please complete the following information):
Hey guys,
I really like OpenFL, and after some reading I found it would be quite interesting to have ARFL as an option to substitute FedAvg, since it should increase accuracy a lot in real-world scenarios where the data from each client isn't trustable.
article: https://arxiv.org/pdf/2101.05880.pdf
Hello!
I hope whoever is reading this is safe and doing great. It seems people outside Intel are unable to join the Slack; can anyone send an invite to people outside?
When I have some errors in my code while running a federation, I get only the error message without the full traceback and call stack. For example: [17:46:46] ERROR Collaborator failed: list index out of range. I have no way of knowing where specifically the error is; I only have the message above and a link to https://github.com/intel/openfl/blob/c2796b6c3a425436d38c3b3b7f8867e2ea4f9918/openfl/component/envoy/envoy.py#L59
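A stdlib sketch of what a fuller report could look like (format_failure is a hypothetical helper): logging.Logger.exception, or traceback.format_exception, preserves the call stack that the current message drops.

```python
import traceback

def format_failure(exc):
    """Full traceback text for an exception, suitable for an error log."""
    return ''.join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )

# Inside an except block, logger.exception('Collaborator failed') would
# achieve the same thing via the logging module.
try:
    [][0]  # reproduce 'list index out of range'
except IndexError as err:
    report = format_failure(err)
    print(report)
```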
Is your feature request related to a problem? Please describe.
Currently, there is an ambiguity around the image name "openfl" in Docker Hub, since there is a product called "Open Flash Library".
"docker pull openfl" (as stated in the documentation) will point to openfl/openfl, which is the other product:
https://hub.docker.com/r/openfl/openfl
Finding the correct Docker image, intel/openfl, does not inspire much confidence either, since there is no associated description.
Describe the solution you'd like
docker pull intel/openfl
To avoid manual copying in the current PKI/certificate exchange between the Aggregator and Collaborators, we need an automatic system for it.
Hi, I am following tf_2dunet on https://openfl.readthedocs.io/en/latest/running_the_federation.baremetal.html#creating-workspaces. However, the fx plan initialize command is killed without any error. I have downloaded the BraTS data and added the data path in the data.yaml file.
Hi there,
I am trying to process some 3D medical images (some .nii.gz files) with OpenFL but I am having some trouble doing so. My data loader is as follows (data loader from the 3D_unet model):
def get_dataset(self):
    self.num_train = int(self.numFiles * self.train_test_split)
    numValTest = self.numFiles - self.num_train
    ds = tf.data.Dataset.range(self.numFiles).shuffle(
        self.numFiles, self.random_seed)  # Shuffle the dataset
    ds_train = ds.take(self.num_train).shuffle(
        self.num_train, self.shard)  # Reshuffle based on shard
    ds_val_test = ds.skip(self.num_train)
    self.num_val = int(numValTest * self.validate_test_split)
    self.num_test = numValTest - self.num_val
    ds_val = ds_val_test.take(self.num_val)
    ds_test = ds_val_test.skip(self.num_val)
    ds_train = ds_train.map(lambda x: tf.py_function(self.read_nifti_file,
                                                     [x, True], [tf.float32, tf.float32]),
                            num_parallel_calls=tf.data.experimental.AUTOTUNE)
    ds_val = ds_val.map(lambda x: tf.py_function(self.read_nifti_file,
                                                 [x, False], [tf.float32, tf.float32]),
                        num_parallel_calls=tf.data.experimental.AUTOTUNE)
    ds_test = ds_test.map(lambda x: tf.py_function(self.read_nifti_file,
                                                   [x, False], [tf.float32, tf.float32]),
                          num_parallel_calls=tf.data.experimental.AUTOTUNE)
    ds_train = ds_train.repeat()
    ds_train = ds_train.batch(self.batch_size)
    ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)
    batch_size_val = 4
    ds_val = ds_val.batch(batch_size_val)
    ds_val = ds_val.prefetch(tf.data.experimental.AUTOTUNE)
    batch_size_test = 1
    ds_test = ds_test.batch(batch_size_test)
    ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)
    return ds_train, ds_val, ds_test
, which outputs some PrefetchDataset objects ds_train, ds_val and ds_test. However, according to the data loader file, I believe OpenFL expects data loaders to output X_train, y_train, X_valid, y_valid, with some follow-up operations (e.g., get batch) on them. I personally would find it easier if we had an option to use the PrefetchDataset objects directly instead of converting them to X_train, y_train, etc.
So I was wondering if OpenFL could add some way to enable data loaders for .nii.gz files?
Thank you so much for your attention!
Long-Living entities
The idea behind introducing Long-Living entities is that we would explicitly separate the stages of setting up a Federation (which is a set of connected nodes) and running an experiment. This allows users to set up a Federation once, with PKI exchange and correct network settings, and then run multiple experiments within one Federation.
To accomplish this goal we need to implement a few more logical entities:
For a simplified version of the proposed workflow, please refer to a picture:
Is your feature request related to a problem? Please describe.
Currently, the packaging is wheel-only, which is fine for pip but can introduce issues with mismatched dependency versions when packaging openfl together with other packages.
Describe the solution you'd like
Adding an sdist in addition to the wheel would make things much easier.
Describe alternatives you've considered
N.A.
Additional context
N.A.
Hi,
I am trying out "fx collaborator generate-cert-request -n COL.LABEL" from https://openfl.readthedocs.io/en/latest/running_the_federation.certificates.html and I got the below error:
I am using the "keras_cnn_mnist" template.
How do I resolve the issue?
Thanks.
In openfl/openfl-workspace/torch_unet_kvasir/code/fed_unet_runner.py, line 23, it should be
def __init__(self, device='cpu', **kwargs):
Instead of
def __init__(self, device='cuda', **kwargs):
This seems to be a typo, as the description says the default device is 'cpu'.
Without this change, one would encounter a run-time error when CUDA is not found.
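A hedged sketch of a safer default (pick_device is a hypothetical helper, not OpenFL API): fall back to CPU when CUDA is unavailable instead of hard-coding device='cuda'.

```python
# Hypothetical helper: honor a 'cuda' request only when CUDA is
# actually usable, otherwise degrade gracefully to 'cpu'.
def pick_device(requested='cpu'):
    if requested == 'cuda':
        try:
            import torch  # may be absent in some environments
            if torch.cuda.is_available():
                return 'cuda'
        except ImportError:
            pass
        return 'cpu'  # no usable CUDA: avoid the run-time error
    return requested
```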
Set a default path for the step-ca/step CLI binary and certificates (i.e. ~/.local/workspace/) with a standard naming convention ('director.crt', 'envoy_one.crt', 'envoy_two.crt', etc.) so long-living entities can start without always providing paths for root_cert, cert, and private_key (defaults can still be overridden).
Is your feature request related to a problem? Please describe.
When an experiment is set on the director, the director instantly creates and runs an aggregator. It would be better if only one aggregator ran at a time.
Describe the solution you'd like
Create a structure for an experiment.
Create a dict, list, or queue of experiments.
Task Assigner:
It would be great to have not only the Avg operator for tensor aggregation, but also a few other simple operations like Geometric Mean, Median, etc.
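Elementwise versions of these operators are straightforward; a stdlib sketch (tensors modeled as flat lists, not the real TensorDB types):

```python
import statistics

# Apply an aggregation operator elementwise across collaborator tensors.
def aggregate(tensors, op):
    return [op(values) for values in zip(*tensors)]

collab_tensors = [
    [1.0, 2.0, 8.0],  # collaborator one
    [2.0, 4.0, 2.0],  # collaborator two
    [4.0, 6.0, 2.0],  # collaborator three
]
print(aggregate(collab_tensors, statistics.median))  # [2.0, 4.0, 2.0]
print(aggregate(collab_tensors, statistics.geometric_mean))
```

The same pattern covers mean, median, geometric mean, or any other reducer that takes an iterable of per-collaborator values.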
gRPC is currently pinned to version 1.30. TensorFlow 2.4+ requires a later version. The gRPC version was originally pinned because of sporadic network issues, but this is likely fixed with the change to short-lived gRPC client connections.
At the end of a round, the aggregator currently calls a set sequence of functions to compute the aggregation of the collaborator models and task metrics. This aggregation procedure is highly tuned for the specific set of tasks normally called in an experiment (aggregated_model_validation
, train_batches
, and local_model_validation
). Adding new tasks with new TensorKey tags does not always behave as expected with this rigid aggregation procedure.
We should instead provide an interface where users can add their own aggregation tasks. This goes a step beyond the current AggregationFunctionInterface, because it would be applied beyond TensorKeys marked with the ('trained',) tag, and could be made more general. Aggregator Tasks could further be customized to be attached to collaborator tasks, run in sequence, or run one or several at the beginning/end of a round. The default set of Aggregator Tasks would execute at the end of the round. The first would compute the weighted average of metrics and report them; the second would run aggregation on the collaborator models with compression/decompression; and the decision logic for saving the best model could be a third (this would allow easy user customization for saving a model on a metric besides best accuracy).
The exact interface for the aggregator tasks is TBD, but the tasks should be provided access to the TensorDB (read+write), the TensorCodec, and an interface to save models.
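The exact interface is TBD, but as a sketch (all names hypothetical), end-of-round tasks could be plain objects registered in a list, the first of which computes the weighted metric average described above:

```python
# Hypothetical pluggable aggregator tasks executed at end of round.
class AggregatorTask:
    def run(self, tensor_db, round_num):
        raise NotImplementedError

class ReportMetricAverage(AggregatorTask):
    """Weighted average of a reported metric across collaborators."""
    def run(self, tensor_db, round_num):
        entries = tensor_db.get('acc', [])  # list of (value, weight) pairs
        total = sum(weight for _, weight in entries)
        return sum(value * weight for value, weight in entries) / total

end_of_round_tasks = [ReportMetricAverage()]  # users append their own

# Stand-in for the TensorDB: metric value and sample count per collaborator.
tensor_db = {'acc': [(0.5, 100), (0.75, 300)]}
for task in end_of_round_tasks:
    print(task.run(tensor_db, round_num=0))  # 0.6875
```

Model aggregation with compression/decompression and best-model selection would be further tasks in the same list.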
Describe the bug
The collaborator gives the following error: EXCEPTION : [enforce fail at CPUAllocator.cpp:65] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 451477504 bytes. Error code 12 (Cannot allocate memory)
while running the 'New Interactive Python API (experimental)' notebook (PyTorch using the Kvasir dataset).
After successfully running round 0, the collaborators receive the new weights from the aggregator and one of the collaborators crashes with the above error. We checked our RAM and disk, and memory was sufficient before this exception.
To Reproduce
Steps to reproduce the behaviour:
fx collaborator start -d data.yaml -n one
fx collaborator start -d data.yaml -n two
Expected behavior
The model should run more rounds.
Screenshots
Desktop (please complete the following information):
Is your feature request related to a problem? Please describe.
Currently, OpenFL can only be installed through pip, which prevents it from being added to packages that require C/C++ libraries.
Describe the solution you'd like
A conda recipe would be very useful to mitigate this, which I am happy to work on. 😄
Describe alternatives you've considered
N.A.
Additional context
Needs #44
The Python native API currently executes collaborators sequentially; however, it could be done in parallel, since their execution is independent.
for round_num in range(rounds_to_train):
    for col in plan.authorized_cols:
        collaborator = collaborators[col]
        model.set_data_loader(collaborator_dict[col].data_loader)
        if round_num != 0:
            model.rebuild_model(round_num, model_states[col])
        collaborator.run_simulation()
        model_states[col] = model.get_tensor_dict(with_opt_vars=True)
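Assuming each collaborator owned its own model state (which the loop above does not yet guarantee, since one model object is shared), the fan-out could look like this concurrent.futures sketch (run_round and run_one are hypothetical names):

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch: run each collaborator's round concurrently and collect results.
def run_round(collaborators, run_one):
    """run_one(col) must be independent per collaborator for this to be safe."""
    with ThreadPoolExecutor(max_workers=len(collaborators)) as pool:
        futures = {col: pool.submit(run_one, col) for col in collaborators}
        return {col: future.result() for col, future in futures.items()}
```

The prerequisite is giving each collaborator a private model copy (or process), so run_simulation calls no longer race on shared state.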
https://github.com/intel/openfl/blob/704dfd5b958fadf6aafd073c882beec5875b7006/openfl/interface/collaborator.py#L358
It takes index 0 of an empty list. I think it should be updated_crts or previous_crts instead of cert_difference.
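A hypothetical reconstruction of the intent with the empty case guarded (the names follow the issue's suggestion; this is not the actual patch):

```python
# Pick the newly signed cert from the difference between two directory
# listings, guarding the empty case instead of indexing blindly.
def newly_signed(previous_crts, updated_crts):
    cert_difference = [c for c in updated_crts if c not in previous_crts]
    if not cert_difference:
        return None  # nothing new was signed; avoid IndexError
    return cert_difference[0]
```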