
PyGrid (deprecated - see PySyft): Issues

Remote Code Execution (Security)

For full PyTorch support we will need remote execution of arbitrary code (defined by whoever defines a model).

We will also likely need remote code execution for data adapters, although these are more auditable (would still be nice to have, though)

Some things I have considered so far (as well as their downsides):

chroot jail

What it is:
We tell a process that /grid/runhere is the root directory and manually link in everything it is allowed to use (basically nothing). This is not that secure, because there are ways to break out of a chroot.
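A minimal sketch of the chroot approach, just to make the idea concrete (the jail path and uid are illustrative; this requires root and, as noted above, is escapable):

import os

def run_jailed(jail_dir="/grid/runhere"):
    # Confine the current process to jail_dir, then drop privileges.
    # chroot alone is not a real security boundary -- a privileged process
    # can break out, which is exactly the downside described above.
    os.chroot(jail_dir)   # make jail_dir the apparent filesystem root
    os.chdir("/")         # move inside the new root
    os.setgid(65534)      # drop to the 'nobody' group
    os.setuid(65534)      # drop to the 'nobody' user
    # ...run the untrusted model code here...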

docker

We can look into Docker, though I have read that Docker is also not truly secure.

Torch Integration: inherit_registration method

Currently, grad and data of Variables are constructed with their own registration attributes (including self.id). It would be better if they had the same registration attributes as the Variable that they belong to. The only registration attribute that should truly differ is id; that is, if x = Variable(some_tensor), then x.id != x.data.id != x.grad.id, which is what needs to be fixed. However, we should write a general function that copies all registration attributes over to the grad and data tensors, in case we need to include more registration attributes in the future.
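A rough sketch of what an inherit_registration helper could look like; the attribute names other than id are assumptions, not the actual registration API:

def inherit_registration(parent, child, registration_attrs=("owners", "is_pointer_to_remote")):
    # Copy every registration attribute except id from a Variable to its
    # .data / .grad tensors so they stay registered together; id stays unique.
    for attr in registration_attrs:
        if hasattr(parent, attr):
            setattr(child, attr, getattr(parent, attr))
    return child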

Learning Decisions not just Predictions

Context

Instead of training models to predict an unconditional future outcome (e.g. how much will it rain tomorrow, what digit do these pixels represent), we want to train them to select an optimal action.
The most fundamental case of this, where action sequences share no underlying state, is known as a bandit problem. A richer variant of this model seems suitable here, since the client who wants the trained model might not take the action chosen by those training the model.

User Story:

A doctor sees a sequence of patients, observes each one's symptoms (some feature vector), and selects a treatment from some finite set to give them; after treating a patient, the doctor observes whether the patient recovers or not (reward 1 or 0). Train a model that will choose a treatment conditional on the symptoms.

For a first version, do as in the standard bandit setting and blindly ignore the action actually taken, using only the action the algorithm said should be taken. For a second version it would be nice for the client carrying out the action to be able to give feedback saying they actually took a different action.
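To make the first version concrete, here is a toy epsilon-greedy contextual bandit sketch for the doctor/treatment story above (illustrative only, not Grid code): one linear reward model per treatment, updated with the observed 0/1 recovery.

import numpy as np

class EpsilonGreedyBandit:
    def __init__(self, n_arms, n_features, epsilon=0.1, lr=0.01):
        self.weights = np.zeros((n_arms, n_features))
        self.epsilon = epsilon
        self.lr = lr

    def choose(self, symptoms):
        if np.random.rand() < self.epsilon:
            return np.random.randint(len(self.weights))    # explore: random treatment
        return int(np.argmax(self.weights @ symptoms))     # exploit: best predicted recovery

    def update(self, arm, symptoms, reward):
        # one SGD step toward the observed recovery outcome (0 or 1)
        prediction = self.weights[arm] @ symptoms
        self.weights[arm] += self.lr * (reward - prediction) * symptoms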

Relevant Literature

Let task owner own validation

Currently the task owner never validates which worker is actually producing the best model.

Please discuss ideas on how to do this.

cc @iamtrask. I believe that if the data owner runs validation too often it will lead to overfitting, and that if there are lots of nodes he might become overwhelmed and unable to validate often enough.

Installation and Dependency Management

@jvmancuso brought up a good point in my pull request #102

Question -- is there a way to automatically install the correct version of torch based on CUDA version? I'm really not sure, but if installing torch from requirements.txt will automatically install the CPU version, and we want to use a version compiled for CUDA, we may want to have the user do this on their own and add that to the install instructions.

Right now, this isn't a huge issue, because torch is only imported for certain prototypes, but I'm currently working on a pretty big PR that's going to make PyTorch the main interface for Grid.

I thought I'd open an issue for discussion purposes to start figuring out how to handle these dependencies. Ideally, there'd be a single line or two (e.g. python setup.py install) for a developer to run to set up their development environment.

Then, conda, pip wheels, and binaries could be packaged for distribution on release, similar to pytorch or other libraries.

In terms of installing torch, part of the setup script could detect whether a system dependency is missing and prompt the user to install it, or ask whether they wish to use it.

In terms of the installer, it'd probably be best to be as smart as possible about what's available on the system and the context of the installation (e.g. CUDA support available) and then take advantage of it optimistically.
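As a starting point for the "be smart about what's available" idea, a small, hedged sketch of detecting a CUDA toolchain at install time (just a PATH heuristic, nothing Grid-specific):

import shutil

def has_cuda_toolchain():
    # Heuristic: assume a CUDA-enabled torch build is worth installing
    # if nvcc or nvidia-smi is on the PATH.
    return shutil.which("nvcc") is not None or shutil.which("nvidia-smi") is not None

if __name__ == "__main__":
    if has_cuda_toolchain():
        print("CUDA detected: install a CUDA-enabled torch build (see the install instructions).")
    else:
        print("No CUDA detected: the CPU-only torch from requirements.txt should be fine.")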

Thoughts?

Tracing Torch code

Long-term, we need a way to lazily trace commands and send them to specific workers. I was planning on using a context manager for this: we can hook and unhook torch every time we switch contexts, which brings local execution of everything in torch back up to its normal speed, plus around 400 milliseconds for entering and exiting the hooking context. For training loops this is going to be a huge speedup, but we can also keep the currently implemented interactive mode if you need it for development or debugging. I think it's best to merge everything else into Grid before trying this out, though.
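A sketch of the context-manager idea; the TorchHook-style hook()/unhook() methods are assumptions about the eventual Grid API, not existing calls:

from contextlib import contextmanager

@contextmanager
def traced(hook, workers):
    hook.hook(workers)       # patch torch so commands are recorded lazily for these workers
    try:
        yield
    finally:
        hook.unhook()        # restore native torch, bringing local execution back to full speed

# Usage (hypothetical):
# with traced(grid_hook, ["worker_1"]):
#     loss = model(x).sum()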

Get method abi locally

Right now the method ABI (the data, encoded as hex, referencing the method code and arguments) comes from Bygone, but this is not required. It can be generated in pure Python as long as the user has a copy of this file: https://raw.githubusercontent.com/OpenMined/Bygone/master/build/TrainingGrid.json. The JSON field 'abi' contains the ABI for all the methods. Python web3 can probably be used to extract an individual method's ABI.

Example how to create contract in web3.js: https://github.com/OpenMined/Bygone/blob/master/contract.js#L37

Example how to use the contract object to get the abi in web3.js: https://github.com/OpenMined/Bygone/blob/master/index.js#L331
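A minimal sketch of pulling a single method's ABI out of TrainingGrid.json in pure Python (the method name used here is hypothetical; the 'abi' field layout follows standard Solidity compiler output):

import json

def load_method_abi(path="TrainingGrid.json", method_name="someMethod"):
    with open(path) as f:
        contract = json.load(f)
    abi = contract["abi"]    # ABI entries for all methods, as described above
    matches = [entry for entry in abi
               if entry.get("type") == "function" and entry.get("name") == method_name]
    return matches[0] if matches else None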

Wallet password prompt discussion

As of PR #42, whenever a raw transaction is sent to the Ethereum blockchain for processing, Grid will prompt you for your password. This is not ideal; it should probably only prompt you once for a batch of transactions. For a worker it will need to be an asynchronous flow where the user can, at some point in the future, send a batch of feedback to the smart contract. I believe a similar thing can happen with the data scientist/client: once some models have been trained, they can be prompted to send transactions/feedback. This could be automated, but I don't think we should do it that way, both to keep it more secure and because, as far as I know, a real person would need to enter the feedback anyway.

Default wallet to open when signing transactions

Ideally we want all transactions to be signed locally on the data scientist's machine. This is pretty close to being completed, but I think we want the signing to automatically use a local wallet, or create one if one doesn't exist. Being able to enter a password for the encrypted wallet from a Jupyter notebook would be neat as well.

Related issue for encrypted wallet here: #5

See how signing can currently happen in this notebook; search for set_identity (you can't link to lines in notebooks): https://github.com/OpenMined/Grid/blob/master/notebooks/Keras%20Grid.ipynb

Ability for workers/clients to leave feedback

Once a job has been completed and the payment has been settled the worker and client need to leave feedback for one another. This can be done using a smart contract on an ethereum network. The feedback left would basically be a mapping of the following: client address or worker address => [transaction id, feedback value]. The transaction id would be the id of the transaction that transferred the ether to the worker. Client address or worker address is the address of who's receiving the feedback. So for every client or worker there would be a mapping from their address to a list of all the feedback they have received.
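In plain-Python terms (the real mapping would live in the Solidity contract; this is only to illustrate the shape of the data):

from collections import defaultdict

# address of the client or worker receiving feedback -> list of (transaction id, feedback value)
feedback = defaultdict(list)

def leave_feedback(recipient_address, payment_tx_id, feedback_value):
    feedback[recipient_address].append((payment_tx_id, feedback_value))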

Prerequisites

There are some ethereum blockchain identity issues that must be solved along with this:

Private key generation: #8
Default wallet to use: #6
Encrypted Wallet: #5

Grid Tree data adapters are very limited

Right now tree mode is limited in what it can do (MNIST).

When data scientists specify a task, they don't specify what format the data must be in, just what the task should accomplish. Whenever scientists propose an architecture, they specify the input shape, but they don't really specify what format a node has to have its data in.

In the MNIST demo, we propose an architecture whose first layer takes input of shape 784 and outputs 10.
A node must have data in the directory data/mnist, which is specified in the task. However, the file format is completely arbitrary. The demo uses the .npz format, which is common for MNIST, but what do we do for arbitrary tasks?

Does the data scientist provide an adapter as well as a spec around the data they are speculating on?
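One possible shape for such an adapter, purely as a straw man (the interface and the .npz key names are assumptions, not an existing Grid API):

import glob
import os
import numpy as np

class DataAdapter:
    # Turn whatever files a node has under data/<task> into (X, y) arrays
    # matching the input/output shapes declared in the proposed architecture.
    def load(self, data_dir):
        raise NotImplementedError

class MnistNpzAdapter(DataAdapter):
    def load(self, data_dir="data/mnist"):
        path = glob.glob(os.path.join(data_dir, "*.npz"))[0]
        arrays = np.load(path)
        # key names are assumptions about how the demo's .npz was written
        return arrays["x_train"].reshape(-1, 784), arrays["y_train"]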

Extend Grid to take advantage of IPFS file sharding

IPFS has a max block size of 1MB for security reasons. They've implemented sharding as a way to store larger files/directories on IPFS (see ipfs/notes#76, ipfs/kubo#3042, and also https://github.com/ipfs/js-ipfs-unixfs#usage for an example of how it's used in JS).

This becomes a problem for us, since we'll often want to send tensor objects that contain more than 1MB of data. For example, a 50-dimensional word embedding over a vocabulary of 100,000 words would normally require sending an embedding matrix of at least 50 * 100,000 * 32 bits / 8 = 20 MB. Training a matrix like this presents a range of challenges, but even freezing it and sending it once would be feasible and useful for users, so this is definitely something we want to be able to do to allow for a larger class of architectures to be trained on Grid.

The goal here would be to figure out a way to do JSON sharding with py-ipfs-api, and then to integrate those changes into Grid.
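A rough sketch of manual JSON sharding on top of py-ipfs-api; add_json/get_json are existing client calls, but the chunk-manifest layout here is just an assumption for illustration:

MAX_CHUNK = 900_000  # bytes of JSON payload per piece, safely under the 1MB block limit

def put_sharded(api, payload_str):
    chunks = [payload_str[i:i + MAX_CHUNK] for i in range(0, len(payload_str), MAX_CHUNK)]
    part_addrs = [api.add_json({"part": i, "data": chunk}) for i, chunk in enumerate(chunks)]
    return api.add_json({"type": "sharded_json", "parts": part_addrs})

def get_sharded(api, manifest_addr):
    manifest = api.get_json(manifest_addr)
    return "".join(api.get_json(addr)["data"] for addr in manifest["parts"])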

Better heuristics for when nodes should give up training

In the MNIST demo, a common problem when running with 3 nodes is that one of the nodes becomes unable to contribute positively to training.

Worker A and worker B start lowering the loss together, and they send their model to worker C, whose data makes the loss go up, so he doesn't publish.

If worker C ran more epochs or changed his training rates, he'd probably be able to lower the error rate.

Basically, the current scheme to decide if a model has improved is very naive. We should discuss how to make it better!
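As one concrete starting point for the discussion, a patience-based rule (a sketch, not the implemented scheme) that gives a worker a few extra epochs before it concedes:

def should_keep_training(loss_history, patience=3, min_delta=1e-3):
    # Keep training until the loss has failed to improve by at least
    # min_delta for `patience` consecutive epochs.
    if len(loss_history) <= patience:
        return True
    best_before = min(loss_history[:-patience])
    recent_best = min(loss_history[-patience:])
    return recent_best < best_before - min_delta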

Persist jobs when no one is listening

If a data scientist tries to create a job and there are no nodes online, then that job will be lost forever.

Ideally, the job will be stored somewhere and whenever a worker is idle, they can check to see if there is a backlog of jobs that they can work on.

More robust error handling

Clients need to be notified of errors on worker nodes. This isn't necessarily specific to the TorchService, but it's likely to happen there very often, and there's no way to robustly prevent incorrect Torch code from being sent to a worker. We need a way of notifying the client when they send a bad command, likely by sending a return-to-sender message that contains either a Grid-specific error message (e.g. 'command' isn't a torch command, 'obj' isn't a torch object, etc.) or a normal Python error message from a stack trace (e.g. RuntimeError: cannot call .data on a torch.Tensor: did you intend to use autograd.Variable?).
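A sketch of the return-to-sender idea on the worker side; handle_command here is a stand-in for the real TorchService dispatch, and the message fields are assumptions:

import traceback

def handle_command(message):
    # Stand-in that always fails, so the snippet runs on its own.
    raise RuntimeError("cannot call .data on a torch.Tensor: did you intend to use autograd.Variable?")

def process_command(message):
    try:
        return {"status": "ok", "result": handle_command(message)}
    except Exception as e:
        return {
            "status": "error",
            "error_type": type(e).__name__,
            "message": str(e),
            "traceback": traceback.format_exc(),  # shipped back to the client as return-to-sender
        }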

Create unit tests for existing code

Right now a lot of code isn't covered by unit tests. This is an open-ended issue where you can create your own project by deciding to write some unit tests covering an area of code you're interested in. Just create a file with your tests in the tests directory. These tests can then be run with pytest.
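For example, a minimal file such as tests/test_example.py (the function under test is inlined here so the snippet stands alone; in practice you would import Grid code):

def normalize(values):
    total = sum(values)
    return [v / total for v in values]

def test_normalize_sums_to_one():
    assert abs(sum(normalize([1, 2, 3])) - 1.0) < 1e-9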

Torch Integration: TorchService

We need to integrate all of this into TorchService. This will be a substantial overhaul and will take some time, but the code from the notebooks should transition pretty smoothly. Abstracting as much of the hooking away from the service as possible will be prudent.

Ability for a client to make payments

Once a worker has completed a job the client who issued the job needs a way of compensating the worker. The simplest flow for this is that once the trained model has been received the client sends a certain amount of ether to the worker via an ethereum network.

Prerequisites

There are some ethereum blockchain identity issues that must be solved along with this:

Private key generation: #8
Default wallet to use: #6
Encrypted Wallet: #5

Bounty greater than sum total of gas spent

When a client posts a job, the bounty cannot be greater than the total amount of gas they have spent on the network. This is to prove that they have already done some sort of work and that they have the means to settle up the bounty.

Also, the worker should check that the client has enough ether to pay out if the job is completed, and the client should check that the worker has enough reputation to process the job.

Client errors out when verbose=True

Seems to be a problem with stats collection.

Code to reproduce:
client = TorchClient(verbose=True) (note the problem shows up in base.BaseClient)

Stack trace:

Traceback (most recent call last):
  File "/Users/jasonmancuso/anaconda/envs/openmined/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/Users/jasonmancuso/anaconda/envs/openmined/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/jasonmancuso/Grid/grid/clients/base.py", line 44, in ping_known_then_refresh
    self.refresh_network_stats(print_stats=verbose)
  File "/Users/jasonmancuso/Grid/grid/clients/base.py", line 115, in refresh_network_stats
    len(self.stats) - 1, stat))
  File "/Users/jasonmancuso/Grid/grid/clients/pretty_printer.py", line 47, in print_node
    stat_str = self.print_compute(idx, node)
  File "/Users/jasonmancuso/Grid/grid/clients/pretty_printer.py", line 26, in print_compute
    ping = str(stat['ping_time']).split(".")
KeyError: 'ping_time'

Error when node was running

listing workers...
?!?!?!?!?! openmined:list_workers:QmNxbPtZu1GkXcLE5hzvYkRrcf1kRvxX8cTEPErqAkBwbx []
listing workers...
?!?!?!?!?! openmined:list_workers:QmQf3mhWWHgCv26gkPPjhT4BjipgNZARSFWtpC5GoZ25kc []
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 543, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 302, in _error_catcher
    yield
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 598, in read_chunked
    self._update_chunk_length()
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 547, in _update_chunk_length
    raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/requests/models.py", line 745, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 432, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 626, in read_chunked
    self._original_response.close()
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 320, in _error_catcher
    raise ProtocolError('Connection broken: %r' % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/Grid/grid/pubsub/base.py", line 186, in listen_to_channel_impl
    for m in new_messages:
  File "/home/ubuntu/Grid/grid/ipfsapi/http.py", line 108, in stream_decode
    for data in res:
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/requests/models.py", line 748, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 543, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 302, in _error_catcher
    yield
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 598, in read_chunked
    self._update_chunk_length()
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 547, in _update_chunk_length
    raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/requests/models.py", line 745, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 432, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 626, in read_chunked
    self._original_response.close()
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 320, in _error_catcher
    raise ProtocolError('Connection broken: %r' % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/Grid/grid/pubsub/base.py", line 186, in listen_to_channel_impl
    for m in new_messages:
  File "/home/ubuntu/Grid/grid/ipfsapi/http.py", line 108, in stream_decode
    for data in res:
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/requests/models.py", line 748, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

Fix existing linter errors: Fix long lines

Ideally we want to keep lines under 80 characters for people coding on small screens. There are a lot of lines currently longer than 80 characters, and some are much longer. These are mostly string literals; we need to break them out into a separate file or into multi-line strings somehow.
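For the string-literal cases, adjacent literals are concatenated at compile time, so a long message can be split without changing its value, e.g.:

LONG_MESSAGE = (
    "This string is split across several source lines so each stays "
    "under 80 characters, but Python joins the pieces into a single literal."
)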

Notebook cannot run as is when using docker

Steps to reproduce:

  1. Launch the grid using docker
  2. Try to run a notebook

As Grid is not on the Python path, the notebooks are not able to run as is.
A fix can be to manually add Grid to the PYTHONPATH at the start of every notebook:

import os
import sys

sys.path.append(os.environ['ROOT_DIR'])

Or, better, add it directly to the Docker image.

Prevent TorchClient from printing outside of its iPython cell

TorchClient is set up so that it asynchronously connects to other Grid nodes. This is convenient during development, because you can continue to run other cells that don't rely on workers you haven't connected to yet. There is also a verbose argument to TorchClient that controls whether printing should be done during this process. But because it connects and prints asynchronously, if we're running in a Jupyter notebook, the output can end up printed in later cells you run.

Your mission, should you choose to accept it, is to hook into Jupyter notebook extensions and prevent this from happening.

You can reproduce this issue by running from grid.client.torch import TorchClient; client = TorchClient(verbose=True) in a notebook and then running several cells of arbitrary code below that while the client is connecting to other nodes.

Dockerize Python Grid Edge Nodes

For both improved security and convenience, we need the ability for people to simply download a docker image for a Grid node - with that image automagically running the appropriate IPFS server and grid worker daemon.

Acceptance Criteria

  • automatically builds from master branch
  • runs IPFS server
  • runs daemon
  • ships with dependencies
  • can run on linux, windows, and mac (tested)
  • automatically attaches to NVIDIA GPUs for all 3 major frameworks (keras, pytorch, tensorflow) (tested) (https://github.com/NVIDIA/nvidia-docker)

Run worker inside pytest when running integration tests

It would be ideal if we could run the worker automatically inside pytest when running integration tests. Currently this error occurs:

    self.run()
  File "/anaconda3/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/justinpatriquin/projects/Grid/grid/workers/base_worker.py", line 158, in listen_to_channel_impl
    out = handle_message(message)
  File "/Users/justinpatriquin/projects/Grid/grid/services/fit_worker.py", line 33, in fit_worker
    return self.fit_keras(decoded)
  File "/Users/justinpatriquin/projects/Grid/grid/services/fit_worker.py", line 50, in fit_keras
    model = keras_utils.ipfs2keras(self.api, decoded['model_addr'])
  File "/Users/justinpatriquin/projects/Grid/grid/lib/keras_utils.py", line 15, in ipfs2keras
    return deserialize_keras_model(api.cat(model_addr))
  File "/Users/justinpatriquin/projects/Grid/grid/lib/keras_utils.py", line 34, in deserialize_keras_model
    model = keras.models.load_model('temp_model2.h5')
  File "/anaconda3/lib/python3.6/site-packages/keras/models.py", line 246, in load_model
    topology.load_weights_from_hdf5_group(f['model_weights'], model.layers)
  File "/anaconda3/lib/python3.6/site-packages/keras/engine/topology.py", line 3382, in load_weights_from_hdf5_group
    K.batch_set_value(weight_value_tuples)
  File "/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2373, in batch_set_value
    get_session().run(assign_ops, feed_dict=feed_dict)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1083, in _run
    'Cannot interpret feed_dict key as Tensor: ' + e.args[0])
TypeError: Cannot interpret feed_dict key as Tensor: Tensor Tensor("Placeholder:0", shape=(2, 8), dtype=float32) is not an element of this graph.
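One way to approach this would be a session-scoped pytest fixture that starts the worker in a background thread; the worker class and its listen method below are assumptions about the entry point, not the confirmed API. The TensorFlow "not an element of this graph" error above typically comes from loading a Keras model in one thread and running it in another, so the fixture (or fit_keras) may also need to pin a single graph/session.

import threading
import pytest

@pytest.fixture(scope="session")
def background_worker():
    from grid.workers import base_worker          # assumed module path
    worker = base_worker.GridWorker()             # hypothetical constructor
    thread = threading.Thread(target=worker.listen, daemon=True)
    thread.start()
    yield worker
    # daemon thread exits with the test session; add an explicit shutdown if the worker supports one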

Get models out of grid and back to jupyter notebook.

The grid will train models in tree mode, and will publish models to IPFS, but there is not currently any good way for a data scientist to pull models back out from the grid and use them.

One idea might look like this

c = Client()
models = c.get_best_models_for('mnist')

# models is a list of keras.Sequential (or whatever the best architecture was)
models[0].predict(input)

Please share your ideas on a good flow for this.

Torch Integration: Remove dependency of torch hooking on having list of worker ids

Currently, hooking begins with worker_ids, meaning you have to specify the worker IDs before hooking into Torch. This also means the client doesn't currently have control over where each Torch command goes (by default, it goes to every worker ID you registered during hooking). We'd like to change this, which means that client-side tensors need to keep track of all workers that have a particular tensor, not just the last worker they sent it to. This will require a few subtle changes to register_object and tensor.send, as well as to the generic torch hooking wrappers (it will actually simplify those a bit by removing the outer decorator).

Drop out of training if you don't think you'll finish first

We need a mechanism to quit training a model if the trainer doesn't think it will finish first.

For example, if Alice is running a worker node on her MacBook Air and Bob is running a node on his gaming computer, Bob will probably train any model faster. Both Alice and Bob should publish their progress as they go, so Alice will know that she is not going to win and can give up. She would know this when Bob tells her he is 50% done and she is only 5% done. She can then try to pick up other work while she is idle and Bob is still working.

We should do this issue first and then use it to help build a very slick system to complete #28.

Slices of slices of Tensors aren't being registered.

When iterating over a slice of a tensor, the new chunks aren't registered. So far, the only known problem with this is that printing large tensors raises an AttributeError, although there are likely other resulting bugs that we just don't know about yet.

Code to reproduce: print(torch.FloatTensor(128,128)).

Private key generation

Investigate whether there is a better way to generate private keys for a wallet. It is currently done here: https://github.com/OpenMined/Grid/blob/master/grid/bygone/bygone.py#L96. On the Stack Overflow question where this code sample came from, some concern was expressed that os.urandom is potentially not secure. Finding a better way to do this might be a good idea! Ganache and others seem to use a mnemonic to generate private keys; doing something similar could be a good idea.
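As a small, hedged first step, the stdlib secrets module is explicitly documented for cryptographic use (it draws on the same OS entropy as os.urandom, so the bigger win would be the mnemonic-based derivation mentioned above):

import secrets

def generate_private_key_bytes():
    return secrets.token_bytes(32)   # 256 bits from the OS CSPRNG

private_key_hex = generate_private_key_bytes().hex()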

Bring GridConfiguration back to pubsub grid

One of the main benefits of Grid is being able to specify different learning rates, # of epochs, etc and train them all at the same time.

Blockchain implementation supports this but pubsub doesn't. We need to bring it back.

I think pubsub will have an easier time supporting the ability to have different agents train different parts of the configuration.

E.g.

c1 = GridConfiguration(
  model=m1,
  epochs=20
)

c2 = GridConfiguration(
  model=m2,
  epochs=200
)

r = grid.train(input, target, configurations=[c1,c2])

One agent could pick up c1, another picks up c2.

Fix existing linter errors: Bare Except

There are currently a lot of linter reports about bare excepts, for example: ./grid/services/listen_for_openmined_nodes.py:48:17: E722 do not use bare except. This is not ideal; we should explicitly catch the errors we want to catch. Not catching specific errors can make debugging difficult.
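A before/after illustration (refresh_network_stats here is a stand-in that always times out, just so the snippet runs):

import logging

log = logging.getLogger(__name__)

def refresh_network_stats():
    raise TimeoutError("node did not answer the ping")

# Bad: a bare except also swallows KeyboardInterrupt, SystemExit, and typos.
try:
    stats = refresh_network_stats()
except:
    stats = None

# Better: catch only the errors we expect and let everything else surface.
try:
    stats = refresh_network_stats()
except (ConnectionError, TimeoutError) as e:
    log.warning("could not refresh stats: %s", e)
    stats = None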

Torch Integration: Continuous client side commands

When a tensor is remote, the thing that's returned on the client side isn't yet a tensor object that can be used in further computation (mainly because send_command and receive_command haven't been implemented). We need a way to return Tensors on the client side that are pointers resulting from computations done elsewhere. This should be pretty simple: when the client receives the object that results from a remote operation, that object should come with the attributes of the resulting tensor(s), so we can construct an 'empty' tensor on the client side that has the same hooked methods and gets registered with the client, except that it gets registered with the same attributes as the remote tensor (apart from worker, which will not be needed, and is_pointer_to_remote, which will be the opposite). This will make it so that chains of commands can still execute on the client side. Right now, local chains of commands can execute, but remote ones can't, because the relevant functions (send_command/receive_command) are doing placeholder printing when the tensors are remote. This will need to be done inside the wrappers, in the cases when has_remote is True (for assign_workers_function) and when self.is_pointer_to_remote is True (for assign_workers_method).
