
PyGrid (deprecated - see PySyft): Issues

Remote Code Execution (Security)

For full PyTorch support we will need remote execution of arbitrary code (defined by whoever defines a model).

We will also likely need remote code execution for data adapters, although these are more auditable (would still be nice to have, though)

Some things I have considered so far (as well as their downsides):

chroot jail

What it is:
We tell a process that /grid/runhere is the root directory and manually link in everything it is allowed to use (basically nothing). This is not that secure, because there are ways to break out of a chroot.
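A minimal sketch of the chroot approach, just to make the idea concrete (the jail path and uid are illustrative; this requires root and, as noted above, is escapable):

import os

def run_jailed(jail_dir="/grid/runhere"):
    # Confine the current process to jail_dir, then drop privileges.
    # chroot alone is not a real security boundary -- a privileged process
    # can break out, which is exactly the downside described above.
    os.chroot(jail_dir)   # make jail_dir the apparent filesystem root
    os.chdir("/")         # move inside the new root
    os.setgid(65534)      # drop to the 'nobody' group
    os.setuid(65534)      # drop to the 'nobody' user
    # ...run the untrusted model code here...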

docker

We can look into Docker, though I have read that Docker is also not truly secure.

Torch Integration: inherit_registration method

Currently, grad and data of Variables are constructed with their own registration attributes (including self.id). It would be better if they had the same registration attributes as the Variable that they belong to. The only registration attribute that should truly differ is id; that is, if x = Variable(some_tensor), then x.id != x.data.id != x.grad.id, which is what needs to be fixed. However, we should write a general function that copies all registration attributes over to the grad and data tensors, in case we need to include more registration attributes in the future.
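A rough sketch of what an inherit_registration helper could look like; the attribute names other than id are assumptions, not the actual registration API:

def inherit_registration(parent, child, registration_attrs=("owners", "is_pointer_to_remote")):
    # Copy every registration attribute except id from a Variable to its
    # .data / .grad tensors so they stay registered together; id stays unique.
    for attr in registration_attrs:
        if hasattr(parent, attr):
            setattr(child, attr, getattr(parent, attr))
    return child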

Learning Decisions not just Predictions

Context

Instead of training models to predict an unconditional future outcome (e.g. how much will it rain tomorrow, what digit do these pixels represent), we want to train them to select an optimal action.
The most fundamental case of this, where action sequences share no underlying state, is known as a bandit problem. A richer variant of this model seems suitable here, since the client who wants the trained model might not take the action chosen by those training the model.

User Story:

A doctor sees a sequence of patients, observes each one's symptoms (some feature vector), and selects a treatment from some finite set to give them; after treating a patient, the doctor observes whether the patient recovers or not (reward 1 or 0). Train a model that will choose a treatment conditional on the symptoms.

For a first version, do as in the standard bandit setting and blindly ignore the action actually taken, using only the action the algorithm said should be taken. For a second version it would be nice for the client carrying out the action to be able to give feedback saying they actually took a different action.
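To make the first version concrete, here is a toy epsilon-greedy contextual bandit sketch for the doctor/treatment story above (illustrative only, not Grid code): one linear reward model per treatment, updated with the observed 0/1 recovery.

import numpy as np

class EpsilonGreedyBandit:
    def __init__(self, n_arms, n_features, epsilon=0.1, lr=0.01):
        self.weights = np.zeros((n_arms, n_features))
        self.epsilon = epsilon
        self.lr = lr

    def choose(self, symptoms):
        if np.random.rand() < self.epsilon:
            return np.random.randint(len(self.weights))    # explore: random treatment
        return int(np.argmax(self.weights @ symptoms))     # exploit: best predicted recovery

    def update(self, arm, symptoms, reward):
        # one SGD step toward the observed recovery outcome (0 or 1)
        prediction = self.weights[arm] @ symptoms
        self.weights[arm] += self.lr * (reward - prediction) * symptoms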

Relevant Literature

Let task owner own validation

Currently the task owner never validates which worker is actually producing the best model.

Please discuss ideas on how to do this.

cc @iamtrask. I believe that if the data owner runs validation too often it will lead to overfitting, and that if there are lots of nodes he might become overwhelmed and unable to validate often enough.

Installation and Dependency Management

@jvmancuso brought up a good point in my pull request #102

Question -- is there a way to automatically install the correct version of torch based on CUDA version? I'm really not sure, but if installing torch from requirements.txt will automatically install the CPU version, and we want to use a version compiled for CUDA, we may want to have the user do this on their own and add that to the install instructions.

Right now, this isn't a huge issue, because torch is only imported for certain prototypes, but I'm currently working on a pretty big PR that's going to make PyTorch the main interface for Grid.

I thought I'd open an issue for discussion purposes to start figuring out how to handle these dependencies. Ideally, there'd be a single line or two (e.g. python setup.py install) for a developer to run to set up their development environment.

Then, conda, pip wheels, and binaries could be packaged for distribution on release, similar to pytorch or other libraries.

In terms of installing torch, part of the setup script could detect whether a system dependency is missing and prompt the user to install it, or ask whether they wish to use it.

In terms of the installer, it'd probably be best to be as smart as possible about what's available on the system and the context of the installation (e.g. CUDA support available) and then take advantage of it optimistically.
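As a starting point for the "be smart about what's available" idea, a small, hedged sketch of detecting a CUDA toolchain at install time (just a PATH heuristic, nothing Grid-specific):

import shutil

def has_cuda_toolchain():
    # Heuristic: assume a CUDA-enabled torch build is worth installing
    # if nvcc or nvidia-smi is on the PATH.
    return shutil.which("nvcc") is not None or shutil.which("nvidia-smi") is not None

if __name__ == "__main__":
    if has_cuda_toolchain():
        print("CUDA detected: install a CUDA-enabled torch build (see the install instructions).")
    else:
        print("No CUDA detected: the CPU-only torch from requirements.txt should be fine.")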

Thoughts?

Tracing Torch code

Long-term, we need a way to lazily trace commands and send them to specific workers. I was planning on using a context manager for this: we can hook and unhook torch every time we switch contexts, which brings local execution of everything in torch back up to its normal speed, plus around 400 milliseconds for entering and exiting the hooking context. For training loops this is going to be a huge speedup, but we can also keep the currently implemented interactive mode if you need it for development or debugging. I think it's best to merge everything else into Grid before trying this out, though.
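A sketch of the context-manager idea; the TorchHook-style hook()/unhook() methods are assumptions about the eventual Grid API, not existing calls:

from contextlib import contextmanager

@contextmanager
def traced(hook, workers):
    hook.hook(workers)       # patch torch so commands are recorded lazily for these workers
    try:
        yield
    finally:
        hook.unhook()        # restore native torch, bringing local execution back to full speed

# Usage (hypothetical):
# with traced(grid_hook, ["worker_1"]):
#     loss = model(x).sum()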

Get method abi locally

Right now the method ABI (the data, encoded as hex, referencing the method code and arguments) comes from Bygone, but this is not required. It can be generated in pure Python as long as the user has a copy of this file: https://raw.githubusercontent.com/OpenMined/Bygone/master/build/TrainingGrid.json. The JSON field 'abi' contains the ABI for all the methods. Python web3 can probably be used to extract an individual method's ABI.

Example how to create contract in web3.js: https://github.com/OpenMined/Bygone/blob/master/contract.js#L37

Example how to use the contract object to get the abi in web3.js: https://github.com/OpenMined/Bygone/blob/master/index.js#L331
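A minimal sketch of pulling a single method's ABI out of TrainingGrid.json in pure Python (the method name used here is hypothetical; the 'abi' field layout follows standard Solidity compiler output):

import json

def load_method_abi(path="TrainingGrid.json", method_name="someMethod"):
    with open(path) as f:
        contract = json.load(f)
    abi = contract["abi"]    # ABI entries for all methods, as described above
    matches = [entry for entry in abi
               if entry.get("type") == "function" and entry.get("name") == method_name]
    return matches[0] if matches else None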

Wallet password prompt discussion

As of PR #42, whenever a raw transaction is sent to the Ethereum blockchain for processing, Grid will prompt you for your password. This is not ideal; it should probably only prompt you once for a batch of transactions. For a worker it will need to be an asynchronous flow where the user can, at some point in the future, send a batch of feedback to the smart contract. I believe a similar thing can happen with the data scientist/client: once some models have been trained, they can be prompted to send transactions/feedback. This could be automated, but I don't think we should do it that way, both to keep it more secure and because, as far as I know, a real person would need to enter the feedback anyway.

Default wallet to open when signing transactions

Ideally we want all transactions to be signed locally on the data scientist's machine. This is pretty close to being completed, but I think we want the signing to automatically use a local wallet, or create one if one doesn't exist. Being able to enter a password for the encrypted wallet from a Jupyter notebook would be neat as well.

Related issue for encrypted wallet here: #5

See how signing can currently happen in this notebook; search for set_identity (you can't link to lines in notebooks): https://github.com/OpenMined/Grid/blob/master/notebooks/Keras%20Grid.ipynb

Ability for workers/clients to leave feedback

Once a job has been completed and the payment has been settled the worker and client need to leave feedback for one another. This can be done using a smart contract on an ethereum network. The feedback left would basically be a mapping of the following: client address or worker address => [transaction id, feedback value]. The transaction id would be the id of the transaction that transferred the ether to the worker. Client address or worker address is the address of who's receiving the feedback. So for every client or worker there would be a mapping from their address to a list of all the feedback they have received.
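In plain-Python terms (the real mapping would live in the Solidity contract; this is only to illustrate the shape of the data):

from collections import defaultdict

# address of the client or worker receiving feedback -> list of (transaction id, feedback value)
feedback = defaultdict(list)

def leave_feedback(recipient_address, payment_tx_id, feedback_value):
    feedback[recipient_address].append((payment_tx_id, feedback_value))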

Prerequisites

There are some ethereum blockchain identity issues that must be solved along with this:

Private key generation: #8
Default wallet to use: #6
Encrypted Wallet: #5

Grid Tree data adapters are very limited

Right now tree mode is limited in what it can do (MNIST).

When data scientists specify a task, they don't specify what format the data must be in, just what the task should accomplish. Whenever scientists propose an architecture, they specify the input shape, but they don't really specify what format a node has to have its data in.

In the MNIST demo, we propose an architecture whose first layer takes input of shape 784 and outputs 10.
A node must have data in the directory data/mnist, which is specified in the task. However, the file format is completely arbitrary. The demo uses the .npz format, which is common for MNIST, but what do we do for arbitrary tasks?

Does the data scientist provide an adapter as well as a spec around the data they are speculating on?
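One possible shape for such an adapter, purely as a straw man (the interface and the .npz key names are assumptions, not an existing Grid API):

import glob
import os
import numpy as np

class DataAdapter:
    # Turn whatever files a node has under data/<task> into (X, y) arrays
    # matching the input/output shapes declared in the proposed architecture.
    def load(self, data_dir):
        raise NotImplementedError

class MnistNpzAdapter(DataAdapter):
    def load(self, data_dir="data/mnist"):
        path = glob.glob(os.path.join(data_dir, "*.npz"))[0]
        arrays = np.load(path)
        # key names are assumptions about how the demo's .npz was written
        return arrays["x_train"].reshape(-1, 784), arrays["y_train"]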

Extend Grid to take advantage of IPFS file sharding

IPFS has a max block size of 1MB for security reasons. They've implemented sharding as a way to store larger files/directories on IPFS (see ipfs/notes#76, ipfs/kubo#3042, and also https://github.com/ipfs/js-ipfs-unixfs#usage for an example of how it's used in JS).

This becomes a problem for us, since we'll often want to send tensor objects that contain more than 1MB of data. For example, a 50-dimensional word embedding over a vocabulary of 100,000 words would normally require sending an embedding matrix of at least 50 * 100,000 * 32 bits / 8 = 20 MB. Training a matrix like this presents a range of challenges, but even freezing it and sending it once would be feasible and useful for users, so this is definitely something we want to be able to do to allow for a larger class of architectures to be trained on Grid.

The goal here would be to figure out a way to do JSON sharding with py-ipfs-api, and then to integrate those changes into Grid.
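A rough sketch of manual JSON sharding on top of py-ipfs-api; add_json/get_json are existing client calls, but the chunk-manifest layout here is just an assumption for illustration:

MAX_CHUNK = 900_000  # bytes of JSON payload per piece, safely under the 1MB block limit

def put_sharded(api, payload_str):
    chunks = [payload_str[i:i + MAX_CHUNK] for i in range(0, len(payload_str), MAX_CHUNK)]
    part_addrs = [api.add_json({"part": i, "data": chunk}) for i, chunk in enumerate(chunks)]
    return api.add_json({"type": "sharded_json", "parts": part_addrs})

def get_sharded(api, manifest_addr):
    manifest = api.get_json(manifest_addr)
    return "".join(api.get_json(addr)["data"] for addr in manifest["parts"])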

Better heuristics for when nodes should give up training

In the MNIST demo, a common problem when running with 3 nodes is that one of the nodes becomes unable to contribute positively to training.

Worker A and worker B start lowering the loss together, and they send their model to worker C, whose data makes the loss go up, so he doesn't publish.

If worker C ran more epochs or changed his training rates, he'd probably be able to lower the error rate.

Basically, the current scheme to decide if a model has improved is very naive. We should discuss how to make it better!
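As one concrete starting point for the discussion, a patience-based rule (a sketch, not the implemented scheme) that gives a worker a few extra epochs before it concedes:

def should_keep_training(loss_history, patience=3, min_delta=1e-3):
    # Keep training until the loss has failed to improve by at least
    # min_delta for `patience` consecutive epochs.
    if len(loss_history) <= patience:
        return True
    best_before = min(loss_history[:-patience])
    recent_best = min(loss_history[-patience:])
    return recent_best < best_before - min_delta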

Persist jobs when no one is listening

If a data scientist tries to create a job and there are no nodes online, then that job will be lost forever.

Ideally, the job will be stored somewhere and whenever a worker is idle, they can check to see if there is a backlog of jobs that they can work on.

More robust error handling

Clients need to be notified of errors on worker nodes. This isn't necessarily specific to the TorchService, but it's likely to happen there very often, and there's no way to robustly prevent incorrect Torch code from being sent to a worker. We need a way of notifying the client when they send a bad command, likely by sending a return-to-sender message that contains either a Grid-specific error message (e.g. 'command' isn't a torch command, 'obj' isn't a torch object, etc.) or a normal Python error message from a stack trace (e.g. RuntimeError: cannot call .data on a torch.Tensor: did you intend to use autograd.Variable?).
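A sketch of the return-to-sender idea on the worker side; handle_command here is a stand-in for the real TorchService dispatch, and the message fields are assumptions:

import traceback

def handle_command(message):
    # Stand-in that always fails, so the snippet runs on its own.
    raise RuntimeError("cannot call .data on a torch.Tensor: did you intend to use autograd.Variable?")

def process_command(message):
    try:
        return {"status": "ok", "result": handle_command(message)}
    except Exception as e:
        return {
            "status": "error",
            "error_type": type(e).__name__,
            "message": str(e),
            "traceback": traceback.format_exc(),  # shipped back to the client as return-to-sender
        }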

Create unit tests for existing code

Right now a lot of code isn't covered by unit tests. This is an open-ended issue where you can create your own project by deciding to write some unit tests covering an area of code you're interested in. Just create a file with your tests in the tests directory. These tests can then be run with pytest.
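For example, a minimal file such as tests/test_example.py (the function under test is inlined here so the snippet stands alone; in practice you would import Grid code):

def normalize(values):
    total = sum(values)
    return [v / total for v in values]

def test_normalize_sums_to_one():
    assert abs(sum(normalize([1, 2, 3])) - 1.0) < 1e-9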

Torch Integration: TorchService

We need to integrate all of this into TorchService. This will be a substantial overhaul and will take some time, but the code from the notebooks should transition pretty smoothly. Abstracting as much of the hooking away from the service as possible will be prudent.

Ability for a client to make payments

Once a worker has completed a job the client who issued the job needs a way of compensating the worker. The simplest flow for this is that once the trained model has been received the client sends a certain amount of ether to the worker via an ethereum network.

Prerequisites

There are some ethereum blockchain identity issues that must be solved along with this:

Private key generation: #8
Default wallet to use: #6
Encrypted Wallet: #5

Bounty greater than sum total of gas spent

When a client posts a job, the bounty cannot be greater than the total amount of gas they have spent on the network. This is to prove that they have already done some sort of work and that they have the means to settle up the bounty.

Also, the worker should check that the client has enough ether to pay out if the job is completed, and the client should check that the worker has enough reputation to process the job.

Client errors out when verbose=True

Seems to be a problem with stats collection.

Code to reproduce:
client = TorchClient(verbose=True) (note the problem shows up in base.BaseClient)

Stack trace:

Traceback (most recent call last):
  File "/Users/jasonmancuso/anaconda/envs/openmined/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/Users/jasonmancuso/anaconda/envs/openmined/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/jasonmancuso/Grid/grid/clients/base.py", line 44, in ping_known_then_refresh
    self.refresh_network_stats(print_stats=verbose)
  File "/Users/jasonmancuso/Grid/grid/clients/base.py", line 115, in refresh_network_stats
    len(self.stats) - 1, stat))
  File "/Users/jasonmancuso/Grid/grid/clients/pretty_printer.py", line 47, in print_node
    stat_str = self.print_compute(idx, node)
  File "/Users/jasonmancuso/Grid/grid/clients/pretty_printer.py", line 26, in print_compute
    ping = str(stat['ping_time']).split(".")
KeyError: 'ping_time'

Error when node was running

listing workers...
?!?!?!?!?! openmined:list_workers:QmNxbPtZu1GkXcLE5hzvYkRrcf1kRvxX8cTEPErqAkBwbx []
listing workers...
?!?!?!?!?! openmined:list_workers:QmQf3mhWWHgCv26gkPPjhT4BjipgNZARSFWtpC5GoZ25kc []
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 543, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 302, in _error_catcher
    yield
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 598, in read_chunked
    self._update_chunk_length()
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 547, in _update_chunk_length
    raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/requests/models.py", line 745, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 432, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 626, in read_chunked
    self._original_response.close()
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 320, in _error_catcher
    raise ProtocolError('Connection broken: %r' % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/Grid/grid/pubsub/base.py", line 186, in listen_to_channel_impl
    for m in new_messages:
  File "/home/ubuntu/Grid/grid/ipfsapi/http.py", line 108, in stream_decode
    for data in res:
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/requests/models.py", line 748, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 543, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 302, in _error_catcher
    yield
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 598, in read_chunked
    self._update_chunk_length()
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 547, in _update_chunk_length
    raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/requests/models.py", line 745, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 432, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 626, in read_chunked
    self._original_response.close()
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/urllib3/response.py", line 320, in _error_catcher
    raise ProtocolError('Connection broken: %r' % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/Grid/grid/pubsub/base.py", line 186, in listen_to_channel_impl
    for m in new_messages:
  File "/home/ubuntu/Grid/grid/ipfsapi/http.py", line 108, in stream_decode
    for data in res:
  File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/requests/models.py", line 748, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

Fix existing linter errors: Fix long lines

Ideally we want to keep lines under 80 characters for people coding on small screens. There are a lot of lines currently longer than 80 characters, and some are much longer. These are mostly string literals; we need to break them out into a separate file or into multi-line strings somehow.
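For the string-literal cases, adjacent literals are concatenated at compile time, so a long message can be split without changing its value, e.g.:

LONG_MESSAGE = (
    "This string is split across several source lines so each stays "
    "under 80 characters, but Python joins the pieces into a single literal."
)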

Notebook cannot run as is when using docker

Steps to reproduce:

  1. Launch the grid using docker
  2. Try to run a notebook

As Grid is not on the Python path, the notebooks are not able to run as is.
A fix can be to manually add Grid to the PYTHONPATH at the start of every notebook:

import os
import sys

sys.path.append(os.environ['ROOT_DIR'])

Or, better, add it directly to the Docker image.

Prevent TorchClient from printing outside of its iPython cell

TorchClient is set up so that it asynchronously connects to other Grid nodes. This is convenient during development, because you can continue to run other cells that don't rely on workers you haven't connected to yet. There is also a verbose argument to TorchClient that controls whether printing should be done during this process. But because it connects and prints asynchronously, if we're running in a Jupyter notebook, the output can end up printed in later cells you run.

Your mission, should you choose to accept it, is to hook into Jupyter notebook extensions and prevent this from happening.

You can reproduce this issue by running from grid.client.torch import TorchClient; client = TorchClient(verbose=True) in a notebook and then running several cells of arbitrary code below that while the client is connecting to other nodes.

Dockerize Python Grid Edge Nodes

For both improved security and convenience, we need the ability for people to simply download a docker image for a Grid node - with that image automagically running the appropriate IPFS server and grid worker daemon.

Acceptance Criteria

  • automatically builds from master branch
  • runs IPFS server
  • runs daemon
  • ships with dependencies
  • can run on linux, windows, and mac (tested)
  • automatically attaches to NVIDIA GPUs for all 3 major frameworks (keras, pytorch, tensorflow) (tested) (https://github.com/NVIDIA/nvidia-docker)

Run worker inside pytest when running integration tests

It would be ideal if we could run the worker automatically inside pytest when running integration tests. Currently this error occurs:

    self.run()
  File "/anaconda3/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/justinpatriquin/projects/Grid/grid/workers/base_worker.py", line 158, in listen_to_channel_impl
    out = handle_message(message)
  File "/Users/justinpatriquin/projects/Grid/grid/services/fit_worker.py", line 33, in fit_worker
    return self.fit_keras(decoded)
  File "/Users/justinpatriquin/projects/Grid/grid/services/fit_worker.py", line 50, in fit_keras
    model = keras_utils.ipfs2keras(self.api, decoded['model_addr'])
  File "/Users/justinpatriquin/projects/Grid/grid/lib/keras_utils.py", line 15, in ipfs2keras
    return deserialize_keras_model(api.cat(model_addr))
  File "/Users/justinpatriquin/projects/Grid/grid/lib/keras_utils.py", line 34, in deserialize_keras_model
    model = keras.models.load_model('temp_model2.h5')
  File "/anaconda3/lib/python3.6/site-packages/keras/models.py", line 246, in load_model
    topology.load_weights_from_hdf5_group(f['model_weights'], model.layers)
  File "/anaconda3/lib/python3.6/site-packages/keras/engine/topology.py", line 3382, in load_weights_from_hdf5_group
    K.batch_set_value(weight_value_tuples)
  File "/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2373, in batch_set_value
    get_session().run(assign_ops, feed_dict=feed_dict)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1083, in _run
    'Cannot interpret feed_dict key as Tensor: ' + e.args[0])
TypeError: Cannot interpret feed_dict key as Tensor: Tensor Tensor("Placeholder:0", shape=(2, 8), dtype=float32) is not an element of this graph.
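One way to approach this would be a session-scoped pytest fixture that starts the worker in a background thread; the worker class and its listen method below are assumptions about the entry point, not the confirmed API. The TensorFlow "not an element of this graph" error above typically comes from loading a Keras model in one thread and running it in another, so the fixture (or fit_keras) may also need to pin a single graph/session.

import threading
import pytest

@pytest.fixture(scope="session")
def background_worker():
    from grid.workers import base_worker          # assumed module path
    worker = base_worker.GridWorker()             # hypothetical constructor
    thread = threading.Thread(target=worker.listen, daemon=True)
    thread.start()
    yield worker
    # daemon thread exits with the test session; add an explicit shutdown if the worker supports one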

Get models out of grid and back to jupyter notebook.

The grid will train models in tree mode, and will publish models to IPFS, but there is not currently any good way for a data scientist to pull models back out from the grid and use them.

One idea might look like this

c = Client()
models = c.get_best_models_for('mnist')

# models is a list of keras.Sequential (or whatever the best architecture was)
models[0].predict(input)

Please share your ideas on a good flow for this.

Torch Integration: Remove dependency of torch hooking on having list of worker ids

Currently, hooking begins with worker_ids, meaning you have to specify the worker IDs before hooking into Torch. This also means the client doesn't currently have control over where each Torch command goes (by default, it goes to every worker ID you registered during hooking). We'd like to change this, which means that client-side tensors need to keep track of all workers that have a particular tensor, not just the last worker they sent it to. This will require a few subtle changes to register_object and tensor.send, as well as to the generic torch hooking wrappers (it will actually simplify those a bit by removing the outer decorator).

Drop out of training if you don't think you'll finish first

We need a mechanism to quit training a model if the trainer doesn't think it will finish first.

For example, if Alice is running a worker node on her MacBook Air and Bob is running a node on his gaming computer, Bob will probably train any model faster. Both Alice and Bob should publish their progress as they go, so Alice will know that she is not going to win and can give up. She would know this when Bob tells her he is 50% done and she is only 5% done. She can then try to pick up other work while she is idle and Bob is still working.

We should do this issue first and then use it to help build a very slick system to complete #28.

Slices of slices of Tensors aren't being registered.

When iterating over a slice of a tensor, the new chunks aren't registered. So far, the only known problem with this is that printing large tensors raises an AttributeError, although there are likely other resulting bugs that we just don't know about yet.

Code to reproduce: print(torch.FloatTensor(128,128)).

Private key generation

Investigate whether there is a better way to generate private keys for a wallet. It is currently done here: https://github.com/OpenMined/Grid/blob/master/grid/bygone/bygone.py#L96. On the Stack Overflow question where this code sample came from, some concern was expressed that os.urandom is potentially not secure. Finding a better way to do this might be a good idea! Ganache and others seem to use a mnemonic to generate private keys; doing something similar could be a good idea.
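As a small, hedged first step, the stdlib secrets module is explicitly documented for cryptographic use (it draws on the same OS entropy as os.urandom, so the bigger win would be the mnemonic-based derivation mentioned above):

import secrets

def generate_private_key_bytes():
    return secrets.token_bytes(32)   # 256 bits from the OS CSPRNG

private_key_hex = generate_private_key_bytes().hex()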

Bring GridConfiguration back to pubsub grid

One of the main benefits of Grid is being able to specify different learning rates, # of epochs, etc and train them all at the same time.

Blockchain implementation supports this but pubsub doesn't. We need to bring it back.

I think pubsub will have an easier time supporting the ability to have different agents train different parts of the configuration.

E.g.

c1 = GridConfiguration(
  model=m1,
  epochs=20
)

c2 = GridConfiguration(
  model=m2,
  epochs=200
)

r = grid.train(input, target, configurations=[c1,c2])

One agent could pick up c1, another picks up c2.

Fix existing linter errors: Bare Except

There are currently a lot of linter reports about bare excepts, for example: ./grid/services/listen_for_openmined_nodes.py:48:17: E722 do not use bare except. This is not ideal; we should explicitly catch the errors we want to catch. Not catching specific errors can make debugging difficult.
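A before/after illustration (refresh_network_stats here is a stand-in that always times out, just so the snippet runs):

import logging

log = logging.getLogger(__name__)

def refresh_network_stats():
    raise TimeoutError("node did not answer the ping")

# Bad: a bare except also swallows KeyboardInterrupt, SystemExit, and typos.
try:
    stats = refresh_network_stats()
except:
    stats = None

# Better: catch only the errors we expect and let everything else surface.
try:
    stats = refresh_network_stats()
except (ConnectionError, TimeoutError) as e:
    log.warning("could not refresh stats: %s", e)
    stats = None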

Torch Integration: Continuous client side commands

When a tensor is remote, the thing that's returned on the client side isn't yet a tensor object that can be used in further computation (mainly because send_command and receive_command haven't been implemented). We need a way to return Tensors on the client side that are pointers resulting from computations done elsewhere. This should be pretty simple: when the client receives the object that results from a remote operation, that object should come with the attributes of the resulting tensor(s), so we can construct an 'empty' tensor on the client side that has the same hooked methods and gets registered with the client, except that it gets registered with the same attributes as the remote tensor (apart from worker, which will not be needed, and is_pointer_to_remote, which will be the opposite). This will make it so that chains of commands can still execute on the client side. Right now, local chains of commands can execute, but remote ones can't, because the relevant functions (send_command/receive_command) are doing placeholder printing when the tensors are remote. This will need to be done inside the wrappers, in the cases when has_remote is True (for assign_workers_function) and when self.is_pointer_to_remote is True (for assign_workers_method).
