
nerlnet's Introduction

NErlNet


Nerlnet is an open-source library for researching distributed machine learning algorithms on IoT devices. It gives insight into both the edge devices that run neural network models and the network's performance and statistics. Nerlnet can simulate distributed ML architectures and deploy them on various IoT devices.

Nerlnet combines components written in Erlang, C++, and Python to achieve a stable and efficient distributed ML platform:
• The communication layer of Nerlnet is based on an HTTP web server library, Cowboy.
• C++ OpenNN library (based on Eigen) implements the neural network on edge compute devices.
• An API-Server based on Python Flask allows the user to control experiments and ML phases executed on a Nerlnet cluster.


A JSON script defines a distributed network layout that consists of the following instances:
• Edge Compute Device (ECD): a worker that runs a neural network model.
• Sensor: generates data and sends it through the network.
• Router: connects ECDs, sensors, and other routers.

Communication with Nerlnet is done through a simple Python API that can be used from a Jupyter notebook.
The API lets the user collect statistics on a distributed machine learning network: messages, throughput, loss, predictions, and ECD performance monitoring.

References and libraries:

  • OpenNN, an open-source neural networks library for machine learning.
  • Cowboy an HTTP server for Erlang/OTP.
  • NIFPP C++11 Wrapper for Erlang NIF API.
  • Rebar3, an Erlang tool that makes it easy to create, develop, and release Erlang libraries, applications, and systems in a repeatable manner.
  • Simple Cpp Logger, simple cpp logger headers-only implementation.

Nerlnet is developed by David Leon, Dr. Yehuda Ben-Shimol, and the community of Nerlnet open-source contributors.

Introducing Nerlnet

720p_Nerlnet.Intro.mp4

Nerlnet Architecture:


Build and Run Nerlnet:

Recommended CMake version: 3.26
Minimum Erlang version: OTP 25 (tested with 24, 25, 26)
Minimum gcc/g++ version: 10.3.0

Take the following steps on every device that is part of a Nerlnet cluster:

  1. Clone this repository with its submodules: git clone --recurse-submodules <link to this repo> NErlNet
  2. Run sudo ./NerlnetInstall.sh
    2.1 With the -i argument, the script builds and installs the latest stable Erlang and CMake. (Verify that Erlang is not already installed before installing it from source.)
    2.2 On successful installation, the NErlNet directory is accessible via the following path: /usr/local/lib/nerlnet-lib
  3. Run ./NerlnetBuild.sh
  4. Test Nerlnet by running: ./tests/NerlnetFullFlowTest.sh
  5. NerlPlanner is a Nerlnet tool that generates the JSON files required to set up a distributed Nerlnet system.
    To use NerlPlanner, execute ./NerlPlanner.sh (supported from version 1.3.0).
    Create JSON files for the distributed configuration, connection map, and experiment flow, named as follows:
    dc_<any name>.json
    conn_<any name>.json
    exp_<any name>.json
  6. Run ./NerlnetRun.sh to start Nerlnet.
  7. Use the API-Server to load the generated JSON files (step 5) and execute a Nerlnet experiment.
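
The naming convention from step 5 can be checked before loading files into the API-Server. The following is a minimal Python sketch, assuming only the three prefixes listed above; the helper names classify_json and group_by_kind are hypothetical and are not part of the Nerlnet API.

```python
from pathlib import Path

# Prefixes taken from the naming convention in step 5.
PREFIXES = {
    "dc_": "distributed configuration",
    "conn_": "connection map",
    "exp_": "experiment flow",
}


def classify_json(filename):
    """Return the kind of Nerlnet JSON a file name denotes, or None.

    Hypothetical helper: it inspects the file name only, not its content.
    """
    name = Path(filename).name
    if not name.endswith(".json"):
        return None
    for prefix, kind in PREFIXES.items():
        if name.startswith(prefix):
            return kind
    return None


def group_by_kind(filenames):
    """Group file names by kind, skipping names that match no prefix."""
    groups = {kind: [] for kind in PREFIXES.values()}
    for filename in filenames:
        kind = classify_json(filename)
        if kind is not None:
            groups[kind].append(filename)
    return groups


if __name__ == "__main__":
    files = ["dc_lab.json", "conn_lab.json", "exp_lab.json", "notes.txt"]
    for kind, names in group_by_kind(files).items():
        print(kind, names)
```

An experiment needs one file of each kind, so an empty group in the result is an early hint that a file is missing or misnamed.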

Python API and Jupyter-lab (For Api-Server):

Minimum Python version: 3.8

  1. Open a JupyterLab environment using ./NerlnetJupyterLaunch.sh -d <experiment_directory>
    1.1 Use -h to see the help menu of the NerlnetJupyterLaunch.sh script.
    1.2 If the --no-venv option is selected, the required modules are listed in src_py/requirements.txt.
  2. Follow the instructions for importing Api-Server in the generated readme.md file inside the <experiment_directory> folder.
  3. Follow the example: https://github.com/leondavi/NErlNet/blob/master/examples/example_run.ipynb

Contact Email: [email protected]

Gratitudes

Microsoft

A grant of Azure credits as part of Microsoft’s Azure credits for open source projects program (2024).

Amazon AWS

A grant of AWS credits as part of AWSOpen program for open source projects (2024).

nerlnet's People

Contributors

aslanso, bsyehuda, dolby360, dolby360z, dordor7, evgenyan95, galhilu, gitter-badger, guyperets106, halfway258, hugina, kapelnik, leondavi, noashapira8, ohad123, orisadek, zivmo99


nerlnet's Issues

Segmentation fault in AEC's training function

The AEC's training function crashed on exit because of pointer problems in the function's arguments (data and autoencoder_data). We worked around this by copying the tensor data into a temporary tensor, data_temp. We need to find a way to make this function work without the inefficient copy.

Improve message passing convention between Client and MainServer

The message from the client is sent to the server through the router.
It is not clear what the main server does with the HTTP body of an incoming message from the router.

predict(cast, {predictRes,WorkerName,InputName,ResultID,PredictNerlTensor, Type, _TimeTook}, State = #client_statem_state{myName = MyName, msgCounter = Counter,nerlnetGraph = NerlnetGraph,timingMap = TimingMap}) ->
    NewTimingMap = updateTimingMap(WorkerName,TimingMap),

    {RouterHost,RouterPort} = nerl_tools:getShortPath(MyName,"mainServer",NerlnetGraph),

    %io:format("Client got result from predict-~nInputName: ~p,ResultID: ~p, ~nResult:~p~n",[InputName,ResultID,Result]),
    nerl_tools:http_request(RouterHost,RouterPort,"predictRes", term_to_binary({atom_to_list(WorkerName),InputName,ResultID,{PredictNerlTensor, Type}})),
    {next_state, predict, State#client_statem_state{timingMap =NewTimingMap, msgCounter = Counter+1}};

Misleading handler names

@GuyPerets106
We should fix this right after FullFlowCI integration.
These terms are VERY misleading!

createClientsAndWorkers:
{"/weightsVector",clientStateHandler, [vector,ClientStatemPid]}

Critical mistake - weightsVector is actually batchHandler and vector atom should change to batch_handler

createRouters:
{"/weightsVector",routingHandler, [rout,RouterGenServerPid]},
Critical mistake - weightsVector is actually batchHandler
rout atom change to routing

init of client:
vector -> gen_statem:cast(Client_StateM_Pid,{sample,Body});
change atom vector to batch
change statem pattern to {batch,Body}

routerGenserver:
nerl_tools:sendHTTP(MyName, To, "weightsVector", Body),
Critical mistake: change weightsVector to atom_to_str(batchHandler)

sendBatch method in sourceStatem:
Change the HTTP request name weightsVector: it carries a batch, NOT weights

createRouters and weightsVector methods - change hookname of weightsVector and its function accordingly to batch_handler

affected files:

  • src_erl/NerlnetApp/src/Router/routingHandler.erl
  • src_erl/NerlnetApp/src/Router/routerGenserver.erl
  • src_erl/NerlnetApp/src/Client/clientStateHandler.erl
  • src_erl/NerlnetApp/src/Client/clientStatem.erl
  • src_erl/NerlnetApp/src/Source/sourceStatem.erl
  • src_erl/NerlnetApp/src/nerlnetApp_app.erl

Several scaling/pooling layers should be supported.

ScalingLayer* scaling_layer_pointer = neural_network->get_scaling_layer_pointer();

At the moment, the JSON configuration supports selecting only a single scaling method.
The scaling method should be configurable for each scaling layer defined in the neural network.
Therefore, the input JSON should hold a list of constants whose length equals the number of scaling layers in the model.

The same applies to pooling layers.
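
The proposed rule (one configured method per scaling layer) can be sketched as follows. This is an illustration only, written in Python rather than in the C++ worker code; the function assign_scaling_methods and the method names used in the example are hypothetical and do not reflect the actual Nerlnet JSON schema.

```python
def assign_scaling_methods(methods, num_scaling_layers):
    """Pair each scaling layer index with its configured method.

    methods: list read from the JSON configuration (hypothetical field).
    num_scaling_layers: number of scaling layers found in the model.
    Raises ValueError when the list length does not match the model.
    """
    if len(methods) != num_scaling_layers:
        raise ValueError(
            "expected %d scaling methods, got %d"
            % (num_scaling_layers, len(methods))
        )
    # Layer 0 gets methods[0], layer 1 gets methods[1], and so on.
    return dict(enumerate(methods))


if __name__ == "__main__":
    # Hypothetical method names, for illustration only.
    print(assign_scaling_methods(["min_max", "mean_std"], 2))
```

The same length check would apply to a list of pooling methods.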

Raspberry pi - After compilation with g++-8.4

@kapelnik Please review the following error message

===> Failed to boot nerlNetServer for reason {bad_return,
                                                         {{nerlNetServer_app,
                                                           start,
                                                           [normal,[]]},
                                                          {'EXIT',
                                                           {{badmatch,
                                                             {error,
                                                              {badmatch,
                                                               {error,
                                                                undef}}}},
                                                            [{clientStatem,
                                                              start_link,1,
                                                              [{file,
                                                                "/home/pi/workspace/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/Client/clientStatem.erl"},
                                                               {line,47}]},
                                                             {nerlNetServer_app,
                                                              createClientsAndWorkers,
                                                              2,
                                                              [{file,
                                                                "/home/pi/workspace/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/nerlNetServer_app.erl"},
                                                               {line,107}]},
                                                             {nerlNetServer_app,
                                                              start,2,
                                                              [{file,
                                                                "/home/pi/workspace/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/nerlNetServer_app.erl"},
                                                               {line,80}]},
                                                             {application_master,
                                                              start_it_old,4,
                                                              [{file,
                                                                "application_master.erl"},
                                                               {line,
                                                                277}]}]}}}}

Model/Data validation nif step

We need to implement a pre-step that performs Model<-->Data validation: the NIF methods should verify that the data size fits the network input size before train/predict is called.
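
A minimal Python sketch of the kind of check described above, assuming a batch is a list of rows and that each row holds the input features followed by optional labels. The real check would live in the NIF (C++) layer; the names validate_batch, input_size, and label_size are hypothetical.

```python
def validate_batch(batch, input_size, label_size=0):
    """Return the indices of rows whose length does not fit the model.

    batch: list of rows (each row a list of numbers).
    input_size: number of inputs the network expects.
    label_size: number of label columns appended to each row
                (0 for predict, where no labels are sent).
    """
    expected = input_size + label_size
    return [i for i, row in enumerate(batch) if len(row) != expected]


def check_before_call(batch, input_size, label_size=0):
    """Raise ValueError instead of letting train/predict run on bad data."""
    bad_rows = validate_batch(batch, input_size, label_size)
    if bad_rows:
        raise ValueError("rows with wrong size: %s" % bad_rows)
    return True


if __name__ == "__main__":
    batch = [[0.1, 0.2, 0.3, 1.0], [0.4, 0.5, 1.0]]
    print(validate_batch(batch, input_size=3, label_size=1))
```

Returning the offending row indices, rather than a plain boolean, makes it easier to report which samples caused the mismatch.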

sendSamples Wrong Frequency

In file: src_erl/NerlnetApp/Source/sourceStatem.erl
Function spawnTransmitter gets 'Frequncy' and computes Ms as 1000*(1/Freq)
Function sendSamples calls left_print, which prints how many batches are left to send, but the observed rate does not match the requested frequency (it is much slower).

We should check what happens between consecutive batch transmissions and why the desired frequency is not reached.
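
One possible explanation, not confirmed for this code, is that the full interval is waited after each send, so the time spent sending adds to every period. The Python sketch below only illustrates that arithmetic and a deadline-based alternative; the function names and the send_cost_ms parameter are hypothetical.

```python
def interval_ms(freq_hz):
    """Interval between batches, as computed in the issue: 1000*(1/Freq)."""
    return 1000.0 / freq_hz


def effective_frequency(freq_hz, send_cost_ms):
    """Rate achieved if a full interval is slept after every send.

    send_cost_ms is the (assumed) time spent preparing and sending one
    batch; each period then lasts interval + send cost.
    """
    return 1000.0 / (interval_ms(freq_hz) + send_cost_ms)


def deadline_schedule(freq_hz, n, start_ms=0.0):
    """Absolute send times that keep the requested rate.

    Sleeping until the next deadline, instead of for a fixed interval,
    absorbs the send cost as long as it is shorter than the interval.
    """
    step = interval_ms(freq_hz)
    return [start_ms + i * step for i in range(n)]


if __name__ == "__main__":
    print(interval_ms(100))              # 10.0 ms between batches
    print(effective_frequency(100, 10))  # 50.0 batches per second
    print(deadline_schedule(100, 3))     # [0.0, 10.0, 20.0]
```

Measuring the time spent between two consecutive sends would show whether this effect, or something else, explains the slowdown.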

Is it possible to run NerlNet on single machine?

Hi, I am new to this and wondering whether there is any way to run NerlNet on a single PC.

I've been attempting to set it up on my PC by following the example provided in the documentation, changing only my local address and port.

Steps to reproduce:

  • Successfully installed NerlNet via ./NerlNetInstall.sh
  • Set up the Python environment via ./NerlnetJupyterEnvGenerator.sh -j ./examples/
  • Ran example_run.ipynb up to the step api_server_instance.selectJsons() with the following choices:
Architecture: 
0.	arch_1PCSIM6WorkerMNist.json

Connection Map Files
0.	conn_1Router1Client1S.json

Experiments Flow Files
1.	exp_1Worker1SourceHealth.json

I changed only the following in arch_*.json:

  "devices": [
    {
      "host": "0.0.0.0",
      "entities": "mainServer,c1,c2,c3,c4,c5,c6,s1,r1,r2,r3,r4,r5,r6,apiServer"
    }
  ],
  "apiServer": 
    {
      "host": "0.0.0.0",
      "port": "8080",
      "args": ""
    }
  ,
  "nerlGUI": 
    {
      "host": "0.0.0.0",
      "port": "8096",
      "args": ""
    }
  ,
  "mainServer": 
    {
      "host": "0.0.0.0",
      "port": "8484",
      "args": ""
    }
  • Successfully ran api_server_instance.sendJsonsToDevices() and received this log from apiServer:
Sending JSON paths to devices...
Init JSONs sent to devices

But I received this from the NerlNet console:

2023-08-19T19:36:32.616293+07:00 info: nerlNetServer_app/start@92: This device IP: "192.168.1.10"
2023-08-19T19:37:04.155398+07:00 info: nerlNetServer_app/start@98: ArchitectureAdderess: <<"arch.json">>, CommunicationMapAdderess : <<"conn.json">>
2023-08-19T19:37:04.177674+07:00 notice: Host IP="192.168.1.10"
2023-08-19T19:37:04.177983+07:00 error: crasher: initial call: application_master:init/4, pid: <0.186.0>, registered_name: [], exit: {{bad_return,{{nerlNetServer_app,start,[normal,[]]},{'EXIT',{{badkey,<<"192.168.1.10">>},[{erlang,map_get,[<<"192.168.1.10">>,#{<<"0.0.0.0">> => [mainServer,c1,c2,c3,c4,c5,c6,s1,r1,r2,r3,r4,r5,r6,apiServer]}],[{error_info,#{module => erl_erts_errors}}]},{jsonParser,json_to_ets,2,[{file,"/media/ubuntu_data/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/init/jsonParser.erl"},{line,136}]},{jsonParser,getHostEntities,3,[{file,"/media/ubuntu_data/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/init/jsonParser.erl"},{line,165}]},{nerlNetServer_app,parseJsonAndStartNerlnet,1,[{file,"/media/ubuntu_data/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/nerlNetServer_app.erl"},{line,144}]},{nerlNetServer_app,start,2,[{file,"/media/ubuntu_data/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/nerlNetServer_app.erl"},{line,100}]},{application_master,start_it_old,4,[{file,"application_master.erl"},{line,293}]}]}}}},[{application_master,init,4,[{file,"application_master.erl"},{line,142}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [<0.185.0>], message_queue_len: 1, messages: [{'EXIT',<0.187.0>,normal}], links: [<0.185.0>,<0.44.0>], dictionary: [], trap_exit: true, status: running, heap_size: 610, stack_size: 28, reductions: 216; neighbours:
2023-08-19T19:37:04.178554+07:00 notice: Application: nerlNetServer. Exited: {bad_return,{{nerlNetServer_app,start,[normal,[]]},{'EXIT',{{badkey,<<"192.168.1.10">>},[{erlang,map_get,[<<"192.168.1.10">>,#{<<"0.0.0.0">> => [mainServer,c1,c2,c3,c4,c5,c6,s1,r1,r2,r3,r4,r5,r6,apiServer]}],[{error_info,#{module => erl_erts_errors}}]},{jsonParser,json_to_ets,2,[{file,"/media/ubuntu_data/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/init/jsonParser.erl"},{line,136}]},{jsonParser,getHostEntities,3,[{file,"/media/ubuntu_data/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/init/jsonParser.erl"},{line,165}]},{nerlNetServer_app,parseJsonAndStartNerlnet,1,[{file,"/media/ubuntu_data/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/nerlNetServer_app.erl"},{line,144}]},{nerlNetServer_app,start,2,[{file,"/media/ubuntu_data/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/nerlNetServer_app.erl"},{line,100}]},{application_master,start_it_old,4,[{file,"application_master.erl"},{line,293}]}]}}}}. Type: temporary.
2023-08-19T19:37:04.179228+07:00 notice: Application: cowboy. Exited: stopped. Type: temporary.
2023-08-19T19:37:05.161489+07:00 notice: Application: ranch. Exited: stopped. Type: temporary.
2023-08-19T19:37:05.161603+07:00 notice: Application: cowlib. Exited: stopped. Type: temporary.
===> Failed to boot nerlNetServer for reason {bad_return,
                                                         {{nerlNetServer_app,
                                                           start,
                                                           [normal,[]]},
                                                          {'EXIT',
                                                           {{badkey,
                                                             <<"192.168.1.10">>},
                                                            [{erlang,map_get,
                                                              [<<"192.168.1.10">>,
                                                               #{<<"0.0.0.0">> =>
                                                                  [mainServer,
                                                                   c1,c2,c3,
                                                                   c4,c5,c6,
                                                                   s1,r1,r2,
                                                                   r3,r4,r5,
                                                                   r6,
                                                                   apiServer]}],
                                                              [{error_info,
                                                                #{module =>
                                                                   erl_erts_errors}}]},
                                                             {jsonParser,
                                                              json_to_ets,2,
                                                              [{file,
                                                                "/media/ubuntu_data/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/init/jsonParser.erl"},
                                                               {line,136}]},
                                                             {jsonParser,
                                                              getHostEntities,
                                                              3,
                                                              [{file,
                                                                "/media/ubuntu_data/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/init/jsonParser.erl"},
                                                               {line,165}]},
                                                             {nerlNetServer_app,
                                                              parseJsonAndStartNerlnet,
                                                              1,
                                                              [{file,
                                                                "/media/ubuntu_data/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/nerlNetServer_app.erl"},
                                                               {line,144}]},
                                                             {nerlNetServer_app,
                                                              start,2,
                                                              [{file,
                                                                "/media/ubuntu_data/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/nerlNetServer_app.erl"},
                                                               {line,100}]},
                                                             {application_master,
                                                              start_it_old,4,
                                                              [{file,
                                                                "application_master.erl"},
                                                               {line,
                                                                293}]}]}}}}

I checked the directory /src_erl/Communication_Layer again; it seems the JSON files were transferred successfully but could not be parsed.

Can you help me get it up and running, please?
Thanks a lot!
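
The badkey error in the log above shows a lookup of <<"192.168.1.10">> (the IP the device detected for itself) in a map whose only key is <<"0.0.0.0">> (the host written in the JSON). The Python sketch below reproduces that kind of lookup with a plain dictionary. It is an illustration of the failure mode only, not the actual jsonParser code, and entities_for_host is a hypothetical name. Whether setting "host" to the detected IP is the intended fix is an assumption that the log suggests but does not confirm.

```python
def entities_for_host(devices, host_ip):
    """Look up the entities assigned to host_ip in the 'devices' list.

    devices: list of {"host": ..., "entities": "a,b,c"} dicts, as in
             the JSON fragment quoted in this issue.
    Raises KeyError when no device entry uses host_ip, which mirrors
    the badkey error raised by map_get in the Erlang log.
    """
    by_host = {d["host"]: d["entities"].split(",") for d in devices}
    if host_ip not in by_host:
        raise KeyError(
            "%s not found; hosts in JSON: %s" % (host_ip, sorted(by_host))
        )
    return by_host[host_ip]


if __name__ == "__main__":
    devices = [{"host": "0.0.0.0", "entities": "mainServer,c1,s1,r1,apiServer"}]
    try:
        entities_for_host(devices, "192.168.1.10")
    except KeyError as err:
        print("lookup failed:", err)

    # With the host set to the detected IP, the same lookup succeeds.
    devices[0]["host"] = "192.168.1.10"
    print(entities_for_host(devices, "192.168.1.10"))
```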

Error running chown command and enabling nerlnet.service

It appears that some directories are missing during installation:

missing operand after '/usr/local/lib/nerlnet-lib/log'
Try 'chown --help' for more information.
chown: missing operand after '/usr/local/lib/nerlnet-lib/NErlNet'
Try 'chown --help' for more information.
chown: missing operand after '/usr/local/lib/nerlnet-lib/NErlNet/build'
Try 'chown --help' for more information.
enable and start nerlnet.service

Running on Ubuntu LTS.

openNN printing problem

OpenNN's perform_training() command prints undesired outputs.
We should find a way to disable those prints.
