lablup / backend.ai-kernels
Repository of Backend.AI-enabled container recipes
Home Page: https://www.backend.ai
License: GNU Lesser General Public License v3.0
This issue is a proxy for lablup/sorna-agent#1.
(That cross-repository reference is too long to type in the first line of commit messages...)
Some languages offer standardized logging (e.g., Python's logging module and Julia's info() and warn() functions). Let's wrap them and provide prettier output by distinguishing log messages with a separate stream type: "log". (Currently, the new PUSH/PULL agent protocol only recognizes the "stdout", "stderr", "media", "finished", and "waiting-input" message types.)
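The sketch below covers only the Python side. It is a minimal example that assumes the kernel runner can attach a handler to the root logger before user code runs. LogStreamHandler, install_log_stream, and the emit_message callback (a stand-in for the agent's PUSH socket) are hypothetical names. The message framing copies the [b'stdout', b'utf8-encoded-text'] shape used by the existing streams.

```python
import logging


class LogStreamHandler(logging.Handler):
    """Hypothetical handler that turns log records into [b'log', text] messages.

    `emit_message` is an assumed callback standing in for the agent socket;
    it receives a two-item list of byte strings, like the stdout/stderr frames.
    """

    def __init__(self, emit_message):
        super().__init__()
        self.emit_message = emit_message

    def emit(self, record):
        try:
            text = self.format(record)
            self.emit_message([b"log", text.encode("utf8")])
        except Exception:
            self.handleError(record)


def install_log_stream(emit_message, level=logging.INFO):
    """Route records reaching the root logger to the hypothetical "log" stream."""
    handler = LogStreamHandler(emit_message)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(level)
    return handler
```

Records from any named logger propagate to the root logger. A front-end could therefore style every "log" frame differently from plain stdout, and the user code would need no changes.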
Many deep learning programs require a lot of memory and computation time.
We need an automated way to measure the peak memory usage and computation time of a given example code, for better capacity planning and scheduler design.
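Here is a minimal measuring sketch. It assumes each example runs as a separate child process. It measures host memory only, not GPU memory, and `measure` is a hypothetical helper name. The sketch relies on the standard resource module, which is Unix-only.

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run one example program and report wall time, CPU time and peak RSS.

    ru_maxrss under RUSAGE_CHILDREN is the largest peak RSS among all
    waited-for children so far (kilobytes on Linux, bytes on macOS), so
    run one example per measuring process for a clean number.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    proc = subprocess.run(cmd, capture_output=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return {
        "returncode": proc.returncode,
        "wall_seconds": wall,
        "cpu_seconds": cpu,
        "max_rss": after.ru_maxrss,
    }


if __name__ == "__main__":
    # Example: a child that allocates roughly 50 MiB.
    print(measure([sys.executable, "-c", "x = bytearray(50 * 1024 * 1024)"]))
```

CPU time is taken as a before/after difference. Peak RSS cannot be subtracted that way, which is why the docstring recommends a fresh measuring process per example.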
Add C++ language support
ref) https://github.com/NVIDIA/nvidia-docker
We need to rebuild our caffe/tensorflow images based on NVIDIA's cuda base images.
Add support for Rust language
For Python kernels, we also need to update sorna-media package to v0.3.
Minimum features:
[b'stdout', b'utf8-encoded-text'] and [b'stderr', b'utf8-encoded-text'].
[b'finished', b''] when execution is done.
Optional features:
self.handle_input in Python 3 impl.)
Tips:
test_run.py in the python3 kernel directory to test the main programs before building Docker containers, for fast iteration while debugging and developing.
Add a custom Git command shell for Git tutorial courses.
TensorFlow v1.0 was released last week.
Egoing reported that he could not see the output of the following code:
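var 입력한비밀번호 = '1111';   // entered password
var 소금의크기 = 32;            // salt size (bytes)
var 암호화반복횟수 = 10000;     // number of PBKDF2 iterations
var 암호의길이 = 32;            // derived key length (bytes)
var crypto = require('crypto');
// 오류 = error, 소금 = salt, 생성된암호 = derived key
crypto.randomBytes(소금의크기, function(오류, 소금){
  crypto.pbkdf2(입력한비밀번호, 소금, 암호화반복횟수, 암호의길이, 'sha512', function(오류, 생성된암호){
    console.log(생성된암호.toString('hex'));
  });
});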
This happens because the current nodejs kernel only runs the synchronous part of the user code before sending the execution result. Callbacks scheduled by the user code are executed later.
We need a "blocking" mechanism that waits until all user callbacks finish, while temporarily removing the existing sorna-side callbacks from the event loop.
As a result, I found a small, hacky open-source project that uses a C++ addon to access the uv_run() function. I patched it to implement a blocking call that waits until all callbacks finish:
abbr/deasync#53
Then, I added unref()/ref() support to the zeromq.node project:
JustinTulloss/zeromq.node#503
Now we can implement a proper blocking call for the nodejs4 kernel.
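For comparison, here is the same "block until user callbacks finish" idea in Python asyncio. It is an analogue only; it does not reproduce the deasync/uv_run() patch above. The `exclude` set plays roughly the role of unref(): kernel-owned tasks listed there do not keep the reply waiting. The helper name is hypothetical. The sketch tracks asyncio tasks, not bare call_soon()/call_later() handles.

```python
import asyncio


async def run_until_user_callbacks_finish(user_main, exclude=()):
    """Run user code, then wait until every task it spawned has finished.

    Await this directly from the request handler. Any task that awaits this
    coroutine from a different task must be listed in `exclude`, or the
    drain loop would wait on its own caller.
    """
    skip = set(exclude)
    skip.add(asyncio.current_task())
    await user_main()
    # Drain repeatedly: a finishing callback may schedule further tasks.
    while True:
        pending = [t for t in asyncio.all_tasks() if t not in skip]
        if not pending:
            break
        await asyncio.gather(*pending, return_exceptions=True)
```

Only after this returns would the kernel send its "finished" message. That ordering is what the nodejs example above was missing.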
Add Go language support.
go get-like functionality?
Many kernels reuse the same initial Dockerfile procedures. Let's enable caching for them.
A collection of custom package inclusion requests.
Several code examples (e.g., this one) using TensorFlow crash due to thread limits imposed by our jail.
terminate called after throwing an instance of 'std::system_error'
what(): Operation not permitted
The root cause is libeigen (a C++ matrix calculation library used by TensorFlow), which reads the OMP_NUM_THREADS environment variable to initialize its thread pool.
ref) http://eigen.tuxfamily.org/dox/TopicMultiThreading.html
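One way to act on this is sketched below. It assumes the kernel runner can set environment variables before user code imports TensorFlow. The sketch derives a thread count from the CPUs the process may actually use (os.sched_getaffinity respects a Docker cpuset) and exports it as OMP_NUM_THREADS. The function names and the `cap` argument are hypothetical; `cap` is an optional extra ceiling, for example to stay under the jail's thread limit.

```python
import os


def usable_cpu_count():
    """CPUs this process may actually run on (respects a Docker cpuset).

    os.sched_getaffinity is Linux-only; elsewhere fall back to os.cpu_count().
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def limit_eigen_threads(env=None, cap=None):
    """Set OMP_NUM_THREADS before TensorFlow (and thus Eigen) is loaded.

    An already-set value is left untouched so users can still override it.
    """
    env = os.environ if env is None else env
    n = usable_cpu_count()
    if cap is not None:
        n = min(n, cap)
    env.setdefault("OMP_NUM_THREADS", str(n))
    return env["OMP_NUM_THREADS"]
```

The variable must be set before the library initializes its thread pool. Setting it after `import tensorflow` would likely have no effect.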
The jail must be compiled on Linux (preferably the same Ubuntu version that the REPL kernels use), so native Docker environments require a separate Ubuntu image setup.
Let's add some helper scripts for building new jail binaries.
Add support for PyTorch.
set convert-meta off in /etc/inputrc to allow output of 8-bit characters.
locale-gen en_US.UTF-8 and set the LANG environment variable so that bash can handle multi-byte UTF-8 characters correctly (e.g., backspace should delete each Unicode character as a single character).
/etc/vim/vimrc.local
/etc/vim/vimrc and /usr/share/vim/vim74/debian.vim already have syntax highlighting, eol, and nocompatible settings.
More to come.
... so that other people can easily update the service images.
Currently, we use only a single sorna instance, sorna.lablup, but this should be extended to cover multiple instances via docker-registry.lablup.
During development, engineers often interrupt ongoing executions when they realize something is going wrong. Jupyter Notebook supports this by sending a SIGINT signal from the notebook server to the kernel process. Let's support it too.
There are some issues to consider:
initialization of multiarray raised unreported exception
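Below is a minimal Python sketch of the signal path itself. It assumes the user code runs in a child process: the parent sends SIGINT, and Python code in the child receives it as KeyboardInterrupt. run_with_interrupt is a hypothetical helper. It does not address the open issues above, such as the multiarray initialization error.

```python
import signal
import subprocess
import sys
import time


def run_with_interrupt(code, interrupt_after):
    """Start `code` in a child Python process and send it SIGINT later.

    Mirrors Jupyter's approach of signalling the kernel process. If the
    signal arrives before Python installs its handler, the child is
    simply terminated by SIGINT instead.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    time.sleep(interrupt_after)
    proc.send_signal(signal.SIGINT)
    out, err = proc.communicate(timeout=10)
    return proc.returncode, out.decode(), err.decode()
```

A long time.sleep() in the child is interrupted immediately. The child can catch KeyboardInterrupt, report it, and exit cleanly.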
Interactive terminal support for tutorial/workshops.
TensorFlow kernels do not work on high-end servers because of the process/thread limits in our jail.
This is probably because the sysconf(_SC_NPROCESSORS_ONLN) library call mis-reports the CPU count: it returns the host's full CPU count instead of the size of the Docker-allocated cpuset.
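A quick Python check for this mis-reporting is shown below, assuming a Linux host. It compares what sysconf(_SC_NPROCESSORS_ONLN) returns with the size of the process's CPU affinity mask. The function name and the dictionary keys are illustrative only.

```python
import os


def cpu_count_report():
    """Compare sysconf's online-CPU count with the CPUs actually usable.

    Inside a container restricted by a cpuset, SC_NPROCESSORS_ONLN still
    reflects the host's online CPUs, while sched_getaffinity reflects the
    cpuset. Linux-only (os.sched_getaffinity).
    """
    online = os.sysconf("SC_NPROCESSORS_ONLN")
    usable = len(os.sched_getaffinity(0))
    return {"sysconf_online": online, "usable": usable, "overreported": online > usable}
```

Running this inside a kernel container on a many-core server should show `overreported` as True if the cpuset is smaller than the host.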
Keras is a wrapper around existing DL libraries.
Let's add support for it as two separate kernel images: tf + keras and theano + keras.
The tf + keras image will be an upgrade of the current python3-tensorflow images.
sqlite-based data manipulation course
(there is demand at research/consulting firms, even from people without programming skills)
/home/joongi/venv-ipython/lib/python3.5/site-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')
Remove or suppress the above warning messages when a fresh kernel first uses matplotlib.
Maybe we could build the font cache during Docker image builds.
Add C language support
For PL classes.
Add Java language support
Some C/C++ libraries used by kernels' 3rd-party packages implicitly spawn as many threads as there are available CPU cores. On servers with many cores, this exceeds the default child process/thread limit (32). This causes crashes or indefinite hangs of kernels. 😞
Write a set of parametrized test suites that use language/version-specific example code to test the basic ZeroMQ REPL functionality of new/updated images.
Some users have tried input() in Python kernels during code-golf sessions at conferences. In such cases they saw "unexpected" timeouts, because most request-reply-based kernels cannot handle user input.
Until the front-ends handle user input properly, we need to explicitly disable it and show an error message to the user.
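Here is a minimal sketch of the "disable and explain" approach for Python kernels. It assumes the runner can patch builtins before executing user code. InputNotSupportedError and the message text are hypothetical, not existing kernel names.

```python
import builtins


class InputNotSupportedError(RuntimeError):
    """Hypothetical error raised when user code asks for stdin input."""


def _input_disabled(prompt=""):
    # Fail fast with a clear message instead of waiting for a timeout.
    raise InputNotSupportedError(
        "input() is not supported in this kernel yet: "
        "the front-end cannot send user input, so the call would only time out."
    )


def disable_input():
    """Replace builtins.input before running user code; returns the original."""
    original = builtins.input
    builtins.input = _input_disabled
    return original
```

Returning the original function lets the runner restore it once proper user-input handling lands.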
On AWS p2.xlarge instance, TF kernels give the following warnings:
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
My initial tests show that it's viable to migrate our kernel images to Alpine Linux.
__fprintf_chk, __vfprintf_chk)
The core ideas to reduce image size are:
apk, and a few utilities such as scanelf. libc is replaced with musl transparently.
apk) in Alpine Linux provides a concept of "virtual" package installation space, so we can easily purge a set of packages. Also, most Alpine Linux packages are built to be independent, with minimal cross-dependencies.
Challenges remaining:
apt-get has a CLI option, --no-install-recommends, that skips installing recommended packages while still installing the main dependencies. This could reduce kernel image sizes a lot.
Let's test this.
NOTE: Since we have basic unit tests for Docker images, it should be sufficient to check whether the tests pass after rebuilding the images with --no-install-recommends applied.
Docker now supports multi-stage builds in its stable releases. Let's use them.
Let's support execution of shell scripts as well in the query mode.
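Here is a rough sketch of running a shell script for a query-mode request. It reuses the [b'stdout', ...], [b'stderr', ...] and [b'finished', b''] message shapes from the minimum-features list above. Several details are assumptions: using sh, the 30-second timeout, and returning the messages as a list instead of pushing them to a socket. run_shell_query is a hypothetical name.

```python
import subprocess


def run_shell_query(script, timeout=30):
    """Run a shell script as one query and return kernel-style messages.

    Output is collected after the script exits (no incremental streaming).
    """
    proc = subprocess.run(
        ["sh", "-c", script],
        capture_output=True,
        timeout=timeout,
    )
    messages = []
    if proc.stdout:
        messages.append([b"stdout", proc.stdout])
    if proc.stderr:
        messages.append([b"stderr", proc.stderr])
    messages.append([b"finished", b""])
    return messages
```

A real kernel would probably stream output as it arrives. Collecting it at the end keeps the sketch short.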
kill() system call to a specific pid.
/home/sorna)
Proxy issue for lablup/sorna-agent#30.