tensorchord / envd
Reproducible development environment
Home Page: https://envd.tensorchord.ai/
License: Apache License 2.0
For example, users need to make sure that the code change is committed and pushed before the environment is destroyed. Thus we need to support rules to define lifecycle hooks.
Support jupyter to develop.
Ref #1
Users want to know all the dependencies in the environment. We can provide a command like MIDI dependency list to list them.
bkClient, err := client.New(clicontext.Context, "unix:///run/buildkit/buildkitd.sock")
We should avoid hard-coding this address.
We can create the buildkitd container to bootstrap instead.
Code may be one of the first-class objects in the build language. The code will exist in the resulting container and users develop models and update them. After the code is changed, you can commit and push to remote.
CODE:
GIT SSH /root/.ssh/github-key
GIT LOCAL $HOME/private-repo
GIT CLONE [email protected]/tensorchord/private-repo.git
# TODO: Support branch and commit
Just like docker build ./subdir, we need to let users build MIDI from a directory other than the current working directory.
&cli.PathFlag{
	Name:  "path",
	Usage: "Path to the directory containing the build.MIDI (Default is current directory)",
	Value: ".",
},
Ref #43 (comment)
[+] Building 103.2s (11/11) FINISHED
=> docker-image://docker.io/nvidia/cuda:11.2.0-cudnn8-devel-ubuntu20.04 2.8s
=> => resolve docker.io/nvidia/cuda:11.2.0-cudnn8-devel-ubuntu20.04 2.6s
=> local://context 3.1s
=> => transferring context: 50.52kB 0.1s
=> CACHED sh -c apt-get update && apt-get install -y --no-install-recommends python3 python3-pip 0.0s
=> CACHED apt install gcc 0.0s
=> CACHED pip install -i https://mirror.sjtu.edu.cn/pypi/web/simple jupyter ormb 0.0s
=> CACHED mkdir /var/midi/remote 0.0s
=> CACHED mkdir /var/midi/bin 0.0s
=> CACHED copy /examples/ssh_keypairs/public.pub /var/midi/remote/authorized_keys 0.0s
=> CACHED copy /bin/midi-ssh /var/midi/bin/midi-ssh 0.0s
=> CACHED merge (apt install gcc, pip install -i https://mirror.sjtu.edu.cn/pypi/web/simple jupyter ormb, copy /bin/midi-ssh /var/midi/bin/midi-ssh) 0.0s
=> exporting to oci image format 96.6s
=> => exporting layers 0.0s
=> => exporting manifest sha256:32bbf82b70e17ca70b11b88310ef02450db2bed3b94765415650a2227baa63cf 3.1s
=> => exporting config sha256:9bcaf4d291970033f2a6316dbf11912e77de402c81f0b11896f16c8bab19360b 1.2s
=> => sending tarball 91.6s
The image is built in buildkit and does not exist on the docker host, so we need to pipe the built image from buildkit into the docker host. For a 20G base image, docker load takes about 100s, which is too slow. We need to optimize it.
Support bash/zsh auto complete
Idle environments should be culled.
Users want to define entrypoints, e.g. MIDI serve, to run user-defined logic.
Support ssh with rules to attach into the environment
Ref #1
base(os="ubuntu20.04", language="python3")
print(midi.os)
We need to provide some built-in variables to support conditional builds like this:
if midi.os == "ubuntu":
    ubuntu_apt_source("xxx")
priority/low
The up command runs the container and then connects to it over ssh.
We should add a real example, maybe a MIDI file for ResNet model training, to illustrate what MIDI does.
priority/low
Support vscode remote-ssh to develop
Ref #1
scp is not supported for the container. We should support it.
We need to brainstorm the name of the project. MIDI is not friendly for SEO.
Investigate whether we should, or can, support distributed training. It is low priority since most users do not need distributed DL training.
We need to init a docker client to load the image into the docker host, but we may get this error:
ERRO[2022-04-21T22:08:35+08:00] failed to load docker image: Error response from daemon: client version 1.42 is too new. Maximum supported API version is 1.41
We should suggest setting the envvar DOCKER_API_VERSION to avoid the issue when encountering this.
export DOCKER_API_VERSION=1.41
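One way to produce that suggestion is to read the maximum supported version out of the daemon's message. The sketch below is in plain Python rather than the project's Go code, and the function name suggest_api_version is hypothetical. It assumes the daemon message keeps the wording shown in the log line above.

```python
import re
from typing import Optional

# Pattern taken from the error text quoted above; if the daemon words the
# message differently, this will simply not match.
_MAX_API_RE = re.compile(r"Maximum supported API version is (\d+\.\d+)")


def suggest_api_version(error_message: str) -> Optional[str]:
    """Return an `export DOCKER_API_VERSION=...` hint for the user.

    Returns None when the message does not look like an API version mismatch.
    """
    match = _MAX_API_RE.search(error_message)
    if match is None:
        return None
    return "export DOCKER_API_VERSION={}".format(match.group(1))


if __name__ == "__main__":
    msg = (
        "failed to load docker image: Error response from daemon: "
        "client version 1.42 is too new. Maximum supported API version is 1.41"
    )
    print(suggest_api_version(msg))
```

Returning None for unrelated errors lets the caller print the hint only when it applies.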
Support MIDIfile as the input, instead of hard-coding the file name in the code.
We can prototype the desired README although there is no runnable code, to help us understand the benefit of the project and the features that we need.
We need to set up the test process and add more unit/integration test cases. There are many interfaces in the code, so it should be easy to mock them and test the main logic.
Discuss whether we can expose the lower-level primitives of buildkit in starlark.
For example, the torch installation logic is complex. https://pytorch.org/get-started/locally/
We can write the logic as below, which is quite straightforward
# expose p as the builtin command
def pip_package(name):
    p.cache("/root/.pip/cache")
    p.exec("pip install {}".format(name))

def install_torch():
    if global.cuda_version == '10.2':
        pip_package("torch torchvision torchaudio")
    elif global.cuda_version == '11.3':
        pip_package("torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113")
    elif global.cuda_version is None:
        pip_package("torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu")
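The same selection logic can also be written as a lookup table, which makes unsupported CUDA versions fail loudly instead of silently installing nothing. This is a plain-Python sketch, not the build language itself. The names torch_pip_command and EXTRA_INDEX_URLS are hypothetical, and only the three cases from the snippet above are covered.

```python
from typing import Optional

BASE_PACKAGES = "torch torchvision torchaudio"

# CUDA version -> extra index URL, copied from the snippet above.
# A key of None means "no CUDA" (CPU build); a value of None means
# "no extra index is needed".
EXTRA_INDEX_URLS = {
    "10.2": None,
    "11.3": "https://download.pytorch.org/whl/cu113",
    None: "https://download.pytorch.org/whl/cpu",
}


def torch_pip_command(cuda_version: Optional[str]) -> str:
    """Build the pip command line for the given CUDA version."""
    if cuda_version not in EXTRA_INDEX_URLS:
        # An if/elif chain without an else branch would do nothing here.
        raise ValueError("unsupported CUDA version: {!r}".format(cuda_version))
    command = "pip install " + BASE_PACKAGES
    index_url = EXTRA_INDEX_URLS[cuda_version]
    if index_url is not None:
        command += " --extra-index-url " + index_url
    return command
```

Adding a new CUDA version then means adding one table entry rather than another branch.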
Users may install some new dependencies in the environment that are not tracked by the declarative manifest file. We need to support a command like MIDI sync to sync the changes to the manifest file.
I think the sshd server may not be running yet when the client tries to connect. We can add some retries to avoid this.
midi up
error: dial to 172.17.0.2:2222 failed dial tcp 172.17.0.2:2222: connect: connection refused
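A retry loop along these lines would cover the window before sshd starts listening. This is a minimal Python sketch of the idea, not the project's Go client. The function name wait_for_port and the default attempt count and delay are assumptions chosen for illustration.

```python
import socket
import time


def wait_for_port(host: str, port: int, attempts: int = 5, delay: float = 0.5) -> bool:
    """Try to open a TCP connection up to `attempts` times.

    Returns True as soon as one connection succeeds, False if every
    attempt fails.
    """
    for attempt in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Covers "connection refused" and timeouts. Sleep before the
            # next attempt, but not after the final one.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False
```

With the address from the error above, a caller could run wait_for_port("172.17.0.2", 2222) before starting the ssh session and report a clear error if it returns False.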
Now we have a simple way to extract the publisher/version/extension from the frontend func calls, but it does not always work. For example, for dbaeumer.vscode-eslint-1.1.1 the current implementation gets the version eslint-1.1.1. For indexExtension, we need to find the last - instead of the first -.
func ParsePlugin(p string) (Plugin, error) {
	indexPublisher := strings.Index(p, ".")
	publisher := p[:indexPublisher]
	indexExtension := strings.Index(p[indexPublisher:], "-") + indexPublisher
	extension := p[indexPublisher+1 : indexExtension]
	version := p[indexExtension+1:]
	logrus.WithFields(logrus.Fields{
		"publisher": publisher,
		"extension": extension,
		"version":   version,
	}).Debug("vscode plugin is parsed")
	return Plugin{
		Publisher: publisher,
		Extension: extension,
		Version:   version,
	}, nil
}
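The intended rule, splitting on the first dot and the last hyphen, can be sketched as follows. This is a Python illustration of the parsing rule only, not a patch to the Go function above. The names Plugin and parse_plugin mirror the Go code but are otherwise hypothetical, and the sketch assumes the version string itself contains no hyphen.

```python
from typing import NamedTuple


class Plugin(NamedTuple):
    publisher: str
    extension: str
    version: str


def parse_plugin(spec: str) -> Plugin:
    """Split `publisher.extension-version` into its three parts."""
    publisher, dot, rest = spec.partition(".")
    # rpartition splits on the LAST "-", so hyphens inside the extension
    # name (e.g. "vscode-eslint") stay in the extension.
    extension, dash, version = rest.rpartition("-")
    if not (dot and dash and publisher and extension and version):
        raise ValueError(
            "expected publisher.extension-version, got {!r}".format(spec)
        )
    return Plugin(publisher, extension, version)
```

In the Go code, the equivalent change would be to use strings.LastIndex instead of strings.Index when computing indexExtension. A version with a hyphen in it, such as a pre-release tag, would still be split incorrectly by this rule.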
Not sure whether we should choose /root/ as the default directory. What is the common practice when using docker? Do people use root or create other users?
Now we use a simple struct to keep the packages and dependencies. It works in most cases. But if users need to run custom commands, those are hard to express in the current design.
Thus we may need to re-design this in the future.
Currently the SSH key pair is hard-coded at examples/ssh_keypairs. We may need to make it configurable.
Now the resulting image is not kept in the local docker daemon. We should keep it there.
We need to provide built-in support for dependencies.
vscode remote, jupyter and terminal are the tools data scientists use most to develop models. Thus we need to support them as first-class objects.
run(command="ls -l")
Users may need to run some specific commands. We should support that.
Users need to get the data to run the training jobs. Thus the build language needs to support it.
The build progress output is ugly; we should display it in a nicer way.
We do not check if the client can connect to buildkitd in the bootstrap command.
priority/medium
IMO, the crucial benefit of pkg/errors is that it implements the Unwrap method, so we can use errors.Is to check whether two errors are the same.
But pkg/errors is in maintenance mode; should we use this one instead?
CUDA and other runtime dependencies are optional. Runtime override also helps us in this scenario like #1
To mount the folder into the container, several options are available:
docker run -v ...
We map the host dir to the container at runtime.
sshfs
Use a volume plugin to mount the fs through ssh (sftp). Worse performance compared to the -v solution, but works when the container runs on a different machine.
Now we enable CGO in the target build-local, but that is not friendly for delivery. We can write a new target build-and-deliver to disable CGO.
/chore
We need a new plugin in vscode to lint/auto-complete the MIDI lang.
We need to support mirrors in the frontend lang.
apt_source("""
deb https://mirror.sjtu.edu.cn/ubuntu focal main restricted
deb https://mirror.sjtu.edu.cn/ubuntu focal-updates main restricted
deb https://mirror.sjtu.edu.cn/ubuntu focal universe
deb https://mirror.sjtu.edu.cn/ubuntu focal-updates universe
deb https://mirror.sjtu.edu.cn/ubuntu focal multiverse
deb https://mirror.sjtu.edu.cn/ubuntu focal-updates multiverse
deb https://mirror.sjtu.edu.cn/ubuntu focal-backports main restricted universe multiverse
deb http://archive.canonical.com/ubuntu focal partner
deb https://mirror.sjtu.edu.cn/ubuntu focal-security main restricted universe multiverse
""")