
modeldb's Issues

Django Integration - Some Random Questions

Hello guys,

First of all some greetings, this work looks absolutely great. I have a few questions for you.

  1. How many models do you think one needs to manage before starting to feel the benefits of a system like this?
  2. I use Django and a custom version of this package: https://github.com/fridiculous/django-estimators. What pros and cons would you see in making the switch (or not making it)?
  3. Do you actually believe it is more efficient to save each model in a DB, rather than in cloud storage such as AWS S3 with a link to the model saved in the app database?
  4. Do you have any benchmarks on how long it takes to load a model from ModelDB and to save a model to ModelDB?
  5. Have you considered making a pre-compiled container available on Docker Hub, for instance? It would be wonderful to be able to quickly spin up the latest release and test it live.

Thanks a lot for your answers. I think the answers to some of these questions should appear somewhere in the README.md; they are must-knows for anyone interested in going live with this project.

Once again, thank you very much for your contribution to the ML community.

Best regards from France,

Jonathan D.

Error in ModelDbSyncer line number 268

Backend TkAgg is interactive backend. Turning interactive mode on.
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2017.2.1\helpers\pydev\pydevd.py", line 1596, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm 2017.2.1\helpers\pydev\pydevd.py", line 1023, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm 2017.2.1\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/vibhatia/PycharmProjects/modeldbexample/demo/linearmodel.py", line 34, in <module>
    df, target, test_size=0.3)
  File "C:\Users\vibhatia\AppData\Local\Programs\Python\Python35\lib\site-packages\modeldb\sklearn_native\ModelDbSyncer.py", line 258, in train_test_split_fn
    result = split_dfs[:len(split_dfs) / 2]
TypeError: slice indices must be integers or None or have an __index__ method

After changing the line to result = split_dfs[:int(len(split_dfs) / 2)] it works. I am using Python 3.5. Let me know if this works and I can make the change and submit a pull request.
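A slightly more portable variant of the same fix is floor division, which yields an int on both Python 2 and Python 3. A minimal sketch (the helper name is illustrative, not ModelDB's actual function):

```python
# Sketch of a Python 3-safe version of the slicing in train_test_split_fn.
# In Python 3, `/` always returns a float, so it cannot be used as a slice
# index; floor division `//` returns an int on both Python 2 and 3.

def first_half(split_dfs):
    # Original (Python 2 only): split_dfs[:len(split_dfs) / 2]
    return split_dfs[:len(split_dfs) // 2]
```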

[Feature] R client for ModelDB

Implement ModelDB client library in R. Includes:

  • Writing a thrift client in R

  • Implementing ModelDB light logging functionality

  • Implementing operator level logging in R

Running docker-compose on ubuntu gives error

I am using an Ubuntu VM on Azure to set up ModelDB, and Docker to spin up the instance. When I try to run docker-compose up, I get the following error:

The following packages have unmet dependencies:
 mongodb-org-shell : Depends: libssl1.0.0 (>= 1.0.1) but it is not installable
E: Unable to correct problems, you have held broken packages.
ERROR: Service 'backend' failed to build: The command '/bin/sh -c apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 0C49F3730359A14518585931BC711F9BA15703C6 && echo "deb [ arch=amd64 ] http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.4 multiverse" | tee /etc/apt/sources.list.d/mongodb-org-3.4.list && apt-get update && apt-get install -y maven sqlite g++ make automake bison flex pkg-config libevent-dev libssl-dev libtool mongodb-org-shell && apt-get clean && update-alternatives --set java /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' returned a non-zero code: 100

Can someone give me a pointer on what I am doing wrong?

This bug is important, as we at Adobe are trying to evaluate the feature set of ModelDB to see if we can use it for our use case.

[Feature] Add support for model monitoring

Once a model has been trained/registered with ModelDB, expose functionality to add metadata to the model. This could include periodic metrics, failures, etc.

Potential APIs:

appendMetadata(int modelId, KV keyvalue)
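The proposed API could be sketched as follows. This is a hypothetical in-memory model, not ModelDB's actual implementation; the class and method names are assumptions based on the proposal above:

```python
# Hypothetical sketch of the proposed appendMetadata API, modeled as a simple
# in-memory store. A real implementation would persist to ModelDB's backend.

class MetadataStore:
    def __init__(self):
        # model_id -> list of (key, value) pairs, in insertion order
        self._metadata = {}

    def append_metadata(self, model_id, key, value):
        """Append a key-value pair (e.g. a periodic metric or a failure note)."""
        self._metadata.setdefault(model_id, []).append((key, value))

    def get_metadata(self, model_id):
        """Return all metadata recorded for a model, oldest first."""
        return self._metadata.get(model_id, [])
```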

Fix Python Pipeline Tests

Two of the Python tests have been failing for some time now. Could someone please take a look and fix either the tests or the code they're testing? I'm not familiar enough with it to do it myself.

testPipelineEvent.test_overall_pipeline_fit_event and testPipelineEvent.test_pipeline_first_fit_stage

======================================================================
FAIL: test_overall_pipeline_fit_event (testPipelineEvent.TestPipelineEvent)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "testPipelineEvent.py", line 105, in test_overall_pipeline_fit_event
    utils.is_equal_transformer_spec(spec, expected_spec, self)
  File "/Users/arcarter/code/modeldb/client/python/modeldb/tests/utils.py", line 153, in is_equal_transformer_spec
    tester.assertEqual(len(spec1.hyperparameters), len(spec2.hyperparameters))
AssertionError: 14 != 10

======================================================================
FAIL: test_pipeline_first_fit_stage (testPipelineEvent.TestPipelineEvent)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "testPipelineEvent.py", line 149, in test_pipeline_first_fit_stage
    utils.is_equal_transformer_spec(spec, expected_spec, self)
  File "/Users/arcarter/code/modeldb/client/python/modeldb/tests/utils.py", line 153, in is_equal_transformer_spec
    tester.assertEqual(len(spec1.hyperparameters), len(spec2.hyperparameters))
AssertionError: 7 != 3

ServerLogicException

When I try to run SimpleSampleWithModelDB.py, I'm getting this error on both the server and client side. (screenshot not included)
May I know what the actual problem is?

Also, how do I clear the log or metadata file? I was able to launch the frontend with the title Simple Sample, but loading the metadata fails with an error. (screenshots not included)

Spark scala client is not released in a public repo

The Scala client is not released in a public repo. In this scenario, every user who wants to write a client has to build it locally on their computer. The Scala client should be released in a public repo.

Model DB UI does not show models if Evaluate is not called

The UI does not show models on which evaluate has not been called. In our case, evaluate is not called most of the time, and there is no way of visualizing those models.

The culprit seems to be the following lines in the UI:

   for (var i=0; i<models.length; i++) {
          var model_metrics = models[i].metrics;
          var metrics = [];
          models[i].show = false;
          for (key in model_metrics) {
            if (model_metrics.hasOwnProperty(key)) {
              var val = Object.keys(model_metrics[key]).map(function(k){return model_metrics[key][k]})[0];
              val = Math.round(parseFloat(val) * 1000) / 1000;
              metrics.push({
                "key": key,
                "val": val
              });
              models[i].show = true;
            }
          }

          models[i].metrics = metrics;
        }
        models = models.filter(function(model) {
          return model.show;
        });

If I comment the lines

        models = models.filter(function(model) {
          return model.show;
        });

the models are shown, but an extra model called "pipeline model" appears, which should not be shown when fitSync is called on a pipeline. Any idea why this is happening?


Using only the frontend?

Does modeldb support a use case of using only the frontend? The scikit-learn and Apache Spark version requirements for the client do not fit my environment. I see the configurations can be YAML files, but I am wondering if this is a supported use case: manually creating the YAML files and using the frontend to view model results.

Multiple Label columns present in thrift file, Single Label Column in SQL

In the .thrift file, in FitEvent, the type of "LabelColumns" is a list.
In CreateDb.sql, "LabelColumn" (no "s") is stored as a single string.
Are there supposed to be multiple label columns, separated by commas, in the SQL table? If so, we should rename the column to "LabelColumns" and add an appropriate comment.
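If the comma-separated option is chosen, the (de)serialization could look like the sketch below. The function names are illustrative; note that column names containing commas would break this scheme, so escaping or a separate join table may be preferable:

```python
# Sketch of storing the thrift list of label columns in a single SQL string
# column, as comma-separated values.

def serialize_label_columns(label_columns):
    """Join a list of label column names into one SQL-storable string."""
    return ",".join(label_columns)

def deserialize_label_columns(column_value):
    """Split the stored string back into a list of column names."""
    return column_value.split(",") if column_value else []
```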

Fix project duplication when syncing from YAML

If two YAML files are synced, both containing:

PROJECT:
  NAME: my_name
  DESCRIPTION: my_description

but containing different MODEL information, the frontend does not correctly aggregate them under one project and instead creates two distinct project IDs with the same name and description.
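One way to fix this would be to look projects up by (name, description) before minting a new ID. A minimal sketch, with an in-memory dict standing in for the database query (the function name is illustrative, not ModelDB's actual API):

```python
# Sketch of de-duplicating projects on sync: reuse the existing project id for
# a given (name, description) pair instead of always creating a new one.

def get_or_create_project(registry, name, description, next_id):
    """Return the existing project id for (name, description), or mint one."""
    key = (name, description)
    if key not in registry:
        registry[key] = next_id()
    return registry[key]
```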

[Feature] Add user accounts and authentication to ModelDB

ModelDB currently does not have a concept of users or accounts. All data is visible to all users. However, with this model, users cannot do things like bookmark models or annotate them (for themselves as opposed to across all users).

Adding user accounts and authentication will enable these functions along with access control.

[FEATURE] Demo site

There should be a live, public example of ModelDB for people to play with. Visitors should get to browse both a Jupyter notebook with example models to run and ModelDB's Node.js frontend to see reports about those models.

Architecture:

  • Single ModelDB server
    • Mongo server
    • ModelDB Java backend
    • ModelDB Node.js frontend
  • Multiple ephemeral Jupyter notebooks with ModelDB python library

This will also require the creation of one or more .ipynb files that can be run by users to create reports in ModelDB.

[Feature] Continuous Integration Framework for ModelDB

Set up a continuous integration system for ModelDB including automated building and testing.

Requirements:

  • ModelDB spans multiple languages and frameworks: Scala, Python, and Java
  • CI must be able to interface with all of these languages
  • Must be reasonably easy to use

[Minor] slf4j dependency for pom.xml

On running ./start_server.sh

[INFO] ------------------------------------------------------------------------
[INFO] Building modeldb 1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- exec-maven-plugin:1.5.0:java (default-cli) @ modeldb ---
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Starting the simple server on port 6543...

This requires a dependency on the Simple Logging Facade for Java (https://www.slf4j.org). The most recent version is 1.7.24.

[FEATURE] PyPi Package for Modeldb

There should be a pip-installable package for modeldb's Python client. This would allow people to deploy ModelDB more easily.

A PyPI package may require moving syncer.json, or duplicating its hardcoded values into ConfigUtils.py.

This is currently being worked on at the PyPI test site.

[BUG] Root Url error

http://localhost:3000/

First noticed at 152a63a
Bug not present at dea7604

TypeError: api.testConnection is not a function
    at /Users/arcarter/code/modeldb/frontend/routes/index.js:7:7
    at Layer.handle [as handle_request] (/Users/arcarter/code/modeldb/frontend/node_modules/express/lib/router/layer.js:95:5)
    at next (/Users/arcarter/code/modeldb/frontend/node_modules/express/lib/router/route.js:131:13)
    at Route.dispatch (/Users/arcarter/code/modeldb/frontend/node_modules/express/lib/router/route.js:112:3)
    at Layer.handle [as handle_request] (/Users/arcarter/code/modeldb/frontend/node_modules/express/lib/router/layer.js:95:5)
    at /Users/arcarter/code/modeldb/frontend/node_modules/express/lib/router/index.js:277:22
    at Function.process_params (/Users/arcarter/code/modeldb/frontend/node_modules/express/lib/router/index.js:330:12)
    at next (/Users/arcarter/code/modeldb/frontend/node_modules/express/lib/router/index.js:271:10)
    at Function.handle (/Users/arcarter/code/modeldb/frontend/node_modules/express/lib/router/index.js:176:3)
    at router (/Users/arcarter/code/modeldb/frontend/node_modules/express/lib/router/index.js:46:12)

Model DB Scala client gets timeout exception on fitsync for a long job

Steps to reproduce.

Create a Spark job which runs for more than 10 minutes.
Call fitSync on any estimator in the Spark client.
17/10/11 08:13:33 ERROR ApplicationMaster: User class threw exception: com.twitter.finagle.ConnectionFailedException: Connection timed out at remote address: <DNS:port> from service: <DNS:port>. Remote Info: Upstream Address: Not Available, Upstream Client Id: Not Available, Downstream Address: <DNS:port>, Downstream Client Id: <DNS:port>, Trace Id: 0bb6d8aea19c2260.0bb6d8aea19c2260<:0bb6d8aea19c2260
com.twitter.finagle.ConnectionFailedException: Connection timed out at remote address: <DNS:port> from service: <DNS:port>. Remote Info: Upstream Address: Not Available, Upstream Client Id: Not Available, Downstream Address: <DNS:port>, Downstream Client Id: <DNS:port>, Trace Id: 0bb6d8aea19c2260.0bb6d8aea19c2260<:0bb6d8aea19c2260
Caused by: java.io.IOException: Connection timed out
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at com.twitter.finagle.util.ProxyThreadFactory$$anonfun$newProxiedRunnable$1$$anon$1.run(ProxyThreadFactory.scala:19)
    at java.lang.Thread.run(Thread.java:748)

For a job shorter than 7 minutes, the results are fine.

host=localhost not possible with docker mac

Not sure if anyone else has experienced this same issue.

I am using the docker-compose modeldb installation on a Mac and am unable to connect the docker-machine to localhost. Here's another forum thread discussing the issue: https://forums.docker.com/t/using-localhost-for-to-access-running-container/3148

As a workaround, I ran this script from the modeldb directory; it find/replaces all the instances of localhost in the routing files:

sed -i -e 's/localhost/<docker-machine address>/g' server/src/main/resources/reference.conf
sed -i -e 's/localhost/<docker-machine address>/g' server/src/main/resources/reference-docker.conf
sed -i -e 's/localhost/<docker-machine address>/g' server/src/main/resources/reference-test.conf
sed -i -e 's/localhost/<docker-machine address>/g' client/syncer.json
sed -i -e 's/localhost/<docker-machine address>/g' frontend/util/check_thrift.js
sed -i -e 's/localhost/<docker-machine address>/g' frontend/util/thrift.js
sed -i -e 's/localhost/<docker-machine address>/g' client/python/modeldb/basic/Structs.py

To find <docker-machine address>, do: $ docker-machine ls

NAME      ACTIVE   DRIVER       STATE     URL                         SWARM   DOCKER        ERRORS
default   -        virtualbox   Running   tcp://192.168.99.100:2376           v17.05.0-ce   

After that, I was able to add new projects to the dockerized ModelDB.

I can branch and submit a pull request. Or is there a better way? Please advise.

Creating annotations gives a 500 error

When I try to create annotations on a model, it gives a 500 error:

Transformer is not defined
ReferenceError: Transformer is not defined
    at Object.storeAnnotation (E:\machinelearning\git\modeldb\frontend\util\api.js:109:27)
    at E:\machinelearning\git\modeldb\frontend\routes\models.js:30:7
    at Layer.handle [as handle_request] (E:\machinelearning\git\modeldb\frontend\node_modules\express\lib\router\layer.js:95:5)
    at next (E:\machinelearning\git\modeldb\frontend\node_modules\express\lib\router\route.js:131:13)
    at Route.dispatch (E:\machinelearning\git\modeldb\frontend\node_modules\express\lib\router\route.js:112:3)
    at Layer.handle [as handle_request] (E:\machinelearning\git\modeldb\frontend\node_modules\express\lib\router\layer.js:95:5)
    at E:\machinelearning\git\modeldb\frontend\node_modules\express\lib\router\index.js:277:22
    at param (E:\machinelearning\git\modeldb\frontend\node_modules\express\lib\router\index.js:349:14)
    at param (E:\machinelearning\git\modeldb\frontend\node_modules\express\lib\router\index.js:365:14)
    at Function.process_params (E:\machinelearning\git\modeldb\frontend\node_modules\express\lib\router\index.js:410:3)

#268

[Feature] Ability to use with custom data from shell

Imagine that I have a long custom experiment pipeline managed with a bash script.
Is there a way to pass custom output/artifacts from such a pipeline to ModelDB?

I believe some kind of REST API would be nice.
It would also be OK to use other languages (generate the artifacts somehow, then run a Python script to put them into the DB).
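To illustrate, if ModelDB exposed such a REST endpoint (it currently does not; the URL and payload shape below are pure assumptions), a bash pipeline could write its metrics to a JSON file and a small stdlib-only script could post them:

```python
# Hypothetical sketch of shell-driven logging via a REST endpoint. The endpoint
# and payload shape are assumptions, not an existing ModelDB API.

import json
import urllib.request

def build_payload(project, model_name, metrics):
    """Bundle pipeline outputs into a JSON payload."""
    return json.dumps({
        "project": project,
        "model": model_name,
        "metrics": metrics,
    }).encode("utf-8")

def post_run(url, payload):
    """POST the payload; requires a live server at `url` (hypothetical)."""
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    return urllib.request.urlopen(req)
```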

[Feature] Drop SQLite in favor of MongoDB

Suggestion: remove the SQLite footprint from ModelDB and use MongoDB as the sole database.

Use case: to have ModelDB running in production, we run it in a Docker container with the Mongo database mounted as a volume from the host OS outside the container. This allows us to quickly respawn if the container goes down, and enables data recovery. As for the SQL db, it's not only redundant given the introduction of Mongo, but its path is also hardcoded, so mounting it separately is more of a hack.

change the mongo metadata parameters to take in URL

Today the mongo metadata store takes in a host and port. The issue with this approach is that we cannot supply authentication information, which is generally carried in the URL. The proposal is to change the host and port to a URL, which can cover the host-and-port approach (set the URL to host:port) as well as more complex use cases where we need to pass more information to the mongo client.
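Accepting a URL while staying backward compatible with host/port config could be sketched as follows, using only the Python standard library (the config key names "url", "host", and "port" are assumptions):

```python
# Sketch of resolving a Mongo target from either a full connection URL or
# legacy host/port settings.

from urllib.parse import urlparse

def resolve_mongo_target(config):
    """Return (host, port, username, password) from either form of config."""
    if "url" in config:
        parsed = urlparse(config["url"])
        return parsed.hostname, parsed.port, parsed.username, parsed.password
    # Legacy host/port configuration: no credentials available
    return config["host"], config["port"], None, None
```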

Counting number of rows in a DataFrame is slow

Currently, I've commented out a line that counts the number of rows in a DataFrame.

Counting the number of rows in a DataFrame is a slow operation. However, we use the numRows field in our DataFrame thrift struct. If we insist on keeping the numRows field, then we need to accept the cost of counting the number of DataFrame rows (this requires a full sequential scan of the dataset).

I wonder if it's inappropriate for ModelDB, which is supposed to be a low-overhead tool, to perform a sequential scan of a DataFrame. If we do this, then the overhead of ModelDB will skyrocket and will scale linearly with the size of the user's dataset.

I'm thinking of making "should we count the number of rows?" configurable. If users are willing to accept the cost of ModelDB counting the number of rows in the DataFrame, they can indicate so; if they need the performance, they can indicate that instead. Also, to avoid recounting the rows of the same DataFrame, we can cache the counts.
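The opt-in flag plus count cache could be sketched like this; `count_rows` stands in for the expensive full scan, and all names are illustrative rather than ModelDB's actual API:

```python
# Sketch of an opt-in, cached row count for the numRows field.

class RowCounter:
    def __init__(self, enabled=True):
        self.enabled = enabled     # user-configurable: accept the scan cost?
        self._cache = {}           # dataset id -> previously computed count
        self.scans = 0             # number of full scans actually performed

    def num_rows(self, dataset_id, count_rows):
        """Return the row count: -1 if counting is disabled, cached if seen."""
        if not self.enabled:
            return -1  # sentinel: count unknown, no scan performed
        if dataset_id not in self._cache:
            self.scans += 1
            self._cache[dataset_id] = count_rows()
        return self._cache[dataset_id]
```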

Thoughts?

[BUG] Mongo does not shut down

mongo --eval "db.getSiblingDB('admin').shutdownServer()" doesn't actually shut down the server correctly. I've had a better experience just doing:

ps -a | grep mongo
kill [PID]

This could obviously use some more elaboration, and I could look into it further. But it's worth noting for now that the instructions aren't doing what they're supposed to.
