
mindsdb_server's People

Contributors

george3d6, stpmax, surendra1472, torrmal, wodend, zoranpandovski, zwerg44


mindsdb_server's Issues

Component Separation

Separate mindsdb server into two components:

  • Datasource Manager
  • HTTP Interface

Make sure to allow for adding new interfaces and components.

Setup environment for testing databases integration

Everyone on the backend team should set up an environment that includes:

  • An installation of ClickHouse
  • A dev (i.e. with headers) installation of MySQL or MariaDB
  • A dev (i.e. with headers) installation of PostgreSQL

This is in order to test and work on integrating mindsdb (server) with various databases. It's better to have these set up now rather than later, since you may run into installation issues (in which case feel free to ask in the backend channel and/or google things).

Make sure that all of this is working by running the unit tests for the MySQL, PostgreSQL and ClickHouse datasources in mindsdb and confirming they pass. (To run those you'll need the extra datasource dependencies listed here: https://github.com/mindsdb/mindsdb/blob/master/optional_requirements_extra_data_sources.txt)

Please confirm here once you've got everything set up.

save predictor metadata on start and after each phase

This is a mindsdb module issue, but it reflects on the server.

  • Save metadata as soon as the model is created, even if it's just the name. Make sure to add the datasource name and the fields it's being asked to predict, as well as the initial status: started. Map this to the data returned by the server.

  • After each phase, save the predictor metadata and store the phase it is currently at. Map the phase name to the status attribute in the server response (see the sketch below).
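
A rough sketch of the idea, assuming a hypothetical save_metadata() helper and an illustrative storage layout (neither is the actual mindsdb module code):

```python
import json
import os
import time

STORAGE_DIR = 'storage/predictors'  # assumed location, adjust to the real one

def save_metadata(name, datasource, predict_cols, status):
    """Persist whatever we currently know about a predictor."""
    meta = {
        'name': name,
        'data_source': datasource,
        'predict': predict_cols,
        'status': status,  # 'started' at creation, then the current phase name
        'updated_at': time.strftime('%Y-%m-%dT%H:%M:%S')
    }
    os.makedirs(os.path.join(STORAGE_DIR, name), exist_ok=True)
    with open(os.path.join(STORAGE_DIR, name, 'metadata.json'), 'w') as fp:
        json.dump(meta, fp)

# As soon as the model is created:
save_metadata('nr_rooms', 'home_rentals.csv', ['number_of_rooms'], 'started')
# After each training phase, store the phase name as the status (illustrative phase name):
save_metadata('nr_rooms', 'home_rentals.csv', ['number_of_rooms'], 'DataExtractor')
```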

used_row_count is not correct

The used row count doesn't add up. See the response:

"status": "complete",
    "current_phase": "Trained",
    "name": "nr_rooms",
    "version": "1.2.1",
    "data_preparation": {
        "accepted_margin_of_error": 0.0,
        "total_row_count": 5037,
        "used_row_count": 630,
        "test_row_count": 505,
        "train_row_count": 4029,
        "validation_row_count": 503
    },

Use `spawn` for every mindsdb task that requires torch

Torch only likes running on spawned processes.

Ideally most processes would be forked instead, but we can spawn mindsdb just fine. Make sure that's done for every train/test/predict call and that the results are properly returned.
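
A minimal sketch of the pattern, assuming the task function and its arguments are picklable (the names here are illustrative, not the actual server code):

```python
import multiprocessing as mp

def _run_task(fn, args, queue):
    # Executed inside the spawned process; torch should only be imported/used here.
    queue.put(fn(*args))

def run_in_spawned_process(fn, *args):
    ctx = mp.get_context('spawn')
    queue = ctx.Queue()
    proc = ctx.Process(target=_run_task, args=(fn, args, queue))
    proc.start()
    result = queue.get()  # blocks until the task puts its result
    proc.join()
    return result

if __name__ == '__main__':
    # e.g. wrap each train/test/predict call like this
    print(run_in_spawned_process(sum, [1, 2, 3]))
```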

Add locking mechanism

Not a priority until we roll out the new mindsdb server AND have the test suite in place.

We'll need a locking mechanism when accessing the datasource through the DataStore and mindsdb through the upcoming mindsdb interface.

This will need to support locking between processes, and ideally it can also be implemented in mindsdb native such that users can use all the APIs in parallel.

At the moment the way mindsdb_server operates with datasources is not, in theory, "concurrency safe", but (especially provided it's run on a Linux machine) this should never be an issue in practice.

I don't expect this to be an issue here either, BUT at some point we might encounter it, and it's better to prepare for these things in advance.

A good candidate for this is ilock:

https://github.com/symonsoft/ilock

It seems to be a very well-implemented, OS-neutral, filesystem-based lock. I've read through the code and, to be honest, I'm not sure the implementation would work out perfectly on all OSes (i.e. I can see weird edge cases), but it seems to be more than good enough.
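
If we do go with ilock, usage around the DataStore would look roughly like this (the lock name and the guarded call are placeholders, not existing mindsdb_server code):

```python
from ilock import ILock

def save_datasource_safely(data_store, name, source):
    # Use the same lock name in every process that touches datasource storage.
    with ILock('mindsdb_datasource_storage'):
        data_store.save_datasource(name, source)  # hypothetical DataStore call
```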

Add integration manager / cron-runner component

Add a component that handles our various integrations (e.g. clickhouse) by running periodic and one-off tasks.

This should include:

  • Updating the integration with references to all new datasources created
  • Updating the integration with references to all new predictors created
  • Any special handling logic for that specific datasource (though, maybe this should be part of the datasource component instead?)
  • Running one-off tasks such as creating configuration files for the integration or running an "installer" that helps the user set up the integration (a rough sketch of such a runner follows below)
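
As mentioned in the last item, here is a very rough sketch of what such a runner could look like; the Integration interface and its methods are hypothetical placeholders, not an existing mindsdb_server API:

```python
import time

class Integration:
    def setup(self):
        """One-off task: write config files, run any 'installer' steps."""

    def sync(self):
        """Periodic task: push references to new datasources and predictors."""

def run_integrations(integrations, period=60):
    # Run the one-off tasks once, then keep the periodic tasks going.
    for integration in integrations:
        integration.setup()
    while True:
        for integration in integrations:
            integration.sync()
        time.sleep(period)
```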

OS specific distributions

Create OS-specific distributions (e.g. rpm package, snap package, deb package, or whatever the popular formats on Windows/OSX are).

This should include a bit of research into the popular formats (see above) that we should be supporting for each platform and how easy a distribution in each of them is to create.

I think all of the devs should leave their feedback here in terms of:

a) What packages they can easily create because they've done so before
b) What the common formats are in their particular niche of technology

Also, this might turn into a community issue once we make progress, since we could try to get those packages onto the official repositories.

Personally, I feel comfortable doing the PKGBUILD (Arch) and can probably handle the snap, deb and rpm. Ideally someone else would be in charge of designing the OSX- and Windows-specific distributions.

For an existing implementation of something like this see the way we build and distribute the docker image.

Once this is done we should also update the docs.

Also, make sure we don't bite off more than we can chew; these can be hard to maintain, so we should:

a) Only have them for mindsdb_server, which can then be used to install mindsdb, lightwood and dataskillet
b) Only document the ones that we are sure work and keep the "install manually" instructions in our documentation for those that don't

download of predictor not working

On Mac, this is the error:


```
Exported model to nr_rooms.zip
127.0.0.1 - - [04/Jun/2019 11:04:26] "GET /predictors/nr_rooms/download HTTP/1.1" 500 -

Exception on /predictors/nr_rooms/download [GET]
Traceback (most recent call last):
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/shutil.py", line 557, in move
    os.rename(src, real_dst)
FileNotFoundError: [Errno 2] No such file or directory: 'nr_rooms.zip' -> 'tmp/nr_rooms.zip'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/site-packages/flask/app.py", line 1832, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/site-packages/flask/app.py", line 1818, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/site-packages/flask_restplus/api.py", line 325, in wrapper
    resp = resource(*args, **kwargs)
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/site-packages/flask/views.py", line 88, in view
    return self.dispatch_request(*args, **kwargs)
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/site-packages/flask_restplus/resource.py", line 44, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/site-packages/mindsdb_server/namespaces/predictor.py", line 229, in get
    shutil.move(fname, fpath)
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/shutil.py", line 571, in move
    copy_function(src, real_dst)
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/shutil.py", line 257, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/shutil.py", line 121, in copyfile
    with open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: 'tmp/nr_rooms.zip'
```

It seems like the file is not being generated as tmp/nr_rooms.zip.

Allow for training of predictors from clickhouse

This should be done via inserting into the table:

```
mindsdb.predictors (
    name String,
    predict_cols String,
    select_data_query String,
    training_options String
)
```

This table should already be created whenever the clickhouse interface is constructed.
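
For illustration, the insert could be issued through ClickHouse's HTTP interface roughly like this (the host/port and the query values are assumptions, not a documented example):

```python
import requests

# Illustrative training request: predictor name, target column(s),
# the query that selects the training data, and training options.
query = """
INSERT INTO mindsdb.predictors (name, predict_cols, select_data_query, training_options)
VALUES ('home_rentals_model', 'rental_price',
        'SELECT * FROM default.home_rentals', '{}')
"""

response = requests.post('http://localhost:8123/', data=query)  # assumed local ClickHouse
response.raise_for_status()
```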

Add contributors agreement and try to apply it to post license-switch contributions

We should go through all contributions since we switched from an MIT License to a GPL-3.0 License and either:

a) Have all contributors agree to and sign something like the ASF Contributor License Agreement or alternatively remove their contributions.

b) In the future we should have some easy way of allowing anyone that contributes code to sign an agreement, similar to the way the Apache Foundation does it.

This is for {insert legal reasons I would make a mess of explaining}; feel free to send us an email or ask a question here in case you don't agree with this policy or think it's in some way disadvantageous to MindsDB and/or its open source contributors.

Research into installation

Do some research regarding the new installation procedure.

There may be easy gains that we are "missing" which could improve the number of environments we can currently install on.

Potential leads:

  • automatic setup of a virtual environment other than virtualenv
  • cython compilation
  • fallback dependencies (e.g. legacy torch versions with better Windows support)

cannot upload data sources

I get this error:

```
Exception on /datasources/yep [PUT]
Traceback (most recent call last):
  File "/Users/jorgetorres/Library/Python/3.7/lib/python/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/jorgetorres/Library/Python/3.7/lib/python/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.7/site-packages/flask_restplus/api.py", line 325, in wrapper
    resp = resource(*args, **kwargs)
  File "/Users/jorgetorres/Library/Python/3.7/lib/python/site-packages/flask/views.py", line 88, in view
    return self.dispatch_request(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/flask_restplus/resource.py", line 44, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/flask_restplus/marshalling.py", line 136, in wrapper
    resp = f(*args, **kwargs)
  File "/Users/jorgetorres/src/mindsdb/mindsdb_server/mindsdb_server/namespaces/datasource.py", line 101, in put
    names = [x['name'] for x in get_datasources()]
  File "/Users/jorgetorres/src/mindsdb/mindsdb_server/mindsdb_server/namespaces/datasource.py", line 43, in get_datasources
    with open(os.path.join(ROOT_STORAGE_DIR, ds_name, 'metadata.json'), 'r') as fp:
FileNotFoundError: [Errno 2] No such file or directory: 'storage/predictors/metadata.json'
```

Add environment configuration

The mindsdb server should run some environment configuration upon installation.

  • Create a /var/mindsdb/ directory that contains a config.json which can be used to configure lightwood, mindsdb, mindsdb_server, etc.
  • Add the daemon file to wherever the specific OS stores daemon files and apply any OS-specific configuration to it.
  • Add a default directory structure to /var/mindsdb/ for storing datasources, predictors and anything else that might need storing (a rough sketch of this step follows below).
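
A minimal sketch of what that post-install step could look like; the directory names, config keys and port are placeholder assumptions, not a finalized layout:

```python
import json
import os

MINDSDB_HOME = '/var/mindsdb'

def setup_environment():
    # Default directory structure for datasources, predictors, etc.
    for sub in ('datasources', 'predictors', 'tmp'):
        os.makedirs(os.path.join(MINDSDB_HOME, sub), exist_ok=True)

    # A config.json shared by lightwood, mindsdb and mindsdb_server.
    config_path = os.path.join(MINDSDB_HOME, 'config.json')
    if not os.path.exists(config_path):
        default_config = {
            'mindsdb': {},
            'lightwood': {},
            'mindsdb_server': {'host': '0.0.0.0', 'port': 47334}  # placeholder values
        }
        with open(config_path, 'w') as fp:
            json.dump(default_config, fp, indent=2)
```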

model metadata mapping issue - modelanalysis->targetcolumn->accuracy_histogram

It is important to understand that the accuracy_histogram in the metadata,

model_analysis[target_i]['accuracy_histogram']['x']
maps one to one to
data_analysis['target_columns_metadata'][target_i]['data_distribution']['data_histogram']['x']

Right now I can't really tell what it is mapping to, but it's wrong (see the attached image).

but the idea is that you can inspect the distributions per bucket of the target variable.

X should be the number of rooms

model_analysis[target_i]['accuracy_histogram']['y'] should be the accuracy of the model for that particular bucket.
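
In other words, roughly (a small illustrative sketch; the metadata keys follow the wording of this issue and may differ slightly in the code):

```python
def accuracy_per_bucket(metadata, target_i=0):
    data_hist = (metadata['data_analysis']['target_columns_metadata']
                 [target_i]['data_distribution']['data_histogram'])
    acc_hist = metadata['model_analysis'][target_i]['accuracy_histogram']

    # The buckets ('x') must be identical, element by element, so that
    # acc_hist['y'][i] is the model's accuracy for the bucket data_hist['x'][i].
    assert acc_hist['x'] == data_hist['x']
    return dict(zip(acc_hist['x'], acc_hist['y']))
```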

Clickhouse integration

Once #69 is done, add the integration manager for clickhouse; we'll wait until #69 is done before adding details.

We may also use the clickhouse integration as a template for #69.

Improve the installers

Improve the current installers in order to handle more edge cases and operating systems.

Also, potentially, add an option for integrating scout into the installers, as well as a graphical wizard for OSes that easily allow us to do so.

Define and implement testing for the MVP

We need to define a few test cases for the clickhouse integration for the MVP that we want to be ready by Thursday.

E.g.:

  1. Start mindsdb_server as a python module
  2. Make sure we are connected to clickhouse via the MySQL API
  3. Create a table with data x,y,z in clickhouse
  4. Create a clickhouse datasource from said table (via the HTTP interface)
  5. Train a new predictor through clickhouse (via the MySQL API) using the data from (4)
  6. Make a prediction using data passed via a WHEN clause to the predictor trained during (5)

Ideally these should be a bit more in-depth (e.g. specifying the exact query by which the predictor should be trained and predicted from, and the exact data to insert into clickhouse); a rough skeleton follows below.

@torrmal I will pass this one over to you since you probably have a better idea of what you want to see in the MVP.
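
A rough skeleton of what such a test could look like; the host/port, table names and queries are illustrative assumptions, and the plain HTTP calls are a stand-in for whichever client the test suite ends up using:

```python
import requests

CLICKHOUSE_URL = 'http://localhost:8123/'  # assumed local ClickHouse instance

def clickhouse_query(query):
    response = requests.post(CLICKHOUSE_URL, data=query)
    response.raise_for_status()
    return response.text

def test_clickhouse_mvp_flow():
    # (1)/(2) mindsdb_server and clickhouse are assumed to be up and reachable
    assert clickhouse_query('SELECT 1').strip() == '1'

    # (3) create a table with data x, y, z in clickhouse
    clickhouse_query('CREATE TABLE default.points (x Int32, y Int32, z Int32) ENGINE = Memory')
    clickhouse_query('INSERT INTO default.points VALUES (1, 2, 3), (2, 3, 5)')

    # (4) creating the datasource via the HTTP interface is omitted in this sketch

    # (5) train a predictor through clickhouse by inserting into mindsdb.predictors
    clickhouse_query(
        "INSERT INTO mindsdb.predictors (name, predict_cols, select_data_query, training_options) "
        "VALUES ('points_model', 'z', 'SELECT * FROM default.points', '{}')")

    # (6) get a prediction; the exact WHEN/WHERE syntax still needs to be pinned down
    result = clickhouse_query('SELECT z FROM mindsdb.points_model WHERE x = 1 AND y = 2')
    assert result.strip() != ''
```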

ImportError: cannot import name 'cached_property' from 'werkzeug'

Installing the latest mindsdb_server throws an import error with the latest Flask-RESTPlus version.
Flask-RESTPlus has Werkzeug as a dependency, which has some import changes in the latest version (1.0.0). See noirbizarre/flask-restplus#777 (comment).
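
Until this is resolved upstream, a commonly suggested workaround is to either pin Werkzeug to a pre-1.0 release in the requirements or re-expose the attribute before flask_restplus is imported, roughly like this:

```python
# Werkzeug 1.0.0 removed the top-level alias; flask_restplus still imports it.
import werkzeug
from werkzeug.utils import cached_property

werkzeug.cached_property = cached_property

from flask_restplus import Resource, abort  # noqa: E402  (now imports cleanly)
```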

Time series encoders can't be loaded
```
Traceback (most recent call last):
  File "server.py", line 1, in <module>
    from mindsdb_server.namespaces.predictor import ns_conf as predictor_ns
  File "/home/zoran/MyProjects/mindsdb_server/ser/lib/python3.7/site-packages/mindsdb_server/__init__.py", line 6, in <module>
    from mindsdb_server.server import start_server
  File "/home/zoran/MyProjects/mindsdb_server/ser/lib/python3.7/site-packages/mindsdb_server/server.py", line 1, in <module>
    from mindsdb_server.namespaces.predictor import ns_conf as predictor_ns
  File "/home/zoran/MyProjects/mindsdb_server/ser/lib/python3.7/site-packages/mindsdb_server/namespaces/predictor.py", line 12, in <module>
    from flask_restplus import Resource, abort
  File "/home/zoran/MyProjects/mindsdb_server/ser/lib/python3.7/site-packages/flask_restplus/__init__.py", line 4, in <module>
    from . import fields, reqparse, apidoc, inputs, cors
  File "/home/zoran/MyProjects/mindsdb_server/ser/lib/python3.7/site-packages/flask_restplus/fields.py", line 17, in <module>
    from werkzeug import cached_property
ImportError: cannot import name 'cached_property' from 'werkzeug' (/home/zoran/MyProjects/mindsdb_server/ser/lib/python3.7/site-packages/werkzeug/__init__.py)
```

Install doesn't fail if a part of the install fails

On Richie's environment the installer failed (no space left on the SSD) for lightwood/mindsdb, but mindsdb and the server still got installed.

If a sub-dependency can't be installed, the installation process should crash instead.
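
A minimal sketch of the idea; the package list and the pip invocation are illustrative, since the real installer may work differently:

```python
import subprocess
import sys

def install_all(packages=('lightwood', 'mindsdb', 'mindsdb_server')):
    for package in packages:
        try:
            # check=True raises CalledProcessError on a non-zero exit code
            subprocess.run([sys.executable, '-m', 'pip', 'install', package], check=True)
        except subprocess.CalledProcessError as e:
            # Abort the whole install as soon as any sub-step fails
            sys.exit(f'Installation of {package} failed, aborting: {e}')
```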

add predictor accuracy to predictors endpoint

The GET /predictors response does not contain the accuracy:

```
[
    {
        "name": "nr_rooms",
        "version": "1.2.1",
        "is_active": false,
        "data_source": "home_rentals.csv",
        "predict": [
            "number_of_rooms"
        ],
        "status": "complete",
        "current_phase": "Trained",
        "train_end_at": "2019-05-31T10:56:01",
        "updated_at": "2019-05-31T10:57:02",
        "created_at": "2019-05-31T10:55:00"
    }
]
```

See the PredictorStatus entity.

data preparation not being populated on metadata mapping

On the metadata, no 'data_preparation' is being mapped.

Example of the desired output:

```
'data_preparation': {
    'accepted_margin_of_error': 0.2,
    'total_row_count': 18000,
    'used_row_count': 10000,
    'test_row_count': 1000,
    'validation_row_count': 1000,
    'train_row_count': 8000
},
```

Weird OSX install issues

We observed some weird install issues on Richie's machine whereby:

cd "$SERVER_PATH"

fails. SERVER_PATH is set to `/blah/blah/Applications Support/blah/blah` and the cd command goes to `/blah/blah/Applications` instead.

cd ""$SERVER_PATH"" didn't help either. I'm at a bit of a loss for why this is happening; the behavior was not observed on other OSX machines with the same pathing. Part of me thinks, based on the logs, that the whitespace in Applications Support gets replaced with a tab on the GUI side for some reason.

Barring any fix in the install script (and after an hour we found nothing), it might be easiest to fix this in the GUI by just using a place other than Applications Support to install things.

Create tests for the clickhouse integration

Create a test suite for the clickhouse integration.

This should include:

  • Create a clickhouse data source (and check the data consistency)
  • Train a predictor with a clickhouse data source
  • Check it's been auto-updated into clickhouse
  • Check the table definition for the predictors
  • Get a prediction from clickhouse using a clickhouse datasource
  • Get a prediction using data inside clickhouse passed via WHEN

trained predictor not returning anything on model analysis and data analysis

On a trained predictor I only get:

```
{
    "status": "complete",
    "current_phase": "Trained",
    "name": "nrooms",
    "version": "1.2.5",
    "data_preparation": {
        "accepted_margin_of_error": 0.0,
        "total_row_count": 5037,
        "used_row_count": 5037,
        "test_row_count": 505,
        "train_row_count": 4029,
        "validation_row_count": 503
    },
    "accuracy": 0.247,
    "data_analysis": {
        "target_columns_metadata": [],
        "input_columns_metadata": []
    },
    "model_analysis": []
}
```
