
mindsdb_server's People

Contributors

george3d6, stpmax, surendra1472, torrmal, wodend, zoranpandovski, zwerg44


mindsdb_server's Issues

Component Separation

Separate mindsdb server into two components:

  • Datasource Manager
  • HTTP Interface

Make sure to allow for adding new interfaces and components.

Setup environment for testing databases integration

Everyone on the backend team should set up an environment that includes:

  • An installation of ClickHouse
  • A dev (i.e. with headers) installation of MySQL or MariaDB
  • A dev (i.e. with headers) installation of PostgreSQL

This is in order to test and work on integrating mindsdb (server) with various databases. It's better to have these set up now rather than later, since you may run into installation issues (in which case feel free to ask in the backend channel and/or google things).

Make sure that all of this is working by running the unit tests for the MySQL, PostgreSQL and ClickHouse datasources in mindsdb and confirming they pass. (To run those you'll need the extra datasource dependencies listed here: https://github.com/mindsdb/mindsdb/blob/master/optional_requirements_extra_data_sources.txt)

Please confirm here once you've got everything set up.

save predictor metadata on start and after each phase

This is a mindsdb module issue, but it reflects on the server.

  • Save metadata as soon as the model is created, even if it's just the name. Make sure to add the datasource name and the fields it's being asked to predict, as well as the initial status: started. Map this to the data returned by the server.

  • After each phase, save the predictor metadata and store the phase it is currently at. Map the phase name to the status attribute in the server response (see the sketch below).
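
A rough sketch of the idea, assuming a hypothetical save_metadata() helper and an illustrative storage layout (neither is the actual mindsdb module code):

```python
import json
import os
import time

STORAGE_DIR = 'storage/predictors'  # assumed location, adjust to the real one

def save_metadata(name, datasource, predict_cols, status):
    """Persist whatever we currently know about a predictor."""
    meta = {
        'name': name,
        'data_source': datasource,
        'predict': predict_cols,
        'status': status,  # 'started' at creation, then the current phase name
        'updated_at': time.strftime('%Y-%m-%dT%H:%M:%S')
    }
    os.makedirs(os.path.join(STORAGE_DIR, name), exist_ok=True)
    with open(os.path.join(STORAGE_DIR, name, 'metadata.json'), 'w') as fp:
        json.dump(meta, fp)

# As soon as the model is created:
save_metadata('nr_rooms', 'home_rentals.csv', ['number_of_rooms'], 'started')
# After each training phase, store the phase name as the status (illustrative phase name):
save_metadata('nr_rooms', 'home_rentals.csv', ['number_of_rooms'], 'DataExtractor')
```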

used_row_count is not correct

The used row count doesn't add up. See the response:

"status": "complete",
    "current_phase": "Trained",
    "name": "nr_rooms",
    "version": "1.2.1",
    "data_preparation": {
        "accepted_margin_of_error": 0.0,
        "total_row_count": 5037,
        "used_row_count": 630,
        "test_row_count": 505,
        "train_row_count": 4029,
        "validation_row_count": 503
    },

Use `spawn` for every mindsdb task that requires torch

Torch only likes running on spawned processes.

Ideally most processes would be forked instead, but we can spawn mindsdb just fine. Make sure that's done for every train/test/predict call and that the results are properly returned.
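
A minimal sketch of the pattern, assuming the task function and its arguments are picklable (the names here are illustrative, not the actual server code):

```python
import multiprocessing as mp

def _run_task(fn, args, queue):
    # Executed inside the spawned process; torch should only be imported/used here.
    queue.put(fn(*args))

def run_in_spawned_process(fn, *args):
    ctx = mp.get_context('spawn')
    queue = ctx.Queue()
    proc = ctx.Process(target=_run_task, args=(fn, args, queue))
    proc.start()
    result = queue.get()  # blocks until the task puts its result
    proc.join()
    return result

if __name__ == '__main__':
    # e.g. wrap each train/test/predict call like this
    print(run_in_spawned_process(sum, [1, 2, 3]))
```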

Add locking mechanism

Not a priority until we roll out the new mindsdb server AND have the test suite in place.

We'll need a locking mechanism when accessing the datasource through the DataStore and mindsdb through the upcoming mindsdb interface.

This will need to support locking between processes, and ideally it can also be implemented in mindsdb native such that users can use all the APIs in parallel.

At the moment the way mindsdb_server operates with datasources is not, in theory, "concurrency safe", but (especially provided it's run on a Linux machine) this should never be an issue in practice.

I don't expect this to be an issue here either, BUT at some point we might encounter it, and it's better to prepare for these things in advance.

A good candidate for this is ilock:

https://github.com/symonsoft/ilock

It seems to be a very well-implemented, OS-neutral, filesystem-based lock. I've read through the code and, to be honest, I'm not sure the implementation would work out perfectly on all OSes (i.e. I can see weird edge cases), but it seems to be more than good enough.
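
If we do go with ilock, usage around the DataStore would look roughly like this (the lock name and the guarded call are placeholders, not existing mindsdb_server code):

```python
from ilock import ILock

def save_datasource_safely(data_store, name, source):
    # Use the same lock name in every process that touches datasource storage.
    with ILock('mindsdb_datasource_storage'):
        data_store.save_datasource(name, source)  # hypothetical DataStore call
```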

Add integration manager / cron-runner component

Add a component that handles our various integrations (e.g. clickhouse) by running periodic and one-off tasks.

This should include:

  • Updating the integration with references to all new datasources created
  • Updating the integration with references to all new predictors created
  • Any special handling logic for that specific datasource (though, maybe this should be part of the datasource component instead?)
  • Running one-off tasks such as creating configuration files for the integration or running an "installer" that helps the user set up the integration (a rough sketch of such a runner follows below)
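
As mentioned in the last item, here is a very rough sketch of what such a runner could look like; the Integration interface and its methods are hypothetical placeholders, not an existing mindsdb_server API:

```python
import time

class Integration:
    def setup(self):
        """One-off task: write config files, run any 'installer' steps."""

    def sync(self):
        """Periodic task: push references to new datasources and predictors."""

def run_integrations(integrations, period=60):
    # Run the one-off tasks once, then keep the periodic tasks going.
    for integration in integrations:
        integration.setup()
    while True:
        for integration in integrations:
            integration.sync()
        time.sleep(period)
```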

OS specific distributions

Create OS-specific distributions (e.g. rpm package, snap package, deb package, or whatever the popular formats on Windows/OSX are).

This should include a bit of research into the popular formats (see above) that we should be supporting for each platform and how easy a distribution in each of them is to create.

I think all of the devs should leave their feedback here in terms of:

a) What packages they can easily create because they've done so before
b) What the common formats are in their particular niche of technology

Also, this might turn into a community issue once we make progress, since we could try to get those packages onto the official repositories.

Personally, I feel comfortable doing the PKGBUILD (Arch) and can probably handle the snap, deb and rpm. Ideally someone else would be in charge of designing the OSX- and Windows-specific distributions.

For an existing implementation of something like this see the way we build and distribute the docker image.

Once this is done we should also update the docs.

Also, make sure we don't bite off more than we can chew; these can be hard to maintain, so we should:

a) Only have them for mindsdb_server, which can then be used to install mindsdb, lightwood and dataskillet
b) Only document the ones that we are sure work and keep the "install manually" instructions in our documentation for those that don't

download of predictor not working

On Mac, this is the error:


```
Exported model to nr_rooms.zip
127.0.0.1 - - [04/Jun/2019 11:04:26] "GET /predictors/nr_rooms/download HTTP/1.1" 500 -

Exception on /predictors/nr_rooms/download [GET]
Traceback (most recent call last):
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/shutil.py", line 557, in move
    os.rename(src, real_dst)
FileNotFoundError: [Errno 2] No such file or directory: 'nr_rooms.zip' -> 'tmp/nr_rooms.zip'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/site-packages/flask/app.py", line 1832, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/site-packages/flask/app.py", line 1818, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/site-packages/flask_restplus/api.py", line 325, in wrapper
    resp = resource(*args, **kwargs)
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/site-packages/flask/views.py", line 88, in view
    return self.dispatch_request(*args, **kwargs)
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/site-packages/flask_restplus/resource.py", line 44, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/site-packages/mindsdb_server/namespaces/predictor.py", line 229, in get
    shutil.move(fname, fpath)
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/shutil.py", line 571, in move
    copy_function(src, real_dst)
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/shutil.py", line 257, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/shutil.py", line 121, in copyfile
    with open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: 'tmp/nr_rooms.zip'
```

It seems like the file is not being generated as tmp/nr_rooms.zip.

Allow for training of predictors from clickhouse

This should be done via inserting into the table:

```
mindsdb.predictors (
    name String,
    predict_cols String,
    select_data_query String,
    training_options String
)
```

This table should already be created whenever the clickhouse interface is constructed.
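
For illustration, the insert could be issued through ClickHouse's HTTP interface roughly like this (the host/port and the query values are assumptions, not a documented example):

```python
import requests

# Illustrative training request: predictor name, target column(s),
# the query that selects the training data, and training options.
query = """
INSERT INTO mindsdb.predictors (name, predict_cols, select_data_query, training_options)
VALUES ('home_rentals_model', 'rental_price',
        'SELECT * FROM default.home_rentals', '{}')
"""

response = requests.post('http://localhost:8123/', data=query)  # assumed local ClickHouse
response.raise_for_status()
```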

Add contributors agreement and try to apply it to post license-switch contributions

We should go through all contributions since we switched from an MIT License to a GPL-3.0 License and either:

a) Have all contributors agree to and sign something like the ASF Contributor License Agreement or alternatively remove their contributions.

b) In the future we should have some easy way of allowing anyone that contributes code to sign an agreement, similar to the way the Apache Foundation does it.

This is for {insert legal reasons I would make a mess of explaining}; feel free to send us an email or ask a question here in case you don't agree with this policy or think it's in some way disadvantageous to MindsDB and/or its open source contributors.

Research into installation

Do some research regarding the new installation procedure.

There may be easy gains that we are "missing" which could improve the number of environments we can currently install on.

Potential leads:

  • automatic setup of a virtual environment other than virtualenv
  • cython compilation
  • fallback dependencies (e.g. legacy torch versions with better Windows support)

cannot upload data sources

I get this error:

```
Exception on /datasources/yep [PUT]
Traceback (most recent call last):
  File "/Users/jorgetorres/Library/Python/3.7/lib/python/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/jorgetorres/Library/Python/3.7/lib/python/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.7/site-packages/flask_restplus/api.py", line 325, in wrapper
    resp = resource(*args, **kwargs)
  File "/Users/jorgetorres/Library/Python/3.7/lib/python/site-packages/flask/views.py", line 88, in view
    return self.dispatch_request(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/flask_restplus/resource.py", line 44, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/flask_restplus/marshalling.py", line 136, in wrapper
    resp = f(*args, **kwargs)
  File "/Users/jorgetorres/src/mindsdb/mindsdb_server/mindsdb_server/namespaces/datasource.py", line 101, in put
    names = [x['name'] for x in get_datasources()]
  File "/Users/jorgetorres/src/mindsdb/mindsdb_server/mindsdb_server/namespaces/datasource.py", line 43, in get_datasources
    with open(os.path.join(ROOT_STORAGE_DIR, ds_name, 'metadata.json'), 'r') as fp:
FileNotFoundError: [Errno 2] No such file or directory: 'storage/predictors/metadata.json'
```

Add environment configuration

The mindsdb server should run some environment configuration upon installation.

  • Create a /var/mindsdb/ directory that contains a config.json which can be used to configure lightwood, mindsdb, mindsdb_server, etc.
  • Add the daemon file to wherever the specific OS stores daemon files and apply any OS-specific configuration to it.
  • Add a default directory structure to /var/mindsdb/ for storing datasources, predictors and anything else that might need storing (a rough sketch of this step follows below).
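
A minimal sketch of what that post-install step could look like; the directory names, config keys and port are placeholder assumptions, not a finalized layout:

```python
import json
import os

MINDSDB_HOME = '/var/mindsdb'

def setup_environment():
    # Default directory structure for datasources, predictors, etc.
    for sub in ('datasources', 'predictors', 'tmp'):
        os.makedirs(os.path.join(MINDSDB_HOME, sub), exist_ok=True)

    # A config.json shared by lightwood, mindsdb and mindsdb_server.
    config_path = os.path.join(MINDSDB_HOME, 'config.json')
    if not os.path.exists(config_path):
        default_config = {
            'mindsdb': {},
            'lightwood': {},
            'mindsdb_server': {'host': '0.0.0.0', 'port': 47334}  # placeholder values
        }
        with open(config_path, 'w') as fp:
            json.dump(default_config, fp, indent=2)
```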

model metadata mapping issue - modelanalysis->targetcolumn->accuracy_histogram

It is important to understand that the accuracy_histogram in the metadata,

model_analysis[target_i]['accuracy_histogram']['x']
maps one to one to
data_analysis['target_columns_metadata'][target_i]['data_distribution']['data_histogram']['x']

Right now I can't really tell what it is mapping to, but it's wrong (see the attached image).

but the idea is that you can inspect the distributions per bucket of the target variable.

X should be the number of rooms

model_analysis[target_i]['accuracy_histogram']['y'] should be the accuracy of the model for that particular bucket.
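
In other words, roughly (a small illustrative sketch; the metadata keys follow the wording of this issue and may differ slightly in the code):

```python
def accuracy_per_bucket(metadata, target_i=0):
    data_hist = (metadata['data_analysis']['target_columns_metadata']
                 [target_i]['data_distribution']['data_histogram'])
    acc_hist = metadata['model_analysis'][target_i]['accuracy_histogram']

    # The buckets ('x') must be identical, element by element, so that
    # acc_hist['y'][i] is the model's accuracy for the bucket data_hist['x'][i].
    assert acc_hist['x'] == data_hist['x']
    return dict(zip(acc_hist['x'], acc_hist['y']))
```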

Clickhouse integration

Once #69 is done, add the integration manager for clickhouse; we'll wait until #69 is done before adding details.

We may also use the clickhouse integration as a template for #69.

Improve the installers

Improve the current installers in order to handle more edge cases and operating systems.

Also, potentially, add an option for integrating scout into the installers, as well as a graphical wizard for OSes that easily allow us to do so.

Define and implement testing for the MVP

We need to define a few test cases for the clickhouse integration for the MVP that we want to be ready by Thursday.

E.g.:

  1. Start mindsdb_server as a python module
  2. Make sure we are connected to clickhouse via the MySQL API
  3. Create a table with data x,y,z in clickhouse
  4. Create a clickhouse datasource from said table (via the HTTP interface)
  5. Train a new predictor through clickhouse (via the MySQL API) using the data from (4)
  6. Make a prediction using data passed via a WHEN clause to the predictor trained during (5)

Ideally these should be a bit more in-depth (e.g. specifying the exact query by which the predictor should be trained and predicted from, and the exact data to insert into clickhouse); a rough skeleton follows below.

@torrmal I will pass this one over to you since you probably have a better idea of what you want to see in the MVP.
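
A rough skeleton of what such a test could look like; the host/port, table names and queries are illustrative assumptions, and the plain HTTP calls are a stand-in for whichever client the test suite ends up using:

```python
import requests

CLICKHOUSE_URL = 'http://localhost:8123/'  # assumed local ClickHouse instance

def clickhouse_query(query):
    response = requests.post(CLICKHOUSE_URL, data=query)
    response.raise_for_status()
    return response.text

def test_clickhouse_mvp_flow():
    # (1)/(2) mindsdb_server and clickhouse are assumed to be up and reachable
    assert clickhouse_query('SELECT 1').strip() == '1'

    # (3) create a table with data x, y, z in clickhouse
    clickhouse_query('CREATE TABLE default.points (x Int32, y Int32, z Int32) ENGINE = Memory')
    clickhouse_query('INSERT INTO default.points VALUES (1, 2, 3), (2, 3, 5)')

    # (4) creating the datasource via the HTTP interface is omitted in this sketch

    # (5) train a predictor through clickhouse by inserting into mindsdb.predictors
    clickhouse_query(
        "INSERT INTO mindsdb.predictors (name, predict_cols, select_data_query, training_options) "
        "VALUES ('points_model', 'z', 'SELECT * FROM default.points', '{}')")

    # (6) get a prediction; the exact WHEN/WHERE syntax still needs to be pinned down
    result = clickhouse_query('SELECT z FROM mindsdb.points_model WHERE x = 1 AND y = 2')
    assert result.strip() != ''
```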

ImportError: cannot import name 'cached_property' from 'werkzeug'

Installing the latest mindsdb_server throws an import error with the latest Flask-RESTPlus version.
Flask-RESTPlus has Werkzeug as a dependency, which has some import changes in the latest version (1.0.0). See noirbizarre/flask-restplus#777 (comment).
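
Until this is resolved upstream, a commonly suggested workaround is to either pin Werkzeug to a pre-1.0 release in the requirements or re-expose the attribute before flask_restplus is imported, roughly like this:

```python
# Werkzeug 1.0.0 removed the top-level alias; flask_restplus still imports it.
import werkzeug
from werkzeug.utils import cached_property

werkzeug.cached_property = cached_property

from flask_restplus import Resource, abort  # noqa: E402  (now imports cleanly)
```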

Time series encoders can't be loaded
```
Traceback (most recent call last):
  File "server.py", line 1, in <module>
    from mindsdb_server.namespaces.predictor import ns_conf as predictor_ns
  File "/home/zoran/MyProjects/mindsdb_server/ser/lib/python3.7/site-packages/mindsdb_server/__init__.py", line 6, in <module>
    from mindsdb_server.server import start_server
  File "/home/zoran/MyProjects/mindsdb_server/ser/lib/python3.7/site-packages/mindsdb_server/server.py", line 1, in <module>
    from mindsdb_server.namespaces.predictor import ns_conf as predictor_ns
  File "/home/zoran/MyProjects/mindsdb_server/ser/lib/python3.7/site-packages/mindsdb_server/namespaces/predictor.py", line 12, in <module>
    from flask_restplus import Resource, abort
  File "/home/zoran/MyProjects/mindsdb_server/ser/lib/python3.7/site-packages/flask_restplus/__init__.py", line 4, in <module>
    from . import fields, reqparse, apidoc, inputs, cors
  File "/home/zoran/MyProjects/mindsdb_server/ser/lib/python3.7/site-packages/flask_restplus/fields.py", line 17, in <module>
    from werkzeug import cached_property
ImportError: cannot import name 'cached_property' from 'werkzeug' (/home/zoran/MyProjects/mindsdb_server/ser/lib/python3.7/site-packages/werkzeug/__init__.py)
```

Install doesn't fail if a part of the install fails

On Richie's environment the installer failed (no space left on the SSD) for lightwood/mindsdb, but mindsdb and the server still got installed.

If a sub-dependency can't be installed, the installation process should crash instead.
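
A minimal sketch of the idea; the package list and the pip invocation are illustrative, since the real installer may work differently:

```python
import subprocess
import sys

def install_all(packages=('lightwood', 'mindsdb', 'mindsdb_server')):
    for package in packages:
        try:
            # check=True raises CalledProcessError on a non-zero exit code
            subprocess.run([sys.executable, '-m', 'pip', 'install', package], check=True)
        except subprocess.CalledProcessError as e:
            # Abort the whole install as soon as any sub-step fails
            sys.exit(f'Installation of {package} failed, aborting: {e}')
```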

add predictor accuracy to predictors endpoint

The GET /predictors response does not contain the accuracy:

```
[
    {
        "name": "nr_rooms",
        "version": "1.2.1",
        "is_active": false,
        "data_source": "home_rentals.csv",
        "predict": [
            "number_of_rooms"
        ],
        "status": "complete",
        "current_phase": "Trained",
        "train_end_at": "2019-05-31T10:56:01",
        "updated_at": "2019-05-31T10:57:02",
        "created_at": "2019-05-31T10:55:00"
    }
]
```

See the PredictorStatus entity.

data preparation not being populated on metadata mapping

On the metadata, no 'data_preparation' is being mapped.

Example of the desired output:

```
'data_preparation': {
    'accepted_margin_of_error': 0.2,
    'total_row_count': 18000,
    'used_row_count': 10000,
    'test_row_count': 1000,
    'validation_row_count': 1000,
    'train_row_count': 8000
},
```

Weird OSX install issues

We observed some weird install issues on Richie's machine whereby:

cd "$SERVER_PATH"

fails. SERVER_PATH is set to `/blah/blah/Applications Support/blah/blah` and the cd command goes to `/blah/blah/Applications` instead.

cd ""$SERVER_PATH"" didn't help either. I'm at a bit of a loss for why this is happening; the behavior was not observed on other OSX machines with the same pathing. Part of me thinks, based on the logs, that the whitespace in Applications Support gets replaced with a tab on the GUI side for some reason.

Barring any fix in the install script (and after an hour we found nothing), it might be easiest to fix this in the GUI by just using a place other than Applications Support to install things.

Create tests for the clickhouse integration

Create a test suite for the clickhouse integration.

This should include:

  • Create a clickhouse data source (and check the data consistency)
  • Train a predictor with a clickhouse data source
  • Check it's been auto-updated into clickhouse
  • Check the table definition for the predictors
  • Get a prediction from clickhouse using a clickhouse datasource
  • Get a prediction using data inside clickhouse passed via WHEN

trained predictor not returning anything on model analysis and data analysis

On a trained predictor I only get:

```
{
    "status": "complete",
    "current_phase": "Trained",
    "name": "nrooms",
    "version": "1.2.5",
    "data_preparation": {
        "accepted_margin_of_error": 0.0,
        "total_row_count": 5037,
        "used_row_count": 5037,
        "test_row_count": 505,
        "train_row_count": 4029,
        "validation_row_count": 503
    },
    "accuracy": 0.247,
    "data_analysis": {
        "target_columns_metadata": [],
        "input_columns_metadata": []
    },
    "model_analysis": []
}
```
