mindsdb / mindsdb_server
MindsDB server allows you to consume and expose MindsDB workflows through HTTP.
License: GNU General Public License v3.0
Merge Mindsdb mysql mime proxy as an interface once #67 is completed.
See this link on how to fix this issue:
https://stackoverflow.com/questions/44727052/handling-large-file-uploads-with-flask
Pointer to where this is in the server code:
https://github.com/mindsdb/mindsdb_server/blob/master/mindsdb_server/namespaces/datasource.py#L115
Separate mindsdb server into two components:
Make sure to allow for adding new interfaces and components.
Everyone (in the backend team) should set up an environment that includes:
This is in order to test and work on integrating mindsdb (server) with various databases. It's better to have these set up now rather than later, since you may run into installation issues (in which case feel free to ask in the backend channel and/or google stuff).
Make sure that all of this is working by running the unit tests for the MySQL, Postgres and Clickhouse datasources in mindsdb and confirming they pass. (To run those you'll need the extra datasource dependencies listed here: https://github.com/mindsdb/mindsdb/blob/master/optional_requirements_extra_data_sources.txt)
Please confirm here once you've got everything set up.
This is a mindsdb module issue, but it reflects on the server.
Save metadata as soon as the model is created, even if it's just the name.
Make sure to add the datasource name and the fields it's being asked to predict, as well as the initial status: started. Map this to the returned data on the server.
Save after each phase: save the predictor metadata and store the phase it's at. Map the phase name to the status attribute in the server response.
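A minimal sketch of the shape this could take (all function and field names here are illustrative, not actual server code; the field names mirror the server responses quoted in these issues):

```python
import json

def save_initial_metadata(path, name, data_source, predict_cols):
    # Persist predictor metadata as soon as the model is created,
    # with the initial status: started.
    meta = {
        'name': name,
        'data_source': data_source,
        'predict': predict_cols,
        'status': 'started',
        'current_phase': None,
    }
    with open(path, 'w') as fp:
        json.dump(meta, fp)
    return meta

def save_phase(path, phase):
    # Re-read the metadata after each phase, store the phase,
    # and map the phase name onto the status attribute.
    with open(path, 'r') as fp:
        meta = json.load(fp)
    meta['current_phase'] = phase
    meta['status'] = 'complete' if phase == 'Trained' else 'training'
    with open(path, 'w') as fp:
        json.dump(meta, fp)
    return meta
```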
Used row count doesn't add up. See response:

```
{
    "status": "complete",
    "current_phase": "Trained",
    "name": "nr_rooms",
    "version": "1.2.1",
    "data_preparation": {
        "accepted_margin_of_error": 0.0,
        "total_row_count": 5037,
        "used_row_count": 630,
        "test_row_count": 505,
        "train_row_count": 4029,
        "validation_row_count": 503
    },
    ...
}
```
Torch only likes running on spawned processes.
Ideally most processes should be forked instead, but we can spawn mindsdb just fine; make sure that's done for every train/test/predict call and the results are properly returned.
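A sketch of the spawn pattern, with illustrative function and predictor names (not actual mindsdb_server code): each call runs in a process created with the 'spawn' start method and hands its result back through the pool.

```python
import multiprocessing as mp

def run_train(name):
    # Stand-in for the real mindsdb train call; Torch-backed work
    # belongs in a process created with the 'spawn' start method.
    return f'{name}: trained'

def call_in_spawned_process(fn, *args):
    # Run fn in a freshly spawned process and return its result.
    ctx = mp.get_context('spawn')
    with ctx.Pool(1) as pool:
        return pool.apply(fn, args)

if __name__ == '__main__':
    print(call_in_spawned_process(run_train, 'nr_rooms'))
```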
Not a priority until we roll out the new mindsdb server AND have the test suite in place.
We'll need a locking mechanism when accessing the datasource through the DataStore and mindsdb through the upcoming mindsdb interface.
This will need to support locking between processes, and ideally it can also be implemented in mindsdb native such that users can use all the APIs in parallel.
At the moment the way mindsdb_server operates with datasources is not, in theory, "concurrency safe", but (especially provided it's run on a linux machine) this should never be an issue in practice.
I don't expect this to be an issue here either, BUT at some point we might encounter it and it's better to prepare for these things in advance.
A good candidate for this is ILock:
https://github.com/symonsoft/ilock
It seems to be a very well implemented, OS-neutral, file-system based lock. I've read through the code and, to be honest, I'm not sure the implementation would work out perfectly on all OSes (i.e. I can see weird edge cases), but it seems to be more than good enough.
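For reference, the kind of file-system lock this is about can be sketched with the stdlib fcntl module (POSIX-only; ILock is the OS-neutral equivalent). All names below are illustrative, not actual mindsdb_server code:

```python
import fcntl
import os

class FsLock:
    """Minimal file-system based lock shared between processes (POSIX)."""

    def __init__(self, name, lock_dir='/tmp'):
        self._path = os.path.join(lock_dir, f'{name}.lock')

    def __enter__(self):
        self._fd = open(self._path, 'w')
        fcntl.flock(self._fd, fcntl.LOCK_EX)  # blocks until acquired
        return self

    def __exit__(self, *exc):
        fcntl.flock(self._fd, fcntl.LOCK_UN)
        self._fd.close()

# Usage: guard DataStore access across server processes.
with FsLock('mindsdb-datastore'):
    pass  # read/write datasource metadata safely here
```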
The endpoint predictors/X/columns is broken
Add a component that handles our various integrations (e.g. clickhouse) by running periodic and one-off tasks.
This should include:
When merged to master:
Ideally we shouldn't use the "latest" mindsdb and lightwood all the time; we should release the server less often and stick to more stable versions of both.
When you train a new predictor, the metadata stays in "training" forever.
Let's implement it for now so that it can simply take a JSON document, which will be passed straight through as the contents of when={}.
Create OS-specific distributions (e.g. rpm package, snap package, deb package, whatever the popular formats are on Windows/OSX).
This should include a bit of research into (see above) the popular formats we should be supporting for each platform and how easy a distribution for them is to create.
I think that all of the devs should leave their feedback here in terms of:
a) What packages they can easily create because they've done so before
b) What the common formats are in their particular niche of technology
Also, this might turn into a community issue once we make progress, since we could try and get those packages up onto the official repositories.
Personally I feel comfortable doing the PKGBUILD (arch) and can probably handle the snap, deb and rpm. Ideally someone else would be in charge of designing the OSX- and Windows-specific distributions.
For an existing implementation of something like this, see the way we build and distribute the docker image.
Once this is done we should also update the docs.
Also, make sure we don't bite off more than we can chew; these can be hard to maintain, so we should:
a) Only have them for mindsdb_server, which can then be used to install mindsdb, lightwood and dataskillet
b) Only document the ones that we are sure work, and keep the "install manually" instructions in our documentation for those that don't
Modify the data schema to allow transmission of the modified get_model_data results.
Use mindsdb's FileDS; try to load it and it will give you the columns as follows:

```python
from mindsdb.libs.data_sources.file_ds import FileDS

ds = FileDS(url_or_path)
columns = list(ds.columns.values)
```
On MAC, this is the error:

```
Exported model to nr_rooms.zip
127.0.0.1 - - [04/Jun/2019 11:04:26] "GET /predictors/nr_rooms/download HTTP/1.1" 500 -
Exception on /predictors/nr_rooms/download [GET]
Traceback (most recent call last):
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/shutil.py", line 557, in move
    os.rename(src, real_dst)
FileNotFoundError: [Errno 2] No such file or directory: 'nr_rooms.zip' -> 'tmp/nr_rooms.zip'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/site-packages/flask/app.py", line 1832, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/site-packages/flask/app.py", line 1818, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/site-packages/flask_restplus/api.py", line 325, in wrapper
    resp = resource(*args, **kwargs)
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/site-packages/flask/views.py", line 88, in view
    return self.dispatch_request(*args, **kwargs)
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/site-packages/flask_restplus/resource.py", line 44, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/site-packages/mindsdb_server/namespaces/predictor.py", line 229, in get
    shutil.move(fname, fpath)
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/shutil.py", line 571, in move
    copy_function(src, real_dst)
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/shutil.py", line 257, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/Users/jorgetorres/Library/Application Support/mindsdb_gui/mindsdb_server/env/lib/python3.7/shutil.py", line 121, in copyfile
    with open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: 'tmp/nr_rooms.zip'
```

It seems like the file is not being generated as tmp/nr_rooms.zip.
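A hedged guess at the fix (function and variable names mirror the traceback, not actual server code): the second FileNotFoundError is shutil failing to open the destination, which suggests the tmp/ directory does not exist when the export is moved into it, so creating it first should avoid the 500.

```python
import os
import shutil

def move_export(fname, fpath):
    # Create the destination directory (e.g. tmp/) before moving the
    # exported zip into it; shutil.move cannot create missing dirs.
    os.makedirs(os.path.dirname(fpath), exist_ok=True)
    shutil.move(fname, fpath)
```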
This should be done via inserting into the table:

```
mindsdb.predictors (
    name String,
    predict_cols String,
    select_data_query String,
    training_options String
)
```

which should already be created whenever the clickhouse interface is constructed.
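As a sketch, a training request through this interface would look something like the following INSERT (the table and column names come from the issue above; the values are made up for the example):

```python
# Illustrative query only; not actual mindsdb_server code.
train_query = """
INSERT INTO mindsdb.predictors
    (name, predict_cols, select_data_query, training_options)
VALUES
    ('home_rentals_model',
     'rental_price',
     'SELECT * FROM default.home_rentals',
     '{}')
"""
```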
We should go through all contributions since we switched from an MIT license to a GPL-3.0 license and either:
a) Have all contributors agree to and sign something like the ASF Contributor License Agreement, or alternatively remove their contributions.
b) In the future we should have some easy way of allowing anyone that contributes code to sign an agreement, similar to the way the Apache foundation does it.
This is for {insert legal reasons I would make a mess of explaining}; feel free to ask, send us an email, or ask a question here in case you don't agree with this policy or think it's in some way disadvantageous to Mindsdb and/or its open source contributors.
Do some research regarding the new installation procedure.
There may be easy gains that we are "missing" which could increase the number of environments we can currently install on.
Potential leads:
virtualenv
I get this error:

```
Exception on /datasources/yep [PUT]
Traceback (most recent call last):
  File "/Users/jorgetorres/Library/Python/3.7/lib/python/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/jorgetorres/Library/Python/3.7/lib/python/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.7/site-packages/flask_restplus/api.py", line 325, in wrapper
    resp = resource(*args, **kwargs)
  File "/Users/jorgetorres/Library/Python/3.7/lib/python/site-packages/flask/views.py", line 88, in view
    return self.dispatch_request(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/flask_restplus/resource.py", line 44, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/flask_restplus/marshalling.py", line 136, in wrapper
    resp = f(*args, **kwargs)
  File "/Users/jorgetorres/src/mindsdb/mindsdb_server/mindsdb_server/namespaces/datasource.py", line 101, in put
    names = [x['name'] for x in get_datasources()]
  File "/Users/jorgetorres/src/mindsdb/mindsdb_server/mindsdb_server/namespaces/datasource.py", line 43, in get_datasources
    with open(os.path.join(ROOT_STORAGE_DIR, ds_name, 'metadata.json'), 'r') as fp:
FileNotFoundError: [Errno 2] No such file or directory: 'storage/predictors/metadata.json'
```
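A possible fix, sketched under the assumption visible in the traceback (names mirror the traceback, not the actual server code): get_datasources() walks every directory under the storage root, including storage/predictors, which has no metadata.json, so skipping directories without that file avoids the FileNotFoundError.

```python
import json
import os

def get_datasources(root_storage_dir):
    # Collect metadata for every datasource directory, skipping
    # directories (e.g. predictors/) that lack a metadata.json.
    datasources = []
    for ds_name in os.listdir(root_storage_dir):
        meta_path = os.path.join(root_storage_dir, ds_name, 'metadata.json')
        if not os.path.isfile(meta_path):
            continue
        with open(meta_path, 'r') as fp:
            datasources.append(json.load(fp))
    return datasources
```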
The mindsdb server should run some environment configuration upon installation:
a /var/mindsdb/ directory that contains a config.json, which can be used to configure lightwood, mindsdb, mindsdb_server, etc.
/var/mindsdb/ for storing datasources, predictors and anything else that might need storing.

At this moment an imported predictor will replace an existing one if their names match.
We need to add the option to change the name of an imported predictor.
It is important to understand that the accuracy_histogram in the metadata,
model_analysis[target_i]['accuracy_histogram']['x']
maps one-to-one to
data_analysis['target_columns_metadata'][target_i]['data_distribution']['data_histogram']['x']
Right now I can't really tell what it is mapping to, but it's wrong, see image;
the idea is that you can inspect the distributions per bucket of the target variable.
x should be the number of rooms.
model_analysis[target_i]['accuracy_histogram']['y'] should be the accuracy of the model for that particular bucket.
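To make the intended mapping concrete (all numbers below are invented for illustration, not taken from any real predictor):

```python
# The x buckets of the target's data_histogram and of the
# accuracy_histogram should describe the same buckets, so pairing
# them gives per-bucket accuracy for the target variable.
data_histogram_x = [1, 2, 3, 4, 5]                 # number of rooms per bucket
accuracy_histogram_y = [0.9, 0.85, 0.7, 0.6, 0.5]  # model accuracy per bucket

per_bucket_accuracy = dict(zip(data_histogram_x, accuracy_histogram_y))
```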
Improve the current installers in order to handle more edge cases and operating systems.
Also, potentially, add an option for integrating Scout into the installers, and a graphical wizard for OSes that easily allow us to do so.
The user should be able to add a type and subtype to each column in a data source; then, when mindsdb is trained using that data source, the types/subtypes should be transmitted to mindsdb.
The endpoint is called and returns 404, but no error is logged to the console... look into why this is (noticed on Amie's machine).
Store all predictor data in the server/storage/predictors folder.
We need to define a few test cases for the clickhouse integration for the MVP that we want to be ready by Thursday.
E.g.:
WHEN clause to the predictor trained during (5)
Ideally these should be a bit more in-depth (e.g. specifying the exact query by which the predictor should be trained and predicted from, specifying the exact data to insert into clickhouse).
@torrmal I will pass this one over to you since you probably have a better idea of what you want to see in the MVP.
Create daemon files (or equivalent) for as many OSes as this is easy to do on (e.g. takes less than a day of work).
These files might have to be configured dynamically based on the installation environment.
Take as an example the current linux daemon file: https://github.com/mindsdb/mindsdb_server/blob/master/mindsdb_server.service
Installing the latest mindsdb_server throws an ImportError with the latest Flask-restplus version.
Flask-restplus has Werkzeug as a dependency, which has some import changes in the latest (1.0.0) version. noirbizarre/flask-restplus#777 (comment)
Time series encoders can't be loaded
```
Traceback (most recent call last):
  File "server.py", line 1, in <module>
    from mindsdb_server.namespaces.predictor import ns_conf as predictor_ns
  File "/home/zoran/MyProjects/mindsdb_server/ser/lib/python3.7/site-packages/mindsdb_server/__init__.py", line 6, in <module>
    from mindsdb_server.server import start_server
  File "/home/zoran/MyProjects/mindsdb_server/ser/lib/python3.7/site-packages/mindsdb_server/server.py", line 1, in <module>
    from mindsdb_server.namespaces.predictor import ns_conf as predictor_ns
  File "/home/zoran/MyProjects/mindsdb_server/ser/lib/python3.7/site-packages/mindsdb_server/namespaces/predictor.py", line 12, in <module>
    from flask_restplus import Resource, abort
  File "/home/zoran/MyProjects/mindsdb_server/ser/lib/python3.7/site-packages/flask_restplus/__init__.py", line 4, in <module>
    from . import fields, reqparse, apidoc, inputs, cors
  File "/home/zoran/MyProjects/mindsdb_server/ser/lib/python3.7/site-packages/flask_restplus/fields.py", line 17, in <module>
    from werkzeug import cached_property
ImportError: cannot import name 'cached_property' from 'werkzeug' (/home/zoran/MyProjects/mindsdb_server/ser/lib/python3.7/site-packages/werkzeug/__init__.py)
```
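Until flask-restplus ships a fix, a common workaround is pinning Werkzeug below 1.0 in the server's requirements, since Werkzeug 1.0 removed the top-level cached_property import that flask_restplus/fields.py relies on:

```
# requirements.txt
Werkzeug<1.0
```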
On Richie's environment the installer failed (no space left on the SSD) for lightwood/mindsdb, but mindsdb and the server still got installed.
If a sub-dependency can't be installed, the installation process should crash instead.
GET /predictors does not contain accuracy:

```
[
    {
        "name": "nr_rooms",
        "version": "1.2.1",
        "is_active": false,
        "data_source": "home_rentals.csv",
        "predict": [
            "number_of_rooms"
        ],
        "status": "complete",
        "current_phase": "Trained",
        "train_end_at": "2019-05-31T10:56:01",
        "updated_at": "2019-05-31T10:57:02",
        "created_at": "2019-05-31T10:55:00"
    }
]
```
See PredictorStatus entity
On the metadata no 'data_preparation' is being mapped. Example of desired output:

```
'data_preparation': {
    'accepted_margin_of_error': 0.2,
    'total_row_count': 18000,
    'used_row_count': 10000,
    'test_row_count': 1000,
    'validation_row_count': 1000,
    'train_row_count': 8000
},
```
Max reported there are some problems with the predictor data.
We observed some weird install issues on Richie's machine whereby:

```
cd "$SERVER_PATH"
```

fails. SERVER_PATH is set to '/blah/blah/Applications Support/blah/blah' and the cd command goes to '/blah/blah/Applications' instead.

```
cd ""$SERVER_PATH""
```

didn't help either. I'm at a bit of a loss for why this is happening; the behavior was not observed on other OSX machines with the same pathing. Part of me thinks, based on the logs, that the whitespace in "Applications Support" gets replaced with a tab GUI-side for some reason.
Barring any fix in the install script (and after 1 hour we found nothing), it might be easiest to fix this in the GUI by just using a place other than "Applications Support" to install stuff.
Move from the field-based model to a JSON schema, so that we can import the schema from the mindsdb model.
Leave only two entities in mindsdb_server/namespaces/entities:
predictor_status
predictor_metadata
See:
https://flask-restplus.readthedocs.io/en/stable/marshalling.html#define-model-using-json-schema
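A sketch of what the JSON-schema version of one entity could look like. The field names are taken from the predictor responses quoted in these issues; the exact schema is an assumption, and registration via api.schema_model is the flask-restplus mechanism documented at the link above.

```python
# Hypothetical JSON schema for the predictor_status entity; with
# flask-restplus it would be registered via
#   api.schema_model('PredictorStatus', predictor_status_schema)
# instead of a field-based api.model definition.
predictor_status_schema = {
    'type': 'object',
    'properties': {
        'name': {'type': 'string'},
        'version': {'type': 'string'},
        'data_source': {'type': 'string'},
        'predict': {'type': 'array', 'items': {'type': 'string'}},
        'status': {'type': 'string'},
        'current_phase': {'type': 'string'},
    },
    'required': ['name', 'status'],
}
```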
Create a test-suite for the clickhouse integration.
This should include:
WHEN
On a trained predictor I only get:

```
{
    "status": "complete",
    "current_phase": "Trained",
    "name": "nrooms",
    "version": "1.2.5",
    "data_preparation": {
        "accepted_margin_of_error": 0.0,
        "total_row_count": 5037,
        "used_row_count": 5037,
        "test_row_count": 505,
        "train_row_count": 4029,
        "validation_row_count": 503
    },
    "accuracy": 0.247,
    "data_analysis": {
        "target_columns_metadata": [],
        "input_columns_metadata": []
    },
    "model_analysis": []
}
```
It seems like the server is not able to fully delete trained predictors; they are still listed in GET /predictors.
Build a dependency graph for this library that includes the licenses of all of our dependencies, to make sure they are compatible with GPL-3.0.
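As a starting point, a sketch (not a full graph) that lists each installed distribution with its declared license using the stdlib importlib.metadata; the function name is illustrative:

```python
from importlib import metadata

def dependency_licenses():
    # Map each installed distribution to its declared License field,
    # as a first pass before building the full dependency graph.
    licenses = {}
    for dist in metadata.distributions():
        name = dist.metadata.get('Name') or 'unknown'
        licenses[name] = dist.metadata.get('License') or 'UNKNOWN'
    return licenses
```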
If it's a file, make sure to keep track of where it will be stored; there should be a configuration variable for the mindsdb server for the path where uploaded files are stored, and when a datasource is deleted, it should also delete the file.