mindsdb / mindsdb

The platform for customizing AI from enterprise data

Home Page: https://mindsdb.com

License: Other

Python 99.79% Mako 0.01% Smarty 0.04% Dockerfile 0.09% Makefile 0.01% HCL 0.06%

ml artificial-intelligence hacktoberfest mysql huggingface llm ai mongodb postgres timeseries

mindsdb's Introduction

Website · Docs · Community Slack

📖 About us

MindsDB is the platform for customizing AI from enterprise data.

With MindsDB, you can deploy, serve, and fine-tune models in real-time, utilizing data from databases, vector stores, or applications, to build AI-powered apps - using universal tools developers already know.

MindsDB integrates with numerous data sources, including databases, vector stores, and applications, and popular AI/ML frameworks, including AutoML and LLMs. MindsDB connects data sources with AI/ML frameworks and automates routine workflows between them. By doing so, we bring data and AI together, enabling the intuitive implementation of customized AI systems.

Learn more about features and use cases of MindsDB here.

🚀 Get Started

To get started, install MindsDB locally via Docker or Docker Desktop, following the instructions in linked doc pages.

MindsDB enhances SQL syntax to enable seamless development and deployment of AI-powered applications. Furthermore, users can interact with MindsDB not only via SQL API but also via REST APIs, Python SDK, JavaScript SDK, and MongoDB-QL.

🎯 Solutions	⚙️ SQL Query Examples
🤖 Fine-Tuning	`FINETUNE mindsdb.hf_model FROM postgresql.table;`
📚 Knowledge Base	`CREATE KNOWLEDGE_BASE my_knowledge FROM (SELECT contents FROM drive.files);`
🔍 Semantic Search	`SELECT * FROM rag_model WHERE question='What product is best for treating a cold?';`
⏱️ Real-Time Forecasting	`SELECT * FROM binance.trade_data WHERE symbol = 'BTCUSDT';`
🕵️ Agents	`CREATE AGENT my_agent USING model='chatbot_agent', skills = ['knowledge_base'];`
💬 Chatbots	`CREATE CHATBOT slack_bot USING database='slack',agent='customer_support';`
⏲️ Time Driven Automation	`CREATE JOB twitter_bot ( <sql_query1>, <sql_query2> ) START '2023-04-01 00:00:00';`
🔔 Event Driven Automation	`CREATE TRIGGER data_updated ON mysql.customers_data (sql_code)`

💡 Examples

MindsDB enables you to deploy AI/ML models, send predictions to your application, and automate AI workflows.

Discover more tutorials and use cases here.

AI Workflow Automation

This category of use cases involves tasks that get data from a data source, pass it through an AI/ML model, and write the output to a data destination.

Common use cases are anomaly detection, data indexing/labeling/cleaning, and data transformation.

This example showcases the data enrichment flow, where input data comes from a PostgreSQL database and is passed through an OpenAI model to generate new content which is saved into a data destination.

We take customer reviews from a PostgreSQL database. Then, we deploy an OpenAI model that analyzes all customer reviews and assigns sentiment values. Finally, to automate the workflow for incoming customer reviews, we create a job that generates and saves AI output into a data destination.

-- Step 1. Connect a data source to MindsDB
CREATE DATABASE data_source
WITH ENGINE = "postgres",
PARAMETERS = {
    "user": "demo_user",
    "password": "demo_password",
    "host": "samples.mindsdb.com",
    "port": "5432",
    "database": "demo",
    "schema": "demo_data"
};

SELECT *
FROM data_source.amazon_reviews_job;

-- Step 2. Deploy an AI model
CREATE ML_ENGINE openai_engine
FROM openai
USING
    openai_api_key = 'your-openai-api-key';

CREATE MODEL sentiment_classifier
PREDICT sentiment
USING
    engine = 'openai_engine',
    model_name = 'gpt-4',
    prompt_template = 'describe the sentiment of the reviews
						strictly as "positive", "neutral", or "negative".
						"I love the product":positive
						"It is a scam":negative
						"{{review}}.":';

DESCRIBE sentiment_classifier;

-- Step 3. Join input data with AI model to get AI output
SELECT input.review, output.sentiment
FROM data_source.amazon_reviews_job AS input
JOIN sentiment_classifier AS output;

-- Step 4. Automate this workflow to accommodate real-time and dynamic data
CREATE DATABASE data_destination
WITH ENGINE = "engine-name",      -- choose the data source you want to connect to save AI output
PARAMETERS = {                    -- list of available data sources: https://docs.mindsdb.com/integrations/data-overview
    "key": "value",
	...
};

CREATE JOB ai_automation_flow (
	INSERT INTO data_destination.ai_output (
		SELECT input.created_at,
			   input.product_name,
			   input.review,
			   output.sentiment
		FROM data_source.amazon_reviews_job AS input
		JOIN sentiment_classifier AS output
		WHERE input.created_at > LAST
	);
);

AI System Deployment

This category of use cases involves creating AI systems composed of multiple connected parts, including various AI/ML models and data sources, and exposing such AI systems via APIs.

Common use cases are agents and assistants, recommender systems, forecasting systems, and semantic search.

This example showcases AI agents, a feature developed by MindsDB. AI agents can be assigned certain skills, including text-to-SQL skills and knowledge bases. Skills provide an AI agent with input data that can be in the form of a database, a file, or a website.

We create a text-to-SQL skill based on the car sales dataset and deploy a conversational model, which are both components of an agent. Then, we create an agent and assign this skill and this model to it. This agent can be queried to ask questions about data stored in assigned skills.

-- Step 1. Connect a data source to MindsDB
CREATE DATABASE data_source
WITH ENGINE = "postgres",
PARAMETERS = {
    "user": "demo_user",
    "password": "demo_password",
    "host": "samples.mindsdb.com",
    "port": "5432",
    "database": "demo",
    "schema": "demo_data"
};

SELECT *
FROM data_source.car_sales;

-- Step 2. Create a skill
CREATE SKILL my_skill
USING
    type = 'text2sql',
    database = 'data_source',
    tables = ['car_sales'],
    description = 'car sales data of different car types';

SHOW SKILLS;

-- Step 3. Deploy a conversational model
CREATE ML_ENGINE langchain_engine
FROM langchain
USING
    openai_api_key = 'your-openai-api-key';
      
CREATE MODEL my_conv_model
PREDICT answer
USING
    engine = 'langchain_engine',
    model_name = 'gpt-4',
    mode = 'conversational',
    user_column = 'question',
    assistant_column = 'answer',
    max_tokens = 100,
    temperature = 0,
    verbose = True,
    prompt_template = 'Answer the user input in a helpful way';

DESCRIBE my_conv_model;

-- Step 4. Create an agent
CREATE AGENT my_agent
USING
    model = 'my_conv_model',
    skills = ['my_skill'];

SHOW AGENTS;

-- Step 5. Query an agent
SELECT *
FROM my_agent
WHERE question = 'what is the average price of cars from 2018?';

SELECT *
FROM my_agent
WHERE question = 'what is the max mileage of cars from 2017?';

SELECT *
FROM my_agent
WHERE question = 'what percentage of sold cars (from 2016) are automatic/semi-automatic/manual cars?';

SELECT *
FROM my_agent
WHERE question = 'is petrol or diesel more common for cars from 2019?';

SELECT *
FROM my_agent
WHERE question = 'what is the most commonly sold model?';

Agents are accessible via API endpoints.
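
As a rough illustration of querying the agent over HTTP from Python, the sketch below only builds the request and does not send it. It rests on assumptions that are not stated in this README: a local MindsDB instance listening at http://127.0.0.1:47334 and a SQL endpoint at /api/sql/query that accepts a JSON body with a "query" field. Check both against the MindsDB API documentation before relying on them. The helper name build_sql_request is hypothetical.

```python
import json
import urllib.request


def build_sql_request(query, base_url="http://127.0.0.1:47334"):
    """Build (but do not send) a POST request carrying one SQL statement.

    The base URL and the /api/sql/query path are assumptions, see the text above.
    """
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/sql/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sql_request(
    "SELECT * FROM my_agent "
    "WHERE question = 'what is the most commonly sold model?';"
)
# To send it against a running instance: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it easy to inspect or log the payload before it reaches a live instance.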

🤝 Contribute

If you’d like to contribute to MindsDB, install MindsDB for development following these instructions.

You’ll find the contribution guide here.

We are always open to suggestions, so feel free to open new issues with your ideas, and we can guide you!

This project is released with a Contributor Code of Conduct. By participating in this project, you agree to follow its terms.

Also, check out the rewards and community programs here.

🤍 Support

If you find a bug, please submit an issue on GitHub here.

Here is how you can get community support:

Post a question at MindsDB Slack Community.
Ask for help at our GitHub Discussions.
Ask a question on Stack Overflow with a MindsDB tag.

If you need commercial support, please contact the MindsDB team.

💚 Current contributors

Made with contributors-img.

🔔 Subscribe to updates

Join our Slack community and subscribe to the monthly Developer Newsletter to get product updates, information about MindsDB events and contests, and useful content, like tutorials.

⚖️ License

For detailed licensing information, please refer to the LICENSE file.

mindsdb's People

ifoundthetao adammcarrigan daniellsm ramkumar78 niko2756 philipjadler jadedmonkeys bala1718 davepowell torrmal pramodtoraskar tommy-ros strategist922 okbalefthanded ailibrary surendra1472 mailmahee quantumpacket yushu-liu rotorliu aeternu davidmuhr piandpower lzhenn shibei00 jbowles zmoon111 singleton7 slamj1 osvaldoalvaradodev manuti rishiranjjan snewhouse elientumba2019 dreamkeep heysachin maitreya2954 sfrias thiagoalmeidasa geetab19 sweetbai fmarrabal yarenty secantsquared hemmingway westeast cclauss siffi26 freshy969 nlqq ushukkla pwilken johannesferner andrewfarley bgbi robdll arunkumarramanan vovietanh bharathi26 muthmano-dev sand47 codemercs kanokkorn volpatto perryhau 113771169 jamwine hc10024 starkblaze01 isaacjlwu dubey00 shemmanyu idkwim pathcl skarabi alex-orlovskyi blessyjoyk dineshresearch paperpanks chankeypathak prashant0598 mortenhauberg jeanron100 sthkindacrazy arijit-pande a1ip lancewalk87 ameybarve15 justdoit1024 aykuttasil tlhcelik mrvnmchm tahajalili manrajgrover prakash2403 ayush9398 vedarth wangkanger onceagainitsrxvn madhusudhan-made

mindsdb's Issues

How to hide this predicted log.

python script

from mindsdb import *

# First we initiate MindsDB
mdb = MindsDB()

# use the model to make predictions
result = mdb.predict(predict='rental_price', when={'number_of_rooms': 2,'number_of_bathrooms':1, 'sqft': 1190}, model_name='home_rentals')

# you can now print the results
print('The predicted price is ${price} with {conf} confidence'.format(price=result.predicted_values[0]['rental_price'], conf=result.predicted_values[0]['prediction_confidence']))

predicted log

[START] StatsLoader
[END] StatsLoader, execution time: 0.022 seconds
[START] DataExtractor
[END] DataExtractor, execution time: 0.002 seconds
[START] DataVectorizer
[END] DataVectorizer, execution time: 0.000 seconds
[START] ModelPredictor
Predict: model home_rentals, epoch 0
Starting model...
Inferring from model and data...
predicting batch...
Predict: model home_rentals [OK], TOTAL TIME: 0.34 seconds
[END] ModelPredictor, execution time: 0.336 seconds
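
One possible way to hide this log from the calling script, assuming the phase lines are written to standard output with print (the report does not say whether they go through print or the logging module), is to redirect stdout around the call. In the sketch below, noisy_predict is a hypothetical stand-in for mdb.predict(...), and call_quietly is an illustrative helper, not part of MindsDB.

```python
import contextlib
import io


def call_quietly(func, *args, **kwargs):
    """Run func while discarding anything it prints to stdout."""
    with contextlib.redirect_stdout(io.StringIO()):
        return func(*args, **kwargs)


def noisy_predict():
    # Hypothetical stand-in for mdb.predict(...): prints phase logs, returns a value.
    print("[START] ModelPredictor")
    print("[END] ModelPredictor, execution time: 0.336 seconds")
    return 4394


price = call_quietly(noisy_predict)
print(price)
```

If the messages are emitted through the logging module instead, raising the level of the relevant logger would be the equivalent approach.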

How about adding requirements.txt to the project?

Added requirements.txt for development. Reference: #3

NLP Text categorization support for Chinese corpus

Describe the bug
Load a Chinese text file encoded as GB18030.
Running train.py then raises an exception: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 2: invalid start byte

To Reproduce
Load a Chinese text file

train.py:
from mindsdb import *

MindsDB().learn(
    # from_file="real_estate_description.xlsx",  # the path to the file where we can learn from
    from_file="shakes.train.numLines.csv",  # the path to the file where we can learn from
    predict='行项目数',  # the column we want to learn to predict given all the data in the file (test for text categorization)
    model_name='real_estate_desc'  # the name of this model
)

Expected behavior
NLP text categorization support for Chinese corpus

Screenshots
https://github.com/duanzhihua/mindsdb/blob/master/docs/examples/nlp/unicodedecode.png
https://github.com/duanzhihua/mindsdb/blob/master/docs/examples/nlp/ChineseCorpus.png

Desktop (please complete the following information):

OS: Windows 10
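
Until non-UTF-8 files are handled directly, a possible workaround is to re-encode the corpus to UTF-8 before training. The sketch below uses only the Python standard library. The helper name reencode_to_utf8 and the file names are hypothetical, and the source encoding (GB18030) is taken from the report above.

```python
import os
import tempfile


def reencode_to_utf8(src_path, dst_path, src_encoding="gb18030"):
    """Rewrite a text file as UTF-8, decoding it with src_encoding first."""
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
    return dst_path


# Small self-check with a temporary GB18030 file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "corpus_gb18030.csv")
    dst = os.path.join(tmp, "corpus_utf8.csv")
    with open(src, "w", encoding="gb18030", newline="") as f:
        f.write("行项目数,text\n3,示例\n")
    reencode_to_utf8(src, dst)
    with open(dst, "r", encoding="utf-8", newline="") as f:
        converted = f.read()
```

The converted file can then be passed to learn() in place of the original.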

After upgrading to the new mindsdb version, it still asks for an upgrade while generating a model.

Describe the bug
I followed these steps to upgrade mindsdb:

pip3 uninstall mindsdb
pip3 install mindsdb --user

After that, running python3 train.py still shows the warning to upgrade the version.

Installation doc for Windows

Describe the bug
The installation docs say:
conda install -c blaze
conda install -c sqlite3

This did not work for me. The following commands worked instead:

conda install -c blaze blaze
conda install -c blaze sqlite3

Also, the doc (https://github.com/mindsdb/mindsdb/blob/master/docs/Installing.md) is written for Windows 10. Is mindsdb not available for Windows 8?

Unable to install package through "pip"

Unable to install package through "pip install mindsdb". Getting Error : Could not find a version that satisfies the requirement torch>=0.4.1 (from mindsdb) (from versions: 0.1.2, 0.1.2.post1)
No matching distribution found for torch>=0.4.1 (from mindsdb)

OS: Windows 10
Python version 3.6.8

CUDA acceleration support

I don't seem to be able to enable CUDA on Ubuntu, Python 3.7, PyTorch 0.4.

I did os.environ['USE_CUDA'] = 'True' and it didn't work.

Export model

Hi Adam, I tried replying to your email but it bounced.

1 question, can models be exported to enable prediction on another machine?
For home automation where people are running their system on a raspberry pi, a workflow I can envision is exporting data to csv (we have a platform for doing that https://data.home-assistant.io/), training e.g. on colab, then exposing the model to the home automation system via a rest API from a docker container.
Thanks

No module named 'requests'

I get the following error after installing and using MindsDB.

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-3376a8669d71> in <module>()
     16 from pathlib import Path
     17 
---> 18 from mindsdb import MindsDB
     19 
     20 from helper import save_results, getModuleFullName, create_feature_list

~/env/lib/python3.6/site-packages/mindsdb/__init__.py in <module>()
      1 from .libs.data_types.data_source import DataSource
      2 from .libs.data_sources import *
----> 3 from .libs.controllers.mindsdb_controller import MindsDBController
      4 
      5 name = "mindsdb"

~/env/lib/python3.6/site-packages/mindsdb/libs/controllers/mindsdb_controller.py in <module>()
      1 import sqlite3
      2 import pandas
----> 3 import requests
      4 import logging
      5 import os

ModuleNotFoundError: No module named 'requests'

Reproduce:

Installing MindsDB in a python virtual environment

Solution:

Install requests by hand

Training error on Pytorch 1.0 with CUDA

Describe the bug
Mindsdb demo worked with CPU but failed with CUDA.

To Reproduce
Just run the following code from python3 console:

import os
os.environ['USE_CUDA'] = 'True'

from mindsdb import *

# First we initiate MindsDB
mdb = MindsDB()

# We tell mindsDB what we want to learn and from what data
mdb.learn(
    from_data="https://raw.githubusercontent.com/mindsdb/mindsdb/master/docs/examples/basic/home_rentals.csv", # the path to the file where we can learn from, (note: can be url)
    predict='rental_price', # the column we want to learn to predict given all the data in the file
    model_name='home_rentals' # the name of this model
)

Expected behavior
Report errors as:

[START] ModelTrainer
Training: model home_rentals, epoch 0
Starting model...
Training model...
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/mindsdb/libs/controllers/transaction_controller.py", line 138, in executeLearn
    self.callPhaseModule('ModelTrainer')
  File "/usr/local/lib/python3.7/site-packages/mindsdb/libs/controllers/transaction_controller.py", line 97, in callPhaseModule
    return module()
  File "/usr/local/lib/python3.7/site-packages/mindsdb/libs/phases/base_module.py", line 73, in __call__
    ret = self.run(**kwargs)
  File "/usr/local/lib/python3.7/site-packages/mindsdb/libs/phases/model_trainer/model_trainer.py", line 78, in run
    TrainWorker.start(self.transaction.model_data, model_name=model_name, ml_model=ml_model, config=config)
  File "/usr/local/lib/python3.7/site-packages/mindsdb/libs/workers/train.py", line 411, in start
    return TrainWorker(data, model_name, ml_model, config)
  File "/usr/local/lib/python3.7/site-packages/mindsdb/libs/workers/train.py", line 80, in __init__
    self.train()
  File "/usr/local/lib/python3.7/site-packages/mindsdb/libs/workers/train.py", line 98, in train
    for train_ret in self.data_model_object.trainModel(self.train_sampler):
  File "/usr/local/lib/python3.7/site-packages/mindsdb/libs/ml_models/pytorch/libs/base_model.py", line 325, in trainModel
    loss, batch_size = model_object.calculateBatchLoss(batch)
  File "/usr/local/lib/python3.7/site-packages/mindsdb/libs/ml_models/pytorch/models/fully_connected_buckets_net/fully_connected_buckets_net.py", line 60, in calculateBatchLoss
    predicted_target, predicted_buckets = self.forward(batch.getInput(flatten=self.flatInput), return_bucket_outputs =True)
  File "/usr/local/lib/python3.7/site-packages/mindsdb/libs/ml_models/pytorch/models/fully_connected_buckets_net/fully_connected_buckets_net.py", line 93, in forward
    output_buckets[col] = self.nets[col](output)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 63, in forward
    return F.linear(input, self.weight, self.bias)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/functional.py", line 1316, in linear
    ret = torch.addmm(torch.jit._unwrap_optional(bias), input, weight.t())
RuntimeError: Expected object of backend CPU but got backend CUDA for argument #4 'mat1'
[ERROR] 'None'
[ERROR] "Expected object of backend CPU but got backend CUDA for argument #4 'mat1'"

Desktop:

OS: Mac OS 10.13.6
Python 3.7
Pytorch 1.0 (with CUDA)

Mindsdb installation error

Issue Summary
When I try to install mindsdb on my machine, I get the error: Failed building wheel for Markupsafe.

Steps to Reproduce
pip3 install mindsdb --user

Output:
error: invalid command 'bdist_wheel'
Failed building wheel for Markupsafe

Technical details:

Ubuntu 16.04
pip3 version 18.0.1
python 3.5.2 version

"Could not convert string to date" on generating train model

Describe the bug
I took a CSV file containing a date column. While generating a model on the date column, I get a "could not convert string to date" error.

To Reproduce
Steps to reproduce the behavior:
Attached CSV in pdf format
marvel-wikia.pdf

train.py:
from mindsdb import *

MindsDB().learn(
    from_file="D://mindsdb/basic/marvel-wikia/marvel-wikia.csv",  # the path to the file where we can learn from
    predict='FIRST_APPEARANCE',  # the column we want to learn to predict given all the data in the file
    model_name='marvel_model'  # the name of this model
)

Expected behavior
Should generate the model

Desktop (please complete the following information):

OS: Windows 10

Support cells/columns with sequential values

If we get a cell such as 1 2 3 5 6 or [1,2,3,5,6] or 1,2,3,5,6 we should interpret that as a sequential value, rather than TEXT (which is what would happen now).
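
A minimal sketch of how such cells could be recognised, covering the three formats listed above. The function name parse_sequence and the rule that at least two numeric tokens are required are assumptions made for illustration. This is not the project's implementation.

```python
import re


def parse_sequence(cell):
    """Return a list of numbers if the cell looks like a sequence, else None."""
    text = str(cell).strip()
    # Drop one pair of surrounding brackets, e.g. "[1,2,3]".
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1].strip()
    # Split on commas and/or whitespace.
    tokens = [t for t in re.split(r"[,\s]+", text) if t]
    if len(tokens) < 2:
        return None  # a single value is not treated as a sequence
    try:
        return [float(t) for t in tokens]
    except ValueError:
        return None  # at least one token is not numeric, so treat the cell as text
```

Note that a rule like this is ambiguous for values such as "15,6", which may be a decimal number written with a comma. A real implementation would need to look at the whole column rather than a single cell.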

Broken website appearance

Add Support to Vanilla Pip

I saw that the installation manual provides support to pip3 only.

Kindly extend support to pip also.

Thank you.

Predicting "from_data" is giving error if the file has less than 30 rows

Describe the bug
When using the from_data option to predict a list of values from a file, the file must have at least 30 rows (31 including the header row). With fewer than 30 rows, prediction throws an error.

To Reproduce
Steps to reproduce the behavior:

Train any model
Predict using "from_data", but have less than 30 rows in the file
Run the prediction; it throws an error

Expected behavior
It should predict for whatever number of rows I pass

Screenshots

Desktop (please complete the following information):

Mac OS

null Data Processing

Describe the bug
When analyzing the base station data in the field of communication, it is found that if there is a null value in the data, it will prompt an error.
Traceback (most recent call last):
File "D:\PycharmProjects\git_UC_Berkeley_mindsdb_2019\mindsdb\mindsdb\libs\controllers\transaction_controller.py", line 133, in executeLearn
self.callPhaseModule('DataVectorizer')
File "D:\PycharmProjects\git_UC_Berkeley_mindsdb_2019\mindsdb\mindsdb\libs\controllers\transaction_controller.py", line 97, in callPhaseModule
return module()
File "D:\PycharmProjects\git_UC_Berkeley_mindsdb_2019\mindsdb\mindsdb\libs\phases\base_module.py", line 73, in __call__
ret = self.run(**kwargs)
File "D:\PycharmProjects\git_UC_Berkeley_mindsdb_2019\mindsdb\mindsdb\libs\phases\data_vectorizer\data_vectorizer.py", line 172, in run
normalized = norm(value=value, cell_stats=stats) # this should return a vector representation already normalized
File "D:\PycharmProjects\git_UC_Berkeley_mindsdb_2019\mindsdb\mindsdb\libs\helpers\norm_denorm_helpers.py", line 71, in norm
normalizedValue = (value - cell_stats['min']) /
TypeError: unsupported operand type(s) for -: 'str' and 'float'
[ERROR] 'None'
[ERROR] "unsupported operand type(s) for -: 'str' and 'float'"

[START] DataVectorizer
Traceback (most recent call last):
File "D:\PycharmProjects\git_UC_Berkeley_mindsdb_2019\mindsdb\mindsdb\libs\controllers\transaction_controller.py", line 133, in executeLearn
self.callPhaseModule('DataVectorizer')
File "D:\PycharmProjects\git_UC_Berkeley_mindsdb_2019\mindsdb\mindsdb\libs\controllers\transaction_controller.py", line 97, in callPhaseModule
return module()
File "D:\PycharmProjects\git_UC_Berkeley_mindsdb_2019\mindsdb\mindsdb\libs\phases\base_module.py", line 73, in __call__
ret = self.run(**kwargs)
File "D:\PycharmProjects\git_UC_Berkeley_mindsdb_2019\mindsdb\mindsdb\libs\phases\data_vectorizer\data_vectorizer.py", line 172, in run
normalized = norm(value=value, cell_stats=stats) # this should return a vector representation already normalized
File "D:\PycharmProjects\git_UC_Berkeley_mindsdb_2019\mindsdb\mindsdb\libs\helpers\norm_denorm_helpers.py", line 71, in norm
normalizedValue = (float(value) - cell_stats['min']) /
ValueError: could not convert string to float: 'null'
[ERROR] 'None'
[ERROR] "could not convert string to float: 'null'"

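To Reproduce
Steps to reproduce the behavior:
import numpy as np
# cleanfloat is the mindsdb helper referenced in this report; its import is not shown in the original
newData = ['null', 'NaN', '100']
col_data = [print(cleanfloat(i)) for i in newData
            if str(i) not in ['', str(None), str(False), str(np.nan), 'NaN', 'nan', 'NA']]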

Desktop (please complete the following information):

OS: windows 10
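
The filter in the reproduction snippet lets the string 'null' through, which is why float() fails on it. One way to make the conversion tolerant, sketched here with the standard library only, is to normalise the cell and return None for anything that is missing or not numeric. The name to_float_or_none and the set of missing-value tokens are assumptions for illustration.

```python
import math

# Strings treated as "no value". The exact list is an assumption and can be extended.
MISSING_TOKENS = {"", "null", "none", "nan", "na"}


def to_float_or_none(value):
    """Convert a raw cell to float, returning None for missing or non-numeric cells."""
    if value is None:
        return None
    text = str(value).strip()
    if text.lower() in MISSING_TOKENS:
        return None
    try:
        number = float(text)
    except ValueError:
        return None
    return None if math.isnan(number) else number


new_data = ["null", "NaN", "100"]
col_data = [v for v in map(to_float_or_none, new_data) if v is not None]
print(col_data)  # [100.0]
```

Comparing in lower case means 'NULL', 'Null' and 'null' are all treated the same way.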

After upgrading to the new version, generating the model raises an AttributeError and gets stuck in an infinite loop.

Describe the bug
Getting "AttributeError: 'numpy.float64' object has no attribute 'replace'" on generating the model.

To Reproduce
Steps to reproduce the behavior:

Upgrade mindsdb new version by following command:
pip3 install mindsdb --user
Take the time_series examples https://github.com/mindsdb/mindsdb/tree/master/docs/examples/time_series
Run python3 train.py
See error "https://github.com/mindsdb/mindsdb/tree/master/docs/examples/time_series"

Expected behavior
Generate the model

Screenshots
image

Desktop (please complete the following information):

OS: Windows 10

Negative accuracy at the beginning of training a model

Describe the bug
When I train a model, I get negative accuracy values at the start of training. Is it even valid to get negative accuracies? I am attaching the data set as well.

Screenshots

AirPassengers.csv.zip

Code snippet:

def train_air_passengers(self):
    self.mindsDb.learn(predict='Passengers', model_name='my_passengers',
                       from_data='AirPassengers.csv')

Desktop (please complete the following information):

OS: MacOs

Where is MAX_LENGTH defined?

Describe the bug
MAX_LENGTH is used in decoder_rnn.py but its definition can not be found in this repo.
https://github.com/mindsdb/mindsdb/search?q=MAX_LENGTH&unscoped_q=MAX_LENGTH

To Reproduce
Steps to reproduce the behavior:

flake8 . --count --show-source --statistics --select=E901,E999,F821,F822,F823

flake8 testing of https://github.com/mindsdb/mindsdb on Python 3.7.1

$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

./mindsdb/libs/ml_models/pytorch/encoders/rnn/decoder_rnn.py:10:76: F821 undefined name 'MAX_LENGTH'
    def __init__(self, hidden_size, output_size, dropout_p=0.1, max_length=MAX_LENGTH):
                                                                           ^
1    F821 undefined name 'xrange'
1

Add contribution.txt file

MindsDB is an open source project, so please add some contribution-related documentation to help others start contributing to it.

Installation Error with twisted library

Describe the bug
Installation Error: twisted library?

To Reproduce

Create a virtual env with conda
pip install mindsdb --user

Tried with py3.7 and py3.6, same issue. (pip 18.1)

Log

    copying src/twisted/words/xish/xpathparser.g -> build/lib.linux-x86_64-3.7/twisted/words/xish
    running build_ext
    building 'twisted.test.raiser' extension
    creating build/temp.linux-x86_64-3.7
    creating build/temp.linux-x86_64-3.7/src
    creating build/temp.linux-x86_64-3.7/src/twisted
    creating build/temp.linux-x86_64-3.7/src/twisted/test
    gcc -pthread -B /home/frunkad/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/frunkad/anaconda3/include/python3.7m -c src/twisted/test/raiser.c -o build/temp.linux-x86_64-3.7/src/twisted/test/raiser.o
    In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include-fixed/syslimits.h:7:0,
                     from /usr/lib/gcc/x86_64-linux-gnu/7/include-fixed/limits.h:34,
                     from /home/frunkad/anaconda3/include/python3.7m/Python.h:11,
                     from src/twisted/test/raiser.c:4:
    /usr/lib/gcc/x86_64-linux-gnu/7/include-fixed/limits.h:194:15: fatal error: limits.h: No such file or directory
     #include_next <limits.h>  /* recurse down to the real one */
                   ^~~~~~~~~~
    compilation terminated.
    error: command 'gcc' failed with exit status 1
    
    ----------------------------------------
Command "/home/frunkad/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-ue7d9bgq/Twisted/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-_9rhp5o5/install-record.txt --single-version-externally-managed --compile --user --prefix=" failed with error code 1 in /tmp/pip-install-ue7d9bgq/Twisted/

Desktop (please complete the following information):
Distributor ID: LinuxMint
Description: Linux Mint 19.1 Tessa
Release: 19.1
Codename: tessa

Documentation on model configuration

Hello mindsdb team.

I'm taking a look at your library, but I haven't seen any docs about what kind of neural network is used (number of layers, etc.).

Could you please point me in the right direction?

Thanks.

AttributeError: 'int' object has no attribute 'replace'

Describe the bug
Training new model throws AttributeError: 'int' object has no attribute 'replace' error.

To Reproduce
Steps to reproduce the behavior:

Dataset used: https://www.kaggle.com/zynicide/wine-reviews
Run train.py
See error: AttributeError: 'int' object has no attribute 'replace'

Screenshots

Desktop (please complete the following information):

Additional context
The error happens in stats_generator.py getTextType function. Somehow the data that comes in getTextType is not well parsed and comes in the format:

 'Seven Pillars',
   'Cellar 1879 Blend',
   'Conner Lee Vineyard',
   'Charles Vineyard Clone O5',
   9,
   'au Naturel',
   'Alma Grande Reserva',
   'Taberner No1',
   'Puiten',
   'Compendium',

So when cell_wseparator.replace() is called on 9 it throws error.
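
A defensive pattern for this kind of failure is to coerce each cell to a string before calling string methods on it. The sketch below is illustrative only: strip_separators and its separator list are hypothetical and do not reproduce what getTextType actually does.

```python
def strip_separators(cell, separators=(",", ";", "\t")):
    """Remove separator characters from a cell, whatever its original type."""
    # Non-string cells (int, float, numpy scalars, ...) are converted first,
    # so .replace() is always called on a str.
    text = cell if isinstance(cell, str) else str(cell)
    for sep in separators:
        text = text.replace(sep, "")
    return text


cells = ["Seven Pillars", "Cellar 1879 Blend", 9, "au Naturel"]
cleaned = [strip_separators(c) for c in cells]
```

The same coercion would also avoid the analogous error on floating-point cells.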

installation issue for python3

Describe the bug
Trouble installing mindsdb on Python 3.6.2.

To Reproduce
Steps to reproduce the behavior:

go to terminal and make sure using python 3.6.2
run pip install mindsdb
See error: Could not find a version that satisfies the requirement tinydb-serialization>=1.0.4 (from mindsdb) (from versions: ) No matching distribution found for tinydb-serialization>=1.0.4 (from mindsdb)

Expected behavior
Expect to install successfully

Screenshots

Desktop (please complete the following information):

OS: macOS 10.13.6

Additional context

Getting "all the input arrays must have same number of dimensions" error while training the data

Describe the bug
I get the error "all the input arrays must have same number of dimensions" while training when the CSV file has too many None values. Instead of skipping these rows, mindsdb stops training.

To Reproduce
Steps to reproduce the behavior:
I am attaching the files which I tried to train. I tried to predict column "installment".

Expected behavior
Mindsdb should train the model by dropping "None" value rows

Screenshots
Please see the attachments

Desktop (please complete the following information):

OS: MacOs

Loan_data_with_error.csv.zip

List of available trained models

It would be great to have an option such as mindsdb.getExistingModels to list all the models I have trained, so that I can easily pick one and proceed with my work.

Could not convert string to float when comma is used as Decimal Separator

Describe the bug
Some datasets contain values with a different decimal separator. For example, if the data uses a comma as the decimal separator, as in "15,6", an error is thrown.

To Reproduce
Steps to reproduce the behavior:

Run 'python train.py'
See error: "could not convert string to float: '15,2'"

Expected behavior
MindsDB should detect different types of locales and try to parse them as valid floats.

Screenshots

Desktop (please complete the following information):

OS: Ubuntu 16.04
Python 3.7

Additional context
The error is raised in the stats_generator when we try to parse the values as a float. https://github.com/mindsdb/mindsdb/blob/master/mindsdb/libs/phases/stats_generator/stats_generator.py#L312
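
A minimal sketch of separator-tolerant parsing, written without the locale module so that it does not depend on the locales installed on the machine. The heuristic (when both symbols appear, the right-most one is the decimal separator) and the name parse_decimal are assumptions for illustration, not the behaviour of stats_generator.

```python
def parse_decimal(text):
    """Parse a number that uses either '.' or ',' as the decimal separator."""
    cleaned = text.strip().replace(" ", "")
    if "," in cleaned and "." in cleaned:
        # The right-most symbol is taken as the decimal separator,
        # the other one as a thousands separator.
        if cleaned.rfind(",") > cleaned.rfind("."):
            cleaned = cleaned.replace(".", "").replace(",", ".")
        else:
            cleaned = cleaned.replace(",", "")
    elif "," in cleaned:
        # Only commas present: treat the comma as the decimal separator.
        cleaned = cleaned.replace(",", ".")
    return float(cleaned)
```

A value such as "1,234" stays ambiguous under this rule and is read as 1.234. Deciding per column, based on all of its values, would be more robust than deciding per cell.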

string index out of range exception on parsing CSV file.

Description
Index out of range exception on executing train.py
To Reproduce
Steps to reproduce the behavior:

Go to the folder that contains train.py and the CSV file currenttweetdatabase.csv
currenttweetdatabase_tweetDS.zip
run python3 train.py
See error : string index out of range

Expected behavior
Should generate the model to predict the most engaged tweet.

Screenshots

Desktop

OS: MAC High Sierra
Additional info if applicable

Getting error on upgrading mindsdb

Describe the bug
Could not upgrade to the new version of mindsdb.
To Reproduce
Steps to reproduce the behavior:

Run any train model
Running the train model gives [WARNING] There is a new version of MindsDB 0.8.9.1, please do:
pip3 uninstall mindsdb
pip2 install mindsdb --user
Run "pip3 uninstall mindsdb"
Run "pip2 install mindsdb --user"

Expected behavior
Mindsdb version should successfully upgrade.

Screenshots

Desktop (please complete the following information):

OS: Windows 10

Windows setup not working. Getting error "DLL load fail: The specified module could not be found." on running train.py script

May I know which file formats it supports, such as xls or xlsx?

Originally posted by @surendra1472 in #7 (comment)

Predicting 0.57 as output where as possible values are 0 or 1

Describe the bug
I have a data set which describes a person as diabetic (0) or non-diabetic (1) based on different conditions.
MindsDB trains on the data with 0.7 accuracy, but when predicting it returns a value such as 0.57, whereas the only possible values are 0 or 1.
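
Until the target is treated as categorical, one possible caller-side workaround is to snap the continuous output to the nearest allowed label. This is only a sketch; `snap_to_label` is a hypothetical helper, not a MindsDB function.

```python
def snap_to_label(score, allowed=(0, 1)):
    """Return the allowed label closest to a continuous prediction."""
    return min(allowed, key=lambda label: abs(label - score))


# A prediction of 0.57 is closer to 1 than to 0.
label = snap_to_label(0.57)
```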

TypeError: 'NoneType' object is not subscriptable

Describe the bug
TypeError is thrown when running integration_tests/run_tests.py. Somehow, the error happens on random runs. It's not related to the Python version because we got the same error on different versions.
The problem is at getStoredTorchObject function: https://github.com/mindsdb/mindsdb/blob/master/mindsdb/libs/ml_models/pytorch/libs/base_model.py#L186
If you print the file_ids on the successful run it displays:
['test_one_label_prediction.pytorch.models.fully_connected_buckets_net.8393dd4a16f0e1c9723f9d9cfa2c39f1']
On failure it is None.

To Reproduce
Steps to reproduce the behavior:

Go to 'integration_testing' directory
Run 'python3 run_tests.py'
See error

Expected behavior
Predict: model test_one_label_prediction [OK], TOTAL TIME: 0.12 seconds
[END] ModelPredictor, execution time: 0.117 seconds

Screenshots
Error on Travis CI:

Local error:

Desktop (please complete the following information):
The same error was on Python 3.5, 3.6 or 3.7 version, both locally and on CI.

Windows 10: commands not working

The following commands do not work on Windows 10:

conda install -c peterjc123
conda install -c pytorch

I installed them with the command below (for Windows 10 and Windows Server 2016, CUDA 8):
conda install -c peterjc123 pytorch

MindsDB does not accept negative values (e.g. -20)

Describe the bug
One of my data sets contains negative values (for example, temperature). While training, MindsDB treats them as strings and throws errors.

To Reproduce
Steps to reproduce the behavior:

Take any data set with numeric values
Change one of the values to a negative number (add "-" before it, e.g. -200)
Train the data
See the error

Expected behavior
Training should work with negative values as well.

Screenshots
Please find the attachment

Desktop (please complete the following information):

OS: macOS

A different number of rows is used each time I train the same model

Describe the bug
I trained the same model twice with the same data set, which has 145 rows in total.
First train: test rows = 17, train rows = 110, total rows used = 127
Second train: test rows = 16, train rows = 112, total rows used = 128
Why does it use a different number of rows each time, and why does it drop some rows (18 out of 145 the first time)?

Desktop (please complete the following information):

OS: macOS
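
Whether an unseeded random split is what causes the varying counts here is not established in this report, and it would not by itself explain the dropped rows. As a general illustration, a split driven by a fixed seed gives the same train and test rows on every run. `split_rows` and its parameters are hypothetical.

```python
import random


def split_rows(rows, test_fraction=0.1, seed=42):
    """Split rows into (train, test) deterministically for a given seed."""
    rng = random.Random(seed)  # local generator, does not touch global state
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Two runs over the same 145 rows produce identical splits.
first = split_rows(range(145))
second = split_rows(range(145))
```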

Time series demo takes too long and has no comments

fuel.csv has about 7000 rows, and training on it takes almost 3 hours while reaching only 88% accuracy. A demo should take a reasonable amount of time, like the rentals example (5 minutes).
The train and predict programs have no comments.
Why are there two CSV files, fuel.csv and fuel_predict.csv? This is confusing for a first-time user.

Support for Excel data files

It would be nice to have support for Excel files as well. Right now only CSV files are supported.

sqlite3.OperationalError: unable to open database file

Installing MindsDB as instructed in the docs results in the following error for me:

WARNING:root:Cannot store token, Please add write permissions to file:/Users/patrickfurst/.local/lib/python3.6/site-packages/mindsdb/storage/start.mdb_base
Traceback (most recent call last):
  File "mindsdb_test.py", line 7, in <module>
    mdb = MindsDB()
  File "/Users/patrickfurst/.local/lib/python3.6/site-packages/mindsdb/libs/controllers/mindsdb_controller.py", line 34, in __init__
    self.conn = sqlite3.connect(file)
sqlite3.OperationalError: unable to open database file

Reproduce:

Clean install of MindsDB through pip.
Run the sample code from the docs.

Looking at the folders and the code, I see that the storage folder is missing.
Also, wouldn't it make more sense to save the SQLite files outside of the actual Python package?
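
A sketch of that suggestion, assuming a per-user directory is acceptable: create the storage folder if it is missing, then open the database there. `open_storage_db`, the `.mindsdb_storage` folder, and the file name are hypothetical and do not reflect the current MindsDB layout.

```python
import os
import sqlite3


def open_storage_db(base_dir=None, filename="mindsdb.sqlite3"):
    """Open a SQLite database in a storage folder outside the package."""
    if base_dir is None:
        # Hypothetical default location under the user's home directory
        base_dir = os.path.join(os.path.expanduser("~"), ".mindsdb_storage")
    # Creating the folder first avoids "unable to open database file"
    os.makedirs(base_dir, exist_ok=True)
    return sqlite3.connect(os.path.join(base_dir, filename))
```

Keeping the files outside site-packages also avoids the write-permission warning shown above when the package directory is read-only.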

Is there an example of image classification?

Image classification or image detection.

Provide comparison to other DL frameworks

Is your feature request related to a problem? Please describe.
It is not immediately clear to a newcomer how this compares to, or is better than, just using torch/keras etc.

Describe the solution you'd like
Provide a clear and concise description of how MindsDB differs, placed directly in the README.

Error while training model on a kaggle dataset

Describe the bug
Error "could not convert string to float: 'SC/PARIS 2131'" while training the model on the Kaggle dataset for the beginner competition, Titanic: Machine Learning from Disaster.

To Reproduce
Steps to reproduce the behavior:

Download the train.csv from here.
Run

from mindsdb import *

MindsDB().learn(
    from_data="train.csv",
    predict='Survived',
    model_name='titanic_model'
)

See error.

Expected behavior
The best-trained model should be returned.

Desktop (please complete the following information):

OS: Ubuntu 18.04 Kubuntu
Used on Jupyter notebook

Additional context
The error seems to be at StatsGenerator step on the 'Ticket' column of the dataset.
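
One way such a step could avoid the crash is to treat a column as numeric only when most of its non-empty values parse as floats, and fall back to text otherwise. This is a sketch, not the StatsGenerator logic; `looks_numeric` and the 0.9 threshold are assumptions, and the sample values other than 'SC/PARIS 2131' are illustrative.

```python
def looks_numeric(values, min_fraction=0.9):
    """Return True if at least min_fraction of non-empty values parse as floats."""
    parsed = 0
    non_empty = 0
    for value in values:
        if value is None or str(value).strip() == "":
            continue  # ignore missing cells
        non_empty += 1
        try:
            float(value)
            parsed += 1
        except (TypeError, ValueError):
            pass  # not a number, e.g. 'SC/PARIS 2131'
    return non_empty > 0 and parsed / non_empty >= min_fraction


ticket_like = ["SC/PARIS 2131", "113803", "A/5 21171", ""]
is_numeric = looks_numeric(ticket_like)
```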

MindsDB changes the column names and expects me to pass the changed names

MindsDB changes the column names for ease of processing and expects me to pass the changed names both during training and while predicting.

To Reproduce
"So my column names are Date;Time;CO(GT);PT08.S1(CO);NMHC(GT);C6H6(GT);PT08.S2(NMHC);NOx(GT);PT08.S3(NOx);NO2(GT);PT08.S4(NO2);PT08.S5(O3);T;RH;AH;Empty
I tried to train CO(GT)"
Passing CO(GT) throws an error, while passing CO_GT trains successfully.
The same happens when predicting.

Expected behavior
MindsDB should map the original names to the changed names internally, then train and predict.

OS: macOS

Please find the attachments
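
The internal mapping asked for here could look roughly like this. The normalization rule (replace runs of non-alphanumeric characters with "_") is an assumption chosen so that CO(GT) becomes CO_GT, not the rule MindsDB actually uses; both function names are hypothetical, and name collisions are not handled.

```python
import re


def build_column_map(columns):
    """Map each original column name to a normalized, code-safe name."""
    mapping = {}
    for name in columns:
        safe = re.sub(r"[^0-9a-zA-Z]+", "_", name).strip("_")
        mapping[name] = safe
    return mapping


def resolve_column(name, mapping):
    """Accept either the original or the normalized name."""
    if name in mapping:
        return mapping[name]
    if name in mapping.values():
        return name
    raise KeyError("Unknown column: %r" % name)


columns = ["Date", "Time", "CO(GT)", "PT08.S1(CO)", "NMHC(GT)"]
column_map = build_column_map(columns)
```

With such a map, the user could pass CO(GT) to both learn and predict while the normalized name is used internally.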

IndexError: list index out of range when missing predict value

Is your feature request related to a problem? Please describe.
When an empty string is provided as the predict value, e.g.:

result = mdb.predict(predict=' ', model_name='home_rentals')

an "IndexError: list index out of range" error is thrown.

Describe the solution you'd like
A user-friendly message should be raised, e.g.:
ValueError: Please provide valid predict value

Additional context
We can check for empty predict values in https://github.com/mindsdb/main/blob/76c691c4b18a4723626dfcbff8228da614d93e8b/mindsdb/libs/controllers/mindsdb_controller.py#L170 and raise a ValueError if predict is not provided.
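
A sketch of the proposed check, written as a standalone function rather than as a patch to mindsdb_controller.py. `validate_predict` is a hypothetical name, and it assumes predict may be a single column name or a list of names.

```python
def validate_predict(predict):
    """Return a cleaned list of column names, or raise ValueError if empty."""
    if isinstance(predict, str):
        columns = [predict]
    else:
        columns = list(predict or [])
    # Drop blank or whitespace-only entries such as ' '
    cleaned = [c.strip() for c in columns if isinstance(c, str) and c.strip()]
    if not cleaned:
        raise ValueError("Please provide valid predict value")
    return cleaned
```

Calling `validate_predict(' ')` raises the ValueError instead of letting the IndexError surface later.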

No matching distribution found for socketio>=2.0.0

Describe the bug
Could not find a version that satisfies the requirement socketio>=2.0.0

To Reproduce
Steps to reproduce the behavior:

pip install mindsdb

Expected behavior
socketio doesn't have a 2.0.0 version. According to https://pypi.org/project/socketio/#history the latest is 0.1.5.

Additional context
I think we have a typo in requirements.txt

Incorrect syntax in "requirements-win.txt" : command "eventlet=0.24.1" should be "eventlet==0.24.1"
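
Typos like this can be caught with a small check that flags any requirement line containing a lone "=" that is not part of "==", ">=", "<=", "!=" or "~=". This is a sketch; `find_bad_specifiers` is a hypothetical helper, and it skips comments and option lines rather than fully parsing the requirements format.

```python
import re

# A single "=" that is not part of a two-character comparison operator
_SINGLE_EQUALS = re.compile(r"(?<![=<>!~])=(?!=)")


def find_bad_specifiers(lines):
    """Return (line_number, line) pairs whose version specifier uses a lone '='."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Skip blanks, comments, and pip options such as "-r other.txt"
        if not line or line.startswith(("#", "-")):
            continue
        if _SINGLE_EQUALS.search(line):
            bad.append((number, line))
    return bad
```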

Prediction throws unhandled exceptions when there is no network

Prediction throws a lot of unhandled exceptions when there is no network. I still get the predictions, but they come along with many exceptions.

To Reproduce

Steps to reproduce the behavior:

Train the model
Try to predict
You will get predictions along with a lot of unhandled exceptions

Expected behavior
It should return predictions without errors.

Desktop (please complete the following information):
OS: macOS

Getting None as prediction value and prediction confidence

Describe the bug
I trained on some data and got around 73% accuracy, so I started predicting by passing different independent variables. Both the predicted value and the confidence came back as None, especially when I include a date in the "when" argument.

Expected behavior
It should return a reasonable predicted value and a prediction confidence (probably a number).
Screenshots
Please find the screenshot

Desktop (please complete the following information):

OS: macOS

Training multiple models corrupts all the existing models

First I trained model A and predicted with some independent variables; it worked great.
Then I trained another model, B, and predicted with some independent variables; it also worked great.
Since that worked, I started training two models, C and D, at the same time (D a few seconds after C).
Training of C and D stopped, and A and B were corrupted.
Now I cannot use A or B to predict anything; nothing is left.

Attached screen shot of the error
Reproducible: True
