
hass-data-detective's Introduction


Introduction

The HASS-data-detective package helps you explore and analyse the data in your Home Assistant database. If you are running HASS-data-detective on a machine with Home Assistant installed, it will automatically discover your database and, by default, collect information about the entities in it. See the notebooks directory for examples of using the detective package. If you are on a Raspberry Pi, you should use the official JupyterLab add-on, which includes HASS-data-detective.

Installation on your machine

You can either pip install HASS-data-detective for the latest released version from PyPI, or pip install git+https://github.com/robmarkcole/HASS-data-detective.git --upgrade for the bleeding-edge version from GitHub. Note that, due to the matplotlib dependency, libfreetype6-dev is a requirement on aarch64 platforms (e.g. Raspberry Pi).

Which version to install?

The 3.0 version from PyPI requires the existence of a states_meta table, which is not present in older Home Assistant databases. If you get the error (sqlite3.OperationalError) no such table: states_meta, then you should install the earlier release with pip install HASS-data-detective==2.6
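
If you are unsure which schema you have, you can check for the table directly. A minimal sketch, assuming a SQLite database (the path is hypothetical; point it at your own home-assistant_v2.db):

import sqlite3

conn = sqlite3.connect("/config/home-assistant_v2.db")
tables = {row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")}
conn.close()

# Newer schemas include states_meta; if absent, use HASS-data-detective==2.6
print("states_meta present:", "states_meta" in tables)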

Run with Docker locally

You can use the detective package within a Docker container, so there is no need to install anything on your machine (assuming you already have Docker installed). Note that this will pull the jupyter/scipy-notebook Docker image the first time you run it, but subsequent runs will be much faster. Note that there is no image available for Raspberry Pi.

From the root directory of this repo run:

docker run --rm -p 8888:8888 -e JUPYTER_ENABLE_LAB=yes -v "$PWD":/home/jovyan/work jupyter/scipy-notebook

Follow the URL printed to the terminal, which opens a JupyterLab instance. Open a new terminal in JupyterLab and navigate to the work directory containing setup.py, then run:

~/work$ pip install .

You can now navigate to the notebooks directory and start using the detective package. Note that you can install any package you want from PyPI, but it will not persist when the container restarts.

Try out detective online

You can try out the latest version of detective from PyPI without installing anything. If you click on the 'launch binder' button above, detective will be started in a Docker container online using the BinderHub service. Run the example notebook to explore detective, and use the Upload button to upload your own home-assistant_v2.db database file for analysis. Note that all data is deleted when the container shuts down, so this service is just for trying out detective.

Development (VScode)

  • Create a venv: python3 -m venv venv
  • Activate venv: source venv/bin/activate
  • Install requirements: pip3 install -r requirements.txt
  • Install detective in development mode: pip3 install -e .
  • Install Jupyterlab to run the notebooks: pip3 install jupyterlab
  • Open the notebook at notebooks/Getting started with detective.ipynb

Running tests

  • Install dependencies: pip3 install -r requirements_test.txt
  • Run: pytest tests

Contributors

Big thanks to @balloob and @frenck; check out their profiles!


hass-data-detective's Issues

Warn about `mysql://`

We want users to use mysql+pymysql:// because, with a bare mysql:// URL, SQLAlchemy defaults to a MySQL driver that does not support Python 3.

We should print a warning if we detect this, telling people to configure the URL manually. Maybe we could catch an ImportError?
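
A minimal sketch of such a check, run when the database URL is first seen (the function name is hypothetical, not the package API):

import warnings
from urllib.parse import urlparse

def check_db_url(url: str) -> str:
    # Warn on a bare mysql:// scheme, since SQLAlchemy's default MySQL
    # driver does not support Python 3
    if urlparse(url).scheme == "mysql":
        warnings.warn("Use mysql+pymysql:// instead of mysql://; the default "
                      "MySQL driver does not support Python 3.")
    return url

check_db_url("mysql://user:password@localhost/homeassistant")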

Make graphs accessible and saveable

It would be nice to return the graph objects so people can add their own formatting, such as changing the title. Plot functions should return the graph objects.
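
A sketch of the intended pattern, with a hypothetical plot helper that returns the matplotlib Axes so callers can restyle and save the figure:

import matplotlib.pyplot as plt

def plot_entity(df, entity):
    # Plot one entity column and hand the Axes back to the caller
    fig, ax = plt.subplots()
    df[entity].plot(ax=ax)
    ax.set_title(entity)
    return ax

# ax = plot_entity(sensors_num.data, "sensor.temperature")
# ax.set_title("My own title")
# ax.figure.savefig("temperature.png")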

Add db type as attribute

As pointed out by @balloob: 'Most DBAPIs have built in support for the datetime module, with the noted exception of SQLite. In the case of SQLite, date and time types are stored as strings which are then converted back to datetime objects when rows are returned.' Detective should have the db type as an attribute, allowing for db-type-specific behaviour if required.
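
A sketch of how the attribute could be derived from the connection URL, assuming SQLAlchemy (already a dependency for the engine):

from sqlalchemy.engine.url import make_url

url = make_url("sqlite:////config/home-assistant_v2.db")  # example URL
db_type = url.get_backend_name()  # "sqlite", "mysql", "postgresql", ...

if db_type == "sqlite":
    pass  # e.g. convert the string-encoded datetimes here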

Add export interesting data to csv

CSV is the common format for sharing tabular data. Add a convenience function for exporting interesting data, passing in a list of sensor or binary_sensor entities to export.
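
A minimal sketch of the convenience function (names are hypothetical), assuming the data is already pivoted into a dataframe with one column per entity:

import pandas as pd

def export_to_csv(df: pd.DataFrame, entities: list, path: str = "export.csv") -> None:
    # Write only the requested entity columns, keeping the datetime index
    df[entities].to_csv(path)

# export_to_csv(sensors.data, ["sensor.temperature", "binary_sensor.motion_at_home"])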

Tests

To do: create some tests.

Initialisation

Change initialisation to return True/False indicating whether the connection to the database is OK.
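
A sketch of the proposed behaviour, assuming a trivial test query is run during initialisation (the helper name is hypothetical):

from sqlalchemy import create_engine, text

def connection_ok(url: str) -> bool:
    # Run a trivial query; return False on any connection/driver failure
    try:
        engine = create_engine(url)
        with engine.connect() as conn:
            conn.execute(text("SELECT 1"))
        return True
    except Exception:
        return False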

Install in HASS virtual env?

Not knowing the dependencies and interactions: should this be installed in the same virtual env that HASS has been installed in, a new/separate one, or just globally?

Given it gets installed as a HASS.IO add-on, I'm tending towards the former.

Setup CI

It appears there is a migration from travis-ci.org to something else; it is confusing how to set this up.

DataError: No numeric types to aggregate

db.fetch_all_data()
binary_sensors = BinarySensors(db.master_df)

Returns:

---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-8-14f6c61226db> in <module>
      1 from detective.core import BinarySensors
----> 2 binary_sensors = BinarySensors(db.master_df)

/usr/local/lib/python3.6/dist-packages/detective/core.py in __init__(self, master_df)
    281         # Pivot
    282         binary_df = binary_df.pivot_table(
--> 283             index='last_changed', columns='entity', values='state')
    284 
    285         # Index to datetime

/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py in pivot_table(self, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name)
   5301                            aggfunc=aggfunc, fill_value=fill_value,
   5302                            margins=margins, dropna=dropna,
-> 5303                            margins_name=margins_name)
   5304 
   5305     def stack(self, level=-1, dropna=True):

/usr/local/lib/python3.6/dist-packages/pandas/core/reshape/pivot.py in pivot_table(data, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name)
     85     # if we have a categorical
     86     grouped = data.groupby(keys, observed=False)
---> 87     agged = grouped.agg(aggfunc)
     88     if dropna and isinstance(agged, ABCDataFrame) and len(agged.columns):
     89         agged = agged.dropna(how='all')

/usr/local/lib/python3.6/dist-packages/pandas/core/groupby/groupby.py in aggregate(self, arg, *args, **kwargs)
   4654         axis=''))
   4655     def aggregate(self, arg, *args, **kwargs):
-> 4656         return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
   4657 
   4658     agg = aggregate

/usr/local/lib/python3.6/dist-packages/pandas/core/groupby/groupby.py in aggregate(self, arg, *args, **kwargs)
   4085 
   4086         _level = kwargs.pop('_level', None)
-> 4087         result, how = self._aggregate(arg, _level=_level, *args, **kwargs)
   4088         if how is None:
   4089             return result

/usr/local/lib/python3.6/dist-packages/pandas/core/base.py in _aggregate(self, arg, *args, **kwargs)
    346         if isinstance(arg, compat.string_types):
    347             return self._try_aggregate_string_function(arg, *args,
--> 348                                                        **kwargs), None
    349 
    350         if isinstance(arg, dict):

/usr/local/lib/python3.6/dist-packages/pandas/core/base.py in _try_aggregate_string_function(self, arg, *args, **kwargs)
    302         if f is not None:
    303             if callable(f):
--> 304                 return f(*args, **kwargs)
    305 
    306             # people may try to aggregate on a non-callable attribute

/usr/local/lib/python3.6/dist-packages/pandas/core/groupby/groupby.py in mean(self, *args, **kwargs)
   1304         nv.validate_groupby_func('mean', args, kwargs, ['numeric_only'])
   1305         try:
-> 1306             return self._cython_agg_general('mean', **kwargs)
   1307         except GroupByError:
   1308             raise

/usr/local/lib/python3.6/dist-packages/pandas/core/groupby/groupby.py in _cython_agg_general(self, how, alt, numeric_only, min_count)
   3970                             min_count=-1):
   3971         new_items, new_blocks = self._cython_agg_blocks(
-> 3972             how, alt=alt, numeric_only=numeric_only, min_count=min_count)
   3973         return self._wrap_agged_blocks(new_items, new_blocks)
   3974 

/usr/local/lib/python3.6/dist-packages/pandas/core/groupby/groupby.py in _cython_agg_blocks(self, how, alt, numeric_only, min_count)
   4042 
   4043         if len(new_blocks) == 0:
-> 4044             raise DataError('No numeric types to aggregate')
   4045 
   4046         # reset the locs in the blocks to correspond to our

DataError: No numeric types to aggregate

Add a describe method

Add a describe method which prints out general information about the database: its size, the number of entities, and how far back the data goes.
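
A sketch of what describe() could report, assuming the pre-3.0 schema where the states table carries entity_id and last_changed (the method and queries are illustrative, not the package API):

def describe(db):
    rows = db.perform_query("SELECT COUNT(*) FROM states").fetchone()[0]
    entities = db.perform_query(
        "SELECT COUNT(DISTINCT entity_id) FROM states").fetchone()[0]
    first, last = db.perform_query(
        "SELECT MIN(last_changed), MAX(last_changed) FROM states").fetchone()
    print(f"{rows} state rows across {entities} entities, from {first} to {last}")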

plot_sensor() not plotting all data

It appears that plot_sensor() only plots a limited amount of data compared to what is shown in the Prophet plot. This could be related to the timestamps, which also look wrong.

Data must be accessed correctly for datetime index

To return a dataframe with a datetime index, it is necessary to access the data attribute with a list, such as:
sensors_binary.data[['binary_sensor.motion_at_home']]

If accessed without a list, a series with a non-datetime index is returned:
sensors_binary.data['binary_sensor.motion_at_home']

We need to add checks that the data is accessed and returned correctly, with a datetime index.
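
A minimal sketch of such a check (the helper name is hypothetical):

import pandas as pd

def ensure_datetime_frame(data: pd.DataFrame, entities) -> pd.DataFrame:
    if isinstance(entities, str):
        entities = [entities]  # force DataFrame access, never a Series
    df = data[entities]
    if not isinstance(df.index, pd.DatetimeIndex):
        df.index = pd.to_datetime(df.index)
    return df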

Make timestamps consistent

The formatting of timestamps varies across the different plot functions. We need to select one format and make it universal.

doesn't accept include

Hello,
I installed Jupyter and homeassistant-notebook, and when I try running the getting-started notebook I get the following error for my YAML file:

YAML tag !include_dir_merge_named is not supported

It also adds an extra config/ directory to whatever is included (example: I use [switch: !include config/switches.yaml] and my switches.yaml file is at homeassistant/config/switches.yaml, but it gets resolved to homeassistant/config/config/switches.yaml, so it won't be found).
I'm not using hass.io; I have Home Assistant running on Ubuntu 18.
Thanks

Create sensor class

Create a sensor superclass, since attributes such as entities are common to the sensor classes. With more experience, the form of this superclass should become more obvious.
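
One possible shape for the superclass (a sketch, not the final design):

class Sensors:
    """Common base: hold the pivoted dataframe and shared attributes."""

    def __init__(self, data):
        self.data = data

    @property
    def entities(self):
        return list(self.data.columns)


class NumericalSensors(Sensors):
    pass  # numeric-only behaviour (correlations, plotting) goes here


class BinarySensors(Sensors):
    pass  # binary-only behaviour goes here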

Make correlations sensible

The calculated correlations can be exactly 1.0 or -1.0, which must be due to insufficient data. Remove correlations computed from insufficient data.
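
pandas supports this directly: DataFrame.corr accepts a min_periods argument, so pairs with too few overlapping observations come back as NaN and can be dropped. A sketch, with the threshold of 30 chosen arbitrarily and df assumed to be the pivoted numeric dataframe:

corrs = df.corr(min_periods=30)  # NaN where fewer than 30 paired observations
corrs = corrs.dropna(how="all").dropna(axis=1, how="all")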

ValueError: Array must be all same time zone

Hi, I'm starting to play around with your example hass-detective notebook in my own environment. I've pulled the master branch and made a few changes to your notebook code to pick up what's in master. When I try to use the NumericalSensors class I get a ValueError exception.

sensors_num = detective.NumericalSensors(parser.master_df)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/pandas/core/tools/datetimes.py in _convert_listlike(arg, box, format, name, tz)
    302             try:
--> 303                 values, tz = tslib.datetime_to_datetime64(arg)
    304                 return DatetimeIndex._simple_new(values, name=name, tz=tz)

pandas/_libs/tslib.pyx in pandas._libs.tslib.datetime_to_datetime64()

ValueError: Array must be all same time zone

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-5-8918ff9e5054> in <module>()
----> 1 sensors_num = detective.NumericalSensors(parser.master_df)

~/work/HASS-data-detective/detective/core.py in __init__(self, master_df)
    183             index='last_changed', columns='entity', values='state')
    184 
--> 185         sensors_num_df.index = pd.to_datetime(sensors_num_df.index)
    186         sensors_num_df.index = sensors_num_df.index.tz_localize(None)

We hit Daylight Savings Time a few weeks ago, so there's a mix of UTC-5 and UTC-6 in my database. I'm not sure what the right solution is... converting to Unix time, perhaps. The way I managed to fix the problem was to modify NumericalSensors's __init__ to read:

        sensors_num_df.index = pd.to_datetime(sensors_num_df.index, utc=True)
        sensors_num_df.index = sensors_num_df.index.tz_localize(None)

Error when using a MySQL db

0.83.3 - External MySQL db using history.

Jupyter Error:


Successfully connected to database
Error with query: 
            SELECT entity_id, COUNT(*)
            FROM states
            GROUP BY entity_id
            ORDER by 2 DESC
            
Connection error, check your URL
---------------------------------------------------------------------------
OperationalError                          Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, *args)
   1192                         parameters,
-> 1193                         context)
   1194         except BaseException as e:

/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/default.py in do_execute(self, cursor, statement, parameters, context)
    508     def do_execute(self, cursor, statement, parameters, context=None):
--> 509         cursor.execute(statement, parameters)
    510 

OperationalError: no such table: states

The above exception was the direct cause of the following exception:

OperationalError                          Traceback (most recent call last)
<ipython-input-1-b0a168cdf11c> in <module>
      1 from detective.core import db_from_hass_config
----> 2 db = db_from_hass_config()

/usr/local/lib/python3.6/dist-packages/detective/core.py in db_from_hass_config(path, **kwargs)
     16 
     17     url = config.db_url_from_hass_config(path)
---> 18     return HassDatabase(url, **kwargs)
     19 
     20 

/usr/local/lib/python3.6/dist-packages/detective/core.py in __init__(self, url, fetch_entities)
     39             print("Successfully connected to database")
     40             if fetch_entities:
---> 41                 self.fetch_entities()
     42         except:
     43             print("Connection error, check your URL")

/usr/local/lib/python3.6/dist-packages/detective/core.py in fetch_entities(self)
     62             """
     63             )
---> 64         response = self.perform_query(query)
     65         entities = [e[0] for e in list(response)]
     66         print("There are {} entities with data".format(len(entities)))

/usr/local/lib/python3.6/dist-packages/detective/core.py in perform_query(self, query)
     47         """Perform a query, where query is a string."""
     48         try:
---> 49             return self.engine.execute(query)
     50         except:
     51             print("Error with query: {}".format(query))

/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/base.py in execute(self, statement, *multiparams, **params)
   2073 
   2074         connection = self.contextual_connect(close_with_result=True)
-> 2075         return connection.execute(statement, *multiparams, **params)
   2076 
   2077     def scalar(self, statement, *multiparams, **params):

/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/base.py in execute(self, object, *multiparams, **params)
    946             raise exc.ObjectNotExecutableError(object)
    947         else:
--> 948             return meth(self, multiparams, params)
    949 
    950     def _execute_function(self, func, multiparams, params):

/usr/local/lib/python3.6/dist-packages/sqlalchemy/sql/elements.py in _execute_on_connection(self, connection, multiparams, params)
    267     def _execute_on_connection(self, connection, multiparams, params):
    268         if self.supports_execution:
--> 269             return connection._execute_clauseelement(self, multiparams, params)
    270         else:
    271             raise exc.ObjectNotExecutableError(self)

/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/base.py in _execute_clauseelement(self, elem, multiparams, params)
   1058             compiled_sql,
   1059             distilled_params,
-> 1060             compiled_sql, distilled_params
   1061         )
   1062         if self._has_events or self.engine._has_events:

/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, *args)
   1198                 parameters,
   1199                 cursor,
-> 1200                 context)
   1201 
   1202         if self._has_events or self.engine._has_events:

/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/base.py in _handle_dbapi_exception(self, e, statement, parameters, cursor, context)
   1411                 util.raise_from_cause(
   1412                     sqlalchemy_exception,
-> 1413                     exc_info
   1414                 )
   1415             else:

/usr/local/lib/python3.6/dist-packages/sqlalchemy/util/compat.py in raise_from_cause(exception, exc_info)
    263     exc_type, exc_value, exc_tb = exc_info
    264     cause = exc_value if exc_value is not exception else None
--> 265     reraise(type(exception), exception, tb=exc_tb, cause=cause)
    266 
    267 if py3k:

/usr/local/lib/python3.6/dist-packages/sqlalchemy/util/compat.py in reraise(tp, value, tb, cause)
    246             value.__cause__ = cause
    247         if value.__traceback__ is not tb:
--> 248             raise value.with_traceback(tb)
    249         raise value
    250 

/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, *args)
   1191                         statement,
   1192                         parameters,
-> 1193                         context)
   1194         except BaseException as e:
   1195             self._handle_dbapi_exception(

/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/default.py in do_execute(self, cursor, statement, parameters, context)
    507 
    508     def do_execute(self, cursor, statement, parameters, context=None):
--> 509         cursor.execute(statement, parameters)
    510 
    511     def do_execute_no_params(self, cursor, statement, context=None):

OperationalError: (sqlite3.OperationalError) no such table: states [SQL: '\n            SELECT entity_id, COUNT(*)\n            FROM states\n            GROUP BY entity_id\n            ORDER by 2 DESC\n            '] (Background on this error at: http://sqlalche.me/e/e3q8)


HA addon log:


2018/12/14 15:01:12 [error] 607#607: *4 connect() failed (111: Connection refused) while connecting to upstream, client: 172.30.32.1, server: hassio.local, request: "GET /api/sessions?1544817671254 HTTP/1.1", upstream: "http://127.0.0.1:28459/api/sessions?1544817671254", host: "xxxxxx.duckdns.org", referrer: "https://xxxxxx.duckdns.org/lab?"
2018/12/14 15:01:12 [error] 607#607: *5 connect() failed (111: Connection refused) while connecting to upstream, client: 172.30.32.1, server: hassio.local, request: "GET /api/terminals?1544817671256 HTTP/1.1", upstream: "http://127.0.0.1:28459/api/terminals?1544817671256", host: "xxxxxx.duckdns.org", referrer: "https://xxxxxx.duckdns.org/lab?"
[W 15:01:15.262 LabApp] All authentication is disabled.  Anyone who can connect to this server will be able to run code.

Add anonymise and export data

It would be nice to get people sharing their datasets, but they will want to anonymise them first. Add this ability.
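
A sketch of one approach: hash entity IDs to stable pseudonyms before export, keeping the domain so the data stays interpretable (the function and salt are illustrative):

import hashlib

def anonymise_entity(entity_id: str, salt: str = "change-me") -> str:
    # Keep the domain (e.g. binary_sensor) but hash the object id
    domain, _, _ = entity_id.partition(".")
    digest = hashlib.sha256((salt + entity_id).encode()).hexdigest()[:8]
    return f"{domain}.{digest}"

# anonymise_entity("binary_sensor.motion_at_home") -> e.g. "binary_sensor.4be61a73"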

load_hass_config: print warning when stubbing include

We don't support any of the !include tags and replace them with an empty dictionary. We should print a warning so the user knows we're not loading them. That way it makes sense when the recorder URL cannot be found.
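
A minimal sketch of the stubbing plus warning, assuming the config is read with PyYAML's SafeLoader:

import warnings
import yaml

def _stub_include(loader, node):
    # Replace unsupported Home Assistant include tags with an empty dict,
    # but tell the user so missing config (e.g. recorder) makes sense
    warnings.warn(f"YAML tag {node.tag} is not supported; replacing with {{}}")
    return {}

for tag in ("!include", "!include_dir_merge_named", "!include_dir_merge_list"):
    yaml.SafeLoader.add_constructor(tag, _stub_include)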

FileNotFoundError using !include

When running All Cells, I get the following error:

FileNotFoundError: [Errno 2] No such file or directory: '/config/includes/lights/includes/lights/lights.yaml'

The lights.yaml file is in /config/includes/lights, so it seems the folder is added twice.

The relevant config.yaml part

automation: !include_dir_merge_list automation/
hue: !include includes/lights/hue.yaml
light: !include includes/lights/lights.yaml

I also got the message YAML tag !include_dir_merge_list is not supported, but I guess that's not an issue.

Also query events table

Currently detective only queries the states table, but we also want to query the events table. We need to decide how to aggregate the two datasets.
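
A sketch of an events query mirroring the existing per-entity counts from the states table (column names per the standard Home Assistant schema):

query = """
    SELECT event_type, COUNT(*)
    FROM events
    GROUP BY event_type
    ORDER BY 2 DESC
"""
# response = db.perform_query(query)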
