neuralnilm_prototype's Introduction

Neural NILM Prototype

Early prototype for the Neural NILM (non-intrusive load monitoring) software. This software will be completely re-written as the Neural NILM project.

This is the software that was used to run the experiments for our Neural NILM paper.

Note that Neural NILM Prototype is completely unsupported and is a bit of a mess!

If you really want to re-implement my Neural NILM ideas, then I recommend that you start from scratch using a modern deep learning framework like TensorFlow. Honestly, it will be easier in the long run!

Directories:

  • neuralnilm contains re-usable library code
  • scripts contains runnable experiments
  • notebooks contains IPython Notebooks (mostly for testing stuff out)

The script which specified the experiments I ran in my paper is e567.py.

(It's a pretty horrible bit of code! Written in a rush!) In that script, you can see the SEQ_LENGTH for each appliance and the N_SEQ_PER_BATCH (the number of training examples per batch). Basically, the sequence length varied from 128 (for the kettle) up to 1536 (for the dish washer). And the number of sequences per batch was usually 64, although I had to reduce that to 16 for the RNN for the longer sequences.
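This is not the real e567.py code, just a sketch of the numbers quoted above (the 512-timestep cut-off for "longer sequences" is my guess, not a value from the script):

```python
# Illustrative sketch of the per-appliance configuration described above.
SEQ_CONFIG = {
    # appliance: (SEQ_LENGTH, N_SEQ_PER_BATCH)
    'kettle':      (128,  64),
    'dish washer': (1536, 64),
}

def n_seq_per_batch(architecture, seq_length):
    """Batch size was usually 64, reduced to 16 for the RNN on long
    sequences (the 512 threshold here is an assumption)."""
    if architecture == 'rnn' and seq_length > 512:
        return 16
    return 64
```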

The nets took a long time to train (I don't remember exactly how long, but it was of the order of one day per net per appliance). You can see exactly how long I trained each net in that e567.py script: look at the def net_dict_<architecture> functions and look for epochs (that's actually the number of batches, not epochs, given to the net during training). It's 300,000 for the rectangles net, 100,000 for the AE and 10,000 for the RNN (the RNN was a lot slower to train). I chose these numbers because the nets appeared to stop learning after this number of training iterations.
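The training-iteration counts above, as a sketch (remember these values are called epochs in e567.py but actually count batches):

```python
# Number of training batches per architecture, as described above.
N_TRAINING_BATCHES = {
    'rectangles': 300_000,
    'ae':         100_000,
    'rnn':         10_000,  # the RNN was much slower per batch
}
```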

neuralnilm_prototype's People

Contributors

  • jackkelly


neuralnilm_prototype's Issues

Train from multiple datasets at different sample rates

  • Using hierarchical subsampling?
  • "Manual" way of doing it: get time-of-day distributions from large but low-sample-rate datasets, then use those to generate infinite amounts of training data by combining tracebase waveforms

Could try to build a single system which can cope with many different sample rates, from kHz to hourly data. The top few layers are common (and hence, if they are recurrent, might need to output at a sample rate which is the lowest common sample rate across all datasets, say 2 minutes), but the bottom layers are replaced for different sample rates. I suppose this is a form of 'transfer learning', except I want bi-directional transfer. Maybe we need to concurrently train multiple parallel lower layers (each for a different 'type' of input, e.g. kHz, 1 second, 10 second, 1 minute, 2 minute, 15 minute, with or without reactive power, etc.), all connected to a single upper layer. Training would be interleaved. This would perhaps force the upper layer to learn the common properties. See the NIPS 2014 paper on multimodal deep learning.
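A hedged sketch of the parallel-lower-layers idea: each branch resamples its own input rate down to the common rate (say one value per 2 minutes) before a single shared upper layer. All names and numbers here are illustrative, not from the repo:

```python
import numpy as np

COMMON_PERIOD_SECS = 120  # the "lowest common sample rate" from the text

def lower_branch(x, sample_period_secs):
    """Per-rate lower layer: here just average-pooling a 1-D power
    series down to the common 2-minute rate."""
    factor = COMMON_PERIOD_SECS // sample_period_secs
    n = (len(x) // factor) * factor
    return x[:n].reshape(-1, factor).mean(axis=1)

def shared_upper_layer(z, w, b):
    """A single dense layer shared across all branches."""
    return np.tanh(z @ w + b)

# Two datasets at different sample rates map into the same upper layer:
x_1s  = np.ones(1200)          # 20 minutes of 1-second data
x_10s = np.ones(120)           # 20 minutes of 10-second data
z1 = lower_branch(x_1s, 1)     # 10 two-minute values
z2 = lower_branch(x_10s, 10)   # 10 two-minute values
```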

NILM Metrics

  • Compute NILM metrics on every validation cycle
  • Create benchmarks using FHMM and CO and Hart on, say, 1 month of data. Use this to compare against NeuralNILM
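As one example of what such metrics might compute, here is a minimal sketch of the proportion of total energy correctly assigned (a standard NILM metric, not code from this repo):

```python
import numpy as np

def proportion_energy_correct(y_true, y_pred):
    """Proportion of total energy correctly assigned.
    y_true, y_pred: arrays of shape (n_appliances, n_timesteps)."""
    error = np.abs(y_true - y_pred).sum()
    total = y_true.sum()
    return 1.0 - error / (2.0 * total)
```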

Pipeline for processing inputs and outputs

  • modify Source superclass so it can take an input_preprocessing=[] argument
  • Each Preprocessing class has:
    • input: the data source
    • get_output_shape(): method to allow Source to figure out the final output shape
    • get_output()
  • pass input data through this pipeline, and modify n_inputs and seq_length accordingly.
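A sketch of how that pipeline might fit together; the class and method names follow the bullets above, but the Downsample stage and everything else is illustrative:

```python
class Preprocessing:
    def __init__(self, input=None):
        self.input = input  # the data source (or the previous stage)

    def get_output_shape(self):
        """Lets Source figure out the final output shape."""
        return self.input.get_output_shape()

    def get_output(self):
        return self.input.get_output()

class Downsample(Preprocessing):
    """Example stage: keep every `factor`-th sample."""
    def __init__(self, factor, input=None):
        super().__init__(input)
        self.factor = factor

    def get_output_shape(self):
        n_inputs, seq_length = self.input.get_output_shape()
        return n_inputs, seq_length // self.factor

    def get_output(self):
        return self.input.get_output()[::self.factor]

class Source:
    def __init__(self, n_inputs, seq_length, input_preprocessing=()):
        self.n_inputs = n_inputs
        self._raw_seq_length = seq_length
        # chain each preprocessing stage onto the previous one
        prev = self
        for stage in input_preprocessing:
            stage.input = prev
            prev = stage
        self._pipeline = prev
        # modify n_inputs and seq_length to match the pipeline's output
        self.n_inputs, self.seq_length = self._pipeline.get_output_shape()

    def get_output_shape(self):
        return self.n_inputs, self._raw_seq_length

    def get_output(self):
        return [0.0] * self._raw_seq_length  # stand-in for real data
```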

curriculum training

Train first on synthetic data where we just have, say, 500 timesteps, half of which are the TV. Mix with a combination of a small number of other appliances and nothing. Ignore time-of-day. Then, once these weights are trained, maybe add in time-of-day, possibly feeding into an upper layer.
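A toy sketch of that first curriculum stage, following the numbers in the text (500 timesteps, half the sequences containing the TV, a small number of distractor appliances or nothing); the rectangular waveforms and power values are made up:

```python
import numpy as np

rng = np.random.default_rng(42)
SEQ_LENGTH = 500

def synthetic_sequence(tv_power=100.0, max_distractors=2):
    """Return (aggregate, target) for one synthetic training example."""
    target = np.zeros(SEQ_LENGTH)
    if rng.random() < 0.5:  # half of all sequences contain the TV
        start = rng.integers(0, SEQ_LENGTH - 50)
        target[start:start + 50] = tv_power
    aggregate = target.copy()
    # mix in between zero and max_distractors other appliances
    for _ in range(rng.integers(0, max_distractors + 1)):
        start = rng.integers(0, SEQ_LENGTH - 30)
        aggregate[start:start + 30] += rng.integers(50, 2000)
    return aggregate, target
```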

Fix NaNs during training on 5-appliances

Ideas:

  • Find a minimal example which still fails (something that trains fast)
  • Is it:
    • The new nntools code (check by re-running e82)
    • dodgy data (check using old_X and old_y)
    • initialisation?
    • learning rate?
    • different learning algorithm?
    • try concat layer (see this comment)
    • Different cost function.
    • doesn't like producing 5 outputs?
      try single-appliance experts.
    • smaller seq length?
    • smaller batch size?
    • doesn't like range of targets (maybe shouldn't get right to 1?)
    • quantized inputs
    • do we get NaNs if we remove the Conv1D layers?
    • do we get NaNs if we swap from BLSTM to LSTM?
    • do we get NaNs if we remove (B)LSTM layers?
  • Hacks:
    • Save, say, last 5 training examples. When we hit a bad patch, examine these examples.
    • Save network weights. If we get a bad bit of training then revert network to, say, 5 steps ago.
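The second hack could look something like this sketch; `train_step` and `get_weights`/`set_weights` are stand-ins for whatever the real training loop exposes:

```python
import collections
import math

def train_with_rollback(net, n_steps, history=5):
    """Snapshot the last `history` sets of weights; on a NaN cost,
    revert the network to the oldest snapshot (~5 steps ago)."""
    snapshots = collections.deque(maxlen=history)
    for _ in range(n_steps):
        snapshots.append(net.get_weights())
        cost = net.train_step()
        if math.isnan(cost):
            net.set_weights(snapshots[0])
```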

MultiModel disaggregation

  • Dr Alexander Gluhak (from Intel ICRI) is interested in multi-modal disaggregation (e.g. using both water and energy). Maybe it would be fairly simple to extend a neural net to use multi-modal data?
  • Use weather data from my house
  • AMPds has water data, I think. Some other datasets might, too.

Use two networks: one outputs boolean for if the appliance is on or off

Use two networks: one outputs a boolean for whether the appliance is on or off. The other takes two inputs (the aggregate power demand, and whether the appliance is on or off) and outputs the disaggregated power demand. Try doing this as two separate networks (trained separately) and also as a single network (where a hidden layer gives the boolean on/off signal).
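A toy sketch of the two-network variant, with trivial stand-in "networks" (a threshold and a gated clip) in place of trained nets, just to show the data flow:

```python
import numpy as np

def net_a(aggregate, threshold=20.0):
    """First network: boolean on/off per timestep (here, a threshold)."""
    return aggregate > threshold

def net_b(aggregate, is_on, appliance_power=100.0):
    """Second network: disaggregated power demand, gated by on/off
    (here, the aggregate clipped to a made-up appliance power)."""
    return np.where(is_on, np.minimum(aggregate, appliance_power), 0.0)

aggregate = np.array([0.0, 150.0, 80.0, 5.0])
disag = net_b(aggregate, net_a(aggregate))
```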

Inputs and representation

Representation:

  • quantized
    • one-hot
    • many-hot
    • signed (if using fdiff)

Inputs:

  • fdiff
  • two inputs: t and t+1 (so network can do diff on its own)
  • time of day
  • day of week / business day
  • time of year
  • country
  • weather (either from dataset or from met office)
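A sketch of how a few of those inputs might be built from a raw power series; the cyclical time-of-day encoding and the Monday=0 convention are my assumptions:

```python
import numpy as np

def build_inputs(power, hour_of_day, day_of_week):
    """power: 1-D array; hour_of_day: 0-23; day_of_week: 0=Monday."""
    fdiff = np.diff(power, prepend=power[0])           # forward difference
    pairs = np.stack([power, np.roll(power, -1)], 1)   # t and t+1 (wraps at end)
    # encode time of day cyclically so 23:00 is close to 00:00
    tod_sin = np.sin(2 * np.pi * hour_of_day / 24.0)
    tod_cos = np.cos(2 * np.pi * hour_of_day / 24.0)
    business_day = float(day_of_week < 5)
    return fdiff, pairs, (tod_sin, tod_cos), business_day
```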

More realistic data

  • use more appliances from house 1
  • appliances from other houses
  • train on one set of houses, test on another

Comparisons for paper

  • compare shallow net using my hand engineered feature detectors as inputs vs deep net without hand engineered detectors. Could try getting a net to just reproduce each of my feature detectors.
  • Compare RNN vs convnet.

1D convnet input

  • try sigmoid activation for Conv1D layer
  • Try ReLU (it appears that I can't get ReLU to work in any place in the network! I've tried in the Conv1DLayer and the DenseLayers)
  • convnet (stride 1), then LSTM
  • Try using input in range from minus 1 to plus 1
  • convnet (stride 1), convnet (stride 1), then LSTM
  • Try convnet (stride 1), then fully connected, then LSTM.
  • layerwise pretraining for the ConvNet layers (autoencoder) #4
  • Try larger range
  • Read about initialisation for rectifier units
  • Try with no LSTM layer and boolean targets.
  • Convnets should be easy to visualise
  • Try a purely feed-forward network using 1D convolutions and pooling. Zhang & LeCun, Text Understanding from Scratch (2015), is a fascinating example of this working well. Also see LeCun 1998.
  • Use linear space to init weights and bias for bottom layer?
  • #17

single-directional HSLSTM

Wait until appliance has finished its run, then output the total energy usage for that appliance (in one go). Or wait until end of sequence (but that requires long memory).

Mixture density network

Try getting the net to parameterise a mixture of Gaussians for probabilistic output. Maybe also output appliance estimates as a proportion of total energy.
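A minimal sketch of the mixture-density-network output layer (following Bishop's classic MDN formulation, not code from this repo): the net emits 3K unconstrained values per timestep, which are squashed into mixing coefficients, means, and standard deviations of K Gaussians:

```python
import numpy as np

def mdn_params(raw):
    """raw: array of shape (..., 3 * k) of unconstrained net outputs.
    Returns mixing coefficients pi, means mu, std devs sigma."""
    pi_logits, mu, log_sigma = np.split(raw, 3, axis=-1)
    # softmax so the mixing coefficients sum to 1
    e = np.exp(pi_logits - pi_logits.max(axis=-1, keepdims=True))
    pi = e / e.sum(axis=-1, keepdims=True)
    sigma = np.exp(log_sigma)  # strictly positive
    return pi, mu, sigma
```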

logging

Maybe output a CSV file with train cost, validation cost, NILM metrics, secs per epoch, network weights, etc. Start the log with the config of the network, then add a row for each training loop.
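That logging scheme might look like this sketch (the column names are illustrative; NILM metrics and weight statistics would just be extra columns):

```python
import csv
import io

FIELDS = ['iteration', 'train_cost', 'validation_cost', 'secs_per_epoch']

def write_log(rows, fileobj):
    """Write one header row, then one row per training loop."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)

buf = io.StringIO()
write_log([{'iteration': 0, 'train_cost': 1.2,
            'validation_cost': 1.5, 'secs_per_epoch': 3.1}], buf)
```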

Automate running multiple models

Each directory would be like this:

  • e92
    • e92.py (define experiment)
    • e92.h5 (costs, metrics, network weights etc)
    • e92_costs.png (multiple subplots: cross entropy, MSE, NILM metrics)
    • e92_estimates_1250epochs_3.png
  • Read .py scripts (one for each experiment) from a directory
  • Run each script in sequence.
    • Catch exceptions and log them
  • Need a better way to set max_appliance_power, on_duration, off_duration etc. Maybe these should go into the NILMTK metadata. Although, to start with, maybe just use a function to set the metadata manually from the experiment script. I'm also starting to think that we should use real aggregate data now. Our synthetic data doesn't include, for example, that the fridge turns on multiple times.
  • Output results from each experiment to an HDF5 file:
    • training costs
    • validation costs (need a standardised validation timeseries, make sure it has a section which is 'easy' i.e. just the appliances on their own)
    • NILM validation costs
    • network weights. Both for analysis / visualisation later and also to allow training to be restarted (if power is lost) and also to use the net.

Truncated back prop

Use small sequences (100). See page 9 in Graves 2014.

This looks like a good tutorial on backprop through time (BPTT) which I think is what Graves uses? The tutorial has a brief mention of truncating the gradient, although I think Graves does it in a different way.

Janczak. Identification of Nonlinear Systems Using Neural Networks and Polynomial Models. 2005 might also have some useful info

Does skaae mention this in his code or papers?

Does Graves do it on "micro batches" of 100 or does he do a sliding window? Reread Graves 2014. Check his book. Check the PhD thesis he cites. Check his code.

Perhaps it's as simple as splitting data into batches and have an option to allow activations to persist between sequences or between batches.
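That "splitting data into batches with persisting activations" idea could be sketched like this; the per-timestep recurrence is a toy stand-in for an LSTM step, and in a real framework the gradient would be cut at each chunk boundary:

```python
import numpy as np

CHUNK = 100  # truncation length, as in Graves 2014 (page 9)

def run_truncated(x, step, h0=0.0):
    """Run recurrence `step` over `x` in chunks of CHUNK timesteps,
    carrying the hidden state h across chunk boundaries."""
    h = h0
    outputs = []
    for start in range(0, len(x), CHUNK):
        # in a real framework, backprop would be truncated here and
        # `h` passed into the next chunk as a constant initial state
        for v in x[start:start + CHUNK]:
            h = step(h, v)
            outputs.append(h)
    return np.array(outputs), h
```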

Dealing with missing samples and different sample rates

  • maybe just set the network up so it gets a sample every second. For missing samples, we run the network but with no input (i.e. just recurrent inputs)
  • or we give a 'time index' input. This might be handled as a special input, somehow directly influencing the LSTM blocks???

force the outputs to sum to no more than the aggregate power demand at each time slice

Just as softmax allows outputs to form a categorical distribution, maybe there is some way to force the outputs to sum to no more than the aggregate power demand at each time slice. If the network outputs the proportion of energy per appliance then perhaps we can just use softmax. But what about unrecognised appliances? And can we still use mixture density models? And maybe getting it to estimate proportions won't work well if we use high-frequency voltage data. It also means that the output for every appliance would have to change whenever any appliance changes state, which seems like a bad idea.
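One hedged sketch of the softmax-proportions idea, with an extra "unrecognised appliances" slot so the named appliances can never sum to more than the aggregate:

```python
import numpy as np

def constrained_outputs(scores, aggregate):
    """scores: (n_appliances + 1,) unnormalised net outputs per timestep,
    where the last slot absorbs unrecognised appliances. Returns per-
    appliance power that is guaranteed to sum to <= aggregate."""
    e = np.exp(scores - scores.max())
    proportions = e / e.sum()          # softmax: sums to exactly 1
    return proportions[:-1] * aggregate
```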

Gradient clipping

Graves does this. See Graves 2014. I think skaae has implemented this in his code.
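For reference, one common form of gradient clipping (by global norm) looks like this; I don't know whether this matches skaae's exact implementation:

```python
import numpy as np

def clip_by_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their global L2 norm is at
    most max_norm; leave them unchanged if already within bounds."""
    norm = np.sqrt(sum((g ** 2).sum() for g in grads))
    if norm > max_norm:
        grads = [g * (max_norm / norm) for g in grads]
    return grads
```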

Target representation

  • energy usage for entire activation
  • rolling average
  • fdiff
  • raw power demand
  • quantized raw power demand / fdiff
  • classification:
    • on / off binary
    • multiple discrete states per appliance. Perhaps these could be found by a separate net whose aim is to reconstruct the ground truth from the ground truth (i.e. an autoencoder on the ground truth)

Related: #19
