neuralnilm_prototype's Introduction

Neural NILM Prototype

Early prototype for the Neural NILM (non-intrusive load monitoring) software. This software will be completely re-written as the Neural NILM project.

This is the software that was used to run the experiments for our Neural NILM paper.

Note that Neural NILM Prototype is completely unsupported and is a bit of a mess!

If you really want to re-implement my Neural NILM ideas, then I recommend that you start from scratch using a modern deep learning framework like TensorFlow. Honestly, it will be easier in the long run!

Directories:

  • neuralnilm contains re-usable library code
  • scripts contains runnable experiments
  • notebooks contains IPython Notebooks (mostly for testing stuff out)

The script which specified the experiments I ran in my paper is e567.py.

(It's a pretty horrible bit of code! Written in a rush!) In that script, you can see the SEQ_LENGTH for each appliance and the N_SEQ_PER_BATCH (the number of training examples per batch). Basically, the sequence length varied from 128 (for the kettle) up to 1536 (for the dish washer). And the number of sequences per batch was usually 64, although I had to reduce that to 16 for the RNN for the longer sequences.
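This is not the real e567.py code, just a sketch of the numbers quoted above (the 512-timestep cut-off for "longer sequences" is my guess, not a value from the script):

```python
# Illustrative sketch of the per-appliance configuration described above.
SEQ_CONFIG = {
    # appliance: (SEQ_LENGTH, N_SEQ_PER_BATCH)
    'kettle':      (128,  64),
    'dish washer': (1536, 64),
}

def n_seq_per_batch(architecture, seq_length):
    """Batch size was usually 64, reduced to 16 for the RNN on long
    sequences (the 512 threshold here is an assumption)."""
    if architecture == 'rnn' and seq_length > 512:
        return 16
    return 64
```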

The nets took a long time to train (I don't remember exactly how long, but it was of the order of one day per net per appliance). You can see exactly how long I trained each net in that e567.py script: look at the def net_dict_<architecture> functions and look for epochs (that's actually the number of batches, not epochs, given to the net during training). It's 300,000 for the rectangles net, 100,000 for the AE and 10,000 for the RNN (the RNN was a lot slower to train). I chose these numbers because the nets appeared to stop learning after this number of training iterations.
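The training-iteration counts above, as a sketch (remember these values are called epochs in e567.py but actually count batches):

```python
# Number of training batches per architecture, as described above.
N_TRAINING_BATCHES = {
    'rectangles': 300_000,
    'ae':         100_000,
    'rnn':         10_000,  # the RNN was much slower per batch
}
```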

neuralnilm_prototype's People

Contributors

  • jackkelly


neuralnilm_prototype's Issues

Train from multiple datasets at different sample rates

  • Using hierarchical subsampling?
  • "Manual" way of doing it: get time-of-day distributions from large but low-sample-rate datasets, then use those to generate infinite amounts of training data by combining tracebase waveforms

Could try to build a single system which can cope with many different sample rates, from kHz to hourly data. The top few layers are common (and hence, if they are recurrent, might need to output at a sample rate which is the lowest common sample rate across all datasets, say 2 minutes), but the bottom layers are replaced for different sample rates. I suppose this is a form of 'transfer learning', except I want bi-directional transfer. Maybe we need to concurrently train multiple parallel lower layers (each for a different 'type' of input, e.g. kHz, 1 second, 10 second, 1 minute, 2 minute, 15 minute, with or without reactive power, etc.), all connected to a single upper layer. Training would be interleaved. This would perhaps force the upper layer to learn the common properties. See the NIPS 2014 paper on multimodal deep learning.
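A hedged sketch of the parallel-lower-layers idea: each branch resamples its own input rate down to the common rate (say one value per 2 minutes) before a single shared upper layer. All names and numbers here are illustrative, not from the repo:

```python
import numpy as np

COMMON_PERIOD_SECS = 120  # the "lowest common sample rate" from the text

def lower_branch(x, sample_period_secs):
    """Per-rate lower layer: here just average-pooling a 1-D power
    series down to the common 2-minute rate."""
    factor = COMMON_PERIOD_SECS // sample_period_secs
    n = (len(x) // factor) * factor
    return x[:n].reshape(-1, factor).mean(axis=1)

def shared_upper_layer(z, w, b):
    """A single dense layer shared across all branches."""
    return np.tanh(z @ w + b)

# Two datasets at different sample rates map into the same upper layer:
x_1s  = np.ones(1200)          # 20 minutes of 1-second data
x_10s = np.ones(120)           # 20 minutes of 10-second data
z1 = lower_branch(x_1s, 1)     # 10 two-minute values
z2 = lower_branch(x_10s, 10)   # 10 two-minute values
```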

NILM Metrics

  • Compute NILM metrics on every validation cycle
  • Create benchmarks using FHMM and CO and Hart on, say, 1 month of data. Use this to compare against NeuralNILM
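As one example of what such metrics might compute, here is a minimal sketch of the proportion of total energy correctly assigned (a standard NILM metric, not code from this repo):

```python
import numpy as np

def proportion_energy_correct(y_true, y_pred):
    """Proportion of total energy correctly assigned.
    y_true, y_pred: arrays of shape (n_appliances, n_timesteps)."""
    error = np.abs(y_true - y_pred).sum()
    total = y_true.sum()
    return 1.0 - error / (2.0 * total)
```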

Pipeline for processing inputs and outputs

  • modify Source superclass so it can take an input_preprocessing=[] argument
  • Each Preprocessing class has:
    • input: the data source
    • get_output_shape(): method to allow Source to figure out the final output shape
    • get_output()
  • pass input data through this pipeline, and modify n_inputs and seq_length accordingly.
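A sketch of how that pipeline might fit together; the class and method names follow the bullets above, but the Downsample stage and everything else is illustrative:

```python
class Preprocessing:
    def __init__(self, input=None):
        self.input = input  # the data source (or the previous stage)

    def get_output_shape(self):
        """Lets Source figure out the final output shape."""
        return self.input.get_output_shape()

    def get_output(self):
        return self.input.get_output()

class Downsample(Preprocessing):
    """Example stage: keep every `factor`-th sample."""
    def __init__(self, factor, input=None):
        super().__init__(input)
        self.factor = factor

    def get_output_shape(self):
        n_inputs, seq_length = self.input.get_output_shape()
        return n_inputs, seq_length // self.factor

    def get_output(self):
        return self.input.get_output()[::self.factor]

class Source:
    def __init__(self, n_inputs, seq_length, input_preprocessing=()):
        self.n_inputs = n_inputs
        self._raw_seq_length = seq_length
        # chain each preprocessing stage onto the previous one
        prev = self
        for stage in input_preprocessing:
            stage.input = prev
            prev = stage
        self._pipeline = prev
        # modify n_inputs and seq_length to match the pipeline's output
        self.n_inputs, self.seq_length = self._pipeline.get_output_shape()

    def get_output_shape(self):
        return self.n_inputs, self._raw_seq_length

    def get_output(self):
        return [0.0] * self._raw_seq_length  # stand-in for real data
```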

curriculum training

Train first on synthetic data where we just have, say, 500 timesteps, half of which are the TV. Mix with a combination of a small number of other appliances and nothing. Ignore time-of-day. Then, once these weights are trained, maybe add in time-of-day, possibly feeding into an upper layer.
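A toy sketch of that first curriculum stage, following the numbers in the text (500 timesteps, half the sequences containing the TV, a small number of distractor appliances or nothing); the rectangular waveforms and power values are made up:

```python
import numpy as np

rng = np.random.default_rng(42)
SEQ_LENGTH = 500

def synthetic_sequence(tv_power=100.0, max_distractors=2):
    """Return (aggregate, target) for one synthetic training example."""
    target = np.zeros(SEQ_LENGTH)
    if rng.random() < 0.5:  # half of all sequences contain the TV
        start = rng.integers(0, SEQ_LENGTH - 50)
        target[start:start + 50] = tv_power
    aggregate = target.copy()
    # mix in between zero and max_distractors other appliances
    for _ in range(rng.integers(0, max_distractors + 1)):
        start = rng.integers(0, SEQ_LENGTH - 30)
        aggregate[start:start + 30] += rng.integers(50, 2000)
    return aggregate, target
```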

Fix NaNs during training on 5-appliances

Ideas:

  • Find a minimal example which still fails (something that trains fast)
  • Is it:
    • The new nntools code (check by re-running e82)
    • dodgy data (check using old_X and old_y)
    • initialisation?
    • learning rate?
    • different learning algorithm?
    • try concat layer (see this comment)
    • Different cost function.
    • doesn't like producing 5 outputs?
      try single-appliance experts.
    • smaller seq length?
    • smaller batch size?
    • doesn't like range of targets (maybe shouldn't get right to 1?)
    • quantized inputs
    • do we get NaNs if we remove the Conv1D layers?
    • do we get NaNs if we swap from BLSTM to LSTM?
    • do we get NaNs if we remove (B)LSTM layers?
  • Hacks:
    • Save, say, last 5 training examples. When we hit a bad patch, examine these examples.
    • Save network weights. If we get a bad bit of training then revert network to, say, 5 steps ago.
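The second hack could look something like this sketch; `train_step` and `get_weights`/`set_weights` are stand-ins for whatever the real training loop exposes:

```python
import collections
import math

def train_with_rollback(net, n_steps, history=5):
    """Snapshot the last `history` sets of weights; on a NaN cost,
    revert the network to the oldest snapshot (~5 steps ago)."""
    snapshots = collections.deque(maxlen=history)
    for _ in range(n_steps):
        snapshots.append(net.get_weights())
        cost = net.train_step()
        if math.isnan(cost):
            net.set_weights(snapshots[0])
```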

MultiModel disaggregation

  • Dr Alexander Gluhak (from Intel ICRI) is interested in multi-modal disaggregation (e.g. using both water and energy). Maybe it would be fairly simple to extend a neural net to use multi-modal data?
  • Use weather data from my house
  • AMPds has water data, I think. Some other datasets might, too.

Use two networks: one outputs boolean for if the appliance is on or off

Use two networks: one outputs a boolean for whether the appliance is on or off. The other takes two inputs (the aggregate power demand, and whether the appliance is on or off) and outputs the disaggregated power demand. Try doing this as two separate networks (trained separately) and also as a single network (where a hidden layer gives the boolean on/off signal).
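A toy sketch of the two-network variant, with trivial stand-in "networks" (a threshold and a gated clip) in place of trained nets, just to show the data flow:

```python
import numpy as np

def net_a(aggregate, threshold=20.0):
    """First network: boolean on/off per timestep (here, a threshold)."""
    return aggregate > threshold

def net_b(aggregate, is_on, appliance_power=100.0):
    """Second network: disaggregated power demand, gated by on/off
    (here, the aggregate clipped to a made-up appliance power)."""
    return np.where(is_on, np.minimum(aggregate, appliance_power), 0.0)

aggregate = np.array([0.0, 150.0, 80.0, 5.0])
disag = net_b(aggregate, net_a(aggregate))
```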

Inputs and representation

Representation:

  • quantized
    • one-hot
    • many-hot
    • signed (if using fdiff)

Inputs:

  • fdiff
  • two inputs: t and t+1 (so network can do diff on its own)
  • time of day
  • day of week / business day
  • time of year
  • country
  • weather (either from dataset or from met office)
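A sketch of how a few of those inputs might be built from a raw power series; the cyclical time-of-day encoding and the Monday=0 convention are my assumptions:

```python
import numpy as np

def build_inputs(power, hour_of_day, day_of_week):
    """power: 1-D array; hour_of_day: 0-23; day_of_week: 0=Monday."""
    fdiff = np.diff(power, prepend=power[0])           # forward difference
    pairs = np.stack([power, np.roll(power, -1)], 1)   # t and t+1 (wraps at end)
    # encode time of day cyclically so 23:00 is close to 00:00
    tod_sin = np.sin(2 * np.pi * hour_of_day / 24.0)
    tod_cos = np.cos(2 * np.pi * hour_of_day / 24.0)
    business_day = float(day_of_week < 5)
    return fdiff, pairs, (tod_sin, tod_cos), business_day
```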

More realistic data

  • use more appliances from house 1
  • appliances from other houses
  • train on one set of houses, test on another

Comparisons for paper

  • compare shallow net using my hand engineered feature detectors as inputs vs deep net without hand engineered detectors. Could try getting a net to just reproduce each of my feature detectors.
  • Compare RNN vs convnet.

1D convnet input

  • try sigmoid activation for Conv1D layer
  • Try ReLU (it appears that I can't get ReLU to work in any place in the network! I've tried in the Conv1DLayer and the DenseLayers)
  • convnet (stride 1), then LSTM
  • Try using input in range from minus 1 to plus 1
  • convnet (stride 1), convnet (stride 1), then LSTM
  • Try convnet (stride 1), then fully connected, then LSTM.
  • layerwise pretraining for the ConvNet layers (autoencoder) #4
  • Try larger range
  • Read about initialisation for rectifier units
  • Try with no LSTM layer and boolean targets.
  • Convnets should be easy to visualise
  • Try a purely feed-forward network using 1D convolutions and pooling. Zhang & LeCun, Text Understanding from Scratch (2015), is a fascinating example of this working well. Also see LeCun 1998.
  • Use linear space to init weights and bias for bottom layer?
  • #17

single-directional HSLSTM

Wait until appliance has finished its run, then output the total energy usage for that appliance (in one go). Or wait until end of sequence (but that requires long memory).

Mixture density network

Try getting the net to parameterise a mixture of Gaussians for probabilistic output. Maybe also output appliance estimates as a proportion of total energy.
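A minimal sketch of the mixture-density-network output layer (following Bishop's classic MDN formulation, not code from this repo): the net emits 3K unconstrained values per timestep, which are squashed into mixing coefficients, means, and standard deviations of K Gaussians:

```python
import numpy as np

def mdn_params(raw):
    """raw: array of shape (..., 3 * k) of unconstrained net outputs.
    Returns mixing coefficients pi, means mu, std devs sigma."""
    pi_logits, mu, log_sigma = np.split(raw, 3, axis=-1)
    # softmax so the mixing coefficients sum to 1
    e = np.exp(pi_logits - pi_logits.max(axis=-1, keepdims=True))
    pi = e / e.sum(axis=-1, keepdims=True)
    sigma = np.exp(log_sigma)  # strictly positive
    return pi, mu, sigma
```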

logging

Maybe output a CSV file with train cost, validation cost, NILM metrics, secs per epoch, network weights, etc. Start the log with the config of the network, then add a row for each training loop.
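That logging scheme might look like this sketch (the column names are illustrative; NILM metrics and weight statistics would just be extra columns):

```python
import csv
import io

FIELDS = ['iteration', 'train_cost', 'validation_cost', 'secs_per_epoch']

def write_log(rows, fileobj):
    """Write one header row, then one row per training loop."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)

buf = io.StringIO()
write_log([{'iteration': 0, 'train_cost': 1.2,
            'validation_cost': 1.5, 'secs_per_epoch': 3.1}], buf)
```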

Automate running multiple models

Each directory would be like this:

  • e92
    • e92.py (define experiment)
    • e92.h5 (costs, metrics, network weights etc)
    • e92_costs.png (multiple subplots: cross entropy, MSE, NILM metrics)
    • e92_estimates_1250epochs_3.png
  • Read .py scripts (one for each experiment) from a directory
  • Run each script in sequence.
    • Catch exceptions and log them
  • Need a better way to set max_appliance_power, on_duration, off_duration etc. Maybe these should go into the NILMTK metadata. Although, to start with, maybe just use a function to set the metadata manually from the experiment script. I'm also starting to think that we should use real aggregate data now. Our synthetic data doesn't include, for example, that the fridge turns on multiple times.
  • Output results from each experiment to an HDF5 file:
    • training costs
    • validation costs (need a standardised validation timeseries, make sure it has a section which is 'easy' i.e. just the appliances on their own)
    • NILM validation costs
    • network weights. Both for analysis / visualisation later and also to allow training to be restarted (if power is lost) and also to use the net.

Truncated back prop

Use small sequences (100). See page 9 in Graves 2014.

This looks like a good tutorial on backprop through time (BPTT) which I think is what Graves uses? The tutorial has a brief mention of truncating the gradient, although I think Graves does it in a different way.

Janczak. Identification of Nonlinear Systems Using Neural Networks and Polynomial Models. 2005 might also have some useful info

Does skaae mention this in his code or papers?

Does Graves do it on "micro batches" of 100 or does he do a sliding window? Reread Graves 2014. Check his book. Check the PhD thesis he cites. Check his code.

Perhaps it's as simple as splitting data into batches and have an option to allow activations to persist between sequences or between batches.
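That "splitting data into batches with persisting activations" idea could be sketched like this; the per-timestep recurrence is a toy stand-in for an LSTM step, and in a real framework the gradient would be cut at each chunk boundary:

```python
import numpy as np

CHUNK = 100  # truncation length, as in Graves 2014 (page 9)

def run_truncated(x, step, h0=0.0):
    """Run recurrence `step` over `x` in chunks of CHUNK timesteps,
    carrying the hidden state h across chunk boundaries."""
    h = h0
    outputs = []
    for start in range(0, len(x), CHUNK):
        # in a real framework, backprop would be truncated here and
        # `h` passed into the next chunk as a constant initial state
        for v in x[start:start + CHUNK]:
            h = step(h, v)
            outputs.append(h)
    return np.array(outputs), h
```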

Dealing with missing samples and different sample rates

  • maybe just set the network up so it gets a sample every second. For missing samples, we run the network but with no input (i.e. just recurrent inputs)
  • or we give a 'time index' input. This might be handled as a special input, somehow directly influencing the LSTM blocks???

force the outputs to sum to no more than the aggregate power demand at each time slice

Just as softmax allows outputs to form a categorical distribution, maybe there is some way to force the outputs to sum to no more than the aggregate power demand at each time slice. If the network outputs the proportion of energy per appliance then perhaps we can just use softmax. But what about unrecognised appliances? And can we still use mixture density models? And maybe getting it to estimate proportions won't work well if we use high-frequency voltage data. It also means that the output for every appliance would have to change whenever any appliance changes state, which seems like a bad idea.
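One hedged sketch of the softmax-proportions idea, with an extra "unrecognised appliances" slot so the named appliances can never sum to more than the aggregate:

```python
import numpy as np

def constrained_outputs(scores, aggregate):
    """scores: (n_appliances + 1,) unnormalised net outputs per timestep,
    where the last slot absorbs unrecognised appliances. Returns per-
    appliance power that is guaranteed to sum to <= aggregate."""
    e = np.exp(scores - scores.max())
    proportions = e / e.sum()          # softmax: sums to exactly 1
    return proportions[:-1] * aggregate
```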

Gradient clipping

Graves does this. See Graves 2014. I think skaae has implemented this in his code.
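For reference, one common form of gradient clipping (by global norm) looks like this; I don't know whether this matches skaae's exact implementation:

```python
import numpy as np

def clip_by_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their global L2 norm is at
    most max_norm; leave them unchanged if already within bounds."""
    norm = np.sqrt(sum((g ** 2).sum() for g in grads))
    if norm > max_norm:
        grads = [g * (max_norm / norm) for g in grads]
    return grads
```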

Target representation

  • energy usage for entire activation
  • rolling average
  • fdiff
  • raw power demand
  • quantized raw power demand / fdiff
  • classification:
    • on / off binary
    • multiple discrete states per appliance. Perhaps these could be found by a separate net whose aim is to reconstruct the ground truth from the ground truth (i.e. an autoencoder on the ground truth)

Related: #19
