Input Boosting
Why
In the context of Support Vector Machines, adding derived features to the dataset can drastically improve the results of the optimization. In some cases it is an alternative to using a kernel.
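To make the claim concrete, here is a minimal sketch. The toy data, the labels and the hand-picked threshold are assumptions for illustration only. The one-dimensional points cannot be split by a single threshold on x, but once x^2 is appended as a second feature, a linear rule on that new feature separates the two classes.

```javascript
// Toy 1-D dataset (hypothetical): +1 far from the origin, -1 near it.
const points = [[-3], [-2], [-1], [0], [1], [2], [3]];
const labels = [1, 1, -1, -1, -1, 1, 1];

// Boost each point with the square of its first feature: [x] -> [x, x^2].
const boosted = points.map((p) => [...p, p[0] ** 2]);

// In the boosted space a linear rule on the new feature is enough:
// predict +1 when x^2 > 2.5 (threshold picked by hand for this toy data).
const predict = (p) => (p[1] > 2.5 ? 1 : -1);

const allCorrect = boosted.every((p, i) => predict(p) === labels[i]);
console.log(allCorrect); // true
```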
How
Boosting is a simple operation:
- extract one data point with all its features, t0:
t0=[x0,x1,...,xn]
- choose a function f:
f(x0,x1,...,xn)
that depends on one or more features and whose output is a set of new features t=[y0,...,ym], with 1<=|t|<=|t0|
- replace the data point with the concatenation of its previous features, t0, and the ones just produced by f, t
Example
// initial dataset
let dataset = [ [1,2], [2,3], [-2,-4], [-1,0] ];
// boosting function: appends the square of the first feature (mutates v in place)
let f = (v) => v.push( Math.pow(v[0],2) );
// boosting the dataset
dataset.forEach( data => f(data));
// output
// dataset = [ [1,2,1], [2,3,4], [-2,-4,4], [-1,0,1] ];
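The example above mutates each data point in place. As a hedged alternative, the sketch below follows the three steps from the How section more literally: f takes t0 and returns the new features t, and the result is the concatenation of both. The names `boost` and `squares` are hypothetical and not part of the project.

```javascript
// Hypothetical helper: returns a new dataset and leaves the input untouched.
// f receives the original features t0 and returns the new features t.
const boost = (data, f) => data.map((t0) => [...t0, ...f(t0)]);

const original = [[1, 2], [2, 3], [-2, -4], [-1, 0]];

// Boosting function producing two features: x0 squared and x1 squared.
const squares = ([x0, x1]) => [x0 ** 2, x1 ** 2];

const boostedData = boost(original, squares);
console.log(boostedData);
// [ [1,2,1,4], [2,3,4,9], [-2,-4,4,16], [-1,0,1,0] ]
```

Returning a new array keeps the raw dataset available, which is convenient when the same data has to be boosted with different functions.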
Where
The boosting operations must not reside inside a machine learning module. They have to live in an ES6 module of their own, in order to achieve separation of concerns, modularity and cleaner code.
Example:
export const booster = function(){};
booster.prototype = {
boost: function(data,f){
...
},
getBoostingFunctions: function(){
...
},
// list of boosting functions
};
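The skeleton above leaves the bodies elided. Purely as an illustration, one way they could be filled in is sketched below. The `functions` registry, the `square` entry and the return shapes are assumptions, not the project's actual implementation, and the `export` keyword is omitted so the snippet runs as a plain script.

```javascript
// Hypothetical sketch; in the real ES6 module this constructor would be exported.
const booster = function () {
  // Registry of named boosting functions; each maps t0 to the new features t.
  this.functions = {
    square: (t0) => [t0[0] ** 2],
  };
};

booster.prototype = {
  // Returns a boosted copy of a single data point: t0 followed by f(t0).
  boost: function (data, f) {
    return [...data, ...f(data)];
  },
  // Returns the names of the available boosting functions.
  getBoostingFunctions: function () {
    return Object.keys(this.functions);
  },
};

const b = new booster();
console.log(b.getBoostingFunctions());            // [ 'square' ]
console.log(b.boost([2, 3], b.functions.square)); // [ 2, 3, 4 ]
```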
When
There are 2 distinct moments when boosting functions need to be known:
- Training phase: the manager module feeds the training phase with the boosted data whenever we tell it to.
- Prediction phase: this one is a bit trickier. Prediction is not the manager's role; it is handled by the drawer prototype function.
As we can see, most of the work of managing data is done by the manager class, except for prediction.
A quick fix would be to move the responsibility for prediction from drawer to manager. The only thing the drawer class has to do to receive the predicted value is to send a "request" to the manager. The latter calculates the prediction, boosting the data if needed (it knows the boosting functions), and returns the value.
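The proposed flow can be sketched as follows. Every name here (`boostingFunctions`, `model`, `valueAt`) is hypothetical, and the "model" is a fixed hand-written rule standing in for a trained algorithm. The point is only that the drawer never touches the boosting functions.

```javascript
// All names below are hypothetical and only illustrate the proposed flow.
const manager = {
  boostingFunctions: [(t0) => [t0[0] ** 2]],
  // Stand-in for a trained model: a fixed linear rule on the boosted feature.
  model: { predict: (t) => (t[2] - 2.5 > 0 ? 1 : -1) },

  // Appends the output of every known boosting function to the point.
  boost(point) {
    return this.boostingFunctions.reduce((t, f) => [...t, ...f(point)], point);
  },
  // Single entry point for predictions: boost first, then ask the model.
  predict(point) {
    return this.model.predict(this.boost(point));
  },
};

const drawer = {
  // The drawer no longer predicts; it only requests the value from the manager.
  valueAt(point) {
    return manager.predict(point);
  },
};

console.log(drawer.valueAt([2, 3])); // 1
console.log(drawer.valueAt([1, 2])); // -1
```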
Who
The class that controls the datasets should also control when to boost them.
The Manager prototype allows us to easily set new data into the algorithm classes.
In this context it is reasonable to add another option: boosting.
If boosting is selected, the manager applies the selected boosting functions to the data. The algorithms are fed the boosted data and remain agnostic of the boosting operation.
The training phase is not the only one affected: the prediction phase also needs to know which boosting functions to apply, if any.
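A minimal sketch of such an option is shown below, assuming a `setData(dataset, options)` method and an `options.boosting` list of functions. Both are illustrative guesses rather than the project's real API. The manager remembers the functions chosen at training time and reuses them for prediction, while the algorithm only ever sees plain feature vectors.

```javascript
// Hypothetical sketch: method names and option shape are assumptions.
class Manager {
  constructor(algorithm) {
    this.algorithm = algorithm;
    this.boostingFunctions = []; // empty list means "no boosting"
  }
  // options.boosting is an optional list of boosting functions.
  setData(dataset, options = {}) {
    this.boostingFunctions = options.boosting || [];
    this.algorithm.train(dataset.map((p) => this.applyBoosting(p)));
  }
  applyBoosting(point) {
    return this.boostingFunctions.reduce((t, f) => [...t, ...f(point)], point);
  }
  // Prediction reuses the same functions that were selected for training.
  predict(point) {
    return this.algorithm.predict(this.applyBoosting(point));
  }
}

// Stand-in algorithm: records the feature count seen in training and
// reports whether a point passed to predict has the same shape.
const algorithm = {
  dims: 0,
  train(data) { this.dims = data[0].length; },
  predict(point) { return point.length === this.dims; },
};

const m = new Manager(algorithm);
m.setData([[1, 2], [2, 3]], { boosting: [(t0) => [t0[0] ** 2]] });
console.log(algorithm.dims);    // 3
console.log(m.predict([4, 5])); // true
```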