This repository contains an example of reinforcement learning.
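The function being optimized is f(x) = 1/(3 + (x-1)^2) for x <= 1.5 and f(x) = 1/(1 + (x-3)^2) for x > 1.5. The two pieces meet continuously at x = 1.5 (both equal 1/3.25), so the function has a local maximum of 1/3 at x = 1 and its global maximum of 1 at x = 3.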
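As an illustration, the objective can be written as a small Python function. This is a sketch of the formula above, not the repository's own code; the name `reward` is a hypothetical choice.

```python
def reward(x: float) -> float:
    """Piecewise objective: local peak 1/3 at x = 1, global peak 1 at x = 3."""
    if x <= 1.5:
        return 1.0 / (3.0 + (x - 1.0) * (x - 1.0))
    return 1.0 / (1.0 + (x - 3.0) * (x - 3.0))
```

Evaluating `reward(1.0)` and `reward(3.0)` shows the two peaks that make the problem interesting: a learner that only climbs locally can settle on the lower one.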
The policy is a Gaussian probability density with a fixed standard deviation and mean theta[0]*s + theta[1].
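A minimal sketch of how such a policy could be trained with a REINFORCE-style policy gradient follows. It assumes values the description does not give: `SIGMA = 0.5`, a fixed input `s = 1.0`, a learning rate of 0.05, and a moving-average baseline. The functions `policy_mean`, `log_prob`, `grad_log_prob` and `reinforce` are hypothetical names, and this is not necessarily the method used in the repository. The key step is the score function: for a Gaussian, d/dmu log N(a; mu, sigma^2) = (a - mu)/sigma^2, and since mu = theta[0]*s + theta[1], the chain rule gives gradients s*(a - mu)/sigma^2 and (a - mu)/sigma^2.

```python
import math
import random

SIGMA = 0.5  # assumed value for the fixed standard deviation


def reward(x: float) -> float:
    """Piecewise objective from the description."""
    if x <= 1.5:
        return 1.0 / (3.0 + (x - 1.0) ** 2)
    return 1.0 / (1.0 + (x - 3.0) ** 2)


def policy_mean(theta, s):
    """Mean of the Gaussian policy: theta[0]*s + theta[1]."""
    return theta[0] * s + theta[1]


def log_prob(a, theta, s, sigma=SIGMA):
    """Log-density of action a under N(policy_mean, sigma^2)."""
    mu = policy_mean(theta, s)
    return -0.5 * ((a - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def grad_log_prob(a, theta, s, sigma=SIGMA):
    """Gradient of log_prob with respect to [theta[0], theta[1]]."""
    g = (a - policy_mean(theta, s)) / sigma ** 2  # d log N / d mu
    return [g * s, g]  # chain rule: d mu/d theta0 = s, d mu/d theta1 = 1


def reinforce(theta, s=1.0, lr=0.05, episodes=2000, seed=0):
    """One-step REINFORCE with a moving-average baseline (assumed setup)."""
    rng = random.Random(seed)
    theta = list(theta)
    baseline = 0.0
    for _ in range(episodes):
        a = rng.gauss(policy_mean(theta, s), SIGMA)  # sample an action
        r = reward(a)
        advantage = r - baseline  # baseline reduces gradient variance
        grad = grad_log_prob(a, theta, s)
        theta = [t + lr * advantage * g for t, g in zip(theta, grad)]
        baseline += 0.05 * (r - baseline)  # exponential moving average of reward
    return theta


if __name__ == "__main__":
    theta = reinforce([0.0, 0.0])
    print("learned theta:", theta, "policy mean:", policy_mean(theta, 1.0))
```

Because `s` is held fixed in this sketch, the task reduces to a one-step bandit, and depending on the starting theta and SIGMA the policy mean may settle near the local peak at x = 1 rather than the global one at x = 3.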