PESTO: Parameter EStimation TOolbox, Bioinformatics, btx676, 2017.
Home Page: https://doi.org/10.1093/bioinformatics/btx676
License: BSD 3-Clause "New" or "Revised" License
It looks like there's a file missing for the GaussExample.
Currently only a few options are checked in PestoOptions.m
Pass PESTO options as class instances.
Therefore, create classes:
Should look something like this:
.obj_type ... type of objective function provided
= 'log-posterior' (default) ... algorithm assumes that
log-posterior or log-likelihood are provided and performs
a maximization of the objective function.
= 'negative log-posterior' ... algorithm assumes that negative
log-posterior or negative log-likelihood are provided and
performs a minimization of the objective function.
.comp_type ... type of computations
= 'sequential' (default) ... classical sequential (in core) method
= 'parallel' ... multi-core method exploiting parfor
.fmincon ... options for fmincon (the local optimizer)
.n_starts ... number of local optimizations (default = 20).
.init_threshold ... log-likelihood / log-posterior threshold for
initialization of optimization (default = -inf).
.proposal ... method used to propose starting points
= 'latin hypercube' (default) ... latin hypercube sampling
= 'uniform' ... uniform random sampling
= 'user-supplied' ... user supplied function parameters.init_fun
.rng ... initialization of random number generator (default = 0).
= any real number r => random number generator is initialized with r.
= [] ... random number generator is not initialized.
(Initializing the random number generator with a specific seed can be
helpful to reproduce problems.)
.mode ... output of algorithm
= 'visual' (default) ... plots are generated which show the progress
= 'text' ... optimization results for each multi-start are printed on screen
= 'silent' ... no output during the multi-start local optimization
.fh ... handle of figure in which results are printed. If no
handle is provided, a new figure is used.
.plot_options ... plot options for plotMultiStarts.m.
.save ... determine whether results are directly saved
= false (default) ... results are not saved
= true ... results are stored to an extra folder
.trace ... determine whether objective function, parameter values and
computation time are stored over iterations
= false (default) ... not saved
= true ... stored in fields par_trace, fval_trace and time_trace
.tempsave ... determine whether intermediate results are stored every
10 iterations
= false (default) ... not saved
= true ... results are stored to an extra folder
.foldername ... name of the folder in which results are stored.
If no folder is provided, a random foldername is generated.
.start_index ... vector of indices of the starts that should be performed.
default is 1:n_starts
.resetobjective ... clears the objective function before every
multi-start.
= false ... (default) persistent variables are preserved.
= true ... remove all temporary/persistent variables.
WHEN TRUE THIS OPTION REMOVES ALL OBJECTIVE FUNCTION BREAK POINTS
.optimizer ... specifies which optimizer should be used
= 'fmincon' ... (default) fmincon
= 'minibatch' ... uses a minibatch optimization approach
.optim_options ... struct with options for minibatch optimization
.isMinibatch ... logical: perform full batch or minibatch optim
= false ... deterministic optimization
= true ... minibatch optim, Obj function must be adapted
.nDatasets ... number of measurement points, only if isMinibatch
.nBatchdata ... Size of Minibatches, only if isMinibatch
.nOptimSteps ... number of maximum optimization steps
.model ... String with the model name for AMICI, may be left void
.method ... optimization method, to be chosen from
= 'standard' ... stochastic gradient descent, standard method
= 'momentum' ... sgd with momentum
= 'nesterov' ... sgd with Nesterov momentum function
= 'rmsprop' ... adaptive step size for each parameter
= 'rmspropnesterov' ... with additional momentum
= 'adam' ... adaptive method
= 'adadelta' ... adaptive method
.hyperparams ... struct containing the hyperparameters
(e.g. learning rate) for the opt-method, must fit with chosen
method (see documentation there)
.hold_on ... indicates whether plots are redrawn or whether something
is added to the plot
= false (default) ... new plot
= true ... extension of plot
.interval ... selection mechanism for x limits
= 'dynamic' (default) ... x limits depending on analysis results
= 'static' ... x limits depending on parameters.min and .max or on
user-defined bounds options.bounds.min and .max. The latter are
used if provided.
.bounds ... bounds used for visualization if options.interval = 'static'
.min ... lower bound
.max ... upper bound
.P ... options for profile plots
.plot_type ... plot type
= 0 (default) ... no plot
= 1 ... likelihood ratio
= 2 ... negative log-likelihood
.col ... color of profile lines (default: [1,0,0])
.lw ... line width of profile lines (default: 1.5)
.S ... options for sample plots
.plot_type ... plot type
= 0 (default if no samples are provided) ... no plot
= 1 (default if samples are provided) ... histogram
.col ... color of histogram (default: [0.7,0.7,0.7])
.bins ... number of histogram bins (default: 30)
= 'optimal' ... selection using Scott's rule
= 'conservative' ... selection using Scott's rule / 2
= N (with N being an integer) ... N bins
.MS ... options for multi-start optimization plots
.plot_type ... plot type
= 0 (default if no MS are provided) ... no plot
= 1 (default if MS are provided) ... likelihood ratio and
position of optima above threshold
= 2 ... negative log-likelihood and position of optima
above threshold
.col ... color of local optima (default: [1,0,0])
.lw ... line width of local optima (default: 1.5)
.A ... options for distribution approximation plots
.plot_type ... plot type
= 0 (default if no MS are provided) ... no plot
= 1 (default if MS are provided) ... likelihood ratio
= 2 ... negative log-likelihood
.col ... color of approximation lines (default: [0,0,1])
.lw ... line width of approximation lines (default: 1.5)
.boundary ... options for boundary visualization
.mark ... marking of profile points which are on the boundary
= 0 ... no visualization
= 1 (default) ... indicates points which are close to the
boundaries in one or more dimensions.
.eps ... minimal distance from the boundary below which points are
considered to be close to the boundary (default = 1e-4). Note
that a one-norm is used.
.CL ... options for confidence level plots
.plot_type ... plot type
= 0 (default) ... no plot
= 1 ... likelihood ratio
= 2 ... negative log-likelihood
.alpha ... visualized confidence level (default = 0.95)
.type ... type of confidence interval
= 'point-wise' (default) ... point-wise confidence interval
= 'simultanous' ... simultaneous confidence interval
= {'point-wise','simultanous'} ... both
.col ... color of profile lines (default: [0,0,0])
.lw ... line width of profile lines (default: 1.5)
.op2D ... options used for 2D plot to position subplot axes.
.b1 ... offset from left and bottom border (default = 0.15)
.b2 ... offset from left and bottom border (default = 0.02)
.r ... relative width of subplots (default = 0.95)
.add_points ... option used to add additional points, e.g. true
parameter in the case of test examples
.par ... n x m matrix of m additional points
.col ... color used for additional points (default = [0,0.8,0]).
This can also be a m x 3 matrix of colors.
.ls ... line style (default = '--').
.lw ... line width (default = 2).
.m ... marker style (default = 'd').
.ms ... marker size (default = 8).
.legend ... legend options
.color ... background color (default = 'none').
.box ... legend outline (default = 'on').
.orientation ... orientation of list (default = 'vertical')
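The class-based interface proposed above could then be used roughly like this (a sketch: PestoOptions exists in the repository, but the plotting-options class name PestoPlottingOptions and the exact field accesses are assumptions based on the option list above):

```matlab
% Sketch: setting options via class instances instead of raw structs
options = PestoOptions();          % validated defaults on construction
options.n_starts  = 100;           % number of local optimizations
options.comp_type = 'parallel';    % use parfor across starts
options.mode      = 'text';        % no figures, console output only

% Hypothetical plotting-options class collecting the fields listed above
plotOptions = PestoPlottingOptions();
plotOptions.interval   = 'static';
plotOptions.bounds.min = parameters.min;
plotOptions.bounds.max = parameters.max;
options.plot_options   = plotOptions;

parameters = getMultiStarts(parameters, objectiveFunction, options);
```

The main advantage over silently accepted struct fields is that a class setter can reject invalid values immediately, e.g. a negative n_starts or an unknown mode string.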
They do nothing but call plotParameterUncertainty.m.
Was there a specific motivation for having these extra functions?
I created a live script file (.mlx) for the enzymatic catalysis example. I think these live scripts are very cool, especially for explaining, but they have some disadvantages: debugging is more complicated (or barely possible), and I had the impression that they add computational overhead, though I didn't test this specifically... Moreover, the plotting interface works differently: plots are only displayed after they are finished, and refreshing and updating plots does not seem to work. So having ONLY a live script is maybe not the best idea, since compatibility is also not ideal...
I would suggest having for each example a live script AND a function version of the main file. The function is more flexible, the live script a (very) nice add-on. But PESTO should also work without them...
PESTO prints out numbers at the end of every profile calculation, probably some stray debug info?
We need to check the parts for the parallel computing toolbox on the scm
Add PSwarm interface to getMultiStarts.m:
http://www.norg.uminho.pt/aivaz/pswarm/
What are the different collectResults.m files? Which one should we use?
collectResults.m
collectResults.m-d1552287-482b-4c88-a3c7-e5c52d475cf7
collectResults.m-e232cfd4-c3d6-40d3-9086-3379c1d58234
Currently, the routines are fairly cluttered and non-modularized. E.g. plotParameterUncertainty could be split up into modules for '1D' and '2D' for optimization, profile and sampling results. This would also reduce redundancies with plotPropertyUncertainty, since it would use the same modules.
The naming should be adapted to ensure consistency. -> getKernelDensityEstimate or getKDE?
As discussed a while ago, we should at a certain point make the options completely object oriented.
Structs which are passed as arguments to those functions, but should be PestoOptions instances, should be automatically cast into PestoOptions.
Also, according to camelCase conventions, it should be called pestoOptions, not PestoOptions.
Maximum recursion limit of 500 reached.
Error in PestoOptions/checkVector (line 464)
this.(name) = value;
Please ensure that optimization can be run in -nojvm mode, at least if mode is set to 'text'.
Currently the options are distributed within 2 objects: The parameters struct and the options struct.
I suggest the following changes: The parameters-struct should cover results .MS, .P, .S exclusively, while the options-struct is meant to cover options in hierarchical fashion, e.g.
opt.parameters.max = [...]
opt.parameters.name = [...]
opt.sampling.nIterations = ...
opt.sampling.algorithm = 'PT'
opt.sampling.PT.alpha = ...
opt.sampling.PT.nTemps = ...
opt.sampling.plotting.plotType = '1D'
This increases the usability by making it more intuitive to set options. In addition, when dealing with many runs, loading options is very slow if they were stored together with large result objects (Sampling!) within one object.
This is due to getParameterSamples.m, lines 118-120:
opt.number = parameters.number;
opt.min = parameters.min;
opt.max = parameters.max;
getParameterSamples cannot be called with a PestoOptions instance, because these members do not exist (see runPestoTests).
(Why) is it required to copy these values into opt?
Matlab's default indentation scheme is to not indent the first level of code:
function [ output_args ] = Untitled2( input_args )
%UNTITLED2 Summary of this function goes here
% Detailed explanation goes here
output_args = 1;
end
I find that extremely unintuitive and in the case of multiple functions per file, it reduces readability significantly.
Therefore, I would like to indent all levels of code as in:
function [ output_args ] = Untitled2( input_args )
    %UNTITLED2 Summary of this function goes here
    %   Detailed explanation goes here
    output_args = 1;
end
(This can be done automatically by changing Preferences > MATLAB > Editor/Debugger > Language > Indenting
to Indent all functions
and then in the editor simply choosing smart indent
from the context menu.)
My suggestion would be to use this style only for new additions or during major reorganizations, so as not to ruin git blame output.
Any opinions?
I recently realized that getPropertySamples is very slow (43s for 2k sample iterations) in the conversionReaction example.
The largest portion (29s) of time goes into the 21 calls of plotPropertyUncertainty which updates the legend and histogram bars each time. Indeed, it updates every 100 MCMC iterations as coded in line 119 of getPropertySamples:
if (mod(j,100) == 0) || (j == length(properties.S.logPost))
I suggest changing it to 10%, 20%, ... of the iterations instead, or removing the sequential drawing entirely.
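A minimal sketch of the 10%-interval variant, replacing the condition quoted above (nIter is introduced here only for readability):

```matlab
% Update plots only at every 10% of the iterations instead of every
% 100 MCMC iterations, so plotting cost stays constant per run
nIter = length(properties.S.logPost);
if (mod(j, ceil(nIter/10)) == 0) || (j == nIter)
    % ... redraw legend and histogram bars ...
end
```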
Expected behavior:
The provided examples should not issue any warnings or errors.
Actual behavior:
Got the warnings shown below when running mainExampleGauss.m.
Progress: 100 %
ans =
Figure (8: plotParameterUncertainty) with properties:
Number: 8
Name: 'plotParameterUncertainty'
Color: [0.9400 0.9400 0.9400]
Position: [1000 918 560 420]
Units: 'pixels'
Show all properties
Warning: No valid values for sigma found! Not plotting approximation.
> In plotParameterUncertainty (line 613)
In plotParameterSamples (line 43)
Warning: No valid values for sigma found! Not plotting approximation.
> In plotParameterUncertainty (line 613)
In plotParameterSamples (line 43)
Warning: No valid values for sigma found! Not plotting approximation.
> In plotParameterUncertainty (line 613)
In plotParameterSamples (line 43)
Warning: No valid values for sigma found! Not plotting approximation.
> In plotParameterUncertainty (line 613)
In plotParameterSamples (line 43)
Warning: No valid values for sigma found! Not plotting approximation.
> In plotParameterUncertainty (line 613)
In plotParameterSamples (line 43)
Warning: No valid values for sigma found! Not plotting approximation.
> In plotParameterUncertainty (line 613)
In plotParameterSamples (line 43)
Warning: No valid values for sigma found! Not plotting approximation.
> In plotParameterUncertainty (line 613)
In plotParameterSamples (line 43)
Warning: No valid values for sigma found! Not plotting approximation.
> In plotParameterUncertainty (line 613)
In plotParameterSamples (line 43)
Warning: No valid values for sigma found! Not plotting approximation.
> In plotParameterUncertainty (line 613)
In plotParameterSamples (line 43)
ans =
Figure (1: plotParameterUncertainty) with properties:
Number: 1
Name: 'plotParameterUncertainty'
Color: [0.9400 0.9400 0.9400]
Position: [1000 918 560 420]
Units: 'pixels'
Show all properties
Following the discussion from yesterday, I would suggest shifting opt.obj_type into the parameters struct, since it is part of the problem formulation, as pointed out by Fabian.
E.g. in getMultistart.m:
Isn't the option .isMinibatch redundant with .optimizer?
isMinibatch = strcmp(optimizer, 'minibatch')
or not?
Can this routine be removed by exploiting the newer MATLAB axes property 'XTickLabelRotation'?
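A sketch of what the built-in replacement could look like (assumes MATLAB R2014b or newer, where axes expose this property; the tick labels are placeholders):

```matlab
% Rotate x tick labels without a custom rotation routine
plot(1:5, rand(1, 5));
set(gca, 'XTick', 1:5, ...
         'XTickLabel', {'k_1', 'k_2', 'k_3', 'k_4', 'k_5'}, ...
         'XTickLabelRotation', 45);
```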
We need to make sure that getParameterSamples also works without running a multistart local optimization method beforehand. Do we want getParameterProfiles also to work without information from parameters.MS?
Add figure('Name','some nice description') to identify plot windows more easily. Especially in the examples it is otherwise not at all clear what is happening in which of the 13 plotting windows.
First creates many warnings, then crashes leaving the error message:
Error using figure
Invalid convenience arg handle
Error in plotPropertyProfiles (line 132)
fh = figure(varargin{3});
Error in getPropertyProfiles (line 234)
case 'visual', fh = plotPropertyProfiles(properties,'1D',fh,options.property_index,options.plot_options); disp(str);
Error in main (line 206)
properties = getPropertyProfiles(properties,parameters,objectiveFunction,optionsProperties);
Shall we make review and approval of pull requests mandatory for PESTO? I think it would be a good idea, at least for master. It will help to recognize potential issues early on, it is likely to improve code organization and quality, and more people will feel responsible and in touch with the different parts of PESTO.
Undefined function or variable 't_cpu_fmincon'.
Error in getMultiStarts/outfun_fmincon (line 487)
Error in callAllOptimOutputFcns (line 13)
Error in barrier
Error in barrier
Error in barrier
Error in fmincon (line 796)
Error in getMultiStarts (line 228)
Perform optimization...Warning: Objective function has too many input arguments for the chosen optimization method.
> In getMultiStarts (line 236)
In main (line 82)
In run (line 96)
Do you want to pass additionally the whole dataset? Errors may occur... (y/n)
This is caused by the minibatch optimization.
Can we simply remove the "options" from the objective function?
At a certain point, we should change that, since the following problem can appear:
options1 = PestoOptions();
...do some assignments to options1
options2 = options1;
...do some assignments to options2
The latter assignments will also change the object in options1, which is counter-intuitive and can lead to wrong results.
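This happens because PestoOptions is a handle class, so assignment copies the reference, not the object. One possible fix (a sketch, not necessarily how PestoOptions should be changed) is to derive from matlab.mixin.Copyable and use copy() for explicit duplication:

```matlab
% Sketch: value-like copying for a handle class via matlab.mixin.Copyable
classdef PestoOptions < matlab.mixin.Copyable
    properties
        n_starts = 20;   % example property from the documentation above
    end
end

% Usage:
%   options1 = PestoOptions();
%   options2 = copy(options1);   % independent object, not an alias
%   options2.n_starts = 100;     % options1.n_starts stays 20
```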
When running the conversion reaction example runEstimation__CR
Undefined function or variable 'getParameterSamples'.
Error in runEstimation__CR (line 103)
parameters = getParameterSamples(parameters,logL,options_par);
In a final version, we should write all examples in such a way that the AMICI code is not needed as a whole package (mex functions are of course okay), i.e. comment out or cut all calls to amiwrap.
Remove /DRAM from PESTO repository, but keep DRAM-based routines and check for (external) DRAM availability where these functions have been used so far.
There are still two examples missing (sampling and the Pom1p)
Plotting 2D parameter distributions with >10^5 samples (nsimu_run > 10^6) takes a long time.
I think we should limit the maximum number of points to a reasonable number and use mvksdensity to color the plots by density.
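A sketch of the subsample-and-color idea (assumes the Statistics and Machine Learning Toolbox, R2016a or newer for mvksdensity, and a samples matrix with one parameter per column; the cap of 1e4 points is an arbitrary choice):

```matlab
% Subsample to at most 1e4 points and color the scatter plot by local density
nMax = 1e4;
idx  = randperm(size(samples, 1), min(nMax, size(samples, 1)));
X    = samples(idx, [1 2]);    % two parameter dimensions to plot
d    = mvksdensity(X, X);      % kernel density estimate at each retained point
scatter(X(:, 1), X(:, 2), 5, d, 'filled');
colorbar;
```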
PESTO can compute confidence intervals and, as far as I know (but please correct me if I'm wrong), also profiles for multimodal objective functions. Since this is a useful feature, we should demonstrate it in one example.
Are there any other useful features, which are not covered by examples or which I might not know so far?
E.g. there is extensive code duplication in getPropertyProfiles.m and getParameterProfiles.m, getParameterConfidenceIntervals.m and getPropertyConfidenceIntervals.m, ...
When generating default PESTO options, the fields .MCMC.nsimu_warmup and .MCMC.nsimu_run are empty, but they do indeed have default values (according to the documentation).
ans =
Figure (4: plotParameterUncertainty) with properties:
Number: 4
Name: 'plotParameterUncertainty'
Color: [0.9400 0.9400 0.9400]
Position: [1000 918 560 420]
Units: 'pixels'
Show all properties
Warning: No valid values for sigma found! Not plotting approximation.
> In plotParameterUncertainty (line 613)
In plotParameterSamples (line 43)
Warning: No valid values for sigma found! Not plotting approximation.
> In plotParameterUncertainty (line 613)
In plotParameterSamples (line 43)
Warning: No valid values for sigma found! Not plotting approximation.
> In plotParameterUncertainty (line 613)
In plotParameterSamples (line 43)
Warning: No valid values for sigma found! Not plotting approximation.
> In plotParameterUncertainty (line 613)
In plotParameterSamples (line 43)
ans =
Figure (5: plotParameterUncertainty) with properties:
Number: 5
Name: 'plotParameterUncertainty'
Color: [0.9400 0.9400 0.9400]
Position: [1000 918 560 420]
Units: 'pixels'
Show all properties
Reference to non-existent field 'MS'.
Error in plotConfidenceIntervals (line 156)
plot([pStruct.MS.par(iP,1), pStruct.MS.par(iP,1)], [j-0.4, j+0.4], 'k-', 'linewidth', 2);
Error in getParameterConfidenceIntervals (line 121)
plotConfidenceIntervals(parameters, alpha, [], options);
Former ToDo.txt:
todo:
This fits better here. The file will be removed.
Partially fixed in 63796a9.
However, the onCleanup callback is not called when Ctrl+C is hit during processing of some subroutine. Instead, the onCleanup callback is only called at the next invocation of getMultiStarts.
Workaround: delete(findall(0,'tag','TMWWaitbar'))
PestoOptions.m:37
Same for specificyConstraintGradient
The following functions do not come with examples and are not used by any other function:
I guess we should provide examples for all of them or remove the obsolete ones.
In mainErbBSignaling.m load('erbb_signaling_pnom.mat');
File probably not added to repository due to *.mat in .gitignore which is now removed.
Please add data.
The calls of obj_w_error_count (https://github.com/ICB-DCM/PESTO/blob/master/getMultiStarts.m#L615) and obj are not consistent (I think they should be).
Also, the function uses eval instead of feval. eval creates invisible temporary variables and should only be used if there really is no other way. You could do something like feval(varargin{2},varargin{1,4}) instead and also avoid the additional switch statement.
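A sketch of the difference (the argument positions follow the comment above and are an assumption about the actual call site in getMultiStarts.m):

```matlab
% eval builds and parses a command string; intermediate variables it creates
% are invisible to static analysis and easy to shadow by accident:
%   eval([varargin{2} '(varargin{4});']);

% feval calls the function by name or handle directly, with no string
% parsing and no invisible temporaries:
result = feval(varargin{2}, varargin{4});
```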
Add regression tests
I suggest removing logL__CR_with_H.m, since it contains pretty much the same code as logL__CR.m. Both provide the approximation of the Hessian.
Objections?
Before the first optimization run is finished, the waitbar reads "less than one minute", which is misleading, especially if the first run takes more than one minute. Probably better to change this to "unknown".
The loops at
https://github.com/ICB-DCM/PESTO/blob/master/plotMultiStarts.m#L100
and
https://github.com/ICB-DCM/PESTO/blob/master/plotMultiStarts.m#L154
are super slow for a large number of multistarts, because i is set to length(parameters.MS.logPost) and not to the number of finished multistarts.
Currently, these are structs, but in most routines we check for the existence of specific fields and then test whether they are empty. Classes would make this much nicer.