PESTO: Parameter EStimation TOolbox, Bioinformatics, btx676, 2017.
Home Page: https://doi.org/10.1093/bioinformatics/btx676
License: BSD 3-Clause "New" or "Revised" License
It looks like there's a file missing for the GaussExample.
Currently only a few options are checked in PestoOptions.m
Pass PESTO options as class instances.
Therefore, create classes:
Should look something like this:
.obj_type ... type of objective function provided
= 'log-posterior' (default) ... algorithm assumes that
log-posterior or log-likelihood are provided and performs
a maximization of the objective function.
= 'negative log-posterior' ... algorithm assumes that negative
log-posterior or negative log-likelihood are provided and
performs a minimization of the objective function.
.comp_type ... type of computations
= 'sequential' (default) ... classical sequential (in core) method
= 'parallel' ... multi-core method exploiting parfor
.fmincon ... options for fmincon (the local optimizer)
.n_starts ... number of local optimizations (default = 20).
.init_threshold ... log-likelihood / log-posterior threshold for
initialization of optimization (default = -inf).
.proposal ... method used to propose starting points
= 'latin hypercube' (default) ... latin hypercube sampling
= 'uniform' ... uniform random sampling
= 'user-supplied' ... user supplied function parameters.init_fun
.rng ... initialization of random number generator (default = 0).
= any real number r => random number generator is initialized with r.
= [] ... random number generator is not initialized.
(Initializing the random number generator with a specific seed can be
helpful to reproduce problems.)
.mode ... output of algorithm
= 'visual' (default) ... plots are generated which show the progress
= 'text' ... optimization results for each multi-start are printed on screen
= 'silent' ... no output during the multi-start local optimization
.fh ... handle of figure in which results are printed. If no
handle is provided, a new figure is used.
.plot_options ... plot options for plotMultiStarts.m.
.save ... determine whether results are directly saved
= false (default) ... results are not saved
= true ... results are stored to an extra folder
.trace ... determine whether objective function, parameter values and
computation time are stored over iterations
= false (default) ... not saved
= true ... stored in fields par_trace, fval_trace and time_trace
.tempsave ... determine whether intermediate results are stored every
10 iterations
= false (default) ... not saved
= true ... results are stored to an extra folder
.foldername ... name of the folder in which results are stored.
If no folder is provided, a random foldername is generated.
.start_index ... vector of indices of the starts that should be performed.
default is 1:n_starts
.resetobjective ... clears the objective function before every
multi-start.
= false ... (default) persistent variables are preserved.
= true ... remove all temporary/persistent variables.
WHEN TRUE THIS OPTION REMOVES ALL OBJECTIVE FUNCTION BREAK POINTS
.optimizer ... specifies which optimizer should be used
= 'fmincon' ... (default) fmincon
= 'minibatch' ... uses a minibatch optimization approach
.optim_options ... struct with options for minibatch optimization
.isMinibatch ... logical: perform full batch or minibatch optim
= false ... deterministic optimization
= true ... minibatch optim, Obj function must be adapted
.nDatasets ... number of measurement points, only if isMinibatch
.nBatchdata ... Size of Minibatches, only if isMinibatch
.nOptimSteps ... number of maximum optimization steps
.model ... String with the model name for AMICI, may be left void
.method ... optimization method, to be chosen from
= 'standard' ... stochastic gradient descent, standard method
= 'momentum' ... sgd with momentum
= 'nesterov' ... sgd with Nesterov momentum function
= 'rmsprop' ... adaptive step size for each parameter
= 'rmspropnesterov' ... with additional momentum
= 'adam' ... adaptive method
= 'adadelta' ... adaptive method
.hyperparams ... struct containing the hyperparameters
(e.g. learning rate) for the opt-method, must fit with chosen
method (see documentation there)
.hold_on ... indicates whether plots are redrawn or whether something
is added to the plot
= false (default) ... new plot
= true ... extension of plot
.interval ... selection mechanism for x limits
= 'dynamic' (default) ... x limits depending on analysis results
= 'static' ... x limits depending on parameters.min and .max or on
user-defined bounds options.bounds.min and .max. The latter are
used if provided.
.bounds ... bounds used for visualization if options.interval = 'static'
.min ... lower bound
.max ... upper bound
.P ... options for profile plots
.plot_type ... plot type
= 0 (default) ... no plot
= 1 ... likelihood ratio
= 2 ... negative log-likelihood
.col ... color of profile lines (default: [1,0,0])
.lw ... line width of profile lines (default: 1.5)
.S ... options for sample plots
.plot_type ... plot type
= 0 (default if no samples are provided) ... no plot
= 1 (default if samples are provided) ... histogram
.col ... color of histogram (default: [0.7,0.7,0.7])
.bins ... number of histogram bins (default: 30)
= 'optimal' ... selection using Scott's rule
= 'conservative' ... selection using Scott's rule / 2
= N (with N being an integer) ... N bins
.MS ... options for multi-start optimization plots
.plot_type ... plot type
= 0 (default if no MS are provided) ... no plot
= 1 (default if MS are provided) ... likelihood ratio and
position of optima above threshold
= 2 ... negative log-likelihood and position of optima
above threshold
.col ... color of local optima (default: [1,0,0])
.lw ... line width of local optima (default: 1.5)
.A ... options for distribution approximation plots
.plot_type ... plot type
= 0 (default if no MS are provided) ... no plot
= 1 (default if MS are provided) ... likelihood ratio
= 2 ... negative log-likelihood
.col ... color of approximation lines (default: [0,0,1])
.lw ... line width of approximation lines (default: 1.5)
.boundary ... options for boundary visualization
.mark ... marking of profile points which are on the boundary
= 0 ... no visualization
= 1 (default) ... indicates points which are close to the
boundaries in one or more dimensions.
.eps ... minimal distance from the boundary below which points are
considered to be close to the boundary (default = 1e-4). Note
that a one-norm is used.
.CL ... options for confidence level plots
.plot_type ... plot type
= 0 (default) ... no plot
= 1 ... likelihood ratio
= 2 ... negative log-likelihood
.alpha ... visualized confidence level (default = 0.95)
.type ... type of confidence interval
= 'point-wise' (default) ... point-wise confidence interval
= 'simultanous' ... simultaneous confidence interval
= {'point-wise','simultanous'} ... both
.col ... color of profile lines (default: [0,0,0])
.lw ... line width of profile lines (default: 1.5)
.op2D ... options used for 2D plot to position subplot axes.
.b1 ... offset from left and bottom border (default = 0.15)
.b2 ... offset from left and bottom border (default = 0.02)
.r ... relative width of subplots (default = 0.95)
.add_points ... option used to add additional points, e.g. true
parameter in the case of test examples
.par ... n x m matrix of m additional points
.col ... color used for additional points (default = [0,0.8,0]).
This can also be a m x 3 matrix of colors.
.ls ... line style (default = '--').
.lw ... line width (default = 2).
.m ... marker style (default = 'd').
.ms ... marker size (default = 8).
.legend ... legend options
.color ... background color (default = 'none').
.box ... legend outline (default = 'on').
.orientation ... orientation of list (default = 'vertical')
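The class-based interface proposed above could then be used roughly like this (a sketch: PestoOptions exists in the repository, but the plotting-options class name PestoPlottingOptions and the exact field accesses are assumptions based on the option list above):

```matlab
% Sketch: setting options via class instances instead of raw structs
options = PestoOptions();          % validated defaults on construction
options.n_starts  = 100;           % number of local optimizations
options.comp_type = 'parallel';    % use parfor across starts
options.mode      = 'text';        % no figures, console output only

% Hypothetical plotting-options class collecting the fields listed above
plotOptions = PestoPlottingOptions();
plotOptions.interval   = 'static';
plotOptions.bounds.min = parameters.min;
plotOptions.bounds.max = parameters.max;
options.plot_options   = plotOptions;

parameters = getMultiStarts(parameters, objectiveFunction, options);
```

The main advantage over silently accepted struct fields is that a class setter can reject invalid values immediately, e.g. a negative n_starts or an unknown mode string.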
They do nothing but call plotParameterUncertainty.m.
Was there a specific motivation for having these extra functions?
I created a live script file (.mlx) for the enzymatic catalysis example. I think these live scripts are very cool, especially for explaining, but they have some disadvantages: debugging is more complicated (or barely possible), and I had the impression that they add computational overhead, though I didn't test this specifically... Moreover, the plotting interface works differently: plots are only displayed after they are finished, and refreshing and updating plots does not seem to work. So having ONLY a live script is maybe not the best idea, since compatibility is also not ideal...
I would suggest having for each example a live script AND a function version of the main file. The function is more flexible, the live script a (very) nice add-on. But PESTO should also work without them...
PESTO prints out numbers at the end of every profile calculation, probably some stray debug info?
We need to check the parts for the parallel computing toolbox on the scm
Add PSwarm interface to getMultiStarts.m:
http://www.norg.uminho.pt/aivaz/pswarm/
What are the different collectResults.m files? Which one should we use?
collectResults.m
collectResults.m-d1552287-482b-4c88-a3c7-e5c52d475cf7
collectResults.m-e232cfd4-c3d6-40d3-9086-3379c1d58234
Currently, the routines are fairly cluttered and non-modularized. E.g. plotParameterUncertainty could be split up into modules for '1D' and '2D' for optimization, profile and sampling results. This would also reduce redundancies with plotPropertyUncertainty, since it would use the same modules.
The naming should be adapted to ensure consistency. -> getKernelDensityEstimate or getKDE?
As discussed a while ago, we should at a certain point make the options completely object oriented.
Structs which are passed as arguments to those functions, but should be PestoOptions instances, should be automatically cast into PestoOptions.
Also, according to camelCase conventions, it should be called pestoOptions, not PestoOptions.
Maximum recursion limit of 500 reached.
Error in PestoOptions/checkVector (line 464)
this.(name) = value;
Please ensure that optimization can be run in -nojvm mode, at least if mode is set to 'text'.
Currently the options are distributed within 2 objects: The parameters struct and the options struct.
I suggest the following changes: The parameters-struct should cover results .MS, .P, .S exclusively, while the options-struct is meant to cover options in hierarchical fashion, e.g.
opt.parameters.max = [...]
opt.parameters.name = [...]
opt.sampling.nIterations = ...
opt.sampling.algorithm = 'PT'
opt.sampling.PT.alpha = ...
opt.sampling.PT.nTemps = ...
opt.sampling.plotting.plotType = '1D'
This increases the usability by making it more intuitive to set options. In addition, when dealing with many runs, loading options is very slow if they were stored together with large result objects (Sampling!) within one object.
This is due to getParameterSamples.m, lines 118-120:
opt.number = parameters.number;
opt.min = parameters.min;
opt.max = parameters.max;
getParameterSamples cannot be called with a PestoOptions instance, because these members do not exist (see runPestoTests).
(Why) is it required to copy these values into opt?
Matlab's default indentation scheme is to not indent the first level of code:
function [ output_args ] = Untitled2( input_args )
%UNTITLED2 Summary of this function goes here
% Detailed explanation goes here
output_args = 1;
end
I find that extremely unintuitive and in the case of multiple functions per file, it reduces readability significantly.
Therefore, I would like to indent all levels of code as in:
function [ output_args ] = Untitled2( input_args )
    %UNTITLED2 Summary of this function goes here
    %   Detailed explanation goes here
    output_args = 1;
end
(This can be done automatically by changing Preferences > MATLAB > Editor/Debugger > Language > Indenting
to Indent all functions
and then in the editor simply choosing smart indent
from the context menu.)
My suggestion would be to use this style only for new additions or during major reorganizations, so as not to ruin git blame output.
Any opinions?
I recently realized that getPropertySamples is very slow (43s for 2k sample iterations) in the conversionReaction example.
The largest portion (29s) of time goes into the 21 calls of plotPropertyUncertainty which updates the legend and histogram bars each time. Indeed, it updates every 100 MCMC iterations as coded in line 119 of getPropertySamples:
if (mod(j,100) == 0) || (j == length(properties.S.logPost))
I suggest changing it to 10%, 20%, ... of the iterations instead, or removing the sequential drawing entirely.
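A minimal sketch of the 10%-interval variant, replacing the condition quoted above (nIter is introduced here only for readability):

```matlab
% Update plots only at every 10% of the iterations instead of every
% 100 MCMC iterations, so plotting cost stays constant per run
nIter = length(properties.S.logPost);
if (mod(j, ceil(nIter/10)) == 0) || (j == nIter)
    % ... redraw legend and histogram bars ...
end
```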
Expected behavior:
The provided examples should not issue any warnings or errors.
Actual behavior:
Got the warnings shown below when running mainExampleGauss.m.
Progress: 100 %
ans =
Figure (8: plotParameterUncertainty) with properties:
Number: 8
Name: 'plotParameterUncertainty'
Color: [0.9400 0.9400 0.9400]
Position: [1000 918 560 420]
Units: 'pixels'
Show all properties
Warning: No valid values for sigma found! Not plotting approximation.
> In plotParameterUncertainty (line 613)
In plotParameterSamples (line 43)
Warning: No valid values for sigma found! Not plotting approximation.
> In plotParameterUncertainty (line 613)
In plotParameterSamples (line 43)
Warning: No valid values for sigma found! Not plotting approximation.
> In plotParameterUncertainty (line 613)
In plotParameterSamples (line 43)
Warning: No valid values for sigma found! Not plotting approximation.
> In plotParameterUncertainty (line 613)
In plotParameterSamples (line 43)
Warning: No valid values for sigma found! Not plotting approximation.
> In plotParameterUncertainty (line 613)
In plotParameterSamples (line 43)
Warning: No valid values for sigma found! Not plotting approximation.
> In plotParameterUncertainty (line 613)
In plotParameterSamples (line 43)
Warning: No valid values for sigma found! Not plotting approximation.
> In plotParameterUncertainty (line 613)
In plotParameterSamples (line 43)
Warning: No valid values for sigma found! Not plotting approximation.
> In plotParameterUncertainty (line 613)
In plotParameterSamples (line 43)
Warning: No valid values for sigma found! Not plotting approximation.
> In plotParameterUncertainty (line 613)
In plotParameterSamples (line 43)
ans =
Figure (1: plotParameterUncertainty) with properties:
Number: 1
Name: 'plotParameterUncertainty'
Color: [0.9400 0.9400 0.9400]
Position: [1000 918 560 420]
Units: 'pixels'
Show all properties
Following the discussion from yesterday, I would suggest shifting opt.obj_type into the parameters struct, since it is part of the problem formulation, as pointed out by Fabian.
E.g. in getMultistart.m:
Isn't the option .isMinibatch redundant with .optimizer?
isMinibatch = strcmp(optimizer, 'minibatch')
or not?
Can this routine be removed by exploiting the newer MATLAB axes property 'XTickLabelRotation'?
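A sketch of what the built-in replacement could look like (assumes MATLAB R2014b or newer, where axes expose this property; the tick labels are placeholders):

```matlab
% Rotate x tick labels without a custom rotation routine
plot(1:5, rand(1, 5));
set(gca, 'XTick', 1:5, ...
         'XTickLabel', {'k_1', 'k_2', 'k_3', 'k_4', 'k_5'}, ...
         'XTickLabelRotation', 45);
```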
We need to make sure that getParameterSamples also works without running a multistart local optimization method beforehand. Do we want getParameterProfiles also to work without information from parameters.MS?
Add figure('Name','some nice description') to identify plot windows more easily. Especially in the examples it is otherwise not at all clear what is happening in which of the 13 plotting windows.
First creates many warnings, then crashes leaving the error message:
Error using figure
Invalid convenience arg handle
Error in plotPropertyProfiles (line 132)
fh = figure(varargin{3});
Error in getPropertyProfiles (line 234)
case 'visual', fh = plotPropertyProfiles(properties,'1D',fh,options.property_index,options.plot_options); disp(str);
Error in main (line 206)
properties = getPropertyProfiles(properties,parameters,objectiveFunction,optionsProperties);
Shall we make review and approval of pull requests mandatory for PESTO? I think it would be a good idea, at least for master. It will help to recognize potential issues early on, it is likely to improve code organization and quality, and more people will feel responsible and in touch with the different parts of PESTO.
Undefined function or variable 't_cpu_fmincon'.
Error in getMultiStarts/outfun_fmincon (line 487)
Error in callAllOptimOutputFcns (line 13)
Error in barrier
Error in barrier
Error in barrier
Error in fmincon (line 796)
Error in getMultiStarts (line 228)
Perform optimization...Warning: Objective function has too many input arguments for the chosen optimization method.
> In getMultiStarts (line 236)
In main (line 82)
In run (line 96)
Do you want to pass additionally the whole dataset? Errors may occur... (y/n)
This is caused by the minibatch optimization.
Can we simply remove the "options" from the objective function?
At a certain point, we should change that, since the following problem can appear:
options1 = PestoOptions();
...do some assignments to options1
options2 = options1;
...do some assignments to options2
The latter assignments will also change the object in options1, which is counter-intuitive and can lead to wrong results.
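This happens because PestoOptions is a handle class, so assignment copies the reference, not the object. One possible fix (a sketch, not necessarily how PestoOptions should be changed) is to derive from matlab.mixin.Copyable and use copy() for explicit duplication:

```matlab
% Sketch: value-like copying for a handle class via matlab.mixin.Copyable
classdef PestoOptions < matlab.mixin.Copyable
    properties
        n_starts = 20;   % example property from the documentation above
    end
end

% Usage:
%   options1 = PestoOptions();
%   options2 = copy(options1);   % independent object, not an alias
%   options2.n_starts = 100;     % options1.n_starts stays 20
```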
When running the conversion reaction example runEstimation__CR
Undefined function or variable 'getParameterSamples'.
Error in runEstimation__CR (line 103)
parameters = getParameterSamples(parameters,logL,options_par);
In a final version, we should write all examples in such a way that the AMICI code is not needed as a whole package (mex functions are of course okay), i.e. comment out or cut all calls to amiwrap.
Remove /DRAM from PESTO repository, but keep DRAM-based routines and check for (external) DRAM availability where these functions have been used so far.
There are still two examples missing (sampling and the Pom1p)
Plotting 2D parameter distributions with >10^5 samples (nsimu_run > 10^6) takes a long time.
I think we should limit the maximum number of points to a reasonable number and use mvksdensity to color the plots by density.
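A sketch of the subsample-and-color idea (assumes the Statistics and Machine Learning Toolbox, R2016a or newer for mvksdensity, and a samples matrix with one parameter per column; the cap of 1e4 points is an arbitrary choice):

```matlab
% Subsample to at most 1e4 points and color the scatter plot by local density
nMax = 1e4;
idx  = randperm(size(samples, 1), min(nMax, size(samples, 1)));
X    = samples(idx, [1 2]);    % two parameter dimensions to plot
d    = mvksdensity(X, X);      % kernel density estimate at each retained point
scatter(X(:, 1), X(:, 2), 5, d, 'filled');
colorbar;
```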
PESTO can compute confidence intervals and, as far as I know (but please correct me if I'm wrong), also profiles for multimodal objective functions. Since this is a useful feature, we should demonstrate it in one example.
Are there any other useful features, which are not covered by examples or which I might not know so far?
E.g. there is extensive code duplication in getPropertyProfiles.m and getParameterProfiles.m, getParameterConfidenceIntervals.m and getPropertyConfidenceIntervals.m, ...
When generating default PESTO options, the fields .MCMC.nsimu_warmup and .MCMC.nsimu_run are empty, but they do indeed have default values (according to the documentation).
ans =
Figure (4: plotParameterUncertainty) with properties:
Number: 4
Name: 'plotParameterUncertainty'
Color: [0.9400 0.9400 0.9400]
Position: [1000 918 560 420]
Units: 'pixels'
Show all properties
Warning: No valid values for sigma found! Not plotting approximation.
> In plotParameterUncertainty (line 613)
In plotParameterSamples (line 43)
Warning: No valid values for sigma found! Not plotting approximation.
> In plotParameterUncertainty (line 613)
In plotParameterSamples (line 43)
Warning: No valid values for sigma found! Not plotting approximation.
> In plotParameterUncertainty (line 613)
In plotParameterSamples (line 43)
Warning: No valid values for sigma found! Not plotting approximation.
> In plotParameterUncertainty (line 613)
In plotParameterSamples (line 43)
ans =
Figure (5: plotParameterUncertainty) with properties:
Number: 5
Name: 'plotParameterUncertainty'
Color: [0.9400 0.9400 0.9400]
Position: [1000 918 560 420]
Units: 'pixels'
Show all properties
Reference to non-existent field 'MS'.
Error in plotConfidenceIntervals (line 156)
plot([pStruct.MS.par(iP,1), pStruct.MS.par(iP,1)], [j-0.4, j+0.4], 'k-', 'linewidth', 2);
Error in getParameterConfidenceIntervals (line 121)
plotConfidenceIntervals(parameters, alpha, [], options);
Former ToDo.txt:
todo:
This fits better here. The file will be removed.
Partially fixed in 63796a9.
However, the onCleanup callback is not called when Ctrl+C is hit during processing of some subroutine. Instead, the onCleanup callback is only called at the next invocation of getMultiStarts.
Workaround: delete(findall(0,'tag','TMWWaitbar'))
PestoOptions.m:37
Same for specificyConstraintGradient
The following functions do not come with examples and are not used by any other function:
I guess we should provide examples for all of them or remove the obsolete ones.
In mainErbBSignaling.m load('erbb_signaling_pnom.mat');
File probably not added to repository due to *.mat in .gitignore which is now removed.
Please add data.
The calls of obj_w_error_count (https://github.com/ICB-DCM/PESTO/blob/master/getMultiStarts.m#L615) and obj are not consistent (I think they should be).
Also, the function uses eval instead of feval. eval creates invisible temporary variables and should only be used if there really is no other way. You could do something like feval(varargin{2},varargin{1,4}) instead and also avoid the additional switch statement.
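A sketch of the difference (the argument positions follow the comment above and are an assumption about the actual call site in getMultiStarts.m):

```matlab
% eval builds and parses a command string; intermediate variables it creates
% are invisible to static analysis and easy to shadow by accident:
%   eval([varargin{2} '(varargin{4});']);

% feval calls the function by name or handle directly, with no string
% parsing and no invisible temporaries:
result = feval(varargin{2}, varargin{4});
```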
Add regression tests
I suggest removing logL__CR_with_H.m, since it contains pretty much the same code as logL__CR.m. Both provide the approximation of the Hessian.
Objections?
Before the first optimization run is finished, the waitbar reads "less than one minute", which is misleading, especially if the first run takes more than one minute. Probably better to change this to "unknown".
The loops at
https://github.com/ICB-DCM/PESTO/blob/master/plotMultiStarts.m#L100
and
https://github.com/ICB-DCM/PESTO/blob/master/plotMultiStarts.m#L154
are super slow for a large number of multistarts, because i is set to length(parameters.MS.logPost) and not to the number of finished multistarts.
Currently, these are structs, but in most routines we check for the existence of specific fields and then test whether they are empty. Classes would make this much nicer.