stan-dev / stanc3

The Stan transpiler (from Stan to C++ and beyond).

License: BSD 3-Clause "New" or "Revised" License

stanc3's Issues

AST locations should support #includes

Right now AST locations are:

type location =
  | Location of Lexing.position * Lexing.position  (** delimited location *)
  | Nowhere  (** no location *)
...
type position = {
  pos_fname : string;
  pos_lnum : int;
  pos_bol : int;
  pos_cnum : int;
}

This doesn't seem to support the include use case where a statement comes from a file but may have been included by other files (unless I'm thinking about this wrong).
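
One way to picture the missing information is a location that carries an optional link to the #include site that pulled its file in. The sketch below is in Python purely for illustration (stanc3 itself is written in OCaml); the names Position, Location, included_from and backtrace are hypothetical and are not part of the stanc3 AST.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Position:
    """Mirrors the fields of the position record shown above."""
    pos_fname: str
    pos_lnum: int
    pos_bol: int
    pos_cnum: int


@dataclass(frozen=True)
class Location:
    """A delimited location plus (hypothetically) the #include site that
    brought this file in; None means the statement is in the top-level file."""
    start: Position
    end: Position
    included_from: Optional["Location"] = None


def backtrace(loc: Location) -> str:
    """Render the chain of include sites, innermost file first."""
    lines = []
    current: Optional[Location] = loc
    while current is not None:
        # Column is the offset of the character from the start of its line.
        col = current.start.pos_cnum - current.start.pos_bol
        prefix = "in" if not lines else "included from"
        lines.append(
            f"{prefix} '{current.start.pos_fname}', "
            f"line {current.start.pos_lnum}, column {col}"
        )
        current = current.included_from
    return "\n".join(lines)
```

With this shape, a statement in an included file keeps its own file and line while still remembering every file on the path that included it.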

Add example models to integration tests

From this repo: https://github.com/stan-dev/example-models .

Note that some of these may need to be fixed, as they don't compile in the current Stan compiler.

Ultimately, the plan is to remove most of that repo, as most of these models are very hard to sample from. They are good tests for the grammar, semantic check and generator though, so we should rescue them before they get removed!

Cannot declare and define an integer array using the colon operator

Summary:

It is a parser error to declare and define a one-dimensional integer array using the colon operator.

Description:

It is a parser error to declare and define a one-dimensional integer array using the colon operator.

Reproducible Steps:

Try to parse the following Stan program

data {
  int<lower=1> K;
}
transformed data {
  int arr[K] = 1:K;
}

Current Output:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

variable definition dimensions mismatch, definition specifies 1, declaration specifies 0  error in 'model_int_array' at line 5, column 17
  -------------------------------------------------
     3: }
     4: transformed data {
     5:   int arr[K] = 1:K;
                        ^
     6: }
  -------------------------------------------------

Error in stanc(model_code = paste(program, collapse = "\n"), model_name = model_cppname,  : 
  failed to parse Stan model 'int_array' due to the above error.

Expected Output:

Nothing. It should assign the integers from 1 to K to arr.

Additional Information:

Current Version:

v2.17.0

Command to output supported function signatures

There are tools out there (autocomplete, live doc, etc) that consume a list of function signatures in a regular format - it'd be good to output one. Eventually we can think about what else a presentation compiler would look like.

Make data only argument error message more informative

Currently, we just print a type error message. It could be good to emphasize in the message that the data only restriction is what is causing type checking to fail.

Need error messages for various truncation situations

In the current compiler, this model throws a couple of errors we aren't throwing:

model { 1 ~ bernoulli(0.2) T[0.1, 1.1]; }

, namely

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for: 

  real ~ bernoulli_cdf(real)

Available argument signatures for bernoulli_cdf:

  int ~ bernoulli_cdf(real)
  int ~ bernoulli_cdf(real[ ])
  int ~ bernoulli_cdf(vector)
  int ~ bernoulli_cdf(row_vector)
  int[ ] ~ bernoulli_cdf(real)
  int[ ] ~ bernoulli_cdf(real[ ])
  int[ ] ~ bernoulli_cdf(vector)
  int[ ] ~ bernoulli_cdf(row_vector)

Lower truncation not defined for specified arguments to bernoulli
 error in '../stanc3/test.stan' at line 2, column 34
  -------------------------------------------------
     1: model {
     2:   1 ~ bernoulli(0.2) T[0.1, 1.1];
                                         ^
     3: }
  -------------------------------------------------


make: *** [../stanc3/test.hpp] Error 253

That first error message might be spurious, given that 1 could also be an int.

Semantic check should accept lpdf and lpmf functions which have first argument of array type

Stan code parses successfully but fails to compile due to "pstream__ not declared" error

Summary:

I tried to compile a Stan model that parsed successfully, but compilation fails with:
./sirStrain.hpp:2822:40: error: ‘pstream__’ was not declared in this scope. Compilation was attempted with both rstan and cmdstan.

Description:

See Summary

Reproducible Steps:

Try to compile with cmdStan (2.18.0):

make sirStrain

Note that the actual stan code is included below.

Current Output:

$ make sirStrain

--- Translating Stan model to C++ code ---
bin/stanc  sirStrain.stan --o=sirStrain.hpp
Model name=sirStrain_model
Input file=sirStrain.stan
Output file=sirStrain.hpp
DIAGNOSTIC(S) FROM PARSER:
Warning: left-hand side variable (name=healthy) occurs on right-hand side of assignment, causing inefficient deep copy to avoid aliasing.
Warning: left-hand side variable (name=healthy) occurs on right-hand side of assignment, causing inefficient deep copy to avoid aliasing.
Warning: left-hand side variable (name=y0) occurs on right-hand side of assignment, causing inefficient deep copy to avoid aliasing.
Warning: left-hand side variable (name=ili) occurs on right-hand side of assignment, causing inefficient deep copy to avoid aliasing.

Compiling pre-compiled header
g++ -Wall -I . -isystem stan/lib/stan_math/lib/eigen_3.3.3 -isystem stan/lib/stan_math/lib/boost_1.66.0 -isystem stan/lib/stan_math/lib/sundials_3.1.0/include -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -I src -isystem stan/src -isystem stan/lib/stan_math/ -DFUSION_MAX_VECTOR_SIZE=12 -Wno-unused-local-typedefs -DEIGEN_NO_DEBUG -DNO_FPRINTF_OUTPUT -pipe  -c -O3 stan/src/stan/model/model_header.hpp -o stan/src/stan/model/model_header.hpp.gch

--- Linking C++ model ---
g++ -Wall -I . -isystem stan/lib/stan_math/lib/eigen_3.3.3 -isystem stan/lib/stan_math/lib/boost_1.66.0 -isystem stan/lib/stan_math/lib/sundials_3.1.0/include -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -I src -isystem stan/src -isystem stan/lib/stan_math/ -DFUSION_MAX_VECTOR_SIZE=12 -Wno-unused-local-typedefs -DEIGEN_NO_DEBUG -DNO_FPRINTF_OUTPUT -pipe   src/cmdstan/main.cpp  -O3 -o sirStrain -include sirStrain.hpp stan/lib/stan_math/lib/sundials_3.1.0/lib/libsundials_nvecserial.a stan/lib/stan_math/lib/sundials_3.1.0/lib/libsundials_cvodes.a stan/lib/stan_math/lib/sundials_3.1.0/lib/libsundials_idas.a
In file included from <command-line>:0:0:
./sirStrain.hpp: In member function ‘void sirStrain_model_namespace::sirStrain_model::get_dims(std::vector<std::vector<long unsigned int> >&) const’:
./sirStrain.hpp:2822:40: error: ‘pstream__’ was not declared in this scope
         dims__.push_back(get_state_dim(pstream__));
                                        ^~~~~~~~~
./sirStrain.hpp:2822:40: note: suggested alternative: ‘pthread_t’
         dims__.push_back(get_state_dim(pstream__));
                                        ^~~~~~~~~
                                        pthread_t
In file included from <command-line>:0:0:
./sirStrain.hpp: In member function ‘void sirStrain_model_namespace::sirStrain_model::constrained_param_names(std::vector<std::__cxx11::basic_string<char> >&, bool, bool) const’:
./sirStrain.hpp:3159:56: error: ‘pstream__’ was not declared in this scope
             for (int k_0__ = 1; k_0__ <= get_state_dim(pstream__); ++k_0__) {
                                                        ^~~~~~~~~
./sirStrain.hpp:3159:56: note: suggested alternative: ‘pthread_t’
             for (int k_0__ = 1; k_0__ <= get_state_dim(pstream__); ++k_0__) {
                                                        ^~~~~~~~~
                                                        pthread_t
./sirStrain.hpp:3164:56: error: ‘pstream__’ was not declared in this scope
             for (int k_1__ = 1; k_1__ <= get_state_dim(pstream__); ++k_1__) {
                                                        ^~~~~~~~~
./sirStrain.hpp:3164:56: note: suggested alternative: ‘pthread_t’
             for (int k_1__ = 1; k_1__ <= get_state_dim(pstream__); ++k_1__) {
                                                        ^~~~~~~~~
                                                        pthread_t
./sirStrain.hpp:3171:56: error: ‘pstream__’ was not declared in this scope
             for (int k_0__ = 1; k_0__ <= get_theta_dim(pstream__); ++k_0__) {
                                                        ^~~~~~~~~
./sirStrain.hpp:3171:56: note: suggested alternative: ‘pthread_t’
             for (int k_0__ = 1; k_0__ <= get_theta_dim(pstream__); ++k_0__) {
                                                        ^~~~~~~~~
                                                        pthread_t
./sirStrain.hpp: In member function ‘void sirStrain_model_namespace::sirStrain_model::unconstrained_param_names(std::vector<std::__cxx11::basic_string<char> >&, bool, bool) const’:
./sirStrain.hpp:3253:56: error: ‘pstream__’ was not declared in this scope
             for (int k_0__ = 1; k_0__ <= get_state_dim(pstream__); ++k_0__) {
                                                        ^~~~~~~~~
./sirStrain.hpp:3253:56: note: suggested alternative: ‘pthread_t’
             for (int k_0__ = 1; k_0__ <= get_state_dim(pstream__); ++k_0__) {
                                                        ^~~~~~~~~
                                                        pthread_t
./sirStrain.hpp:3258:56: error: ‘pstream__’ was not declared in this scope
             for (int k_1__ = 1; k_1__ <= get_state_dim(pstream__); ++k_1__) {
                                                        ^~~~~~~~~
./sirStrain.hpp:3258:56: note: suggested alternative: ‘pthread_t’
             for (int k_1__ = 1; k_1__ <= get_state_dim(pstream__); ++k_1__) {
                                                        ^~~~~~~~~
                                                        pthread_t
./sirStrain.hpp:3265:56: error: ‘pstream__’ was not declared in this scope
             for (int k_0__ = 1; k_0__ <= get_theta_dim(pstream__); ++k_0__) {
                                                        ^~~~~~~~~
./sirStrain.hpp:3265:56: note: suggested alternative: ‘pthread_t’
             for (int k_0__ = 1; k_0__ <= get_theta_dim(pstream__); ++k_0__) {
                                                        ^~~~~~~~~
                                                        pthread_t
make/models:14: recipe for target 'sirStrain' failed
make: *** [sirStrain] Error 1

Expected Output:

A working model

Additional Information:

I tested this both from R (rstan) and cmdStan. In R it failed, but did not provide helpful information. Included here is the cmdStan output.

Current Version:

v2.18.0

The model consists of two files. The main file: sirStrain.stan

functions {
#include sirFunctions.stan
  real approxbin_lpdf(real k, real n, real p) {
    return beta_lpdf(p | k+1, n-k+1);
  }
  real niliForcing(real t, real amplitude, real period, real peakTime) {
    real sigma;
    sigma = 0.25*period;
    return amplitude*(exp(-pow(t-peakTime,2)/(2*pow(sigma,2)))-1);
  }
  real[] ode(real time, real[] state, real[] theta, real[] x_r, int[] x_i) {
    real dydt[get_state_dim()];
    real N;
    real healthy;
    healthy = N;
    N = get_N(x_r);
    for (k in 1:get_K(x_i)) {
      real newr;
      newr = get_gamma(theta)*get_I(state, {k});
      dydt[get_S_id({k})] = -get_beta(theta, {k})*get_S(state, {k})*get_I(state, {k})/N;
      dydt[get_I_id({k})] = get_beta(theta, {k})*get_S(state, {k})*get_I(state, {k})/N -
        newr;
      dydt[get_R_id({k})] = newr;
      dydt[get_PILI_id({k})] = newr*get_thetap(theta, {k});
      healthy = healthy - get_I(state, {k});
    }
    healthy = fmin(healthy, 1.0);
    dydt[get_NILI_id()] = exp(get_thetan(theta) +
        niliForcing(time, get_niliAmplitude(theta), get_niliPeriod(theta), get_niliPeak(theta))
      )*healthy;
    return dydt;
  }
}
data {
  int T;
  int K;
  real N;
  real ts[T];

  real sero_mu[K];
  real sero_sd[K];
  real niliPeak_mu;

  real thetap_alpha[K];
  real thetap_beta[K];

  int Y[T-1,K+1];
  real YILI[T,2];
}
transformed data {
  real x_r[get_xr_dim()];
  int x_i[get_xi_dim()];
  x_r = xr_to_1d(N);
  x_i = xi_to_1d(K);
}
parameters {
  real<lower=-10,upper=-6> I0[K];
  real<lower=0,upper=1> susc[K];
  real<lower=0,upper=1> asc;

  real<lower=0,upper=4> beta[K];
  real<lower=0,upper=1> gamma;
  real<lower=0,upper=1>  thetap[K];
  real<lower=-15,upper=0> thetan;

  real<lower=40,upper=200> niliPeriod;
  real<lower=0,upper=3> niliAmplitude;
  real<lower=14,upper=120> niliPeak;
}
transformed parameters {
  real y0[get_state_dim()];
  real y[T, get_state_dim()];
  real theta[get_theta_dim()];
  real pili[T-1,K];
  real nili[T-1];
  real ili[T-1];
  vector[K+1] ppili[T-1];

  theta = theta_to_1d(beta, gamma, thetap, thetan, niliPeriod, niliAmplitude, niliPeak);

  // initial
  y0[get_NILI_id()] = 0;
  for (k in 1:K) {
    y0[get_I_id({k})] = exp(I0[k])*susc[k]*N;
    y0[get_S_id({k})] = susc[k]*N - y0[get_I_id({k})];
    y0[get_R_id({k})] = (1-susc[k])*N;
    y0[get_PILI_id({k})] = 0;
  }

  y = integrate_ode_rk45(ode, y0, 0, ts, theta, x_r, x_i);

  // new nili and pili cases for last week
  for(t in 2:T) {
    nili[t-1] = get_NILI(y[t,]) - get_NILI(y[t-1,]);
    ili[t-1] = nili[t-1];

    for (k in 1:K) {
      pili[t-1,k] = get_PILI(y[t,], {k}) - get_PILI(y[t-1,], {k});
      ili[t-1] = ili[t-1] + pili[t-1,k];
    }
  }

  for (t in 1:(T-1)) {
    for (k in 1:K) {
      ppili[k, t] = pili[t,k]/ili[t];
    }
    ppili[K + 1, t] = nili[t]/ili[t];
  }
}
model {
  real sheddingPeriod;
  sheddingPeriod = 1/gamma;
  sheddingPeriod ~ normal(4.8, 0.245); // carrat_time_2008

  niliPeak ~ normal(niliPeak_mu + 7, 21); // +7 because we start measuring after one running weekp
  for (k in 1:K) beta[k] ~ normal(1.28*gamma*susc[k], 0.133); // Assume biggerstaff is on Reff
  for (k in 1:K) susc[k] ~ normal(sero_mu[k], sero_sd[k]);
  for (k in 1:K) thetap[k] ~ beta(thetap_alpha[k], thetap_beta[k]);

  asc ~ beta(35.644, 69.314);

  for (t in 1:(T-1)) {
    YILI[t,1] ~ approxbin(ili[t]*YILI[t,2]/N, asc);
    Y[t] ~ multinomial(ppili[t]);
  }
}

and the included file sirFunctions.stan:

int index_to_1d(int[] index, int[] dims, int offset);
int index_to_1d(int[] index, int[] dims, int offset) {
  int n;
  n = size(dims);
  if (n == 1) {
    return offset + index[1];
  } else {
    //return (index[n] - 1) * prod(head(dims, n - 1)) +
    //  index_to_1d(head(index, n - 1), head(dims, n - 1), offset);
    return (index[1] - 1) * prod(tail(dims, n - 1)) +
      index_to_1d(tail(index, n - 1), tail(dims, n - 1), offset);
  }
}

int get_beta_id(int[] index) { return index_to_1d(index, {3} , 0); }
real get_beta(real[] v, int[] index) { return v[get_beta_id(index)]; }
real[] get_beta_array(real[] v) { real arr[3];
              for (id1 in 1:3) arr[id1] = get_beta(v, {id1});
              return arr; }
int get_gamma_id() { return 4; }
real get_gamma(real[] v) { return v[get_gamma_id()]; }
real get_gamma_array(real[] v) { return get_gamma(v); }
int get_thetap_id(int[] index) { return index_to_1d(index, {3} , 4); }
real get_thetap(real[] v, int[] index) { return v[get_thetap_id(index)]; }
real[] get_thetap_array(real[] v) { real arr[3];
              for (id1 in 1:3) arr[id1] = get_thetap(v, {id1});
              return arr; }
int get_thetan_id() { return 8; }
real get_thetan(real[] v) { return v[get_thetan_id()]; }
real get_thetan_array(real[] v) { return get_thetan(v); }
int get_niliPeriod_id() { return 9; }
real get_niliPeriod(real[] v) { return v[get_niliPeriod_id()]; }
real get_niliPeriod_array(real[] v) { return get_niliPeriod(v); }
int get_niliAmplitude_id() { return 10; }
real get_niliAmplitude(real[] v) { return v[get_niliAmplitude_id()]; }
real get_niliAmplitude_array(real[] v) { return get_niliAmplitude(v); }
int get_niliPeak_id() { return 11; }
real get_niliPeak(real[] v) { return v[get_niliPeak_id()]; }
real get_niliPeak_array(real[] v) { return get_niliPeak(v); }
real[] theta_to_1d(real[] beta, real gamma, real[] thetap, real thetan, real niliPeriod, real niliAmplitude, real niliPeak) { real v[11]; v[1:3] = to_array_1d(beta); v[4] = gamma; v[5:7] = to_array_1d(thetap); v[8] = thetan; v[9] = niliPeriod; v[10] = niliAmplitude; v[11] = niliPeak; return v; }
int get_theta_dim() { return 11; }
int get_S_id(int[] index) { return index_to_1d(index, {3} , 0); }
real get_S(real[] v, int[] index) { return v[get_S_id(index)]; }
real[] get_S_array(real[] v) { real arr[3];
              for (id1 in 1:3) arr[id1] = get_S(v, {id1});
              return arr; }
int get_I_id(int[] index) { return index_to_1d(index, {3} , 3); }
real get_I(real[] v, int[] index) { return v[get_I_id(index)]; }
real[] get_I_array(real[] v) { real arr[3];
              for (id1 in 1:3) arr[id1] = get_I(v, {id1});
              return arr; }
int get_R_id(int[] index) { return index_to_1d(index, {3} , 6); }
real get_R(real[] v, int[] index) { return v[get_R_id(index)]; }
real[] get_R_array(real[] v) { real arr[3];
              for (id1 in 1:3) arr[id1] = get_R(v, {id1});
              return arr; }
int get_NILI_id() { return 10; }
real get_NILI(real[] v) { return v[get_NILI_id()]; }
real get_NILI_array(real[] v) { return get_NILI(v); }
int get_PILI_id(int[] index) { return index_to_1d(index, {3} , 10); }
real get_PILI(real[] v, int[] index) { return v[get_PILI_id(index)]; }
real[] get_PILI_array(real[] v) { real arr[3];
              for (id1 in 1:3) arr[id1] = get_PILI(v, {id1});
              return arr; }
real[] state_to_1d(real[] S, real[] I, real[] R, real NILI, real[] PILI) { real v[13]; v[1:3] = to_array_1d(S); v[4:6] = to_array_1d(I); v[7:9] = to_array_1d(R); v[10] = NILI; v[11:13] = to_array_1d(PILI); return v; }
int get_state_dim() { return 13; }
int get_N_id() { return 1; }
real get_N(real[] v) { return v[get_N_id()]; }
real get_N_array(real[] v) { return get_N(v); }
real[] xr_to_1d(real N) { real v[1]; v[1] = N; return v; }
int get_xr_dim() { return 1; }
int get_K_id() { return 1; }
int get_K(int[] v) { return v[get_K_id()]; }
int get_K_array(int[] v) { return get_K(v); }
int[] xi_to_1d(int K) { int v[1]; v[1] = K; return v; }
int get_xi_dim() { return 1; }

chol2inv function

Summary:

Expose chol2inv function which is implemented in stan-dev/math.

Description:

Efficient conversion from precision to covariance parameterizations on the Cholesky factor scale.

Current Version:

v2.17.1

Don't elide offset and multiplier of 0 and 1 when pretty-printing

Write unit tests for parser

We currently have a lot of integration tests, but unit tests are missing.

Write pretty printer in terms of `('a, 'b) statement` and `'a expression`

and pretty_print_statement = function
  | UntypedStmt (s_content, _) -> (
    match s_content with

You can basically delete the last two lines of the above to achieve this. I believe there is no reason for the pretty printer to know about locations or other metadata.

Add user-defined higher-order functions support

Extend the Stan language with general user-defined higher-order functions. To achieve this, at least the following needs to be done:

decide on a good syntax;
extend lexer.mll and parser.mly to accept it and construct appropriate AST node (AST node for higher order function definitions is already there);
(possibly) extend semantic check to accept library functions as function arguments (currently it does not, but it is a non-issue as we have no higher-order functions that could accept them anyway; library functions are treated differently from user defined functions from a typing point of view, as they can be overloaded);
extend code-gen to generate appropriate C++ for user-defined higher order functions; it should be possible to use C++11 lambdas.
write unit tests for everything.

Jenkins builds are failing non-deterministically

See the last few Jenkins builds.

For instance:

eval $(opam env) dune build @install --profile static
— Shell Script
<1s
+ opam env

[ERROR] Opam has not been initialised, please run `opam init'

+ eval

+ dune build @install --profile static

/home/jenkins-slave/jenkins-slave-files/workspace/stanc3_master-M4O3QE2CHRIUYN5T2XTQAZMQFNNAJIK55J364YWZ7FMB36TLMYPA@tmp/durable-0f88aa88/script.sh: line 1: dune: not found

script returned exit code 127

Switch over to consistently using Core_kernel

Every .ml file should open Core_kernel at the top to consistently use Core_kernel as a replacement for the standard library.

expose DAE solver interface to Stan

Summary:

integrate_dae solves a DAE.

Description:

The DAE solver from sundials solves a differential-algebraic equation. Math's DAE solver interface has the following signature.

template <typename F, typename Tpar>
std::vector<std::vector<Tpar> > integrate_dae(
    const F& f, 
    const std::vector<double>& yy0, 
    const std::vector<double>& yp0,
    double t0, const std::vector<double>& ts, 
    const std::vector<Tpar>& theta,
    const std::vector<double>& x_r, const std::vector<int>& x_i,
    const double rtol, const double atol,
    const int64_t max_num_steps = idas_integrator::IDAS_MAX_STEPS,
    std::ostream* msgs = nullptr)

Exposing it to Stan adds the ability to solve DAEs.

Note that the DAE signature differs from that of the ODE solvers in two ways:

a parameter initial condition is not supported;
there is no default tolerance.

These limits are discussed in stan-dev/math#768.

Current Version:

v2.18.0

Implement changes to stanc2 since October 2018 in stanc3, before releasing

This should mostly consist of migrating any changes to the Math library signatures that are exposed.

multiple variable declarations on one line

Would it be possible to allow multiple declarations on one line, to avoid repeating the var_decl part of the grammar? E.g.

real alpha, beta, gamma[5];
real<lower=0> a, b[3, 4];

that would be equivalent to

real alpha;
real beta;
real gamma[5];
real<lower=0> a;
real<lower=0> b[3, 4];

and similarly for other types.

This would allow more compact code and factor out the common parts. As far as I understand, this would only change the surface syntax (parser) a bit and would require no change to the internals.
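
The proposed equivalence can be sketched as a purely textual desugaring. The Python below is only an illustration of the rewrite on strings, not how a grammar-based parser would do it; expand_declaration and split_top_level are hypothetical helper names, and the sketch assumes the type (with any bounds or sizes) is written without top-level whitespace, as in the examples above.

```python
def split_top_level(text: str, sep: str) -> list[str]:
    """Split text on sep, ignoring separators nested inside <>, [] or ()."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch in "<[(":
            depth += 1
        elif ch in ">])":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def expand_declaration(decl: str) -> list[str]:
    """Turn 'type a, b[3, 4];' into one declaration per variable."""
    body = decl.strip().rstrip(";").strip()
    depth = 0
    # The type ends at the first whitespace that is not inside brackets.
    for i, ch in enumerate(body):
        if ch in "<[(":
            depth += 1
        elif ch in ">])":
            depth -= 1
        elif ch.isspace() and depth == 0:
            break
    else:
        raise ValueError(f"no declarator found in {decl!r}")
    type_part, declarators = body[:i], body[i + 1:]
    return [f"{type_part} {d.strip()};"
            for d in split_top_level(declarators, ",")]
```

For instance, expand_declaration("real<lower=0> a, b[3, 4];") gives the two declarations listed in the expansion above, because the comma inside b[3, 4] is nested and therefore not treated as a separator.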

Implement lint tool for Stan

Implement a linter for Stan models to flag up common programming mistakes, e.g. using unvectorized distributions in a loop where a vectorized version is available. This would involve

adding an extra command line argument for lint mode;
writing the lint algorithm which walks the AST, similar to the semantic check, printing warnings and giving hints where there is reason to suspect a better programming style might be possible (to be contrasted with the semantic check, which rejects programs which would clearly throw run-time errors).
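
As a rough picture of such an AST walk, the Python sketch below flags sampling statements whose left-hand side is indexed by an enclosing loop variable. The tuple-based AST, the node tags "for" and "tilde", and the function lint_unvectorized are all made up for this illustration; a real implementation would walk the stanc3 AST and would also need to know which distributions have vectorized signatures.

```python
def lint_unvectorized(stmts, loop_vars=()):
    """Return hints for '~' statements indexed by an enclosing loop variable.

    Hypothetical node shapes:
      ("for", loop_variable, body_statements)
      ("tilde", lhs_name, index_name_or_None, distribution_name)
    """
    hints = []
    for stmt in stmts:
        if stmt[0] == "for":
            _, var, body = stmt
            # Remember the loop variable while walking the loop body.
            hints += lint_unvectorized(body, loop_vars + (var,))
        elif stmt[0] == "tilde":
            _, lhs, index, dist = stmt
            if index in loop_vars:
                hints.append(
                    f"{lhs}[{index}] ~ {dist}(...) inside a loop over "
                    f"{index}: consider a vectorized statement"
                )
    return hints


# for (k in 1:K) susc[k] ~ normal(...);  asc ~ beta(...);
program = [
    ("for", "k", [("tilde", "susc", "k", "normal")]),
    ("tilde", "asc", None, "beta"),
]
hints = lint_unvectorized(program)
```

Unlike the semantic check, a pass like this only collects hints and never rejects the program.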

Replace `in_loop` with `loop_depth` counter, in semantic check

This is needed to deal correctly with nested loops.
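
The reason a counter helps can be shown with a small sketch. With a mutable boolean, leaving an inner loop would reset the flag to false even though the checker is still inside the outer loop; a depth counter is simply decremented. The Python below is an illustration only: LoopChecker, the tuple node shapes, and the labels are hypothetical and do not reflect the actual semantic check code.

```python
class LoopChecker:
    """Reports 'break' statements that occur outside of any loop."""

    def __init__(self):
        self.loop_depth = 0   # number of loops currently enclosing us
        self.errors = []      # labels of misplaced break statements

    def check(self, stmts):
        for stmt in stmts:
            if stmt[0] == "loop":
                self.loop_depth += 1
                self.check(stmt[1])
                # Decrementing (rather than clearing a flag) keeps the
                # outer loop "open" after an inner loop ends.
                self.loop_depth -= 1
            elif stmt[0] == "break":
                if self.loop_depth == 0:
                    self.errors.append(stmt[1])
        return self.errors


program = [
    ("loop", [
        ("loop", [("break", "inner")]),
        ("break", "outer"),   # still inside the outer loop: fine
    ]),
    ("break", "top"),         # not inside any loop: error
]
errors = LoopChecker().check(program)
```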

Switch stanc.ml to Core_kernel Command module from Arg module

https://dev.realworldocaml.org/command-line-parsing.html

parse integers in scientific notation out of dump format

Improve the integer parser for the dump format to accept integers like 1e8 by noticing that they have no decimals in their mantissa.

It might be easiest to read them into double-precision floats, then cast to int.

We should also test for overflow in sizes while we still restrict to int.
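
A minimal sketch of the float-then-cast idea, in Python for illustration. The function name parse_dump_int and the 32-bit signed range are assumptions made for this example. Note that this approach is slightly more permissive than "no decimals in the mantissa": a token such as 1.5e3 is also accepted because its value is a whole number.

```python
INT_MIN = -2**31       # assumed 32-bit signed integer range
INT_MAX = 2**31 - 1


def parse_dump_int(token: str) -> int:
    """Parse an integer that may be written in scientific notation."""
    value = float(token)             # raises ValueError on malformed input
    if not value.is_integer():       # also false for inf and nan
        raise ValueError(f"{token!r} is not a whole number")
    result = int(value)
    if not INT_MIN <= result <= INT_MAX:
        raise OverflowError(f"{token!r} does not fit in a 32-bit int")
    return result
```

Double precision represents every integer in the 32-bit range exactly, so the round trip through a float does not lose information for values that pass the range check.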

Merge AST nodes Single and Multiple index

Distinguish between the two in the semantic check, using type information of the index (int vs int[]). The distinction is semantic and the two nodes should only be separated out once we pass to the MIR.

Add closures to Stan language

The point of this feature is to add lexical closures to the Stan language.
For a possible use case, see e.g. this thread.

It should be relatively straightforward to add these in to stanc3. The AST already treats function definitions as statements that could theoretically occur anywhere in the program. It is the parser that currently disallows that. The point of this issue will be to modify the parser to allow function definitions anywhere and to deprecate the functions block. A challenge would be to change the logic in the semantic check to now check at the level of the whole program whether there is a function declaration which is missing a definition.

A strong requirement for the PR would be a decent set of good and bad test models to make sure sensible error messages are generated.

Allow for type conversion from (iterated) int[,] to real[,]

This is used in prophet. Add the prophet code as a test model.

add switch to enable threading (+ MPI?)

Summary:

Right now it is a matter of specifying appropriate compile time defines like -DSTAN_THREADS in order to switch threading support on. We should consider changing stanc to have an option which turns on threading independent of compiler settings.

Description:

I guess all we need to do is add a flag to the services that causes the parser to emit a #define STAN_THREADS at the right position.

The same logic can be applied to MPI (and possibly GPU).

The goal is to have a consistent way for the interfaces to request threading being enabled.

Current Version:

v2.18.0

reverse deprecation on abs() and fabs()

I think we should deprecate

real fabs(real)

and go with just

real abs(real)
int abs(int)

The only drawback is that we won't have a function that applies to an integer and returns a real value, but I don't think we need that. Returning an integer won't lose information, and Stan will just promote it to a real value if necessary to use it elsewhere. And we'll still get integer absolute values for integers.

I propose getting rid of everything we're deprecating in Stan 3.

undeprecate abs(real)
deprecate fabs()
update manual explaining what's going on

Add exit() to the language

Summary:

Add an exit() function to the language.

We could also call it error(), fatal(), or fatal_error().

Description:

Currently, we have reject() in the Stan language. Using reject() rejects the current iteration.

We should have a matching exit() call in the Stan language. Using exit() should stop algorithms from continuing, indicating that something is wrong and shouldn't be ignored, as opposed to reject(), which continues on to the next iteration.

Decide on what to throw in the C++ when exit() is used
If necessary, add a function to call to the math library.
Add exit(...) to the language to match the function signatures of reject().
Update manual with description of behavior of reject() and exit().
Update algorithms to exit with exit() call.
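
The intended difference in control flow can be sketched with two exception types. This is a Python illustration only: RejectIteration, FatalExit, run and example_log_prob are hypothetical stand-ins, and the question of what to actually throw in the C++ is left open, as in the list above.

```python
class RejectIteration(Exception):
    """Stand-in for reject(): discard the current iteration and carry on."""


class FatalExit(Exception):
    """Stand-in for the proposed exit(): stop the algorithm entirely."""


def run(log_prob, draws):
    """Evaluate log_prob on each draw, skipping rejected iterations.

    FatalExit is deliberately not caught here, so it ends the whole run.
    """
    accepted = []
    for draw in draws:
        try:
            accepted.append(log_prob(draw))
        except RejectIteration:
            continue
    return accepted


def example_log_prob(draw):
    """Toy target: reject negative draws, treat huge draws as fatal."""
    if draw < 0:
        raise RejectIteration("negative draw")
    if draw > 10:
        raise FatalExit("draw out of any sensible range")
    return -0.5 * draw * draw
```

Here run(example_log_prob, [1.0, -1.0, 2.0]) skips the middle draw and returns two values, whereas a draw above 10 propagates FatalExit to the caller.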

Reproducible Steps:

Here's an example:

parameters {
  real<lower = 0, upper = 1> theta;
}
model {
  exit();
}

Current Output:

Currently, this will show that the 100 tries for initialization failed.

Expected Output:

Exit the algorithm.

Current Version:

v2.11.0

Make compiler throw warnings

We need to throw warnings for deprecated constructs.

Are there any more that are desired?

Collapse FunApp and CondDistApp AST nodes

These nodes can be merged as long as we add a label to the node to indicate which of the two it is. That way, we can share more code in the semantic check and simplify our lives in terms of maintenance.

feature request: unicode in source

Introduction

Some languages now support Unicode (mostly UTF8) for writing source code. It would be great if one could also use Unicode in Stan source. (Note that comments in UTF8, or any superset that embeds ASCII, are already supported in the sense the parser just ignores them.)

Broadly, there are two possible levels of support:

in variable and function names (eg ϕ), and
in operators (eg ≤), which provide synonyms for existing ones (eg <=)

Example

This is how the 8 schools example would look in Unicode:

data {
  int<lower=0> J;             // number of schools
  real y[J];                  // estimated treatment effect (school j)
  real<lower=0> σ[J];         // std err of effect estimate (school j)
}
parameters {
  real μ;
  real θ[J];
  real<lower=0> τ;
}
model {
  θ ~ normal(μ, τ); 
  y ~ normal(θ, σ);
}

Possible benefits

more compact source code
better mapping to equations in papers

Possible downsides

editor/entry support
font support
possibly corrupted files

The first two are mitigated by the fact that ASCII is a subset of UTF8, so using the feature is optional.

UTF8 support in various languages which have interfaces for Stan

language	literals	identifiers	operators	would UTF8 variables work for interfacing with Stan?
R	yes	yes	no	yes
Python	yes	only from version 3	no	yes, even in Python 2, as they are used as literal keys
Julia	yes	yes	yes	yes
Matlab	yes	yes, but needs to be enabled	no	yes
Stata	yes	yes, from version 14	no	probably?

Editor support

Emacs

See this list for various UTF8 implementations using autocomplete, company-mode, and quail.

(** Meta-data on expressions before type checking: a location for error messages *)
and expression_untyped_metadata =
  {expr_untyped_meta_loc: location sexp_opaque [@compare.ignore]}

 (** Meta-data on expressions after type checking: a location, as well as a type
    and an origin block (lub of the origin blocks of the identifiers in it) *)
and expression_typed_metadata =
  { expr_typed_meta_origin_type: originblock * unsizedtype
  ; expr_typed_meta_loc: location sexp_opaque [@compare.ignore] }
and untyped_expression =
  | UntypedExpr of (untyped_expression expression * expression_untyped_metadata)

and typed_expression =
  | TypedExpr of (typed_expression expression * expression_typed_metadata)

and untyped_expression =
  { expr_untyped: untyped_expression expression
  ; expr_untyped_loc: location sexp_opaque [@compare.ignore] }

and typed_expression =
  { expr_typed: typed_expression expression
  ; expr_typed_loc: location sexp_opaque [@compare.ignore]
  ; expr_typed_origin: originblock
  ; expr_typed_type: unsizedtype }

This made usage much easier as well; see this commit for more details (where we managed to almost halve the lines touched): 72c4c6b?w=1

Print errors on stderr (probably)

I'm looking at the behavior of the stanc executable and noticing that asking it to pretty-print will pretty-print and then print error messages. I think this might be reasonable but probably we should send the errors to stderr? I don't feel that strongly but this will allow people to at least capture the pretty output even if there are errors.
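
The separation being suggested looks roughly like this. The sketch is in Python for illustration and the function name report is hypothetical; it only shows the idea of routing the two kinds of output to different streams.

```python
import sys


def report(pretty_output, errors, out=sys.stdout, err=sys.stderr):
    """Write pretty-printed code to out and error messages to err."""
    if pretty_output:
        print(pretty_output, file=out)
    for message in errors:
        print(message, file=err)
```

With this split, redirecting standard output to a file captures only the pretty-printed program, while the error messages still appear on the terminal.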

Change operator precedence of .* and ./ to be the same as of * and /

Note that this is a bug in stanc2 as well: stan-dev/stan#2711 .

Improve include paths backtrace

Add line and character numbers to the files we are including from, rather than just listing the chain of file names.

Change location and scale keywords to offset and multiplier

Set up Jenkins to build OSX and Windows binaries for deployment

promote int to real, double-based divide function

to_real()

Add a special function:

real to_real(int);

code
tests
doc in manual

Suggested implementation:

double to_real(int x) { return x; }

Would this be clearer with a static cast?

div()

Add a special function

real div(real,real);

code
tests
doc in manual

Suggested implementation:

double div(double x, double y) { return x / y; }

Motivation

The motivation is that I'm implementing some mark-recapture models that involve integer populations, but I need a fractional bound on a real parameter that's currently cumbersome to write:

data {
  int<lower=0> M;                       // marked
  int<lower=0> C;                       // captured
  int<lower=0,upper=min(M,C)> R;        // recaptured
}
transformed data { // HACK!
  real theta_max;     
  theta_max <- M;         
  theta_max <- theta_max / (C - R + M);
}
parameters {
  real<lower=0,upper=theta_max> theta;  // proportion marked
}
...

I would prefer to get rid of the transformed data division hack and just write:

parameters {
  real<lower=0,upper=div(M, C - R + M)> theta;  // proportion marked
}

Without the constraint, you get inconsistent values of theta that are above the upper bound.
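To make the two suggested implementations concrete, here is a small C++ sketch using the static cast asked about above. The sketch namespace and the theta_max helper are assumptions added for illustration; the namespace also keeps div from clashing with the C library's div(int, int).

```cpp
namespace sketch {

// Explicit int -> double promotion; static_cast makes the intent visible.
inline double to_real(int x) { return static_cast<double>(x); }

// Real-valued division. Integer arguments are converted to double at the
// call site, so sketch::div(1, 4) is 0.25 rather than the truncated 0
// that the integer expression 1 / 4 would give.
inline double div(double x, double y) { return x / y; }

// The upper bound from the mark-recapture example: M / (C - R + M),
// computed without the transformed data workaround.
inline double theta_max(int M, int C, int R) {
  return div(to_real(M), to_real(C - R + M));
}

}  // namespace sketch
```

For example, with M = 2, C = 7 and R = 1 the denominator is 8, so the bound is 0.25, whereas plain integer division would truncate it to 0.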

Pretty printer should print iterated arrays as [,] rather than [][]

Double ;; Induces Parser Error

Description:

Extra semicolons ending a line throw a parser error despite the fact that an empty line causes no problems.

Reproducible Steps:

parameters {
  real x1;;
  real x2;
}

model {
  x1 ~ normal(0, 1);
  x2 ~ normal(0, 1);
}

Current Output:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

  error in 'bad.stan' at line 2, column 11
  -------------------------------------------------
     1: parameters {
     2:   real x1;;
                  ^
     3:   real x2;
  -------------------------------------------------

PARSER EXPECTED: <one of the following:
  a variable declaration, beginning with type,
      (int, real, vector, row_vector, matrix, unit_vector,
       simplex, ordered, positive_ordered,
       corr_matrix, cov_matrix,
       cholesky_corr, cholesky_cov
  or '}' to close variable declarations>

Expected Output:

No error or something more explicit like "Double semicolons not allowed".

Current Version:

v2.16.0

Update AST etc to use list of transform[ation]s

Right now we have an AST that allows one transformation per variable declaration, but 1) syntactically we are actually sometimes expressing two with e.g. <lower=0, upper=1>, and 2) we think we will allow more in the future. If we upgrade now to a list of transformations, we can express the idea of the lowerupper transformation more clearly as two serial transforms.
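A hedged sketch of what "a list of transformations" could look like, written in C++ purely for illustration. The real AST is an OCaml type and may be shaped quite differently; Transform, lower_upper and satisfies are hypothetical names. It only covers the constraint-checking view, in which <lower=0, upper=1> is two entries checked one after the other.

```cpp
#include <vector>

// Hypothetical transform node: one bound per entry.
struct Transform {
  enum Kind { Lower, Upper } kind;
  double bound;
};

// <lower=lo, upper=hi> becomes two entries, kept in declaration order.
inline std::vector<Transform> lower_upper(double lo, double hi) {
  return {Transform{Transform::Lower, lo}, Transform{Transform::Upper, hi}};
}

// A value is acceptable if it satisfies every entry in the list in turn.
inline bool satisfies(const std::vector<Transform>& ts, double x) {
  for (const Transform& t : ts) {
    if (t.kind == Transform::Lower && x < t.bound) return false;
    if (t.kind == Transform::Upper && x > t.bound) return false;
  }
  return true;
}
```

Adding a new kind of transform would then mean adding a case rather than a new combined variant for each pairing.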

Create more code reuse in error reporting

Syntax and semantic errors should be able to share the same code; similarly for lexing and include errors. There should be some sharing possible with warnings as well.

add syntax for mixtures of distributions

Andrew suggested something to make mixture modeling easier. I'm not sure what he's thinking in terms of syntax, but maybe something like a distribution like this:

y ~ finite_mixture(lambda, 
                   normal(mu1,sigma1), 
                   normal(mu2,sigma2), 
                   normal(mu3,sigma3) );

or a special "function" like

target += finite_mixture(lambda,
                         normal_log(y,mu1,sigma1),
                         normal_log(y,mu2,sigma2),
                         normal_log(y,mu3,sigma3));

Of course, these aren't functions with the usual type of signature.

Another issue is how to deal with vectorization.

Investigate removing context_flags (and Symbol_table) from global state

Right now in Semantic_check.ml we have two global objects that get modified and passed around a lot: context_flags and the Symbol_table. Typically we modify them for the duration of some function call and then reset them to their original state once the call returns. It might be less messy to pass these two in to the function calls in question and have them return a new pair that we can update with if need be (which seems rare). I'm not sure this is the case, but we can use this issue to track an investigation into the possibility, perhaps by highlighting some places where each approach is messy vs. clean and attempting to estimate the net effect of each.
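To give the investigation something concrete to compare against, here is a minimal sketch of the pass-in/return-new style, in C++ for illustration only. The field names (in_loop, in_fun_def), the Env struct and the helper functions are all hypothetical and do not reflect the actual contents of context_flags or Symbol_table.

```cpp
#include <map>
#include <string>

// Hypothetical stand-ins for the two pieces of global state.
struct ContextFlags {
  bool in_loop = false;
  bool in_fun_def = false;
};
using SymbolTable = std::map<std::string, std::string>;  // name -> type

struct Env {
  ContextFlags flags;
  SymbolTable symbols;
};

// Returns a new environment with `name` declared; the caller's copy is
// untouched, so there is nothing to reset afterwards.
inline Env declare(Env env, const std::string& name, const std::string& type) {
  env.symbols[name] = type;
  return env;
}

// Returns a new environment for checking a loop body.
inline Env enter_loop(Env env) {
  env.flags.in_loop = true;
  return env;
}

// Example check that reads the flags instead of consulting a global.
inline bool break_allowed(const Env& env) { return env.flags.in_loop; }
```

In this style the "reset after the call" step disappears, because the outer environment is never modified in the first place. The cost is that every checking function takes and possibly returns the environment explicitly.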

Improve error messages

Print more of the source file surrounding the error when we emit an error message. Maybe make it look like the existing compiler?

rename cov_matrix to spd_matrix

"spd" is for symmetric positive definite. The problem with "cov_matrix" is that they can be used for precision matrices, too, making the declaration look confusing.

This will also match the functions Marcus has added (but aren't quite yet through the pipe into the Stan language).

We should just deprecate cov_matrix with a warning.

We should also rename the Cholesky factors to match.

(Thanks to Ben for noticing the problem and Marcus for suggesting a better name.)

Eliminate structs that wrap user-defined functions

Summary:

The parser should not generate structs that call user-defined functions anymore. We should use generalized lambda functions instead.

Description:

A user-defined function foo will generate additional C++ that looks something like

struct foo_functor__ {
    templates
    return_type
    operator()(..., std::ostream* pstream__) const {
        return foo(..., pstream__);
    }
};

irrespective of whether foo_functor__ actually gets called by one of the functionals. The parser should not generate foo_functor__, and if functionals are used in the Stan program, then stanc should utilize a generalized lambda function that calls foo.

This would also allow us to avoid the unnecessary restriction that you cannot define multiple user-defined functions with the same name (but different signatures). See

stan-dev/stan#1547
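A minimal sketch of the idea, assuming C++14 generic lambdas are available. apply_functional is a hypothetical stand-in for one of the functionals, and the two foo overloads are invented for the example; this is not what stanc generates. It shows that a single generic lambda can forward to whichever overload of foo matches, which a struct with one fixed operator() signature cannot do.

```cpp
#include <ostream>
#include <utility>

// Two user-defined functions with the same name but different signatures.
inline double foo(double x, std::ostream* /* pstream__ */) { return 2.0 * x; }
inline int foo(int x, int y, std::ostream* /* pstream__ */) { return x + y; }

// Hypothetical stand-in for a functional: calls f with the user arguments
// followed by the message stream, as the functor structs do today.
template <typename F, typename... Args>
decltype(auto) apply_functional(const F& f, std::ostream* pstream__,
                                Args&&... args) {
  return f(std::forward<Args>(args)..., pstream__);
}

// Generic lambda replacing foo_functor__; overload resolution on foo
// happens at each call, based on the forwarded argument types.
const auto foo_lambda = [](auto&&... args) -> decltype(auto) {
  return foo(std::forward<decltype(args)>(args)...);
};
```

Calling apply_functional(foo_lambda, nullptr, 1.5) reaches the one-argument overload, and apply_functional(foo_lambda, nullptr, 2, 3) reaches the two-argument one. The lambda only needs to be emitted at the point where a functional is actually used.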

Reproducible Steps:

Parse

functions {
  void foo() {
  
  }
}

and look at the C++ code

Current Output:

struct foo_functor__ {
            void
    operator()(std::ostream* pstream__) const {
        return foo(pstream__);
    }
};

Expected Output:

Not that part

Additional Information:

None

Current Version:

v2.18.0

Refactor pretty printer to use Format library

ambiguity of `1./a` (real literal and infixOp)

Summary:

1./a is parsed as 1.0/a instead of 1 ./ a (where a is a vector). This confused me a bit.

To avoid this ambiguity I would prefer to disallow 1. and require 1.0, which I consider more readable anyway. If you are concerned about breaking existing code, please consider having the parser issue a warning in this case.

Related: In the grammar in the manual the definitions of numeric_literal and real_literal are equivalent: (every numeric is real)

numeric_literal ::= integer_literal | real_literal
integer_literal ::= 0 | [1-9] [0-9] *
real_literal ::= integer_literal ?('.' [0-9] * ) ?exp_literal

I would prefer:
real_literal ::= integer_literal '.' [0-9] + ?exp_literal
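The difference between the two rules can be checked with a pair of regular expressions. This is only a sketch using std::regex, not the actual lexer. The manual's exp_literal is not quoted above, so the form used here (e or E, an optional sign, then an integer_literal) is an assumption, and the function names are hypothetical.

```cpp
#include <regex>
#include <string>

// real_literal as quoted from the manual: the fractional part and the
// exponent are both optional, so "1" and "1." are accepted too.
inline bool is_current_real_literal(const std::string& s) {
  static const std::regex re(
      R"((0|[1-9][0-9]*)(\.[0-9]*)?([eE][+-]?(0|[1-9][0-9]*))?)");
  return std::regex_match(s, re);
}

// Proposed real_literal: a '.' followed by at least one digit is required.
inline bool is_proposed_real_literal(const std::string& s) {
  static const std::regex re(
      R"((0|[1-9][0-9]*)\.[0-9]+([eE][+-]?(0|[1-9][0-9]*))?)");
  return std::regex_match(s, re);
}
```

Under the proposed rule "1." is no longer a real literal, so in 1./a the lexer could only read 1 followed by the ./ operator.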

Current Version:

rstan (Version 2.16.2, packaged: 2017-07-03 09:24:58 UTC, GitRev: 2e1f913d3ca3)

remove namespace scoped using statements in generated code for model

Summary:

See title.

Description:

We don't need those top-level using statements; we can move them to where they're needed.

Reproducible Steps:

Just look at the C++ generated for any Stan program.

Current Output:

Generates these using statements:

namespace unit_model_namespace {

using std::istream;
using std::string;
using std::stringstream;
using std::vector;
using stan::io::dump;
using stan::math::lgamma;
using stan::model::prob_grad;
using namespace stan::math;

Expected Output:

Nothing at this scope; everything pushed down to where needed or explicitly qualified in its output.
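A small sketch of what "pushed down or explicitly qualified" could look like. std::lgamma stands in for stan::math::lgamma so that the example compiles without Stan; the namespace and function names are hypothetical.

```cpp
#include <cmath>
#include <vector>

namespace unit_model_namespace_sketch {

// Types in the signature are fully qualified; there is no namespace-level
// using statement.
inline double sum_lgamma(const std::vector<double>& xs) {
  using std::lgamma;  // visible only inside this function body
  double total = 0.0;
  for (double x : xs) {
    total += lgamma(x);
  }
  return total;
}

}  // namespace unit_model_namespace_sketch
```

Nothing leaks into the enclosing namespace this way, so names from user-defined functions in the same namespace are less likely to collide with names pulled in by the generator.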

Current Version:

v2.9.0
