stan-dev / stanc3
The Stan transpiler (from Stan to C++ and beyond).
License: BSD 3-Clause "New" or "Revised" License
Right now AST locations are:
type location =
| Location of Lexing.position * Lexing.position (** delimited location *)
| Nowhere (** no location *)
...
type position = {
pos_fname : string;
pos_lnum : int;
pos_bol : int;
pos_cnum : int;
}
This doesn't seem to support the include use case where a statement comes from a file but may have been included by other files (unless I'm thinking about this wrong).
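A minimal sketch of one way to carry the include chain. It is written in C++ only to show the data shape; stanc3 itself is OCaml, where the analogous change would be an optional included_from field on the position record. Position, included_from, and describe are hypothetical names. Each position points at the position of the #include directive that pulled its file in, so nested includes form a chain that can be printed with line and column numbers at every level.

```cpp
#include <memory>
#include <string>

// Hypothetical position type: a point in one file, plus the position of the
// #include directive that pulled this file in (null for the top-level file).
struct Position {
  std::string file;
  int line;
  int column;
  std::shared_ptr<const Position> included_from;
};

// Renders the full include chain, innermost file first, with a line and
// column number for every file in the chain.
std::string describe(const Position& p) {
  std::string out = "'" + p.file + "', line " + std::to_string(p.line) +
                    ", column " + std::to_string(p.column);
  if (p.included_from) {
    out += ", included from " + describe(*p.included_from);
  }
  return out;
}
```

All positions inside one included file can share the same parent pointer, so the chain costs one pointer per position rather than a copy of the whole include stack.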
From this repo: https://github.com/stan-dev/example-models.
Note that some of these may need to be fixed, as they don't compile in the current Stan compiler.
Ultimately, the plan is to remove most of that repo, as most of these models are very hard to sample from. They are good tests for the grammar, semantic check and generator though, so we should rescue them before they get removed!
It is a parser error to declare and define a one-dimensional integer array using the colon operator.
Try to parse the following Stan program
data {
int<lower=1> K;
}
transformed data {
int arr[K] = 1:K;
}
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
variable definition dimensions mismatch, definition specifies 1, declaration specifies 0 error in 'model_int_array' at line 5, column 17
-------------------------------------------------
3: }
4: transformed data {
5: int arr[K] = 1:K;
^
6: }
-------------------------------------------------
Error in stanc(model_code = paste(program, collapse = "\n"), model_name = model_cppname, :
failed to parse Stan model 'int_array' due to the above error.
No error. It should assign the integers from 1 to K to arr.
v2.17.0
There are tools out there (autocomplete, live doc, etc) that consume a list of function signatures in a regular format - it'd be good to output one. Eventually we can think about what else a presentation compiler would look like.
Currently, we just print a type error message. It would be good to emphasize in the message that the data-only restriction is what causes type checking to fail.
In the current compiler, this model throws a couple of errors we aren't throwing:
model { 1 ~ bernoulli(0.2) T[0.1, 1.1]; }
namely:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for:
real ~ bernoulli_cdf(real)
Available argument signatures for bernoulli_cdf:
int ~ bernoulli_cdf(real)
int ~ bernoulli_cdf(real[ ])
int ~ bernoulli_cdf(vector)
int ~ bernoulli_cdf(row_vector)
int[ ] ~ bernoulli_cdf(real)
int[ ] ~ bernoulli_cdf(real[ ])
int[ ] ~ bernoulli_cdf(vector)
int[ ] ~ bernoulli_cdf(row_vector)
Lower truncation not defined for specified arguments to bernoulli
error in '../stanc3/test.stan' at line 2, column 34
-------------------------------------------------
1: model {
2: 1 ~ bernoulli(0.2) T[0.1, 1.1];
^
3: }
-------------------------------------------------
make: *** [../stanc3/test.hpp] Error 253
That first error message might be spurious, given that 1 could also be an int.
I tried to compile a Stan model that parses successfully but fails to compile due to:
./sirStrain.hpp:2822:40: error: ‘pstream__’ was not declared in this scope
Compilation was attempted with both rstan and CmdStan.
See Summary
Try to compile with cmdStan (2.18.0):
make sirStrain
Note that the actual stan code is included below.
$ make sirStrain
--- Translating Stan model to C++ code ---
bin/stanc sirStrain.stan --o=sirStrain.hpp
Model name=sirStrain_model
Input file=sirStrain.stan
Output file=sirStrain.hpp
DIAGNOSTIC(S) FROM PARSER:
Warning: left-hand side variable (name=healthy) occurs on right-hand side of assignment, causing inefficient deep copy to avoid aliasing.
Warning: left-hand side variable (name=healthy) occurs on right-hand side of assignment, causing inefficient deep copy to avoid aliasing.
Warning: left-hand side variable (name=y0) occurs on right-hand side of assignment, causing inefficient deep copy to avoid aliasing.
Warning: left-hand side variable (name=ili) occurs on right-hand side of assignment, causing inefficient deep copy to avoid aliasing.
Compiling pre-compiled header
g++ -Wall -I . -isystem stan/lib/stan_math/lib/eigen_3.3.3 -isystem stan/lib/stan_math/lib/boost_1.66.0 -isystem stan/lib/stan_math/lib/sundials_3.1.0/include -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -I src -isystem stan/src -isystem stan/lib/stan_math/ -DFUSION_MAX_VECTOR_SIZE=12 -Wno-unused-local-typedefs -DEIGEN_NO_DEBUG -DNO_FPRINTF_OUTPUT -pipe -c -O3 stan/src/stan/model/model_header.hpp -o stan/src/stan/model/model_header.hpp.gch
--- Linking C++ model ---
g++ -Wall -I . -isystem stan/lib/stan_math/lib/eigen_3.3.3 -isystem stan/lib/stan_math/lib/boost_1.66.0 -isystem stan/lib/stan_math/lib/sundials_3.1.0/include -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -I src -isystem stan/src -isystem stan/lib/stan_math/ -DFUSION_MAX_VECTOR_SIZE=12 -Wno-unused-local-typedefs -DEIGEN_NO_DEBUG -DNO_FPRINTF_OUTPUT -pipe src/cmdstan/main.cpp -O3 -o sirStrain -include sirStrain.hpp stan/lib/stan_math/lib/sundials_3.1.0/lib/libsundials_nvecserial.a stan/lib/stan_math/lib/sundials_3.1.0/lib/libsundials_cvodes.a stan/lib/stan_math/lib/sundials_3.1.0/lib/libsundials_idas.a
In file included from <command-line>:0:0:
./sirStrain.hpp: In member function ‘void sirStrain_model_namespace::sirStrain_model::get_dims(std::vector<std::vector<long unsigned int> >&) const’:
./sirStrain.hpp:2822:40: error: ‘pstream__’ was not declared in this scope
dims__.push_back(get_state_dim(pstream__));
^~~~~~~~~
./sirStrain.hpp:2822:40: note: suggested alternative: ‘pthread_t’
dims__.push_back(get_state_dim(pstream__));
^~~~~~~~~
pthread_t
In file included from <command-line>:0:0:
./sirStrain.hpp: In member function ‘void sirStrain_model_namespace::sirStrain_model::constrained_param_names(std::vector<std::__cxx11::basic_string<char> >&, bool, bool) const’:
./sirStrain.hpp:3159:56: error: ‘pstream__’ was not declared in this scope
for (int k_0__ = 1; k_0__ <= get_state_dim(pstream__); ++k_0__) {
^~~~~~~~~
./sirStrain.hpp:3159:56: note: suggested alternative: ‘pthread_t’
for (int k_0__ = 1; k_0__ <= get_state_dim(pstream__); ++k_0__) {
^~~~~~~~~
pthread_t
./sirStrain.hpp:3164:56: error: ‘pstream__’ was not declared in this scope
for (int k_1__ = 1; k_1__ <= get_state_dim(pstream__); ++k_1__) {
^~~~~~~~~
./sirStrain.hpp:3164:56: note: suggested alternative: ‘pthread_t’
for (int k_1__ = 1; k_1__ <= get_state_dim(pstream__); ++k_1__) {
^~~~~~~~~
pthread_t
./sirStrain.hpp:3171:56: error: ‘pstream__’ was not declared in this scope
for (int k_0__ = 1; k_0__ <= get_theta_dim(pstream__); ++k_0__) {
^~~~~~~~~
./sirStrain.hpp:3171:56: note: suggested alternative: ‘pthread_t’
for (int k_0__ = 1; k_0__ <= get_theta_dim(pstream__); ++k_0__) {
^~~~~~~~~
pthread_t
./sirStrain.hpp: In member function ‘void sirStrain_model_namespace::sirStrain_model::unconstrained_param_names(std::vector<std::__cxx11::basic_string<char> >&, bool, bool) const’:
./sirStrain.hpp:3253:56: error: ‘pstream__’ was not declared in this scope
for (int k_0__ = 1; k_0__ <= get_state_dim(pstream__); ++k_0__) {
^~~~~~~~~
./sirStrain.hpp:3253:56: note: suggested alternative: ‘pthread_t’
for (int k_0__ = 1; k_0__ <= get_state_dim(pstream__); ++k_0__) {
^~~~~~~~~
pthread_t
./sirStrain.hpp:3258:56: error: ‘pstream__’ was not declared in this scope
for (int k_1__ = 1; k_1__ <= get_state_dim(pstream__); ++k_1__) {
^~~~~~~~~
./sirStrain.hpp:3258:56: note: suggested alternative: ‘pthread_t’
for (int k_1__ = 1; k_1__ <= get_state_dim(pstream__); ++k_1__) {
^~~~~~~~~
pthread_t
./sirStrain.hpp:3265:56: error: ‘pstream__’ was not declared in this scope
for (int k_0__ = 1; k_0__ <= get_theta_dim(pstream__); ++k_0__) {
^~~~~~~~~
./sirStrain.hpp:3265:56: note: suggested alternative: ‘pthread_t’
for (int k_0__ = 1; k_0__ <= get_theta_dim(pstream__); ++k_0__) {
^~~~~~~~~
pthread_t
make/models:14: recipe for target 'sirStrain' failed
make: *** [sirStrain] Error 1
A working model
I tested this both from R (rstan) and cmdStan. In R it failed, but did not provide helpful information. Included here is the cmdStan output.
v2.18.0
The model consists of two files. The main file: sirStrain.stan
functions {
#include sirFunctions.stan
real approxbin_lpdf(real k, real n, real p) {
return beta_lpdf(p | k+1, n-k+1);
}
real niliForcing(real t, real amplitude, real period, real peakTime) {
real sigma;
sigma = 0.25*period;
return amplitude*(exp(-pow(t-peakTime,2)/(2*pow(sigma,2)))-1);
}
real[] ode(real time, real[] state, real[] theta, real[] x_r, int[] x_i) {
real dydt[get_state_dim()];
real N;
real healthy;
N = get_N(x_r);
healthy = N;
for (k in 1:get_K(x_i)) {
real newr;
newr = get_gamma(theta)*get_I(state, {k});
dydt[get_S_id({k})] = -get_beta(theta, {k})*get_S(state, {k})*get_I(state, {k})/N;
dydt[get_I_id({k})] = get_beta(theta, {k})*get_S(state, {k})*get_I(state, {k})/N -
newr;
dydt[get_R_id({k})] = newr;
dydt[get_PILI_id({k})] = newr*get_thetap(theta, {k});
healthy = healthy - get_I(state, {k});
}
healthy = fmin(healthy, 1.0);
dydt[get_NILI_id()] = exp(get_thetan(theta) +
niliForcing(time, get_niliAmplitude(theta), get_niliPeriod(theta), get_niliPeak(theta))
)*healthy;
return dydt;
}
}
data {
int T;
int K;
real N;
real ts[T];
real sero_mu[K];
real sero_sd[K];
real niliPeak_mu;
real thetap_alpha[K];
real thetap_beta[K];
int Y[T-1,K+1];
real YILI[T,2];
}
transformed data {
real x_r[get_xr_dim()];
int x_i[get_xi_dim()];
x_r = xr_to_1d(N);
x_i = xi_to_1d(K);
}
parameters {
real<lower=-10,upper=-6> I0[K];
real<lower=0,upper=1> susc[K];
real<lower=0,upper=1> asc;
real<lower=0,upper=4> beta[K];
real<lower=0,upper=1> gamma;
real<lower=0,upper=1> thetap[K];
real<lower=-15,upper=0> thetan;
real<lower=40,upper=200> niliPeriod;
real<lower=0,upper=3> niliAmplitude;
real<lower=14,upper=120> niliPeak;
}
transformed parameters {
real y0[get_state_dim()];
real y[T, get_state_dim()];
real theta[get_theta_dim()];
real pili[T-1,K];
real nili[T-1];
real ili[T-1];
vector[K+1] ppili[T-1];
theta = theta_to_1d(beta, gamma, thetap, thetan, niliPeriod, niliAmplitude, niliPeak);
// initial
y0[get_NILI_id()] = 0;
for (k in 1:K) {
y0[get_I_id({k})] = exp(I0[k])*susc[k]*N;
y0[get_S_id({k})] = susc[k]*N - y0[get_I_id({k})];
y0[get_R_id({k})] = (1-susc[k])*N;
y0[get_PILI_id({k})] = 0;
}
y = integrate_ode_rk45(ode, y0, 0, ts, theta, x_r, x_i);
// new nili and pili cases for last week
for(t in 2:T) {
nili[t-1] = get_NILI(y[t,]) - get_NILI(y[t-1,]);
ili[t-1] = nili[t-1];
for (k in 1:K) {
pili[t-1,k] = get_PILI(y[t,], {k}) - get_PILI(y[t-1,], {k});
ili[t-1] = ili[t-1] + pili[t-1,k];
}
}
for (t in 1:(T-1)) {
for (k in 1:K) {
ppili[t, k] = pili[t,k]/ili[t];
}
ppili[t, K + 1] = nili[t]/ili[t];
}
}
model {
real sheddingPeriod;
sheddingPeriod = 1/gamma;
sheddingPeriod ~ normal(4.8, 0.245); // carrat_time_2008
niliPeak ~ normal(niliPeak_mu + 7, 21); // +7 because we start measuring after one running week
for (k in 1:K) beta[k] ~ normal(1.28*gamma*susc[k], 0.133); // Assume biggerstaff is on Reff
for (k in 1:K) susc[k] ~ normal(sero_mu[k], sero_sd[k]);
for (k in 1:K) thetap[k] ~ beta(thetap_alpha[k], thetap_beta[k]);
asc ~ beta(35.644, 69.314);
for (t in 1:(T-1)) {
YILI[t,1] ~ approxbin(ili[t]*YILI[t,2]/N, asc);
Y[t] ~ multinomial(ppili[t]);
}
}
and the included file sirFunctions.stan:
int index_to_1d(int[] index, int[] dims, int offset);
int index_to_1d(int[] index, int[] dims, int offset) {
int n;
n = size(dims);
if (n == 1) {
return offset + index[1];
} else {
//return (index[n] - 1) * prod(head(dims, n - 1)) +
// index_to_1d(head(index, n - 1), head(dims, n - 1), offset);
return (index[1] - 1) * prod(tail(dims, n - 1)) +
index_to_1d(tail(index, n - 1), tail(dims, n - 1), offset);
}
}
int get_beta_id(int[] index) { return index_to_1d(index, {3} , 0); }
real get_beta(real[] v, int[] index) { return v[get_beta_id(index)]; }
real[] get_beta_array(real[] v) { real arr[3];
for (id1 in 1:3) arr[id1] = get_beta(v, {id1});
return arr; }
int get_gamma_id() { return 4; }
real get_gamma(real[] v) { return v[get_gamma_id()]; }
real get_gamma_array(real[] v) { return get_gamma(v); }
int get_thetap_id(int[] index) { return index_to_1d(index, {3} , 4); }
real get_thetap(real[] v, int[] index) { return v[get_thetap_id(index)]; }
real[] get_thetap_array(real[] v) { real arr[3];
for (id1 in 1:3) arr[id1] = get_thetap(v, {id1});
return arr; }
int get_thetan_id() { return 8; }
real get_thetan(real[] v) { return v[get_thetan_id()]; }
real get_thetan_array(real[] v) { return get_thetan(v); }
int get_niliPeriod_id() { return 9; }
real get_niliPeriod(real[] v) { return v[get_niliPeriod_id()]; }
real get_niliPeriod_array(real[] v) { return get_niliPeriod(v); }
int get_niliAmplitude_id() { return 10; }
real get_niliAmplitude(real[] v) { return v[get_niliAmplitude_id()]; }
real get_niliAmplitude_array(real[] v) { return get_niliAmplitude(v); }
int get_niliPeak_id() { return 11; }
real get_niliPeak(real[] v) { return v[get_niliPeak_id()]; }
real get_niliPeak_array(real[] v) { return get_niliPeak(v); }
real[] theta_to_1d(real[] beta, real gamma, real[] thetap, real thetan, real niliPeriod, real niliAmplitude, real niliPeak) { real v[11]; v[1:3] = to_array_1d(beta); v[4] = gamma; v[5:7] = to_array_1d(thetap); v[8] = thetan; v[9] = niliPeriod; v[10] = niliAmplitude; v[11] = niliPeak; return v; }
int get_theta_dim() { return 11; }
int get_S_id(int[] index) { return index_to_1d(index, {3} , 0); }
real get_S(real[] v, int[] index) { return v[get_S_id(index)]; }
real[] get_S_array(real[] v) { real arr[3];
for (id1 in 1:3) arr[id1] = get_S(v, {id1});
return arr; }
int get_I_id(int[] index) { return index_to_1d(index, {3} , 3); }
real get_I(real[] v, int[] index) { return v[get_I_id(index)]; }
real[] get_I_array(real[] v) { real arr[3];
for (id1 in 1:3) arr[id1] = get_I(v, {id1});
return arr; }
int get_R_id(int[] index) { return index_to_1d(index, {3} , 6); }
real get_R(real[] v, int[] index) { return v[get_R_id(index)]; }
real[] get_R_array(real[] v) { real arr[3];
for (id1 in 1:3) arr[id1] = get_R(v, {id1});
return arr; }
int get_NILI_id() { return 10; }
real get_NILI(real[] v) { return v[get_NILI_id()]; }
real get_NILI_array(real[] v) { return get_NILI(v); }
int get_PILI_id(int[] index) { return index_to_1d(index, {3} , 10); }
real get_PILI(real[] v, int[] index) { return v[get_PILI_id(index)]; }
real[] get_PILI_array(real[] v) { real arr[3];
for (id1 in 1:3) arr[id1] = get_PILI(v, {id1});
return arr; }
real[] state_to_1d(real[] S, real[] I, real[] R, real NILI, real[] PILI) { real v[13]; v[1:3] = to_array_1d(S); v[4:6] = to_array_1d(I); v[7:9] = to_array_1d(R); v[10] = NILI; v[11:13] = to_array_1d(PILI); return v; }
int get_state_dim() { return 13; }
int get_N_id() { return 1; }
real get_N(real[] v) { return v[get_N_id()]; }
real get_N_array(real[] v) { return get_N(v); }
real[] xr_to_1d(real N) { real v[1]; v[1] = N; return v; }
int get_xr_dim() { return 1; }
int get_K_id() { return 1; }
int get_K(int[] v) { return v[get_K_id()]; }
int get_K_array(int[] v) { return get_K(v); }
int[] xi_to_1d(int K) { int v[1]; v[1] = K; return v; }
int get_xi_dim() { return 1; }
Expose the chol2inv function, which is implemented in stan-dev/math.
Efficient conversion from precision to covariance parameterizations on the Cholesky factor scale.
v2.17.1
We currently have a lot of integration tests, but unit tests are missing.
and pretty_print_statement = function
| UntypedStmt (s_content, _) -> (
match s_content with
You can basically delete the last two lines of the above to achieve this. There is no reason, I believe, for the pretty printer to know about locations or other metadata.
Extend the Stan language with general user-defined higher-order functions. To achieve this, at least the following needs to be done:
See the last few Jenkins builds.
For instance:
eval $(opam env) dune build @install --profile static
+ opam env
[ERROR] Opam has not been initialised, please run `opam init'
+ eval
+ dune build @install --profile static
/home/jenkins-slave/jenkins-slave-files/workspace/stanc3_master-M4O3QE2CHRIUYN5T2XTQAZMQFNNAJIK55J364YWZ7FMB36TLMYPA@tmp/durable-0f88aa88/script.sh: line 1: dune: not found
script returned exit code 127
Every .ml file should open Core_kernel at the top to consistently use Core_kernel as a replacement for the standard library.
integrate_dae solves a DAE.
The DAE solver from sundials solves a differential-algebraic equation. Math's DAE solver interface has the following signature.
template <typename F, typename Tpar>
std::vector<std::vector<Tpar> > integrate_dae(
const F& f,
const std::vector<double>& yy0,
const std::vector<double>& yp0,
double t0, const std::vector<double>& ts,
const std::vector<Tpar>& theta,
const std::vector<double>& x_r, const std::vector<int>& x_i,
const double rtol, const double atol,
const int64_t max_num_steps = idas_integrator::IDAS_MAX_STEPS,
std::ostream* msgs = nullptr)
Exposing it in Stan would add the ability to solve DAEs.
Note that the DAE signature is different from that of ODE in
v2.18.0
This should mostly consist of migrating any changes to the Math library signatures that are exposed.
Would it be possible to allow multiple declarations on one line, so that the var_decl part of the grammar is not repeated? E.g.
real alpha, beta, gamma[5];
real<lower=0> a, b[3, 4];
that would be equivalent to
real alpha;
real beta;
real gamma[5];
real<lower=0> a;
real<lower=0> b[3, 4];
and similarly for other types.
This would allow more compact code by factoring out common parts. As far as I understand, this would only change the surface syntax (parser) a bit and would require no change to the internals.
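A minimal sketch of the desugaring, assuming the parser builds a hypothetical MultiDecl node that holds the shared type once plus a list of declarators. Expanding it into one ordinary declaration per declarator right after parsing is what would keep the change confined to the surface syntax. All type and function names here are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical AST pieces for this sketch only.
struct Declarator {                 // e.g. "b[3, 4]"
  std::string name;
  std::vector<std::string> dims;    // dimension expressions, as source text
};
struct MultiDecl {                  // e.g. "real<lower=0> a, b[3, 4];"
  std::string type;                 // shared part, e.g. "real<lower=0>"
  std::vector<Declarator> declarators;
};
struct VarDecl {                    // one ordinary declaration
  std::string type;
  std::string name;
  std::vector<std::string> dims;
};

// Copies the shared type onto each declarator, so later passes only ever
// see ordinary one-variable declarations.
std::vector<VarDecl> desugar(const MultiDecl& d) {
  std::vector<VarDecl> out;
  for (const Declarator& decl : d.declarators) {
    out.push_back({d.type, decl.name, decl.dims});
  }
  return out;
}

// Prints a VarDecl back as Stan source, e.g. "real<lower=0> b[3, 4];".
std::string to_stan(const VarDecl& v) {
  std::string s = v.type + " " + v.name;
  if (!v.dims.empty()) {
    s += "[";
    for (std::size_t i = 0; i < v.dims.size(); ++i) {
      if (i > 0) s += ", ";
      s += v.dims[i];
    }
    s += "]";
  }
  return s + ";";
}
```

Desugaring to real<lower=0> a; and real<lower=0> b[3, 4]; matches the equivalence given above, and no later pass needs to know the compact form existed.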
Implement a linter for Stan models to flag up common programming mistakes, e.g. using unvectorized distributions in a loop where a vectorized version is available. This would involve
This is to correctly deal with nested loops
Improve the integer parser for the dump format to accept integers like 1e8 by noticing that they have no decimals in their mantissa.
It might be easiest to read them into double-precision floats, then cast to int.
We should also test for overflow in sizes while we still restrict to int.
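A minimal sketch of the approach suggested above: read the token as a double, check that it is finite, integral, and within int range, then cast. parse_dump_int is a hypothetical name. This also accepts tokens such as 1.0 or 1.5e1, whose values are integral even though the mantissa has a decimal point; a stricter reader would inspect the mantissa text instead.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical helper: parses one dump-format token as an int, accepting
// scientific notation such as "1e8". Returns false if the token is not a
// number, has a fractional part, or does not fit in an int.
bool parse_dump_int(const std::string& token, int& out) {
  double value = 0.0;
  std::size_t consumed = 0;
  try {
    value = std::stod(token, &consumed);  // read as a double first
  } catch (const std::invalid_argument&) {
    return false;  // not a number at all
  } catch (const std::out_of_range&) {
    return false;  // outside the range of double
  }
  if (consumed != token.size()) return false;    // trailing text, e.g. "12abc"
  if (!std::isfinite(value)) return false;       // "inf", "nan"
  if (value != std::floor(value)) return false;  // fractional, e.g. "1.5"
  // Overflow check while sizes are still restricted to int.
  if (value < static_cast<double>(std::numeric_limits<int>::min()) ||
      value > static_cast<double>(std::numeric_limits<int>::max())) {
    return false;
  }
  out = static_cast<int>(value);
  return true;
}
```

Every int is exactly representable as a double, so the range check and the final cast lose no information.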
Distinguish between the two in the semantic check, using type information of the index (int vs int[]). The distinction is semantic and the two nodes should only be separated out once we pass to the MIR.
The point of this feature is to add lexical closures to the Stan language.
For a possible use case, see e.g. this thread.
It should be relatively straightforward to add these in to stanc3. The AST already treats function definitions as statements that could theoretically occur anywhere in the program. It is the parser that currently disallows that. The point of this issue will be to modify the parser to allow function definitions anywhere and to deprecate the functions block. A challenge would be to change the logic in the semantic check to now check at the level of the whole program whether there is a function declaration which is missing a definition.
A strong requirement for the PR would be a decent set of good and bad test models to make sure sensible error messages are generated.
This is used in prophet. Add the prophet code as a test model.
Right now, switching threading support on is a matter of specifying appropriate compile-time defines like -DSTAN_THREADS. We should consider changing stanc to have an option that turns on threading independently of compiler settings.
I guess all we need to do is add a flag to the services that then causes the parser to output a #define STAN_THREADS at the right position.
The same logic can be applied to MPI (and possibly GPU).
The goal is to have a consistent way for the interfaces to request threading being enabled.
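A minimal sketch of the code-generation side, assuming a hypothetical options struct that the services flag would set. STAN_MPI is assumed here as the MPI counterpart of STAN_THREADS. The one real constraint the sketch encodes is ordering: the defines are emitted before the first Stan header include, so the library headers are compiled with them.

```cpp
#include <string>

// Hypothetical stanc options; not an existing stanc interface.
struct CodegenOptions {
  bool threads = false;
  bool mpi = false;
};

// Emits the top of a generated model header. Any feature defines must come
// before the first Stan include, otherwise the headers never see them.
std::string emit_preamble(const CodegenOptions& opts) {
  std::string out;
  if (opts.threads) out += "#define STAN_THREADS\n";
  if (opts.mpi) out += "#define STAN_MPI\n";
  out += "#include <stan/model/model_header.hpp>\n";
  return out;
}
```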
v2.18.0
I think we should deprecate
real fabs(real)
and go with just
real abs(real)
int abs(int)
The only drawback is that we won't have a function that applies to an integer and returns a real value, but I don't think we need that. Returning an integer won't lose information, and Stan will just promote it to a real value if necessary to use it elsewhere. And we'll still get integer absolute values for integers.
I propose getting rid of everything we're deprecating in Stan 3.
abs(real)
fabs()
Add an exit() function to the language. We could also call it error(), fatal(), or fatal_error().
Currently, we have reject() in the Stan language. Using reject() rejects the current iteration.
We should have matching exit() calls in the Stan language. Using exit() should stop algorithms from continuing. This would indicate that something is wrong and shouldn't be ignored, as opposed to reject(), which continues on to the next iteration.
exit() is used
exit(...) to the language to match the function signatures of reject().
reject() and exit().
exit() call.
Here's an example:
parameters {
real<lower = 0, upper = 1> theta;
}
model {
exit();
}
Currently, this reports that all 100 initialization attempts failed.
Instead, it should exit the algorithm.
v2.11.0
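A minimal sketch of how the two calls could be told apart at run time, assuming each maps to its own exception type. reject_error, exit_error, and initialize are hypothetical names, not existing Stan classes. The initialization loop retries after a rejection but does not catch exit_error, so a call to exit() ends the run on the first attempt instead of exhausting all the tries.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical exception types, not existing Stan classes.
// reject(): recoverable, the algorithm may try again.
struct reject_error : std::domain_error {
  using std::domain_error::domain_error;
};
// exit(): fatal, the algorithm must stop.
struct exit_error : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Sketch of an initialization loop: a rejection triggers another attempt,
// while exit_error is deliberately not caught and propagates to the caller.
// Returns the number of attempts used.
template <typename LogProb>
int initialize(const LogProb& log_prob, int max_tries) {
  for (int attempt = 1; attempt <= max_tries; ++attempt) {
    try {
      log_prob();
      return attempt;
    } catch (const reject_error&) {
      // rejected: try another initial value
    }
  }
  throw std::runtime_error("initialization failed after max_tries attempts");
}
```

The catch has to stay narrow: if some layer caught std::exception broadly, exit_error would be swallowed like any other error.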
We need to throw warnings for deprecated constructs.
Are there any more that are desired?
These nodes can be merged as long as we add a label to the node to indicate which of the two it is. That way, we can share more code in the semantic check and simplify our lives in terms of maintenance.
Some languages now support Unicode (mostly UTF8) for writing source code. It would be great if one could also use Unicode in Stan source. (Note that comments in UTF8, or any superset that embeds ASCII, are already supported in the sense that the parser just ignores them.)
Broadly, there are two possible levels of support:
ϕ), and
≤), which provide synonyms for existing ones (eg <=).
This is how the 8 schools example would look in Unicode:
data {
int<lower=0> J; // number of schools
real y[J]; // estimated treatment effect (school j)
real<lower=0> σ[J]; // std err of effect estimate (school j)
}
parameters {
real μ;
real θ[J];
real<lower=0> τ;
}
model {
θ ~ normal(μ, τ);
y ~ normal(θ, σ);
}
The first two are mitigated by the fact that ASCII is a subset of UTF8, so using the feature is optional.
language | literals | identifiers | operators | would UTF8 variables work for interfacing with Stan? |
---|---|---|---|---|
R | yes | yes | no | yes |
Python | yes | only from version 3 | no | yes, even in Python 2, as they are used as literal keys |
Julia | yes | yes | yes | yes |
Matlab | yes | yes, but needs to be enabled | no | yes |
Stata | yes | yes, from version 14 | no | probably? |
See this list for various UTF8 implementations using autocomplete, company-mode, and quail.
It's a bit easier to deal with these recursive parameterized types if the fixed points of them use records instead of single-constructor variants with tuples. We went from
(** Meta-data on expressions before type checking: a location for error messages *)
and expression_untyped_metadata =
{expr_untyped_meta_loc: location sexp_opaque [@compare.ignore]}
(** Meta-data on expressions after type checking: a location, as well as a type
and an origin block (lub of the origin blocks of the identifiers in it) *)
and expression_typed_metadata =
{ expr_typed_meta_origin_type: originblock * unsizedtype
; expr_typed_meta_loc: location sexp_opaque [@compare.ignore] }
and untyped_expression =
| UntypedExpr of (untyped_expression expression * expression_untyped_metadata)
and typed_expression =
| TypedExpr of (typed_expression expression * expression_typed_metadata)
to
and untyped_expression =
{ expr_untyped: untyped_expression expression
; expr_untyped_loc: location sexp_opaque [@compare.ignore] }
and typed_expression =
{ expr_typed: typed_expression expression
; expr_typed_loc: location sexp_opaque [@compare.ignore]
; expr_typed_origin: originblock
; expr_typed_type: unsizedtype }
This made use much easier as well; see this commit for more details (where we managed to almost halve the lines touched): 72c4c6b?w=1
I'm looking at the behavior of the stanc executable and noticing that asking it to pretty-print will pretty-print and then print error messages. I think this might be reasonable but probably we should send the errors to stderr? I don't feel that strongly but this will allow people to at least capture the pretty output even if there are errors.
Note that this is a bug in stanc2 as well: stan-dev/stan#2711 .
Add line and character numbers to the files we are including from, rather than just listing the chain of file names.
Add a special function:
real to_real(int);
Suggested implementation:
double to_real(int x) { return x; }
Would this be clearer with a static cast?
Add a special function
real div(real,real);
Suggested implementation:
double div(double x, double y) { return x / y; }
The motivation is that I'm implementing some mark-recapture models that involve integer populations, but I need a fractional bound on a real parameter that's currently cumbersome to write:
data {
int<lower=0> M; // marked
int<lower=0> C; // captured
int<lower=0,upper=min(M,C)> R; // recaptured
}
transformed data { // HACK!
real theta_max;
theta_max <- M;
theta_max <- theta_max / (C - R + M);
}
parameters {
real<lower=0,upper=theta_max> theta; // proportion marked
}
...
I would prefer to get rid of the transformed data division hack and just write:
parameters {
real<lower=0,upper=div(M, C - R + M)> theta; // proportion marked
}
Without the constraint, you get inconsistent values of theta that are above the upper bound.
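A minimal C++ sketch of the two suggested helpers, with to_real written as a static_cast. The namespace name sketch is hypothetical. One assumption worth flagging: the C standard library already declares div(int, int), returning div_t, so an unqualified call such as div(M, C - R + M) with integer arguments could resolve to that overload in generated code; keeping the helper in its own namespace and calling it qualified avoids the clash.

```cpp
namespace sketch {  // hypothetical namespace, kept clear of ::div from <cstdlib>

// Integer-to-real promotion, written out explicitly with static_cast.
inline double to_real(int x) { return static_cast<double>(x); }

// Real division. Integer arguments are promoted to double before dividing,
// so div(3, 4) is 0.75 rather than the integer quotient 0.
inline double div(double x, double y) { return x / y; }

}  // namespace sketch
```

With M = 10, C = 20, and R = 5, sketch::div(M, C - R + M) gives 0.4, whereas M / (C - R + M) in integer arithmetic gives 0.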
Extra semicolons ending a line throw a parser error despite the fact that an empty line causes no problems.
parameters {
real x1;;
real x2;
}
model {
x1 ~ normal(0, 1);
x2 ~ normal(0, 1);
}
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
error in 'bad.stan' at line 2, column 11
-------------------------------------------------
1: parameters {
2: real x1;;
^
3: real x2;
-------------------------------------------------
PARSER EXPECTED: <one of the following:
a variable declaration, beginning with type,
(int, real, vector, row_vector, matrix, unit_vector,
simplex, ordered, positive_ordered,
corr_matrix, cov_matrix,
cholesky_corr, cholesky_cov
or '}' to close variable declarations>
No error or something more explicit like "Double semicolons not allowed".
v2.16.0
Right now we have an AST that allows one transformation per variable declaration, but 1) syntactically we are sometimes actually expressing two, e.g. <lower=0, upper=1>, and 2) we think we will allow more in the future. If we upgrade now to a list of transformations, we can express the idea of the lowerupper transformation more clearly as two serial transforms.
Syntax and semantic errors should be able to share the same code; similarly for lexing and include errors. There should be some sharing possible with warnings as well.
Andrew suggested something to make mixture modeling easier. I'm not sure what he's thinking in terms of syntax, but maybe something like a distribution like this:
y ~ finite_mixture(lambda,
normal(mu1,sigma1),
normal(mu2,sigma2),
normal(mu3,sigma3) );
or a special "function" like
target += finite_mixture(lambda,
normal_log(y,mu1,sigma1),
normal_log(y,mu2,sigma2),
normal_log(y,mu3,sigma3));
Of course, these aren't functions with the usual type of signature.
Another issue is how to deal with vectorization.
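Whatever the surface syntax, both forms above would have to compute the log mixture density log(sum over k of lambda[k] * normal(y | mu[k], sigma[k])). A minimal C++ sketch of that computation using the log-sum-exp trick, so that small component densities do not underflow; normal_log_density, log_sum_exp, and finite_mixture_normal_log are hypothetical helpers written for this sketch, not Stan Math functions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: log density of normal(mu, sigma) at y.
double normal_log_density(double y, double mu, double sigma) {
  const double pi = 3.14159265358979323846;
  const double z = (y - mu) / sigma;
  return -0.5 * z * z - std::log(sigma) - 0.5 * std::log(2.0 * pi);
}

// Numerically stable log(sum(exp(xs))); xs must be non-empty.
double log_sum_exp(const std::vector<double>& xs) {
  const double m = *std::max_element(xs.begin(), xs.end());
  if (std::isinf(m)) return m;  // every term is -inf, or one is +inf
  double sum = 0.0;
  for (double x : xs) sum += std::exp(x - m);
  return m + std::log(sum);
}

// Log density of a single observation y under a finite mixture of normals:
// log sum_k lambda[k] * normal(y | mu[k], sigma[k]).
double finite_mixture_normal_log(double y, const std::vector<double>& lambda,
                                 const std::vector<double>& mu,
                                 const std::vector<double>& sigma) {
  std::vector<double> terms;
  for (std::size_t k = 0; k < lambda.size(); ++k) {
    terms.push_back(std::log(lambda[k]) +
                    normal_log_density(y, mu[k], sigma[k]));
  }
  return log_sum_exp(terms);
}
```

On vectorization: for a vector y the mixture has to be applied per observation and the results summed. Taking log_sum_exp over vectorized component densities (each already summed over all of y) would instead assign the whole data set to a single component.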
Right now in Semantic_check.ml we have two global objects that get modified and passed around a lot: context_flags and the Symbol_table. I think we typically use them by modifying them for the execution of some function and then resetting them to their original state after the function has executed. It might be less messy to pass these two into the function calls in question and have those calls return a new pair that we can update with if need be (which seems rare). I'm not sure this is the case, but we can use this issue to track an investigation into this possibility, perhaps highlighting some places where each approach is messy vs. clean and attempting to estimate the net effect of each.
Print more of the source file surrounding the error when we emit an error message. Maybe make it look like the existing compiler?
"spd" is for symmetric positive definite. The problem with "cov_matrix" is that they can be used for precision matrices, too, making the declaration look confusing.
This will also match the functions Marcus has added (but aren't quite yet through the pipe into the Stan language).
We should just deprecate cov_matrix with a warning.
We should also rename the Cholesky factors to match.
(Thanks to Ben for noticing the problem and Marcus for suggesting a better name.)
The parser should not generate structs that call user-defined functions anymore. We should use generalized lambda functions instead.
A user-defined function foo will generate additional C++ that looks something like
struct foo_functor__ {
templates
return_type
operator()(..., std::ostream* pstream__) const {
return foo(..., pstream__);
}
};
irrespective of whether foo_functor__ actually gets called by one of the functionals. The parser should not generate foo_functor__, and if functionals are used in the Stan program, then stanc should utilize a generalized lambda function that calls foo.
This would also allow us to avoid the unnecessary restriction that you cannot define multiple user-defined functions with the same name (but different signatures). See
Parse
functions {
void foo() {
}
}
and look at the C++ code
struct foo_functor__ {
void
operator()(std::ostream* pstream__) const {
return foo(pstream__);
}
};
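A minimal sketch of the generalized-lambda approach, assuming C++14 generic lambdas. The two overloads of foo, call_functional, and make_foo_lambda are hypothetical stand-ins, not Stan Math functions. Because the lambda's parameter is auto, overload resolution for foo happens wherever the lambda is instantiated, so two user-defined functions named foo with different signatures can both be passed through the same lambda, and nothing is generated for functions that are never passed to a functional.

```cpp
#include <ostream>
#include <vector>

// Two hypothetical user-defined overloads sharing one name.
double foo(double x, std::ostream* /* msgs */) { return 2.0 * x; }
double foo(const std::vector<double>& xs, std::ostream* /* msgs */) {
  double sum = 0.0;
  for (double x : xs) sum += x;
  return sum;
}

// Hypothetical stand-in for a functional: it only needs something
// callable as f(arg, msgs).
template <typename F, typename T>
double call_functional(const F& f, const T& arg, std::ostream* msgs) {
  return f(arg, msgs);
}

// What stanc could emit at a call site instead of a foo_functor__ struct:
// a generic lambda that forwards to whichever foo overload matches.
inline auto make_foo_lambda() {
  return [](const auto& arg, std::ostream* msgs) { return foo(arg, msgs); };
}
```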
Not that part
None
v2.18.0
1./a is parsed as 1.0/a instead of 1 ./ a (where a is a vector). This confused me a bit.
To avoid this ambiguity, I would prefer to disallow 1. and require 1.0, as I consider the latter more readable anyway. If you care too much about breaking existing code, please consider having the parser issue a warning in this case.
Related: in the grammar in the manual, the definitions of numeric_literal and real_literal are equivalent (every numeric literal is also a real literal, because everything after integer_literal in real_literal is optional):
numeric_literal ::= integer_literal | real_literal
integer_literal ::= 0 | [1-9] [0-9] *
real_literal ::= integer_literal ?('.' [0-9] * ) ?exp_literal
I would prefer:
real_literal ::= integer_literal '.' [0-9] + ?exp_literal
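A small hand-written lexer sketch of the proposed rule, in which a '.' belongs to a real literal only when at least one digit follows it. The token kinds and the lex function are hypothetical, and exponent parts and the leading-zero restriction of integer_literal are left out to keep it short. Under this rule, 1./a lexes as the integer 1 followed by the elementwise operator ./, which is the reading the report expected.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds, just enough for this sketch.
enum class Tok { Int, Real, Divide, EltDivide, Ident, Other };

struct Token {
  Tok kind;
  std::string text;
};

static bool is_digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

// Lexes under real_literal ::= integer_literal '.' [0-9]+ (exponents omitted):
// the '.' is consumed as part of a number only if a digit follows it.
std::vector<Token> lex(const std::string& src) {
  std::vector<Token> out;
  std::size_t i = 0;
  while (i < src.size()) {
    const char c = src[i];
    if (std::isspace(static_cast<unsigned char>(c))) {
      ++i;
    } else if (is_digit(c)) {
      std::size_t j = i;
      while (j < src.size() && is_digit(src[j])) ++j;
      Tok kind = Tok::Int;
      if (j + 1 < src.size() && src[j] == '.' && is_digit(src[j + 1])) {
        ++j;  // consume '.'
        while (j < src.size() && is_digit(src[j])) ++j;
        kind = Tok::Real;
      }
      out.push_back({kind, src.substr(i, j - i)});
      i = j;
    } else if (c == '.' && i + 1 < src.size() && src[i + 1] == '/') {
      out.push_back({Tok::EltDivide, "./"});
      i += 2;
    } else if (c == '/') {
      out.push_back({Tok::Divide, "/"});
      ++i;
    } else if (std::isalpha(static_cast<unsigned char>(c)) || c == '_') {
      std::size_t j = i;
      while (j < src.size() &&
             (std::isalnum(static_cast<unsigned char>(src[j])) || src[j] == '_')) {
        ++j;
      }
      out.push_back({Tok::Ident, src.substr(i, j - i)});
      i = j;
    } else {
      out.push_back({Tok::Other, std::string(1, c)});
      ++i;
    }
  }
  return out;
}
```

Under the manual's current grammar, where the fractional digits are optional, a longest-match lexer would instead consume 1. as a real literal and leave /a, which is the behavior reported above.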
rstan (Version 2.16.2, packaged: 2017-07-03 09:24:58 UTC, GitRev: 2e1f913d3ca3)
See title.
We don't need those top-level using statements; we can move them to where they're needed.
Just look at the C++ generated for any Stan program.
Generates these using statements:
namespace unit_model_namespace {
using std::istream;
using std::string;
using std::stringstream;
using std::vector;
using stan::io::dump;
using stan::math::lgamma;
using stan::model::prob_grad;
using namespace stan::math;
Nothing at this scope; everything pushed down to where needed or explicitly qualified in its output.
v2.9.0