morganstanley / hobbes Goto Github PK

A language and an embedded JIT compiler

Home Page: http://hobbes.readthedocs.io/

License: Apache License 2.0

CMake 0.16% C++ 26.00% C 70.87% Shell 0.03% Lex 0.27% Yacc 1.50% Dockerfile 0.01% Python 0.94% Nix 0.18% Vim Script 0.04%

haskell-application hobbes

hobbes's Introduction

hobbes

a language, embedded compiler, and runtime for efficient dynamic expression evaluation, data storage and analysis

section	description
Building	how to build and install hobbes
Embedding	use hobbes in C++ programs
Evaluation	evaluate basic hobbes expressions
Storage	record data for out-of-band analysis
Networking	interact with remote hobbes processes
Comprehensions	transform/sort/join/filter/group sequences for data analysis
Pattern Matching	efficiently classify and destructure data
Parsing	parse text with LALR(1) grammars
Type Classes	overloading and compile-time calculation
Unqualifier Modules	user-defined "compiler plugins" for custom constraint handling

Note on Hobbes Usage

Hobbes is built for high performance integration with C/C++ applications. While Hobbes is a strongly typed language that offers compile-time checks, it doesn't have a sandboxed runtime environment or runtime safety features. By design, Hobbes gives direct access to memory and does not have array bounds checks. Additionally, Hobbes supports compilation and execution of native code remotely over a network (RPC). This feature is meant for use within your trusted internal network only. If you choose to utilize such functionality, you need to be aware of these design choices and understand the security implications.

Building

To build hobbes, you will need LLVM 3.3 or later, cmake 3.4 or later, GNU gcc 4.8 or later, and a version 2.5 or later Linux kernel.

With LLVM, cmake, and g++ installed, after downloading this code you should be able to build and install hobbes just by running:

$ cmake .
$ make
$ make install

Depending on where you've installed LLVM, this might not work until you give cmake the path to LLVM's cmake modules. In particular, you want the path to LLVM's LLVMConfig.cmake file. If you set the environment variable LLVM_DIR to point to that directory, then the previous steps should complete successfully.

The build process will produce a static library, libhobbes.a, which can be linked into a C++ executable (if you want to use hobbes in a .so instead, the build produces a different static library to use, libhobbes-pic.a).

In addition, the build process will produce two utility programs, hi and hog. The hi program is a basic interactive interpreter for hobbes expressions and the hog program will efficiently record structured data produced by applications into data files (these files can be loaded and queried at the same time by hi). The source code for these programs can be informative as well, because they demonstrate many different aspects of the hobbes API.

Embedding

Let's consider how to embed hobbes in a simple C++ program. This code implements a very basic shell, similar to hi:

#include <iostream>
#include <stdexcept>
#include <hobbes/hobbes.H>

int main() {
  hobbes::cc c;

  while (std::cin) {
    std::cout << "> " << std::flush;

    std::string line;
    std::getline(std::cin, line);
    if (line == ":q") break;

    try {
      c.compileFn<void()>("print(" + line + ")")();
    } catch (std::exception& ex) {
      std::cout << "*** " << ex.what();
    }

    std::cout << std::endl;
    hobbes::resetMemoryPool();
  }

  return 0;
}

First, to compile any expression we need to construct a hobbes::cc object. Then, in the context of an exception handler, we can compile functions out of this hobbes::cc object with the compileFn method, giving it the type we expect back (void() in this case) and also a string that we expect can compile to a value of that type. If any stage of compilation fails (parse error, type mismatch, etc) then an exception with details about the failure will be raised. Finally, we call hobbes::resetMemoryPool() to release any memory that might have been dynamically allocated by our compiled expression (that is, memory allocated by compiled expressions but not the compiled functions themselves -- those are released only when hobbes::cc is destroyed or when hobbes::cc::releaseMachineCode is used to destroy them).

When a compiled function decides to allocate memory, that allocation happens out of a "memory region". A memory region is a dynamically-growable sequence of bytes where allocations come from, and it is deallocated in bulk when hobbes::resetMemoryPool() is called. This makes allocation and deallocation very efficient, but requires some thought to establish "logical transaction" boundaries. The active memory region is thread-local, so you can use the same function pointer in several threads concurrently without worrying about synchronization issues.

Finally, if we put the above program in a file called "test.cpp" then we can build it like this:

$ g++ -pthread -std=c++11 -I <path-to-hobbes-headers> -I <path-to-llvm-headers> test.cpp -o test -L <path-to-hobbes-libs> -lhobbes -ldl -lrt -ltinfo -lz -L <path-to-llvm-libs> `llvm-config --libs x86asmparser x86codegen x86 mcjit passes`

The explicit path statements may not be necessary depending on where/how LLVM and hobbes have been installed on your system. The inline invocation of the llvm-config program is typical with users of LLVM, to avoid explicitly listing several libraries.

If everything worked correctly, we should now have a simple shell where we can evaluate hobbes expressions.

Evaluation

Now that we have a working shell, we can experiment with expressions to get a sense of how the hobbes language works. To start with, we have some typical constant values:

> 'c'
'c'
> 42
42
> 3.14159
3.14159

In total, these are the set of supported primitive types/constants:

name	example	description
unit	`()`	like 'void' in C, a trivial type with just one value
bool	`false`	either true or false
char	`'c'`	a single character of text
byte	`0Xff`	a single byte (0-255)
short	`42S`	a 2-byte number
int	`42`	a 4-byte number
long	`42L`	an 8-byte number
float	`42.0f`	a 4-byte floating point value
double	`42.0`	an 8-byte floating point value

These primitives can be combined with arrays:

> [1, 2, 3]
[1, 2, 3]
> "foobar"
"foobar"
> 0xdeadbeef
0xdeadbeef

They can also be combined with records/tuples:

> {name="Jimmy", age=45, job="programmer"}
{name="Jimmy", age=45, job="programmer"}
> ("Jimmy", 45, "programmer")
("Jimmy", 45, "programmer")

Or with arrays of records/tuples, where they will print conveniently as a table:

> [{name="Jimmy", age=45, job="programmer"}, {name="Chicken", age=40, job="programmer"}]
   name age        job
------- --- ----------
  Jimmy  45 programmer
Chicken  40 programmer
> [("Jimmy", 45, "programmer"),("Chicken", 40, "programmer")]
  Jimmy 45 programmer
Chicken 40 programmer

We can combine types with variants or sums (the "nameless" form of variants, as tuples are to records). Because "naked" variant introduction is incomplete by definition, we may sometimes need to introduce explicit type annotations:

> |food="pizza"|::|vehicle:int,food:[char]|
|food="pizza"|
> |1="pizza"|::(int+[char])
|1="pizza"|

Also, variants can be combined with isorecursive types to make linked lists. These have a convenient form when printed, and built-in functions to avoid the recursive and variant type boilerplate:

> roll(|1=("chicken", roll(|1=("hot dog", roll(|1=("pizza", roll(|0=()|))|))|))|) :: ^x.(()+([char]*x))
"chicken":"hot dog":"pizza":[]
> cons("chicken", cons("hot dog", cons("pizza", nil())))
"chicken":"hot dog":"pizza":[]

These types and type constructors together make up the "algebraic data types" common in the ML family of programming languages (SML, ocaml, Haskell, ...). The syntax and names for these types also come from ML and common academic programming language textbooks as in TaPL and PFPL.

As in Haskell, hobbes supports a form of qualified types with type classes and even user-defined constraint resolvers (which we will see in more detail later). Among many other uses, this allows both mixed-type arithmetic and type inference to coexist:

> 0X01+2+3.0+4L+5S
15
> (\x y z u v.x+y+z+u+v)(0X01,2,3S,4L,5.0)
15

We can also use the hi program (a slightly more complex interpreter than the example earlier, distributed with hobbes) to evaluate expressions like this and also inspect their types. For example, the types for the primitive expressions we considered earlier can be queried:

$ hi
hi : an interactive shell for hobbes
      type ':h' for help on commands

> :t ()
()
> :t false
bool
> :t 'c'
char
> :t 0Xff
byte
> :t 42S
short
> :t 42
int
> :t 42L
long
> :t 42.0f
float
> :t 42.0
double

And the types for record, array, variant and recursive values:

> :t [1..10]
[int]
> :t "foobar"
[char]
> :t 0xdeadbeef
[byte]
> :t {name="Jimmy", age=45, job="programmer"}
{ name:[char], age:int, job:[char] }
> :t ("Jimmy", 45, "programmer")
([char] * int * [char])
> :t cons("foo",nil())
^x.(() + ([char] * x))

Also for qualified types, we can use this feature to view the set of open/unresolved type constraints:

> :t \x.x.foo
(a.foo::b) => (a) -> b
> :t .foo
(a.foo::b) => (a) -> b

Here we see that the type of a simple function which just returns the value of the "foo" field in its input (both in the fully explicit \x.x.foo form as well as the equivalent .foo shorthand) has a constraint (the part to the left of the => in its type -- the same as in Haskell). In this case, the entire type states that if a type a has a field named foo with type b, the function has type a -> b. The additional complexity of this constraint allows this function to work with any type a meeting that condition. This particular constraint allows a kind of duck typing for functions.

Binding

If we're using hobbes inside of a C++ process (as opposed to just running scripts with hi), at some point we will need to bind C++ values (typically functions) to a hobbes::cc instance so that they can be used in dynamic expressions at a later stage. Much of this work is automated, often allowing us to avoid thinking about binding logic.

For example, consider a C++ function that we might have already defined:

int applyDiscreteWeighting(int x, int y) { return x * y; }

Now we can import this definition into our previous program, and bind it to our hobbes::cc instance by adding this line:

  c.bind("applyDiscreteWeighting", &applyDiscreteWeighting);

Then when we run our program, its shell will allow use of this function and will reject any expression using it in a way inconsistent with its type structure:

> applyDiscreteWeighting(3, 4)
12
> applyDiscreteWeighting(3, "pizza")
*** stdin:1,1-22: Cannot unify types: int != [char]

In other cases, it may be more convenient to use a type understood natively by hobbes and with dynamic memory allocation:

typedef std::pair<size_t, const hobbes::array<char>*> MyRecord;

const hobbes::array<MyRecord>* loadRecords(int key) {
  static const char* names[] = { "chicken", "hot dog", "foobar", "jimmy" };

  auto rs = hobbes::makeArray<MyRecord>(key);
  for (size_t i = 0; i < rs->size; ++i) {
    rs->data[i].first  = i;
    rs->data[i].second = hobbes::makeString(names[i % 4]);
  }

  return rs;
}
// [...]
  c.bind("loadRecords", &loadRecords);
// [...]

These calls to hobbes::makeArray and hobbes::makeString will allocate out of the calling thread's memory region and will produce data that hobbes knows how to generically destructure and print in a convenient form, as we can see by running this in our test program again:

> loadRecords(10)
0 chicken
1 hot dog
2  foobar
3   jimmy
4 chicken
5 hot dog
6  foobar
7   jimmy
8 chicken
9 hot dog

If we'd prefer to represent records by structs with significant field names rather than as tuples, we can instead define our type with the DEFINE_STRUCT macro (this is purely a syntactic difference -- the final runtime representation of tuples and structs is identical in hobbes). For example, if we'd defined the MyRecord type in the previous example this way:

DEFINE_STRUCT(MyRecord,
  (size_t, index),
  (const hobbes::array<char>*, name)
);

And change the initialization code in loadRecords to refer to these fields by name:

// [...]
    rs->data[i].index = i;
    rs->data[i].name  = hobbes::makeString(names[i % 4]);
// [...]

Then we can access those fields by name from hobbes and e.g. tables of such records will be printed with a header showing these field names:

> loadRecords(10)
index    name
----- -------
    0 chicken
    1 hot dog
    2  foobar
    3   jimmy
    4 chicken
    5 hot dog
    6  foobar
    7   jimmy
    8 chicken
    9 hot dog

We can also bind "in reverse" using C function types, so that rather than making our C++ code appear to be valid hobbes symbols, we can make hobbes expressions appear to be valid C++ symbols. For example, consider a C++ binding like:

int iter2(int (*pf)(int,int), int x) {
  return pf(pf(x, x), pf(x, x));
}
// [...]
  c.bind("iter2", &iter2);
// [...]

Here we've bound a function that itself takes a function as input, and we can use the hobbes syntax for anonymous functions to decide logic that this C++ function will use:

> iter2(\x y.x*x+y, 2)
42

Variants can be passed between C++ and hobbes as the hobbes::variant<...> type:

typedef hobbes::variant<int, const hobbes::array<char>*> classification;

const classification* classifyParameter(int x) {
  if (x < 42) {
    return hobbes::make<classification>(42 - x);
  } else {
    return hobbes::make<classification>(hobbes::makeString("achievement unlocked"));
  }
}
// [...]
  c.bind("classifyParameter", &classifyParameter);
// [...]

These should also print in a standard way (showing the constructor name "0" for the first case and "1" for the second case, since the constructors for this variant have no names), assuming that the payload types of your variant can be printed:

> classifyParameter(16)
|0=26|
> classifyParameter(8675309)
|1="achievement unlocked"|

In the process of binding application variables and functions to a hobbes::cc instance, hobbes will decide how to "lift" every C++ type to an equivalent hobbes type. Many common cases are already handled transparently, as above where types like size_t and std::pair and hobbes::array are "lifted" without any special work required on our part. If hobbes doesn't know how to lift a type, it will be lifted as an "opaque C++ type" such that hobbes just checks for consistent usage (and will respect class hierarchies and derived/base conversions as necessary) but will otherwise know nothing about the structure of the type.

For example, consider a binding case like this (meant to represent a complex type hierarchy):

class Top {
public:
  Top(int x, double y, int z) : x(x), y(y), z(z) { }

  int    getX() const { return this->x; }
  double getY() const { return this->y; }
  int    getZ() const { return this->z; }
private:
  int    x;
  double y;
  int    z;
};

class Left : public virtual Top {
public:
  Left(int x, double y, int z) : Top(x, y, z) { }
};

class Right : public virtual Top {
public:
  Right(int x, double y, int z) : Top(x, y, z) { }
};

class Bottom : public Left, public Right {
public:
  Bottom(int x, double y, int z) : Left(x*z, y*y, z*x), Right(x*x, y*y, z*z), Top(x, y, z) { }
};
// [...]
  c.bind("getX", memberfn(&Top::getX));
  c.bind("getY", memberfn(&Top::getY));
  c.bind("getZ", memberfn(&Top::getZ));

  static Bottom b(1, 3.14159, 42);
  c.bind("b", &b);
// [...]

Now back in our shell, we can invoke the bound Top methods on our bound Bottom instance (hobbes will coordinate with gcc to apply the correct pointer adjustments):

> b.getX()
1
> b.getY()
3.14159
> b.getZ()
42

If the default logic for lifting your C++ type is insufficient, the hobbes::lift<T> type can be specialized to determine a more useful description (the code in hobbes/lang/tylift.H has many examples of this; this is where the default set of lift definitions are made).

Storage

Rather than using hobbes to "pull" application data with ad-hoc dynamic queries, we can also use hobbes to "push" large volumes of application data to local or remote storage for real-time or historical analysis. In this case, we don't need a compiler instance but instead just need to define some basic quality-of-service parameters for the storage process. For example, let's consider a simple mock weather monitor:

#include <hobbes/storage.H>
#include <stdlib.h>

DEFINE_STORAGE_GROUP(
  WeatherMonitor,              /* an arbitrary name for our data set */
  3000,                        /* our maximum expected throughput in system pages (if we record up to this limit, we either drop or block) */
  hobbes::storage::Unreliable, /* we'd rather drop than block */
  hobbes::storage::AutoCommit  /* we don't need to correlate multiple records in a batch (non-transactional) */
);

// a set of sensor readings
DEFINE_HSTORE_STRUCT(
  Sensor,
  (double, temp),
  (double, humidity)
);

int main() {
  while (true) {
    // record a random sensor reading from somewhere
    const char* locations[] = { "AZ", "CO", "HI" };
    const char* location    = locations[rand() % 3];

    Sensor s;
    s.temp     = 1.0 + 50.0 * ((double)(rand() % 100)) / 100.0;
    s.humidity = ((double)(rand() % 100)) / 100.0;

    HSTORE(WeatherMonitor, sensor, location, s);

    // sometimes record a passing car
    if (rand() % 100 == 0) {
      HSTORE(WeatherMonitor, carPassed, location);
    }
  }
  return 0;
}

We first use DEFINE_STORAGE_GROUP to decide a name for the set of data we'll record (the name we've chosen will be important later), how many memory pages to allocate for the ring buffer where data is temporarily stored (on most systems a page will be 4KB), whether to wait or discard data when it can't be stored as quickly as we're producing it, and whether we need to correlate data in batches/transactions or not.

Then we use DEFINE_HSTORE_STRUCT to define a simple Sensor structure.

Finally we run a simple loop, generating and recording mock sensor data. The HSTORE macro, which we use to record data, takes the name of our storage group, an arbitrary name to identify this data, and then any subsequent arguments as data to store. If our storage group is used in "unreliable" mode (as in the above example), then the HSTORE macro will return false if the data was dropped due to a consumer falling behind. Many common types can be stored automatically with HSTORE (primitives, std::string, std::vector, ...) and as with binding to C++ symbols, it is possible to define storage for custom types by specializing the hobbes::storage::store<T> type (here the definitions in hobbes/storage.H can be instructive).

Now, at the same time that this program runs, we will also want to run another program to consume this data. The hog program is distributed with hobbes for this purpose -- its source code may be instructive if you'd like to understand the storage consumer API or customize its logic in some way. To get an overview of our recording options with this program, we can run it without arguments:

$ hog
hog : record structured data locally or to a remote process

  usage: hog [-d <dir>] [-g group+] [-p t s host:port+] [-s port] [-c] [-m <dir>] [-z]
where
  -d <dir>          : decides where structured data (or temporary data) is stored
  -g group+         : decides which data to record from memory on this machine
  -p t s host:port+ : decides to send data to remote process(es) every t time units or every s uncompressed bytes written
  -s port           : decides to receive data on the given port
  -c                : decides to store equally-typed data across processes in a single file
  -m <dir>          : decides where to place the domain socket for producer registration (default: /var/tmp)
  -z                : store data compressed
$

This shows that we can use this program to consume and store data in one of three modes:

immediate storage into a local disk file
temporary local storage (batched, compressed) pending successful sends to a remote process
a network server, storing into local files from remote processes

For our test program above, it will be enough to just store data locally. We must run the hog consumer first before starting the producer (our test program). If we run hog and then start our test program, we should see hog produce output that looks like:

$ hog -g WeatherMonitor
[2018-01-01T09:00:00.867323]: hog running in mode : |local={ dir="./", groups={"WeatherMonitor"} }|
[2018-01-01T09:00:00.867536]: polling for creation of memory regions
[2018-01-01T09:00:00.873862]: group 'WeatherMonitor' ready, preparing to consume its data
[2018-01-01T09:00:01.637614]:  ==> carPassed :: ([char]) (#1)
[2018-01-01T09:00:01.733374]:  ==> sensor :: ([char] * { temp:double, humidity:double }) (#0)
[2018-01-01T09:00:01.848340]: finished preparing statements, writing data to './WeatherMonitor/data-2018.01.01-0.log'

Now while these two programs (both hog and our test program) are left running, we can simultaneously run a third program to load and query this data as it is being recorded. This is a good use-case for the hi program seen previously. If we enter the directory where hog is writing its output files, we can take a look at the data we're recording this way:

$ hi
hi : an interactive shell for hobbes
      type ':h' for help on commands

> wm = inputFile :: (LoadFile "./WeatherMonitor/data-2018.01.01-0.log" w) => w
> wm.sensor
AZ {temp=44.572017, humidity=44.582017}
AZ   {temp=3.575808, humidity=3.585808}
HI {temp=39.150116, humidity=39.160116}
...
> wm.carPassed
AZ
AZ
CO
AZ
...
>

Although the syntax for loading this file may appear somewhat unusual (or maybe not, after we've covered qualified types and type classes in more detail), this method of loading data in hi should let us look at what we're recording fairly quickly. If we repeatedly query the same data while it's being written, we should see different results as our view of the data is "live". It's also possible to "tail" specific data as it is written by registering a function to handle updates:

> signals(wm).sensor <- (\_.do { putStrLn("sensor batch updated"); return true })
> sensor batch updated
sensor batch updated
sensor batch updated
sensor batch updated

In more complex applications, we might use this ability to accumulate statistics, index recorded data, alert on some condition, etc.

We could also want to record batched or "transactional" data where our storage statements need to be correlated. For example, consider a mock cheeseburger order management application:

#include <hobbes/storage.H>
#include <stdlib.h>

DEFINE_STORAGE_GROUP(
  Orders,
  3000,
  hobbes::storage::Reliable,    /* we _must_ track all orders */
  hobbes::storage::ManualCommit /* we need to correlate all events in an order */
);

int main() {
  while (true) {
    // a customer enters the store
    const char* names[] = { "Harv", "Beatrice", "Pat" };
    const char* name    = names[rand() % 3];

    HSTORE(Orders, customerEntered, name);

    // he/she orders a cheeseburger
    const char* products[] = { "BBQ Attack", "Cheese Quake", "Bacon Salad" };
    const char* product    = products[rand() % 3];

    HSTORE(Orders, productOrdered, product);

    // perhaps he/she decides to add a drink
    if (rand() % 5 == 0) {
      const char* drinks[] = { "Soda", "Lemonade", "Water" };
      const char* drink    = drinks[rand() % 3];

      HSTORE(Orders, drinkOrdered, drink);
    }

    // sometimes the customer has a change of heart, else they pay up and go
    if (rand() % 10 == 0) {
      HSTORE(Orders, orderCanceled);
    } else {
      HSTORE(Orders, paymentReceived, 10.0 * ((double)(rand() % 100)) / 100.0);
    }

    // now that the order is finished, end the transaction
    Orders.commit();
  }
  return 0;
}

Now that we have indicated that we're recording meaningful transactions, when we run hog for this storage group, it will also record a "transactions" sequence along with the sequences it logs for each storage statement:

$ hog -g Orders
[2018-01-01T09:00:00.371394]: hog running in mode : |local={ dir="./", groups={"Orders"} }|
[2018-01-01T09:00:00.371633]: polling for creation of memory regions
[2018-01-01T09:00:00.371697]: group 'Orders' ready, preparing to consume its data
[2018-01-01T09:00:01.615395]:  ==> paymentReceived :: (double) (#4)
[2018-01-01T09:00:01.736183]:  ==> orderCanceled :: () (#3)
[2018-01-01T09:00:01.753279]:  ==> drinkOrdered :: ([char]) (#2)
[2018-01-01T09:00:01.838780]:  ==> productOrdered :: ([char]) (#1)
[2018-01-01T09:00:01.880418]:  ==> customerEntered :: ([char]) (#0)
[2018-01-01T09:00:01.927219]:  ==> transactions :: <any of the above>
[2018-01-01T09:00:02.136798]: finished preparing statements, writing data to './Orders/data-2018.01.01-0.log'

And if we simultaneously load this file, we can see that all of our storage statements can be queried as well as this new "transactions" data. Each transaction is stored as a timestamp and an array of a variant over all possible storage statements so that we can tell which statements were recorded and in what order:

$ hi
hi : an interactive shell for hobbes
      type ':h' for help on commands

> orders = inputFile :: (LoadFile "./Orders/data-2018.01.01-0.log" w) => w
> orders.transactions
                      time                                                                                                              entries
-------------------------- --------------------------------------------------------------------------------------------------------------------
2018-01-01T12:19:03.244206                                       [|customerEntered=("Pat")|, |productOrdered=("Bacon Salad")|, |orderCanceled|]
2018-01-01T12:19:03.244172                                 [|customerEntered=("Beatrice")|, |productOrdered=("Cheese Quake")|, |orderCanceled|]
2018-01-01T12:19:03.244127    [|customerEntered=("Pat")|, |productOrdered=("Cheese Quake")|, |drinkOrdered=("Water")|, |paymentReceived=(6.4)|]
2018-01-01T12:19:03.244092                               [|customerEntered=("Pat")|, |productOrdered=("Bacon Salad")|, |paymentReceived=(6.4)|]
2018-01-01T12:19:03.244047 [|customerEntered=("Pat")|, |productOrdered=("Cheese Quake")|, |drinkOrdered=("Lemonade")|, |paymentReceived=(2.1)|]
2018-01-01T12:19:03.244013                               [|customerEntered=("Pat")|, |productOrdered=("Bacon Salad")|, |paymentReceived=(7.6)|]
2018-01-01T12:19:03.243978                           [|customerEntered=("Beatrice")|, |productOrdered=("BBQ Attack")|, |paymentReceived=(3.4)|]
2018-01-01T12:19:03.243944                          [|customerEntered=("Beatrice")|, |productOrdered=("Bacon Salad")|, |paymentReceived=(3.5)|]
2018-01-01T12:19:03.243910                             [|customerEntered=("Harv")|, |productOrdered=("Cheese Quake")|, |paymentReceived=(6.3)|]
...

Although the content of these transactions appears to possibly be very large, it's actually just a minor addition due to the way that this data is stored. It can also be queried very efficiently.

Networking

Just as we may need a lightweight, efficient method to record structured application data from C++, we may also need such a method to efficiently query plain C++ data from a remote process that understands hobbes. This is made possible with a macro DEFINE_NET_CLIENT that takes a signature for interaction and produces a type to implement it.

We can cut to the chase with a simple test program:

#include <iostream>
#include <hobbes/net.H>

// our RPC signature
DEFINE_NET_CLIENT(
  Connection,                       // a name for this connection profile
  (add, int(int,int), "\\x y.x+y"), // a function "add" of type int(int,int) remotely implemented by the expression '\x y.x+y'
  (mul, int(int,int), "\\x y.x*y")  // as above but "mul" for the expression '\x y.x*y'
);

int main(int, char**) {
  try {
    Connection c("localhost:8080");

    std::cout << "c.add(1,2) = " << c.add(1,2) << std::endl
              << "c.mul(1,2) = " << c.mul(1,2) << std::endl;

  } catch (std::exception& e) {
    std::cout << "*** " << e.what() << std::endl;
  }
  return 0;
}

Here we assume that a process is running at localhost:8080 to accept our requests. Once the connection is made, we initialize the remote end with our expected expressions/types (so that we can efficiently invoke them later just by ID). If any part of this process fails (connection or remote type-checking) then an exception will be raised describing the failure. Otherwise, execution proceeds in this case just to call both functions and report the results.

We can run a hi process to quietly listen on port 8080 like this:

$ hi -s -p 8080
>

And if we've compiled our test program above to an executable named test then we can invoke and run it against our running server to observe results like:

$ ./test
c.add(1,2) = 3
c.mul(1,2) = 2
$

Just as lightweight storage definitions can be added by specializing hobbes::storage::store<T>, also lightweight RPC codecs can be added by specializing hobbes::net::io<T> (and many types and type families are supported in hobbes/net.H, for example).

This networking/remote-evaluation method can be accessed purely from hobbes as well. For example, with the previous hi server process still running, we can start a new hi process to talk to it like this:

$ hi -s
> c = connection :: (Connect "localhost:8080" p) => p
>

Now we have a connection made at compile-time. We can determine the (current) static structure of this connection with the printConnection function, which currently shows this:

> printConnection(c)
localhost:8080

id expr input output
-- ---- ----- ------

>

Now we can make a basic function to perform some interaction between these processes. Say we want a function to record a set of data about employees:

> sendData = \x.receive(invoke(c, `\x.println(x)`, x :: [{name:[char], age:int, salary:double}]))

> :t sendData
([{ name:[char], age:int, salary:double }]) -> ()

Here we see a couple of new functions, invoke and receive to perform remote evaluation and parse results (respectively). We haven't invoked sendData yet, but because we have kept this connection at compile-time, hi was able to communicate with the remote process enough to know what its result type will be (though we've only specified the input type), and if we try printConnection again we should see this static protocol detail that's been negotiated:

> printConnection(c)
localhost:8080

id                                                       expr                                     input output
-- ---------------------------------------------------------- ----------------------------------------- ------
 0 \(.arg0).(let .t13409.rv0 = .arg0 in println(.t13409.rv0)) [{ name:[char], age:int, salary:double }]     ()

This shows that the remote (desugared) expression is known statically, as well as input and output types, and a unique ID has been determined for the expression. At a later stage ("run-time" or "on the critical path"), to invoke this remote expression our sender program just has to write this value 0 followed by the input data that the remote side expects for this particular expression (but no further "self-describing" data).

This also critically relies on a feature of our type representation in hobbes, that it can carry statically-known expressions. In this example, we use a "quoted expression" form to carry this expression so that it can be used at compile-time:

> :t `\x.println(x)`
(quote \(.arg0).(let .t14200.rv0 = .arg0 in println(.t14200.rv0)))

This type is equivalent to () at run-time (i.e.: it doesn't exist at run-time), but carrying its quoted expression in this way we can reason about and send/receive expression details at compile-time.

And if we invoke the function we've defined:

> sendData([{name="bob", age=40, salary=23.2}, {name="jane", age=51, salary=53.1}])

The remote side will print it, as expected:

name age salary
---- --- ------
 bob  40   23.2
jane  51   53.1

Comprehensions

Whether we are running queries directly against application bindings, against out-of-band live/historical data, or as prepared remote queries, we will eventually need to sort, join, group, filter, and transform data. Traditionally we might think of SQL queries for problems like this, but hobbes (inspired as it is by Haskell) uses comprehensions and basic functions.

A comprehension has the general form:

[E | p <- S, C]

Where E is a transform or map expression, S is an expression producing a sequence (of some type), p is a pattern matched against each value in S (possibly introducing variables used in E), and C is a boolean filter expression that decides which values in S to select. The C expression is optional and may be omitted. The pattern p is usually just a name for a variable, but can be more complex if necessary.

For example, we might want string representations of numbers evenly divisible by 3, out of the first 20 natural numbers:

> [show(x) | x <- [0..20], x % 3 == 0]
["0", "3", "6", "9", "12", "15", "18"]

If we've loaded the storage file from our mock weather monitor example earlier, we might want to get readings for AZ:

> [p.1 | p <- wm.sensor, p.0 == "AZ"]
temp humidity
---- --------
12.5    12.51
  33    33.01
  16    16.01
  48    48.01
 5.5     5.51
...

Alternately, we could use pattern binding both to simplify tuple indexing and to do the equality check (any values that don't match our pattern will be excluded):

> [s | ("AZ", s) <- wm.sensor]
temp humidity
---- --------
12.5    12.51
  33    33.01
  16    16.01
  48    48.01
 5.5     5.51
...

We can use the function groupBy to categorize a sequence, and then use a comprehension with the result to summarize it:

> [(k, sum(vs)) | (k, vs) <- groupBy(\x.x % 3, [0..20])]
0 63
1 70
2 77

We can use the function joinBy to combine two sequences of values on a common key:

> joinBy(.name, [{name="bob", score=20}, {name="jim", score=50}], .id, [{id="jim", age=6}, {id="bob", age=42}])
name score  id age
---- ----- --- ---
 bob    20 bob  42
 jim    50 jim   6

We can also use a slice notation to select subsequences. The syntax S[i:e] can be used to select values out of S starting at i and ending at e. Both i and e are interpreted as indexes in the sequence but can be empty to indicate the end of sequence or negative to indicate an offset from the end of the sequence. Indexes that are out of bounds will not be included in the result. If i is greater than e then the result will be reversed compared to its original order in S. Let's try a few examples:

> "foobar"[0:3]
"foo"
> "foobar"[3:0]
"oof"
> "foobar"[-3:]
"bar"
> "foobar"[:-3]
"rab"
> "foobar"[0:]
"foobar"
> "foobar"[:0]
"raboof"

We can sort a sequence with the sort or sortWith functions, assuming that either the sequence type is orderable or we can provide a function on values in the sequence to a type that's orderable. Values will be given in ascending order, but we can get descending order just by reversing this sequence:

> sort([0..10])
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
> sortWith(.age, [{name="jimmy", age=30, state="AZ"}, {name="bob", age=42, state="CO"}, {name="frank", age=37, state="HI"}])
 name age state
----- --- -----
jimmy  30    AZ
frank  37    HI
  bob  42    CO
> sortWith(.age, [{name="jimmy", age=30, state="AZ"}, {name="bob", age=42, state="CO"}, {name="frank", age=37, state="HI"}])[:0]
 name age state
----- --- -----
  bob  42    CO
frank  37    HI
jimmy  30    AZ

Together, these methods of consuming sequences can be especially useful to inspect the state of a hobbes-enabled process or to analyze recorded application data.

Pattern Matching

The hobbes language supports a form of pattern matching for classifying and destructuring data. The most general form of pattern matching is in the match expression, and we will consider how other uses of matching, like in sequence comprehensions, can be rewritten (or "desugared") into this general form.

An important special case of match expressions are C++ switch statements, as the C++ code:

switch (x) {
case 0:  return "foo";
case 1:  return "bar";
case 2:  return "chicken";
default: return "beats me";
}

is equivalent to the hobbes match expression (because the pattern _ matches any value):

match x with
| 0 -> "foo"
| 1 -> "bar"
| 2 -> "chicken"
| _ -> "beats me"

And in general, for r distinct cases, both will efficiently determine a result in O(log(r)) time (although some expressions, like the one above, can be arranged to run in O(1) time).

But match expressions generalize switch statements in a few key ways.

First, match expressions allow you to match on multiple values at once (so that for c values and r cases, a match can be determined in O(c*log(r)) time -- in the switch example we have c=1 and the expected time complexity). For example, we could write a match expression like:

match x y with
| 0 0 -> "foo"
| 0 1 -> "foobar"
| 1 0 -> "bar"
| 1 1 -> "barbar"
| 2 0 -> "chicken"
| 2 1 -> "chicken bar!"
| _ _ -> "beats me"

where we'd otherwise have to write a switch statement like:

switch (x) {
case 0:
  switch (y) {
  case 0:  return "foo";
  case 1:  return "foobar";
  default: return "beats me";
  }
case 1:
  switch (y) {
  case 0:  return "bar";
  case 1:  return "barbar";
  default: return "beats me";
  }
case 2:
  switch (y) {
  case 0:  return "chicken";
  case 1:  return "chicken bar!";
  default: return "beats me";
  }
default:
  return "beats me";
}

and although both code fragments will compute the same result in the same amount of time (likely in exactly the same way), the match expression is more concise and so offers some notational convenience.

Another way that match expressions generalize switch statements is that they allow binding variables to some portion of a matched value. For example, we might match on a complex variant type like the order transaction entry type from the storage example we looked at earlier:

match e with
| |paymentReceived=x| -> "received " ++ show(x)
| _                   -> "something happened"

In this case, the pattern |paymentReceived=x| checks that the variant in e is the paymentReceived case and if so, binds its payload to the variable x where it can be used subsequently. This matching and binding process can be nested to any depth, so for example this expression:

match [(1,2),(3,4),(5,6)] with
| [_, (3, x), _] -> x
| _              -> -1

will evaluate to 4 because the first pattern row checks for an array of length 3 where the second value is a pair where the first value is 3 and the second value we will bind to x (given the input then, x=4) and then x is immediately returned.

As a special case of this, when we match on character arrays, we can use regular expressions with typical binding syntax as a form of pattern as well. For example, this expression:

match "foobar" with
| 'f(?<us>u*)bar' -> us
| 'f(?<os>o*)bar' -> os
| _               -> "???"

will evaluate to "oo" because it matches the second regular expression, which binds to the variable os the sub-sequence of the input matching o*.

Finally, another way that match expressions generalize switch statements is that cases may contain a guard expression which makes the final determination whether or not to follow a branch. For example, this expression:

match 1 2 with
| 1 x where x > 10 -> 1
| _ _              -> 2

will return 2 because, although the input does match the first row at the patterns 1 x, the guard expression x > 10 (in this case, 2 > 10) evaluates to false and so the branch for that row is not followed, causing just the final row to match.

In order to be considered valid (and subsequently compiled), match tables must be exhaustive and all rows reachable. To be exhaustive, it just means that there is a matching row for every possible input to the match table (this is why the examples so far always end with a "match any" row like | _ -> -1). For a row to be reachable, it means that no prior row fully subsumes it and so makes it redundant (this check can be disabled with a compiler flag but it is on by default under the assumption that programmers introduce pattern rows because they intend for them to be relevant). This interpretation of matching is slightly different than in ocaml and Haskell, where e.g. inexhaustive matches may be flagged with a warning and then abort program execution at runtime when some input is given that doesn't match any row.

The fully general form of match expressions may be written:

match I0 I1 ... IN with
| p00 p10 ... pN0 where C0 -> E0
| p01 p11 ... pN1 where C1 -> E1
| ...
| p0R p1R ... pNR          -> ER

such that each of I0, I1, ..., IN is an expression whose value will be matched, each pij is a pattern to match against the value Ii, Cr is a condition to evaluate against the variables in p0r to pNr, and each Er is a result expression to return iff row r is selected (by implication, the type of each such Er must be the same).

And a pattern is any term freely generated by the grammar:

pattern := constant
        |  variable
        |  [pattern, pattern, ..., pattern]
        |  regex
        |  (pattern, pattern, ..., pattern)
        |  {f0=pattern, f1=pattern, ..., fN=pattern}
        |  |C=pattern|

where constant is any constant value ((), true, 42, 'c', etc), variable is any name (unique across a row), f0, f1, ..., fN are record field names, and C is a variant constructor name.

For convenience, the expression E matches p desugars to match E with | p -> true | _ -> false. If the pattern p is irrefutable (ie: never fails to match) then let p = E in B desugars to match E with | p -> B. If the pattern p is irrefutable then \p.E desugars to \x.match x with | p -> E. If the pattern p is refutable then the expression \p.E desugars to \x.match x with | p -> just(E) | _ -> nothing. If the pattern p is irrefutable, then [E | p <- S] desugars to map(\p.E, S). If the pattern p is refutable, then [E | p <- S] desugars to dropNulls(map(\p.E, S)).

Pattern matching is very useful in many different kinds of applications. For some uses of hobbes, it even functions as a kind of "rules engine" (where rows are "rules") such that users can derive very efficient machine code from very large match tables.

Parsing

Although many parsing problems can be solved with match expressions, there are many other common parsing problems for which the approach is insufficient. In particular, basic expression languages can't be parsed by the simple regular expressions supported in match expressions.

For these cases, hobbes defines a syntax for the introduction of parsers based on context-free grammars (technically, we produce LALR(1) parsers). As with many such methods (e.g. yacc, GNU bison, happy, ...) parsing rules define both syntax as well as "actions" for producing semantic values out of each rule.

As an example, we can build up to a basic calculator for arithmetic expressions. Let's start with the simplest first step, a parser for recognizing single digits:

$ cat p.hob

calc = parse {
  D := "0" {0} | "1" {1} | "2" {2} | "3" {3} | "4" {4}
    |  "5" {5} | "6" {6} | "7" {7} | "8" {8} | "9" {9}
}

Here we have a file with just a single definition (to define the calc variable). With this definition, we use the syntax parse { RULES } to construct a parser based on all of the rules in RULES and starting from the first rule in the sequence (in this case, the rule for D). Here our syntax terms are simple (just the constant lexical values as in "0", "4", etc) and the actions are also simple (returning 0, 4, etc). These actions can be arbitrary hobbes code, which will be useful as we develop this parser into something more useful.

For now, if we load this file with the hi program, we can first check the type of this new definition:

$ hi p.hob
hi : an interactive shell for hobbes
      type ':h' for help on commands

loaded module 'p.hob'
> :t calc
Array a char => (a) -> (() + int)

This tells us that the prepared parser is actually an overloaded function, taking any type that can be interpreted as an array of characters to a "maybe int" (it's maybe because the parser input might not be recognized).

We can interact with this parser to see its simple behavior:

> calc("9")
9
> calc("42")

>

Now we can see that a string with a single digit is recognized, but trying to read multi-digit input fails to produce a value. So we can go back to our file p.hob and extend the grammar slightly to accept multiple digits:

calc = parse {
  V := v:V d:D { v*10 + d }
    |  d:D     { d }

  D := "0" {0} | "1" {1} | "2" {2} | "3" {3} | "4" {4}
    |  "5" {5} | "6" {6} | "7" {7} | "8" {8} | "9" {9}
}

Here we've introduced a few new ideas. First, the rules for V define syntax in terms of the rule D (and V itself) rather than directly specifying lexical syntax. Also, the rules for V bind values to matched sub-rules (in the variables v and d) which are available for use in actions for those rules. Finally, the first rule for V uses left recursion to accept an indefinite-length sequence of lexical digits that will be incrementally translated to a single integer value in the usual way. We can see this with another interaction:

> calc("42")
42
> calc("8675309")
8675309

But nonsensical input (still including input involving arithmetic operators) will be rejected:

> calc("1+2")

> calc("cheeseburger")

>

Finally, we can add support for arithmetic operators (respecting standard order of operations) by the typical arithmetic grammar:

calc = parse {
  E := x:E "+" y:T { x + y }
    |  x:E "-" y:T { x - y }
    |  x:T         { x }

  T := x:T "*" y:F { x * y }
    |  x:T "/" y:F { x / y }
    |  x:F         { x }

  F := "(" x:E ")" { x }
    |  x:V         { x }

  V := v:V d:D { v*10 + d }
    |  d:D     { d }

  D := "0" {0} | "1" {1} | "2" {2} | "3" {3} | "4" {4}
    |  "5" {5} | "6" {6} | "7" {7} | "8" {8} | "9" {9}
}

And we have a parser that can recognize and evaluate simple arithmetic expressions:

> calc("9*(7+(8-4*3)/2)-3")
42

Not every syntactically valid parser definition will produce a valid parser. Some parser definitions introduce ambiguous choice (where the parser must choose between two equally valid paths). In hobbes, these are treated as errors rather than accepting unpredictable performance (all parse times should be O(n) for n input characters).

For example, consider a parser definition like this:

$ cat e.hob

err = parse {
  S := "a" { 1 }
    |  z:Z { z }

  Z := "a" { 2 }
}

Here, parsing the S definition must be ambiguous for the input "a" because we could either choose the first path through S := "a" (giving 1 as the result) or else through z:Z we could choose Z := "a" (giving 2 as the result). Because this choice is ambiguous, hobbes will reject the parser with an error indicating where its grammar went wrong:

$ hi -s e.hob
e.hob:2,7-7,1: Conflict between r0 and r0 on $
S -> 'a'# [$]
Z -> 'a'# [$]

In this error message, the symbol # is meant to indicate a position in a basic grammar rule (as in parser items), and the symbols in brackets indicate the lookahead values that allow the parser to conclude that a rule can be completed or reduced. So the error message above is telling us that in the given state, we could either reduce with S -> 'a'# or Z -> 'a'# on the end-of-input terminal (aka $).

Type Classes

In hobbes, user-defined overloading and ad-hoc type analysis can be defined through the use of type classes. Type classes were originally introduced in Haskell as a way to combine overloading and type inference. They can also be viewed as a specialized use of qualified types, which is also the view taken in hobbes (although the majority of type constraints in hobbes programs so far designate type classes). To get the intuition for the idea prior to a formal statement, let's consider some simple examples.

One of the simplest type classes to consider is the Show class, which has the job of converting a value to a string. In hobbes, Show is defined like this:

class Show a where
  show :: a -> [char]

This definition can be read as a statement that the constraint Show a is true if and only if there is an associated function show of type a -> [char]. We introduce these associated functions with instance definitions (also as in Haskell), as in for example:

instance Show int where
  show = showInt

Because there happens to be a function showInt :: int -> [char] defined by default with hobbes, it works perfectly as a show function for int values. With this definition in place, we can evaluate:

> show(42)
"42"

But hobbes also supports regular families of types, and although it works well enough for primitive types to define each Show instance on a case-by-case basis, it would be very frustrating if we had to do the same thing to show a value of type [int] and again to show a value of type [[int]] or [[[int]]] etc. We can use instance generators where there is a regular rule to decide how to construct a type class instance for a set of types. In the case of arrays, for example, we can introduce an instance generator for Show like this:

instance (Show a) => Show [a] where
  show xs = "[" ++ cdelim(map(show, xs), ", ") ++ "]"

This definition appears almost identical to a regular instance definition as before, but it has a constraint to the left of the => as in a qualified type. In this case, a good way to read (Show a) => Show [a] is "if it's true that there's a Show a instance, then we can make a Show [a] instance". Then we just introduce a definition of show that can work for any such value of type [a]. Because there happens to be a function cdelim :: ([[char]]*[char]) -> [char] around to concatenate an array of strings interleaved with another string, it's pretty easy to write this function so that we show every value in the input array and then join them together into one larger string.

Experimenting at the prompt, we can see that multiple levels of array nesting will uniformly apply this definition:

> :t [[[1,2],[3,4]], [[5,6],[7,8]]]
[[[int]]]
> show([[[1,2],[3,4]], [[5,6],[7,8]]])
"[[[1, 2], [3, 4]], [[5, 6], [7, 8]]]"

> :t [[1,2,3,4], [5,6,7,8]]
[[int]]
> show([[1,2,3,4], [5,6,7,8]])
"[[1, 2, 3, 4], [5, 6, 7, 8]]"

> :t [1,2,3,4,5,6,7,8]
[int]
> show([1,2,3,4,5,6,7,8])
"[1, 2, 3, 4, 5, 6, 7, 8]"

We can further expand the set of types in Show if we consider tuples in a similar way to arrays. In this case, tuples are intentionally designed to be heterogeneously-typed unlike arrays, so we don't have a function as simple as map to evaluate across them. However, we can make a new type class to fill a similar role:

class ShowTuple p where
  showTuple :: p -> [[char]]

And we can use a special constraint p=(h*t) to deconstruct the tuple within an instance generator (where we bottom out in the unit type () when the tuple is fully considered):

instance (p=(h*t), Show h, ShowTuple t) => ShowTuple p where
  showTuple p = [show(p.0)] ++ showTuple(tupleTail(p))
instance ShowTuple () where
  showTuple _ = []

Now with these definitions, we can determine the same intermediate string array that we determined for arrays. All that we need to bring this into Show is to use the same cdelim method:

instance (ShowTuple p) => Show p where
  show p = "(" ++ cdelim(showTuple(p), ", ") ++ ")"

And running this at the prompt, together with the prior definitions for Show, we can show even more complex types:

> show(42)
"42"
> show((42,[1,2,3]))
"(42, [1, 2, 3])"
> show((42,[1,2,3],[(4,5),(6,7)]))
"(42, [1, 2, 3], [(4, 5), (6, 7)])"

So far we have considered Show as an example of a typical type class, but sometimes we need to consider multiple types in combination. For example, we might try a simple type class for overloaded addition:

class Addable a b c where
  (+) :: (a,b) -> c

And assuming that we have a function iadd :: (int,int) -> int for the primitive CPU instruction for addition of integers, we might try introducing the instance:

instance Addable int int int where
  (+) = iadd

And this would certainly work, but if we try to use this at the prompt we should get an error:

> 1 + 2
stdin:1,1-5: Failed to compile expression due to unresolved type constraints:
  Addable int int a
  Print a
>

The problem here is that hobbes wasn't able to infer a result type. If we explicitly annotate the result type it will work:

> (1+2)::int
3
> (3+4)::int
7

However, this is an extremely inconvenient restriction and could easily become even more absurd for more complex types or multiple successive sums. In short, we need some way to communicate to type inference and instance selection that although the result type of addition may vary, it is uniquely determined by the first two types. Currently in hobbes, the way to address this problem is with functional dependencies. In short, when we define our type class we include with it the fact that the third type is uniquely determined by the first two:

class Addable a b c | a b -> c where
  (+) :: (a,b) -> c

With this change, we no longer need to introduce an explicit type annotation for the result -- instance selection will search just based on the first two types and then will communicate back to type inference that c=int for the single instance that it finds.

More examples and applications of type classes can be found in the hobbes/boot/ scripts, which are loaded whenever a hobbes::cc instance is constructed in C++ or when the hi program is initially run.

An important distinction between the interpretation of type classes in hobbes and in Haskell is that hobbes does not use the common dictionary-passing transform for overloaded identifiers. Instead, hobbes requires that any type constraint must also come with some logic that, for any satisfied constraint, can rewrite an expression involving that constraint into an equivalent expression without it. For example, at the hi prompt with our Show example, we can use the :u command to show the result of this process just on the expression level (here with obvious type annotations removed):

> :u show(42)
showInt(42)

This distinction is important because is allows hobbes to derive more efficient machine code than if dictionaries of indirect functions were passed around (though the approach is well known and has its drawbacks as well).

Unqualifier Modules

The C++ interface to hobbes supports the introduction of custom "unqualifiers" (and the system of type classes described earlier is just one such "unqualifier" implementation). The standard set of unqualifiers is defined in hobbes/lang/preds/ but it's possible to introduce your own by implementing the Unqualifier interface, which allows all of the mechanisms described so far. This definition is located in hobbes/lang/tyunqualify.H:

struct Unqualifier {
  // bind any implied type variables in a constraint
  virtual bool refine(const TEnvPtr& tenv, const ConstraintPtr& cst, MonoTypeUnifier* u, Definitions* ds) = 0;

  // determine whether or not this constraint is satisfied
  virtual bool satisfied(const TEnvPtr& tenv, const ConstraintPtr& cst, Definitions* ds) const = 0;

  // determine whether or not it's possible to satisfy this constraint
  virtual bool satisfiable(const TEnvPtr& tenv, const ConstraintPtr& cst, Definitions* ds) const = 0;

  // why couldn't this constraint be eliminated?
  virtual void explain(const TEnvPtr&, const ConstraintPtr&, const ExprPtr&, Definitions*, annmsgs*) = 0;

  // resolve a qualified expression
  virtual ExprPtr unqualify(const TEnvPtr& tenv, const ConstraintPtr& cst, const ExprPtr& e, Definitions* ds) const = 0;

  // allow overloaded symbols (as in type classes)
  virtual PolyTypePtr lookup(const std::string& vn) const = 0;
  
  // list overloaded symbols (if any)
  virtual SymSet bindings() const = 0;

  // list functional dependencies between constraint parameters (if any)
  virtual FunDeps dependencies(const ConstraintPtr&) const = 0;
};

The key functions to implement for your own constraint unqualifiers here are:

refine, which can infer parts of a constraint from other parts
satisfied, which decides whether a constraint is fully satisfied
unqualify, which rewrites an expression with a satisfied constraint into an equivalent expression without that constraint

It is usually not necessary to introduce new unqualifiers this way, but it can be a very useful tool when all other options are insufficient. Once you've defined an implementation of this interface for your own compiler module, you can install it with hobbes::cc::typeEnv()->bind("ConstraintName", UnqualifierPtr(new P())) assuming that your type is P and you want it to resolve constraints named "ConstraintName".

Why might you decide to implement your own unqualifier module? Ultimately this will come down to the particulars of your use-case, but in general this is a good option if you need to configure an external resource at an initial stage in order to query it arbitrarily at a subsequent stage.

For example, we've seen earlier that hobbes has a LoadFile constraint that takes a compile-time string (the path to a structured data file) and a type variable. The Unqualifier implementation that handles this constraint (in hobbes/db/bindings.C) will load the file at compile-time, determine its type structure, and will refine (or unify) the constraint's second argument to this type structure to allow subsequent type checking and compilation against the file's contents. There's also an AddDBFieldSignal constraint defined in hobbes/db/signals.C to handle the previously-mentioned installation of callbacks or "tails" to react to updates in structured file data (eventually reducing to an interaction between epoll and inotify on Linux). The addition of these unqualifier modules together allows a kind of flexible "dynamic, duck-typing" scripting against recorded data without giving up type checking or compilation of efficient code to react/query recorded data.

Another example we've seen earlier is the hobbes Connect constraint. This also takes a compile-time string and produces a unique type representing the connection made at compile-time. This is used further (and with other supporting constraints/unqualifiers) to negotiate a protocol and derive efficient code to communicate with a remote process.

These are just some examples of extensions to hobbes through the Unqualifier interface, but there are undoubtedly many other ways that this option can be useful to applications using hobbes.

hobbes's People

Contributors

Stargazers

Watchers

Forkers

sunmeng007 libow jeb2239 nivir cpehle vdt shobbs northholm hhy5277 wangzhx123 bywang bssrdf glycerine aaclark jkhoogland imraghava pritambaral clairvoyant mindis artisdom raymond1989 prateek fons lipei0330 lizhizhou jhgx flip111 cvnb sammoorhouse cduggan11 brianegge jkred369 wintel2014 zdeli goofysong goldstar111 haojerry sammoorhouse-ms jklyip julyip lmatz zsombor smunix ravachol70 auserj robertomalatesta vaishalipari pdxtrader laeeth hstalker kthielen remkamal chenkai036 michzia kg810 mziakovic pdx-trader nottug rtvt123 moneytech martinlaz neuroradiology csdvrx snow-abstraction mbrukman meet-ai ezhangle gurpreetsachdeva kevmal cannadayr wang-shun sannysanoff mafm robertdigital bellissimogiorno alhg adam-antonik chanpu9 ssh352 dawa79 rjmcguire kaveman- daninmontreal zabrane dmytro300 jinjinfan tgn3000 mo-xiaoming pratik-a baiyanhuang emg110 trumanz vitorian rnshah9 bingenito whitten romanshestakov clown-team stockxiay sinomiko

hobbes's Issues

add a stream type for FD-backed values

This is a useful way to query file data as well.

Example programs

Any thoughts on the best way to package example programs with the base library? Arguably the "hi" and "hog" programs are example programs, even though they are used very often for real work.

I'm a little concerned about the way that dependencies might be introduced through these programs. For example, the "hi" program wants to use readline, so readline has effectively become a dependency of hobbes. However, somebody could easily use hobbes without needing to use "hi".

I've been considering making some OpenGL example programs and wouldn't want to make OpenGL a dependency of hobbes. ;)

reserved identifier violation

I would like to point out that identifiers like “_HOBPP_FIRST” and “_HSTORE_STATE_UNBLOCKED” do not fit to the expected naming convention of the C++ language standard.
Would you like to adjust your selection for unique names?

Support for Ubuntu Linux (also Debian)

I have had luck building this on ubuntu 17.04 with no issues as long as I use find_package(LLVM 3.7.1 EXACT REQUIRED CONFIG) after installing llvm-3.7-dev package from the universes repository. llvm 3.9 and 4.0 will not work

edit:

Actually whenever I try to run hi I get a unification error. I am looking into this.

./hi
stdin:60,1-62,22: stdin:61,10-19: Cannot unify types: <std.__cxx11.basic_string<char, std.char_traits<char>, std.allocator<char> >> != <std.string>

hobbes::cc should be made thread-safe

So that applications can share or use multiple hobbes::cc instances between threads.

doesn't compile with gcc-7

/home/barriosj/hobbes-orig/lib/hobbes/events/httpd.C: In member function ‘bool hobbes::PartialHTTPRequestState::transition(char)’:
/home/barriosj/hobbes-orig/lib/hobbes/events/httpd.C:72:7: error: this statement may fall through [-Werror=implicit-fallthrough=]
       if (c != ' ') {
       ^~
/home/barriosj/hobbes-orig/lib/hobbes/events/httpd.C:81:5: note: here
     case FindHeaderBegin:
     ^~~~
cc1plus: all warnings being treated as errors
CMakeFiles/hobbes-pic.dir/build.make:422: recipe for target 'CMakeFiles/hobbes-pic.dir/lib/hobbes/events/httpd.C.o' failed
make[2]: *** [CMakeFiles/hobbes-pic.dir/lib/hobbes/events/httpd.C.o] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/hobbes-pic.dir/all' failed
make[1]: *** [CMakeFiles/hobbes-pic.dir/all] Error 2
Makefile:140: recipe for target 'all' failed
make: *** [all] Error 2

I think having all warnings be errors might be slightly heavy handed in that new gcc versions have more warnings.

Linkage Issues

General discussion of issues related to linkage.

This command on OSX 10.12 appears to create a hobbes shared library:

clang++ -shared -Wl,-all_load /usr/local/lib/libhobbes-pic.a -WL,-noall_load -L/usr/local/lib `llvm-config --libs x86asmparser x86codegen x86 mcjit passes` -lz -lncurses -o hob.dylib

which at least works with the README test code. My LLVM is in /usr/local. The command is converting the already installed libhobbes-pic.a into a local shared library hob.dylib which i later install as /usr/local/lib/libhobbes_dynamic.dylib. Note, I like to use a distinct basename from the static lib, to be absolutely sure when linking i get the library I want (that is, never install *.a and *.dylib with the same basename, the linker is too smart).

When I run the link command above I also get this:

ld: warning: direct access in function '(anonymous namespace)::DataFlowSanitizer::addGlobalNamePrefix(llvm::GlobalValue*)' from file '/usr/local/lib/libLLVMInstrumentation.a(DataFlowSanitizer.cpp.o)' to global weak symbol 'std::__1::char_traits<char>::eq(char, char)' from file '/usr/local/lib/libhobbes-pic.a(cmodule.C.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings.
ld: warning: direct access in function 'llvm::DOTGraphTraits<llvm::SelectionDAG*>::getNodeAttributes(llvm::SDNode const*, llvm::SelectionDAG const*)' from file '/usr/local/lib/libLLVMSelectionDAG.a(SelectionDAGPrinter.cpp.o)' to global weak symbol 'std::__1::char_traits<char>::eq(char, char)' from file '/usr/local/lib/libhobbes-pic.a(cmodule.C.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings.
ld: warning: direct access in function 'llvm::CodeViewDebug::getFullFilepath(llvm::DIFile const*)' from file '/usr/local/lib/libLLVMAsmPrinter.a(CodeViewDebug.cpp.o)' to global weak symbol 'std::__1::char_traits<char>::eq(char, char)' from file '/usr/local/lib/libhobbes-pic.a(cmodule.C.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings.
ld: warning: direct access in function 'llvm::CodeViewDebug::getFullFilepath(llvm::DIFile const*)' from file '/usr/local/lib/libLLVMAsmPrinter.a(CodeViewDebug.cpp.o)' to global weak symbol 'std::__1::char_traits<char>::eq(char, char)' from file '/usr/local/lib/libhobbes-pic.a(cmodule.C.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings.
ld: warning: direct access in function 'llvm::CodeViewDebug::getFullFilepath(llvm::DIFile const*)' from file '/usr/local/lib/libLLVMAsmPrinter.a(CodeViewDebug.cpp.o)' to global weak symbol 'std::__1::char_traits<char>::eq(char, char)' from file '/usr/local/lib/libhobbes-pic.a(cmodule.C.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings.
ld: warning: direct access in function 'llvm::ScalarEvolution::verify() const' from file '/usr/local/lib/libLLVMAnalysis.a(ScalarEvolution.cpp.o)' to global weak symbol 'std::__1::char_traits<char>::eq(char, char)' from file '/usr/local/lib/libhobbes-pic.a(cmodule.C.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings.
ld: warning: direct access in function 'llvm::ScalarEvolution::verify() const' from file '/usr/local/lib/libLLVMAnalysis.a(ScalarEvolution.cpp.o)' to global weak symbol 'std::__1::char_traits<char>::eq(char, char)' from file '/usr/local/lib/libhobbes-pic.a(cmodule.C.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings.
ld: warning: direct access in function 'replaceSubString(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, llvm::StringRef, llvm::StringRef)' from file '/usr/local/lib/libLLVMAnalysis.a(ScalarEvolution.cpp.o)' to global weak symbol 'std::__1::char_traits<char>::eq(char, char)' from file '/usr/local/lib/libhobbes-pic.a(cmodule.C.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings.
ld: warning: direct access in function 'llvm::getPGOFuncNameVarName(llvm::StringRef, llvm::GlobalValue::LinkageTypes)' from file '/usr/local/lib/libLLVMProfileData.a(InstrProf.cpp.o)' to global weak symbol 'std::__1::char_traits<char>::eq(char, char)' from file '/usr/local/lib/libhobbes-pic.a(cmodule.C.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings.
ld: warning: direct access in function 'llvm::getPGOFuncNameVarName(llvm::StringRef, llvm::GlobalValue::LinkageTypes)' from file '/usr/local/lib/libLLVMProfileData.a(InstrProf.cpp.o)' to global weak symbol 'std::__1::char_traits<char>::eq(char, char)' from file '/usr/local/lib/libhobbes-pic.a(cmodule.C.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings.
ld: warning: direct access in function '(anonymous namespace)::hasObjCCategoryInModule(llvm::BitstreamCursor&)' from file '/usr/local/lib/libLLVMBitReader.a(BitcodeReader.cpp.o)' to global weak symbol 'std::__1::char_traits<char>::eq(char, char)' from file '/usr/local/lib/libhobbes-pic.a(cmodule.C.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings.
ld: warning: direct access in function '(anonymous namespace)::hasObjCCategoryInModule(llvm::BitstreamCursor&)' from file '/usr/local/lib/libLLVMBitReader.a(BitcodeReader.cpp.o)' to global weak symbol 'std::__1::char_traits<char>::eq(char, char)' from file '/usr/local/lib/libhobbes-pic.a(cmodule.C.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings.
ld: warning: direct access in function 'llvm::sys::getDefaultTargetTriple()' from file '/usr/local/lib/libLLVMSupport.a(Host.cpp.o)' to global weak symbol 'std::__1::char_traits<char>::eq(char, char)' from file '/usr/local/lib/libhobbes-pic.a(cmodule.C.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings.

Now I get this:

otool -L hob.dylib
hob.dylib:
	hob.dylib (compatibility version 0.0.0, current version 0.0.0)
	/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.8)
	/usr/lib/libncurses.5.4.dylib (compatibility version 5.4.0, current version 5.4.0)
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 307.4.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1238.0.0)

This is NOT what I expected! There are no dependencies on LLVM there.
Turns out I didn't build LLVM as a shared lib. Grrr.

Support llvm-5

Building hobbes currently fails when using llvm-5.0 (latest stable release from September).

It appears various parts of the code check for llvm versions up to and including 4.x. Treating 5.0 as 4.x gets the code to build, but the resulting binary does not work. "hobbes-test" and "hi" both fail with an empty "LLVM ERROR". (No other message is printed).

Not being a c++ programmer and not understanding llvm means I have limited ability to patch this. Is support planned for llvm 5.0?

asyncClientAPI test hangs on OSX 10.11.6 El Capitan

With #29 applied, I can build hobbes-test (and hog and hi). Sadly however, the asyncClientAPI test appears to hang forever on OSX 10.11.6 El Capitan.

~/go/src/github.com/Morgan-Stanley/hobbes (osx_10.11.6) $ ./hobbes-test 
./hobbes-test 
Running 13 groups of tests
---------------------------------------------------------------------

  Arrays (4 tests)
  ---------------------------------------------------------
    Strings SUCCESS (2.47523s)
    Slices SUCCESS (91.5234ms)
    Comprehensions SUCCESS (76.8317ms)
    Streams SUCCESS (430.414ms)
  ---------------------------------------------------------
  3.07417s

  Compiler (2 tests)
  ---------------------------------------------------------
    compileToSupportsMoreThanSixArgs SUCCESS (2.33238s)
    liftApathy SUCCESS (2.18987s)
  ---------------------------------------------------------
  4.52231s

  Definitions (3 tests)
  ---------------------------------------------------------
    Basic SUCCESS (2.31537s)
    Arrays SUCCESS (43.2005ms)
    RecType SUCCESS (247.707ms)
  ---------------------------------------------------------
  2.60636s

  Existentials (2 tests)
  ---------------------------------------------------------
    Basic SUCCESS (2.25023s)
    Closures SUCCESS (17.2458ms)
  ---------------------------------------------------------
  2.26753s

  Matching (15 tests)
  ---------------------------------------------------------
    Basic SUCCESS (2.31324s)
    Strings SUCCESS (113.427ms)
    Arrays SUCCESS (66.5039ms)
    Struct SUCCESS (73.4537ms)
    Variant SUCCESS (67.4096ms)
    Efficiency SUCCESS (11.3873ms)
    Guards SUCCESS (28.5218ms)
    Regex SUCCESS (575.002ms)
    Support SUCCESS (26.2198ms)
    Tests SUCCESS (35.6084ms)
    Functions SUCCESS (155.126ms)
    Monadic SUCCESS (6.39863ms)
    matchFromStringToBoolIsBool SUCCESS (17.5476ms)
    matchFromIntToBoolIsBool SUCCESS (7.86611ms)
    matchFromStringToIntIsCorrect SUCCESS (19.2914ms)
  ---------------------------------------------------------
  3.5174s

  Net (2 tests)
  ---------------------------------------------------------
    syncClientAPI SUCCESS (2.6604s)
    asyncClientAPI   (...hangs here)

support for macOS 10.12

For the subtype constraint, to decide C++ class relationships, a substitute for gcc's '__cxxabiv1::__class_type_info' is needed.

For file signals, a substitute for inotify is needed.

For HSTORE/HLOG queues, a substitute for shared memory / futex is needed.

For loading structured data files, a substitute for mmap is needed.

(Will add notes as I run into new issues.)

class info RTTI under clang

There's a useful feature in hobbes where it can infer your bound C++ class relationships so that if you want to write an expression that needs that information (e.g.: passing a derived class to a function that wants a base class) then you can do it.

For gcc/Linux, this uses the '__class_type_info' structures in __cxxabiv1 (which is where this feature was originally implemented). For clang/OSX I couldn't find a way to get this same information, so I've disabled that functionality (and the corresponding unit tests) just on this platform. If there's a way to get it back though, it would be very useful.

match references to ambient variables

Sometimes we want to refer to the contents of a dynamic variable during a match (whether locally or externally defined). It would be useful if there was support for this without having to resort to guards.

e.g.

eqTest x y d = match x y with
| z z -> "both are equal"
| _ d -> "at least y is equal to the given d"
| _ _ -> "who knows?"

doesn't build

the master build fails on linux because of the more restrictive compiler flags.

move storage.H to standard reflect.H method for reflective types

This will put it in line with net.H, fregion.H, and cfregion.H for lightweight reflection.

LLVM 4.0 support

hobbes currently does not build with LLVM 4.0 out of the box, but only needs slight patching.

Biggest issue is that most ifdef only compare LLVM_VERSION_MINOR, I'd recommend to add a macro like LLVM_VERSION_ATLEAST(major,minor). Then it's essentially enough to use the newest ifdef for LLVM 4.0, and:

In jitcc.H add

#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LegacyPassManager.h

in llvm.H, replace include for llvm/Bitcode/ReaderWriter.h with llvm/Bitcode/BitcodeReader.h and llvm/Bitcode/BitcodeWriter.h, and rewrite context() to return a static instance of llvm::LLVMContext.

I have attached a quick patch that makes it build and pass the test suite, if you want I can prepare a real PR, but perhaps you may want to tweak this anyway.

0001-llvm-4.0-support.patch.txt

There is another problem, but it may be due to how packaging works on my distribution: it links all .a files of LLVM and the libLLVM-4.0.so, which results in duplicate instance of certain objects which break LLVM immediately. I haven't found a better way than editing the link.txt files manually to fix this yet. (llvm-config --libs correctly prints -lLLVM-4.0, not sure what cmake does.)

shmem doesn't get reclaimed if StorageGroup fails to initialize

invoke the ctor for each slot in an array<T>

https://github.com/Morgan-Stanley/hobbes/blob/50819464fa90f16de0a9143f3853948cd369b373/include/hobbes/hobbes.H#L59

Every slot is left uninitialised, which can cause crashes.

support match table "fall through"

In match tables where multiple contiguous rows have different pattern columns but identical result expressions, it would be helpful to support syntax such that the result expression only has to be written once.

e.g.:

match 1 2 3 with
| 1 x 3
| 1 2 x -> x
| _ _ _ -> 20

The first and second row share the same result expressions and so the first row can leave it unspecified expecting that it can be provided by a subsequent row.

As suggested by @skaller this can be supported for rows with differing sets of variables by requiring explicit bindings for unbound variables.

e.g.:

match 2 2 3 with
| 1 x y
| x 2 3 -> x+y
| _ _ _ -> 20

... will not compile, because the second row doesn't introduce a binding for "y" although its result expression requires it (but it would work for the first row which does introduce a binding for "y"). To make this work, we can introduce a local binding as in Felix with:

match 2 2 3 with
| 1 x y
| x 2 3 with y = 40 -> x+y
| _ _ _ -> 20

... would compile and return 42 because the input just matches the second row, so x=2 and at this row we've explicitly bound y=40 so the final result is x+y=42.

structured data files can't store fields whose types serialize to more than 2,147,479,552 bytes

These are pretty remarkable and unlikely types, and just describing them can bloat our files significantly (perhaps we should compress them), but in any case it appears that a write call in the logic for recording type structure into files assumes that the write either finishes in one shot or is a failure. However this isn't a correct assumption (at least on Linux) as described here:

http://man7.org/linux/man-pages/man2/write.2.html

hog write resumption scenario

Currently, if hog bounces then it will always start writing a new file when it resumes (even if there was a file already sitting around for the session that it resumes).

Instead, at least in -c mode, hog should only start a new file if/when the type signature actually changes.

Hobbes compilation errors

I have failed to compile hobbes. Do you have setup that is known to be tested to work? I have the following environment:
centos - 7
linux - 3.10
llvm - 3.7.1 (installed from source)
cmake - 3.6
gcc - 4.8.5

It exits compilation at stage "Linking CXX executable hobbes-test" with errors. It is mentioned in internet that the errors happened when LLVM is compiled by default with options LLVM_ENABLE_RTTI=0 and DEVMJIT=0. Although I have tried with "LLVM_ENABLE_RTTI=1" but without success. The actual errors are:

libhobbes.a(func.C.o):(.rodata._ZTIN4llvm18ValueMapCallbackVHIPKNS_5ValueENS_6WeakVHENS_14ValueMapConfigIS3_NS_3sys10SmartMutexILb0EEEEEEE[_ZTIN4llvm18ValueMapCallbackVHIPKNS_5ValueENS_6WeakVHENS_14ValueMapConfigIS3_NS_3sys10SmartMutexILb0EEEEEEE]+0x10): undefined reference to typeinfo for llvm::CallbackVH' libhobbes.a(func.C.o):(.rodata._ZTIN4llvm7PHINodeE[_ZTIN4llvm7PHINodeE]+0x10): undefined reference to typeinfo for llvm::Instruction'
libhobbes.a(func.C.o):(.rodata._ZTIN4llvm8FCmpInstE[_ZTIN4llvm8FCmpInstE]+0x10): undefined reference to typeinfo for llvm::CmpInst' libhobbes.a(func.C.o):(.rodata._ZTIN4llvm8ICmpInstE[_ZTIN4llvm8ICmpInstE]+0x10): undefined reference to typeinfo for llvm::CmpInst'
libhobbes.a(func.C.o):(.rodata._ZTIN4llvm17GetElementPtrInstE[_ZTIN4llvm17GetElementPtrInstE]+0x10): undefined reference to typeinfo for llvm::Instruction' libhobbes.a(jitcc.C.o):(.rodata._ZTIN6hobbes8LenWatchE[_ZTIN6hobbes8LenWatchE]+0x10): undefined reference to typeinfo for llvm::JITEventListener'
libhobbes.a(jitcc.C.o):(.rodata._ZTIN6hobbes5jitmmE[_ZTIN6hobbes5jitmmE]+0x10): undefined reference to typeinfo for llvm::SectionMemoryManager' libhobbes.a(jitcc.C.o):(.rodata._ZTIN4llvm18MCJITMemoryManagerE[_ZTIN4llvm18MCJITMemoryManagerE]+0x10): undefined reference to typeinfo for llvm::RuntimeDyld::MemoryManager'

Non-UNIX systems port?

Hello!

First of all thank you for open-sourcing hobbes, I just stumbled upon it and it looks great!
I'd be interested in contributing a win32 port if you are interested. For this, I have the following thoughts:

A first coarse review of the code shows you're relying on mman and UNIX sockets, epoll & kevent respectively. For a win32 port, these should be easily replaced with VirtualAlloc, MapViewOfFile et al, winsock2 and WaitForMultipleObjects, modulo some worker threads for tasks that don't map 1:1 to the epoll semantics.
But, seeing that this would not be the first cross-platform project that needs async IO, timers (and HTTP), I can't help but think of using a library like libuv to avoid having to maintain this sort of cross-platform code (with all unexpected caveats). I suppose avoiding this sort of maintenance burden would be in the interest of Morgan Stanley, too.
As I'm writing this, I'm reminded that LLVM should also contain the odd cross-platform abstraction for some of these topics, if not all (after all clang's static analyzer includes an http server as well, though I haven't checked the code so far).

So... may I get my hands dirty at this?

Cheers!

Pattern Extensions

Just suggestions! It useful to have multiple possible patterns with one handler:

| X= (a,b) | Y= (b,a) -> ....

It is even more useful if you add a "with" clause:

| Complex =(x,y) | Real =(x) with y = 0.0 -> ...

In principle these can be nested as well, including any guards. BTW I have no idea how you do exhaustion analysis when guards are used.

I'm not sure about the syntax. I think your current syntax is wrong. This syntax:

match e1 e2 e3 with
| p1 p2 p3 where C -> ...

is silly. You should get rid of the multiple values, there's no point to it, it gets in
the way of generalisations like the ones I outlined above, and the effect can be
obtained using tuples anyhow:

match (e1,e2,e3) with
| (p1,p2, p3) where C -> ...

Given that, the where clauses can be nested and you can add the with clause.
Indeed, any number each in any order. You could then write

| (A | B with .. | D where ... )

kind of thing. I can do these in Felix. I actually cheat by flattening the matches
and then replicating the handlers which allows the variables to be generic
but you probably don't want to do that (I desugar pattern matches before
type analysis which is a bad idea but simplifies the compiler).

Felix also supports user defined matches. To do that you pick a pseudo
constructor name, then define two functions based on that name:
one to see if a value matches or not, and one to extract the constructor argument.
Felix actually defined "Snoc" instead of "Cons" for lists, because it puts the pointer
first, allowing some operations to be universal (independent of the value type).
I then have a "user defined constructor" named "Cons" you can match against
in the expected order. I think F# has something like this. If you do this kind of
thing you cannot check for exhaustion (so the user will have to add a wildcard
at the end).

Another really painful thing with pattern matching is that you cannot use defined
values in a pattern easily. You have to first bind a variable and then use a guard,
in Felix I allow something like:

let x = 1 in match y with 
| $(x) -> "one"

This allows one to use "manifest constants" in patterns. Means the same as

| z where z == x -> "one"

some regex match tables compile to overly large DFAs

A match table column looking for multiple substrings translates to a DFA with a lot of "incidental ambiguity" -- '.SS0.|.SS1.|.SS2.|...|.SSn.' -- and this leads to a very large DFA that takes a long time to minimize and compile. Instead we should factor this regex column into '.(SS0|SS1|SS2|...|SSn).' and add a register to track which substring matched (which is all that the extra paths in the exploded DFA were doing anyway).

Scripting

Does hi allow us to execute scripts. In a way similar to doing python file.py. hi file.hob seems to open a repl after evaluating the module.

Very large pattern tables can take a long time to compile

The current approach to match compilation builds a DFA from a match table by selecting a column for discrimination and then recursively building DFA states for each branch on the set of implied tables (with the discriminated column removed). Finally the derived DFA is translated to a more primitive expression that can then be compiled.

This works well and produces very efficient residual machine code, however for very large tables (e.g. 10,000 rows, 30 columns) this can produce very large DFAs and so a significant amount of code and take a very long time to compile.

We should have an option to compile such tables more quickly and can tolerate some loss of runtime efficiency but should minimize that loss as much as possible.

type class for overloaded assignment

This might also need some way to solve open constraints by default, otherwise assignment of generic variants into e.g. fixed enum fields will be left open.

hobbes should build with -Werror=old-style-cast

Some of the storage.H code uses C-style casts and fails to build with this flag.

Algebraic types

Can I make a new algebraic type of variant?

data foo_or_not = |foo:[char],hotdog:int|
println( 
    match (|foo="chicken"|::foo_or_not) with 
        | |foo=x| where x ==  "chicken" -> x 
        | |hotdog=_| -> "hi"
        | _ -> "hello"
)

It seems like this is not possible. As the compiler says that this is unsat when run hi -x -s file.hob

support compression for logging to shmem

In this scenario, the consumer should accumulate statistics to feed compression tables to the producer. The producer can then apply those tables to pack data more efficiently.

Chat room for hobbes

Hello, I was wondering if we could create some sort of chat room for hobbes. This would make separating issues from questions simpler. Gitter.im or Slack is the one most github hosted projects seem to use.

regex pattern-matching : match implements a search-like behavior, instead of a match

I noted the following issues with regex pattern matching :

1.- binding the same name from 2 different pattern rows confuses the result :

match "fooooooooobar" with | 'f(?o*)bar' -> o | '(?.+)(?o*)bar' -> o | _ -> "none"
""
match "fooooooooobar" with | 'f(?o*)bar' -> o | '(?.+)(?o*)bar' -> od | _ -> "none"
"ooooooooo"

2.- .+ followed by another regex in sequence obliterates the sequenced regexes. Although this behaviour is somewhat correct, a user might expect a stricter matching behaviour to be implemented here. Some regex engines differentiates between matches and searches; is it something we'd benefit from having here too ?

match "fooooooooobar" with | 'f(?o*)bar' -> o | '(?.+)(?o*)bar' -> h | _ -> "none"
"ooooooooo"
match "fooooooooobar" with | 'a(?o*)bar' -> o | '(?.+)(?o*)bar' -> h | _ -> "none"
"fooooooooobar"
match "fooooooooobar" with | 'a(?o*)bar' -> o | '(?.+)(?o*)bar' -> hd | _ -> "none"
"fooooooooobar"
match "fooooooooobar" with | 'a(?o*)bar' -> o | '(?.)(?o*)bar' -> hd | _ -> "none"
"f"
match "fooooooooobar" with | 'a(?o*)bar' -> o | '(?.+)(?o*)bar' -> hd | _ -> "none"
"fooooooooobar"

Crash with llvm-4.0.1 when allocating after a failed write to a name

Crash with ToT hobbes and llvm-4.0.1 after failing to update a variable, and then allocating another.

# ./hi
hi : an interactive shell for hobbes
      type ':h' for help on commands

> x=54

> x<-75
stdin:1,1-1: Failed to get reference to global variable: x
1 x<-75                                                                           

> y=newPrimZ()::int
hi: /home/billt/llvm-4.0.1.src/include/llvm/Support/Casting.h:95: static bool llvm::isa_impl_cl<To, const From*>::doit(const From*) [with To =     llvm::BranchInst; From = llvm::TerminatorInst]: Assertion `Val && "isa<> used on a null pointer"' failed.
Aborted (core dumped)

Of course hob manages to avoid this, but stopping at the first error.

# ./hi test.hob 
hi : an interactive shell for hobbes
      type ':h' for help on commands

test.hob:2,1-1: Failed to get reference to global variable: x
1 x=54                                                                            
2 x<-75                                                                           
3 y=newPrimZ()::int                                                               
4 y<-2                                                                            
5 print(x+y)

README compilation example

FYI on OSX 10.12 this worked for me:

clang++ -std=c++11 test.cpp -o test -lhobbes -lz -lncurses -L/usr/local/lib `llvm-config --libs x86asmparser x86codegen x86 mcjit passes`

Changes from documented command line: OSX does not require -ldl.
I did not have libtinfo, but libncurses worked.
My hobbes and LLVM are in /usr/local. Not quite sure why clang++ finds
stuff in /usr/local/include.

[Of course I get hundreds of lines of these warnings:

ld: warning: direct access in function 'replaceSubString(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, llvm::StringRef, llvm::StringRef)' from file '/usr/local/lib/libLLVMAnalysis.a(ScalarEvolution.cpp.o)' to global weak symbol 'std::__1::char_traits<char>::eq(char, char)' from file '/usr/local/lib/libhobbes.a(funcdefs.C.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings.

]

hi on macOS repeats input lines

Incorrect readline behavior? Default readline lib is wrong?

The same code works fine on Linux, so something funny is going on.

unsigned types should have distinguished types

Currently the lift definitions expediently lift unsigned types as signed types. This is adequate for "pass through data" but not correct.

generalize fileref types for buffers with local references

We might want to introduce typed buffers with local references (in the same way that filerefs identify local references within files).

exponential back-off on storage group producer / consumer

Building Hobbes on MacOS

I am trying to build Hobbes on Mac and got stuck with this:

cmake .
CMake Error at CMakeLists.txt:5 (find_package):
  Could not find a package configuration file provided by "LLVM" with any of
  the following names:

    LLVMConfig.cmake
    llvm-config.cmake

  Add the installation prefix of "LLVM" to CMAKE_PREFIX_PATH or set
  "LLVM_DIR" to a directory containing one of the above files.  If "LLVM"
  provides a separate development package or SDK, be sure it has been
  installed.


-- Configuring incomplete, errors occurred!
See also "/Users/x/code/sb/hobbes/CMakeFiles/CMakeOutput.log".

Darwin y.local 16.6.0 Darwin Kernel Version 16.6.0: Fri Apr 14 16:21:16 PDT 2017; root:xnu-3789.60.24~6/RELEASE_X86_64 x86_64

Am I missing something?

Hobbes not reentrant

Quick look at the C++ embedding example seems to indicate the system isn't reentrant.
Not sure how easy that is to fix. The golden rule in C/C++ is never never never ever
use global variables.

This indicates the problem:

    hobbes::resetMemoryPool();

The standard way to do this is to have a function which creates a handle, which is then passed around. Just FYI here is the test case in the README, in Felix:

type hobbes_cc = "::hobbes::cc*" 
  requires package "hobbes"
;

ctor hobbes_cc : unit = "new ::hobbes::cc";

type hobbes_proc = "hobbes_proc" 
  requires header "typedef void (*hobbes_proc)(void);"
;

fun compileProc: hobbes_cc * string -> hobbes_proc = 
  "$1->compileFn<void()>($2)"
;

proc runProc: hobbes_proc = "$1();";

proc hobbes_resetMemoryPool : hobbes_cc = "hobbes::resetMemoryPool();" 
  requires package "hobbes"
;

var cc = hobbes_cc();

hobbesloop: while true do
  var s = readln stdin;
  if s == ":q\n" break hobbesloop;
  var x = "print(" + s + ")";
  var p = compileProc (cc, x);
  runProc p;
  println$ "Read s="+s;
  hobbes_resetMemoryPool cc;
done

No error handling yet. The compile should be polymorphic.
The package "hobbes" does all the header file inclusion and linkage required.
Here's the database entry I'm using:

Platform: OSX
Description: hobbes
provides_slib: -L/usr/local/lib -lhobbes
requires_slibs: -lz -lncurses -L/usr/local/lib  -lLLVMPasses -lLLVMipo -lLLVMInstrumentation -lLLVMVectorize -lLLVMLinker -lLLVMIRReader -lLLVMAsmParser -lLLVMMCJIT -lLLVMExecutionEngine -lLLVMRuntimeDyld -lLLVMX86Disassembler -lLLVMX86CodeGen -lLLVMGlobalISel -lLLVMSelectionDAG -lLLVMAsmPrinter -lLLVMDebugInfoCodeView -lLLVMDebugInfoMSF -lLLVMCodeGen -lLLVMTarget -lLLVMScalarOpts -lLLVMInstCombine -lLLVMTransformUtils -lLLVMBitWriter -lLLVMAnalysis -lLLVMProfileData -lLLVMX86AsmParser -lLLVMX86Desc -lLLVMX86Info -lLLVMX86AsmPrinter -lLLVMX86Utils -lLLVMObject -lLLVMBitReader -lLLVMCore -lLLVMMCDisassembler -lLLVMMCParser -lLLVMMC -lLLVMSupport -lLLVMDemangle
cflags: -I/usr/local/include
includes: '<hobbes/hobbes.H>'

hog should block new connections until old disconnected ones are drained

There is a concern that one producer process might record a series of transactions, crash (or be shut down), then restart, and have transactions recorded out of order between the two (ie: some old transactions, followed by some new transactions, followed by some old transactions).

Instead, we should have hog block new connections until it's confirmed that old disconnected ones are completely finished.

skip list structure for concurrent indexing

We should have a way to store indexes in a way that's safe for concurrent reading. It should work from C++ as well as from hobbes. Programs like hog should be able to store data this way, possibly compressed.

Test Result

Probably the build instructions should include

make test

I tries that and got this:

~/Desktop/hobbes>make test
Running tests...
Test project /Users/skaller/Desktop/hobbes
    Start 1: hobbes-test
1/1 Test #1: hobbes-test ......................***Exception: Other 23.11 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) =  23.12 sec

The following tests FAILED:
	  1 - hobbes-test (OTHER_FAULT)
Errors while running CTest
make: *** [test] Error 8

Any idea what's happening?

Missing hog binary doesn't result in error to terminal

There are no exception handlers, so the code which throws the exception just terminates the process. It would helpful to know that the execv failed because the hog file was missing.

https://github.com/Morgan-Stanley/hobbes/blob/45b68b98aa9891aed8cf09a11aaa557edd495886/test/Hog.C#L131

unstable test observed (race condition?)

https://travis-ci.org/Morgan-Stanley/hobbes/jobs/359367109

object.method style of calling type class instance methods doesn't work.

Example,

Suppose we defined a type class

class SetVal a where
setVal :: (, a) -> bool

instance SetVal [char] where
setVal = setHobbesArray

instance SetVal ()+( * long) where
setVal = setVariantPair

Then in your hobbes code:

mo.setVal("ABC") <-- this works
mo.setVal(mypair) <-- this doesn't however, setVal(mo, mypair) works fine.
Error for the first call is:
Cannot unify types: (() + ( * long)) != [char]

type bindings should work in implicit type classes the same as in explicit type classes

We have some cases where functions are defined that take type parameters and use them internally, and these functions are lifted into (implicitly-defined) type classes. However the type binding in these classes is not identical to the equivalent explicitly defined type class -- the link between the signature type binding and the internal reference is lost.

typical hobbes types should be placed in store<T> and io<T>

Although the storage.H and net.H libs are self-contained and easy to transport out of hobbes, if they're used within the context of hobbes then we should at least specialize store and io for common hobbes types (like datetimeT, alias, structs, variants, etc).

issue with install hobbes in centos 7

I make and installed quickly in my debian server.However, when comes to Centos 7 . I have tried llvm3.4 ,llvm3.9, llvm5.0 but they all make failed. Sometimes not get the LLVM_VERSION_MINOR;
sometimes miss llvm::createGVNPass() this method,even in llvm5.0, Not such file:llvm/Config/llvm-config.h.I use yum install llvm in Centos. I am really love this language and want to do some development for analysis data.

issue with substring captures in regexes

m a =
match a with
| (?< p > . ) _ (?< m >CDF|SWP) _ PS _ . -> m
| _ -> "0"

print(m("foo-_CDF_PS _ -bar"))

this prints out " _ SP _ ". This result doesn't look accurate.

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.