This is a brain dump of an idea I've had for a long time: making codegen a first-class citizen in ReScript that's easy to use and orchestrate. I'm posting it here in the Rewatch repo because exploring it involves changes to the build system, and Rewatch looks like a great place to try that kind of change.
Summary
This proposes first-class support for code generation in the ReScript build system and compiler. It would make it easy to embed other languages directly in your code: SQL, EdgeQL, GraphQL, Markdown, CSS, anything really. Generators can be written in any language, and the build system takes care of everything, from deciding when to trigger the generators most efficiently to managing the files each generator produces (regenerating, deleting, etc.).
Here's a quick pseudo-example of how this could work for embedding other languages, using a type-safe SQL code generator:
```rescript
// UserResolvers.res
// module Query is replaced with a reference to the file generated by the sql generator
module Query = %gen.sql(`
  select * from users where id = $1
`)

// The file generated by the sql generator has a function called `query` that takes an argument `id`
let getUserById = (~id) => Query.query(id)
```
Let's break down, at a high level, how this pseudo-example could work.
- The build system scans `UserResolvers.res` before compiling it and sees that it contains `%gen.sql`. It looks for a generator registered under the name `sql`.
- It finds our `sql` generator and calls it with some data, including the file name, the string inside `%gen.sql()`, and a few other things that can help with codegen. The generator in this example uses information from a connected SQL database to type the query it's given, and generates a simple function that executes the query. Since the generator is responsible for emitting an actual `.res` file rather than rewriting an AST, it can be written in any language, as long as we can call it and feed it data via stdin.
- The generator runs and outputs `UserResolvers__sql.res`. The build system knows this and now treats `UserResolvers__sql.res` as a dependency, which means it knows when to clean up the generated file, and so on.
- A built-in PPX in the compiler turns the `module Query = %gen.sql` part into `module Query = UserResolvers__sql`. This is a very simple, heuristics-based swap from the embedded code definition to the module its generator produces, driven by rules for how files emitted by generators are named.
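The scanning and swapping steps above can be sketched as two plain string functions, written in TypeScript since the post suggests JS as a generator language. This is a minimal sketch that assumes a naive regex scan (a real implementation would work on the parsed AST), at most one embed per generator per file, and the `<Module>__<generator>` naming rule from the `UserResolvers__sql` example. `findEmbeds`, `swapEmbeds`, and `Embed` are hypothetical names.

```typescript
// Hypothetical description of one %gen.<generator>(...) embed found in a source file.
interface Embed {
  generator: string;  // e.g. "sql"
  content: string;    // the string passed to the generator, e.g. the SQL query
  moduleName: string; // module the generator is expected to emit, e.g. "UserResolvers__sql"
}

// Matches %gen.<name>(`...`) or %gen.<name>("...").
// A fresh RegExp per call avoids sharing `lastIndex` state between scans.
function embedRegex(): RegExp {
  return /%gen\.([A-Za-z_][A-Za-z0-9_]*)\((?:`([\s\S]*?)`|"([^"]*)")\)/g;
}

// "src/UserResolvers.res" -> "UserResolvers"
function moduleNameOf(fileName: string): string {
  const base = fileName.split("/").pop() ?? fileName;
  return base.replace(/\.res$/, "");
}

// Hypothetical scan step: list every embed and the generated module it maps to.
function findEmbeds(fileName: string, source: string): Embed[] {
  const owner = moduleNameOf(fileName);
  const embeds: Embed[] = [];
  const re = embedRegex();
  let match: RegExpExecArray | null;
  while ((match = re.exec(source)) !== null) {
    const generator = match[1];
    // Group 2 is a backtick string, group 3 a double-quoted one.
    const content = match[2] !== undefined ? match[2] : match[3];
    embeds.push({ generator, content, moduleName: owner + "__" + generator });
  }
  return embeds;
}

// Hypothetical "built-in PPX" step: replace each embed with its generated module name.
function swapEmbeds(fileName: string, source: string): string {
  const owner = moduleNameOf(fileName);
  return source.replace(embedRegex(), (_whole: string, generator: string) => owner + "__" + generator);
}
```

Working on raw text like this would misfire on `%gen` inside comments or strings, which is one reason the real swap belongs in the compiler as a PPX.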
Generation is easy to cache, since regenerating the files is separate from running the compiler. The build system and the generator together decide when to regenerate code, which in turn means you only pay the cost of code generation when the source the code is generated from actually changes.
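One way to implement this caching is to key each generated module on a hash of its inputs, much like the `@sourceHash` comment in the CSS modules example further down. This is a minimal sketch, assuming the cache key is just the generator name plus the embedded content, and using a 32-bit FNV-1a-style hash purely for illustration. `GenCache`, `isStale`, and `markGenerated` are hypothetical names.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units; an illustrative, cheap content hash.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

// Hypothetical cache: generated module name -> hash of the inputs it was generated from.
type GenCache = Map<string, string>;

function hashInput(generator: string, content: string): string {
  // The separator keeps ("ab", "c") and ("a", "bc") from hashing the same text.
  return fnv1a(generator + "\u0000" + content);
}

// True if the module has never been generated or its inputs changed since last time.
function isStale(cache: GenCache, moduleName: string, generator: string, content: string): boolean {
  return cache.get(moduleName) !== hashInput(generator, content);
}

// Records the inputs a module was generated from; call only after the generator succeeds.
function markGenerated(cache: GenCache, moduleName: string, generator: string, content: string): void {
  cache.set(moduleName, hashInput(generator, content));
}
```

In practice the key would probably also need the generator's version and any external inputs (such as the database schema the SQL generator reads), since those can change the output without the embedded string changing.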
There's of course a lot of subtlety and detail in making this work well, keeping it performant, and so on, but that's the gist. I'll go into more detail with examples below.
Goals
The idea behind this is that codegen is a fairly simple tool that's effective in many use cases, but it's too inaccessible right now. To do codegen today, you either need to write a PPX or, for separate codegen, you need:
- Your own watcher that watches whatever source files you generate from
- Your own dependency management of the files you generate
- Separate build commands/processes for your code generators
With the approach to codegen outlined above, you'll instead need:
- A code generator written in whatever language you want
- Some simple configuration
...and that's it. The ReScript compiler and build system handle the rest.
Concerns
Performance
Performance is king. We need to be very mindful to keep builds as fast as possible. This includes intelligent caching, etc., but also setting up good starter projects for building performant generators.
We can of course ask users to write generators in performant languages like Rust and OCaml. But one strength of this proposal is that you should be able to write generators directly in JS and ReScript. This has several benefits:
- Using ReScript to write ReScript tooling is nice because ReScript is obviously a nice language
- The JS ecosystem is huge and has tooling and packages for almost everything
- All of the regular reasons JS is nice to write - not having to build and distribute binaries for each target platform, etc
To make the JS route as performant as possible, we could for example recommend https://bun.sh/, a JS runtime with fast startup, and include tips on keeping startup fast with Bun.
As for the generators themselves, they should ideally be designed so that they can:
- Run async in dev mode, so they don't slow down the regular compiler
- Run in parallel
- Be heavily cacheable
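Running generators in parallel without spawning an unbounded number of processes at once could look like a small concurrency-limited task runner. This is a minimal sketch, assuming each generator invocation is wrapped as an async function (for example, one that spawns the generator process and resolves with its output). `runWithLimit` is a hypothetical helper, not an existing Rewatch API.

```typescript
// Hypothetical helper: runs async tasks with at most `limit` in flight, preserving result order.
async function runWithLimit<T>(tasks: Array<() => Promise<T>>, limit: number): Promise<T[]> {
  const results = new Array<T>(tasks.length);
  let next = 0;

  // Each worker claims the next unclaimed task index until none are left.
  // JS runs this synchronously between awaits, so two workers never claim the same index.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workers: Promise<void>[] = [];
  const count = Math.max(1, Math.min(limit, tasks.length));
  for (let i = 0; i < count; i++) workers.push(worker());
  await Promise.all(workers);
  return results;
}
```

Bounding concurrency like this lets a dev-mode build keep generators off the critical path of the compiler without overwhelming the machine when many embeds change at once.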
Tooling (LSP, syntax highlighting, etc)
Embedding languages in other languages is a pretty common practice. For example, graphql-ppx and RescriptRelay already embed GraphQL in ReScript. So for tooling, it's a matter of adapting existing tooling to understand code embedded in ReScript.
Error reporting
In an ideal world, code generators could emit build errors that the build system picks up and, by extension, reports to the user via the editor tooling. The best possible outcome would be for codegen errors to be picked up and treated like any other compiler error.
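One way to get there is for generators to print a structured result to stdout instead of raw ReScript, so the build system can tell generated files apart from errors. This is a minimal sketch, assuming a hypothetical JSON protocol: the build system sends a `GeneratorInput` on stdin, and the generator answers with either the files to write or diagnostics whose line numbers are relative to the embedded string, which the build system maps back to the host `.res` file. `GeneratorInput`, `GeneratorOutput`, `parseGeneratorOutput`, and `toCompilerErrors` do not exist today.

```typescript
// Hypothetical payload the build system writes to a generator's stdin.
interface GeneratorInput {
  sourceFile: string; // e.g. "src/UserResolvers.res"
  generator: string;  // e.g. "sql"
  content: string;    // the string inside %gen.sql(...)
  embedLine: number;  // 1-based line in sourceFile where the embedded string starts
}

// Hypothetical result a generator prints to stdout as JSON.
type GeneratorOutput =
  | { kind: "files"; files: Array<{ path: string; contents: string }> }
  // `line` is 1-based and relative to `content`.
  | { kind: "diagnostics"; diagnostics: Array<{ line: number; message: string }> };

// Parses a generator's stdout, reporting malformed output as a diagnostic too.
function parseGeneratorOutput(stdout: string): GeneratorOutput {
  try {
    return JSON.parse(stdout) as GeneratorOutput;
  } catch {
    return { kind: "diagnostics", diagnostics: [{ line: 1, message: "generator produced invalid JSON" }] };
  }
}

// Maps generator diagnostics to compiler-style messages pointing into the host file.
function toCompilerErrors(input: GeneratorInput, output: GeneratorOutput): string[] {
  if (output.kind === "files") return [];
  return output.diagnostics.map(
    (d) => `File "${input.sourceFile}", line ${input.embedLine + d.line - 1}: [${input.generator}] ${d.message}`
  );
}
```

Because positions are mapped back to the `.res` file, the editor could underline the exact line inside the embedded SQL, just like a regular compiler error.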
Future ideas
Here are some loose ideas and thoughts:
- We can have a dedicated editor code action to rerun a code generator whenever needed. Good for generators where you want full control of when they're rerun.
- Generators could be driven either by embedded languages (`%gen.sql` in the example above) or by fully separate files (`.gql`, `.sql`, etc.).
- Generators could be either installable (npm packages) or hand-rolled local ones (pointing to a local file that is the code generator). In the package case, we could find a way for each package to provide its own configuration.
- We can provide "optimized" general tooling for writing code generators in ReScript (and OCaml?).
- We could support AST-based generation, i.e. allowing regular ReScript code inside `%gen.<generator>` and passing a representation of that AST to generators.
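The registration side of these ideas (generators keyed by embed name or file extension, coming from either an npm package or a local file) could be modeled as a small registry. This is a minimal sketch; the `GeneratorConfig` shape, the resolution functions, and the `<File>__<generator>` naming for standalone files are all hypothetical, not an existing `rescript.json` format.

```typescript
// Hypothetical: where a generator comes from, an installed npm package or a hand-rolled local script.
type GeneratorSource =
  | { kind: "package"; packageName: string }
  | { kind: "local"; path: string };

// Hypothetical registry entry.
interface GeneratorConfig {
  name: string;         // used as %gen.<name>
  extensions: string[]; // standalone files it handles, e.g. [".sql"]
  source: GeneratorSource;
}

// Finds the generator for an embed such as %gen.sql.
function resolveByName(registry: GeneratorConfig[], name: string): GeneratorConfig | undefined {
  return registry.find((g) => g.name === name);
}

// Finds the generator for a standalone file such as "src/queries.sql", plus the module it would emit.
function resolveByFile(
  registry: GeneratorConfig[],
  fileName: string
): { config: GeneratorConfig; moduleName: string } | undefined {
  const base = fileName.slice(fileName.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  if (dot <= 0) return undefined;
  const ext = base.slice(dot);
  const config = registry.find((g) => g.extensions.indexOf(ext) >= 0);
  if (!config) return undefined;
  const stem = base.slice(0, dot);
  // Hypothetical naming rule: capitalize so the result is a valid ReScript module name.
  const moduleName = stem.charAt(0).toUpperCase() + stem.slice(1) + "__" + config.name;
  return { config, moduleName };
}
```

Keeping embeds and standalone files in one registry means a single generator, such as the SQL one, could serve both `%gen.sql` and `.sql` files.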
Use case examples
I'm not sure we actually want to encourage all of these, but they show what's possible.
Embedding EdgeDB
I did an experiment a while back for embedding EdgeDB inside of ReScript: https://twitter.com/___zth___/status/1666907067192320000
That experiment would fit great with this approach:
- A generator for EdgeDB is written in JS and registered for `%gen.edgedb`.
- That generator calls out to the general EdgeDB tooling to produce the types needed.
- That's it. The build system handles the rest.
Embedding GraphQL
The same goes for GraphQL. For those who don't want to use a PPX-based solution, it'd be easy to build a generator (something similar to https://the-guild.dev/graphql/codegen perhaps) that just emits ReScript types and helpers.
Type providers: OpenAPI clients
F# has a concept of "type providers": https://learn.microsoft.com/en-us/dotnet/fsharp/tutorials/type-providers/
We could do something similar with this approach.
Imagine you have a URL to an OpenAPI specification. We'll use GitHub's as an example: https://raw.githubusercontent.com/github/rest-api-description/main/descriptions/ghes-3.9/ghes-3.9.json
Now imagine there's an existing tool that turns an OpenAPI spec into a ready-to-use ReScript client. We could write a generator that hooks up that tool:
```rescript
module GitHubAPIClient = %gen.openapi("https://raw.githubusercontent.com/github/rest-api-description/main/descriptions/ghes-3.9/ghes-3.9.json")

// Pseudo
GitHubAPIClient.getUserById(~id="githubUserId")
```
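To make the type provider idea a bit more concrete, here is what the core of such an OpenAPI generator could look like. This is a minimal sketch, assuming the spec has already been fetched and parsed into a plain object; it only emits ReScript URL builder functions from each `operationId` and path template rather than a full HTTP client. `generatePathBuilders`, `OpenApiSpec`, and the `...Path` naming scheme are hypothetical.

```typescript
// Hypothetical: the small subset of an OpenAPI document this sketch reads.
interface OpenApiSpec {
  paths: Record<string, Record<string, { operationId?: string }>>;
}

// "users/get-by-username" -> "usersGetByUsername"
function toCamelCase(id: string): string {
  const parts = id.split(/[^A-Za-z0-9]+/).filter((part) => part.length > 0);
  return parts
    .map((part, i) => (i === 0 ? part.charAt(0).toLowerCase() : part.charAt(0).toUpperCase()) + part.slice(1))
    .join("");
}

// Hypothetical generator core: one ReScript function per operation that builds its URL path.
// Assumes path parameter names are already valid ReScript identifiers.
function generatePathBuilders(spec: OpenApiSpec): string {
  const lines: string[] = ["// This file is automatically generated. Do not edit manually."];
  for (const path of Object.keys(spec.paths)) {
    const operations = spec.paths[path];
    for (const method of Object.keys(operations)) {
      const operationId = operations[method].operationId;
      if (!operationId) continue; // operations without an id get no function
      // "/repos/{owner}/{repo}" splits into ["/repos/", "owner", "/", "repo", ""]:
      // odd indexes are parameter names, even indexes are literal segments.
      const pieces = path.split(/\{([^}]+)\}/);
      const params = pieces.filter((_, i) => i % 2 === 1);
      const expr = pieces
        .map((piece, i) => (i % 2 === 1 ? piece : JSON.stringify(piece)))
        .filter((piece) => piece !== '""')
        .join(" ++ ");
      const args = params.map((p) => "~" + p + ": string").join(", ");
      lines.push("let " + toCamelCase(operationId) + "Path = (" + args + ") => " + expr);
    }
  }
  return lines.join("\n");
}
```

For `/users/{username}` with `operationId` `users/get-by-username`, this emits `let usersGetByUsernamePath = (~username: string) => "/users/" ++ username`. A real generator would also emit request functions and types from the spec's schemas; this only shows the shape of spec in, ReScript out.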
Roll your own simple CSS modules
You could use this to roll your own simple CSS modules.
Imagine a code generator registered for `%gen.cssModules`.
```rescript
// SomeModule.res
module Styles = %gen.cssModules(`
  .primary {
    color: black;
  }
`)

let button = <Button className=Styles.primary />
```
The code generator is called with the CSS string above and relevant metadata. It reads the CSS using standard CSS tooling and, just like CSS modules, hashes each class name based on the file it's defined in plus the local class name. It then outputs two files:
```css
/* __generated__/SomeModule__cssModules.css */
/* This file is automatically generated. Do not edit manually. */
.dzs16n {
  color: black;
}
```

```rescript
// __generated__/SomeModule__cssModules.res
// This file is automatically generated. Do not edit manually.
// @sourceHash("<file-hash-here>")
@inline let primary = "dzs16n"

// %%raw (not %raw) emits a top-level JS statement, which an import must be
%%raw(`import "./SomeModule__cssModules.css"`)
```
And here's the original file after the compiler's internal codegen PPX has transformed it:
```rescript
// SomeModule.res
module Styles = SomeModule__cssModules

let button = <Button className=Styles.primary />
```
There, we've reinvented a small version of CSS modules, but fully integrated into the ReScript compiler.
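The generator side of this CSS modules example could be sketched roughly as follows. This is a minimal sketch, assuming class selectors can be found with a simple regex (a real generator would use proper CSS tooling, as described above, since a regex also matches things like `.png` inside `url(...)`) and using an FNV-1a-style 32-bit hash in base 36 for the scoped names, so it will not reproduce the exact `dzs16n` name from the example. `generateCssModule`, `shortHash`, and `toBinding` are hypothetical.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units, rendered in base 36.
function shortHash(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(36);
}

// "primary-button" -> "primaryButton", so kebab-case classes become valid ReScript bindings.
function toBinding(local: string): string {
  const camel = local.replace(/^-+/, "").replace(/-+([a-zA-Z0-9])/g, (_m: string, c: string) => c.toUpperCase());
  return camel.charAt(0).toLowerCase() + camel.slice(1);
}

interface CssModuleOutput {
  css: string; // contents of __generated__/<Module>__cssModules.css
  res: string; // contents of __generated__/<Module>__cssModules.res
}

// Hypothetical generator: scopes every class by module name + local class name.
function generateCssModule(moduleName: string, source: string): CssModuleOutput {
  const scoped = new Map<string, string>();
  const css = source.replace(/\.(-?[_a-zA-Z][_a-zA-Z0-9-]*)/g, (_match: string, local: string) => {
    let name = scoped.get(local);
    if (name === undefined) {
      // The "c" prefix keeps the name from starting with a digit, which CSS class names can't.
      name = "c" + shortHash(moduleName + ":" + local);
      scoped.set(local, name);
    }
    return "." + name;
  });
  const lines = [
    "// This file is automatically generated. Do not edit manually.",
    '// @sourceHash("' + shortHash(source) + '")',
  ];
  scoped.forEach((name, local) => lines.push("@inline let " + toBinding(local) + ' = "' + name + '"'));
  lines.push('%%raw(`import "./' + moduleName + '__cssModules.css"`)');
  return {
    css: "/* This file is automatically generated. Do not edit manually. */\n" + css.trim() + "\n",
    res: lines.join("\n") + "\n",
  };
}
```

Hashing the module name together with the local class name is what keeps `.primary` in two different files from colliding, which is the core guarantee of CSS modules.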
Next step: a PoC
There's a lot to explore and discuss if there's interest in this route. A good next step would be to pick one simple generator and build a PoC of how integrating it into the build system could look. @jfrolich we talked about this briefly.
If you're interested in exploring this further, we could set up a simple spec of what needs to happen where. What do you say?