cucapra / filament
Fearless hardware design
Home Page: http://filamenthdl.com/
License: MIT License
Any uses of @phantom ports in the body of a component should be an error.
For all the data ports, why do we even need a guard? In the absence of sharing, the port will either receive a 0 (because of how Calyx generates muxes) or always receive the same input. If data ports behave as described in calyxir/calyx#1169, then we can always forward the input because the Filament type system guarantees that we will not read outputs corresponding to 'x.
In other words, because the default assignments for data ports should be 'x (according to the Calyx proposal above), it is safe to replace the assignment in = guard ? data : 'x with in = data, because 'x can be any value.
With parametric modules, recursion starts making more sense if we can prove that the recursion terminates. This can be done if we can show that all parameters to a recursive call are smaller than their current values (maybe only one parameter needs to decrease?).
Maybe Ryan knows if this is the same as Coq's termination check.
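One simple sufficient check can be sketched in a few lines (illustrative Python only, not part of Filament; the function name and the representation of call sites are made up): require every recursive call to make all parameters no larger and at least one strictly smaller. Over natural-number parameters this measure is well-founded, so recursion must terminate.

```python
def recursion_terminates(calls):
    """Check that every recursive call strictly decreases at least one
    parameter and increases none. `calls` is a list of pairs
    (current_params, call_params), each a tuple of naturals."""
    for cur, nxt in calls:
        if not (all(n <= c for c, n in zip(cur, nxt))
                and any(n < c for c, n in zip(cur, nxt))):
            return False
    return True

# A parametric module Rec[#N] that calls Rec[#N-1] terminates:
print(recursion_terminates([((8,), (7,))]))   # True
# A call that keeps #N fixed may recurse forever:
print(recursion_terminates([((8,), (8,))]))   # False
```

Requiring only one parameter to decrease while others may grow would not be well-founded without a lexicographic ordering, which is roughly the distinction Coq's termination check draws.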
The following program defines an unsafe pipeline:
import "primitives/core.fil";
component Main<G>(
@interface[G, G+3] go_G: 1,
@[G, G+1] l0: 32,
@[G, G+1] r0: 32,
@[G+9, G+10] l1: 32,
@[G+9, G+10] r1: 32,
) -> () {
M := new Mult;
m0 := M<G>(l0, r0);
m1 := M<G+9>(l1, r1);
}
It runs the same multiplier on the first and the ninth cycles of its execution.
However, this pipeline is unsafe; here is a sequence of events that will cause conflicting use of G:
1. Main is triggered (invocation 1); m0 starts executing.
2. M completes execution.
3. Main is triggered again (invocation 2). However, invocation 1 of Main attempts to execute m1, which now conflicts with the execution of m0 in invocation 2.
The @interface port should be constrained so that it does not attempt to restart the module until all executions of an instance have had a chance to run.
Definition (safe execution of a pipeline): A pipeline execution is safe if no resource used in one invocation of the pipeline conflicts with any other usage of the pipeline. In other words, all active invocations of the pipeline use disjoint sets of resources.
Note: The use of "anytime" in the definition of an initiation interval is important because there might exist other values
The current restrictions on @interface are:
We can divide the requirements into safety conditions for one invocation and safety conditions for pipelined use.
(3), along with the availability checking of signals within the body of the component, ensures that one use of a pipeline is correct; all signals are valid when they are read and resource usage is non-conflicting.
(1) & (2) attempt to ensure that pipelined usages of the component are safe.
(1) ensures that subsequent pipelined invocations do not break the requirements or guarantees of a previous invocation.
(2) ensures that all components used inside the module are only invoked when they can be safely pipelined.
The missing piece is this: the @interface signal needs to ensure that a pipelined invocation of a resource does not conflict with the current use of the components.
At a high level, we can imagine "ghost" versions of all components floating around that represent the execution of the pipeline at each time interval.
My current thought on how to do this is to ensure that the end time of the @interface delay is greater than or equal to the max of the end times of all usages of an instance. I have to convince myself that this is necessary and sufficient.
Build a simple tool that parses a Verilog module's signature and generates a Filament stub out of it. Next step would be to add comments to the Verilog module that can be extracted as signatures directly
Currently, we specify interfaces like this:
@interface[G, L]
which denotes that this is an interface for the event G with length L - G. However, this doesn't allow us to specify interfaces whose delay is something like L-G-1, because L-1 is not a valid event. However, consider the Latch interface:
component Latch<G, L>(
@interface[G, L] write_en: 1,
@[G, G+1] in: 32
) -> (
@[G+1, L] out: 32
) where L > G+1;
It says that the Latch may not be re-executed more than once every L-G cycles. However, for a concrete use of the latch, we'll have something like:
L = Latch<G, G+2>(in)
in order to make the in signal available between G+1 and G+2. However, the interface will force us to not use the latch for G+2 - G cycles, which is not correct.
This problem came up in the broader context of trying to implement an II=1 pipeline:
component FastMult<G>(
@interface[G, G+1] go_G: 1,
@[G, G+1] left: 32,
@[G, G+1] right: 32,
) -> (
@[G+3, G+4] out: 32,
) {
// First stage, register the inputs
L := new Latch;
R := new Latch;
l := L<G, G+2>(left);
r := R<G, G+2>(right);
// Second stage, perform the computation and save it
M := new MultComb;
m := M<G+1>(l.out, r.out);
OutTmp := new Latch;
ot := OutTmp<G+1, G+3>(m.out);
// Third stage, forward the value from temp out to out register
Out := new Latch;
final := Out<G+2, G+4>(ot.out);
// Connect the output to the out register
out = final.out;
}
This pipeline cannot type-check because G pulses every cycle but the interface requirements on the Latch will not allow it to pulse more often than every two cycles.
Seems like they probably should? This program type checks on 8d9af95:
extern "dummy.sv" {
component Merge<G, L, E>(
@interface[G, L] go_G: 1,
@interface[L, E] go_L: 1,
@[G, L] in1: 32,
@[L, E] in2: 32,
) -> (
@[G, E] out: 32,
) where L > G, E > L, E > G;
}
But I think that @interface[G, L] is wrong because the out signal needs to be held between [G, E]. Shouldn't the end of the interface for G be greater than E?
In Filament, components are implicitly pipelined. For example, the following component can execute in a pipelined fashion:
component MultAdd<G>(
@interface[G, G+1] go_G: 1,
@[G, G+1] left: 32,
@[G, G+1] right: 32,
@[G+1, G+2] acc: 32
) -> (
@[G+1, G+2] out: 32
) {
M := new Mult;
A := new Add;
m0 := M<G>(left, right);
a0 := A<G+1>(m0.out, acc);
out = a0.out;
}
Since the interface for G specifies that it may trigger every cycle, the Filament type checker must check that repeated re-execution is safe.
Safety is checked by assuming that G will trigger any time after its delay (in this case, 1) and showing that no re-execution of the pipeline after G's delay would conflict with the current iteration.
Consider the following component signature:
component Dyn<G, L>(
@interface[G, G+3] go_G: 1,
@interface[L, L+1] go_L: 1, ...
) -> ( ... ) where L >= G+3 { ... }
The interface specifies two events, G and L, and provides an ordering constraint between them: L >= G+3. The ordering constraint requires that any caller of the component ensures that L arrives at least 3 cycles after G does.
In order to ensure safe pipelining, we have to reason about the retriggering behavior of both G and L and ensure that their delays do not violate the constraint on the interface. As provided, the interface is incorrect; here is a trace of events that is accepted by the interface but violates the constraint:
Note that this trace is accepted by the delay specifications for
The key problem is that the delay specifications for G and L do not constrain the events in such a way that subsequent retriggers obey the constraints of the interface. Ideally, we could change the delay specification such that the constraint holds. We would like the trace history to look something like:
G0 --> G0+3 --> L0 --> G1
This implies that the delay for G must be dependent on the event L occurring; G may not retrigger unless L has triggered:
@interface[G, L] go_G: 1
Next, we need to ensure that L1 obeys the interface constraint:
... --> L0 --> G1 --> G1+3 --> L1
which is the same as (where dG is the delay for G):
... --> L0 --> G0+dG --> G0+dG+3 --> L1
This means that the constraint on the delay for L looks something like:
L + dL >= G + dG + 3
This shows that the delay for L depends on the delay of G; however, the delay of G is itself dependent on L.
TLDR: The histories of L and G are interdependent; in order to figure out when the next L should occur, we need to know when the last G occurred. In other words, the @interface for both events needs to be able to talk about the trace of the other event directly so that the constraint can be obeyed.
The problem here is the constraint between G and L. By specifying it, we relate the traces of the events in some manner, and we need the delay specifications to encode this information. Since interfaces only talk about delays, there is no way to relate the histories of multiple events.
At this point we have two solutions:
I do not think (1) is worth doing. This is because even if we can make it work, this way of relating dynamic events together does not correspond well with dynamic pipelines. I think the right way of doing this probably requires some mechanism for specifying type-level "back pressure"; the retriggering of G and L is not dependent on the two events but rather on the availability of the resources. The resources should be able to apply back pressure to the rest of the circuit to stall events from arriving. Doing all of this is definitely out of scope for the first paper.
This means that the proposed solution is disallowing ordering constraints in the language.
The proposed fix is to disallow constraints on user-level components but to allow them on primitives. For example, the register primitive is still allowed to specify the constraint:
component Latch<G, L>(
@interface[G, L] go_G: 1,
@[G, G+1] in: 32,
) -> (
@[G+1, L] out: 32
) where L > G+1;
Since user-level components do not have ordering constraints, there is no way to construct a dynamic use of the Latch:
component Main<G, L>(@[G, G+1] in: 32) -> (...) {
L := new Latch;
l0 := L<G, L>(in); // <- no way to prove L > G+1
}
This program will not type-check since the component cannot specify L > G+1.
However, uses of such a Latch that use the same event are still valid:
l0 := L<G, G+3>(in) // G+3 > G+1
Furthermore, it might still be possible to use max expressions to express a limited form of dynamism in the language.
With complex bundle types showing up in the arguments of components, it seems that a more general technique to represent IO interfaces is needed. Borrowing ideas from Chisel, I think we can use the concept of a Bundle (different from Filament bundles, which are just arrays).
The high-level idea is that a struct definition has a delay and it ensures that all sub-fields hold onto values for less time than the delay of the struct:
struct PacketInfo<G: 1> {
id: @[G, G+4] 32,
fields: for<#i> @[G+#i, G+#i+1] 32,
}
Type-checking should reject this struct definition because the id field holds onto values for 4 cycles while PacketInfo specifies that the event G reoccurs every cycle. This is problematic because providing a new PacketInfo every cycle would conflict with the still-live fields of the previous one.
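This well-formedness condition is easy to state concretely. The sketch below is illustrative Python (not part of the compiler; the function name is made up), with each field's liveness given as a (start, end) offset from G:

```python
def struct_well_formed(delay, field_intervals):
    """A field live over [start, end] holds its value for end - start
    cycles; if that exceeds the struct's delay, a new bundle arriving
    every `delay` cycles would overwrite a still-live field."""
    return all(end - start <= delay for start, end in field_intervals)

# PacketInfo<G: 1>: id is live over [G, G+4]; each fields[#i] lives one cycle.
print(struct_well_formed(1, [(0, 4), (0, 1)]))  # False: id outlives the delay
print(struct_well_formed(4, [(0, 4), (0, 1)]))  # True: a delay of 4 would be fine
```

The same PacketInfo declared with delay 4 passes, which matches the intuition that the struct's delay must dominate every field's hold time.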
Components like Mac can be defined like this now:
struct MacInputs<G: 1> {
left: @[G, G+1] 32,
right: @[G, G+1] 32,
acc: @[G+2, G+3] 32
}
comp Mac<G: 1>(inp: MacInputs<G>) -> (@[G, G+1] out: 32) { ... }
Of course, you should be able to specify parameters for your structs.
Parameters in Filament are universally quantified. For example, consider the following module:
comp Foo[#W]() -> () where #W >= 32 { ... }
The parameter #W is universally quantified: for all values of #W that respect the constraints, we generate a component that is well-pipelined. Such universal quantification allows the calling context to decide how to use the module by providing a concrete instantiation for #W. In return, the calling context must prove that the binding for #W satisfies the constraint.
Existential types are the dual of universally quantified types. Instead of the calling context deciding how to use the module, the module tells the context how it's allowed to use it. For example:
comp ShiftSome[exists #L]<G: 1>(@[G, G+1] in: 32) -> (@[G+#L, G+#L+1] out: 32)
This module delays the signal by some amount #L that is unknown to the calling context. Instead, when the calling context instantiates a shift circuit, it must use the specific instance of #L bound to that ShiftSome component:
comp main<G: 1>(... in: 32) -> () {
S := new ShiftSome; // We don't provide a parameter here
s0 := S<G>(in);
mux := new Mux[32]<G+S.#L>(sel, s0.out, ...); // Scheduling must use S.#L to correctly use the signal s0.out
}
Note that when scheduling this program, we have to use S.#L to schedule things. S.#L is unknown when we write the program but will be turned into a concrete value during compilation. While all invocations of S will get the same constant, different instantiations of ShiftSome might get different values of #L. This is important because it opens the door to something really cool: per-instance retiming.
Constraints on existential parameters must be satisfied by the body of the component defining them and can be assumed by the calling context.
The compilation strategy for existentials involves turning them into universally quantified types. This has the effect of turning the quantifiers "inside out". For each instance, we decide (somehow) the binding for each existential variable and replace all instantiations to use those constants. For example, in the above program, if we decide #L=10 is a good binding for the variable, we change:
S := new ShiftSome;
into:
S := new ShiftSome[10]
and make the signature of the component take #L as a universally quantified parameter.
This is not fully developed yet but I think it's a good starting point.
A key part of a language ecosystem is a well-thought-out package manager and module system, so let's build one for Filament! There are a couple of key challenges with designing something like this for Filament:
The space of module systems is vast and complicated. I suggest we start with something simple and extensible like Rust's module and namespacing scheme. We should have the ability to describe private and public components and refer to them using the :: syntax.
@sampsyo's writeup on modules for Calyx is a good starting point for other ideas. Specifically, take a look at the "related work" links which mentions verilog package managers: calyxir/calyx#419
It seems that if we can prove that none of the values in a bundle overlap, then we can use the same wire to receive the inputs. What is the right way to do this? It probably needs special handling in the backend.
I was wondering a couple of related questions about intervals like [G, L] or [G, max(G, L)+1], as opposed to [G, G+n].

A trivial multi-event component is one that essentially has different "paths" running through its signals, each associated with an individual event:
component two_paths<G, L>(
@[G, G+1] in0: 32,
@[L, L+1] in1: 32,
) -> (
@[G+3, G+4] out0: 32,
@[L+2, L+3] out1: 32,
)
The signature of this component tells us that the two events, G and L, do not interact in any observable way; if they did, at least one of the outputs would have a lifetime that depended on both of them.
One of the simplest non-trivial multi-event components is a Latch:
component Latch<G, L>(
@interface<G, L> write_en: 1,
@[G, G+1] in: 32
) -> (
@[G+1, L] out: 32
)
The out signal of the latch requires that it be available till the event L occurs. The write_en signal enforces that the latch cannot be used again till at least the event L has been observed. Because the implementation of the Latch does not provide a way to observe the L event, there is no Filament program that can construct a signal available over [G+1, L]; this means Latch is necessarily implemented as an extern.
Another non-trivial multi-event component is the following Join component, which provides its output when both the input tokens, left and right, arrive:
// A component that triggers when both left and right tokens are available.
component Join<G, L>(
@interface<G, max(G, L)> left: 1,
@interface<L, max(G, L)> right: 1,
) -> (
@[max(G, L), max(G, L) + 1] out: 1,
) {
LL := new Latch;
LR := new Latch;
A := new CombAnd;
l := LL<G, max(G, L)+1>(left);
r := LR<L, max(G, L)+1>(right);
and := A<max(G, L)>(l.out, r.out);
out = and.out;
}
This is a truly dynamic component: G and L are entirely unrelated and unordered, which means either event can fire first. In fact, different invocations can have different orders of G and L firing.
The interface ports for the two events require that the module may not be used till max(G, L) has been observed, which, again, is a truly dynamic event in the system.
Some components have signatures like this:
component Mux<G>(
@interface[G, G+1] _go: 1, ...
Where the signal _go is not used inside the component and is only provided for modeling the II of the component (in this case, a combinational Mux that is available every cycle).
However, there is another way to interpret an "unused" interface signal: it states that the signal is "internal" to the module and cannot be provided by the outside world. For example, consider a pipelined systolic array that always moves its internal values whether or not a new input was provided. In this case, the calling context cannot provide this pipeline with a go signal since the pipeline is continuously performing its computation. I'll call such a pipeline a "continuous pipeline". Another benefit of continuous pipelines is that they do not need internal pipelined FSMs because their internal logic is always executing. Because of this, we can completely eliminate guards in certain cases as well (#37).
The challenge with such pipelines is that there isn't a clear way to force the calling context to continuously provide valid data to them, which might break Filament's guarantees. I say might because Filament's guarantees might allow some wiggle room in interpreting the meaning of absent signals vs. present 'x signals.
The disjoint write analysis provides the second guarantee of the Filament type checker: that resource usage never conflicts. A naive way to enforce this is to require that writes to ports of the same instance are all disjoint. For example:
component MultTwice<G>(@[G, G+1] left: 32, @[G, G+1] right: 32) -> (@[G+2, G+3] out: 32) {
M := new Mult; // 1-cycle latency Mult
m0 := M<G>(left, right);
m1 := M<G+1>(m0.out, m0.out);
out = m1.out;
}
We can try to generate the following disjointness conditions, where L gives the liveness of a port:
L(left) /\ L(m.out) = ϕ
L(right) /\ L(m.out) = ϕ
In general, for a component with n input ports and k uses, we'd generate n constraints of size k. However, I want to demonstrate that we can get away with checking that the assignments to the abstract event variables do not conflict with each other.
@interface Ports

The @exact modality allows us to reason about when signals will have zero and non-zero values. For example:
@[G, G+5] + @exact[G, G+1] go: 1
This states that we can reason about the value of go between [G, G+5] and rely on the fact that it will only be non-zero during [G, G+1].
An @interface definition like the following:
@interface<G, 2> go: 1
is just sugar for:
@[G, G+2] + @exact[G, G+1]
Using this "required zero" semantics, @interface ports allow us to enforce requirements on repeated usage: a signal may not pulse more often than allowed by its interface requirement.
When compiling a high-level Filament program, the compiler infers interface ports that correspond to each event in the component.
Well-formedness of @interface: if a signal is used between [G+i, G+i+n], the interface signal should have a delay of at least n.
In other words, if a design is using the signal for n cycles, it is certainly not safe to trigger the design more than once every n cycles.
Going back to our original example, if the design uses an input signal between [G+i, G+i+n], then we know that the interface signal will require a delay of at least n.
m0 := M<G> // <- first assignment to interface
m1 := M<G+1> // <- second assignment to interface
If we can ensure that G and G+1 are n apart, then we know that all signals associated with them will also be disjoint.
In other words, the disjointness of @interface ports is a sufficient condition for the disjointness of all other ports that make use of the related event.
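The sufficient condition can be sketched concretely (illustrative Python; the real check reasons symbolically about events, whereas this sketch uses concrete cycle offsets and made-up names):

```python
def invocations_disjoint(starts, delay):
    """If every pair of invocation start times for one instance is at
    least `delay` apart (the delay of the instance's @interface), then
    all port assignments made by those invocations are disjoint too."""
    starts = sorted(starts)
    return all(b - a >= delay for a, b in zip(starts, starts[1:]))

# M is invoked at G and G+1, but its signals are live for 2 cycles:
print(invocations_disjoint([0, 1], 2))  # False: conflicting use
print(invocations_disjoint([0, 2], 2))  # True: all ports disjoint
```

This replaces the n-constraints-of-size-k check with a single pairwise check per instance on the event assignments.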
Extend the instance creation syntax with the option of defining the length of the "borrow" for the instance. This is meant to pre-commit to an interval during which the instance can be reused. I think that, in general, parametric programs will need this in order for the shareability check to work (especially for #76).
Currently, we use the following loop to check whether a given constraint P is valid, i.e., satisfied by all assignments to the variables:
(push)
(assert (not P))
(check-sat)
(pop)
Instead of doing that, we can attempt to generate "indicator variables" which essentially tell us which constraints can be violated.
For example, we can have a constant t0 along with the formula:
(assert (and t0 (not P)))
This formula is satisfiable if and only if both t0 and (not P) can be assigned true at the same time. This means that if we allocate a different indicator variable for each assert and the check-sat query says SAT, we can look at the model to see which indicator variables were set to 1 and thereby know which constraints were violated.
Of course, in the above model, we will only get a non-deterministic subset of the indicator variables set to 1 when the constraints are violated. In order to get all the violated constraints, we can ask the solver to maximize the value of (sum t0 .. tn), which forces it to assign as many ti to 1 as possible.
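The indicator idea can be illustrated without an SMT solver by brute-forcing assignments and reporting the largest set of simultaneously violated constraints (a toy Python analogue of the maximization query; the actual implementation would use the solver's optimization interface instead):

```python
from itertools import product

def max_violations(constraints, n_vars):
    """Toy analogue of the indicator-variable query: search for the
    assignment that violates as many constraints as possible and
    report their indices (the indicators the solver would set to 1)."""
    best = set()
    for bits in product([False, True], repeat=n_vars):
        violated = {i for i, c in enumerate(constraints) if not c(bits)}
        if len(violated) > len(best):
            best = violated
    return best

# Two constraints over booleans x0 and x1: P0 = x0, P1 = x1.
# Setting both variables to False violates both at once.
cs = [lambda v: v[0], lambda v: v[1]]
print(max_violations(cs, 2))  # {0, 1}
```

This is exponential in the number of variables, of course; it only demonstrates what the maximized indicator query computes.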
Sometimes we might have complex constraints that are not easily proved using the SMT-backed solver. There are two possible solutions to this problem; one is an assume statement that defers the checking of the constraint to the monomorphization stage.
For example, if we have the following component:
comp Foo[#N, #K](...) where log2(#N) = #K { ... }
Because we have
comp Foo[#N, #K](...) where log2(#N) = #K, exp(2, #K) = #N { ... }
Of course, if a component uses Foo, it has to prove this property as well and therefore needs to assume it:
comp Bar[#P, #Q] where log2(#P) = #Q, exp(2, #Q) = #P {
f := new Foo[#P, #Q];
}
comp main() -> () {
b := new Bar[16, 4];
}
Note that this "threading through" only needs to happen up till the main component, which provides concrete values for both parameters and can therefore prove that the constraint holds numerically.
assume Statements

The threading-through technique can get pretty cumbersome, especially when complex constraints are implied by the type-safety checks. Instead of threading things through, we can use assume statements:
comp Foo[#N, #K](...) where log2(#N) = #K assume exp(2, #K) = #N { ... }
This basically states that if the constraints on the component are met, then certain other conditions automatically hold. Other components can use Foo without having to assume extra constraints:
comp Bar[#P, #Q] where log2(#P) = #Q {
f := new Foo[#P, #Q];
}
Of course, the assume statement itself could be invalid, so during monomorphization, when we have concrete values for #N and #K, we can plug them in and check that the assume statement held. If it didn't, we can signal a problem.
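The monomorphization-time check is straightforward once the parameters are concrete. A minimal sketch (illustrative Python; check_assume is a made-up name, not compiler code):

```python
import math

def check_assume(n, k):
    """With concrete bindings for #N and #K in hand, numerically verify
    both the where-clause and the assume from Foo's signature."""
    where_holds = math.log2(n) == k   # where log2(#N) = #K
    assume_holds = 2 ** k == n        # assume exp(2, #K) = #N
    return where_holds and assume_holds

# main instantiates Bar[16, 4], which instantiates Foo[16, 4]:
print(check_assume(16, 4))  # True: the assume held for this instance
print(check_assume(16, 5))  # False: signal a problem
```

Since main provides concrete values, every assume in the instantiation tree can be discharged this way with plain arithmetic, with no SMT reasoning needed.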
Some ideas for case studies. We need to find components that have purely static behavior:
Generate and check them
When a module does not have an II of 1, we should be able to generate smaller, counter-based FSMs. This is particularly important if we're interacting with high-latency modules, since we may need to wait a long time and the generated pipelined FSM can get quite large.
The @stable annotation is used by the Calyx canonicalization pass to sort assignments in dataflow order. However, because Filament doesn't mark any primitive ports as @stable, canonicalization often fails, leading us to disable it. It occurs to me that we can do a conservative analysis to figure out which ports must be registered and are therefore @stable. Specifically, any port that isn't alive in the cycle [G, G+1] must be registered, while ports active during that interval might have a combinational path between their inputs.
This can be discussed in the paper as well as a utility of the type system.
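The conservative analysis above can be sketched in a few lines (illustrative Python, with liveness given as a [start, end] interval in cycles relative to the event G; this is not the actual pass, and the name is made up):

```python
def must_be_registered(live_start, live_end):
    """Conservative check: a port whose liveness does not overlap the
    triggering cycle [G, G+1] (here [0, 1]) cannot be driven
    combinationally from the cycle's inputs, so its value must come
    from a register and the port can be marked @stable."""
    overlaps_first_cycle = live_start < 1 and live_end > 0
    return not overlaps_first_cycle

print(must_be_registered(2, 3))  # True: live over [G+2, G+3], must be registered
print(must_be_registered(0, 1))  # False: may have a combinational path
```

Ports the analysis marks this way could then safely carry @stable through to Calyx, re-enabling canonicalization.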
There are two possible definitions of initiation intervals of pipelines (apologies in advance for the unwieldy names):
If the pipeline must be retriggered exactly at multiples of its II, we call this "exact retriggerability".
If the pipeline may be retriggered at any point at least II cycles after a trigger, we call this "forever retriggerability".
Both definitions of initiation intervals (II) seem reasonable and there doesn't seem to be any a priori consensus on which one to pick. For example, in software pipelining, the initiation interval is considered to be "the number of cycles between the launch of successive loop iterations". This corresponds to our "exact retriggerability" definition. I suspect that HLS tools subscribe to this definition as well.
In hardware pipelines, it often seems to be "the number of cycles that must elapse before instruction issue to the same unit". This corresponds to our "forever retriggerability" definition.
Without much motivation, I'm going to pick two criteria for a good definition of II:
There are two places where the definition of initiation interval is used and checked:
For (1), we have something like this:
M := new Mult; m0 := M<G>(...); m1 := M<G+10>(...);
We need to ensure that m0 and m1 do not conflict. In the exact case, we must show that after G occurs, G+10-G is exactly a constant multiple of the initiation interval of Mult. For n invocations, we must do
In the forever case, we must check that G+10-G is greater than or equal to the II of the pipeline. This gives us a little more flexibility in that we can reuse an instance any time after the II of the instance. On the other hand, we might have to wait fewer cycles in the exact case.
For (2), we need to ensure that the event used to trigger an invocation triggers in accordance with the instance's II. In the exact case, we would have to ensure that G triggers at some constant multiple of the II of Mult. In the forever case, we have to ensure that G triggers no more often than the II of Mult allows. Again, forever gives us more flexibility but may require the pipeline to wait longer.
Questions about composition are more focused on interfaces:
component Double<G>(
@interface[G, G+100] go_G: 1,
...
) -> (...) {
M := new Mult; // II = 3
m0 := M<G>(...); m1 := M<G+3>(...)
}
The interface for G waits 100 cycles even though the pipeline takes 6 cycles. In the exact case, we'd have to have the interface line up exactly with the II of G, along with other constraints.
The more interesting case of the brittleness of the exact definition is when we use an II where there is some
Add support for existential variables, which can occur in signatures and need to be used in interval checking.
Choice 1
We've been using the following syntax for whiteboard discussions. It's kinda janky:
extern component Mult<G, @exists(L)>(
@within(G, G+1) left: 32,
) -> (
@exact(G+L, G+L+1) valid: 1
);
However, it does make clear that L is only bound within the definition of the signature. This also implicitly means that L cannot be used in the body of the component, which might be an unintended restriction.
Choice 2
We can use an OCaml-like syntax for existential type variables:
extern component Mult<G>(
@within(G, G+1) left: 32,
) -> (
@exact(G+L, G+L+1) valid: 1
) {
exists L;
};
This has the benefit that an invocation only needs to instantiate the exact time variables that are specified in the signature. The downside is the complexity in detaching a Signature from a component definition.
TODO: Flesh out the proposal after basic interval checker is working with non-existential programs.
The interval check pass has gotten quite big and complicated because it is currently handling two different tasks:
The reason for this is the original author's laziness and hesitation in adding another pass that calls into an SMT solver; because (1) also requires calls into an SMT solver, it was just easier to keep everything in one place. However, this has become extremely unwieldy. The correct solution is to separate these two tasks so we can write a standard type-checking and well-formedness pass followed by an interval checker.
The nice thing about this issue is that it mostly involves code reorganization.
One way to use parameter bounds for black-box external tools is to provide a bound on them such that we can guarantee that the backend tool can generate a corresponding module. This can be useful for tools like Reticle which cannot necessarily generate dot product hardware for all possible sizes of vector.
We need to ensure that output ports of a component and all bundle locations have exactly one assignment to them (are linear). The former isn't super hard but we'd probably need more machinery to encode the constraints needed to prove that assignments to all bundle locations are unique.
Filament enforces the read-validity property of the type system on intra-iteration access, that is, data within the same pipeline iteration. A simple example of this is a two stage multiplier:
component Mult<G>(
@interface[G, G+1] go: 1,
@[G, G+1] left: 32,
@[G, G+1] right: 32,
) -> (
@[G+2, G+3] out: 32
) {
// Register the inputs
LL := new Latch;
LR := new Latch;
ll := LL<G, G+1, G+2>(left);
lr := LR<G, G+1, G+2>(right);
// Perform combinational multiply and return the answer
M := new MultComb;
m0 := M<G+2>(ll.out, lr.out);
out = m0.out;
}
The multiplier registers the inputs left and right, allowing it to accept another set of inputs on the next iteration while the multiplication is being performed. The multiplication stage accesses the data that was registered in the previous cycle. The type system ensures that when the multiplier reads the data, it is still valid.
A simple example of inter-iteration data accesses is something like a MAC unit which accumulates the products in the same register. However, Filament will not allow a user to write this program:
component Mac<G>(
@interface[G, G+1] go: 1,
@[G, G+1] left: 32,
@[G, G+1] right: 32
) -> (
@[G, G+1] out: 32
) {
Acc := new Latch;
M := new MultComb;
A := new Add;
m0 := M<G>(left, right);
a := A<G>(m0.out, acc.out);
acc := Acc<G, G+1, G+2>(a.out);
}
The problem is that Filament does not consider the input acc.out
to be live till
component Latch<G, S, L>(
@interface[G, G+1] write_en: 1,
@interface[S, L] _go_S: 1,
@[G, G+1] in: 32
) -> (
@[S, L] out: 32
) where S = G+1, L > S;
What we would like to express is that we're accessing the previous value stored in the register. One way to do this is to change the register component to expose a prev signal carrying the previous value stored in the register. This is a simple change to the component:
component Latch<G, S, L>(
@interface[G, G+1] write_en: 1,
@interface[S, L] _go_S: 1,
@[G, G+1] in: 32
) -> (
@[S, L] out: 32,
@[G, G+1] prev: 32
) where S = G+1, L > S;
With this change, we can implement the Mac component as follows:
component Mac<G>(...) {
...
m0 := M<G>(left, right);
a := A<G>(m0.out, acc.prev);
acc := Acc<G, G+1, G+2>(a.out);
}
Within the Verilog implementation of the register, the prev signal is just connected to the register's output, i.e., it only exists for modeling purposes and does not imply any additional hardware since the register already contains its "previous" value.
This opens up a couple of important questions:
1. When is the prev signal valid? For example, at the start of the computation, the register might contain garbage and needs to be initialized to zero.
2. How do we distinguish between an old use of the Mac unit and a new usage? Repeated invocations are going to continue accumulating values into the Mac unit. We need some way to explicitly reset the internal state.
3. The acc.prev signal adds a form of aliasing to the system since a register that is reused has two ways to access the same logical value. Is this going to be a problem?

A different approach to this problem is having the data be "recirculated" into the pipeline, i.e., we make the driver of the pipeline responsible for providing both the current value and the previous value. We do this by exposing the value of the accumulator stored in the Mac:
component Mac<G>(
@[G, G+1] left: 32,
@[G, G+1] right: 32,
@[G, G+1] acc: 32
) -> (
...
@[G+1, G+2] out: 32
) {
...
m0 := M<G>(left, right);
a := A<G>(m0.out, acc);
acc := Acc<G, G+1, G+2>(a.out);
out = acc.out;
}
Then, the driver of the pipeline provides the value of the accumulator by "tying" the acc signal to the output of the Mac unit after the first iteration:
component Driver<G>(...) {
M := new Mac;
m0 := M<G>(l0, r0, 0);
m1 := M<G+1>(l1, r1, m0.out);
m2 := M<G+2>(l2, r2, m1.out);
...
}
This solution becomes pretty unwieldy quickly---in a long pipeline, the driver has to manually thread the accumulator value through every invocation.
Filament programs exist at two levels of abstraction: high and low. A high-level Filament program can use abstract scheduling whereas a low-level Filament program must explicitly use FSMs to trigger and manage resources. Both levels of abstraction are checked for the Filament type safety property.
A high-level Filament program declares all the events it uses with the generics syntax and only uses those events to perform scheduling:
component Mult<G>(@[G, G+1] left: 32, @[G, G+2] right: 32) -> (@[G+2, G+3] out: 32) {
A := new Add;
a0 := A<G>(left, right);
out = a0.out;
}
A high-level Filament program is not allowed to use @interface signals, the fsm construct, or low-level invoke statements.
High-level filament programs have the following type checking rules:
A low-level Filament program does not use high-level invokes to perform scheduling. Instead, it explicitly instantiates fsm constructs to coordinate the use of resources and signals.
The specific requirements for a low-level Filament program are:
- An @interface signal for each defined event in the component.
- Only the low-level invoke syntax is allowed.

The type checking rules are:
Given these requirements, we can guarantee that:
Compiling a high-level Filament program to a low-level one involves:
- Generating the @interface signals for each event.

The compilation process must ensure that the generated code follows the requirements laid out for low-level Filament programs.
It would be useful to specify default values of event variables. For example, we often want to use a register for exactly one cycle. But we have to specify:
r := new Register[32]<G, G+1, G+2>(...)
Instead, it would be nice to just say:
r := new Register[32]<G>(...)
We can add some syntax for default bindings for event variables:
component Register[WIDTH]<G, ?S = G+1, ?L = G+2>(...)
The ?S syntax tells the compiler to bind S to G+1 if no binding is provided for the variable. ?X variables must appear at the end of the event variable list.
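A sketch of how this resolution could work (plain Python, not compiler code; the helper names are made up): each default is a function of the bindings resolved so far.

```python
# Sketch (not compiler code): resolving default event bindings for a
# signature like Register[WIDTH]<G, ?S = G+1, ?L = G+2>.

def resolve_events(params, defaults, args):
    """params: ordered event names; defaults: name -> fn(bound) for ?X
    events; args: name -> value for explicitly bound events."""
    bound = {}
    for name in params:
        if name in args:
            bound[name] = args[name]
        elif name in defaults:
            bound[name] = defaults[name](bound)
        else:
            raise TypeError(f"missing binding for event {name}")
    return bound

# Register[32]<G> invoked with only G = 5 bound:
bindings = resolve_events(
    ["G", "S", "L"],
    {"S": lambda b: b["G"] + 1, "L": lambda b: b["G"] + 2},
    {"G": 5},
)
assert bindings == {"G": 5, "S": 6, "L": 7}
```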
In src::backend::compile.rs, we call the write_extern function. However, Calyx commit fb15c5e7 removed it, so we have to use the write_externs function instead.
I would recommend:
ir::Printer::write_externs(
(&path, prims.iter().map(|(_, v)| v)),
&mut out,
)
Here is an ever-growing collection of designs to implement in Filament as well as interesting links worth reading.
When working with big pipelines, it is common to have a bunch of code that looks like this:
// stage 1
add := new Add<G>(left, right);
gt := new Gt<G>(add.out, 10);
...
// stage 2
add := new Add<G+1>(left, right);
gt := new Gt<G+1>(add.out, 10);
...
It might be useful to have an operator that allows us to define the start time of the current pipeline stage:
with S1 = G {
add := new Add<S1>(left, right);
gt := new Gt<S1>(add.out, 10);
...
}
with S2 = G+1 {
add := new Add<S2>(left, right);
gt := new Gt<S2>(add.out, 10);
...
}
Of course, under the hood, all this is doing is defining equality between the events S1
and G
but from the perspective of the user, it is a nice way to visually see which portions of the code are associated with a particular stage.
Parametric components might benefit from being able to specify that their IO interface depends on the parameters. For example, if we want to build a shift register where each of its intermediate states is visible:
comp Shift[N]<G: 1>(@[G, G+1] in: 32) -> (out[N]: for #i. @[G+i, G+i+1])
Here, the port out
is a bundle (similar to an array) but can provide values at different time steps. The nice thing is that we don't need any extra machinery to check such IO bundles because they are already supported by the compiler inside the component body.
I think the types of bundles are also safe to occur in the IO binding position because they are dependent pairs and not forall
quantified.
If we have the following program:
A := new Add[32];
for #i in 0..#W {
a := A<G+#i>(10, 20); // Uses the adder #W times
}
We need some way to check that the various shared uses of a are safe. The naive encoding is of course just unrolling the loop and adding constraints between each use. However, we can use the fact that the shared uses come from a structured loop to generate more efficient constraints. Namely:
- The stride of #i must be >= the delay of A. This is sufficient because each use of A is guaranteed to be a full stride after the previous one.
- max(#i) + delay of A - min(#i) <= the component's delay. This is again sufficient because all uses of A are going to be contained within the range of #i.

The bigger challenge is what to do when we have uses of A in multiple contexts. I think the above can serve as a guideline for generating efficient constraints instead of the usual O(n^2) constraints for each reuse.
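A rough comparison of the structured constraints against the naive pairwise encoding (illustrative Python, not the actual constraint generator):

```python
# Illustrative check: uses of instance A at cycles G + i, i in 0..count,
# with a fixed stride between consecutive uses.

def naive_safe(starts, delay):
    # O(n^2): every pair of uses must be at least `delay` cycles apart
    return all(abs(a - b) >= delay
               for i, a in enumerate(starts) for b in starts[i + 1:])

def structured_safe(stride, count, delay, comp_delay):
    # (1) consecutive uses are `stride` apart, so stride >= delay of A
    # (2) max(#i) + delay of A - min(#i) must fit in the component's delay
    return stride >= delay and (count - 1) * stride + delay <= comp_delay

starts = [0, 1, 2, 3]  # a := A<G+#i>(10, 20) for #i in 0..4
assert naive_safe(starts, 1) and structured_safe(1, 4, 1, 4)
assert not naive_safe([0, 1], 2)  # re-used before A's delay elapsed
```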
Some components we should implement:
Here is a component that takes three inputs and adds them all up:
import "primitives/core.fil";
component Add3<G>(
@interface[G, G+1] go_G: 1,
@[G, G+1] in0: 32,
@[G+1, G+2] in1: 32,
@[G+2, G+3] in2: 32
) -> (
@[G+2, G+3] out: 32
) {
// Register first input
L0 := new Latch;
l0 := L0<G, G+1, G+2>(in0);
// Add the second input and register it
A0 := new Add;
a0 := A0<G+1>(l0.out, in1);
L1 := new Latch;
l1 := L1<G+1, G+2, G+3>(a0.out);
// Add the final input
A1 := new Add;
a1 := A1<G+2>(l1.out, in2);
// Produce the output
out = a1.out;
}
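The schedule above can be sketched as a small cycle-by-cycle model (plain Python with my own naming, not Filament semantics; latches hold a value for one cycle and adders are combinational):

```python
# Cycle-by-cycle model of the Add3 schedule (cycles relative to G = 0):
# latches hold a value for one cycle, adders are combinational.

def add3(arrivals):
    """arrivals: cycle -> input value; in0, in1, in2 arrive at 0, 1, 2."""
    l0 = l1 = None
    for cycle in range(3):
        if cycle == 0:
            l0 = arrivals[0]          # latch in0 during [G, G+1]
        elif cycle == 1:
            l1 = l0 + arrivals[1]     # add in1, latch the partial sum
        else:
            return l1 + arrivals[2]   # add in2; out is live in [G+2, G+3]

assert add3({0: 1, 1: 2, 2: 3}) == 6
```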
Note: This can be turned into a finite impulse response (FIR) filter by providing a delay
parameter for each input.
Looking at the signature, we can see that the inputs all arrive at different times:
component Add3<G>(
@[G, G+1] in0: 32,
@[G+1, G+2] in1: 32,
@[G+2, G+3] in2: 32
Because of this, we should be able to map the logical ports to one physical wire since we know that the inputs arrive at disjoint time intervals.
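A sketch of that sharing check (illustrative Python, not compiler code): ports can be mapped onto one physical wire exactly when their availability intervals are pairwise disjoint.

```python
# Ports can share one physical wire exactly when their availability
# intervals are pairwise disjoint (half-open [start, end)).

def disjoint(a, b):
    return a[1] <= b[0] or b[1] <= a[0]

def can_share_wire(intervals):
    return all(disjoint(a, b)
               for i, a in enumerate(intervals) for b in intervals[i + 1:])

# in0: @[G, G+1], in1: @[G+1, G+2], in2: @[G+2, G+3], with G = 0:
assert can_share_wire([(0, 1), (1, 2), (2, 3)])
# pipelined with II = 1: the next invocation's in0 overlaps this one's in1
assert not can_share_wire([(1, 2), (1, 2)])
```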
This is not true if Add3 is pipelined; in that case, in0 and in1 may arrive at the same time and cannot be multiplexed onto the same input wire.
In addition to making the interface more concise, this also allows us to express the true intention behind this component: its interface is a stream of inputs that we'd like to process using the pipeline.
Here is some made up syntax:
component Add3<G>(
@stream([G, G+1], delay=1, num=3) in: 32
The syntax tells us that the first input arrives at @[G, G+1]
, the next input arrives after a delay of 1
, i.e. at @[G+1, G+2]
, and there will be three such inputs. Therefore, normal ports can be written down as:
component Add3<G>(
@stream([G, G+1], delay=1, num=1) in0: 32
Inside the body of the component, we use the syntax port[idx]
to access the idx
th port of a stream:
component Add3<G>(
@stream([G, G+1], delay=1, num=3) in: 32
) -> (...) {
// Register first input
L0 := new Latch;
l0 := L0<G, G+1, G+2>(in[0]);
// Add the second input and register it
A0 := new Add;
a0 := A0<G+1>(l0.out, in[1]);
The semantics of stream ports are straightforward: the i-th port of a stream declared as @stream([G, G+n], delay=d, num=k) arrives at @[G+i*d, G+n+i*d].
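As a quick sanity check, the desugaring can be sketched in Python (hypothetical helper; intervals are offsets from G):

```python
# The i-th port of @stream([G, G+n], delay=d, num=k) is available at
# @[G + i*d, G + n + i*d]; intervals below are offsets from G.

def stream_port_interval(n, d, i):
    return (i * d, n + i * d)

# @stream([G, G+1], delay=1, num=3) from Add3 above desugars to in0..in2:
assert [stream_port_interval(1, 1, i) for i in range(3)] == \
    [(0, 1), (1, 2), (2, 3)]
```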
Now that we have a component that takes a stream of inputs, we need a way to construct one. This can be done using the Stream construct:
component Add3<G>(
@stream([G, G+1], delay=1, num=3) in: 32, ...) {
...
}
component Main<T>(
@[T, T+1] in0: 32,
@[T+1, T+2] in1: 32,
@[T+2, T+3] in2: 32
) -> (...) {
A := new Add3;
Stream inputs([T, T+1], delay=1) = [in0, in1, in2];
a := A<T>(inputs);
The syntax is a bit verbose but for now that's okay.
Type checking the Stream construct is straightforward:
- The base interval of the stream must match the availability of the first signal, in0 in this case.
- The spacing between successive signals must match the delay parameter for the stream.

There are two parts to this: the @stream ports and the construction of stream ports.
Stream port definitions need to be compiled into a set of normal ports. For example, the Add3
component above would be compiled into:
component Add3<G>(
@[G, G+1] in0: 32,
@[G+1, G+2] in1: 32,
@[G+2, G+3] in2: 32
) -> (...) {
...
}
Each use of a stream port is replaced with the corresponding normal port.
Constructing stream ports is straightforward: we simply forward each signal to the corresponding normal port in the component.
Currently, the bundle syntax looks something like this:
bundle f[#N]: for<#i> @[G+#i, G+#i+1] #W
It is (hopefully) obvious that each wire in this bundle holds onto values for 1 cycle. However, for a bundle like this:
bundle f[#N]: for<#i> @[G, G+#i+1] #W
Each index holds onto a value for a varying number of cycles. This means that this bundle cannot be reused until the index that holds its value for the longest amount of time releases it. This is extremely similar to how the input-output ports of a module affect its delay. It implies that a bundle must have a delay:
bundle f<G: #N>: for<#i> @[G, G+#i+1] #W
This tells us that the bundle may hold onto values for long amounts of time but it is bounded by #N
. A bundle well-formedness pass must check this property.
A bundle used inside a module affects the module's delay in the same way that an invocation does: for a module with delay d, we must ensure that the delay of each bundle b is less than or equal to d.
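One way to sketch the bundle-delay computation (illustrative Python; interval endpoints are offsets from G, and the function name is mine):

```python
# The delay a bundle imposes is the longest time any index holds its
# value: max over i of (end_i - start_i).

def bundle_delay(n, interval):
    """interval: i -> (start, end) offsets from G for bundle index i."""
    return max(interval(i)[1] - interval(i)[0] for i in range(n))

# bundle f[N]: for<#i> @[G+#i, G+#i+1] -- each index holds for one cycle
assert bundle_delay(4, lambda i: (i, i + 1)) == 1
# bundle f[N]: for<#i> @[G, G+#i+1] -- the last index holds for N cycles,
# so the bundle's delay is bounded by N, as required above
assert bundle_delay(4, lambda i: (0, i + 1)) == 4
```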
We should check that each instantiation site only uses non-negative parameter expressions. This will allow us to come up with a well-founded induction scheme over the parameter and enable #85 in the future
Rachit and I talked about this briefly - I thought it would be convenient if we could represent inputs in binary/hex when testing components. I don't think JSON files will accept things that aren't in decimal, though.
This came to mind when I was trying to test a decoder for RISC-V instructions. To test it, I was writing out the instructions in binary, converting it to decimal, and putting that value in the JSON file. This was a bit unwieldy and also made it less clear what each test was targeting.
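One possible workaround (a hypothetical preprocessing step, not an existing Filament feature) is to allow "0b..."/"0x..." string literals in the JSON and convert them before handing the data to the harness:

```python
# Hypothetical preprocessing step: accept "0b..."/"0x..." strings in
# test-input JSON and convert them to integers.
import json

def decode_literals(obj):
    """Recursively turn "0b..."/"0x..." strings into integers."""
    if isinstance(obj, str) and obj.startswith(("0b", "0x")):
        return int(obj, 0)  # base inferred from the prefix
    if isinstance(obj, list):
        return [decode_literals(x) for x in obj]
    if isinstance(obj, dict):
        return {k: decode_literals(v) for k, v in obj.items()}
    return obj

raw = '{"instr": ["0b0110011", "0x13", 7]}'
assert decode_literals(json.loads(raw)) == {"instr": [51, 19, 7]}
```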
Compilation of high-level Filament programs needs to turn all invocations into guarded assignments that forward the source values into the instance's ports as specified by the schedule.
In short, it must turn this high-level program:
M := new Mult; m0 := M<G>(a.out);
into:
M := new Mult; m0 := invoke M<G>; m0.left = guard ? a.out;
The core problem during lowering is synthesizing guard
expressions for each invocation. Here, we'll show that for a well-typed high-level Filament program, we can always generate correct guards.
Restrictions: We only address compilation of programs without max expressions that adhere to the restrictions from #27, which disallow relating events using ordering constraints.
Generating FSMs: We assume that for each event G
for the current component, there is an FSM called Gf
that is triggered by the interface port for G
:
component main<G>(@interface[G, G+d] go_G: 1) -> (...) {
fsm Gf[n](go_G);
}
The FSMs for each event can be thought of as a series of shift registers of length n. The port Gf._i is active when the token reaches the i-th stage of the FSM:
Gf._i : @[G+i-d, G+i+d] + @exact[G+i, G+i+1]
where d
is the delay for the port that triggers the FSM.
First, we look at the active part of the specification: @exact[G+i, G+i+1]. This means that the guard has a non-zero value exactly during the cycle [G+i, G+i+1].
The availability of the guard is more complex: [G+i-d, G+i+d]
. It states that the guard is guaranteed to have a known value in the given duration. Next, it says that the guard does not observe a new token for d
cycles after it gets a token or d
cycles before it observes a new token.
The intuition for this is that if an FSM is triggered at most once every d cycles, each token is separated by d cycles in both the past and the future. The first token is a special case, but we assume that resetting the circuit initializes everything to 0.
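A toy model of this FSM (plain Python sketch, not the generated hardware):

```python
# fsm Gf[n](go_G) modeled as an n-stage shift register: Gf._i is high
# exactly i cycles after go_G was asserted.

class Fsm:
    def __init__(self, n):
        self.stages = [0] * n  # reset initializes everything to 0

    def tick(self, go):
        # shift the token down the register and inject `go` at stage 0
        self.stages = [go] + self.stages[:-1]

    def port(self, i):  # Gf._i
        return self.stages[i]

f = Fsm(4)
f.tick(1)       # go_G asserted at cycle 0
f.tick(0)
f.tick(0)       # now observing cycle 2
assert f.port(2) == 1 and f.port(0) == 0
```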
In the high-level program
m0 := M<G>(a.out);
We have the restriction that a.out
is available for at least as long as m0.left
requires. Thus, if we have the availabilities a.out
and m0.left
respectively, we have
When generating the assignment:
m0.left = guard ? a.out;
We need to synthesize a guard that is available for
Lemma: The delay for the event
Proof. Since a.out
is an output from some instance related to invocation a
, we know that G
must obey the delay constraints (as specified in #20). Specifically, the delay must be longer than the length of the interval of any input or output port. Similar reasoning applies to any input port of the component that can be used in this position.
We also know that
Here is the rule we use to generate guards: if the requirement of a destination port is @[G+m, G+n], the guard is
Gf._m | Gf._{m+1} | ... | Gf._{n-1}
So, if we have the interval @[G+3, G+5], the guard is:
Gf._3 | Gf._4
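The rule can be sketched in Python (illustrative only; the Gf._i names follow the convention above):

```python
# For a destination port required over @[G+m, G+n], OR together the FSM
# ports for states m through n-1.

def make_guard(event, m, n):
    return " | ".join(f"{event}f._{i}" for i in range(m, n))

# the @[G+3, G+5] example above:
assert make_guard("G", 3, 5) == "Gf._3 | Gf._4"
```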
To show that this is correct, we need to show that:
We have (1) by construction since we use the destination to construct the guard expression.
For (2), we have
This shows that the guard is active at least until the end of the interval for the source port.
Next, we can use similar reasoning to show that the guard is active at least before the start of the source port:
Thus, we have that the guard is available for at least as long as the source port and active as long as the destination requires.
The implementation in the compiler currently does not have a way to talk about terms like G+i-d
since negation of events is not supported. If we want the lowered programs to type check, we need some way of doing this. If not, we can skip implementing this and rely on the pen-and-paper proof.
The following program type checks:
import "primitives/core.fil";
component main<G, H, L>(
@interface[G, G+3] go_G: 1,
@interface[H, L] go_H: 1,
@interface[L, L+1] go_L: 1,
@[H, L] in: 32
) -> () where G >= H, L >= G+3 {
M := new Mult;
m0 := M<G>(in, in);
}
The constraints on H
and L
ensure that the requirements of m0
are met. However, these constraints are not enough to make the program pipeline safe.
This is because G
may re-trigger after 3
cycles but L
from the first iteration may still be active. This means that the second iteration of this pipeline would not be able to satisfy the requirement H <= G
.
I think we need to say something about the constraints on the constraints themselves. Specifically, given a bunch of constraints, we need to make sure that they are correct for this execution as well as the pipelined execution. In this case, I think the thing to enforce is that G + dG >= H + dH
and L + dL >= G + 3 + dG
, where dX
is the delay of the interface X
. These constraints ensure that the requirements implied by the first set of constraints are satisfiable during pipelined execution.
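A minimal sketch of the proposed check (Python; it assumes the original constraint is tight, so only the interface delays matter):

```python
# A constraint `lhs + k >= rhs` must keep holding one initiation interval
# later: lhs + d(lhs) + k >= rhs + d(rhs). If the original constraint is
# tight (lhs + k == rhs), this reduces to d(lhs) >= d(rhs).

def pipeline_safe(d_lhs, d_rhs):
    return d_lhs >= d_rhs

# main above: G >= H with dG = 3; if H's interface delay is longer than 3,
# the second iteration violates H <= G:
assert not pipeline_safe(3, 7)
assert pipeline_safe(3, 3)
```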
Oftentimes, components can have different sets of behavior depending on the subset of the signals being assigned to. For example, a memory will often have two different interfaces: one for reads and one for writes:
comp Mem(addr: 8, in: 32, read_en: 1, write_en: 1) -> (out: 32, write_done: 1);
This component has two interfaces:
- write_en tells the component that the data on the input needs to be written to the address.
- read_en tells the component that the user is trying to read data from an address.

In Filament land, we can write the following signature:
comp Mem<G: 1>(
@interface[G] read_en: 1,
@interface[G] write_en: 1,
@[G, G+1] addr: 8,
@[G, G+1] in: 32
) -> (
@[G+1, G+2] out: 32,
@[G+2, G+3] write_done: 1
);
Of course, this gives you the usual goodness of knowing that writes take two cycles while reads take one cycle. However, the weird thing is that it has two interface ports: both read_en and write_en represent the event G. Additionally, only a subset of the ports is meaningful at any time. For example, when the memory is being used to perform a read, the out port will have a meaningful output; while performing a write, it won't. Similarly, the in port doesn't require an input if we're performing a read.
The solution proposed here is using enumerations (similar to Rust's enum) as a way to represent method calls. For example:
enum MemInterface {
Read(@interface[G] read_en: 1) -> (@[G+1, G+2] out: 32),
Write(@interface[G] write_en: 1, @[G, G+1] in: 32) -> (@[G+2, G+3] write_done: 1)
}
(We use the function-like port signature to represent the inputs and outputs related to the call)
Next, we can change the definition of the component to use this enum:
comp Mem<G: 1>(addr: 8, interface: MemInterface) -> (...interface.outs) {...}
(TK: Not sure what the best way to represent the output associated with the enum is here. Just using something dumb for now)
When using this component, you must explicitly construct the enumeration associated with the "method" you're trying to use: either a read or a write.
comp main<G: 1>(...) {
M := new Mem;
mw := M<G>(1 /*addr*/, MemInterface::Write { in: 15 }); // again, not the best syntax here
out = mw.out; // error because we called the Write method
}
This all gives us the usual goodness of Filament of checking that signals are available at the right times etc. However, this leaves out a big question.
Delays are at the heart of Filament's reasoning about pipelining. The challenge with enumerations is that different methods might have different effects on pipelining: do reads and writes use the same underlying circuits and therefore affect each other's delays? The simple solution is just making the whole enum have the same delay, which might be a deal breaker but is perhaps something to start with.
The other question is how to represent methods at the source level. This issue only talks about external primitives but, overall, we want people to implement their own methods. I think the solution to this second question will probably elucidate the answer to the first one.
@phantom ports are currently used to model the initiation intervals of modules that don't have an explicit go signal. For example, a combinational component has a @phantom @interface[G, G+1] go: 1 port to model the fact that it cannot be used multiple times in the same cycle.
The idea here is to allow user-level components to specify @phantom ports iff the lowered component will not instantiate an FSM for the event. This is possible only when all instances used by the component themselves have @phantom go ports.
In these cases, we do not need to instantiate an FSM and can additionally make the go port corresponding to the event a @phantom.
With #117, we have support for assume
statements that add assumptions in the interval checking context. The expectation is that users only use these for sound assumptions with respect to the current set of constraints. However, it is going to be very easy to run into trouble with them. For example, the following is completely legal code currently:
comp Foo[#W]() -> () where #W > 10 {}
comp Bar[#N]() -> () {
assume 1 = 0;
F := new Foo[#N]; // type checks because the assumption makes everything unsat
}
It is easy to do some simple consistency checks for assume statements: if, for the current path condition (set of assumptions), an assume statement makes the path condition unsat, then we know that the assume statement is unsound.
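A brute-force sketch of this consistency check over a small finite domain (illustrative Python; a real implementation would ask the SMT solver instead of enumerating):

```python
# Flag an `assume` as unsound if the path condition was satisfiable but
# becomes unsat after adding the new assumption.
from itertools import product

def satisfiable(constraints, names, domain=range(16)):
    return any(all(c(dict(zip(names, vals))) for c in constraints)
               for vals in product(domain, repeat=len(names)))

def assume_is_sound(path, new, names):
    return not satisfiable(path, names) or satisfiable(path + [new], names)

# `assume 1 = 0` in an otherwise satisfiable context is flagged:
assert not assume_is_sound([lambda b: b["N"] >= 0], lambda b: 1 == 0, ["N"])
assert assume_is_sound([lambda b: b["N"] >= 0], lambda b: b["N"] < 100, ["N"])
```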
When using a bundle in the ports of a component, we need to ensure that the availability of each signal is less than the delay of the module.
We should establish that for each bundle, the interval is well-formed. For example:
bundle f[#N]: for<#i> @[G, G+#i] #W
Ensure that:
We've played around with encoding the semantics of log and pow using forall statements but that didn't work since the solver will often time out or respond with unknown in the presence of quantifiers:
; definition of pow2
(assert (= (pow2 0) 1))
(assert (forall ((n Int)) (= (pow2 (+ n 1)) (* (pow2 n) 2))))
However, we can instantiate these assumptions for specific variables. For example, if someone writes:
comp Foo[#N, #K]() -> () where #N = pow2(#K) {}
It is sound to add the following assume statement to the body and the signature:
assume #N/2 = pow2(#K-1);
This can possibly be enough to prove recursive component generation correct (for example, a reduction tree) and would avoid problems like #119. In general, there is no guarantee this will be enough, but at least we have a centralized source of soundness that avoids making the user litter code with assume.
Note: This is a specialized version of the general concept that we don't want the forall to be instantiated (Z3 terminology) too many times and want some concept of fuel. See F*'s concept of fuel.
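A small Python sketch of this instantiation (helper names are mine): given the concrete constraint #N = pow2(#K), a checker could emit the derived fact directly rather than rely on a quantified axiom.

```python
# Instantiate pow2 facts for a concrete constraint instead of asserting
# a quantified axiom the solver may never finish with.

def pow2(k):
    return 1 << k

def instantiate(n, k):
    """Facts a checker could assume given the constraint n == pow2(k)."""
    assert n == pow2(k)
    facts = []
    if k > 0:
        # from n = pow2(k) it follows that n/2 = pow2(k-1)
        facts.append(("eq", n // 2, pow2(k - 1)))
    return facts

assert instantiate(8, 3) == [("eq", 4, 4)]
```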