Comments (23)
We could also use
once_cell
to only generate the router on the first request, and use it for subsequent requests.
This things don't really work in lunatic. Each request gets a completely separate memory, like a new operating system process. After the request, the whole memory will be gone. Tools like once_cell
are intended for global Rust stuff (threads sharing data), but lunatic doesn't have any global shared memory space. Only if you dedicated one process to hold global state and other querying it, but in this case you still need to send data between processes.
from submillisecond.
To summarise our discussion on discord and just bring everyone on the same page.
Probably the best way forward would be to use a macro to define a router (router!
) that would expend to a non-capturing closure (|req: Request| -> Response { ...}
). The closure would have the same signature as a handler function so that multiple routers could be composed together. One big benefit of having a macro that expends to a function in lunatic is that this router is generated during compile time and all child processes share it and you don't need to send a huge router struct to every child handling a new incoming request.
One main router would then be passed to the Application
struct as the entry point:
Application::from(router).listen(3000);
The syntax of the macro is not final yet, but it would look something like:
const ROUTER = router! {
use [logger]
GET "/users/:id" => user_profile if is_user && has_rights use [metrics, mid3],
POST "/users" => create_user use [metrics, mid3],
"/users/:id/todos" => {
GET "/" => get_user_todos,
GET "/:todo_id" => get_user_todo_by_id,
POST "/" => create_user_todo
}
}
The use [logger]
and use [metrics, mid3]
segments define middleware. if is_user && has_rights
is a guard.
To what exactly this would expend is not completely clear at the moment, but we have a few ideas that we would like to test and see if there are issues with them? Ari suggested something like:
fn router(req: Request) -> impl IntoResponse {
let maching_routes: [HandlerFn] = match_routes(req.path());
for handler in matchin_routes {
let resp = handler.run(req);
if !resp.forward() {
return resp;
}
}
}
trait RunHandler {
type Response: IntoResponse;
fn run(&self) -> Self::Response;
}
impl<F, R> RunHandler for F
where
F: Fn() -> R,
R: IntoResponse
{
type Response = R;
fn run(&self) -> R { self() }
}
impl<F, E1, R> RunHandler for F
where
F: Fn(E1) -> R,
R: IntoResponse
{
type Response = ExtractorResponse<R>;
fn run(&self, req: Request) -> Self::Response {
let e1 = <E1 as FromRequest>::from_request(&req).map_err(|err| {
if err.forward() {
ExtractorResponse::Forward(err.into_response())
} else {
ExtractorResponse::Failed(err.into_response())
}
})?;
self(e1)
}
}
How exactly the matching is done and how the middleware and guards are composed together with the handler is not completely clear to me at the moment.
@tqwewe You have already written a lot of the matching code and handler function stuff. Do you want to take over this issue and see if you could come up with nicely generated code from this macro structure?
from submillisecond.
@SquattingSocrates Sorry for the confusion. I spent some time yesterday trying to make it work the way you suggested, but just couldn't figure out how. If you can make it work like this it would be awesome!
The main issue for me was that Rust's match statements are not fall-through. And we kinda need this because part of the matching for us lives in the handler function (as part of extractors) or sub-routers.
Just to illustrate the problem:
fn handler_a(Path(dynamic): Path<i32>) {
// Handle request
}
const ROUTER_B = router! {
GET "/:dynamic" => handler_a,
GET "/test" => handler_b,
}
In this case if a request came in with the path /test
, I would expect the first match to fail because :dynamic
can't be casted to i32
. Then the matcher should fall-through and test the next one, in this case handler_b
would succeed. This if how most rust web frameworks work.
If we change the signature of handler_a
to fn handler_a(Path(dynamic): Path<String>)
, then it would become the match for /test
. But, because the router!
macro can't observe the signature (only handler name), we can't really generate a parser at compile time that differentiate this two cases and matches the right one directly.
This could be solved by adding a type hint to the macro:
const ROUTER_B = router! {
GET "/:dynamic:i32" => handler_a,
GET "/test" => handler_b,
}
But then we are repeating the type in two places and need to keep them in sync. It could also lead to errors that are hard to detect, e.g. developers changing the value from i32
to i64
in one place and the handler works just sometimes.
Another problem comes from sub-routers:
const ROUTER_A = router! {
GET "/hello" => handler_a,
}
const ROUTER_B = router! {
GET "/test" => ROUTER_A,
GET "/:dynamic/:dynamic" => handler_b,
}
In this case, if a request of /test/hello2
came in, the handler_b
should be called. But this is impossible to do with a match
statement:
const ROUTER_A = router! {
// .. parse request
let matching_handler = match (req.method(), req.path()) {
(GET, "/hello") => handler_a(),
_ => None
};
}
const ROUTER_B = |stream| {
// .. parse request
let matching_handler = match (req.method(), req.path()) {
(GET, "/test") => ROUTER_A(req),
(GET, "/:dynamic:string/:dynamic:sting") => handler_b(req)
_ => None
};
// handle response of failure to match
}
For the request /test/nothello
we can't really provide a req.path()
that will match /test
, then jump into the function ROUTER_A
, then realise no matches actually exist and fall-through and try to match another one. In Rust the first match that succeeds is taken.
I'm in general not sure how to do this partial matching and forwarding the request to sub routers, not only with this approach.
There are probably other issues, like how do you turn a /hello/test
from a request into matches (/hello
, /:dynamic/:dynamic
) just based on partial routing information that you can observe in the macro? I'm also not aware of any routing library that works that way, so it's a bit hard to wrap my head around it.
Also, to get from this:
const ROUTER_B = router! {
GET "/test" => handler_a,
GET "/:dynamic:string" => handler_b,
}
to this:
const ROUTER_B = |stream| {
// DO THE WORK!
let matching_handler = match (req.method(), req.path()) {
(GET, "/test") => handler_a(req),
(GET, "/:dynamic:string") => handler_b(req)
_ => None
};
// handle response of failure to match
}
You need to generate the DO THE WORK
part and there you kinda need to already use a (prefix tree based?) matching to parse the request and turn it into either /test
or /:dynamic
, but you also need to extract all the dynamic parts and attach them to the request. So you did all the parsing and determined which handler with what arguments you need to call, but instead of calling it directly you generate a "string pointer" to it and use a match
to dispatch it.
One obvious benefit is that when expending the macro you would get a nice match
statement and it may be easier to debug. However, most of the work of doing the actual parsing would bi hidden away in the DO THE WORK
part, so I'm not sure how helpful it would be in the end. The final result would be a bit of a mirror of the macro.
from submillisecond.
So in that case it would be better to match only /hello, not /hello/world I think
from submillisecond.
Should we use an existing crate for route matching? I found https://docs.rs/matchit/latest/matchit/
Or build or own solution?
from submillisecond.
Should we use an existing crate for route matching? I found https://docs.rs/matchit/latest/matchit/
Or build or own solution?
I think we need to build our own.
One characteristic of lunatic is that each process (request) gets their own heap memory. This means that we don't share any memory between them. If we create a router struct in the initial process, we will need to serialise and copy it to all request handler processes. This could become a bottleneck in apps that have hundreds of routes.
This basically forces us to generate the router during compile time, meaning all processes will get a static copy of it. Because of this we can't really use any existing router library.
from submillisecond.
But if we have a function router we could initialise the router in that function and would not need to (de)serialise it. Later we could improve this, but to get a MVP ready we could use this ready router, since it has all the features we want for now.
from submillisecond.
Yes, for the first iteration this should be fine and I wouldn't worry too much about performance.
Just notice that in this case you are still building a router (function based one), but you are using a library to simplify the matching.
would not need to (de)serialise it
But you would still need to generate the router on every request, right? You are just wrapping the generation into the handler function. This could be even more costly than de/serialising it.
The good part is that with a function based one we can always replace how it works in the future and how much of the code is generated during compile time. In this case, initialising matchit
on each request if fine for the first iteration.
from submillisecond.
Yeah, we'd need to build the router every time, but eventually we could inline generating the matching of the url which would make it pretty efficient. So we would just spit out a function that takes in the url, matches it and returns the matching handler Uber-function
from submillisecond.
We could also use once_cell
to only generate the router on the first request, and use it for subsequent requests.
from submillisecond.
Sure that sounds great.
Regarding the syntax, do we want the if ... use ...
to be placed after the handler, or after the "/"
path definition?
I think it makes more sense to have it right after the "/"
path to more closely match a Rust match statement. Eg:
router! {
"/user" if is_user use user_middleware => { ... }
}
Otherwise I'm not quite sure how to write the above example if it's after the handler.
Also, perhaps the use [...]
at the top of the macro could instead be the same syntax as Rust's use
?
router! {
use middleware::{logger, diagnostics};
...
}
from submillisecond.
Didn't we also want to try generating a single match statement from the macro?
I think we could actually try both approaches in parallel, we just need the basic parsing capabilities of the match syntax and I could then try and generate a match statement while @tqwewe is trying the function + trait based approach.
Or is this off the table? Because I'm confused honestly 😕
const ROUTER = |stream| {
// .. parse request
let matching_handler = match (req.method(), req.path()) {
(GET, "/users/:id") if guard1(req) => //...
(POST, "/users") => //...
(GET, "/users/:id/todos") => //...
(GET, "/users/:id/todos/:todo_id") => //...
(POST, "/user/:id/todos") => //...
_ => None
};
// handle response of failure to match
}
from submillisecond.
@bkolobara I think the question of matching similar routes needs to be reframed a little, because when you write a regular match
you wouldn't write it like this:
match security_level {
num => any_level,
6 => MI_6
}
because if there's a specific
match that you're looking for that needs to come first, so that logic shouldn't change in our augmented match statement. It's the same in other languages where you sometimes need to match a specific hardcoded value of an ID and sometimes you can actually do this, but the specific ID route definition needs to come before the generic id matching. So in other languages it would look like this in exactly this order:
@Get("/users/ADMIN_USER")
public getAdminUser() { ... }
@Get("/users/:user_id")
public getUser() { ... }
Another thing is that a pattern like this should probably not be implemented anyway because it doesn't make sense to anyone irl and therefore I'd suggest not to support it at all:
const ROUTER_A = router! {
GET "/hello" => handler_a,
}
const ROUTER_B = router! {
GET "/test" => ROUTER_A,
GET "/:dynamic/:dynamic" => handler_b,
}
If someone truly needs some "dynamic" routing like that they're probably doing something wrong. I also don't see how this would work in other web frameworks in other languages because it's just too ambiguous. Whether it's strict REST or rest-like where you'd maybe do something like /tweetsForUser and use a POST request because you need to send a large filter body, using the same route for completely different purposes doesn't make sense for consumers of API. Imagine reading documentation of an API and you see that you can call
/users/:user_id/posts
/users/:user_id/subscriptions
and
/users/:user_id/:anything/:anything_else
? What would you make of this documentation? I mean sure it's cool for experimenting with something but then the user/dev should probably just make a /users/:user_id/*
route and try to parse what the user is sending them. Since we don't mind being somewhat opinionated I think it's fine to not encourage a pattern like this.
Regarding sub-routers: one idea was to "flatten" the paths completely so that the generated code doesn't even have any sub-routers, but since we want to have sub-routers, it could be hard if a router can be assigned to a variable.
const ROUTER_A = router! {
GET "/hello" => handler_a,
}
const ROUTER_B = router! {
GET "/users/:user_id" => ROUTER_A, // this is hard to flatten, but also hard to further match because of the GET
GET "/products" => handler_b,
}
However we could implement a sub_router!
macro that would indicate you can only have them inside the other macro or simply use a { }
block for sub-routing and that way we always get a flat match statement that can either match or not. No match means 404, here's how that would look like:
const ROUTER = router! {
"/users" => {
GET "/" => find_users,
GET "/:user_id" => get_user_by_id
}
}
// this is what it would expand into:
const ROUTER = |stream| {
// do work here
match (req.method(), req.path()) {
(GET, "/users") => find_users,
(GET, "/users/:user_id") => get_user_by_id
_ => fn_404
}
// finish requests
}
You're right that we need to do a prefix-tree matching of the uri first, but parsing the request in the DO THE WORK
part is just calling a function like let req = http::parse(stream);
and then call the static matching tree to get the template string like /users/:user_id
from /users/123
. But how much overhead is it to match a 50 byte string to a prefix tree?
I don't mind using the prefix routing tree directly to return a reference to the handler and skip the match statement entirely in order to gain performance or simplicity or style. Of course for debugging purposes a match statement would be easier to comprehend because you could just log out the parsed template string and method and then expand the router macro to see all the available branches of the match.
I realise that iterating is a straight-forward solution and many if not most frameworks do it like that. But since we're already tapping into macros we could go one step further and generate code that is efficient, easy to reason about and debug and strict in it's logic?
from submillisecond.
So what I'm thinking the router macro would expand to is something like this (as a start):
router! {
GET "/foo" => foo_handler,
}
// becomes:
fn router(mut req: Request) -> Result<Response, RouteError> {
// TODO: generate the router with the macro and inline it as const
let mut router = matchit::Router::new();
router.insert("/foo", "/foo").unwrap();
let route = req.extensions().get::<Route>().unwrap();
let route_match = router.at(route.path());
match route_match {
Ok(route) => {
// TODO: insert params into request extensions
// Call handler
match *route.value {
"/foo" if req.method() == Method::GET => Ok(Handler::handle(
foo_handler as FnPtr<_, _, { arity(&hello) }>,
req,
)
.into_response()),
_ => Err(RouteError::RouteNotMatch(req)),
}
}
Err(_) => Err(RouteError::RouteNotMatch(req)),
}
}
This works, though doesn't support fall-through.
I think to support this, we can modify matchit to return a list of items which possibly match the route... and execute them one by one until one indicates it handled the request and provides a response.
Eg:
fn router(mut req: Request) -> Result<Response, RouteError> {
let router = ...; // const given at compile time from macro.
let handlers_iter = router.at(req.path()); // handlers_iter would be an iterator providing the handlers which match
for handler in handlers_iter {
// Try, and if it handles, return the response
}
}
from submillisecond.
It's worth noting a good example of fall-through that I've used in the past in production is netlify's redirect system.
https://docs.netlify.com/routing/redirects/#rule-processing-order
It states that it attempts each rule from top to bottom until one matches.
from submillisecond.
But isn't this the same as matching /blog/:post_id
?
from submillisecond.
Ah, it's configured as redirect.
It would then just be
GET "/blog/old-post" => redirect("/blog/new-post"),
GET "/blog/new-post" => serve_new_post
or something, we have no handling of redirects yet.
from submillisecond.
It's the same in other languages where you sometimes need to match a specific hardcoded value of an ID and sometimes you can actually do this, but the specific ID route definition needs to come before the generic id matching.
That's a good point.
Another thing is that a pattern like this should probably not be implemented anyway because it doesn't make sense to anyone irl and therefore I'd suggest not to support it at all
In the beginning we needed to support this because we had a path on top of the handler function, so the handler was part-router. Now that we got rid of it, I think it's ok if we don't allow composing routers for now.
from submillisecond.
On a different note, is the character /
of any significance to us? E.g. Would this:
router! {
GET "/:path" => dynamic_routing,
}
match /hello/world
?
from submillisecond.
Generally path parameters should not have slashes in them because it makes path parsing ambiguous. It's kinda like a (/a+)+
regex scenario which is not really great to match without crashing your computer. Usually if you don't care about slashes you should just use a /some_prefix/*
, which is useful if you only need to redirect requests to another service and don't want to bother with typing out request and response
from submillisecond.
On a different note, is the character / of any significance to us?
If you wanted to match like that, you do GET "/*path"
, otherwise it will only match the first segment if you do :path
.
from submillisecond.
If you wanted to match like that, you do
GET "/*path"
, otherwise it will only match the first segment if you do:path
.
So you will be able to do both, GET "/*path"
and GET "/:path"
? And the extractor will be able to extract it either?
from submillisecond.
If you had both /:path
and /*path
, then I believe /foo
would be available in the extractor as foo
, and /foo/bar
would be foo/bar
.
from submillisecond.
Related Issues (20)
- RUSTSEC-2020-0071: Potential segfault in the time crate HOT 1
- RUSTSEC-2021-0139: ansi_term is Unmaintained HOT 2
- RUSTSEC-2021-0145: Potential unaligned read
- middleware compatibility HOT 1
- Examples don't work HOT 2
- RUSTSEC-2023-0065: Tungstenite allows remote attackers to cause a denial of service
- Session data
- Add a cookie extractor
- Where to put Guards? HOT 6
- Rename `static_dir!` macro to `static_router!`
- Router params don't work in params example HOT 7
- Support `[...]` syntax for multiple middleware in macro
- `params router` benchmark fails, route not matched
- Add support for wildcard routing `/*` HOT 2
- Add WebSocket support
- Create lunatic cli HOT 2
- Allow State passed into handler HOT 1
- Static router doesn't work with `"/"` path
- Support for dependency injection? HOT 2
- Server-Sent Events HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from submillisecond.