The router is usually one of the core parts of the web framework. Developers are famil

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Should we use an existing crate for route matching? I found <a href="https://docs.rs/m

Add Router,about lunatic-solutions/submillisecond

Comments (23)

bkolobara commented on June 14, 2024 1

We could also use once_cell to only generate the router on the first request, and use it for subsequent requests.

This things don't really work in lunatic. Each request gets a completely separate memory, like a new operating system process. After the request, the whole memory will be gone. Tools like once_cell are intended for global Rust stuff (threads sharing data), but lunatic doesn't have any global shared memory space. Only if you dedicated one process to hold global state and other querying it, but in this case you still need to send data between processes.

from submillisecond.

bkolobara commented on June 14, 2024 1

To summarise our discussion on discord and just bring everyone on the same page.

Probably the best way forward would be to use a macro to define a router (router!) that would expend to a non-capturing closure (|req: Request| -> Response { ...}). The closure would have the same signature as a handler function so that multiple routers could be composed together. One big benefit of having a macro that expends to a function in lunatic is that this router is generated during compile time and all child processes share it and you don't need to send a huge router struct to every child handling a new incoming request.

One main router would then be passed to the Application struct as the entry point:

Application::from(router).listen(3000);

The syntax of the macro is not final yet, but it would look something like:

const ROUTER = router! {
    use [logger]

    GET "/users/:id" => user_profile if is_user && has_rights use [metrics, mid3],
    POST "/users" => create_user use [metrics, mid3],
    
    "/users/:id/todos" => {
        GET "/" => get_user_todos,
        GET "/:todo_id" => get_user_todo_by_id,
        POST "/" => create_user_todo
    }
}

The use [logger] and use [metrics, mid3] segments define middleware. if is_user && has_rights is a guard.

To what exactly this would expend is not completely clear at the moment, but we have a few ideas that we would like to test and see if there are issues with them? Ari suggested something like:

fn router(req: Request) -> impl IntoResponse {
    let maching_routes: [HandlerFn] = match_routes(req.path());
    for handler in matchin_routes {
        let resp = handler.run(req);
        if !resp.forward() {
            return resp;
        }
    }
}

trait RunHandler {
    type Response: IntoResponse;
    fn run(&self) -> Self::Response;
}

impl<F, R> RunHandler for F
where
    F: Fn() -> R,
    R: IntoResponse
{
    type Response = R;
    fn run(&self) -> R { self() }
}

impl<F, E1, R> RunHandler for F
where
    F: Fn(E1) -> R,
    R: IntoResponse
{
    type Response = ExtractorResponse<R>;
    fn run(&self, req: Request) -> Self::Response {
        let e1 = <E1 as FromRequest>::from_request(&req).map_err(|err| {
            if err.forward() {
                ExtractorResponse::Forward(err.into_response())
            } else {
                ExtractorResponse::Failed(err.into_response())
            }
        })?;
        self(e1)
    }
}

How exactly the matching is done and how the middleware and guards are composed together with the handler is not completely clear to me at the moment.

@tqwewe You have already written a lot of the matching code and handler function stuff. Do you want to take over this issue and see if you could come up with nicely generated code from this macro structure?

from submillisecond.

bkolobara commented on June 14, 2024 1

@SquattingSocrates Sorry for the confusion. I spent some time yesterday trying to make it work the way you suggested, but just couldn't figure out how. If you can make it work like this it would be awesome!

The main issue for me was that Rust's match statements are not fall-through. And we kinda need this because part of the matching for us lives in the handler function (as part of extractors) or sub-routers.

Just to illustrate the problem:

fn handler_a(Path(dynamic): Path<i32>) {
    // Handle request
}

const ROUTER_B = router! {
    GET "/:dynamic" => handler_a,
    GET "/test" => handler_b,
}

In this case if a request came in with the path /test, I would expect the first match to fail because :dynamic can't be casted to i32. Then the matcher should fall-through and test the next one, in this case handler_b would succeed. This if how most rust web frameworks work.

If we change the signature of handler_a to fn handler_a(Path(dynamic): Path<String>), then it would become the match for /test. But, because the router! macro can't observe the signature (only handler name), we can't really generate a parser at compile time that differentiate this two cases and matches the right one directly.

This could be solved by adding a type hint to the macro:

const ROUTER_B = router! {
    GET "/:dynamic:i32" => handler_a,
    GET "/test" => handler_b,
}

But then we are repeating the type in two places and need to keep them in sync. It could also lead to errors that are hard to detect, e.g. developers changing the value from i32 to i64 in one place and the handler works just sometimes.

Another problem comes from sub-routers:

const ROUTER_A = router! {
    GET "/hello" => handler_a,
}

const ROUTER_B = router! {
    GET "/test" => ROUTER_A,
    GET "/:dynamic/:dynamic" => handler_b,
}

In this case, if a request of /test/hello2 came in, the handler_b should be called. But this is impossible to do with a match statement:

const ROUTER_A = router! {
     // .. parse request
     let matching_handler = match (req.method(), req.path()) {
        (GET, "/hello") => handler_a(),
        _ => None
    };
}

const ROUTER_B = |stream| {
    // .. parse request
    let matching_handler = match (req.method(), req.path()) {
        (GET, "/test") => ROUTER_A(req),
        (GET, "/:dynamic:string/:dynamic:sting") => handler_b(req)
        _ => None
    };
    // handle response of failure to match
}

For the request /test/nothello we can't really provide a req.path() that will match /test, then jump into the function ROUTER_A, then realise no matches actually exist and fall-through and try to match another one. In Rust the first match that succeeds is taken.

I'm in general not sure how to do this partial matching and forwarding the request to sub routers, not only with this approach.

There are probably other issues, like how do you turn a /hello/test from a request into matches (/hello, /:dynamic/:dynamic) just based on partial routing information that you can observe in the macro? I'm also not aware of any routing library that works that way, so it's a bit hard to wrap my head around it.

Also, to get from this:

const ROUTER_B = router! {
    GET "/test" => handler_a,
    GET "/:dynamic:string" => handler_b,
}

to this:

const ROUTER_B = |stream| {
    // DO THE WORK!

    let matching_handler = match (req.method(), req.path()) {
        (GET, "/test") => handler_a(req),
        (GET, "/:dynamic:string") => handler_b(req)
        _ => None
    };
    // handle response of failure to match
}

You need to generate the DO THE WORK part and there you kinda need to already use a (prefix tree based?) matching to parse the request and turn it into either /test or /:dynamic, but you also need to extract all the dynamic parts and attach them to the request. So you did all the parsing and determined which handler with what arguments you need to call, but instead of calling it directly you generate a "string pointer" to it and use a match to dispatch it.

One obvious benefit is that when expending the macro you would get a nice match statement and it may be easier to debug. However, most of the work of doing the actual parsing would bi hidden away in the DO THE WORK part, so I'm not sure how helpful it would be in the end. The final result would be a bit of a mirror of the macro.

from submillisecond.

SquattingSocrates commented on June 14, 2024 1

So in that case it would be better to match only /hello, not /hello/world I think

from submillisecond.

tqwewe commented on June 14, 2024

Should we use an existing crate for route matching? I found https://docs.rs/matchit/latest/matchit/
Or build or own solution?

from submillisecond.

bkolobara commented on June 14, 2024

Should we use an existing crate for route matching? I found https://docs.rs/matchit/latest/matchit/
Or build or own solution?

I think we need to build our own.

One characteristic of lunatic is that each process (request) gets their own heap memory. This means that we don't share any memory between them. If we create a router struct in the initial process, we will need to serialise and copy it to all request handler processes. This could become a bottleneck in apps that have hundreds of routes.

This basically forces us to generate the router during compile time, meaning all processes will get a static copy of it. Because of this we can't really use any existing router library.

from submillisecond.

SquattingSocrates commented on June 14, 2024

But if we have a function router we could initialise the router in that function and would not need to (de)serialise it. Later we could improve this, but to get a MVP ready we could use this ready router, since it has all the features we want for now.

from submillisecond.

bkolobara commented on June 14, 2024

Yes, for the first iteration this should be fine and I wouldn't worry too much about performance.

Just notice that in this case you are still building a router (function based one), but you are using a library to simplify the matching.

would not need to (de)serialise it

But you would still need to generate the router on every request, right? You are just wrapping the generation into the handler function. This could be even more costly than de/serialising it.

The good part is that with a function based one we can always replace how it works in the future and how much of the code is generated during compile time. In this case, initialising matchit on each request if fine for the first iteration.

from submillisecond.

SquattingSocrates commented on June 14, 2024

Yeah, we'd need to build the router every time, but eventually we could inline generating the matching of the url which would make it pretty efficient. So we would just spit out a function that takes in the url, matches it and returns the matching handler Uber-function

from submillisecond.

tqwewe commented on June 14, 2024

We could also use once_cell to only generate the router on the first request, and use it for subsequent requests.

from submillisecond.

tqwewe commented on June 14, 2024

Sure that sounds great.

Regarding the syntax, do we want the if ... use ... to be placed after the handler, or after the "/" path definition?

I think it makes more sense to have it right after the "/" path to more closely match a Rust match statement. Eg:

router! {
    "/user" if is_user use user_middleware => { ... }
}

Otherwise I'm not quite sure how to write the above example if it's after the handler.

Also, perhaps the use [...] at the top of the macro could instead be the same syntax as Rust's use?

router! {
    use middleware::{logger, diagnostics};
    
    ...
}

from submillisecond.

SquattingSocrates commented on June 14, 2024

Didn't we also want to try generating a single match statement from the macro?
I think we could actually try both approaches in parallel, we just need the basic parsing capabilities of the match syntax and I could then try and generate a match statement while @tqwewe is trying the function + trait based approach.
Or is this off the table? Because I'm confused honestly 😕

const ROUTER = |stream| {
    // .. parse request
    let matching_handler = match (req.method(), req.path()) {
        (GET, "/users/:id") if guard1(req) => //...
        (POST, "/users") => //...
        (GET, "/users/:id/todos") => //...
        (GET, "/users/:id/todos/:todo_id") => //...
        (POST, "/user/:id/todos") => //...
        _ => None
    };
    // handle response of failure to match
}

from submillisecond.

SquattingSocrates commented on June 14, 2024

@bkolobara I think the question of matching similar routes needs to be reframed a little, because when you write a regular match you wouldn't write it like this:

match security_level {
   num => any_level,
    6 => MI_6
}

because if there's a specific match that you're looking for that needs to come first, so that logic shouldn't change in our augmented match statement. It's the same in other languages where you sometimes need to match a specific hardcoded value of an ID and sometimes you can actually do this, but the specific ID route definition needs to come before the generic id matching. So in other languages it would look like this in exactly this order:

@Get("/users/ADMIN_USER")
public getAdminUser() { ... }

@Get("/users/:user_id")
public getUser() { ... }

Another thing is that a pattern like this should probably not be implemented anyway because it doesn't make sense to anyone irl and therefore I'd suggest not to support it at all:

const ROUTER_A = router! {
    GET "/hello" => handler_a,
}

const ROUTER_B = router! {
    GET "/test" => ROUTER_A,
    GET "/:dynamic/:dynamic" => handler_b,
}

If someone truly needs some "dynamic" routing like that they're probably doing something wrong. I also don't see how this would work in other web frameworks in other languages because it's just too ambiguous. Whether it's strict REST or rest-like where you'd maybe do something like /tweetsForUser and use a POST request because you need to send a large filter body, using the same route for completely different purposes doesn't make sense for consumers of API. Imagine reading documentation of an API and you see that you can call
/users/:user_id/posts
/users/:user_id/subscriptions and
/users/:user_id/:anything/:anything_else ? What would you make of this documentation? I mean sure it's cool for experimenting with something but then the user/dev should probably just make a /users/:user_id/* route and try to parse what the user is sending them. Since we don't mind being somewhat opinionated I think it's fine to not encourage a pattern like this.

Regarding sub-routers: one idea was to "flatten" the paths completely so that the generated code doesn't even have any sub-routers, but since we want to have sub-routers, it could be hard if a router can be assigned to a variable.

const ROUTER_A = router! {
    GET "/hello" => handler_a,
}

const ROUTER_B = router! {
    GET "/users/:user_id" => ROUTER_A, // this is hard to flatten, but also hard to further match because of the GET
    GET "/products" => handler_b,
}

However we could implement a sub_router! macro that would indicate you can only have them inside the other macro or simply use a { } block for sub-routing and that way we always get a flat match statement that can either match or not. No match means 404, here's how that would look like:

const ROUTER = router! {
   "/users" => {
       GET "/" => find_users,
       GET "/:user_id" => get_user_by_id
   }
}

// this is what it would expand into:
const ROUTER = |stream| {
    // do work here
   match (req.method(), req.path()) {
      (GET, "/users") => find_users,
      (GET, "/users/:user_id") => get_user_by_id
      _ => fn_404
  }

   // finish requests
}

You're right that we need to do a prefix-tree matching of the uri first, but parsing the request in the DO THE WORK part is just calling a function like let req = http::parse(stream); and then call the static matching tree to get the template string like /users/:user_id from /users/123. But how much overhead is it to match a 50 byte string to a prefix tree?
I don't mind using the prefix routing tree directly to return a reference to the handler and skip the match statement entirely in order to gain performance or simplicity or style. Of course for debugging purposes a match statement would be easier to comprehend because you could just log out the parsed template string and method and then expand the router macro to see all the available branches of the match.

I realise that iterating is a straight-forward solution and many if not most frameworks do it like that. But since we're already tapping into macros we could go one step further and generate code that is efficient, easy to reason about and debug and strict in it's logic?

from submillisecond.

tqwewe commented on June 14, 2024

So what I'm thinking the router macro would expand to is something like this (as a start):

router! {
    GET "/foo" => foo_handler,
}

// becomes:
fn router(mut req: Request) -> Result<Response, RouteError> {
    // TODO: generate the router with the macro and inline it as const
    let mut router = matchit::Router::new();
    router.insert("/foo", "/foo").unwrap();

    let route = req.extensions().get::<Route>().unwrap();
    let route_match = router.at(route.path());
    match route_match {
        Ok(route) => {
            // TODO: insert params into request extensions

            // Call handler
            match *route.value {
                "/foo" if req.method() == Method::GET => Ok(Handler::handle(
                    foo_handler as FnPtr<_, _, { arity(&hello) }>,
                    req,
                )
                .into_response()),
                _ => Err(RouteError::RouteNotMatch(req)),
            }
        }
        Err(_) => Err(RouteError::RouteNotMatch(req)),
    }
}

This works, though doesn't support fall-through.

I think to support this, we can modify matchit to return a list of items which possibly match the route... and execute them one by one until one indicates it handled the request and provides a response.
Eg:

fn router(mut req: Request) ->  Result<Response, RouteError> {
    let router = ...; // const given at compile time from macro.
    
    let handlers_iter = router.at(req.path()); // handlers_iter would be an iterator providing the handlers which match
    for handler in handlers_iter {
        // Try, and if it handles, return the response
    }
}

from submillisecond.

tqwewe commented on June 14, 2024

It's worth noting a good example of fall-through that I've used in the past in production is netlify's redirect system.
https://docs.netlify.com/routing/redirects/#rule-processing-order

It states that it attempts each rule from top to bottom until one matches.

from submillisecond.

SquattingSocrates commented on June 14, 2024

But isn't this the same as matching /blog/:post_id ?

from submillisecond.

SquattingSocrates commented on June 14, 2024

Ah, it's configured as redirect.
It would then just be

GET "/blog/old-post" => redirect("/blog/new-post"),
GET "/blog/new-post" => serve_new_post

or something, we have no handling of redirects yet.

from submillisecond.

bkolobara commented on June 14, 2024

It's the same in other languages where you sometimes need to match a specific hardcoded value of an ID and sometimes you can actually do this, but the specific ID route definition needs to come before the generic id matching.

That's a good point.

Another thing is that a pattern like this should probably not be implemented anyway because it doesn't make sense to anyone irl and therefore I'd suggest not to support it at all

In the beginning we needed to support this because we had a path on top of the handler function, so the handler was part-router. Now that we got rid of it, I think it's ok if we don't allow composing routers for now.

from submillisecond.

bkolobara commented on June 14, 2024

On a different note, is the character / of any significance to us? E.g. Would this:

router! {
    GET "/:path" => dynamic_routing,
}

match /hello/world?

from submillisecond.

SquattingSocrates commented on June 14, 2024

Generally path parameters should not have slashes in them because it makes path parsing ambiguous. It's kinda like a (/a+)+ regex scenario which is not really great to match without crashing your computer. Usually if you don't care about slashes you should just use a /some_prefix/*, which is useful if you only need to redirect requests to another service and don't want to bother with typing out request and response

from submillisecond.

tqwewe commented on June 14, 2024

On a different note, is the character / of any significance to us?

If you wanted to match like that, you do GET "/*path", otherwise it will only match the first segment if you do :path.

from submillisecond.

bkolobara commented on June 14, 2024

If you wanted to match like that, you do GET "/*path", otherwise it will only match the first segment if you do :path.

So you will be able to do both, GET "/*path" and GET "/:path"? And the extractor will be able to extract it either?

from submillisecond.

tqwewe commented on June 14, 2024

If you had both /:path and /*path, then I believe /foo would be available in the extractor as foo, and /foo/bar would be foo/bar.

from submillisecond.

Add Router about submillisecond HOT 23 CLOSED

Comments (23)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent