lantern-chat / server
Lantern Server Backend
License: Other
`lantern.party_member` already contains a composite primary key of `(user_id, party_id)`, so it would be great to reference that as a composite foreign key in `lantern.profiles` when `user_id` and `party_id` are non-NULL. That way, when the `party_member` entry is deleted, the per-party profile (if any) is also deleted by cascade. However, `profiles.party_id` can be NULL for non-party profiles, so I'm currently unsure what exactly that foreign-key declaration would look like.
I've found this article as perhaps a starting point.
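One possibly relevant detail: PostgreSQL's default `MATCH SIMPLE` semantics mean a composite foreign key is simply not enforced when any referencing column is NULL, which sounds like exactly the desired "only when both are non-NULL" behavior. A minimal sketch, assuming the constraint name and that `profiles` already has these columns:

```sql
-- Sketch only, untested against the real schema. With the default
-- MATCH SIMPLE, the constraint is skipped whenever ANY referencing
-- column is NULL, so non-party profiles (party_id IS NULL) pass freely.
ALTER TABLE lantern.profiles
    ADD CONSTRAINT profiles_party_member_fk
    FOREIGN KEY (user_id, party_id)
    REFERENCES lantern.party_member (user_id, party_id)
    ON DELETE CASCADE;
```

`MATCH FULL` would instead reject rows where only one of the two columns is NULL, which is not what's wanted here, so the default appears to be the right fit.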
With self-hosting, the server will need to modify head metadata fields that are currently just hard-coded into the index.html
file on the frontend, which webpack populates with scripts and similar.
The file isn't modified often, so the modification step doesn't need to happen in realtime, but metadata should be inserted before compressing/caching, with correct cache invalidation when the metadata changes.
Been thinking about what is needed for an MVP, and realized that my current approach to websockets might be... annoying when it comes to updating the server.
For the (more or less) stateless REST API, I could easily spawn a newer version of the server and route any new connections to that, so the old instance stops handling API queries. That allows database migrations without risking crashing the old instance, since it isn't handling anything else.
However, the websocket gateway is intrinsically tied into the server backend, meaning if I have to update the API or do any database migrations, all websockets must be interrupted.
One solution to that is to simply not tie the gateway to the server, have it run in a microservice of its own, acting more like a proxy for already-encoded events coming from the active backend server through some low-level message-passing interface. Basically the gateway would not need any ties to the database, so it would be immune to migration woes.
This approach would massively reduce downtime, but would require more research and effort, so it will just go here in the meantime.
Fetching oEmbed metadata for message embeds reveals the server's IP address and so forth. It would be nice to use some kind of serverless edge function to fetch those instead, possibly something in Cloudflare's network.
Due to the performance limitations of recursive queries, it would be very useful in the future to create an in-memory thread structure to cache the last/top N messages to a certain depth, for query-less rapid access. Treat it like an LRU cache based on room/thread accesses, so the most in-demand rooms/threads can avoid a large chunk of work. For the purpose of this, an entire room is a top-level thread. This has the added benefit of the listing of posts in a user-board room also being cached.
This is partially inspired by https://slack.engineering/real-time-messaging/
However, I want to extend this to user-board rooms and posts, with the full tree structure. As of writing this I'm unsure what the tree should look like, as it might need to support varying traversal conditions and filtering.
```rust
type MsgId = Snowflake;
type RoomOrParentId = Snowflake;

// 16 bytes (two 8-byte snowflakes)
struct CachedMessage {
    msg_id: MsgId,
    parent_id: RoomOrParentId,
}

struct Cache {
    threads: scc::HashCache<RoomOrParentId, Tree<CachedMessage>>,
    messages: scc::HashMap<MsgId, Message>,
}
```
Perhaps key by `(RoomOrParentId, SortMethod)`, where `SortMethod` could be `Top`/`New`/etc., so those get their own sorted trees.
Furthermore, we'll need to track some kind of threshold for cache insertion, such that we aren't constantly thrashing it with small posts. The cache should only be used when the time spent doing repeated work becomes significant for a single post.
That should be fixable by adding a priority or a secondary sort key.
See this thread for inspiration.
As of writing this, I've just added a dynamic tsvector column that uses a 6-bit flag for language selection, and a GIN index for queries.
That's already suitable for most text searches, except for exact-match searches. Exact searching could be done in two stages: first with `tsquery`, then with `ILIKE` on the result.
Furthermore, the system should support extra attributes such as `from:user`, `in:room`, etc.
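A sketch of how the two stages and the attributes might compose in a single query; the `user_id`/`room_id` column names are assumptions, and `plainto_tsquery` stands in for whatever query parsing the language-flagged `ts` column ends up needing:

```sql
-- Sketch only. Stage 1 narrows candidates cheaply via the GIN index on ts;
-- stage 2 verifies the exact phrase with ILIKE on that small result set.
SELECT msg_id
FROM lantern.messages
WHERE ts @@ plainto_tsquery('english', 'hello world')  -- stage 1: indexed
  AND content ILIKE '%hello world%'                    -- stage 2: exact match
  AND user_id = 1234                                   -- from:user attribute
  AND room_id = 5678;                                  -- in:room attribute
```

The planner should apply the indexed `@@` condition first, so the unindexable `ILIKE` only ever scans the handful of rows that already matched the full-text query.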
As of writing this, the test instance of Lantern has around 2,600 messages in it. The `messages` table is 377KB.
A `GIN (lower(content) gin_trgm_ops)` index currently occupies 385KB, so the trigram index on `lower(content)` is larger than the entire `messages` table itself. For comparison, the `GIN (ts)` index on the generated tsvector column is only 155KB.
Together, that means all the indexes required for fuzzy and exact searching increase the storage cost of messages by about 150%. The average footprint per message comes out to around 438 bytes.
I'll need to keep track of these and see if that cost is acceptable. Things may not scale linearly. It's probably fine, since the actual storage of messages will be nothing compared to files, but the database is more difficult to split up.
Using https://github.com/postgrespro/rum#rum_tsquery_ops it should be possible to allow users to specify a trigger-phrase for which new messages will create a notification.
It's like the opposite of a full-text search.
However, I need to figure out how to install the extension with Postgres running in Docker while still being able to eventually upgrade the database using https://github.com/tianon/docker-postgres-upgrade or similar, which may not support having the extension loaded at migration time, if that's even required.
540fa17 removed in-database text limits, since PostgreSQL doesn't distinguish their representation anyway and we want them to be configurable.
Therefore, limits must be handled in application logic using a configuration object.
To "block" banned IPs in userspace, the TCP connection must be established first, wasting time and resources.
It would be better to use an eBPF script, possibly via RedBPF or similar, with some map type that can be updated as the block list changes.
GIFs for avatars and banners will be quite popular, and need to be cropped, scaled and optimized like still images.
Currently, the "standard" `image` library has rather poor GIF support, drastically increasing file size in some cases, and is also quite slow according to reported issues.
https://github.com/ImageOptim/gifski has a better encoder, but is more focused on extracting the highest quality from video than on optimizing existing GIFs. Likely not a good fit.
ImageMagick may be the only solution, by spawning a subprocess and passing the image data through stdin. https://github.com/zshipko/image2-rs is notable for this, but has no apparent GIF support. Custom routines would likely be needed.
All of the migrations so far were just created throughout initial development, but can be combined into an init script before release, allowing for more organized migrations in the future.