lantern-chat / server
Lantern Server Backend
License: Other
`lantern.party_member` already contains a composite primary key of `(user_id, party_id)`, so it would be great to reference that as a composite foreign key in `lantern.profiles` when `user_id` and `party_id` are non-NULL. That way, when the `party_member` entry is deleted, the per-party profile (if any) is also deleted by cascade. However, `profiles.party_id` can be NULL for non-party profiles, so I'm currently unsure what exactly that foreign-key declaration would look like.
I've found this article as perhaps a starting point.
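One possibly relevant detail: PostgreSQL's default `MATCH SIMPLE` semantics mean a composite foreign key is simply not enforced when any referencing column is NULL, which sounds like exactly the desired "only when both are non-NULL" behavior. A minimal sketch, assuming the constraint name and that `profiles` already has these columns:

```sql
-- Sketch only, untested against the real schema. With the default
-- MATCH SIMPLE, the constraint is skipped whenever ANY referencing
-- column is NULL, so non-party profiles (party_id IS NULL) pass freely.
ALTER TABLE lantern.profiles
    ADD CONSTRAINT profiles_party_member_fk
    FOREIGN KEY (user_id, party_id)
    REFERENCES lantern.party_member (user_id, party_id)
    ON DELETE CASCADE;
```

`MATCH FULL` would instead reject rows where only one of the two columns is NULL, which is not what's wanted here, so the default appears to be the right fit.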
With self-hosting, the server will need to modify head metadata fields that are currently just hard-coded into the index.html
file on the frontend, which webpack populates with scripts and similar.
The file isn't modified often, so the modification step doesn't need to happen in realtime, but metadata should be inserted before compressing/caching, with correct cache invalidation when the metadata changes.
Been thinking about what is needed for an MVP, and realized that my current approach to websockets might be... annoying when it comes to updating the server.
For the (more or less) stateless REST API, I could easily spawn a newer version of the server and route any new connections to that, so the old instance stops handling API queries. That allows database migrations without risking crashing the old instance, since it isn't handling anything else.
However, the websocket gateway is intrinsically tied into the server backend, meaning if I have to update the API or do any database migrations, all websockets must be interrupted.
One solution to that is to simply not tie the gateway to the server, have it run in a microservice of its own, acting more like a proxy for already-encoded events coming from the active backend server through some low-level message-passing interface. Basically the gateway would not need any ties to the database, so it would be immune to migration woes.
This approach would massively reduce downtime, but would require more research and effort, so it will just go here in the meantime.
Fetching oEmbed metadata for message embeds reveals the server's IP address and so forth. It would be nice to use some kind of serverless edge function to fetch those instead, possibly something in Cloudflare's network.
Due to the performance limitations of recursive queries, it would be very useful in the future to create an in-memory thread structure to cache the last/top N messages to a certain depth, for query-less rapid access. Treat it like an LRU cache based on room/thread accesses, so the most in-demand rooms/threads can avoid a large chunk of work. For the purpose of this, an entire room is a top-level thread. This has the added benefit of the listing of posts in a user-board room also being cached.
This is partially inspired by https://slack.engineering/real-time-messaging/
However, I want to extend this to user-board rooms and posts, with the full tree structure. As of writing this I'm unsure what the tree should look like, as it might need to support varying traversal conditions and filtering.
```rust
type MsgId = Snowflake;
type RoomOrParentId = Snowflake;

// 16 bytes (two 8-byte snowflakes)
struct CachedMessage {
    msg_id: MsgId,
    parent_id: RoomOrParentId,
}

struct Cache {
    threads: scc::HashCache<RoomOrParentId, Tree<CachedMessage>>,
    messages: scc::HashMap<MsgId, Message>,
}
```
Perhaps key by `(RoomOrParentId, SortMethod)`, where `SortMethod` could be `Top`/`New`/etc., so those get their own sorted trees.
Furthermore, we'll need to track some kind of threshold for cache insertion, such that we aren't constantly thrashing it with small posts. The cache should only be used when the time spent doing repeated work becomes significant for a single post.
That should be fixable by adding a priority or a secondary sort key.
See this thread for inspiration.
As of writing this, I've just added a dynamic tsvector column that uses a 6-bit flag for language selection, and a GIN index for queries.
That's already suitable for most text searches, except for exact-match searches. Exact searching could be done in two stages: first with `tsquery`, then with `ILIKE` on the result.
Furthermore, the system should support extra attributes such as `from:user`, `in:room`, etc.
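A sketch of how the two stages and the attributes might compose in a single query; the `user_id`/`room_id` column names are assumptions, and `plainto_tsquery` stands in for whatever query parsing the language-flagged `ts` column ends up needing:

```sql
-- Sketch only. Stage 1 narrows candidates cheaply via the GIN index on ts;
-- stage 2 verifies the exact phrase with ILIKE on that small result set.
SELECT msg_id
FROM lantern.messages
WHERE ts @@ plainto_tsquery('english', 'hello world')  -- stage 1: indexed
  AND content ILIKE '%hello world%'                    -- stage 2: exact match
  AND user_id = 1234                                   -- from:user attribute
  AND room_id = 5678;                                  -- in:room attribute
```

The planner should apply the indexed `@@` condition first, so the unindexable `ILIKE` only ever scans the handful of rows that already matched the full-text query.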
As of writing this, the test instance of Lantern has around 2,600 messages in it. The `messages` table is 377KB.
A `GIN (lower(content) gin_trgm_ops)` index currently occupies 385KB, so the trigram index on `lower(content)` is larger than the entire `messages` table itself. For comparison, the `GIN (ts)` index on the generated tsvector column is only 155KB.
Together, that means all the indexes required for fuzzy and exact searching increase the storage cost of messages by about 150%. The average footprint per message comes out to around 438 bytes.
I'll need to keep track of these and see if that cost is acceptable. Things may not scale linearly. It's probably fine, since the actual storage of messages will be nothing compared to files, but the database is more difficult to split up.
Using https://github.com/postgrespro/rum#rum_tsquery_ops it should be possible to allow users to specify a trigger-phrase for which new messages will create a notification.
It's like the opposite of a full-text search.
However, I need to figure out how to install the extension with Postgres running in Docker while still being able to eventually upgrade the database using https://github.com/tianon/docker-postgres-upgrade or similar, which may not support having the extension loaded at migration time, if that's even required.
540fa17 removed in-database text limits, since PostgreSQL doesn't distinguish their representation anyway and we want them to be configurable.
Therefore, limits must be handled in application logic using a configuration object.
To "block" banned IPs in userspace, the TCP connection must be established first, wasting time and resources.
It would be better to use an eBPF script, possibly via RedBPF or similar, with some map type that can be updated as the block list changes.
GIFs for avatars and banners will be quite popular, and need to be cropped, scaled and optimized like still images.
Currently, the "standard" `image` library has rather poor GIF support, drastically increasing file size in some cases, and is also quite slow according to reported issues.
https://github.com/ImageOptim/gifski has a better encoder, but is more focused on extracting the highest quality from video than on optimizing existing GIFs. Likely not a good fit.
ImageMagick may be the only solution, by spawning a subprocess and passing the image data through stdin. https://github.com/zshipko/image2-rs is notable for this, but has no apparent GIF support. Custom routines would likely be needed.
All of the migrations so far were just created throughout initial development, but can be combined into an init script before release, allowing for more organized migrations in the future.