Catalyst Project

A Catalyst is a server that bundles different services. These services currently form the backbone of Decentraland: they run the decentralized storage for most of the content needed by the client and orchestrate the communications between peers.

If you just want to run a Catalyst server, please check the Catalyst Owner repository. This repository is mostly used for development.

The architecture of the server is as follows:

  • Backend for Frontend (BFF): This service was created to address client needs and enable faster development of new features without breaking the existing APIs. In the Catalyst context, it handles the communications between peers connected to the client; its main responsibility is to manage the P2P signaling.

  • Archipelago Service: Archipelago was previously a library used by the Lighthouse. Since it now needs to work with different transports beyond P2P, it was converted into a service. The service keeps the same responsibility the library had: grouping peers in clusters so they can communicate efficiently. In addition, it needs to be able to balance islands across the available transports following a set of rules defined by the Catalyst Owner, in order to, for example, use LiveKit for an island in the Casino and P2P in a Plaza.

  • NATS: NATS is a message broker that enables data exchange and communication between services. It is also a building block for future developments and enables an easy way to connect services using subject-based messaging. In the context of the communication services architecture, it is used to connect the BFF, Archipelago and LiveKit.

  • LiveKit: LiveKit is an open source project that provides scalable, multi-user conferencing over WebRTC. Instead of forming a P2P network, peers connect to a Selective Forwarding Unit (SFU) in charge of managing message relay and different quality aspects of the communication. This is the infrastructure added to provide high-performance/high-quality communications between crowds on designated scenes.

  • Lambdas: This service provides a set of utilities required by Catalyst Server clients/consumers to retrieve or validate data. Some of the validations run in these functions are ownership-related, and for those it uses The Graph to query the blockchain.

  • Content Server: The Content Server stores many of the entities used in Decentraland, for example scenes, wearables and profiles. Content Servers automatically sync with each other, as long as they have all been approved by the DAO. If you set up a local content server, it will receive all updates from those DAO-approved Catalysts. However, new deployments that happen on your local server will not be sent to other servers.

  • Nginx is the reverse proxy used to route traffic to the Catalysts Services.

The Catalyst Client library can be used to interact with Catalyst servers. You can fetch data or deploy new entities to the server you specify.

Check full architecture here

Catalyst API

This server implements v1 of the API specification, detailed here

Monitoring

For monitoring see the following doc

Tests

yarn build
yarn test

Dependencies

For a list of other Decentraland libraries that Catalyst servers depend on, please check the library dependencies

Contributing

Please read the full text so that you can understand what actions will and will not be tolerated.

Read our contributing guide to learn about our development process, how to propose bugfixes and improvements, and how to build and test your changes.

Release

  • Create a tag release in Git
  • This will trigger the CI job, which publishes a new Docker image version under the @latest tag


catalyst's Issues

Investigate Catalyst running slow

We've identified that requesting to https://peer.melonwave.com/content/deployments is very slow. After some investigation we found some queries that could be improved.

We need to test the performance after that change and keep it if the investigation is successful.

Query

The query is under /deployments endpoint.

            SELECT
                 dep1.id,
                 dep1.entity_type,
                 dep1.entity_id,
                 dep1.entity_pointers,
                 date_part('epoch', dep1.entity_timestamp) * 1000 AS entity_timestamp,
                 dep1.entity_metadata,
                 dep1.deployer_address,
                 dep1.version,
                 dep1.auth_chain,
                 dep1.origin_server_url,
                 date_part('epoch', dep1.origin_timestamp) * 1000 AS origin_timestamp,
                 date_part('epoch', dep1.local_timestamp) * 1000 AS local_timestamp,
                 dep2.entity_id AS overwritten_by
             FROM deployments AS dep1
             LEFT JOIN deployments AS dep2 ON dep1.deleter_deployment = dep2.id
             WHERE dep1.local_timestamp >= to_timestamp(1 / 1000.0)
             ORDER BY dep1.local_timestamp DESC, dep1.entity_id DESC
             LIMIT 501 OFFSET 5500;

Explain analyze:

-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=241601.63..241601.75 rows=1 width=1643) (actual time=39257.893..39536.809 rows=1 loops=1)
   ->  Gather Merge  (cost=241601.63..277170.41 rows=304854 width=1643) (actual time=39244.951..39523.865 rows=1 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=240601.61..240982.68 rows=152427 width=1643) (actual time=39130.170..39130.174 rows=1 loops=3)
               Sort Key: dep1.local_timestamp DESC, dep1.entity_id
               Sort Method: top-N heapsort  Memory: 28kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 31kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 30kB
               ->  Parallel Hash Left Join  (cost=89269.61..239839.47 rows=152427 width=1643) (actual time=37436.907..38389.368 rows=121909 loops=3)
                     Hash Cond: (dep1.deleter_deployment = dep2.id)
                     ->  Parallel Seq Scan on deployments dep1  (cost=0.00..86256.34 rows=152427 width=1592) (actual time=0.060..435.529 rows=121909 loops=3)
                           Filter: (local_timestamp >= '1970-01-01 00:00:00.001+00'::timestamp with time zone)
                     ->  Parallel Hash  (cost=85875.27..85875.27 rows=152427 width=51) (actual time=1729.537..1729.538 rows=121909 loops=3)
                           Buckets: 65536  Batches: 16  Memory Usage: 2528kB
                           ->  Parallel Seq Scan on deployments dep2  (cost=0.00..85875.27 rows=152427 width=51) (actual time=32.408..380.144 rows=121909 loops=3)
 Planning Time: 0.463 ms
 JIT:
   Functions: 37
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 8.062 ms, Inlining 0.000 ms, Optimization 11.040 ms, Emission 98.046 ms, Total 117.148 ms
 Execution Time: 39541.596 ms

Ideas

  • Remove the local_timestamp index and create one with CREATE INDEX ix_deployment_timestamp_entity_id ON deployments (local_timestamp DESC, entity_id DESC)
  • Write it as a correlated (LATERAL) subquery, so instead of doing a full left join (NxM) it'd do Nx1:
             SELECT
                 dep1.id,
                 dep1.entity_type,
                 dep1.entity_id,
                 dep1.entity_pointers,
                 date_part('epoch', dep1.entity_timestamp) * 1000 AS entity_timestamp,
                 dep1.entity_metadata,
                 dep1.deployer_address,
                 dep1.version,
                 dep1.auth_chain,
                 dep1.origin_server_url,
                 date_part('epoch', dep1.origin_timestamp) * 1000 AS origin_timestamp,
                 date_part('epoch', dep1.local_timestamp) * 1000 AS local_timestamp,
                 dep2.entity_id AS overwritten_by
             FROM deployments AS dep1
             LEFT JOIN LATERAL
                (SELECT
                   id,
                   entity_id
                 FROM deployments sub1
                 WHERE sub1.id = dep1.deleter_deployment) AS dep2
                ON true
             WHERE dep1.local_timestamp >= to_timestamp(1 / 1000.0)
             ORDER BY dep1.local_timestamp DESC, dep1.entity_id DESC
             LIMIT 501 OFFSET 5500;

Changes to Collections lambdas

Currently, the collections lambdas are downloading the image and thumbnail as a buffer instead of using streams.
The best way to fix this is to add a new function to the SmartContentServerFetcher that returns a stream instead of a buffer (we're using fetchBufferFromContentServer now).

[security] Allow a client to challenge the server to know it's being run by the owner

Context
There is a list of 3 servers in the Node Registry: [A, B, C]

Each Node is registered using the following structure:

struct Katalyst {
    bytes32 id;
    address owner;
    string domain;
}

Each Client uses the above information to connect to the servers as they see fit.

Problem
Though any Client can trust that each NodeRegistry entry was approved by the DAO, the server they connect to can be impersonated in different ways. If Server B is poisoned it might start returning bogus data.

Proposal
Maybe it is a good idea to allow for clients to challenge the servers and ensure they are who they say.

This can maybe be done by:
A) The Client sending a /challenge request with a particular random message.
B) The Server signing the message with the corresponding address PrivateKey and sending it back to the client.
C) The Client verifies that the message was signed by the owner Key.

Note:
To avoid exposing the owner private key by having it sit on a server, a different challenge private key could be used.
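A simplified sketch of steps A–C, using a generic EC keypair from Node's crypto module in place of the owner's (or a dedicated challenge) Ethereum key; all names here are illustrative, not the actual implementation:

```typescript
import { generateKeyPairSync, createSign, createVerify, randomBytes } from "crypto";

// Hypothetical stand-in for the owner/challenge key: the real flow would use
// an Ethereum key whose address is registered in the NodeRegistry.
const { publicKey, privateKey } = generateKeyPairSync("ec", { namedCurve: "P-256" });

// A) The Client generates a random challenge message
const challenge = randomBytes(32).toString("hex");

// B) The Server signs the challenge with its private key
function signChallenge(message: string): string {
  const signer = createSign("SHA256");
  signer.update(message);
  return signer.sign(privateKey, "base64");
}

// C) The Client verifies that the signature matches the published public key
function verifyChallenge(message: string, signature: string): boolean {
  const verifier = createVerify("SHA256");
  verifier.update(message);
  return verifier.verify(publicKey, signature, "base64");
}

const signature = signChallenge(challenge);
console.log(verifyChallenge(challenge, signature)); // true
```

In the real protocol the verification step would recover an Ethereum address from the signature and compare it against the owner (or challenge) address in the registry.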

Support partial deployments

As of today, new entities need to be deployed all at once. Content files need to be already uploaded and in use by another entity, or they need to be uploaded in the same operation that creates the new entity.

This works for most cases, but some entities (such as scenes) can be really big in terms of size. This has caused problems in the past; for example, Cloudflare has a maximum size per POST request.

The idea here would be to allow "partitioned deployments": deployments that can be uploaded in smaller parts, so that many HTTP operations can result in one deployment.

If we go with this approach, we would need to be careful with a few things:

  • Make sure that this cannot be abused by attackers, so the same size limit that applies to "full" deployments needs to be enforced
  • Deployments take into account the entity's timestamp to determine if they are valid or not. We would need to take this new operation into account and make sure we are not breaking anything

Once this is merged and released, we would need to change the catalyst client to use this new operation by default. If an issue for that doesn't exist yet, please create one once this is merged.

Create V3 content library

The content server needs to be accessed from different clients (CLI, Explorer, Builder, and others).
The HTTP requests required to interact with it may be complex to re-implement in every client, so we need to encapsulate them in a client library.
On the other hand, having this library will help us to distribute API changes more easily.
The library should expose all the read, write and delete operations.

When you restart a Catalyst keep comms state

Whenever you restart a Catalyst server, for example when updating to a new version, you lose all comms connections and the server name changes.

Maybe some form of state can be persisted, for example the comm server name, so clients can retry reaching out to the server when the server is back online.

Add flag to avoid sync

Right now, when a content server starts, it will automatically try to find other content servers and try to sync with them. This is OK as default behaviour, but it would be nice to have an easy way to disable syncing, mostly for testing purposes.

One possible solution is a flag, only used for testing, that disables the whole synchronization mechanism so the server acts as if it were the only node.
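A minimal sketch of such a flag; the environment variable name and function shape are assumptions, not the actual code:

```typescript
// Hypothetical: DISABLE_SYNCHRONIZATION=true would skip the sync bootstrap,
// letting the content server act as if it were the only node (testing only).
function startServer(disableSync: boolean, startSynchronization: () => void): string {
  if (disableSync) {
    return "sync disabled";
  }
  startSynchronization();
  return "sync started";
}

const disableSync = process.env.DISABLE_SYNCHRONIZATION === "true";
console.log(startServer(disableSync, () => { /* discover peers and sync */ }));
```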

Add support for sorting in /deployments

Currently, the /deployments endpoint returns deployments sorted by localTimestamp, in descending order.

The idea would be to add support to sort by entityTimestamp, and to also allow the order (ASC or DESC) to be set

Investigate deny-list performance

As part of the performance analysis we identified a potential problem in the query used to filter the results when applying the deny-list.
As an emergency action we disabled the deny-list temporarily (see #419) but we need to re-enable it without compromising performance.

We discovered that the query used to check the deny-list may grow unbounded, and that may impact the correct behaviour of the system.
An instance of this query is attached to the issue.

deny-list-query.txt

Add some documentation about content server APIs

We have many different endpoints exposed by the content server. However, they are not documented anywhere. That's a pity, because there is a lot of information that could be used for many different use cases.

We need to write this documentation, and expose it in a way that the community can read it, and use it. It might make sense to add it to https://docs.decentraland.org/

Verification tool

What do we want to verify?

We need to be able to verify that everything is going well with the active content servers. This can be divided into:

Synchronization

We need to be able to make sure that two or more content servers are in sync. This would imply that either they have the same active entities, or that their active entities are coherent with:

  1. Eventual consistency
  2. The synchronization cycle
  3. Entity overlapping logic
  4. Reported failed deployments

We have recently done something similar on the monitor tool, but it might need some more iterations.

Validation

Another thing that we must verify, is that there are no malicious servers serving invalid content to clients. This can be verified by going through active content files, hashing them, and making sure that the actual hash matches the expected one.

How to implement it?

Ideally, this would be implemented on a script that can be run by the community (maybe the CLI?). However, it would also be nice to have a server running it automatically and periodically, with some form of alerting in place.

Note: It might make sense to split this issue into smaller ones when trying to implement it.

Performance improvements to /profile lambda

Right now, our /profile lambda takes an ethAddress, fetches the profile from the content server, and sanitizes its name by querying the-graph. This last action is performed to verify whether the current name is actually an NFT or not.

Now, the problem is that it can take a few seconds to return the profile. In order to improve this, we can do 2 things:

  1. We can add a cache similar to Cache<EthAddress, Set<Name>>, to avoid querying the-graph each time a profile is requested. This is especially useful considering that when a user enters the world, most users around them will fetch the new user's profile from the catalyst at around the same time.
    Notes:
  • This cache should have some kind of time invalidation.
  • We had something similar in the content server that could help in this scenario.
  2. We could add support to fetch more than one profile at a time. This would allow the explorer to avoid many requests, and at the same time we could group name queries to the-graph into one
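A minimal sketch of the first idea, a time-invalidated cache; the class name, generics and the 5-minute TTL are illustrative assumptions, not the actual implementation:

```typescript
// Cache entries expire after a fixed TTL, so stale name ownership data
// is eventually re-fetched from the-graph.
class TtlCache<K, V> {
  private entries = new Map<K, { value: V; expiresAt: number }>();

  constructor(private ttlMs: number) {}

  get(key: K): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(key); // expired: invalidate on read
      return undefined;
    }
    return entry.value;
  }

  set(key: K, value: V): void {
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

// Usage: cache owned names per ETH address for 5 minutes
const ownedNames = new TtlCache<string, Set<string>>(5 * 60 * 1000);
ownedNames.set("0xabc", new Set(["alice"]));
console.log(ownedNames.get("0xabc")?.has("alice")); // true
```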

Add retry mechanism for deployments

When content servers sync between each other, they learn of new deployments. Now, when a content server finds a new deployment, it will:

  1. Download all necessary files
  2. Validate the deployment
  3. Deploy it locally

During any of those steps, things can go wrong. The most common source of issues are network problems, where file downloads or blockchain queries fail. These problems can't be avoided, but they can be mitigated. Each of those steps has a local retry mechanism, but the time window is of course really small.

The idea would be to add a more general retry mechanism for failed deployments. A possibility would be to create something like the GarbageCollectionManager that executes an action periodically.

We could (this is of course open to discussion) have this periodic "checker" take all failed deployments and execute all three deployment steps again. It could make sense to also implement a sort of backoff, so each time a deployment fails, the window between retries increases.
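The periodic checker with backoff could look roughly like this; a hedged sketch, where the function name and parameters are hypothetical:

```typescript
// Retries an async action, doubling the wait between attempts each time a
// deployment step (download, validate, deploy) fails.
async function retryWithBackoff<T>(
  action: () => Promise<T>,
  maxAttempts: number,
  baseDelayMs: number
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await action();
    } catch (error) {
      lastError = error;
      // Exponential backoff: baseDelay, 2x, 4x, ...
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

Usage would be along the lines of retryWithBackoff(() => deployFailedDeployment(d), 5, 60_000), with the periodic "checker" sweeping all failed deployments.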

[Investigation] What happens when the same entity is deployed in different servers?

Content servers don't require any type of consensus. Each deployment is shared and validated by all servers, and there is no concept similar to "double spending". However, there is a possibility that the same entity could be deployed in two or more different servers at the same time.

This should have no effect other than having the same entity with different originTimestamp and originServerUrl, but we need to make sure that is the case.

[ops] init.sh does not retry cert if failed

During the initial installation, if for some reason init.sh fails to obtain a certificate from LetsEncrypt, it won't retry on the next run.

The workaround is to delete the folder local/certbot

The deployedBy filter in the deployments endpoint is not filtering the deployments properly

See this case:
https://peer.decentraland.org/content/entities/profile?pointer=0xa7c825bb8c2c4d18288af8efe38c8bf75a1aab51

This call returns many deployments (207 at the time when this issue was created):
https://peer.decentraland.org/content/deployments?entityType=profile&pointer=0xa7c825bb8c2c4d18288af8efe38c8bf75a1aab51

However this other call is returning only one deployment:
https://peer.decentraland.org/content/deployments?entityType=profile&deployedBy=0xa7c825bb8c2c4d18288af8efe38c8bf75a1aab51

@nchamo suggested that the problem may be caused by the deployedBy filter, which may take into account the word casing of the filter address.

Stop exposing origin information

We currently support/expose the originServerUrl and originTimestamp because it was needed to keep compatibility with the old/history endpoint. Once we move into a more decentralized approach, we won't know where deployments were originated. The idea is to start removing that information from the /deployments endpoint.

We should:

  • Stop exposing originTimestamp & originServerUrl on the /deployments payload

Make pointers case insensitive

Currently, all pointers are saved as lowercase. We need to modify the code so that when a pointer is set through a query param, the values retrieved are the ones that match the pointer param case-insensitively.
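A trivial sketch of the normalization step, assuming pointers are stored lowercase as described (the function name is illustrative):

```typescript
// Incoming pointer query params are lowered before matching, so lookups
// behave case-insensitively against the lowercase values in the database.
function normalizePointer(pointer: string): string {
  return pointer.toLowerCase();
}

console.log(normalizePointer("0xA7C825Bb8C2c4d18288aF8EfE38c8Bf75a1AaB51"));
// "0xa7c825bb8c2c4d18288af8efe38c8bf75a1aab51"
```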

Rename Peer Library Package to not use the old "katalyst" name

We have managed to rename most uses of Catalyst to be spelled with C instead of K. But the NPM package for the Peer Library is still called "decentraland-katalyst-peer".

Now that we also have the "dcl" organization in NPM, maybe we could create a package called "catalyst-peer" there and start pushing our peer library there. The full package name would then be "@dcl/catalyst-peer".

Serve the explorer from Catalysts

As of today, the only client available is served in play.decentraland.org. It could be a good idea to provide a way for each catalyst to serve a version of the explorer that connects to itself.

So if, for example, you enter https://peer.decentraland.org, you would be using an explorer connected to the catalyst server running under that same URL.

The interesting part, would be to think how updates will be handled. Catalyst updates are a lot less frequent than explorer updates.

This could somehow be related to the efforts in decentraland/explorer#1424

Add support for S3 storage

As of now, content files can only be stored on the local filesystem. We already have an interface ContentStorage, so we would only need to implement it, and add some environment configuration for the S3 bucket to be set.

This change could add a new way for catalyst owners to create an inconsistent state. We may need to add a check during startup to make sure that all files that are supposed to be present on the storage, actually are. If this check fails, then the content server should show the appropriate error message and cancel the startup.

[Discussion] Support non allow-listed catalysts

The situation today

As of today, there is a list of catalyst servers that is allow-listed by the DAO. Content servers sync with other servers on that list, and only those. When a non allow-listed server starts, it will sync with the allow-listed servers, but any deployments made to it will not be picked up by any other servers.

The proposal

We could potentially replace this DAO-maintained list with a more dynamic list of known servers. This would imply:

  1. Allowing servers to register with each other
  2. Make each server expose a list of other known servers

By doing this, any new server could use the allow-list as a starting point to discover all active servers and register with them, therefore becoming a part of the network. Then each server would sync with all servers on the network.

The consequences

This approach would require a lot of changes to how DCL works today, but mainly:

  1. We would need to change the way that the synchronization works:
    A. We would probably want to announce new deployments, instead of asking for them to each known server
    B. We would need to handle the lifecycle of the list of known servers
  2. The explorer relies on the fact that catalyst servers all run the same version; that will no longer be possible to ensure
  3. We need to enable the explorer to easily connect to other servers
  4. Right now malicious servers can simply be removed from the allow-list and no client will connect to them. Maybe we would need to create a way to report servers, as a way to keep all users safe
  5. We would need to have a good communication channel to announce updates, and provide support

[Refactor] Remove hardcoded listeners from the service

In #154, we added the concept of listeners to the content server service. When a deployment is made, listeners are called so that they can execute some action.

We have some hardcoded listeners into the service, for Segment and SQS. It would be best to remove that from the service itself, and let those elements just subscribe to the service.

Add linter and formatter to project

We need to agree on formatting and linter standards for the repository, then configure them and modify all the files to comply with them. This issue was started in #220

For formatting we will be using Prettier.

  1. Configure Prettier according to the standard that the team has decided.
  2. Configure VSCode to automatically format a file on save.
  3. Run prettier over all the files in the repository.
  4. Add Prettier as a requirement when setting up the project as a developer.
  5. [nice to have] Create a pre-commit hook that validates formatting with Prettier over the modified files. This step must have a way to skip it.

Linter

  1. Configure the linter in the repository.
  2. Run the linter and resolve all the issues found.
  3. Add running the linter as part of the pipeline.

[Refactor] Get rid of HistoryManagerImpl

We can now get rid of HistoryManagerImpl.ts and simply use the deployments information to return the necessary data. This would allow us to delete some extra unnecessary code.

Support to fetch more than one profile in /profile lambda

Right now, our /profile lambda takes an ethAddress, fetches the profile from the content server, and sanitizes its name by querying the-graph. This last action is performed to verify whether the current name is actually an NFT or not. A cache was added in #243 to avoid querying the same values more than once.

Anyway, a bigger improvement would be to add support to fetch more than one profile at a time. This would allow the explorer to avoid many requests, and at the same time we could group name queries to the-graph into one

Upgrade to IPFSv2 hashing algorithm

Nowadays the content hashing algorithm is not 1:1 with IPFS; it is an ad-hoc hashing scheme. The hashes are prefixed with Qm and have the same encoding as IPFSv1 hashes, but they are not compatible.

We should support the new IPFSv2 hashes (prefixed with bafy instead of Qm) to enable integration with IPFS, to leverage decentralized and free CDN solutions like Cloudflare IPFS, and to enable other use cases like cheap P2P synchronization between brand new nodes without exhausting bandwidth from specific content servers.

Example link: https://cloudflare-ipfs.com/ipfs/bafybeiemxf5abjwjbikoz4mc3a3dla6ual3jsgpdr4cjr3oz3evfyavhwq/wiki/
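A sketch of the prefix-based detection this change implies; the function name and return labels are illustrative assumptions:

```typescript
// Legacy ad-hoc hashes start with "Qm"; new IPFSv2 hashes start with "bafy".
type HashVersion = "legacy" | "ipfs-v2";

function detectHashVersion(hash: string): HashVersion {
  if (hash.startsWith("Qm")) return "legacy"; // assumed legacy for backward compatibility
  if (hash.startsWith("bafy")) return "ipfs-v2";
  throw new Error(`Unsupported hash format: ${hash}`);
}

console.log(detectHashVersion("QmUNpSvuqLVXfRgSsXXSKeRsixu8SnK7tmEimJo9ncQJTg")); // "legacy"
console.log(detectHashVersion("bafybeiemxf5abjwjbikoz4mc3a3dla6ual3jsgpdr4cjr3oz3evfyavhwq")); // "ipfs-v2"
```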

Caveats

  • The content (as it is deployed today) can’t change. It is already signed, indexed and stored using the ad-hoc hashing algorithm (Qm prefix). To maintain backward compatibility, every hash starting with Qm will be assumed to be legacy.
  • The CLI should also implement this change; it may be behind a flag at the beginning, to progressively test the implementation.
  • The explorer may have hardcoded logic for the Qm.. hashes. An assessment would be required. The same goes for the Builder.
  • Add support for both hashes (Qm and bafy) in catalysts.
  • Add beta support in CLI, behind a flag, for internal testing.
  • Run internal tests in .zone and verify synchronization works correctly.
  • Set up plan for rollout keeping both hashes
  • Define deprecation date for old hashes (about 1/2 month from release)

support npm commands

NPM usage is more widespread than Yarn's; maybe we should support both in development mode?

Use a DB to store the content server state

Currently we're using the same storage layer to store the content data (entities and files) as well as the control data (history, pointers and deployment proofs).
This architecture prevents us from implementing improvements and optimizations that are natural to each category, like remote persistent storage in S3 for the content, and fast querying and access for the control data (which could be achieved by using a DB).
In this task we should create different abstractions for each storage category (content and control) as well as default implementations for them.
The storage should have at least an implementation using the local file system and another one using S3.
The control data should have at least an implementation using a database.

Deciding if we should use a local/embedded database or a remote one is also part of this task.

[monitoring] segment payload size larger than supported

The content server is exhibiting a warning related to tracking of events.

Your message must be < 32kb. This is currently surfaced as a warning to allow clients to update. Versions released after August 1, 2018 will throw an error instead. Please update your code before then. {
 userId: '0xdc13378dafca7fe2306368a16bcfac38c80bfcad',
  event: 'Catalyst Content Upload',
  properties:
   { server: '2b235332-2e10-4d07-9966-b95efb6146ec',
     type: 'scene',
     cid: 'QmUNpSvuqLVXfRgSsXXSKeRsixu8SnK7tmEimJo9ncQJTg',
     pointers:

Modify User-Agent between catalyst requests

Currently, when content servers sync with each other, the user agent is set to node-fetch/1.0 (+https://github.com/bitinn/node-fetch)

It would be better if we were able to modify the user agent header, so that it was clearer that the requests come from catalyst servers, and not some generic node-fetch library.

This will probably require changes to the catalyst-client and catalyst-commons repositories. Ideally, this would be done in a way that, if we want to add more headers to requests in the future, we won't need to change those other 2 repositories.

[Migration] Fix problem with concurrent updates

Before #161, concurrent deployments that had the same pointers could all be left active. A fix was added, but on many catalysts there are 2 or more deployments that are referenced by the same pointer.

We need to make a db migration to find those deployments, and fix the issue, by marking that one overwrote the other.

Peer messaging. Investigate possible implementations. [time-boxed]

In the next quarter we need to provide services that allow our users to chat between them.
The objective of this task is to explore possible implementations for that feature.

Although the current client allows users to chat, it's limited to the peers they have around, the so-called "whispering".
The new feature should allow our users to send private messages to their friends, even when they are offline. Those messages should be stored in a secure way (no one besides the sender and the receiver should be able to see them) and delivered to the destination peer when it gets back online.

As part of this investigation we should take a look at matrix.org

Add versioning to Catalyst

We are currently using the commit SHA with the latest tag to check out Catalyst versions.

We need to use semver, so we could use the same approach used in Catalyst-Client and in Catalyst-Commons which is:

Use oddish github action to:

  • Create a new version with every commit to master branch (with the next tag) and push it to docker.
  • Publish every tag in Docker (this will allow us to test behaviour that is work in progress).
  • When a release is done, then the new version is uploaded to Docker with the corresponding tag and pointing to latest.

fix: enforce 15MB limit of pointers

Right now, the content server has a limit of 15MB per pointer being deployed. In scene terms, it means that a scene can upload up to 15MB per occupied parcel. The problem here is that there is a way to exploit this limitation: the limit is calculated on the files being uploaded, not on all files being used by the scene. So a scene could upload some of its files in one deployment and the rest in another, and thereby circumvent the limit.

This "hack" has been abused by the community and the dcl foundation, so the first step would be to investigate the current state of the world, and figure out which and how many scenes are over said limit.

Then, we will need to make a decision on how to enforce this limit, making sure that already deployed scenes remain valid while new ones fail. This has to be properly discussed and communicated to everyone involved, because updates to already deployed scenes could fail.

A possible fix will probably require the content server to store the file size of each uploaded file in the database, to make future validations quicker.
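To illustrate the intended validation, here is a hedged sketch that sums the sizes of all files used by the entity, not just the newly uploaded ones; the function name and signature are hypothetical:

```typescript
// 15MB per pointer (i.e. per occupied parcel for scenes)
const MAX_SIZE_PER_POINTER = 15 * 1024 * 1024;

// The key point of the fix: the total is computed over ALL files used by the
// entity (including previously uploaded ones), closing the multi-deployment loophole.
function isWithinSizeLimit(allFileSizesInBytes: number[], pointerCount: number): boolean {
  const totalSize = allFileSizesInBytes.reduce((sum, size) => sum + size, 0);
  return totalSize <= MAX_SIZE_PER_POINTER * pointerCount;
}

// A 1-parcel scene using 16MB of files across all deployments exceeds the limit
console.log(isWithinSizeLimit([10 * 1024 * 1024, 6 * 1024 * 1024], 1)); // false
```

Storing each file's size in the database, as suggested above, is what would make computing this total cheap at validation time.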

Perform smarter bootstrap

Right now, when a content server starts, it takes the last known timestamp for each known server and asks for new deployments since that timestamp. This works pretty well for server updates and restarts, but it is not ideal for a brand new server with no deployments.

Since #154, we now have the concept of snapshots. Snapshots are basically a list of all active entities at a given point in time. We could use this information to start by syncing those active entities and continue from there. We could also optionally sync all historic deployments, but in the background. With this approach, brand new servers could be up and ready to go in almost no time!
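A minimal sketch of that bootstrap flow (the interface and function names are hypothetical, not the content server's actual API):

```typescript
// Hypothetical remote-server API: a snapshot lists the active entities at a
// point in time, and deployments can be fetched incrementally afterwards.
interface RemoteServer {
  getSnapshot(): Promise<{ entityIds: string[]; timestamp: number }>
  getDeploymentsSince(timestamp: number): Promise<string[]>
}

// Fast path for a brand new node: deploy only the active entities from the
// snapshot, then return the timestamp to continue incremental syncing from.
async function bootstrap(
  remote: RemoteServer,
  deploy: (id: string) => Promise<void>
): Promise<number> {
  const snapshot = await remote.getSnapshot()
  for (const id of snapshot.entityIds) await deploy(id)
  return snapshot.timestamp
}
```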

Warning: if we go with this approach, it would be more difficult to detect when two servers are out of sync. Today, by looking at the number of known deployments (also called the history size), you can tell whether two servers are synced. This change would make different numbers of known deployments valid. It would be ideal to first build a good monitoring tool that can detect sync issues, regardless of whether we implement this change or not.

Create Comms Testing Infrastructure

We need a framework that allows us to create testing scenarios for the P2P network.
With it, we should be able to define, at least:

  • how many peers to simulate
  • how and when they connect to the network
  • which actions they perform (based on possible real actions for a peer, like chat, walk, etc.)
  • for how long the simulation should run

As the outcome of this simulation, we should be able to retrieve stats from each peer and from the Lighthouse, and collect them in a centralized and durable location. That data should be queryable so we can extract information from it.
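A scenario definition could be as simple as a typed object (a sketch only; every field name here is hypothetical):

```typescript
// Hypothetical scenario shape covering the four requirements listed above.
interface SimulationScenario {
  peerCount: number                        // how many peers to simulate
  joinSchedule: "all-at-once" | "ramp"     // how and when they connect
  actions: Array<"chat" | "walk" | "idle"> // what each peer does
  durationSeconds: number                  // how long the simulation runs
}

// Example scenario: a gradually filling, chatty crowd.
const crowdedPlaza: SimulationScenario = {
  peerCount: 100,
  joinSchedule: "ramp",
  actions: ["walk", "chat"],
  durationSeconds: 300,
}
```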

Problem running content server locally

Hi! I'm trying to run my own local content server. I've run:

$ yarn bazel run content:db

and it seems to be running:

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED              STATUS              PORTS                    NAMES
d7b78c0e1fc3        postgres            "docker-entrypoint.s…"   About a minute ago   Up About a minute   0.0.0.0:5432->5432/tcp   postgres

but the content server can't connect to the database for some reason.

$ /Users/manu/git/catalyst/node_modules/.bin/bazel run content:server
INFO: Invocation ID: e9a9831f-987c-4b0e-ad0d-a57d31054306
WARNING: Download from https://mirror.bazel.build/nodejs.org/dist/v12.18.1/node-v12.18.1-darwin-x64.tar.gz failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException GET returned 404 Not Found
INFO: Analyzed target //content:server (1 packages loaded, 9 targets configured).
INFO: Found 1 target...
Target //content:server up-to-date:
  dist/bin/content/server_loader.js
  dist/bin/content/server.sh
INFO: Elapsed time: 0.425s, Critical Path: 0.03s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action
Failed to connect to the database. Error was Error: connect ECONNREFUSED 127.0.0.1:5432
Failed to connect to the database. Error was Error: connect ECONNREFUSED 127.0.0.1:5432

Any ideas what I might be doing wrong?
