
file-hosting-service's Introduction

File Hosting Service

Introduction

File Hosting Service (FHS) is a marketplace for sharing file data and is part of The Graph Network's World of Data Services.

FHS is a decentralized, peer-to-peer data sharing platform designed for efficient, trust-minimised, payments-enabled file sharing. It leverages a combination of technologies: hash commitments on IPFS for file discovery and verification, chunked data transfer with micropayments to reduce trust requirements between clients and servers, and secure, efficient data transfer over HTTP2. The system is built with scalability, performance, integrity, and security in mind, aiming to create a robust market for file sharing.

Target Audience

This documentation is tailored for individuals with a basic understanding of decentralized technologies, peer-to-peer networks, and cryptographic principles. Whether you are an indexer running various blockchain nodes looking to share and verify your data, an indexer looking to launch a service for a new chain, or simply a user interested in the world of decentralized file sharing, this guide aims to provide a clear and comprehensive understanding of how the File Hosting Service operates.

Features

  • Decentralized File Sharing: FHS uses direct connections for file transfers, eliminating central points of failure.
  • IPFS Integration: Employs IPFS for efficient and reliable file discovery and content verification.
  • SHA2-256 Hashing: Ensures data integrity through robust, incremental cryptographic hashing.
  • HTTP2 and TLS: Leverages modern web protocols for secure and efficient data transfer.

To be supported:

  • Micropayments Support: Implement a system of micropayments to facilitate fair compensation and reduce trust requirements.
  • Scalability and Performance: Designed with a focus on handling large volumes of data and high user traffic.
  • User-Friendly Interface: Intuitive design for easy navigation and operation.

More details can be found in Feature Checklist

Upgrading

The project will follow conventional semantic versioning, as specified here. The server will expose an endpoint for package versioning to ensure correct versions are used during exchanges.

Background Resources

You can find background information on the various components of the exchange:

  1. Cryptography: SHA2-256 Generic guide, Hashed Data Structure slides

  2. Networking: HTTPS with SSL/TLS.

  3. Specifications: IPFS file storage, retrieval, and content addressing.

  4. Blockchain: World of data services, flatfiles for Ethereum, use case.

Documentation

Quickstarts and Configuring

Contributing

We welcome and appreciate your contributions! Please see the Contributor Guide, Code Of Conduct and Security Notes for this repository.

file-hosting-service's People

Contributors

chriswessels, cjorge-graphops, hopeyen


file-hosting-service's Issues

Clear and succinct Presentation

Prepare a presentation to share among Core developers.
Describe protocol at a high level (how it works, requirements, user personas, launch plans), get feedback and ask burning questions

Refactor: Custom service error

  • Generalize errors in subfile_exchange
  • make a SubfileExchangeError enum to replace the current anyhow::Error
  • Document potential causes/fixes in an Errors.md file

Building subfiles

For files to become subfiles we need

  • FileHasher
    • Use SHA2-256 as it is more commonly used and faster than SHA3-256; neither has known practical attacks (and switching should be easy)
    • Read file metadata (check whether metadata is included in a flatfile; use a flatfile decoder?)
    • Chunk files to a fixed size
    • Hash each chunk to build a Merkle tree
    • Construct a chunk_file containing the hashes and metadata, publishable to IPFS
  • Subfile builder / publisher - CLI service create [subfile_name] [file_path]
    • Take a file, use FileHasher to get an IPFS hash for the file source
      • Later, take a list of files, use FileHasher to hash all files and get IPFS hashes
    • Construct a subfile manifest with metainfo using the YAML builder
    • May include a status endpoint for the "canonical publisher" for easy access, though the endpoint may change later on
    • Publish the subfile to IPFS and receive an IPFS hash for the subfile
      • No on-chain activities for now
  • IPFS client
    • Connect to an IPFS gateway
    • Post files
    • Cat files
  • YAML parser and builder
    • Deserialize and serialize yaml files
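The FileHasher flow above (chunk a file, hash each chunk, fold the hashes into a Merkle root) could be sketched roughly as below. This is a minimal illustration, not the actual implementation: it uses std's DefaultHasher as a stand-in for SHA2-256 (the real code would use something like the sha2 crate), and all names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Stand-in for SHA2-256; the real implementation would use the `sha2` crate.
fn hash_chunk(chunk: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(chunk);
    h.finish()
}

// Split a file's bytes into fixed-size chunks and hash each one.
fn chunk_hashes(data: &[u8], chunk_size: usize) -> Vec<u64> {
    data.chunks(chunk_size).map(hash_chunk).collect()
}

// Fold pairs of hashes level by level into a single root (assumes >= 1 chunk).
fn merkle_root(mut level: Vec<u64>) -> u64 {
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| {
                let mut h = DefaultHasher::new();
                for x in pair {
                    h.write_u64(*x);
                }
                h.finish()
            })
            .collect();
    }
    level[0]
}

fn main() {
    let data = vec![0u8; 10_000];
    let hashes = chunk_hashes(&data, 1024); // 10 chunks: 9 full + 1 partial
    assert_eq!(hashes.len(), 10);
    let root = merkle_root(hashes);
    println!("merkle root: {root:x}");
}
```

The chunk hashes plus the root are what the chunk_file would publish to IPFS, so a downloader can verify each chunk independently.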

Feat: Docs update

Slim down the documentation: succinct, up-to-date, and accurate explanations

Feat: Downloader send requests based on missing chunks

Currently, the downloader loops through chunk_files and chunk indices, attempting at most max_retry requests for a single chunk. This is a trivial way to handle failure, but it cannot guarantee a full file download once max_retry is exceeded.

New approach may be

  1. At the start of a download, resolve a map from each file to a Vec of all chunk indices in range. This is partially implemented in 8bd5ddf
  2. When a chunk has been successfully downloaded, remove the chunk index.
  3. Instead of looping through i in 0..(chunk_file.total_bytes / chunk_size + 1), send requests based on the entries in Vec<index>.
  4. A file is considered complete when the Vec is empty. When the Vec is not empty, continuously send requests until there are no available indexer endpoints.
  5. max_retry currently applies to all errors from any request on (file, chunk, indexer_endpoint); after it is exceeded, the indexer_endpoint for the subfile is added to the blocklist and not used again. Since the client only runs with one target subfile at a time, the indexer will not be used for other files. Update this to block an indexer immediately when verification fails, while allowing retries for other errors (likely timeouts).
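Steps 1, 2, and 4 above can be sketched with a plain HashMap; the names are illustrative, not the actual subfile_exchange API.

```rust
use std::collections::HashMap;

// Map each file hash to the chunk indices still missing (step 1).
fn init_missing(files: &[(&str, usize)]) -> HashMap<String, Vec<usize>> {
    files
        .iter()
        .map(|(hash, n)| (hash.to_string(), (0..*n).collect()))
        .collect()
}

// Remove a chunk index once it verifies (step 2); returns true when the
// file's Vec drains and the file is complete (step 4).
fn mark_downloaded(missing: &mut HashMap<String, Vec<usize>>, file: &str, idx: usize) -> bool {
    let complete = match missing.get_mut(file) {
        Some(v) => {
            v.retain(|&i| i != idx);
            v.is_empty()
        }
        None => false,
    };
    if complete {
        missing.remove(file); // nothing left to request for this file
    }
    complete
}

fn main() {
    let mut missing = init_missing(&[("Qm_file_x", 3)]);
    assert!(!mark_downloaded(&mut missing, "Qm_file_x", 0));
    assert!(!mark_downloaded(&mut missing, "Qm_file_x", 2));
    assert!(mark_downloaded(&mut missing, "Qm_file_x", 1)); // last chunk completes the file
    assert!(missing.is_empty());
}
```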

Tracking: PoC checklist

This issue should track all the items needed for a proof of concept. Aim to finish by Dec 1st, basic testing done by Dec 20.

Start with firehose flatfiles; data verifiability is guaranteed by the OVC and the files decoder

More updated details can be found in README.md under the PoC checklist section.

General

Minimal components

  • Standardise subfile.yaml manifest formats (in subfile_manifest.md)
  • Basic indexer selection algorithm done in the client CLI (data availability checked with each indexer endpoint)
  • Draft out GRC
  • Basic documentation, usage guides, and architectural diagrams

Next steps

  • verifiable TAP payments on data chunks
  • verifiable committed blockchain data
  • verifiable staking for computed blockchain data
  • More intelligent indexer selection algorithms - Economics research
  • Subfile data service enabled on the staking contract

Publisher/Provider

We can expect that the provider will use a CLI to interact with the continuous service. They can create a subfile service (as a deployment unit) on-chain and/or start serving the file off-chain. We assume that the file is accessible by the continuous service, and not necessarily by the CLI.

Minimal components

  • CLI: Create a subfile and publish to IPFS
  • CLI: Start serving a particular subfile by IPFS hash
  • Service: Receive request from CLI command at admin API and do corresponding actions
  • Service: host a file availability endpoint (/status)
  • Service: start with free service for a single file (/subfiles/:id)
    • continuously host the subfile
    • require a free_query_auth_token for service
    • take partial download request, verify validity of the request (valid range)
    • grab the chunk by the range and respond
    • support hosting multiple subfiles
  • CLI: Delete subfile from service

Next steps

  • Paid query flow: parse, validate, store, and redeem receipts with TAP
  • cost models: store a db of cost models for each torrent file according to chunk sizes, serve a dedicated route on subfile-service

Client

Clearly state the limitation of our approach: while payment is minimized to one chunk at a time, we require 1-of-n (provider) trust, as there is no guarantee that a subfile can be completely downloaded if no indexer serves the target files.

Minimal components

Assume the client is capable of identifying the correct IPFS hash and maintaining a budget balance on-chain

  • CLI takes in request (ipfs_hash)
    • Indexer selection - This may live somewhere else
      • Start with a static indexer status endpoint, with a free query token
      • Ping indexer endpoints for availability
      • Construct and send requests (must be parallelizable) to indexer endpoints
    • Wait for the responses (for now, assume that the response chunks correspond with the verifiable chunks)
      • Keep track of the downloaded and missing pieces
      • Make multiple attempts to request missing pieces
      • Upon receiving a response, verify the chunk data against the chunk_file
        • If verification fails, blacklist the indexer
      • Once all chunks for a file have been received, verify the file in the subfile (should be vacuously true)
    • Once all files have been received and verified, terminate the client
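The failure handling sketched in the list above (blacklist an indexer on verification failure, retry other errors up to max_retry) might look roughly like this; the types and names are illustrative stand-ins, not the real client's API.

```rust
use std::collections::HashSet;

#[derive(Debug)]
enum ChunkError {
    VerificationFailed, // chunk hash did not match the chunk_file
    Transient,          // e.g. a timeout; worth retrying
}

// Returns true if the same indexer should be retried for this chunk.
fn handle_failure(
    indexer: &str,
    err: ChunkError,
    retries: &mut u32,
    max_retry: u32,
    blacklist: &mut HashSet<String>,
) -> bool {
    match err {
        // Bad data is never worth retrying: blacklist immediately.
        ChunkError::VerificationFailed => {
            blacklist.insert(indexer.to_string());
            false
        }
        // Transient errors get up to max_retry attempts before blocking.
        ChunkError::Transient => {
            *retries += 1;
            if *retries > max_retry {
                blacklist.insert(indexer.to_string());
                false
            } else {
                true
            }
        }
    }
}

fn main() {
    let mut blacklist = HashSet::new();
    let mut retries = 0;
    assert!(handle_failure("indexer_a", ChunkError::Transient, &mut retries, 2, &mut blacklist));
    assert!(!handle_failure("indexer_b", ChunkError::VerificationFailed, &mut retries, 2, &mut blacklist));
    assert!(blacklist.contains("indexer_b"));
}
```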

Next steps

  • client: expose an endpoint for showing download progress
  • CLI: stop download by subfile hash
  • Payment: Read subfile manifest and construct receipts using budget and chunk sizes
  • Payment: build and send TAP receipts

Testing and Documentation

  • Conduct basic testing to ensure functionality, reliability, and security
  • Benchmark performance-sensitive functions
  • Create foundational documentation outlining the usage, architecture, and known limitations.

Goals of MVP

  • Validate the feasibility and utility of the data exchange service.
  • Gather early feedback from users and identify areas for improvement.
  • Identify unforeseen challenges or limitations that arise during development.

Scope outside of MVP

  • Advanced dispute resolution mechanisms: this may live on Horizon
  • Optimal service and price matching algorithms: conduct Economics research on information markets, building on top of this paper
  • Formal verifications on data validation and integrity checks.
  • Detailed and polished user interface.

Post-MVP Developments

Once the MVP is successfully developed, tested, and validated, subsequent iterations would focus on

  • refining the existing features,
  • adding the excluded advanced features,
  • optimizing performance,
  • enhancing security,
  • and improving user experience based on feedback and requirements.

Spike: Generalize storage paths

Currently all the file paths are declared in a local path setting, but the reality is many users are using cloud storage instead of local stores.

While each individual user may specify their relative paths locally, it is important to also allow them to specify a cloud storage path.

Look into ways to support accessing files in both local and cloud storage:

  • Rust crate object_store; readme
  • Go library dstore, written and used by StreamingFast

Identify subfile data types regarding object storage versus file storage

Refactor: Split into smaller crates

Problem statement

Everything sits in one crate at the moment. As functionality and complexity grow, it might make sense to break the single crate down into separate parts.

Expectation proposal

Specify a workspace structure, such as subfile-common, subfile-service, and subfile-cli

Repo checklist

Before open sourcing

  • README.md
  • Has CONTRIBUTORS.md guide
  • Has SECURITY.md
  • Has Pull Request Template
  • Has CODE_OF_CONDUCT.md
  • License (any public work must be licensed Apache 2)
  • CD to automatically build and push any artifacts (Docker Images, Helm Charts, binary blobs, etc)
  • Has automated release/changelog generation
  • Has commit hooks that enforce (via husky)
  • Has been added to admin repo public list if public
  • Has a description and tags
  • The repo has decent documentation for users

Feat: Server TUI

TUI crate choice: crossterm

Allow server operators to manage subfile services in the terminal, similar to managing them through the admin API

Feat: failure mode - Downloader switch indexer endpoint after max_retry

The base unit of failing to download a subfile is failing to download a chunk.

Given an indexer query endpoint and a chunk byte range, the downloader currently retries a configured max_retry number of times before adding the endpoint to a blocklist.

The downloader currently tracks a HashMap of files and the chunk indices yet to be downloaded.

After trying to download from the same indexer multiple times and still having missing pieces, the downloader should use the HashMap of missing pieces and switch to a new indexer endpoint.

Feat: Basic Receipt fee

Problem statement

The previous receipt construction used a placeholder constant of 1 for the fee values. Update it to be more realistic and to satisfy the indexer's price requirements.

Do this after porting the indexer service, such that the indexer serves some type of cost model (not standardised in indexer-rs; it depends on how we set up the price/cost schema).

Expectation proposal

Basic calculations

  1. Estimate a price from the available indexer endpoints: get each indexer's individual price posting from /cost. The simplest definition is for indexer $i$ to return a price per byte $p_i$; build a map of pricing <Indexer, Price>.
  2. When making a chunk query request, include $fee = p_i * chunk\_size$.
  3. On the server side, the fee value should be checked against the posted price; an indexer should not accept values lower than its posted price. (I'm not sure where this check lives in the existing indexer software, or if it is checked at all.)
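The pricing steps above could be sketched as follows; the indexer names, integer units, and map layout are assumptions for illustration, not the real schema.

```rust
use std::collections::HashMap;

// Build the <Indexer, Price> map from each indexer's /cost posting (step 1).
fn price_map(postings: &[(&str, u128)]) -> HashMap<String, u128> {
    postings.iter().map(|(i, p)| (i.to_string(), *p)).collect()
}

// fee = p_i * chunk_size, attached to each chunk request (step 2).
fn chunk_fee(prices: &HashMap<String, u128>, indexer: &str, chunk_size: u128) -> Option<u128> {
    prices.get(indexer).map(|p| p * chunk_size)
}

fn main() {
    let prices = price_map(&[("indexer_1", 2), ("indexer_2", 3)]);
    assert_eq!(chunk_fee(&prices, "indexer_1", 1024), Some(2048));
    assert_eq!(chunk_fee(&prices, "unknown", 1024), None); // no posting, no fee estimate
}
```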

Feat: Client wallet connection and dumb TAP payment

  • To pay for file exchanges, the client must also pass in a mnemonic/private key for connecting to their wallet.
  • The client is in charge of approving GRT spending and Escrow contract deposits
  • The client will construct a TAP receipt for each chunk request if the on-chain setup is correct; otherwise, a free query token is required
  • Refactor the query path to make either paid or free queries

Feat: Service allocation management

Add a CLI for the server for allocation management

  • send allocate tx to open an allocation with some tokens to an IPFS hash
  • send unallocate to close an allocation against the IPFS hash with 0x0 POI
    We are not considering indexing rewards, so always close with 0x0, and no need to consider expiring allocation lifetime.

Feat: GraphQL API service

Problem statement

To align with indexer-service, some routes should use a GraphQL API instead of a RESTful API.

  • graphql query for cost
  • graphql query for status
  • ~~graphql query and mutation for authenticated admin~~

Expectation proposal

  • set up graphql playground
  • provide output types for SubfileManifest, FileMetaInfo, ChunkFile, CostModel, ...
  • add input parameters for specific queries or filtering
  • Query and Mutation objects

Alternative considerations
Easier to do after porting to indexer-rs

Feat: File Discovery and matching across datasets

Indexers serve a /status endpoint that shows the subfile IPFS hashes they host. This is sufficient for matching at the subfile level, but there is no matching for specific files.

It is necessary to allow matching across subfiles for a specific file, so that servers can more freely select which subfile IPFS hashes to serve without affecting actual file availability.

Imagine a server serving $subfile_a = \{file_x, file_y, file_z\}$ and a client requesting $subfile_b = \{file_x\}$. The current check will not match $subfile_a$ with $subfile_b$. We add an additional check (run server-side, client-side, or by a third party) that resolves $subfile_a$ and $subfile_b$ into sets of files for matching.

 // done in the current check_availability
  1. read status from indexer_endpoints for serving_list
  2. if target_subfile is in one of the serving_list, return indexer_endpoint as available

pub fn file_availability(indexer_endpoints, target_subfile) {
  1. resolve target_subfile's vector of FileMetaInfo for all contained chunk files, each represented by a file hash
  2. resolve each subfile in serving_lists to get a vec of FileMetaInfo
  3. for each target file, check if any serving subfile's FileMetaInfo contains it; record the serving indexer_endpoint
  4. if any target file is not served by any indexer's subfile, immediately return unavailability, as the target subfile cannot be completed
  5. return a map of file hash to (serving indexer_endpoint, serving subfile)
}

When the client constructs a range download request, it should construct the request for the corresponding indexer_endpoint, served subfile, and file hash.
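The file_availability pseudocode above boils down to a set-membership check per target file. A minimal sketch, with plain string sets standing in for FileMetaInfo and the endpoint structs:

```rust
use std::collections::{HashMap, HashSet};

// For every file in the target subfile, record which endpoints can serve it.
// Returns None as soon as any target file has no server, since the target
// subfile can then never be completed.
fn file_availability(
    serving: &[(&str, HashSet<&str>)], // (indexer_endpoint, files in its served subfile)
    target_files: &HashSet<&str>,
) -> Option<HashMap<String, Vec<String>>> {
    let mut map = HashMap::new();
    for file in target_files {
        let endpoints: Vec<String> = serving
            .iter()
            .filter(|(_, files)| files.contains(file))
            .map(|(endpoint, _)| endpoint.to_string())
            .collect();
        if endpoints.is_empty() {
            return None; // immediately report unavailability
        }
        map.insert(file.to_string(), endpoints);
    }
    Some(map)
}

fn main() {
    let subfile_a: HashSet<&str> = ["file_x", "file_y", "file_z"].into();
    let subfile_b: HashSet<&str> = ["file_x"].into();
    // subfile_a covers subfile_b even though the subfile hashes differ.
    let result = file_availability(&[("http://indexer_a", subfile_a)], &subfile_b);
    assert!(result.is_some());
}
```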

Future consideration

Consequently, it may make sense to simplify the routing path subfiles/id/:subfile_hash with a header for file_hash down to the path files/id/:file_hash, but this means the server does not have to opt into a specific subfile. Consider whether this makes sense from a server perspective, or add an additional configuration option.

Http services

A subfile server should

  • Initialize service; for one subfile, take (ipfs_hash, local_path)
    • Take a subfile IPFS hash and get the file using IPFS client
    • Parse the YAML file for all the chunk_file hashes using the YAML parser and construct the subfile object
      • Take the metainfo of each chunk_file and check that the file is accessible at the local_path
      • Verify that the local version satisfies the chunk hashes
    • Add the files to the service availability endpoint
  • Upon receiving a service request (ipfs_hash, range, receipt)
    • Check if ipfs_hash is available
    • Check if range is valid against the subfile and the specific chunk_file
    • Validate and store the receipt
    • Read in the requested chunk
    • Construct response and respond (determine if streaming is necessary)
  • Start with a free service that requires a free query auth token

Perf: Limit concurrency for parallel requests

A file may contain a large number of chunks.

To prevent overwhelming system resources, use tokio::sync::Semaphore to limit concurrency. A semaphore maintains a set of permits, and a task must acquire a permit from the semaphore before proceeding.

// Bring the semaphore into scope
use std::sync::Arc;
use tokio::sync::Semaphore;

// Declare the number of permits
let semaphore = Arc::new(Semaphore::new(max_concurrent_tasks));
// Acquire a permit before the task starts (waits if none are available)
let permit = semaphore.clone().acquire_owned().await.expect("Failed to acquire semaphore permit");
// ... perform the chunk request ...
// Release the permit when the task finishes
drop(permit);

Feat: graphQL client

The GraphQL client queries the network subgraph to get allocation IDs and the corresponding indexer and deployment hashes.

  • also read registered indexers
    Optionally, add a client for the Escrow subgraph to check available balances, but everything important should already be handled

Feat: Server token management

All served files currently use the same auth token, and it makes sense to use something more sophisticated. However, this would take attention away from payments, so it is low priority at the moment.

We could potentially add token management that is mutable and specific to subfiles and clients

Segregated admin service

After porting to indexer-rs, the admin endpoint has been temporarily disabled due to indexer-rs trait constraints.

For security, separately set up an admin server with a different port and endpoints to manage bundles and cost models (perhaps also allocations?).

Perf: Benchmark performance-sensitive functions

Basic functions to benchmark

  • read a range from a local file
  • read chunk file in local path
  • generate chunk file from a file
  • verify chunk bytes
  • verify a subfile in local path

using criterion

Deploy on cluster and test workflows

Expectation proposal

Support deployment on our cluster and test free query workflows.
The file server should

  • be plugged into an S3 bucket
  • use the CLI to publish files and bundles and add them to the file server
  • expose the service port
  • set a free query auth token

Test with a separate client:

  • query the status of the file server
  • download some bundles with the free query auth token

Tracking: MVP checklist

This issue should track all the items needed for a minimum viable product.

More updated details can be found in Feature_checklist.md.

General

  • Standardise subfile/chunk_file manifest formats #2
  • Draft out GRC #25
  • Better naming #18

User experience

  • Dataset Discovery on a marketplace
    • File Discovery and matching across datasets #19
      • Search Functionality
    • Dataset Listing from server
  • Matching Algorithm #21
    • User Preference Analysis (price per byte, response rate)
    • Transaction History Utilization

Subfile transfer

  • work with Horizon for data service interfaces
  • Server
    • port into indexer-service framework (should take care of TAP receipt handling)
    • add cost model scheme, allow updates for pricing per byte #20
  • Use generic path to be compatible with cloud storages #15

Subfile Client

  • Verifiable Payment #21
    • Take private key/mnemonic for wallet connections
    • take budget for the overall subfile
      • construct receipts using budget and chunk sizes
      • add receipt to request
  • Parallelize requests #16
  • Multiple connections (HTTPS over HTTP2)
  • Continually requesting missing pieces until the complete file is obtained #22

Testing and Deployment

  • Service custom errors #23
  • Track metrics #24
  • Track code test coverage
  • Deployment Planning
    • Add deployment options (docker + binary)
    • Internal Infrastructure Support
    • Reach out to key users for early feedbacks
  • Documentation and Support
    • Support System Setup (Channel)
    • Tutorials, FAQs

Beyond MVP scope

  • Multiple hashing options/schemes; taking file sizes and the number of files into account, analyze the performance and memory of using a Merkle tree vs a hash list

Standardise subfile.yaml manifest formats

Acceptance criteria:

  • Reference subgraph manifest structures
  • Take torrent file structures into account, create manifest for subfiles
  • Clear documentation
  • Implementation
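As one possible shape for such a manifest, the sketch below shows a subfile.yaml referencing per-file chunk_files plus metainfo. Every field name and value here is hypothetical, offered only as a discussion starting point; the actual format is what this issue should standardise.

```yaml
# Hypothetical subfile.yaml sketch - all fields are illustrative, not final
version: 0.0.1
description: "Ethereum firehose flatfiles, blocks 0-1000"
file_type: flatfiles
chunk_size: 1048576        # bytes per chunk used when hashing
block_range:
  start_block: 0
  end_block: 1000
files:
  - name: "0000000000.dbin"
    chunk_file: "QmChunkFileHashA"   # IPFS hash of this file's chunk_file
  - name: "0000000100.dbin"
    chunk_file: "QmChunkFileHashB"
```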

Test: basic unit & e2e tests

Unit tests

  • File verification by chunks
  • Subfile verification
  • Chunk file generation
  • Publishing

E2E test

  • Basic scenario
    1. Initialize the server and downloader
    2. Server serves a subfile
    3. Client requests a download
    4. Check the download result

Feat: Server cost model

To add price matching in subfile exchanges,

  1. Start with adding a CLI config --price-per-byte to ServerArgs.
    The price is not stored in a database or subfile-specific at the moment. There may be cases where the bytes in one subfile are more valuable than another's; then we can consider more complex management.

  2. Add a /cost endpoint that returns the price per byte for client/third-party price matching

  3. When receiving paid queries, parse for receipts and condition on the receipt value where $\text{price per byte} * \text{bytes range} \leq \text{receipt value}$

  4. Add an admin/cost endpoint for adjusting price_per_byte on the fly, taking a method set_price and a param price_per_byte to replace the previous price.

Additionally, consider whether the indexer framework can take care of cost models
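Step 3's acceptance condition is a one-line check; a minimal sketch, with hypothetical names and integer units for simplicity:

```rust
// Accept a paid range query only if the receipt covers
// price_per_byte * number of bytes requested.
fn accept_receipt(price_per_byte: u128, range: (u64, u64), receipt_value: u128) -> bool {
    let bytes = (range.1 - range.0 + 1) as u128; // inclusive byte range
    price_per_byte * bytes <= receipt_value
}

fn main() {
    assert!(accept_receipt(2, (0, 1023), 2048));  // exactly covers 1024 bytes at 2 per byte
    assert!(!accept_receipt(2, (0, 1023), 2047)); // underpays by 1
}
```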

Feat: Automatic escrow deposits

Problem statement

Without the gateway, we should take the payment UX into careful consideration. There is a cost-availability tradeoff between depositing to multiple indexers versus a minimal number of indexers (just 1, for instance).

Expectation proposal

  1. Fetch cost from each available indexer.
  2. Price range: $$\text{average pricing} = \frac{1}{n}\sum_{indexer_i} p_i = p_{avg}$$
    $$\text{max} = \max_{indexer_i} p_i = p_{max} \space \space \space, \space \space \space \space \text{min} = \min_{indexer_i} p_i = p_{min} $$
  3. Estimate the balance required for the target. Suggest a balance of $B = \text{total bytes} * p_{avg}$; the absolute minimum is $B_{min} = \text{total bytes} * p_{min}$.
  4. Check for available balance - warn if the escrow allowance is not enough ($B > allowance$); do not proceed if $B_{min} > allowance$.
  5. Automatic deposit for the target file/bundle. Take a user config num_download_channels ($n$) as the number of indexers to deposit towards. Order indexers by their cost and select the cheapest $n$ (later, use better indexer selection, such as preference for latency or geo location). For $indexer_i$, deposit $\frac{\text{total bytes}}{n}*p_i$. If a download path becomes unavailable, consider withdrawing tokens and depositing new tokens to other available indexers.

Consider adding e2e testing for deposit and redeem; could involve indexer-agent for redeem, or add automatic redeem call after unallocate
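The budget arithmetic in steps 2-5 can be sketched as follows; prices and byte counts are illustrative, and integer math is used for simplicity.

```rust
// Recommended budget B = total_bytes * p_avg and floor B_min = total_bytes * p_min.
fn budget(total_bytes: u128, prices: &[u128]) -> (u128, u128) {
    let avg = prices.iter().sum::<u128>() / prices.len() as u128;
    let min = *prices.iter().min().unwrap();
    (total_bytes * avg, total_bytes * min)
}

// Deposit (total_bytes / n) * p_i toward each of the n cheapest indexers.
fn deposits(total_bytes: u128, mut prices: Vec<u128>, n: usize) -> Vec<u128> {
    prices.sort(); // cheapest first
    prices
        .iter()
        .take(n)
        .map(|p| total_bytes / n as u128 * p)
        .collect()
}

fn main() {
    let (b, b_min) = budget(1_000, &[2, 4, 6]);
    assert_eq!((b, b_min), (4_000, 2_000)); // p_avg = 4, p_min = 2
    // 500 bytes each toward the two cheapest indexers (prices 2 and 4).
    assert_eq!(deposits(1_000, vec![6, 2, 4], 2), vec![1_000, 2_000]);
}
```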

Feat: Resume download

When a download is stopped midway, relaunching the downloader client should resume the download by chunks

Options:

  • add storage of some metadata (remaining target chunks)
  • chunk hash the target subfile and identify missing chunks

Refactor: update config names

service

  • bundle -> initial_bundles (make it clear the set of bundles served can be managed through the admin endpoint)
  • price_per_byte -> default_pricing (make it clear the pricing is per byte, can be updated through admin, is unit of GRT)

Feat: Metrics for the server

Tracking metrics helps with managing data distribution and file handling by measuring performance and efficiency

  • Response Time: Average response time for file requests.
  • Throughput: Number of requests handled per unit of time (e.g., requests per second).
  • Error Rate: Percentage of requests resulting in errors.
  • Data Transfer Efficiency: Amount of data successfully transferred versus requested.
  • Uptime and Availability: Percentage of time the server is operational and accessible.
  • Request Distribution: Distribution of requests across different files.

Consider client-side satisfaction metrics: perceived latency, download speed, and timeouts.

Feat: deployment specific payment management in admin

Problem statement

When file-service was integrated with the indexer framework, the cost mutation was deleted; we should add it back and allow for better configuration.

Expectation proposal

  • Server tracks a map of manifest hash to prices.
  • Add mutation functions to admin endpoint to update price per byte for a deployment: set_price(deployment, price), remove_price(deployment)
  • Update the query functions such that, if specific manifests are queried (costModel(hash: ...) or costModels(hashes: ...)), they find the specific pricing or use the default fallback. If all manifests are queried (costModels()), only return the ones with specific pricing.

Alternative considerations
Later explore storing the prices for future sessions
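The per-deployment pricing with a default fallback might look roughly like this sketch; the names mirror the proposed set_price/remove_price mutations but are otherwise hypothetical.

```rust
use std::collections::HashMap;

// Map of manifest (deployment) hash to price, with a default fallback.
struct Pricing {
    default_price: u128,
    per_deployment: HashMap<String, u128>,
}

impl Pricing {
    // Admin mutation: set a deployment-specific price.
    fn set_price(&mut self, deployment: &str, price: u128) {
        self.per_deployment.insert(deployment.to_string(), price);
    }
    // Admin mutation: drop the specific price, reverting to the default.
    fn remove_price(&mut self, deployment: &str) {
        self.per_deployment.remove(deployment);
    }
    // Query: specific pricing if set, otherwise the default fallback.
    fn cost_model(&self, deployment: &str) -> u128 {
        *self.per_deployment.get(deployment).unwrap_or(&self.default_price)
    }
}

fn main() {
    let mut pricing = Pricing { default_price: 1, per_deployment: HashMap::new() };
    pricing.set_price("QmDeployment", 5);
    assert_eq!(pricing.cost_model("QmDeployment"), 5);
    pricing.remove_price("QmDeployment");
    assert_eq!(pricing.cost_model("QmDeployment"), 1); // falls back to the default
}
```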

Feat: Server admin API with admin token

Server API on file management

  • Add Subfile at /admin/subfiles/add: Add a subfile to the subfiles hashmap, with parameter Subfile ipfs hash and server accessible path.
  • Delete Subfile at /admin/subfiles/delete/{subfile_id}: Removes a subfile from the subfiles hashmap.
  • Subfile Statistics at /admin/subfiles/stats: Provides statistical data about the subfiles (e.g., query count, size distribution).

Optionally, require an admin token, configured at server start-up

Perf: Client parallelize requests for one file at a time

The client already makes parallel requests for indexer status; make further obvious client-side improvements

  • Optimize the nested loop for calculating chunk ranges and making requests
  • Open the file once, write multiple times
  • Make several requests in parallel

Feat: Direct file level discovery and matching

Problem statement

There may be use cases where users want to request a transfer for a specific file instead of a set of files. While we can wrap a single file to make it a singleton set, it may be easier from the user's perspective to request a single file directly.

Expectation proposal

  • Add a new discovery method matching for a single file
  • hosting method for a single file
  • potentially add metadata fields to file schema

Alternative considerations
Update docs

Tracking: on-chain transactions

Problem statement

On-chain components are missing. The original plan was to wait for protocol v2, but we might as well explore options to make it compatible with v1.

Expectation proposal

Options to be compatible with protocol v1

  • Allocation based on subfiles
  • Allocation based on an IPFS file to an indexer url

Create a Transaction manager entity that

  • Allows the Publisher to deploy an IPFS hash on-chain
    • Server registers its URL
  • Add a CLI for the server for allocation management #36
    • send an allocate transaction against the IPFS hash -> get allocation_id
    • send a close transaction against the IPFS hash with 0x0 POI
  • Service payment collection
    • requires moving the service to the indexer-rs framework #30
    • send a redeem tx to the Escrow contracts (this should be handled by indexer-agent, graphprotocol/indexer#831)
  • Client wallet query payments #21
    • send deposit transactions - identify indexers as receivers
    • Receipt Signer
    • send receipts in query requests (chunking with respect to package sizes and budget)

Renaming

Problem statement

General File Service? Or something a bit more specific, like File Sharing Service?

Expectation proposal

Rename to remove or replace Hosting

Additional context
"Hosting" isn't accurate for this data service

Feat: Downloader progress bar

Add a progress bar TUI for the downloader

Options

  • a single bar showing the number of downloaded files / total files
  • a single bar showing downloaded bytes / total bytes in the subfile
  • multi-bar: one bar per file, showing downloaded bytes / total bytes for that file
  • multi-bar: primary bar for downloaded files / total files, secondary bar for downloaded bytes / total bytes in the file currently downloading

Potential crate: indicatif

Feat: Subfile finder

Problem statement

The subfile client is currently responsible for making discoveries, but as indexer selection algorithms grow, it makes sense to have a separate entity that handles discovery, finding, matching, etc.

Expectation proposal

Refactor the current discovery into its own struct, and have client call the struct for all things related to finding a query endpoint.

Add sufficient testing for discovery to alleviate future manual testing workload

Consumer client

We assume that the consumer runs a CLI where the download should happen, and we leave decisions about what happens after the data has been downloaded to the consumer.

The CLI should handle

  • Request (ipfs_hash, budget) from the chain after reading the subfile manifest
    • Basic client-side indexer-selection without payments
      • Read subfile manifest and construct range queries
      • Ping indexer endpoints for availability of the requested subfile
      • Construct and send requests (may be parallel) to indexer endpoints
  • Wait for the responses (for now, assume that the response chunks correspond with the verifiable chunks)
    • Keep track of the downloaded and missing pieces
    • Attempt to download each chunk multiple times - max_retry
    • Upon receiving a response, verify the chunk data against the chunk_file
      • If verification fails, blacklist the indexer
    • Once all chunks for a file have been received, verify the file in the subfile (vacuously true)
  • Once all files have been received and verified, terminate
  • Notify the client (just some logs)
