
dht_indexer's People

Contributors

0xf333, eikemenzel, ralyodio

dht_indexer's Issues

Restrict the continuous tracking of indexed hashes

Description

After the initial indexing, indexed hashes are periodically updated indefinitely. While this is acceptable initially, we need to consider scenarios where the indexer has thousands of hashes to track. This continuous tracking could strain system resources, particularly on low-spec servers.


Proposal

To keep the indexer efficient and friendly to low-spec servers, I suggest limiting the tracking of indexed hashes as follows:

  • Limiting Tracking Sessions:

    • Limit tracking to 5 sessions per indexed hash. This frees up the tracking module for future batches and keeps the tracking process from overloading the system with excessive updates.
  • Modifying Tracking Module Behavior:

    • Instead of initializing seeder/leecher counts to zero during the first indexing and then updating them in subsequent tracking sessions, I suggest modifying the tracking module to log seeder/leecher counts directly from the initial capture. This streamlines the tracking process and reduces the need for multiple updates.
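
The two points above could be sketched roughly as follows. This is only an illustration under assumptions: `TrackingQueue`, `MAX_SESSIONS`, and the method names are hypothetical and do not come from the current codebase, and an in-memory Map stands in for whatever state the tracking module really keeps.

```javascript
// Hypothetical sketch: none of these names come from the dht_indexer codebase.
const MAX_SESSIONS = 5; // per-hash limit proposed in this ticket

class TrackingQueue {
  constructor(maxSessions = MAX_SESSIONS) {
    this.maxSessions = maxSessions;
    this.entries = new Map(); // infoHash -> { seeders, leechers, sessions }
  }

  // Register a hash with the counts seen during the initial capture,
  // instead of starting both counts at zero.
  add(infoHash, seeders, leechers) {
    if (!this.entries.has(infoHash)) {
      this.entries.set(infoHash, { seeders, leechers, sessions: 0 });
    }
  }

  // Record one tracking session. Returns the updated entry, or null when
  // the hash is not (or no longer) being tracked.
  update(infoHash, seeders, leechers) {
    const entry = this.entries.get(infoHash);
    if (!entry) return null;
    entry.seeders = seeders;
    entry.leechers = leechers;
    entry.sessions += 1;
    const retired = entry.sessions >= this.maxSessions;
    if (retired) this.entries.delete(infoHash); // make room for future batches
    return { ...entry, retired };
  }

  isTracked(infoHash) {
    return this.entries.has(infoHash);
  }
}
```

Once a hash reaches the session limit it simply drops out of the queue, so the last counts returned by `update()` would be the ones left in the database.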

Impact

These changes will optimize the tracking process while keeping the system scalable and resource-efficient, especially under heavy load.

Let's maintain repo stability

Description

Package upgrades sometimes introduce changes, such as alterations to internal functions, that can break how the packages are used in this repo.

Proposal:

Package upgrades should initially be restricted to a separate branch so that changes can be tested in isolation, preserving the integrity of the main branch. If issues emerge during testing, they can be addressed and resolved prior to merging, which maintains stability and reliability.
We should avoid tampering with the main branch. The current method of merging will cause issues in the long run.

Request:

  • Commit ecfd929 should ideally be reverted.
  • A staging branch should be created where package updates are first introduced, tested, and reviewed before merging.

Impact:

This follows repository maintenance best practice and maintains stability and reliability.

feat: Explore dht-infohash-crawler package to extend this indexer

Summary of the current implementation

The current implementation of this indexer dynamically updates the database while concurrently interfacing with the DHT protocol for decentralized discovery.
Given a batch of torrent hashes, it actively listens for peer activity, distinguishes between seeders and leechers in real time, and records this information in a database. It handles duplicates by updating existing records to reflect the latest seeder and leecher counts.

Details of Tracking and Recording:

  • Torrent name/title
  • File names within the torrent folder
  • File sizes in bytes in the torrent folder
  • Active seeders count in real-time
  • Active leechers count in real-time
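
To illustrate the duplicate handling described in the summary, here is a minimal sketch. It is an assumption-laden illustration: the `records` Map is an in-memory stand-in for the real database, and `upsertTorrent` and the record field names are hypothetical, since the actual schema is not shown in this ticket.

```javascript
// In-memory stand-in for the database table (the real schema is not shown here).
const records = new Map(); // infoHash -> record

// Insert a new record, or refresh the counts of an existing one.
// Field names (name, files, seeders, leechers) are assumed for illustration.
function upsertTorrent(record) {
  const existing = records.get(record.infoHash);
  if (existing) {
    // Duplicate hash: keep the stored metadata, update only the live counts.
    existing.seeders = record.seeders;
    existing.leechers = record.leechers;
    return 'updated';
  }
  records.set(record.infoHash, { ...record });
  return 'inserted';
}

// Usage: the second call hits the same infoHash and only refreshes the counts.
upsertTorrent({
  infoHash: 'abc123',
  name: 'example torrent',
  files: [{ name: 'file.bin', sizeBytes: 1024 }],
  seeders: 4,
  leechers: 1,
});
upsertTorrent({ infoHash: 'abc123', seeders: 6, leechers: 0 });
```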

Potential Enhancement Area:

  • Use the dht-infohash-crawler package to enhance the indexer by adding a new torrents_hashes_finder module.

Additional Note:

Improve print logs in this codebase

Description

We have multiple console.log() statements in our codebase for printing information about a discovered infoHash and the collected files. This creates redundancy.

Example of the current format:

console.log('\n-----------------------------------------------\n');
console.log(`Discovered new infoHash:\n---> ${infoHash}\n`);
console.log('------------------------------------------------');
console.log(`${indentation(4)}>>> collected files: ${files.length} <<<`);
console.log('------------------------------------------------');

Desired Improvement

Refactor the print logs to use a single console.log() statement instead of several.

Example of the desired format:

console.log(
    '\n-----------------------------------------------\n' +
    `\nDiscovered new infoHash:\n---> ${infoHash}\n` +
    '\n------------------------------------------------\n' +
    `${indentation(4)}>>> collected files: ${files.length} <<<\n` +
    '------------------------------------------------'
);
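
One possible way to keep the single-call format readable is to build the message from an array of lines and join them. This is only a sketch: `formatDiscoveryLog` is a hypothetical helper, the `indentation` function below is an assumed stand-in for the codebase's own helper (taken here to return that many spaces), and the rule width of 48 dashes is an arbitrary choice.

```javascript
// Assumed stand-in for the codebase's indentation() helper: n spaces.
const indentation = (n) => ' '.repeat(n);

// Hypothetical helper: builds the whole message so it can be printed
// with one console.log() call.
function formatDiscoveryLog(infoHash, fileCount) {
  const rule = '-'.repeat(48); // separator width chosen for illustration
  return [
    '',
    rule,
    '',
    'Discovered new infoHash:',
    `---> ${infoHash}`,
    '',
    rule,
    `${indentation(4)}>>> collected files: ${fileCount} <<<`,
    rule,
  ].join('\n');
}

console.log(formatDiscoveryLog('abc123', 3));
```

Each array element becomes one output line, so blank lines and separators are easy to see and reorder without counting `\n` characters.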

Task Summary/checklist

  • Only work on print logs for this ticket.
  • Do not edit anything else, aside from print logs.
  • Test your changes to make sure that they are working as intended before submitting your pull request.
  • Tag me in your pull request so I can review it.

Follow up and project maintenance

Hi @ralyodio
Just following up on this repo for project maintenance.

Please Run This Test

  1. Make sure that you have the latest commit of the repo.
  2. Run it on a different server to avoid any confusion with any other versions of this CLI app that you might be running at the moment.
  3. Without making any changes or edits, and without replacing the .db file with an old one, let this isolated test run from scratch.

Let It Run For 24 Hours

  • After the specified time, please create new GitHub tickets for any adjustments needed and I will proceed to work on these.

Please Note

  • Keep each ticket specific to one aspect of the CLI app. Rather than covering everything in one ticket, keep issues separate so they are easier to track and resolve.
