dht_indexer's Issues
Restrict the continuous tracking of indexed hashes
Description
After the initial indexing, indexed hashes are periodically updated indefinitely. While this is acceptable initially, we need to consider scenarios where the indexer has thousands of hashes to track. This continuous tracking could strain system resources, particularly on low-spec servers.
Proposal
To maintain the indexer's efficiency and keep it friendly to low-spec servers, I suggest limiting the tracking of indexed hashes as follows:
Limiting Tracking Sessions:
- Limit tracking to 5 sessions per indexed hash. This frees up the tracking module for future batches and keeps the tracking process from overloading the system with excessive updates.
Modifying Tracking Module Behavior:
- Instead of initializing seeder/leecher counts as zero during the first indexing and then updating them in subsequent tracking sessions, modify the tracking module to record seeder/leecher counts directly from the initial capture. This streamlines the tracking process and reduces the need for multiple updates.
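The session limit proposed above could be sketched roughly as follows. This is only an illustration, not the indexer's actual tracking module: `MAX_TRACKING_SESSIONS`, `shouldTrack`, `recordSession`, and `nextBatch` are hypothetical names, and the counts are kept in memory here purely for brevity.

```javascript
// Hypothetical sketch: cap the number of tracking sessions per indexed hash.
// All names below are illustrative assumptions, not part of dht_indexer.
const MAX_TRACKING_SESSIONS = 5;

// infoHash -> number of tracking sessions already completed
const sessionCounts = new Map();

// Returns true while the hash is still below the session limit.
function shouldTrack(infoHash) {
  return (sessionCounts.get(infoHash) ?? 0) < MAX_TRACKING_SESSIONS;
}

// Records one completed session and returns the updated count.
function recordSession(infoHash) {
  const next = (sessionCounts.get(infoHash) ?? 0) + 1;
  sessionCounts.set(infoHash, next);
  return next;
}

// Keeps only the hashes that are still eligible for the next tracking batch.
function nextBatch(infoHashes) {
  return infoHashes.filter(shouldTrack);
}
```

Once a hash reaches the limit, `nextBatch` simply stops returning it, which leaves room for newer hashes. In a real deployment the counter would presumably live alongside the record in the database so it survives restarts.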
Impact
These changes will optimize the tracking process while keeping the system scalable and resource-efficient, especially under heavy load.
Let's maintain repo stability
Description
Package upgrades sometimes come with changes that could potentially break their implementation in the repo, due to factors such as alterations in some internal functions.
Proposal:
Package upgrades should initially be restricted to a separate branch so that changes can be tested in isolation, preserving the integrity of the main branch. If issues emerge during testing, they can be addressed and resolved before merging, which maintains stability/reliability.
We should avoid tampering with the main branch. The current method of merging will cause issues in the long run.
Request:
- Commit ecfd929 should ideally be reverted.
- A staging branch should be created where package updates are first introduced and tested/reviewed before merging.
Impact:
This follows repository maintenance best practice and preserves stability/reliability.
chore: Synergize hash finder module with indexing process
Description
Please refer to PR #2 for more context
feat: Explore dht-infohash-crawler package to extend this indexer
Summary of the current implementation
The current implementation of this indexer dynamically updates the database while concurrently interfacing with the DHT protocol for decentralized discovery.
So, given a batch of torrent hashes, it actively listens for peer activity, distinguishes between seeders and leechers in real time, and records this info in a database. It also handles duplicates by updating existing records to reflect the latest seeders and leechers count.
Details of Tracking and Recording:
- Torrent name/title
- File names within the torrent folder
- File sizes in bytes in the torrent folder
- Active seeders count in real-time
- Active leechers count in real-time
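The duplicate handling described in the summary can be illustrated with a small sketch. It assumes an in-memory `Map` as a stand-in for the database, and `upsertTorrent` and the record field names are hypothetical, chosen only to mirror the details listed above.

```javascript
// In-memory stand-in for the database table (assumption for illustration only).
const records = new Map();

// Inserts a new record, or refreshes the counts of an existing one.
// Field names are hypothetical and mirror the tracked details listed above.
function upsertTorrent({ infoHash, name, files, seeders, leechers }) {
  const existing = records.get(infoHash);
  if (existing) {
    // Duplicate hash: keep the stored metadata, update only the live counts.
    existing.seeders = seeders;
    existing.leechers = leechers;
    return existing;
  }
  const record = { infoHash, name, files, seeders, leechers };
  records.set(infoHash, record);
  return record;
}
```

Calling `upsertTorrent` twice with the same `infoHash` leaves a single record whose seeder and leecher counts reflect the most recent call.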
Potential Enhancement Area:
- Implementation of the dht-infohash-crawler package to enhance the indexer via the addition of the new torrents_hashes_finder module.
Additional Note:
- For prior suggestions and alternative solutions, please visit the following link:
https://github.com/0xf333/dht_indexer/blob/main/example/readme.md
Improve print logs in this codebase
Description
We have multiple console.log() statements in our codebase for printing information about a discovered infoHash and the collected files. This creates redundancy.
Example of the current format:
console.log('\n-----------------------------------------------\n');
console.log(`Discovered new infoHash:\n---> ${infoHash}\n`);
console.log('------------------------------------------------');
console.log(`${indentation(4)}>>> collected files: ${files.length} <<<`);
console.log('------------------------------------------------');
Desired Improvement
Refactor the print logs to use a single console.log() statement instead of multiple ones.
Example of the desired format:
console.log(
  '\n-----------------------------------------------\n' +
  `\nDiscovered new infoHash:\n---> ${infoHash}\n` +
  '\n------------------------------------------------\n' +
  `${indentation(4)}>>> collected files: ${files.length} <<<\n` +
  '------------------------------------------------'
);
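One possible way to keep the single-call format readable is to build the message in a small helper and join the lines. The sketch below is only a suggestion: `formatDiscoveryLog` and `SEPARATOR` are hypothetical names, the local `indentation` is a stand-in that assumes the codebase's helper returns `n` spaces, and the separator width is fixed at 48 characters for simplicity.

```javascript
// Stand-in for the codebase's indentation() helper (assumed to return n spaces).
const indentation = (n) => ' '.repeat(n);

// Fixed-width separator line (the width is an assumption).
const SEPARATOR = '-'.repeat(48);

// Builds the whole report as one string so it can be printed with a single call.
function formatDiscoveryLog(infoHash, files) {
  return [
    '',
    SEPARATOR,
    '',
    'Discovered new infoHash:',
    `---> ${infoHash}`,
    '',
    SEPARATOR,
    `${indentation(4)}>>> collected files: ${files.length} <<<`,
    SEPARATOR,
  ].join('\n');
}

// Usage: one console.log() call per discovered infoHash.
console.log(formatDiscoveryLog('0123abcd', ['a.txt', 'b.txt']));
```

Keeping the formatting in one function also means any later change to the layout happens in a single place.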
Task Summary/checklist
- Only work on print logs for this ticket.
- Do not edit anything else, aside from print logs.
- Test your changes to make sure that they are working as intended before submitting your pull request.
- Tag me in your pull request so I can review it.
what ports need to be opened for indexer to work?
It only crawls the 3 from the CSV. Nothing more is indexed.
add web app
https://github.com/ralyodio/dhtbuzz-web
This will use dht_indexer on the server to populate the sqlite3 db. Ideally, I want to switch over to SurrealDB for this project because that's what our auth uses.
Follow up and project maintenance
Hi @ralyodio
Just following up on this repo for project maintenance.
Please Run This Test
- Make sure that you have the latest commit of the repo.
- Run it on a different server to avoid any confusion with any other versions of this CLI app that you might be running at the moment.
- Without making any changes or edits, and without replacing the .db file with an old one, let this isolated test run from scratch.
Let It Run For 24 Hours
- After the specified time, please create new GitHub tickets for any adjustments needed and I will proceed to work on these.
Please Note
- Keep each ticket specific to one thing regarding the CLI app. Instead of encompassing everything in one ticket, please maintain separation for easier tracking and resolving.