
Comments (11)

mgdigital commented on June 8, 2024

Currently on the FAQ page we have:

You should allow roughly 50GB of disk space per 10 million torrents, which should suffice for several months of crawling, however there is no upper limit to how many torrents might ultimately be crawled.

I agree better documentation on this would be good, but at the moment things are changing at a rapid pace in ways that will affect disk space usage, and we're only just getting to the stage where people have had it running long enough to get better numbers for the current implementation. The next thing will be rule-based workflows that can auto-delete and do other things that will affect this, so maybe we should come back to this in a few months and aim to write better docs once things are more stable?


Nicolaj-H commented on June 8, 2024

One important thing to note is that the size is heavily influenced by the amount of file data you store. Example: if you store 10,000 torrents and only 1 filename for each torrent, that's an additional 10,000 records, for a total of 20,000. Now change this to 100 filenames per torrent and your database has 1,010,000 records instead of 20,000. Of course, these database size stats could be listed assuming the default configured file info size.

That is correct, but an estimate with the default settings would suffice to start with, and it would give you an idea of what hardware you would need to at least test and play around with it.


mgdigital commented on June 8, 2024

I've checked just now and am on 67GB for 13.5 million torrents. A couple of things to bear in mind:

  • Saving content data is one reason more space is used at the start; once most of the popular stuff from TMDB is stored locally, this should level off
  • Recent changes to how the files threshold works mean more disk space is used by default in the latest version
  • Are we measuring the same way? I'm doing select pg_size_pretty(pg_database_size('bitmagnet'))
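For comparing like with like, it can also help to see which tables the space actually goes to. Here is a minimal sketch using standard Postgres catalog views; it is not bitmagnet-specific and should work on any instance:

-- Ten largest tables, including their indexes and TOAST data
select relname,
       pg_size_pretty(pg_total_relation_size(relid)) as total_size
from pg_statio_user_tables
order by pg_total_relation_size(relid) desc
limit 10;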


leofidus commented on June 8, 2024

To add another data point, I have 9 228 000 torrents with a total of 291 283 000 files, stored in 145 GB, using the config option DHT_CRAWLER_SAVE_FILES_THRESHOLD=500000 (to ensure file information is stored even on excessively large torrents; the default cutoff stores at most 100 files per torrent).

This means I have 31 files per torrent on average, over twice what kde99 got above. The largest torrent in my database contains 10870 files. 4.5% of torrents exceed the default DHT_CRAWLER_SAVE_FILES_THRESHOLD of 100 files.

Average size per torrent is correspondingly a bit larger at 16KB/torrent, or 535 bytes per file.
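For anyone wanting to reproduce these per-torrent numbers, here is a sketch against the torrents and torrent_files tables queried elsewhere in this thread. Note the info_hash join key is an assumption about the schema, not something verified here:

-- Average number of file records per torrent
select (select count(*) from torrent_files)::numeric
     / (select count(*) from torrents) as avg_files_per_torrent;

-- Share of torrents exceeding the default 100-file threshold
-- (assumes torrent_files has an info_hash column grouping files by torrent)
select 100.0 * count(*) / (select count(*) from torrents) as pct_over_100
from (select info_hash
      from torrent_files
      group by info_hash
      having count(*) > 100) big;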

I agree that disk throughput is a much bigger factor. If you are using cheap consumer SSDs, you also really feel the wear Bitmagnet puts on the disk. If I'm interpreting my disk stats correctly, Bitmagnet has written a total of about 180 TB in service of creating this 145 GB database.


Nicolaj-H commented on June 8, 2024
  • I have checked the existing issues to avoid duplicates
  • I have redacted any info hashes and content metadata from any logs or screenshots attached to this issue

Is your feature request related to a problem? Please describe

The documentation should mention the expected disk space requirements when the DHT crawler is enabled, relative to the number of torrents indexed, since this is by far the most demanding system requirement and is dictated by factors outside the user's control (the total size of the BitTorrent DHT... is there an estimate for this somewhere?).

Describe the solution you'd like

Document on https://bitmagnet.io/faq.html a few examples of DB sizes relative to the number of torrents, for example:

Disk space used by the database depends on the total number of indexed torrents and will tend to grow indefinitely. For example, these are the requirements you should expect for:

  • 50k indexed torrents: ???GB
  • 143k indexed torrents: 2.5GB
  • 1 million indexed torrents: ???GB

The value of 2.5GB for 143k torrents is from a measurement on my test instance. This puts the average size of a torrent at ~18KB. It would be interesting to see the numbers from instances with a lower/higher number of indexed torrents, and use that as an estimate.
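The average can be computed in one query, combining the pg_size_pretty/pg_database_size measurement used elsewhere in this thread with a count of the torrents table (integer division, so the result is rounded down):

select pg_size_pretty(
         pg_database_size('bitmagnet')
         / (select count(*) from torrents)
       ) as avg_per_torrent;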

I will report a separate issue about a potential setting to hard-limit the DB size.

Describe alternatives you've considered

Documenting the expected disk space requirements related to total run time, since the number of indexed torrents depends on the total time spent crawling.

Additional context

Somewhat related to #70 which would help keep the database size in check.

At 884K indexed my Postgres data/base directory is at 6.7G


DyonR commented on June 8, 2024

One important thing to note is that the size is heavily influenced by the amount of file data you store.
Example:
If you store 10,000 torrents and only 1 filename for each torrent, that's an additional 10,000 records, for a total of 20,000. Now change this to 100 filenames per torrent and your database has 1,010,000 records instead of 20,000.
Of course, these database size stats could be listed assuming the default configured file info size.


nodiscc commented on June 8, 2024

At 884K indexed my Postgres data/base directory is at 6.7G

This puts the average torrent size at 6.7×1024×1024÷884,000 ≈ 7.9 KiB vs my estimated 18 KiB.

that the size is heavily influenced by the amount of file data you store

Yes, hence the need to use averages, which are useful for estimation. People with a higher number of indexed torrents should have averages closer to reality (less bias). It would be interesting to compare between databases with about the same number of torrents.

there is no upper limit to how many torrents might ultimately be crawled.

That was my guess, hence #187

https://bitmagnet.io/faq.html#what-are-the-system-requirements-for-bitmagnet

I did not see this section; it was right under my nose, /facepalm. However:

roughly 50GB of disk space per 10 million torrents

This puts the average torrent size at 5.2KiB... why so much difference between our 3 measurements? I think more samples are needed.


kde99 commented on June 8, 2024

For me:

  • Total size 4145 MB
  • 528 570 torrents
  • 7 086 417 files

Meaning:

  • 8 KB average per torrent
  • 13.4 files per torrent

Though I feel that disk I/O throughput is more of a limiting factor than disk size when you use HDDs. I had a much bigger DB and it was struggling to keep up with writes.

bitmagnet=# select pg_size_pretty(pg_database_size('bitmagnet'));
 pg_size_pretty
----------------
 4145 MB
(1 row)

bitmagnet=# select count(*) from torrents;
 count
--------
 528570
(1 row)

bitmagnet=# select count(*) from torrent_files;
  count
---------
 7086417
(1 row)


Aaron2550 commented on June 8, 2024

I'm at 78 GB for 7,059,136 torrents


nodiscc commented on June 8, 2024

more space is used at the start - once most of the popular stuff from TMDB is stored locally this should level off

I did not think about that; there is some database space used for TMDB data.

Are we measuring the same way?

I was relying on Netdata's PostgreSQL DB size monitoring, but it's consistent with the results I get from select pg_size_pretty(pg_database_size('bitmagnet')).

Thanks everyone for the metrics, I will start a table below and update it every time someone posts their db stats. After a while it could be added to the documentation, hopefully.

number of torrents | db size (GB) | average per torrent (KB) | notes
143 000            | 2.5          | 17                       |
528 000            | 4.1          | 7.8                      |
884 000            | 6.7          | 7.6                      |
7 059 136          | 78           | 11                       |
9 228 000          | 145          | 16                       | DHT_CRAWLER_SAVE_FILES_THRESHOLD=500000
13 500 000         | 67           | 5                        |
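To contribute a comparable row, here is a single-query sketch combining the measurements already used in this thread (it assumes the database is named bitmagnet and has the torrents table shown in the psql session above):

select (select count(*) from torrents) as torrents,
       pg_size_pretty(pg_database_size('bitmagnet')) as db_size,
       pg_size_pretty(pg_database_size('bitmagnet')
                      / (select count(*) from torrents)) as avg_per_torrent;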

