It turned out the scheduler is surprisingly inefficient at loading very large lists. After doing the math, it's clear it needs to be redesigned to allow lists of that size in one go:
On a 64-bit system, even just collecting the pointers to all newlines takes a very large amount of memory:
14_000_000_000 * 8 = 112000000000 # 104.3 GiB
Three important bits we need to keep in mind for feature parity with the current system:
- due to threading we need to be able to process this list at multiple positions at once
- to measure progress, we need to know how many credentials we've processed, but also how many there are in total
- jobs can fail and need to be rescheduled
To support lists that large, we'd have to change the scheduler design:
generator thread
- Open the list of credentials
- Scan the whole file and count newlines
- Seek back to 0
- Start the worker threads
- Fill a size-limited mpsc queue with credentials, blocking on send once the queue is full
- Every time a worker receives from the queue, a send unblocks and the next line can be loaded and inserted into the queue
Memory-wise, this would be one of the most lightweight solutions.
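The generator/worker split above could be sketched with a bounded channel; this is a minimal sketch assuming Rust's std `sync_channel` as the size-limited mpsc queue, with the hypothetical `run` function and worker bookkeeping standing in for the real scheduler:

```rust
use std::io::BufRead;
use std::sync::mpsc::sync_channel;
use std::sync::{Arc, Mutex};
use std::thread;

// Sketch: sync_channel(n) blocks on send once n lines are buffered, so at
// most n + (one per worker) credentials are in memory at any time.
fn run<R: BufRead + Send + 'static>(reader: R, workers: usize) -> usize {
    let (tx, rx) = sync_channel::<String>(256);
    // std's Receiver is single-consumer, so share it behind a mutex
    let rx = Arc::new(Mutex::new(rx));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let rx = rx.clone();
            thread::spawn(move || {
                let mut processed = 0;
                loop {
                    // receiving frees a slot, which unblocks the generator's send
                    let line = rx.lock().unwrap().recv();
                    match line {
                        Ok(_credential) => processed += 1, // parse + test here
                        Err(_) => break, // channel closed: generator is done
                    }
                }
                processed
            })
        })
        .collect();

    // generator: feed lines into the queue, blocking whenever it is full
    for line in reader.lines() {
        tx.send(line.unwrap()).unwrap();
    }
    drop(tx); // closing the channel lets the workers drain and exit

    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Since the queue is bounded, memory usage stays constant regardless of how large the input list is.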
offset + limit
This could be applied to dict-style runs as well:
- Skip the first offset attempts
- Submit the next limit attempts
- Ignore everything else
This would also allow resumption of aborted jobs (assuming the offset has been saved) or distributed tests (especially for dict-style runs).
It would be quirky to use though.
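The skip/submit/ignore behavior maps directly onto iterator adapters; a minimal sketch, with `select_range` as a hypothetical helper name:

```rust
use std::io::BufRead;

// Sketch: yield only the slice [offset, offset + limit) of the attempt
// list, so a resumed or distributed run can pick up a specific range.
fn select_range<R: BufRead>(reader: R, offset: usize, limit: usize) -> Vec<String> {
    reader
        .lines()
        .filter_map(Result::ok)
        .skip(offset) // skip the first `offset` attempts
        .take(limit)  // submit the next `limit` attempts, ignore the rest
        .collect()
}
```

Note that skipping still has to read (and discard) every line before the offset, so resuming deep into a 14-billion-line file is not free.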
zero-copy + chunk assignment
To avoid the overhead that comes from our data structures, we could map the whole file into RAM and then operate on slices. Since we need to process this list in parallel, we could divide the file into chunks of a specific size; each worker then processes its chunk individually, with no synchronization needed until the end of the chunk has been reached.
This still requires enough RAM to load the whole file at once.
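The chunk assignment could look like the sketch below, operating on a plain byte slice as a stand-in for the memory-mapped file; `chunk_bounds` is a hypothetical helper, and the key detail is extending each chunk to a newline so no credential is cut in half:

```rust
// Sketch: split `data` into roughly `chunk_size`-byte chunks, but extend
// each chunk forward to the next newline so every chunk holds whole lines.
// Each worker can then parse its (start, end) range without synchronization.
fn chunk_bounds(data: &[u8], chunk_size: usize) -> Vec<(usize, usize)> {
    let mut bounds = Vec::new();
    let mut start = 0;
    while start < data.len() {
        let mut end = (start + chunk_size).min(data.len());
        // move the cut forward until the previous byte is a newline
        // (or we hit the end of the data)
        while end < data.len() && data[end - 1] != b'\n' {
            end += 1;
        }
        bounds.push((start, end));
        start = end;
    }
    bounds
}
```

Since the chunk boundaries are computed up front, progress and rescheduling can also be tracked per chunk instead of per credential.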
Mutex<Cursor>
We could simply scan the file in the main thread, count the credentials, seek back to 0, and then wrap the file handle in a mutex:
- lock the bufreader
- read an entry
- release the mutex
- parse the credentials and test them
This would introduce the need for an error message in the msg loop, since reading from the file might fail in a non-recoverable way.
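The lock/read/release loop above could be sketched as follows; this assumes a shared `Arc<Mutex<...>>` around the reader, and the hypothetical `process_shared` glosses over the error reporting by unwrapping where the real code would message the msg loop:

```rust
use std::io::BufRead;
use std::sync::{Arc, Mutex};
use std::thread;

// Sketch: every worker locks the shared reader, pulls exactly one entry,
// releases the lock, then parses and tests the credential outside it.
fn process_shared<R: BufRead + Send + 'static>(reader: R, workers: usize) -> usize {
    let reader = Arc::new(Mutex::new(reader));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let reader = reader.clone();
            thread::spawn(move || {
                let mut processed = 0;
                loop {
                    let mut line = String::new();
                    // the guard is a temporary, so the lock is held only
                    // for this single read_line call
                    let n = reader.lock().unwrap().read_line(&mut line).unwrap();
                    if n == 0 {
                        break; // EOF
                    }
                    // parse the credential and test it here, outside the lock
                    processed += 1;
                }
                processed
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

The mutex serializes reads but not the actual credential testing, which is where the time is spent anyway.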
Note that there's also some overhead in the way the threadpool currently works: it allocates some memory for each job we want to run. While this isn't much, keep in mind that even a single byte per credential would add up to 14 GB.
In the end, I'm not sure if tests that large are realistic and how much effort should go into this.