Comments (9)

stevenengler commented on June 13, 2024

You're probably already using it (and it's included in simulations generated by tornettools), but for reference, just mentioning the existence of the NumCPUs torrc option. IIRC, setting it to 1 will cause tor to use 2 threads: the main thread and a worker thread. It defaults to a max of maybe 8, depending on the number of available CPUs.
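For reference, the torrc line in question (as tornettools would generate it) looks something like this:

```
## Restrict tor to one CPU: one main thread plus one worker thread.
NumCPUs 1
```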

from shadow.

jtracey commented on June 13, 2024

Right, I think tornettools does this (or I did and forgot), but yes tor.common.torrc has NumCPUs 1. I see 3 TIDs associated with each tor PID, only one of which has any notable CPU time.
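The TID-per-PID observation above is easy to reproduce: on Linux, each thread of a process appears as an entry under /proc/&lt;pid&gt;/task. A minimal sketch:

```python
import os

def thread_count(pid: int) -> int:
    """Count the threads (TIDs) of a process by listing /proc/<pid>/task."""
    return len(os.listdir(f"/proc/{pid}/task"))

# Example: the current Python process has at least its main thread.
print(thread_count(os.getpid()))
```

Running this against each tor PID in a simulation should report the 3 TIDs mentioned above.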

robgjansen commented on June 13, 2024

> When's the last time someone tried to run a 100% Tor network?

I think that was us + Ian :) in Table 2 from our USENIX paper:
100%: 6,489 relays and 792k users

We don't typically run a tor+tgen pair for each of those 792k users; instead, we use tornettools --process_scale=0.01 to create the necessary user load with fewer processes. Thus, I expect we would need on the order of tens of thousands to maybe a hundred thousand processes for a 100% network. Then, if we factor in the threads each process is using, I think we're still below a million?
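As a back-of-envelope sketch (the relay and user counts come from the table above; the two-processes-per-scaled-user split and the three-threads-per-process figure are assumptions based on the NumCPUs discussion in this thread):

```python
relays = 6_489
users = 792_000
process_scale = 0.01  # tornettools --process_scale

# Assumption: each scaled-down user contributes a tor client and a tgen process.
processes = relays + 2 * users * process_scale
# Assumption: ~3 TIDs per process, as observed for tor with NumCPUs 1.
threads = processes * 3

print(int(processes), int(threads))  # tens of thousands; well below a million
```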

I wonder why they chose a hard upper limit for PID_MAX_LIMIT. Systems people don't like limits ;)

robgjansen commented on June 13, 2024

On a more serious note, many people are going to have access to smaller machines, and not that many are going to have access to giant near-supercomputers. So I think designing for the general case is the correct strategy for Shadow. Thus, multi-machine simulation support would be the feature I would prioritize on the Shadow side, and it would have other benefits as well: it may allow people to utilize many small, cheaper machines more effectively.

For those of us wanting to run a crazy number of simulations on one machine, the hypervisor approach could work. I never played around with the type of configuration we want, but it might be worth documenting if we figure out how to do it.

sporksmith commented on June 13, 2024

For the multiple-simulation use-case, I wonder if this limit is actually global or if it's per PID namespace? https://www.man7.org/linux/man-pages/man7/pid_namespaces.7.html

If the latter, then putting each sim in its own PID namespace might at least be a somewhat lighter-weight solution than putting each one in a full VM.
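Either way, the writable knob here is kernel.pid_max, which on stock kernels can only be raised up to the compile-time PID_MAX_LIMIT (4,194,304 on 64-bit). A minimal sketch to read the current setting:

```python
# kernel.pid_max caps PIDs/TIDs; it is writable via sysctl, but only up to
# the compile-time PID_MAX_LIMIT (4,194,304 on 64-bit kernels).
with open("/proc/sys/kernel/pid_max") as f:
    pid_max = int(f.read())
print(pid_max)
```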

jtracey commented on June 13, 2024

> On a more serious note, many people are going to have access to smaller machines, and not that many are going to have access to giant near-supercomputers. So I think designing for the general case is the correct strategy for Shadow. Thus, multi-machine simulation support would be the feature I would prioritize on the Shadow side, and it would have other benefits as well: it may allow people to utilize many small, cheaper machines more effectively.

Agreed, more commonly available setups should definitely be the priority. I was just discussing it with Ian, and he wanted me to make sure this limitation is documented somewhere, since it did ultimately limit the size of experiments we could run in a feasible amount of time. :)

> For the multiple-simulation use-case, I wonder if this limit is actually global or if it's per PID namespace?

That's a good idea. I suspect there will still be some kernel data structure somewhere that won't allow it, but I'll try to test that and see what happens.

stevenengler commented on June 13, 2024

> For the multiple-simulation use-case, I wonder if this limit is actually global or if it's per PID namespace? https://www.man7.org/linux/man-pages/man7/pid_namespaces.7.html

I'm not completely confident in this, but I suspect the limit would not be per-PID-namespace, since processes in a PID namespace still have a different PID in the original namespace. For example, processes running in a container still have their own PID on the host (a different PID in the parent PID namespace), so they likely still count toward the max in the original namespace. But it would still be good to test this.

pid_namespaces(7):

> PID namespaces can be nested: each PID namespace has a parent, except for the initial ("root") PID namespace. The parent of a PID namespace is the PID namespace of the process that created the namespace using clone(2) or unshare(2). PID namespaces thus form a tree, with all namespaces ultimately tracing their ancestry to the root namespace.
>
> [...]
>
> A process has one process ID in each of the layers of the PID namespace hierarchy in which it is visible, walking back through each direct ancestor namespace to the root PID namespace.
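This layering is visible in the NSpid: field of /proc/&lt;pid&gt;/status (available on reasonably recent kernels), which lists a process's PID in each namespace layer. A sketch:

```python
import os

def ns_pids(pid: int) -> list[int]:
    """Return the process's PID in each PID-namespace layer, from the
    NSpid: field of /proc/<pid>/status (reader's namespace first)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("NSpid:"):
                return [int(p) for p in line.split()[1:]]
    return []

# The first entry is the PID as seen from the reader's own namespace; a
# process inside a nested namespace would show additional entries.
print(ns_pids(os.getpid()))
```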

stevenengler commented on June 13, 2024

For reference, this seems to be the kernel's limit:

https://github.com/torvalds/linux/blob/b8481381d4e2549f06812eb6069198144696340c/include/linux/threads.h#L30-L35

Some places in the kernel do questionable addition, so it's probably best not to set it to UINT64_MAX:

https://github.com/torvalds/linux/blob/b8481381d4e2549f06812eb6069198144696340c/kernel/cgroup/pids.c#L38

There's also a vague comment that seems to say you can't raise PID_MAX_LIMIT past FUTEX_TID_MASK, which is 0x3fffffff (1,073,741,823):

https://github.com/torvalds/linux/blob/b8481381d4e2549f06812eb6069198144696340c/include/linux/refcount.h#L42-L43
https://github.com/torvalds/linux/blob/b8481381d4e2549f06812eb6069198144696340c/include/uapi/linux/futex.h#L165

So while it seems like you could raise the kernel's limit by recompiling, you might have much more difficulty going past a limit of ~1 billion processes/threads.
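Putting those two constants side by side (values copied from the linked kernel sources; the headroom arithmetic is my own):

```python
PID_MAX_LIMIT = 4 * 1024 * 1024   # include/linux/threads.h (64-bit): 4,194,304
FUTEX_TID_MASK = 0x3fffffff       # include/uapi/linux/futex.h: 1,073,741,823

# Even a recompiled kernel could only scale PID_MAX_LIMIT by roughly this
# factor before colliding with the futex TID encoding.
print(FUTEX_TID_MASK // PID_MAX_LIMIT)  # → 255
```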

stevenengler commented on June 13, 2024

There's a small comment about this now in docs/system_configuration.md (added in #3350) that tries to make people aware of this limit:

> The kernel has a fixed system-wide limit of 4,194,304 processes/threads. When running extremely large simulations, or when running multiple simulations in parallel, you should be aware of this limit and ensure the total number of processes/threads used by all simulations will not exceed this limit.
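One way to sanity-check headroom before launching simulations is to total up the threads already running, by summing the Threads: field across /proc (a sketch):

```python
import os

def total_threads() -> int:
    """Sum the Threads: count of every process listed in /proc."""
    n = 0
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/status") as f:
                for line in f:
                    if line.startswith("Threads:"):
                        n += int(line.split()[1])
                        break
        except (FileNotFoundError, ProcessLookupError):
            pass  # process exited while we were scanning
    return n

print(total_threads())
```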
