Comments (9)

stevenengler commented on June 13, 2024

You're probably already using it (and it's included in simulations generated by tornettools), but for reference, just mentioning the existence of the NumCPUs torrc option. IIRC, setting it to 1 will cause tor to use 2 threads: the main thread and a worker thread. It defaults to a max of maybe 8, depending on the number of available CPUs.
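For reference, the torrc line in question (as tornettools would generate it) looks something like this:

```
## Restrict tor to one CPU: one main thread plus one worker thread.
NumCPUs 1
```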

from shadow.

jtracey commented on June 13, 2024

Right, I think tornettools does this (or I did and forgot), but yes tor.common.torrc has NumCPUs 1. I see 3 TIDs associated with each tor PID, only one of which has any notable CPU time.
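The TID-per-PID observation above is easy to reproduce: on Linux, each thread of a process appears as an entry under /proc/&lt;pid&gt;/task. A minimal sketch:

```python
import os

def thread_count(pid: int) -> int:
    """Count the threads (TIDs) of a process by listing /proc/<pid>/task."""
    return len(os.listdir(f"/proc/{pid}/task"))

# Example: the current Python process has at least its main thread.
print(thread_count(os.getpid()))
```

Running this against each tor PID in a simulation should report the 3 TIDs mentioned above.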

robgjansen commented on June 13, 2024

> When's the last time someone tried to run a 100% Tor network?

I think that was us + Ian :) in Table 2 from our USENIX paper:
100%: 6,489 relays and 792k users

We don't typically run a tor+tgen pair for each of those 792k users; instead, we use tornettools --process_scale=0.01 to create the necessary user load with fewer processes. Thus, I expect we would need on the order of tens of thousands to maybe a hundred thousand processes for a 100% network. Then, if we factor in the threads each process is using, I think we're still below a million?
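As a back-of-envelope sketch (the relay and user counts come from the table above; the two-processes-per-scaled-user split and the three-threads-per-process figure are assumptions based on the NumCPUs discussion in this thread):

```python
relays = 6_489
users = 792_000
process_scale = 0.01  # tornettools --process_scale

# Assumption: each scaled-down user contributes a tor client and a tgen process.
processes = relays + 2 * users * process_scale
# Assumption: ~3 TIDs per process, as observed for tor with NumCPUs 1.
threads = processes * 3

print(int(processes), int(threads))  # tens of thousands; well below a million
```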

I wonder why they chose a hard upper limit for PID_MAX_LIMIT. Systems people don't like limits ;)

robgjansen commented on June 13, 2024

On a more serious note, many people are going to have access to smaller machines, and not that many are going to have access to giant near-supercomputers. So I think designing for the general case is the correct strategy for Shadow. Thus, multi-machine simulation support would be the feature I would prioritize on the Shadow side, and it would have other benefits as well: it may allow people to utilize many small, cheaper machines more effectively.

For those of us wanting to run a crazy number of simulations on one machine, the hypervisor approach could work. I never played around with the type of configuration we want, but it might be worth documenting if we figure out how to do it.

sporksmith commented on June 13, 2024

For the multiple-simulation use-case, I wonder if this limit is actually global or if it's per PID namespace? https://www.man7.org/linux/man-pages/man7/pid_namespaces.7.html

If the latter, then putting each sim in its own PID namespace might at least be a somewhat lighter-weight solution than putting each one in a full VM.
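Either way, the writable knob here is kernel.pid_max, which on stock kernels can only be raised up to the compile-time PID_MAX_LIMIT (4,194,304 on 64-bit). A minimal sketch to read the current setting:

```python
# kernel.pid_max caps PIDs/TIDs; it is writable via sysctl, but only up to
# the compile-time PID_MAX_LIMIT (4,194,304 on 64-bit kernels).
with open("/proc/sys/kernel/pid_max") as f:
    pid_max = int(f.read())
print(pid_max)
```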

jtracey commented on June 13, 2024

> On a more serious note, many people are going to have access to smaller machines, and not that many are going to have access to giant near-supercomputers. So I think designing for the general case is the correct strategy for Shadow. Thus, multi-machine simulation support would be the feature I would prioritize on the Shadow side, and it would have other benefits as well: it may allow people to utilize many small, cheaper machines more effectively.

Agreed, more commonly available setups should definitely be the priority. I was just discussing it with Ian, and he wanted me to make sure this limitation is documented somewhere, since it did ultimately limit the size of experiments we could run in a feasible amount of time. :)

> For the multiple-simulation use-case, I wonder if this limit is actually global or if it's per PID namespace?

That's a good idea. I suspect there will still be some kernel data structure somewhere that won't allow it, but I'll try to test that and see what happens.

stevenengler commented on June 13, 2024

> For the multiple-simulation use-case, I wonder if this limit is actually global or if it's per PID namespace? https://www.man7.org/linux/man-pages/man7/pid_namespaces.7.html

I'm not completely confident in this, but I suspect the limit would not be per-PID-namespace, since processes in a PID namespace still have a different PID in the original namespace. For example, processes running in a container still have their own PID on the host (a different PID in the parent PID namespace), so they likely still count toward the max in the original namespace. But it would still be good to test this.

pid_namespaces(7):

> PID namespaces can be nested: each PID namespace has a parent, except for the initial ("root") PID namespace. The parent of a PID namespace is the PID namespace of the process that created the namespace using clone(2) or unshare(2). PID namespaces thus form a tree, with all namespaces ultimately tracing their ancestry to the root namespace.
>
> [...]
>
> A process has one process ID in each of the layers of the PID namespace hierarchy in which it is visible, walking back through each direct ancestor namespace to the root PID namespace.
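This layering is visible in the NSpid: field of /proc/&lt;pid&gt;/status (available on reasonably recent kernels), which lists a process's PID in each namespace layer. A sketch:

```python
import os

def ns_pids(pid: int) -> list[int]:
    """Return the process's PID in each PID-namespace layer, from the
    NSpid: field of /proc/<pid>/status (reader's namespace first)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("NSpid:"):
                return [int(p) for p in line.split()[1:]]
    return []

# The first entry is the PID as seen from the reader's own namespace; a
# process inside a nested namespace would show additional entries.
print(ns_pids(os.getpid()))
```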

stevenengler commented on June 13, 2024

For reference, this seems to be the kernel's limit:

https://github.com/torvalds/linux/blob/b8481381d4e2549f06812eb6069198144696340c/include/linux/threads.h#L30-L35

Some places in the kernel do questionable addition, so it's probably best not to set it to UINT64_MAX:

https://github.com/torvalds/linux/blob/b8481381d4e2549f06812eb6069198144696340c/kernel/cgroup/pids.c#L38

There's also a vague comment that seems to say you can't raise PID_MAX_LIMIT past FUTEX_TID_MASK, which is 0x3fffffff (1,073,741,823):

https://github.com/torvalds/linux/blob/b8481381d4e2549f06812eb6069198144696340c/include/linux/refcount.h#L42-L43
https://github.com/torvalds/linux/blob/b8481381d4e2549f06812eb6069198144696340c/include/uapi/linux/futex.h#L165

So while it seems like you could raise the kernel's limit by recompiling, you might have much more difficulty going past a limit of ~1 billion processes/threads.
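Putting those two constants side by side (values copied from the linked kernel sources; the headroom arithmetic is my own):

```python
PID_MAX_LIMIT = 4 * 1024 * 1024   # include/linux/threads.h (64-bit): 4,194,304
FUTEX_TID_MASK = 0x3fffffff       # include/uapi/linux/futex.h: 1,073,741,823

# Even a recompiled kernel could only scale PID_MAX_LIMIT by roughly this
# factor before colliding with the futex TID encoding.
print(FUTEX_TID_MASK // PID_MAX_LIMIT)  # → 255
```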

stevenengler commented on June 13, 2024

There's a small comment about this now in docs/system_configuration.md (added in #3350) that tries to make people aware of this limit:

> The kernel has a fixed system-wide limit of 4,194,304 processes/threads. When running extremely large simulations, or when running multiple simulations in parallel, you should be aware of this limit and ensure the total number of processes/threads used by all simulations will not exceed this limit.
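One way to sanity-check headroom before launching simulations is to total up the threads already running, by summing the Threads: field across /proc (a sketch):

```python
import os

def total_threads() -> int:
    """Sum the Threads: count of every process listed in /proc."""
    n = 0
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/status") as f:
                for line in f:
                    if line.startswith("Threads:"):
                        n += int(line.split()[1])
                        break
        except (FileNotFoundError, ProcessLookupError):
            pass  # process exited while we were scanning
    return n

print(total_threads())
```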
