Comments (9)
You're probably already using it (and it's included in simulations generated by tornettools), but for reference just mentioning the existence of the NumCPUs torrc option. IIRC setting it to 1 will cause tor to use 2 threads: the main thread and a worker thread. It defaults to a max of maybe 8, depending on the number of available CPUs.
from shadow.
Right, I think tornettools does this (or I did and forgot), but yes, tor.common.torrc has NumCPUs 1. I see 3 TIDs associated with each tor PID, only one of which has any notable CPU time.
from shadow.
When's the last time someone tried to run a 100% Tor network?
I think that was us + Ian :) in Table 2 from our USENIX paper:
100%: 6,489 relays and 792k users
We don't typically run a tor+tgen for each of those 792k users; we instead use tornettools --process_scale=0.01 to create the necessary user load with fewer processes. Thus, I expect we would need on the order of tens of thousands to maybe a hundred thousand processes for a 100% network. Then if we factor in the threads each process is using, I think we're still below a million?
I wonder why they chose a hard upper limit for PID_MAX_LIMIT. Systems people don't like limits ;)
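A rough back-of-the-envelope version of that estimate, as a sketch only. The relay count, user count, process scale, and the 3 TIDs per tor process come from this thread; everything else is an assumption made for illustration: that the number of client tor processes is roughly users × process_scale, that each client host runs one tor plus one tgen process, and that tgen counts as a single thread. Servers and any other helper processes are ignored, so treat the result as a lower bound.

```python
# Figures quoted in this thread.
RELAYS = 6_489
USERS = 792_000
PROCESS_SCALE = 0.01      # value passed to tornettools --process_scale
THREADS_PER_TOR = 3       # TIDs observed per tor PID with NumCPUs 1


def estimate(relays, users, process_scale, threads_per_tor, tgen_threads=1):
    """Estimate (processes, threads) for a full-scale network.

    Assumptions (not from the thread): one tor + one tgen process per
    modeled client, and tgen_threads threads per tgen process.
    """
    client_tors = round(users * process_scale)
    tor_procs = relays + client_tors
    tgen_procs = client_tors
    processes = tor_procs + tgen_procs
    threads = tor_procs * threads_per_tor + tgen_procs * tgen_threads
    return processes, threads


processes, threads = estimate(RELAYS, USERS, PROCESS_SCALE, THREADS_PER_TOR)
print(f"processes: {processes:,}  threads: {threads:,}")
```

Under these assumptions the totals land in the tens of thousands, which is consistent with the "still below a million" guess above.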
from shadow.
On a more serious note, many people are going to have access to smaller machines and not that many people are going to have access to giant near-super computers. So I think designing for the general case is the correct strategy for Shadow. Thus, multi-machine simulation support would be the feature I would support on the Shadow side, and it would have other benefits as well. It may allow people to utilize many small cheaper machines more effectively.
For those of us wanting to run a crazy number of simulations on one machine, the hypervisor approach could work. I never played around with the type of configuration we want, but it might be worth documenting if we figure out how to do it.
from shadow.
For the multiple-simulation use-case, I wonder if this limit is actually global or if it's per PID namespace? https://www.man7.org/linux/man-pages/man7/pid_namespaces.7.html
If the latter, then maybe putting each sim in its own pid namespace would at least be a somewhat lighter weight solution than putting them each in a full VM
from shadow.
> On a more serious note, many people are going to have access to smaller machines and not that many people are going to have access to giant near-super computers. So I think designing for the general case is the correct strategy for Shadow. Thus, multi-machine simulation support would be the feature I would support on the Shadow side, and it would have other benefits as well. It may allow people to utilize many small cheaper machines more effectively.
Agreed, more commonly available setups should definitely be the priority. I was just discussing it with Ian, and he wanted me to make sure this limitation is documented somewhere, since it did ultimately limit the size of experiments we could run in a feasible amount of time. :)
> For the multiple-simulation use-case, I wonder if this limit is actually global or if it's per PID namespace?
That's a good idea. I suspect there will still be some kernel data structure somewhere that won't allow it, but I'll try to test that and see what happens.
from shadow.
> For the multiple-simulation use-case, I wonder if this limit is actually global or if it's per PID namespace? https://www.man7.org/linux/man-pages/man7/pid_namespaces.7.html
I'm not completely confident in this, but I feel that the limit would not be per-PID namespace since processes in a PID namespace still have a different PID in the original namespace. For example processes running in a container still have their own PID on the host (a different PID in the parent PID namespace), so likely still count towards the max in the original namespace. But it would still be good to test this.
pid_namespaces(7):
> PID namespaces can be nested: each PID namespace has a parent, except for the initial ("root") PID namespace. The parent of a PID namespace is the PID namespace of the process that created the namespace using clone(2) or unshare(2). PID namespaces thus form a tree, with all namespaces ultimately tracing their ancestry to the root namespace.
> [...]
> A process has one process ID in each of the layers of the PID namespace hierarchy in which is visible, and walking back though each direct ancestor namespace through to the root PID namespace.
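One way to see the "one PID per namespace layer" behavior on a running system is the NSpid line in /proc/[pid]/status (available since Linux 4.1), which lists a task's ID in each nested PID namespace, outermost first. The sketch below only parses that line; the sample status text and its PID values are hypothetical, and it does not by itself answer whether the limit is enforced globally or per namespace.

```python
def parse_nspid(status_text):
    """Return the IDs from the NSpid line, outermost namespace first.

    Returns an empty list if the line is missing (e.g. on older kernels).
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []


# Hypothetical status excerpt for a process inside one nested PID namespace:
# ID 48213 in the outer namespace, ID 7 inside the nested one.
sample_status = "Name:\ttor\nPid:\t48213\nNSpid:\t48213\t7\n"

print(parse_nspid(sample_status))
```

On a real host, passing the contents of /proc/self/status from inside a container and comparing with the host's view of the same process would show the two different IDs.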
from shadow.
For reference, this seems to be the kernel's limit:
Some places in the kernel do questionable addition, so it's probably best not to set it to UINT64_MAX:
There's also a vague comment that seems to say you can't raise PID_MAX_LIMIT past FUTEX_TID_MASK, which is 0x3fffffff (1,073,741,823):
https://github.com/torvalds/linux/blob/b8481381d4e2549f06812eb6069198144696340c/include/linux/refcount.h#L42-L43
https://github.com/torvalds/linux/blob/b8481381d4e2549f06812eb6069198144696340c/include/uapi/linux/futex.h#L165
So while it seems like you could raise the kernel's limit by recompiling, you might have much more difficulty going past a limit of ~1 billion processes/threads.
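For scale, the arithmetic behind those two numbers can be checked in a few lines. This is only a sanity check of the constants; the 4 * 1024 * 1024 value is the 64-bit PID_MAX_LIMIT (the 4,194,304 figure), and the variable names here are just for illustration.

```python
# 64-bit PID_MAX_LIMIT as compiled into the kernel: 4 * 1024 * 1024.
PID_MAX_LIMIT_64BIT = 4 * 1024 * 1024

# FUTEX_TID_MASK from include/uapi/linux/futex.h: the low 30 bits.
FUTEX_TID_MASK = 0x3FFFFFFF

# The mask covers IDs 0 .. 2**30 - 1.
assert FUTEX_TID_MASK == 2**30 - 1

# How many times larger the futex-imposed ceiling is than today's limit.
headroom_factor = (FUTEX_TID_MASK + 1) // PID_MAX_LIMIT_64BIT

print(f"PID_MAX_LIMIT (64-bit): {PID_MAX_LIMIT_64BIT:,}")
print(f"FUTEX_TID_MASK:         {FUTEX_TID_MASK:,}")
print(f"headroom factor:        {headroom_factor}")
```

So a recompiled kernel could in principle raise the limit by a couple of orders of magnitude before running into the futex TID encoding.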
from shadow.
There's a small comment about this now in docs/system_configuration.md (added in #3350) that tries to make people aware of this limit:
> The kernel has a fixed system-wide limit of 4,194,304 processes/threads. When running extremely large simulations, or when running multiple simulations in parallel, you should be aware of this limit and ensure the total number of processes/threads used by all simulations will not exceed this limit.
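A minimal sketch of how one might check the remaining headroom before launching several simulations in parallel. It relies on two standard Linux procfs files: /proc/loadavg, whose fourth field is "runnable/total" scheduling entities, and /proc/sys/kernel/pid_max, which holds the currently configured ceiling (this can be lower than the compiled-in limit). The helper names, the sample loadavg text, and the 500,000 threads-per-simulation figure are hypothetical.

```python
from pathlib import Path


def parse_total_tasks(loadavg_text):
    """Return the number of existing scheduling entities from /proc/loadavg text."""
    # The fourth field looks like "3/1824": currently runnable / total existing.
    runnable_and_total = loadavg_text.split()[3]
    return int(runnable_and_total.split("/")[1])


def simulations_that_fit(limit, in_use, threads_per_sim):
    """Return how many simulations of the given size fit under the limit."""
    return max(0, (limit - in_use) // threads_per_sim)


# Hypothetical fallback values, used when the procfs files are unavailable.
loadavg_text = "0.52 0.41 0.37 3/1824 91234"
limit = 4_194_304

loadavg_path = Path("/proc/loadavg")
pid_max_path = Path("/proc/sys/kernel/pid_max")
if loadavg_path.exists() and pid_max_path.exists():
    loadavg_text = loadavg_path.read_text()
    limit = int(pid_max_path.read_text())

in_use = parse_total_tasks(loadavg_text)
threads_per_sim = 500_000  # hypothetical size of one large simulation
print(f"limit={limit:,} in_use={in_use:,} "
      f"fits={simulations_that_fit(limit, in_use, threads_per_sim)}")
```

The count from /proc/loadavg is a snapshot, so it is best read while the machine is otherwise idle, and some margin should be left for the rest of the system.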
from shadow.