Comments (15)
There may be a gross inefficiency with -X jit-enable-jit-list-wildcards.
With that option removed, the overhead is only 0.5% (still high, but 4x better).
Yikes. We don't use that option in production thus far, but that's useful to know. I've gone ahead and opened a new issue for that: #30. I see you've opened #29 for this; thank you for digging in.
from cinder.
While I agree it's not intuitive, this behavior is actually intentional. The setup we have internally is that our web server goes through a warm-up process where it tries to import all the known hot Python modules. These may not necessarily be executed at this stage, but when we hit cinderjit.disable() it goes and compiles all the loaded functions which are on our JIT-list. We execute cinderjit.disable() right before we start forking worker processes because we don't want JIT to happen in the workers, as there is currently no way to share further JIT work between processes. Letting the workers independently JIT would cause excessive memory usage and wasted CPU cycles.
Note that cinderjit.disable() only inhibits further JIT compilation; functions which have already been JIT-compiled can still be executed in their compiled form. Also, JIT compilation happens lazily as allowed functions are executed (not loaded), or on cinderjit.disable().
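As a rough sketch of that warm-up-then-disable flow (the module list and helper name here are made up for illustration; cinderjit only exists on a Cinder build, so this falls back to a no-op elsewhere):

```python
import importlib

# Stand-ins for the app's known hot modules (hypothetical list).
HOT_MODULES = ["json", "re"]

def warm_up():
    """Import (but don't necessarily execute) the known hot modules."""
    loaded = []
    for name in HOT_MODULES:
        importlib.import_module(name)
        loaded.append(name)
    return loaded

loaded = warm_up()

try:
    import cinderjit
    # Compiles every loaded function that is on the JIT-list, then
    # inhibits any further JIT compilation before workers fork.
    cinderjit.disable()
except ImportError:
    pass  # plain CPython: nothing to disable

# ... fork worker processes here; they reuse already-compiled functions ...
```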
While I don't think there's a way to do what you want right now, I have some changes in flight which may help. I'll see if I can get those released in the next week.
from cinder.
Thank you. Would it not just be a matter of having an alternate flag (JIT_START_DISABLED), in which case _PyJIT_Initialize() would leave jit_config.is_enabled at 0?
If so I'm happy to try making a PR.
from cinder.
Hey, sorry it's taking me a while to get back to you. Like I said, I have something unpublished that I think will help, but the sands have shifted under me and it's proving harder to get out than I'd thought.
What you're suggesting may work (if it's that easy, maybe try it out for yourself and see how it does?). My concern is that right now "disabling" the JIT is more nuanced than it appears. The only two things we've really tried are never enabling the JIT, or enabling it from start-up and then "disabling" it once (which, as my earlier comment explains, is not quite what it sounds like). I'm not sure how well enabling it part way through would work, which I think is what your suggestion would require. At the very least you'd need to go through and review all the call-sites that check to see if the JIT is enabled and make sure they support this new operation.
The work I mentioned I have in mind allows you to add entries to the JIT list at runtime from managed code. This way you could start with JIT enabled but an empty JIT list so nothing actually happens, and then you could lazily or otherwise use decorators to dynamically add things to the list and force them to compile.
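Sketching that decorator idea, assuming a cinderjit.force_compile() entry point like the one discussed elsewhere in this thread (the decorator name is made up; off Cinder it degrades to a no-op):

```python
def jit_compile(fn):
    """Hypothetical decorator: opt one function into the JIT at runtime."""
    try:
        import cinderjit
        cinderjit.force_compile(fn)  # assumed API; compiles fn immediately
    except (ImportError, AttributeError, RuntimeError):
        pass  # no JIT available, or fn not compilable: run it interpreted
    return fn

@jit_compile
def hot_path(a, b):
    return a * b
```

hot_path(6, 7) returns 42 either way; only the entry point changes.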
If you do decide to play with this yourself you'll probably find the -X jit-debug option useful, as this will let you see what functions are getting compiled (along with lots of other noise). You'll probably also want to use some of the functions in the cinderjit module (search for jit_methods in pyjit.cpp).
FWIW, the reason we use an externally specified list is that we populate it with functions we know are hot, based on data collected from execution in production. In 0764b2e I added some features you could use to do the same thing (although it's a bit of a nuisance to use right now, which is what the other things I mentioned would help fix).
from cinder.
At the very least you'd need to go through and review all the call-sites that check to see if the JIT is enabled and make sure they support this new operation.
This part confused me; maybe I still don't understand what "enable JIT" implies. cinderjit.enable() and disable() set jit_config.is_enabled internally. Isn't that flag already honored by the call sites? I was thinking that init could start with jit_config.is_enabled = 0, and then rely on force_compile() to select functions at runtime.
(Lines 964 to 970 in 8664502.)
Background on where I'm at with Cinder: out of the box our app is seeing a nice speedup of 12% vs. Python 3.7. (I haven't checked whether any of that is attributable to Python 3.8 or to building with gcc 8.x vs. 6.x.) The app is heavily async, so I suspect a lot comes from the optimization to avoid StopIteration that has been upstreamed to future Python. (On the other hand, per Trio policy, we always have one yield in async functions, so the other optimization to elide coroutines doesn't help us.)
Beyond that though, I haven't been able to produce any real gains from the JIT. For sure keeping JIT enabled is not viable, because it gets bogged down with the numerous lambdas and closures. And it's not clear which functions would give us a win if jitted (a sampling profiler wouldn't help much here).
The instruction counter feature sounds promising, I'll check it out.
from cinder.
@jbower-fb from my attempts, -X jit-capture-interp-cost hasn't worked out too well:
The counts seem to include the cost of children in the call stack, which isn't what you'd want for identifying functions to JIT. Notably, it seems to include everything from a yield; for example, @contextmanager functions with simply a yield in the body have high cost. And since async relies heavily on yield, the top items by instruction count are mostly meaningless in my heavily async app.
Even when discounting the above issue, and locating functions with high count that don't yield or explicitly call other functions, I have not been able to come up with a win from the JIT. After enabling a small number of functions via jit-list, my program always ends up being a few percent slower (even though the jitted functions themselves might run 25 - 50% faster). It may be that cinderjit just being enabled has some overhead that I haven't been able to overcome.
from cinder.
The counts seem to include the cost of children in the call stack,
Interesting, that's not what should happen and I wonder how you're seeing that? I wrote a small example and it seems to give the results I would expect - counts are attributed non-cumulatively to functions which perform execution:
```
$ cat simple.py
import cinder
import contextlib
import pprint

async def y():
    class DummyGenerator:
        def __await__(self): return iter([])
    for _ in range(100): pass
    await DummyGenerator()

async def x(): await y()

# Clear out existing data for a clean base-line
cinder.get_and_clear_code_interp_cost()
try:
    x().send(None)
except StopIteration: pass
pprint.pprint(cinder.get_and_clear_code_interp_cost())
@contextlib.contextmanager
def a(): yield 2


def use_cms():
    with a():
        for _ in range(100): pass

cinder.get_and_clear_code_interp_cost()
use_cms()
pprint.pprint(cinder.get_and_clear_code_interp_cost())

$ ./python -X jit-capture-interp-cost ./simple.py
{'DummyGenerator@./simple.py:6': 10,
 'x@./simple.py:11': 5,
 'y.<locals>.DummyGenerator.__await__@./simple.py:7': 4,
 'y@./simple.py:5': 320}
{'_GeneratorContextManager.__enter__@/data/users/jbower/cinder2/Lib/contextlib.py:108': 13,
 '_GeneratorContextManager.__exit__@/data/users/jbower/cinder2/Lib/contextlib.py:117': 19,
 '_GeneratorContextManagerBase.__init__@/data/users/jbower/cinder2/Lib/contextlib.py:82': 37,
 'a@./simple.py:20': 5,
 'contextmanager.<locals>.helper@/data/users/jbower/cinder2/Lib/contextlib.py:238': 6,
 'use_cms@./simple.py:23': 316}
```
As I would expect, most of the cost in the above is in y() and use_cms(), and not attributed to x() or a() as I think you're finding. Can you provide a repro of some kind?
After enabling a small number of functions via jit-list, my program always ends up being a few percent slower (even though the jitted functions themselves might run 25 - 50% faster). It may be that cinderjit just being enabled has some overhead that I haven't been able to overcome.
The cost of compiling JIT functions is really quite high (indeed, excruciating for a debug build). If the overall time spent executing is not very significant, e.g. just a few tens of seconds, I would not be surprised to see a net loss. Is your test target long-running, or can its runtime be extended (e.g. by running its core in an artificial loop)? If it's still running slower after that, it may be that you're hitting some poorly optimized JIT situations. For now you'd need to prune the JIT-list more, although this should get better over the next couple of months. So far this year we focused on JIT "coverage", which means making as many things work at any cost; now we're focused on tightening those up.
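The artificial-loop suggestion might look like this (bench is a made-up helper, not a Cinder API):

```python
import time

def bench(fn, *args, seconds=30.0):
    """Call fn in a loop until `seconds` of wall time have passed, so the
    one-time JIT compilation cost is amortized over many executions."""
    calls = 0
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        fn(*args)
        calls += 1
    return calls

# e.g. compare calls-per-interval with and without the JIT enabled:
n = bench(lambda: sum(range(100)), seconds=0.05)
```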
from cinder.
The accounting problem may be specific to the async library we're using (Trio); I'll try to give a specific repro.
The cost of compiling JIT functions is really quite high (indeed, excruciating for a debug build).
I mean there seems to be overhead independent of JIT compiling. For example, with an empty jit-list, our app still seems to slow down by 1 - 2%.
from cinder.
update:
- after looking a little closer, I couldn't find a problem with -X jit-capture-interp-cost accounting. I attempted enabling various combinations of the top 20 or 30 items in the interp cost list, and couldn't achieve an overall CPU gain over non-jit
- as mentioned, I see a 1 - 2% slowdown in our app even with an empty jit list. I imagine cinderjit has to intercept every call and compare it against the list, and that may be a lot of overhead for an application with many short calls.
- I'm exploring --static, as that seems like the path to accumulate enough speedup to offset the overhead of jit being enabled. I have a single-module test with some numeric functions like bezier and inverse-bezier, suitable for experimenting with --static
- however, lack of primitive double support (#26 and #27) is blocking me from getting the expected 4x or 8x gains from static + jit (based on similar tests I've done with integer functions). Without using primitives in these functions, --static + jit is not giving a speedup vs. plain jit.
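For reference, the kind of numeric kernel in question, written in plain Python (a hypothetical cubic bezier; the idea would be to annotate it with primitive doubles once #26/#27 land):

```python
def bezier(t, p0, p1, p2, p3):
    """Evaluate a cubic Bezier curve at t in [0, 1] (plain-Python sketch)."""
    u = 1.0 - t
    return (u * u * u * p0
            + 3.0 * u * u * t * p1
            + 3.0 * u * t * t * p2
            + t * t * t * p3)
```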
from cinder.
as mentioned, I see a 1 - 2% slowdown in our app even with an empty jit list. I imagine cinderjit has to intercept every call and compare it against the list, and that may be a lot of overhead for an application with many short calls.
The overhead should only come on the first call to a given function (see PyEntry_LazyInit() and friends, which swap around the value of func->vectorcall in ceval.c). So, this shouldn't be very much unless you're invoking a large number of new functions throughout the program's execution.
Shooting in the dark a bit, does -X jit-no-type-slots help? If not, perhaps a perf run could shed some light on things.
from cinder.
I didn't notice a change from -X jit-no-type-slots.
this shouldn't be very much unless you're invoking a large number of new functions throughout the program's execution.
I would say our app is heavy on nested functions, closures, and lambdas.
from cinder.
I would say our app is heavy on nested functions, closures, and lambdas.
Hmm, that may actually have an impact. After my comment yesterday I spoke to a colleague and we discussed that nested functions etc. go down the PyEntry_LazyInit() path every time the containing function is entered. Again, I think perf may help confirm that theory.
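That theory is visible at the Python level: each entry to the containing function builds a fresh function object (sharing one code object), so each fresh object starts with an uninitialized entry point:

```python
def make_handler(x):
    def handler():   # a new function object on every call to make_handler
        return x
    return handler

f1 = make_handler(1)
f2 = make_handler(2)

# Distinct function objects sharing a single code object; each new object
# would take the lazy entry-point initialization path on its first call.
assert f1 is not f2
assert f1.__code__ is f2.__code__
```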
from cinder.
I don't know how to use perf offhand, but I fumbled around with it using perf top -g -p ... and then filtering on PyEntry. It seems to confirm the 2% overhead when running our production application with an empty jit list.
from cinder.
There may be a gross inefficiency with -X jit-enable-jit-list-wildcards. With that option removed, the overhead is only 0.5% (still high, but 4x better).
empty jit list (wildcards disabled):
empty jit list, -X jit-enable-jit-list-wildcards:
from cinder.
For what it's worth, we now have -X jit-auto=N, where the JIT will automatically compile functions after N calls. It must be used in conjunction with -X jit.
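The threshold behavior behind -X jit-auto=N can be modeled in pure Python (auto_compile and compile_fn are illustrative stand-ins, not Cinder APIs):

```python
import functools

def auto_compile(threshold, compile_fn):
    """Hand compile_fn a function once it has been called `threshold`
    times; a pure-Python model of the -X jit-auto=N idea."""
    def decorator(fn):
        state = {"calls": 0, "compiled": False}
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            state["calls"] += 1
            if not state["compiled"] and state["calls"] >= threshold:
                compile_fn(fn)          # stand-in for JIT compilation
                state["compiled"] = True
            return fn(*args, **kwargs)
        return wrapper
    return decorator

compiled = []

@auto_compile(3, compiled.append)
def work(n):
    return n + 1

for i in range(5):
    work(i)
# `compiled` now holds `work` exactly once (triggered on the 3rd call).
```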
from cinder.