pythonspeed / filprofiler

A Python memory profiler for data processing and scientific computing applications

Home Page: https://pythonspeed.com/products/filmemoryprofiler/

License: Apache License 2.0

Emacs Lisp 0.02% Makefile 1.31% Python 36.02% C 10.72% Rust 49.73% Shell 0.48% C++ 0.16% Fortran 0.06% Jupyter Notebook 0.62% Cython 0.88%

filprofiler's Introduction

The Fil memory profiler for Python

Your Python code reads some data, processes it, and uses too much memory; maybe it even dies due to an out-of-memory error. In order to reduce memory usage, you first need to figure out:

  1. Where peak memory usage is, also known as the high-water mark.
  2. What code was responsible for allocating the memory that was present at that peak moment.

That's exactly what Fil will help you find. Fil is an open-source memory profiler designed for data processing applications written in Python, and it includes native support for Jupyter. Fil runs on Linux and macOS, and supports CPython 3.7 and later.

What users are saying

"Within minutes of using your tool, I was able to identify a major memory bottleneck that I never would have thought existed. The ability to track memory allocated via the Python interface and also C allocation is awesome, especially for my NumPy / Pandas programs."

—Derrick Kondo

"Fil has just pointed straight at the cause of a memory issue that's been costing my team tons of time and compute power. Thanks again for such an excellent tool!"

—Peter Sobot

License

Copyright 2021 Hyphenated Enterprises LLC

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

filprofiler's People

Contributors

bast, dependabot[bot], itamarst, plnech, pythonspeed, whalesalad


filprofiler's Issues

Non-file-backed mmap() tracking

These are effectively just allocations, though presumably larger ones.

Unlike free(), munmap() can deallocate just part of a mapped region, which the tracking needs to handle.
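The interval bookkeeping that partial munmap() requires can be sketched like this (Python for clarity; Fil's actual tracking is implemented in Rust, and the class and method names here are invented):

```python
class MmapTracker:
    """Hypothetical sketch: track anonymous mmap() regions, allowing
    munmap() to release an arbitrary sub-range of a tracked region."""

    def __init__(self):
        self.regions = {}  # start address -> length in bytes

    def add(self, start, length):
        self.regions[start] = length

    def remove(self, start, length):
        # munmap() may cover only part of a region: keep whatever is
        # left before and/or after the unmapped range.
        end = start + length
        for rstart in list(self.regions):
            rlen = self.regions[rstart]
            rend = rstart + rlen
            if end <= rstart or start >= rend:
                continue  # no overlap with this region
            del self.regions[rstart]
            if rstart < start:
                self.regions[rstart] = start - rstart  # leading remainder
            if end < rend:
                self.regions[end] = rend - end  # trailing remainder

    def total(self):
        return sum(self.regions.values())
```

Punching a hole in the middle of a region splits it into two tracked remainders, which is the case plain free() bookkeeping never has to deal with.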

Fil missing massive chunks of memory usage in a Python program that creates mostly just Python objects

Hoping for reproducer, but may have to make my own:

  1. Python 3.6 (running on Debian) loads a 200MB CSV. The result uses 1GB of RAM, and this shows up fine in Fil.
  2. The CSV rows are then loaded into a whole pile of objects. Per htop this uses another 3GB of RAM, but that 3GB doesn't show up in Fil.

What could cause this?

  1. Fil isn't being told about all allocations.
    1. Somehow some allocations from Python aren't reported.
      1. The PYTHONMALLOC env variable doesn't work completely.
      2. The PYTHONMALLOC env variable only works for some Python APIs.
    2. It's one of the unsupported APIs, e.g. posix_memalign.
  2. Fil is correctly tracking allocations, but memory is leaking.
    1. Due to #35.
    2. Memory fragmentation in the allocator due to many small allocations.
    3. free() isn't being called, or being called wrongly somehow.
  3. Fil is not tracking allocations correctly.
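Since the issue hopes for a reproducer, a hypothetical one targeting hypothesis 1 might look like this (all names invented; the CSV-loading step is replaced by generated rows, since the point is the many small pure-Python object allocations):

```python
# Hypothetical reproducer sketch: create a large number of small,
# pure-Python objects and compare Fil's reported peak against what
# the OS reports (e.g. via htop).

class Row:
    __slots__ = ("a", "b", "c")

    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c

def make_rows(n):
    # Many small allocations, all going through CPython's object
    # allocator rather than direct malloc() calls from native code.
    return [Row(i, str(i), float(i)) for i in range(n)]

rows = make_rows(100_000)
```

If Fil's reported peak for `make_rows` is far below what htop shows, that would point at allocations bypassing the hooks Fil installs.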

PyPy Support

Is PyPy support planned?
Is it currently not feasible? If so, how can we help make it feasible?

macOS support

In theory this shouldn't be hard: just use DYLD_INSERT_LIBRARIES instead of LD_PRELOAD.

Jupyter magic for memory profiling

This would require having the profiler off by default, enabled only on demand. We may also want a custom kernel to make this more usable.

  • A custom kernel is available that runs Python with Fil inserted but inactive
  • Documentation for installing custom kernel
  • Test that Fil isn't tracking when started in this mode
  • Fil can trace some code temporarily
    • Test that code is profiled
    • Test that code raising an exception is profiled
    • Test that it stops tracking when done, both memory and Python tracing
  • Jupyter magic to trace some code, and then display it
    • Test for Jupyter magic
  • Documentation for Jupyter magic
  • Test for error message when wrong kernel is used

Windows support

What needs doing:

  • Figure out how to hook APIs, whatever the equivalent of LD_PRELOAD is. This may involve compiling a new python.exe or something terrible like that.
  • Figure out which APIs to override and their semantics. There's malloc() etc., but presumably Windows has its own APIs too?

https://microsoft.github.io/mimalloc/overrides.html might be useful for understanding what to do.

Replace FunctionTemplate with a pointer to PyCodeObject

Additionally, replace line number with bytecode index, deferring line number calculation to report-generation time.

Together these changes should speed up the tracing part of the profiler.

This is in progress in a branch.
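The deferred line-number lookup can be sketched with CPython's `dis` module; this is an illustration of the idea (store the bytecode offset during tracing, resolve it to a line only when generating the report), not Fil's actual code:

```python
import dis

def offset_to_lineno(code, lasti):
    """Resolve a stored bytecode offset to a source line number at
    report-generation time, instead of during tracing."""
    lineno = code.co_firstlineno
    for offset, line in dis.findlinestarts(code):
        if offset > lasti:
            break
        if line is not None:  # newer CPythons can yield None lines
            lineno = line
    return lineno
```

Since `dis.findlinestarts` walks the code object's line table, the cost is paid once per reported frame rather than on every traced call.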

Switch away from having same data structure for current and peak memory usage

Right now peak usage is kept as a copy of the same data structure as current allocations. But it doesn't need to be!

Instead, we could keep just the summary info we actually care about: a mapping of callstack ID → total allocated bytes. On each new allocation we'd increment the entry; on each free we'd decrement it. This mapping can just be a vector, because we have a small, somewhat restricted number of callstack IDs, and the IDs are 0, 1, 2, 3, etc. It could be an immutable Vector, maybe, but the main point is that it'd be much smaller, so much less copying.

So we'd have three data structures:

  1. Current allocations, mapping address to (size, callstack_id). Can be normal HashMap, OrdMap, or what have you: whatever is fastest and has least memory overhead.
  2. Total current memory usage, mapping callstack_id to total size.
  3. Peak memory usage, mapping callstack_id to total size. This is a read-only snapshot of #2.

Keeping two data structures is in theory more expensive, but in practice this will probably be faster, and will reduce memory usage.
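The proposed split can be sketched in Python (the real implementation would be Rust, and for clarity this sketch recomputes totals with `sum()` on every allocation, where real code would keep running totals):

```python
class AllocationTracker:
    """Sketch of the proposed split: per-address bookkeeping plus a
    compact callstack_id -> bytes vector, snapshotted at new peaks."""

    def __init__(self, num_callstacks):
        self.current = {}  # address -> (size, callstack_id)
        self.usage = [0] * num_callstacks  # callstack_id -> total bytes
        self.peak = list(self.usage)  # read-only snapshot of `usage`

    def allocate(self, address, size, callstack_id):
        self.current[address] = (size, callstack_id)
        self.usage[callstack_id] += size
        if sum(self.usage) > sum(self.peak):
            self.peak = list(self.usage)  # cheap: it's just a vector

    def free(self, address):
        size, callstack_id = self.current.pop(address)
        self.usage[callstack_id] -= size
```

The key point is that the peak snapshot copies only the small per-callstack vector, never the big per-address map.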

More robust fallback malloc() and friends on Linux

Right now, any memory allocated during the bootstrap period, before the Fil module is loaded, will cause a crash when free()d. While unlikely, it is possible.

One solution: keep track of addresses allocated during this period and don't allow them to be free()d. mmap()ed memory can still be munmap()ed, so we need to make sure this doesn't break.

Probably a better solution: since this is a problem only on Linux, not macOS, we can use the glibc-specific _malloc or __malloc or whatever it's called, which we used in early iterations of Fil.
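The logic of the first approach can be sketched in Python for clarity (the real code would be C/Rust inside the allocator hooks; all names here are invented):

```python
# Sketch of the first approach: remember which addresses the fallback
# allocator handed out during bootstrap, and make free() a no-op for
# them instead of crashing.

bootstrap_addresses = set()

def record_bootstrap_allocation(address):
    # Called for allocations made before Fil's module is loaded.
    bootstrap_addresses.add(address)

def tracked_free(address, real_free):
    if address in bootstrap_addresses:
        return  # deliberately leak: safer than crashing in free()
    real_free(address)
```

Deliberately leaking a handful of bootstrap allocations is cheap; the set stays tiny because the bootstrap window is short.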

Spyder integration

Once #12 is done, consider supporting Spyder as another scientific computing environment.

Possible to attach to running process in Linux via BPF or related technologies?

Runtime instrumentation would mean Fil wouldn't have to be attached from the very start, and would make it safe to use with production servers.

In particular, for long-running servers this would make tracking down memory leaks much easier. One would want to report not peak allocations, as Fil does by default, but rather current allocations. This is somewhat different from Fil's current use case, but it's a real problem people have.

Thanks to @jvns for the inspiration for this idea.

Try to reduce memory usage by using nested data structures

The theory: memory tracking overhead mostly matters if you have lots of small allocations. If you have lots of small allocations, they will end up in similar parts of the address space. Thus you could have a nested data structure, HashMap<MostSignificant48bitOfAddressSpace,HashMap<LeastSignificant16bitOfAddress,Allocation>> which would reduce memory overhead since it would effectively compress the addresses.

Potential problems: im's HashMap has ~600 bytes of overhead, so if you have a bunch of larger allocations, that will increase their overhead dramatically.

So maybe it should be 32bit/32bit, rather than 48bit/16bit, and then the amortized saving will be ~32bits per tracked allocation... unless you're doing pretty massive allocations, in which case im's per-map overhead doesn't matter.

There is also a potential CPU cost, since this will involve two hashmap lookups instead of one. If we switch to BTrees, perhaps this overhead will matter less?
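The 32bit/32bit variant can be sketched in Python, with plain dicts standing in for im's HashMap (names and the exact split are illustrative, per the discussion above):

```python
NestedMap = dict  # stand-in for im's HashMap in the Rust implementation

class CompressedAddressMap:
    """Sketch: split each 64-bit address into a 32-bit prefix and a
    32-bit suffix, so nearby small allocations share one outer entry
    and only store their 32-bit suffix individually."""

    def __init__(self):
        self.outer = NestedMap()  # prefix -> {suffix -> allocation}

    def insert(self, address, allocation):
        prefix, suffix = address >> 32, address & 0xFFFFFFFF
        self.outer.setdefault(prefix, NestedMap())[suffix] = allocation

    def remove(self, address):
        prefix, suffix = address >> 32, address & 0xFFFFFFFF
        inner = self.outer[prefix]
        allocation = inner.pop(suffix)
        if not inner:
            del self.outer[prefix]  # don't keep empty inner maps around
        return allocation
```

The compression only pays off when many allocations share a prefix, which matches the "lots of small allocations in similar parts of the address space" assumption.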

Sampling mode for memory leak detection

When detecting memory leaks, you don't need to track every allocation: the whole point is that some allocation is going to happen over and over. Sampling in this case is fine, and would reduce performance overhead.
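A minimal sketch of counter-based sampling (illustrative only; a real implementation would live in the Rust tracking code and might randomize the sampling to avoid aliasing with allocation patterns):

```python
class SamplingTracker:
    """Sketch: record only every Nth allocation, scaling recorded
    sizes by N so estimated totals stay approximately right. Good
    enough for leak detection, where the leaking callstack allocates
    over and over."""

    def __init__(self, every_nth=100):
        self.every_nth = every_nth
        self._count = 0
        self.estimated = {}  # callstack_id -> estimated bytes

    def allocate(self, size, callstack_id):
        self._count += 1
        if self._count % self.every_nth != 0:
            return  # skipped: this is where the overhead savings come from
        est = self.estimated.get(callstack_id, 0)
        self.estimated[callstack_id] = est + size * self.every_nth
```

Rare callstacks get noisy or missing estimates, but a leak by definition allocates frequently enough to be sampled.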

Play around with key/value databases

Goals:

  1. Persistence could enable better UX, e.g. for crashes.
  2. Reduce memory overhead from tracking allocations.

Things to look for:

  1. Ability to create snapshots, for peak allocations
  2. Performance
  3. Memory overhead

Things to look at

  • Sanakirja: It might allow for filesystem-backed allocation tracking, and has cheap clones. Probably too slow, but worth trying at least.
  • LMDB: Long-lived transaction would be same as clone. No malloc, mmap() only.
  • RocksDB: Has snapshots.
  • Faster KV: Might not have capabilities we need.

Support a --no-browser argument

This program has a lot of potential and I'm very excited about it.

The documentation states that it "automatically starts up a browser" to display the results when it's finished, but it would be helpful to handle the use case where there is no browser running on the same system that is running fil-profile.

I'm not an expert in this area, but perhaps start a web server and update the fil-results directory. We could then port-forward a browser to that server to see the results. I tinkered with a few ideas, but since this is my first day using the program, I want to keep this as non-specific as possible. To get a sense of what's possible, I ran an nginx container on the same system, with a shared mount to the fil-results directory.

In my "copious" free time, which the virus has provided, I may try to get a mock-up of what I'm talking about.
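A hypothetical --no-browser mode could serve the results directory with Python's standard library instead of calling the browser; the function names, directory, and port below are assumptions for illustration, not Fil's actual API:

```python
# Sketch: instead of opening a local browser on the report, serve the
# results directory over HTTP so a remote (or port-forwarded) browser
# can be pointed at it.
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

def make_results_server(directory, port=0):
    # port=0 asks the OS for any free port.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("0.0.0.0", port), handler)

def serve_results(directory, port=8080):
    server = make_results_server(directory, port)
    print(f"Results at http://localhost:{server.server_address[1]}/")
    server.serve_forever()  # Ctrl-C to stop
```

This avoids any nginx setup: `SimpleHTTPRequestHandler`'s `directory` parameter (Python 3.7+) serves static files directly.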

Figure out if `ld --wrap` can remove need for initialization code on Linux

--wrap + --defsym might do the trick:

  1. --wrap provides __real_malloc (and redirects explicit calls to malloc in the shared library to __wrap_malloc, although in practice we don't care about this part).
  2. --defsym allows mapping malloc to e.g. __wrap_malloc.
  3. We can then define __wrap_malloc.

Or it might fail.

Reduce tracking memory usage with compressed 32bit length

Once #45 is done, it seems that switching size in Allocation struct from 64-bit to 32-bit will reduce memory overhead meaningfully.

Suggested compression scheme:

  1. If the high bit is not set, the remaining bits indicate the exact number of bytes.
  2. If the high bit is set, the remaining bits indicate how many 64KB blocks are used.

The high bit should only be set for allocations > 16MB, say, where the loss of accuracy isn't a big deal. The result allows recording allocations up to 64TB, which seems sufficient.

(Numbers can be adjusted in various ways.)
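One way to sketch this scheme (taking the high bit as the marker for the lossy 64KB-unit encoding, so that accuracy is lost only on large allocations; the 16MB threshold is the one suggested above and all constants are adjustable):

```python
HIGH_BIT = 1 << 31
UNIT = 64 * 1024  # 64 KiB
THRESHOLD = 16 * 1024 * 1024  # switch to lossy encoding above ~16MB

def encode_size(nbytes):
    """Pack an allocation size into 32 bits: exact bytes below the
    threshold, rounded up to 64 KiB units (high bit set) above it."""
    if nbytes <= THRESHOLD:
        return nbytes
    units = (nbytes + UNIT - 1) // UNIT  # round up, never undercount
    return HIGH_BIT | units

def decode_size(encoded):
    if encoded & HIGH_BIT:
        return (encoded ^ HIGH_BIT) * UNIT
    return encoded
```

Rounding up means the decoded size can overshoot by at most one 64KB unit, under 0.4% for anything over 16MB.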

Extend UX to support tracking differences

For profiling, the real usage pattern is:

  1. Run with current code.
  2. Try to fix code.
  3. Run again, figure out difference, go to step 2 if not fixed.

So the UX should support that.

E.g.

$ fil-profile run yourscript.py
... you look at visualization ...
... time passes, you try to fix something ...
$ fil-profile again
... Re-runs last command, pops up visualization of differences in memory usage from original run.
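The comparison step of a hypothetical `again` command could be sketched as follows, assuming each run's report can be reduced to a {callstack: peak_bytes} mapping (an invented representation, for illustration):

```python
def diff_runs(before, after):
    """Compare two runs' {callstack: peak_bytes} mappings, returning
    per-callstack deltas sorted with the biggest regressions first."""
    deltas = {}
    for stack in before.keys() | after.keys():
        delta = after.get(stack, 0) - before.get(stack, 0)
        if delta:
            deltas[stack] = delta
    return sorted(deltas.items(), key=lambda kv: -kv[1])
```

The visualization would then color callstacks by sign of the delta, so a fix attempt that merely moved the memory elsewhere is immediately visible.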
