trailofbits / abi3audit
Scans Python packages for abi3 violations and inconsistencies
Home Page: https://pypi.org/project/abi3audit
License: MIT License
Right now we vendor the Kaitai Struct project's generated Mach-O parser, which has some disadvantages:
kaitaistruct for parsing support
We really only need to parse the symbol table(s), so we could probably get away with a tiny Mach-O parser that only does that.
Thank you for making this package! I have a question:
I'm currently building a Bazel support project for nanobind, a C++ project for generating lean, fast Python bindings of C++ code. It provides stable ABI support starting from Python 3.12, and supports generating as-small-as-possible bindings by setting the -Os optimization flag.
I've been implementing build flags and options to match its default CMake config as closely as possible, with (what I thought was) the last step being to default size optimizations (read: supplying -Os to clang/gcc) to True.
I test on a small example bindings project, and use abi3audit to check Python 3.12 wheels for stable ABI violations. Windows and macOS are green, but Linux has since generated reports of the Py_XDECREF symbol (which is a preprocessor macro apparently?) being contained in the wheel, which it flags as an ABI violation.
The C++ code I'm building into an extension is the easiest it can be:
#include <nanobind/nanobind.h>

namespace nb = nanobind;
using namespace nb::literals;

NB_MODULE(nanobind_example_ext, m) {
    m.def("add", [](int a, int b) { return a + b; }, "a"_a, "b"_a);
}
i.e., it doesn't even import Python headers, so at most this is an internal nanobind thing. I'm not too familiar with its internals, but I was confused because it only happens reproducibly on Linux with -Os enabled.
From what I've read in a quick Internet search, this seems to inhibit inlining on gcc, which might be relevant with some of these refcounting APIs being static inline. What is your opinion on this?
Related threads and links:
_Py_XDECREF API)
...and opportunistically re-use pip's cache, if possible.
This should help with repeated audits of the same wheel(s) from PyPI, especially when debugging.
We should have 100% test coverage, with the CLI and vendored components excluded.
This should be enforced in the CI.
libcurl).
All APIs should be documented and enforced by interrogate.
This should also be enforced in CI.
It might be nice to support Conda-style packaging directly. I don't know too much about Conda, so some questions that will need to be answered:
.abi3. infix on shared objects?
I might be missing something dumb, but this would be useful to run abi3audit automatically, e.g. as suggested in pypa/cibuildwheel#1342
~/Downloads λ pipx run abi3audit refl1d-0.8.13-cp32-abi3-macosx_10_14_x86_64.whl --strict
[11:59:50] 💁 refl1d-0.8.13-cp32-abi3-macosx_10_14_x86_64.whl: 1 extensions scanned; 1 ABI version mismatches and 0 ABI violations found
~/Downloads λ echo $?
0
~/Downloads λ pipx run abi3audit hdbcli-2.13.13-cp34-abi3-manylinux1_x86_64.whl --strict
[11:59:59] 💁 hdbcli-2.13.13-cp34-abi3-manylinux1_x86_64.whl: 1 extensions scanned; 0 ABI version mismatches and 2 ABI violations found
~/Downloads λ echo $?
0
It might also make sense to have the strict flag be the default — at the very least you'll get more bug reports ;-)
Hello! I read your blog post and was amazed by the amount of resources that could be saved in organizations like conda-forge if abi3 was supported by default in compatible cases.
I would like to ask some questions to assess how easy it would be to run an audit like the one done with PyPI, but for conda-forge (we have an upper bound of ~2k projects that could be audited):
Thank you so much for this work and the fascinating story in the blog post, really enjoyed that read!
Background: Wrapping a decently sized C++ utility library using SWIG 4.2 (including its stable ABI option) to generate Python wrappers, which subsequently provide features to user-facing Python classes.
Tools: Python 3.8.18 (via conda) and gcc 11.4.0 under Ubuntu 22.04 on WSL
% python -m abi3audit foo.whl -v --report
[11:18:59] 👎 foo.whl: _foo.cpython-38-x86_64-linux-gnu.so has non-ABI3 symbols
┏━━━━━━━━━━━━━┓
┃ Symbol ┃
┡━━━━━━━━━━━━━┩
│ _Py_XDECREF │
└─────────────┘
💁 foo-cp38-abi3-manylinux_2_34_x86_64.whl: 1 extensions scanned; 0 ABI version mismatches and 1 ABI violations found
{"specs": {"wheelhouse/foo-cp38-abi3-manylinux_2_34_x86_64.whl": {"kind": "wheel", "wheel": [{"name": "_foo-38-x86_64-linux-gnu.so", "result": {"is_abi3": false, "is_abi3_baseline_compatible": true, "baseline": "3.8", "computed": "3.8", "non_abi3_symbols": ["_Py_XDECREF"], "future_abi3_objects": {}}}]}}}
From what I have read, which is hardly extensive, Py_XDECREF is a macro which expands to a non-null check and Py_DECREF, which seems to have always been abi3.
Please forgive me if there is something I am misunderstanding about any part of this process. I have only recently jumped into wrapping an established codebase with Python bindings and am filling in a lot of blanks on the fly. Any information on this would be appreciated.
Symbol tables in Mach-Os can contain all kinds of junk, including symbolic debug entries and entries without actual names. This can result in a lot of unnecessary work, since we end up scanning symbols that don't actually correspond to the CPython ABI being linked against.
The solution here is to filter the symbol table, and only audit symbols that correspond to function or data entries and that are marked as "undefined" (meaning external, not local).
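As a rough illustration of the filter described above, here is a minimal sketch. The bit masks are the n_type constants from <mach-o/nlist.h>; the NlistEntry class and is_auditable function are hypothetical names for this example and are not abi3audit's actual code. Note that an undefined Mach-O entry does not by itself say whether it refers to a function or to data, so this sketch only checks the "not debug, named, external, undefined" part.

```python
from dataclasses import dataclass

# Bit masks and values for the n_type field, from <mach-o/nlist.h>.
N_STAB = 0xE0  # if any of these bits are set, the entry is a symbolic debug entry
N_TYPE = 0x0E  # mask for the type bits
N_EXT = 0x01   # external symbol bit
N_UNDF = 0x00  # type value: undefined (resolved from another image at load time)


@dataclass(frozen=True)
class NlistEntry:
    """Hypothetical, simplified view of one Mach-O symbol table entry."""

    name: str
    n_type: int


def is_auditable(entry: NlistEntry) -> bool:
    """Return True only for named, non-debug, external, undefined symbols."""
    if not entry.name:
        return False  # entries without actual names
    if entry.n_type & N_STAB:
        return False  # symbolic debug entries
    if not entry.n_type & N_EXT:
        return False  # local (non-external) symbols
    return (entry.n_type & N_TYPE) == N_UNDF


entries = [
    NlistEntry("_PyLong_FromLong", 0x01),  # undefined, external
    NlistEntry("", 0x01),                  # nameless entry
    NlistEntry("foo.c", 0x64),             # debug (stab) entry
    NlistEntry("_local_helper", 0x0E),     # defined in a section, local
    NlistEntry("_PyInit_foo", 0x0F),       # defined in a section, external
]
to_audit = [e.name for e in entries if is_auditable(e)]
```

Only the first entry survives the filter here, which is the point: everything else is either junk or something the extension defines itself rather than links against.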
Not every shared object in a wheel is a Python extension. For example, a wheel might vendor a copy of a dependency so that the target host does not have to supply it. Tools like auditwheel automate this dependency vendoring.
As such, abi3audit should avoid false positives by limiting its search to only those shared objects that are actually native Python extensions. These can be identified by the following rules:
foo(.abi3)?.{so,pyd}
PyInit_foo or PyInitU_foo, where the latter is the PEP 489 format for unicode module names (foo is punycoded with - replaced with _)
In other words: if a shared object does not satisfy these conditions, then it is not a Python extension and should be skipped by abi3audit. In addition to eliminating potential false positives, this will speed up auditing a bit by excluding unnecessary files.
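The two rules above could be sketched roughly as follows. This is only an illustration under stated assumptions: init_symbol and is_python_extension are hypothetical helper names, the exported symbol set is assumed to have been read from the object file already, and platform details (such as the leading underscore on Mach-O symbol names) are not handled.

```python
import re

# Rule 1: the filename looks like foo(.abi3)?.{so,pyd}.
_EXT_FILENAME = re.compile(r"^(?P<module>[^.]+)(\.abi3)?\.(so|pyd)$")


def init_symbol(module: str) -> str:
    """Return the init symbol name for a module, following PEP 489."""
    try:
        module.encode("ascii")
    except UnicodeEncodeError:
        # Non-ASCII names: punycode the name and replace "-" with "_".
        encoded = module.encode("punycode").decode("ascii").replace("-", "_")
        return f"PyInitU_{encoded}"
    return f"PyInit_{module}"


def is_python_extension(filename: str, exported: set) -> bool:
    """Rule 2: the object exports the init symbol matching its filename."""
    match = _EXT_FILENAME.match(filename)
    if match is None:
        return False
    return init_symbol(match.group("module")) in exported
```

With this, a vendored libvendored.so that exports no PyInit_libvendored symbol would be skipped, while foo.abi3.so exporting PyInit_foo would be audited.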
As mentioned in the README:
abi3audit considers the abi3 version when a symbol was stabilized, not introduced. In other words: abi3audit will produce a warning when an abi3-cp36 extension contains a function stabilized in 3.7, even if that function was introduced in 3.6. This is not a false positive (it is an ABI version mismatch), but it's generally not a source of bugs.
A symbol might become marked as ABI stable in 3.y even though its ABI has been unchanged since 3.x, with x < y.
Strictly speaking, this is an abi3 version mismatch and hence not a false positive, but it would be nice to have the option not to return exit code 1 for such a mismatch.
but it's generally not a source of bugs.
If the wheels are correctly tested, it will never be a source of bugs, and it allows a broader range of Python versions to be supported with a single wheel.
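To make the request concrete, here is a small sketch of how a "computed" version and an opt-out for mismatches might fit together. Everything here is an assumption for illustration: the symbol names and their stabilization versions are made up, and computed_version, exit_code and the fail_on_mismatch switch are hypothetical, not abi3audit's real API or CLI.

```python
from typing import Iterable, Mapping, Tuple

Version = Tuple[int, int]


def computed_version(
    symbols: Iterable[str],
    stabilized_in: Mapping[str, Version],
    baseline: Version,
) -> Version:
    """Newest stabilization version among the used symbols (baseline if none are known)."""
    return max((stabilized_in[s] for s in symbols if s in stabilized_in), default=baseline)


def exit_code(baseline: Version, computed: Version, violations: int, fail_on_mismatch: bool) -> int:
    """Violations always fail; a version mismatch fails only if requested."""
    if violations > 0:
        return 1
    if fail_on_mismatch and computed > baseline:
        return 1
    return 0


# Hypothetical data: PyExample_B existed in 3.6 but was only marked stable in 3.7.
stabilized_in = {"PyExample_A": (3, 2), "PyExample_B": (3, 7)}
computed = computed_version(["PyExample_A", "PyExample_B"], stabilized_in, baseline=(3, 6))
```

Here computed is (3, 7) against a baseline of (3, 6), so the mismatch is still reported, but whether it affects the exit status becomes a choice.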
Thanks for the project, seems like a very useful tool and it's nice to have it to check things over.
I notice in the README you mention that abi3audit also checks symbols for inlined non-exported functions (giving _Py_DECREF as an example). I think I've run into some issues with those checks.
In particular, I think there may be an issue with the tool flagging some functions that are part of the limited API and implemented as static inline functions vs. those which are imported from CPython and part of the stable ABI (and explicitly listed in the manifest).
I wonder if the checks looking at symtab might be too strict?
For example, I've tested this on a limited ABI wheel that sets the Py_LIMITED_API macro. If I disable optimizations (to keep functions from being inlined) I see errors for:
_Py_INCREF
_Py_DECREF
_Py_IS_TYPE
which, from what I can tell, are implemented in Python 3.10 as static inline functions (and were in 3.8 as well).
It does seem like CPython anticipates these being used in limited API modules even if they aren't in the manifest.
A bit of digging on those trying to double check:
_Py_INCREF and _Py_DECREF have preprocessor conditions internally depending on the limited API state in Python 3.10. The Py_{INC,DEC}REF macros are also mentioned in PEP 683 as a consideration for implementing immortal objects, and the _Py_{INC,DEC}REF functions are part of their inlined implementations.
The _Py_IS_TYPE comes from a few places, such as a use of PyCapsule_CheckExact (where the *_Check macros are mentioned in PEP 384) and PyObject_TypeCheck (which is implemented with Py_IS_TYPE and PyType_IsSubtype, which is part of the limited ABI). Py_IS_TYPE also gets special limited API handling on the current main branch.
Thanks again!
Interesting project, thanks for releasing it!
I wanted to try this on a numpy wheel, and it's not happy, see traceback below. I'm not sure if this is off-label usage or not - I know it's not an abi3 wheel, however abi3audit seemed to me to be a nice way of listing all symbols used that are not in the limited API.
$ abi3audit numpy-1.23.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/rgommers/mambaforge/envs/dev/bin/abi3audit:8 in <module> │
│ │
│ 5 from abi3audit._cli import main │
│ 6 if __name__ == '__main__': │
│ 7 │ sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0]) │
│ ❱ 8 │ sys.exit(main()) │
│ 9 │
│ │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ __annotations__ = {} │ │
│ │ __builtins__ = <module 'builtins' (built-in)> │ │
│ │ __cached__ = None │ │
│ │ __doc__ = None │ │
│ │ __file__ = '/home/rgommers/mambaforge/envs/dev/bin/abi3audit' │ │
│ │ __loader__ = <_frozen_importlib_external.SourceFileLoader object at 0x7ff50701d9c0> │ │
│ │ __name__ = '__main__' │ │
│ │ __package__ = None │ │
│ │ __spec__ = None │ │
│ │ main = <function main at 0x7ff5054f4ca0> │ │
│ │ re = <module 're' from │ │
│ │ '/home/rgommers/mambaforge/envs/dev/lib/python3.10/re.py'> │ │
│ │ sys = <module 'sys' (built-in)> │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /home/rgommers/mambaforge/envs/dev/lib/python3.10/site-packages/abi3audit/_cli.py:215 in main │
│ │
│ 212 │ │ │ │ │ │ sys.exit(1) │
│ 213 │ │ │ │ │ continue │
│ 214 │ │ │ │ │
│ ❱ 215 │ │ │ │ results.add(extractor, so, result) │
│ 216 │ │ │ │ if not result and args.verbose: │
│ 217 │ │ │ │ │ console.log(result) │
│ 218 │
│ │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ args = Namespace(specs=['numpy-1.23.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_… │ │
│ │ debug=False, verbose=False, report=False, output=<_io.TextIOWrapper │ │
│ │ name='<stdout>' mode='w' encoding='utf-8'>, strict=False) │ │
│ │ extractor = <abi3audit._extract.WheelExtractor object at 0x7ff5054dee00> │ │
│ │ parser = ArgumentParser(prog='abi3audit', usage=None, description='Scans Python │ │
│ │ extensions for abi3 violations and inconsistencies', formatter_class=<class │ │
│ │ 'argparse.HelpFormatter'>, conflict_handler='error', add_help=True) │ │
│ │ result = AuditResult( │ │
│ │ │ so=<abi3audit._object._So object at 0x7ff5054df250>, │ │
│ │ │ baseline=None, │ │
│ │ │ computed=None, │ │
│ │ │ non_abi3_symbols=set(), │ │
│ │ │ future_abi3_objects=set() │ │
│ │ ) │ │
│ │ results = <abi3audit._cli.SpecResults object at 0x7ff5054dee60> │ │
│ │ so = <abi3audit._object._So object at 0x7ff5054df250> │ │
│ │ spec = 'numpy-1.23.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl' │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /home/rgommers/mambaforge/envs/dev/lib/python3.10/site-packages/abi3audit/_cli.py:68 in add │
│ │
│ 65 │ def add(self, extractor: Extractor, so: SharedObject, result: AuditResult) -> None: │
│ 66 │ │ self._results[extractor].append(result) │
│ 67 │ │ │
│ ❱ 68 │ │ if result.computed > result.baseline: │
│ 69 │ │ │ self._bad_abi3_version_counts[so] += 1 │
│ 70 │ │ │
│ 71 │ │ self._abi3_violation_counts[so] += len(result.non_abi3_symbols) │
│ │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │ extractor = <abi3audit._extract.WheelExtractor object at 0x7ff5054dee00> │ │
│ │ result = AuditResult( │ │
│ │ │ so=<abi3audit._object._So object at 0x7ff5054df250>, │ │
│ │ │ baseline=None, │ │
│ │ │ computed=None, │ │
│ │ │ non_abi3_symbols=set(), │ │
│ │ │ future_abi3_objects=set() │ │
│ │ ) │ │
│ │ self = <abi3audit._cli.SpecResults object at 0x7ff5054dee60> │ │
│ │ so = <abi3audit._object._So object at 0x7ff5054df250> │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: '>' not supported between instances of 'NoneType' and 'NoneType'
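The traceback shows result.computed > result.baseline being evaluated with both values set to None, which is what happens here for a wheel that is not tagged abi3. One way to avoid the crash would be a None-aware comparison along these lines; this is a sketch only, with a hypothetical helper name, and not necessarily how the project resolved it.

```python
from typing import Optional, Tuple

Version = Tuple[int, int]


def has_version_mismatch(computed: Optional[Version], baseline: Optional[Version]) -> bool:
    """Compare versions only when both are known; unknown versions never count as a mismatch."""
    if computed is None or baseline is None:
        return False
    return computed > baseline
```

Whether a non-abi3 wheel should be skipped, rejected with a clear error, or audited anyway is a separate design question; the guard above only removes the TypeError.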
The abi3audit CLI should support some kind of machine-readable output format, which should include:
package name > wheel name > shared object name)
I'm not sure if this is a good idea yet.
When dealing with lots of specs (especially full Python package histories), auditing is pretty slow (since it's entirely serial). It doesn't need to be this way, since auditing is embarrassingly parallel (each step is entirely independent).
The only real obstacles here are UI/UX ones: if we break auditing up into a pool of threads or processes, we'll want to make sure that the current output and progress bars remain about the same (or get nicer).
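A minimal sketch of the thread-pool variant, using only the standard library. audit_all and the audit_one callback are hypothetical names; the real audit step and progress reporting are not modeled here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

S = TypeVar("S")
R = TypeVar("R")


def audit_all(specs: Iterable[S], audit_one: Callable[[S], R], max_workers: int = 4) -> List[R]:
    """Run audit_one over every spec in a thread pool.

    Executor.map yields results in input order, so output built from the
    returned list stays in the same order as the serial version.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(audit_one, specs))


# Stand-in for a real audit function, just to show the call shape.
results = audit_all(["a.whl", "b.whl", "c.whl"], lambda spec: f"audited {spec}")
```

Keeping results in input order is one way to address the UI concern: rendering can stay sequential even if the work underneath is not.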
abi3audit currently considers all symbols when auditing. This is generally correct, but results in false positives when a static inline function (such as Py_XDECREF) doesn't get inlined, but instead remains as a local/private symbol.
The underlying cause is nuanced: static inline functions like Py_XDECREF are part of the limited API but not the stable ABI; they're expected to be inlined into code that is part of the stable ABI. In other words, static inline functions are referentially opaque: their expansion is compatible with the stable ABI, but the function identifiers themselves are not.
In practice this is a non-issue, and abi3audit should not flag local-only symbols for static inline functions.
To do this, the audit phase probably needs two things:
1. static inline functions to ignore
2. Symbol's visibility, to know whether to ignore it
For (1), we can just start with Py_XDECREF. For (2), I think we'll need to extend the abi3info Symbol model to include a visibility: Visibility | None attribute, which will need to be populated as appropriate from each supported symbol table/object file.
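The two pieces could look something like the sketch below. The visibility: Visibility | None attribute comes from the proposal above; the enum members, the ignore list's exact contents, and the should_audit function are assumptions for illustration and not the real abi3info model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Visibility(Enum):
    """Hypothetical visibility values populated from the symbol table."""

    LOCAL = "local"
    GLOBAL = "global"


@dataclass(frozen=True)
class Symbol:
    name: str
    visibility: Optional[Visibility] = None  # None: the object format didn't tell us


# (1) static inline functions to ignore. The underscore-prefixed spelling is the
# one that shows up in the audit reports quoted earlier in this thread.
IGNORED_STATIC_INLINE = frozenset({"Py_XDECREF", "_Py_XDECREF"})


def should_audit(symbol: Symbol) -> bool:
    """(2) Skip known static inline functions, but only when they are local-only."""
    if symbol.visibility is Visibility.LOCAL and symbol.name in IGNORED_STATIC_INLINE:
        return False
    return True
```

Symbols with unknown visibility are still audited in this sketch, which keeps the current (stricter) behavior for object formats that don't populate the attribute.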
CC @nicholasjng, who helped triage this and has graciously offered to help out 🙂
We should be a good CLI citizen and support NO_COLOR.
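For reference, the NO_COLOR convention is that color is disabled whenever the variable is present and not empty, regardless of its value. A minimal check might look like this; color_enabled is a hypothetical helper, and how the result is wired into the console output is left out.

```python
import os
from typing import Mapping, Optional


def color_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return False when NO_COLOR is set to any non-empty value."""
    env = os.environ if environ is None else environ
    return not env.get("NO_COLOR")
```

Passing the environment in as a parameter keeps the check easy to test without touching the real process environment.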