abi3audit's People

Contributors

dependabot[bot], jaimergp, nicholasjng, ret2libc, woodruffw

abi3audit's Issues

Pick a better Mach-O parser

Right now we vendor the Kaitai Struct project's generated Mach-O parser, which has some disadvantages:

  • It only supports "single-arch" Mach-Os, which means we have to do some nasty hacking to parse "fat" Mach-Os
  • It has a dependency on Kaitai's ASN.1 parser, for components we don't require. This means vendoring more code than strictly necessary
  • It introduces a runtime dependency on kaitaistruct for parsing support

We really only need to parse the symbol table(s), so we could probably get away with a tiny Mach-O parser that only does that.
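
As a rough illustration of how small such a parser could be, here is a minimal sketch that only walks the load commands to find LC_SYMTAB and yields symbol names. It assumes a thin, little-endian, 64-bit Mach-O; fat binaries, 32-bit files, and big-endian files are not handled, and the function name iter_symbol_names is hypothetical rather than anything in abi3audit.

```python
import struct
from typing import Iterator

MH_MAGIC_64 = 0xFEEDFACF  # thin 64-bit Mach-O, as read little-endian
LC_SYMTAB = 0x2           # load command that locates the symbol table
HEADER_SIZE = 32          # mach_header_64: eight 32-bit fields
NLIST_64_SIZE = 16        # n_strx(4) n_type(1) n_sect(1) n_desc(2) n_value(8)


def iter_symbol_names(data: bytes) -> Iterator[str]:
    """Yield every non-empty symbol name from a thin 64-bit Mach-O image."""
    magic, _cpu, _sub, _ftype, ncmds, _sz, _flags, _rsv = struct.unpack_from(
        "<8I", data, 0
    )
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin little-endian 64-bit Mach-O")

    offset = HEADER_SIZE
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<2I", data, offset)
        if cmd == LC_SYMTAB:
            symoff, nsyms, stroff, strsize = struct.unpack_from(
                "<4I", data, offset + 8
            )
            strtab = data[stroff:stroff + strsize]
            for i in range(nsyms):
                # Only n_strx is needed to recover the name.
                (n_strx,) = struct.unpack_from(
                    "<I", data, symoff + NLIST_64_SIZE * i
                )
                end = strtab.index(b"\0", n_strx)
                name = strtab[n_strx:end].decode()
                if name:
                    yield name
        offset += cmdsize
```

A real replacement would also need to handle the fat header and the other magic values, which is where most of the remaining work would be.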

(Question) Categorizing a Py_XDECREF ABI violation report

Thank you for making this package! I have a question:

I'm currently building a Bazel support project for nanobind, a C++ project for generating lean, fast Python bindings of C++ code. It provides stable ABI support starting from Python 3.12, and supports generating as-small-as-possible bindings by setting the -Os optimization flag.

I've been implementing build flags and options to match its default CMake config as closely as possible, with what I thought was the last step being to enable size optimizations by default (read: supplying -Os to clang/gcc).

I test on a small example bindings project, and use abi3audit to check Python 3.12 wheels for stable ABI violations. Windows and macOS are green, but on Linux abi3audit has since reported that the wheel contains the Py_XDECREF symbol (which is apparently a preprocessor macro?), flagging it as an ABI violation.

The C++ code I'm building into an extension is the easiest it can be:

#include <nanobind/nanobind.h>

namespace nb = nanobind;

using namespace nb::literals;

NB_MODULE(nanobind_example_ext, m) {
    m.def("add", [](int a, int b) { return a + b; }, "a"_a, "b"_a);
}

i.e., it doesn't even include the Python headers directly, so at most this is an internal nanobind thing. I'm not too familiar with its internals, but I was confused because it reproduces only on Linux with -Os enabled.

From what I've read in a quick Internet search, -Os seems to inhibit inlining on gcc, which might be relevant since some of these refcounting APIs are static inline. What is your opinion on this?

Related threads and links:

  • #82 (though that's about the leading-underscore _Py_XDECREF API)
  • #55 (about inlined reference counting symbols)
  • My GitHub CI logs.

Cache HTTP requests

...and opportunistically re-use pip's cache, if possible.

This should help with repeated audits of the same wheel(s) from PyPI, especially when debugging.
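
A minimal sketch of the caching half of this idea follows. It stores each response body on disk under a hash of its URL and does not attempt to reuse pip's cache. The names cached_fetch and _download are hypothetical, and the cache has no expiry or validation, which a real implementation would likely want.

```python
import hashlib
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def _download(url: str) -> bytes:
    """Default fetcher: a plain, uncached HTTP GET."""
    with urlopen(url) as resp:
        return resp.read()


def cached_fetch(
    url: str,
    cache_dir: Path,
    fetch: Callable[[str], bytes] = _download,
) -> bytes:
    """Return the body for `url`, hitting the network only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so the cache key is always a safe filename.
    path = cache_dir / hashlib.sha256(url.encode()).hexdigest()
    if path.exists():
        return path.read_bytes()
    body = fetch(url)
    path.write_bytes(body)
    return body
```

Passing the fetcher as a parameter keeps the sketch testable without network access.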

Tests: 100% test coverage

We should have 100% test coverage, with the CLI and vendored components excluded.

This should be enforced in the CI.

CLI: Warn or notify when an input is skipped

  • When an input is a single shared object, we should probably fail outright if we can't audit it.
  • When an input is a wheel, we should fail if it isn't an abi3 wheel.
    • If it is an abi3 wheel, we need to figure out what to do if the individual shared object(s) are not tagged as abi3 -- is that a failure, or just a warning? Probably just a warning, because a wheel could have vendored objects that are not intended to be Python extensions (e.g., a vendored copy of libcurl).
  • When an input is a package specification, we should probably fail if none of the distributions for the package are abi3 wheels.
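
The wheel rule above could be sketched roughly as follows, by reading the ABI tag out of the wheel filename. The Action enum and both function names are hypothetical. The sketch assumes standard wheel filenames (five or six dash-separated fields) and treats compressed tag sets such as abi3.cp38 as containing abi3.

```python
from enum import Enum
from typing import Set


class Action(Enum):
    AUDIT = "audit"
    FAIL = "fail"


def wheel_abi_tags(filename: str) -> Set[str]:
    """Extract the ABI tag set from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # name-version[-build]-python-abi-platform
    if len(parts) not in (5, 6):
        raise ValueError(f"malformed wheel filename: {filename}")
    return set(parts[-2].split("."))


def wheel_action(filename: str) -> Action:
    """Audit abi3 wheels; fail loudly on anything else instead of skipping."""
    return Action.AUDIT if "abi3" in wheel_abi_tags(filename) else Action.FAIL
```

The warning case for untagged shared objects inside an abi3 wheel is left out, since it depends on how each object is inspected.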

Support Conda-style packaging?

It might be nice to support Conda-style packaging directly. I don't know too much about Conda, so some questions that will need to be answered:

  1. Does Conda have any notion of abi3 compatibility, or a way to signal which Python version a package is abi3 compatible with?
  2. Does Conda's normal package tooling include the .abi3. infix on shared objects?

Use exit code to communicate failure

I might be missing something dumb, but this would be useful to run abi3audit automatically, e.g. as suggested in pypa/cibuildwheel#1342

~/Downloads λ pipx run abi3audit refl1d-0.8.13-cp32-abi3-macosx_10_14_x86_64.whl --strict
[11:59:50] 💁 refl1d-0.8.13-cp32-abi3-macosx_10_14_x86_64.whl: 1 extensions scanned; 1 ABI version mismatches and 0 ABI violations found                           
~/Downloads λ echo $?                    
0
~/Downloads λ pipx run abi3audit hdbcli-2.13.13-cp34-abi3-manylinux1_x86_64.whl --strict 
[11:59:59] 💁 hdbcli-2.13.13-cp34-abi3-manylinux1_x86_64.whl: 1 extensions scanned; 0 ABI version mismatches and 2 ABI violations found                            
~/Downloads λ echo $?
0

It might also make sense to have the strict flag be the default. At the very least, you'll get more bug reports ;-)
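
One possible exit-code policy is sketched below. It is an assumption about what the requested behavior might look like, not a description of abi3audit's actual behavior: ABI violations always fail, and version mismatches fail only in strict mode. The function name exit_code is hypothetical.

```python
def exit_code(version_mismatches: int, violations: int, strict: bool) -> int:
    """Map audit counts to a process exit status (0 = success)."""
    if violations > 0:
        return 1
    if strict and version_mismatches > 0:
        return 1
    return 0
```

With this policy, both runs shown above would exit with status 1 under --strict.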

Question: Non-wheel audits

Hello! I read your blog post and was amazed by the amount of resources that could be saved in organizations like conda-forge if abi3 was supported by default in compatible cases.

I would like to ask some questions to assess how easy it would be to run an audit like the one done with PyPI, but for conda-forge (we have an upper bound of ~2k projects that could be audited):

  • In this issue, you mention that support for non-abi3 wheels is not available (yet?). One concern is that a no-errors report might give the impression that wheels can be retagged as abi3 with no further changes. However, I guess that rebuilding from source with abi3 mode on would indeed work, right?
  • Following on that issue, is it possible to audit arbitrary collections of shared objects, even if they are not part of a wheel? This would enable us to run the tool on extracted conda packages.
  • What's the story for Windows? Are DLLs supported? If not, would it be easy to implement? (Never mind: I see support for DLLs here, awesome!)

Thank you so much for this work and the fascinating story in the blog post, really enjoyed that read!

Potential false positive (compliant but flagged as non-compliant)

Background: I'm wrapping a decently sized C++ utility library using SWIG 4.2 (including its stable ABI option) to generate Python wrappers, which in turn provide features to user-facing Python classes.

Tools: Python 3.8.18 (via conda) and GCC 11.4.0 on Ubuntu 22.04 under WSL

% python -m abi3audit foo.whl -v --report
[11:18:59] 👎 foo.whl: _foo.cpython-38-x86_64-linux-gnu.so has non-ABI3 symbols
           ┏━━━━━━━━━━━━━┓
           ┃ Symbol      ┃
           ┡━━━━━━━━━━━━━┩
           │ _Py_XDECREF │
           └─────────────┘
           💁 foo-cp38-abi3-manylinux_2_34_x86_64.whl: 1 extensions scanned; 0 ABI version mismatches and 1 ABI violations found
  {"specs": {"wheelhouse/foo-cp38-abi3-manylinux_2_34_x86_64.whl": {"kind": "wheel", "wheel": [{"name": "_foo-38-x86_64-linux-gnu.so", "result": {"is_abi3": false, "is_abi3_baseline_compatible": true, "baseline": "3.8", "computed": "3.8", "non_abi3_symbols": ["_Py_XDECREF"], "future_abi3_objects": {}}}]}}}

From what I have read, which is hardly extensive, Py_XDECREF is a macro that expands to a non-null check followed by Py_DECREF, which seems to have always been part of abi3.

Please forgive me if there is something I am misunderstanding about any part of this process. I have only recently jumped into wrapping an established codebase with Python bindings and am filling in a lot of blanks on the fly. Any information on this would be appreciated.

Do a better job of filtering Mach-O symbol tables

Symbol tables in Mach-Os can contain all kinds of junk, including symbolic debug entries and entries without actual names. This can result in a lot of unnecessary work, since we end up scanning symbols that don't actually correspond to the CPython ABI being linked against.

The solution here is to filter the symbol table, and only audit symbols that correspond to function or data entries and that are marked as "undefined" (meaning external, not local).
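
The filter described above might look roughly like this. The mask values are the ones defined in Apple's <mach-o/nlist.h>; the Nlist tuple and the auditable function are hypothetical stand-ins for whatever the parser actually produces.

```python
from typing import Iterable, Iterator, NamedTuple

N_STAB = 0xE0  # if any of these bits are set, the entry is a debug (stab) entry
N_TYPE = 0x0E  # mask for the symbol type bits
N_EXT = 0x01   # set for external symbols
N_UNDF = 0x00  # type value for undefined symbols (resolved by another image)


class Nlist(NamedTuple):
    name: str
    n_type: int


def auditable(symbols: Iterable[Nlist]) -> Iterator[Nlist]:
    """Keep only named, external, undefined, non-debug symbols."""
    for sym in symbols:
        if not sym.name:
            continue  # nameless entry
        if sym.n_type & N_STAB:
            continue  # symbolic debugging entry
        if not sym.n_type & N_EXT:
            continue  # local symbol
        if sym.n_type & N_TYPE != N_UNDF:
            continue  # defined in this image, not imported
        yield sym
```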

Don't audit a shared object if it doesn't look like a Python extension

Not every shared object in a wheel is a Python extension. For example, a wheel might vendor a copy of a dependency so that the target host does not have to supply it. Tools like auditwheel automate this dependency vendoring.

As such, abi3audit should avoid false positives by limiting its search to only those shared objects that are actually native Python extensions. These can be identified by the following rules:

  1. They have a filename like foo(.abi3)?.{so,pyd}
  2. They have an initialization function with a public symbol of PyInit_foo or PyInitU_foo, where the latter is the PEP 489 format for unicode module names (foo is punycoded with - replaced with _)

In other words: if a shared object does not satisfy these conditions, then it is not a Python extension and should be skipped by abi3audit. In addition to eliminating potential false positives, this will speed up auditing a bit by excluding unnecessary files.
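
The two rules could be combined into a check along these lines. This is a sketch under the rules as stated above: it matches only foo(.abi3)?.{so,pyd} filenames, ignores platform-specific symbol decoration (such as the leading underscore on Mach-O), and the function names are hypothetical. The punycode handling follows the naming scheme from PEP 489.

```python
import re
from typing import Iterable

_EXT_RE = re.compile(r"^(?P<module>[^.]+)(\.abi3)?\.(so|pyd)$")


def init_symbol(module: str) -> str:
    """Return the expected init function name for a module (PEP 489)."""
    try:
        return "PyInit_" + module.encode("ascii").decode("ascii")
    except UnicodeEncodeError:
        # Non-ASCII names: punycode-encode, with '-' replaced by '_'.
        encoded = module.encode("punycode").replace(b"-", b"_")
        return "PyInitU_" + encoded.decode("ascii")


def looks_like_extension(filename: str, exported: Iterable[str]) -> bool:
    """True if the filename and exported symbols both look like an extension."""
    match = _EXT_RE.match(filename)
    if match is None:
        return False
    return init_symbol(match.group("module")) in set(exported)
```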

Add an option not to fail on version mismatch

As mentioned in the README:

abi3audit considers the abi3 version when a symbol was stabilized, not introduced. In other words: abi3audit will produce a warning when an abi3-cp36 extension contains a function stabilized in 3.7, even if that function was introduced in 3.6. This is not a false positive (it is an ABI version mismatch), but it's generally not a source of bugs.

A symbol might be marked as ABI-stable in 3.y even though its ABI has been unchanged since 3.x, with x < y. Strictly speaking, this is an abi3 version mismatch and hence not a false positive, but it would be nice to have an option not to return exit code 1 for such a mismatch.

but it's generally not a source of bugs.

If the wheels are correctly tested, it will never be a source of bugs, and it allows a broader range of Python versions to be supported with a single wheel.

Inlined functions from CPython API

Thanks for the project, seems like a very useful tool and it's nice to have it to check things over.

I notice in the README you mention that abi3audit also checks symbols for inlined non-exported functions (giving _Py_DECREF as an example). I think I've run into some issues with those checks.

In particular, I think there may be an issue with the tool flagging some functions that are part of the limited API and implemented as static inline functions vs. those which are imported from CPython and part of the stable ABI (and explicitly listed in the manifest).

I wonder if the checks looking at symtab might be too strict?

For example, I've tested this on a limited ABI wheel, that sets the Py_LIMITED_API macro. If I disable optimizations (to keep functions from being inlined) I see errors for:

  • _Py_INCREF
  • _Py_DECREF
  • _Py_IS_TYPE

which, from what I can tell, are implemented in Python 3.10 as static inline functions (and were in 3.8 as well)

It does seem like CPython anticipates these being used in limited API modules even if they aren't in the manifest.

A bit of digging on those trying to double check:

_Py_INCREF and _Py_DECREF have preprocessor conditions internally depending on the limited API state in Python 3.10. The Py_{INC,DEC}REF macros are also mentioned in PEP 683 as a consideration for implementing immortal objects, and _Py_{INC,DEC}REF are part of their inlined implementations.

_Py_IS_TYPE comes from a few places, such as a use of PyCapsule_CheckExact (where the *_Check macros are mentioned in PEP 384) and PyObject_TypeCheck (which is implemented with Py_IS_TYPE and PyType_IsSubtype, which is part of the limited ABI). Py_IS_TYPE also gets special limited API handling on the current main branch.

Thanks again!

`abi3audit` crashes on a numpy manylinux2014 wheel

Interesting project, thanks for releasing it!

I wanted to try this on a numpy wheel, and it's not happy, see traceback below. I'm not sure if this is off-label usage or not - I know it's not an abi3 wheel, however abi3audit seemed to me to be a nice way of listing all symbols used that are not in the limited API.

$ abi3audit numpy-1.23.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl 
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/rgommers/mambaforge/envs/dev/bin/abi3audit:8 in <module>                                   │
│                                                                                                  │
│   5 from abi3audit._cli import main                                                              │
│   6 if __name__ == '__main__':                                                                   │
│   7 │   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])                         │
│ ❱ 8 │   sys.exit(main())                                                                         │
│   9                                                                                              │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ __annotations__ = {}                                                                         │ │
│ │    __builtins__ = <module 'builtins' (built-in)>                                             │ │
│ │      __cached__ = None                                                                       │ │
│ │         __doc__ = None                                                                       │ │
│ │        __file__ = '/home/rgommers/mambaforge/envs/dev/bin/abi3audit'                         │ │
│ │      __loader__ = <_frozen_importlib_external.SourceFileLoader object at 0x7ff50701d9c0>     │ │
│ │        __name__ = '__main__'                                                                 │ │
│ │     __package__ = None                                                                       │ │
│ │        __spec__ = None                                                                       │ │
│ │            main = <function main at 0x7ff5054f4ca0>                                          │ │
│ │              re = <module 're' from                                                          │ │
│ │                   '/home/rgommers/mambaforge/envs/dev/lib/python3.10/re.py'>                 │ │
│ │             sys = <module 'sys' (built-in)>                                                  │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /home/rgommers/mambaforge/envs/dev/lib/python3.10/site-packages/abi3audit/_cli.py:215 in main    │
│                                                                                                  │
│   212 │   │   │   │   │   │   sys.exit(1)                                                        │
│   213 │   │   │   │   │   continue                                                               │
│   214 │   │   │   │                                                                              │
│ ❱ 215 │   │   │   │   results.add(extractor, so, result)                                         │
│   216 │   │   │   │   if not result and args.verbose:                                            │
│   217 │   │   │   │   │   console.log(result)                                                    │
│   218                                                                                            │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │      args = Namespace(specs=['numpy-1.23.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_… │ │
│ │             debug=False, verbose=False, report=False, output=<_io.TextIOWrapper              │ │
│ │             name='<stdout>' mode='w' encoding='utf-8'>, strict=False)                        │ │
│ │ extractor = <abi3audit._extract.WheelExtractor object at 0x7ff5054dee00>                     │ │
│ │    parser = ArgumentParser(prog='abi3audit', usage=None, description='Scans Python           │ │
│ │             extensions for abi3 violations and inconsistencies', formatter_class=<class      │ │
│ │             'argparse.HelpFormatter'>, conflict_handler='error', add_help=True)              │ │
│ │    result = AuditResult(                                                                     │ │
│ │             │   so=<abi3audit._object._So object at 0x7ff5054df250>,                         │ │
│ │             │   baseline=None,                                                               │ │
│ │             │   computed=None,                                                               │ │
│ │             │   non_abi3_symbols=set(),                                                      │ │
│ │             │   future_abi3_objects=set()                                                    │ │
│ │             )                                                                                │ │
│ │   results = <abi3audit._cli.SpecResults object at 0x7ff5054dee60>                            │ │
│ │        so = <abi3audit._object._So object at 0x7ff5054df250>                                 │ │
│ │      spec = 'numpy-1.23.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl'        │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /home/rgommers/mambaforge/envs/dev/lib/python3.10/site-packages/abi3audit/_cli.py:68 in add      │
│                                                                                                  │
│    65 │   def add(self, extractor: Extractor, so: SharedObject, result: AuditResult) -> None:    │
│    66 │   │   self._results[extractor].append(result)                                            │
│    67 │   │                                                                                      │
│ ❱  68 │   │   if result.computed > result.baseline:                                              │
│    69 │   │   │   self._bad_abi3_version_counts[so] += 1                                         │
│    70 │   │                                                                                      │
│    71 │   │   self._abi3_violation_counts[so] += len(result.non_abi3_symbols)                    │
│                                                                                                  │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮                     │
│ │ extractor = <abi3audit._extract.WheelExtractor object at 0x7ff5054dee00> │                     │
│ │    result = AuditResult(                                                 │                     │
│ │             │   so=<abi3audit._object._So object at 0x7ff5054df250>,     │                     │
│ │             │   baseline=None,                                           │                     │
│ │             │   computed=None,                                           │                     │
│ │             │   non_abi3_symbols=set(),                                  │                     │
│ │             │   future_abi3_objects=set()                                │                     │
│ │             )                                                            │                     │
│ │      self = <abi3audit._cli.SpecResults object at 0x7ff5054dee60>        │                     │
│ │        so = <abi3audit._object._So object at 0x7ff5054df250>             │                     │
│ ╰──────────────────────────────────────────────────────────────────────────╯                     │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: '>' not supported between instances of 'NoneType' and 'NoneType'
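
The traceback shows result.computed > result.baseline being evaluated with both sides set to None. A minimal sketch of a guard for that comparison follows; the Result class is a simplified, hypothetical stand-in for AuditResult, and whether a missing baseline should be skipped silently or reported as an error is a separate decision.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Version = Tuple[int, int]


@dataclass
class Result:
    """Simplified stand-in for an audit result (assumption, not the real class)."""
    baseline: Optional[Version]
    computed: Optional[Version]


def is_version_mismatch(result: Result) -> bool:
    """Compare versions only when both are known; None means 'not abi3-tagged'."""
    if result.baseline is None or result.computed is None:
        return False
    return result.computed > result.baseline
```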

CLI: JSON report format

The abi3audit CLI should support some kind of machine-readable output format, which should include:

  • Each input that was audited (nested appropriately, e.g. package name > wheel name > shared object name)
  • The audit results for each input, including:
    • The "baseline" ABI, inferred from the wheel or shared object filename(s)
    • The "computed" ABI, derived from the list of symbols present in the shared object
    • The set of abi3 violating symbols, i.e. the symbols present that are not standardized by any specific abi3 version
    • The set of version violating symbols, i.e. the symbols present that are abi3 but are not compatible with the "baseline" ABI
    • Other things?
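
A nested report of this kind could be assembled as sketched below. The key names mirror the --report output quoted in another issue on this page; the helper names, and the way is_abi3 and is_abi3_baseline_compatible are derived here, are assumptions for illustration.

```python
import json
from typing import Any, Dict, Iterable, List, Tuple

Version = Tuple[int, int]


def _fmt(version: Version) -> str:
    return f"{version[0]}.{version[1]}"


def so_entry(
    name: str,
    baseline: Version,
    computed: Version,
    non_abi3_symbols: Iterable[str],
) -> Dict[str, Any]:
    """Build the report entry for one shared object."""
    symbols: List[str] = sorted(non_abi3_symbols)
    return {
        "name": name,
        "result": {
            "is_abi3": not symbols,
            "is_abi3_baseline_compatible": computed <= baseline,
            "baseline": _fmt(baseline),
            "computed": _fmt(computed),
            "non_abi3_symbols": symbols,
        },
    }


def wheel_report(wheel: str, entries: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Nest shared-object entries under their wheel."""
    return {"specs": {wheel: {"kind": "wheel", "wheel": list(entries)}}}


if __name__ == "__main__":
    entry = so_entry("_foo.so", (3, 8), (3, 8), {"_Py_XDECREF"})
    print(json.dumps(wheel_report("foo-cp38-abi3-manylinux_2_34_x86_64.whl", [entry])))
```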

Parallelize auditing?

I'm not sure if this is a good idea yet.

When dealing with lots of specs (especially full Python package histories), auditing is pretty slow (since it's entirely serial). It doesn't need to be this way, since auditing is embarrassingly parallel (each step is entirely independent).

The only real obstacles here are UI/UX ones: if we break auditing up into a pool of threads or processes, we'll want to make sure that the current output and progress bars remain about the same (or get nicer).
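
A minimal sketch of the thread-pool variant follows. The audit_all function and its audit_one parameter are hypothetical. Because Executor.map yields results in input order, the output can be rendered in the same order as a serial run, which addresses part of the UI concern, though progress reporting would still need separate work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple, TypeVar

R = TypeVar("R")


def audit_all(
    specs: Iterable[str],
    audit_one: Callable[[str], R],
    max_workers: int = 4,
) -> List[Tuple[str, R]]:
    """Audit each spec concurrently, returning results in input order."""
    spec_list = list(specs)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order even if workers finish out of order.
        return list(zip(spec_list, pool.map(audit_one, spec_list)))
```

Threads suit the download-heavy parts; a process pool may be a better fit if symbol parsing turns out to be CPU-bound.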

Reduce FPs when handling `static inline` functions

See #55, #82, and #83.

abi3audit currently considers all symbols when auditing. This is generally correct, but results in false positives when a static inline function (such as Py_XDECREF) doesn't get inlined, but instead remains as a local/private symbol.

The underlying cause is nuanced: static inline functions like Py_XDECREF are part of the limited API but not the stable ABI; they're expected to be inlined into code that is part of the stable ABI. In other words, static inline functions are referentially opaque: their expansion is compatible with the stable ABI, but the function identifiers themselves are not.

In practice this is a non-issue, and abi3audit should not flag local-only symbols for static inline functions.

To do this, the audit phase probably needs two things:

  1. A list of static inline functions to ignore
  2. Each Symbol's visibility, to know whether to ignore it

For (1), we can just start with Py_XDECREF. For (2), I think we'll need to extend the abi3info Symbol model to include a visibility: Visibility | None attribute, which will need to be populated as appropriate from each supported symbol table/object file.
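
The two pieces could fit together roughly as follows. The visibility: Visibility | None attribute comes from the proposal above; the enum members, the simplified Symbol class, and the should_ignore helper are assumptions made for this sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Visibility(Enum):
    LOCAL = "local"
    GLOBAL = "global"


@dataclass(frozen=True)
class Symbol:
    """Simplified stand-in for the abi3info Symbol model."""
    name: str
    visibility: Optional[Visibility] = None  # None: the format gave no answer


# (1) static inline functions that may survive as local symbols.
STATIC_INLINE_IGNORED = frozenset({"Py_XDECREF"})


def should_ignore(symbol: Symbol) -> bool:
    """(2) Ignore a listed function only when it is known to be local."""
    return (
        symbol.visibility is Visibility.LOCAL
        and symbol.name in STATIC_INLINE_IGNORED
    )
```

Treating an unknown visibility as "do not ignore" keeps the audit conservative for object formats where visibility has not been populated yet.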

CC @nicholasjng, who helped triage this and has graciously offered to help out 🙂
