opendp's Introduction

OpenDP

Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public. License: MIT

The OpenDP Library is a modular collection of statistical algorithms that adhere to the definition of differential privacy. It can be used to build applications of privacy-preserving computations, using a number of different models of privacy. OpenDP is implemented in Rust, with bindings for easy use from Python and R.

The architecture of the OpenDP Library is based on a conceptual framework for expressing privacy-aware computations. This framework is described in the paper A Programming Framework for OpenDP.

The OpenDP Library is part of the larger OpenDP Project, a community effort to build trustworthy, open source software tools for analysis of private data. (For simplicity in these docs, when we refer to “OpenDP,” we mean just the library, not the entire project.)

Status

OpenDP is under development, and we expect to release new versions frequently, incorporating feedback and code contributions from the OpenDP Community. It's a work in progress, but it can already be used to build some applications and to prototype contributions that will expand its functionality. We welcome you to try it and look forward to feedback on the library! However, please be aware of the following limitations:

OpenDP, like all real-world software, has both known and unknown issues. If you intend to use OpenDP for a privacy-critical application, you should evaluate the impact of these issues on your use case.

More details can be found in the Limitations section of the User Guide.

Installation

Install OpenDP for Python with pip (the package installer for Python):

$ pip install opendp

Install OpenDP for R from an R session:

install.packages("opendp", repos = "https://opendp.r-universe.dev")

More information can be found in the Getting Started section of the User Guide.

Documentation

The full documentation for OpenDP is located at https://docs.opendp.org. Here are some helpful entry points:

Getting Help

If you're having problems using OpenDP, or want to submit feedback, please reach out! Here are some ways to contact us:

Contributing

OpenDP is a community effort, and we welcome your contributions to its development! If you'd like to participate, please contact us! We also have a contribution process section in the Contributor Guide.

opendp's People

Contributors

alexwhitworth, andrewvyrros, ankke, chikeabuah, christianlebeda, clairemckaybowen, ecowan, matchaginseng, mccalluc, michaeleliot, orespo, paulinemauryl, pdurbin, raprasad, shoeboxam, silviacasac

opendp's Issues

RowTransform Transformation -- Implementation

Implement the RowTransform() concept from the programming framework paper.

This is a Transformation constructor that takes a user-defined function and applies it to every member of a dataset.
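
A minimal sketch of the idea in pure Python, not the library's API. The `Transformation` class and the `make_row_transform` name below are hypothetical stand-ins, and the stability relation that a real Transformation would carry is left out.

```python
from dataclasses import dataclass
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")


@dataclass(frozen=True)
class Transformation:
    """Hypothetical stand-in for a Transformation: it only wraps a function."""
    function: Callable[[list], list]

    def invoke(self, data: list) -> list:
        return self.function(data)


def make_row_transform(row_fn: Callable[[A], B]) -> Transformation:
    """Build a Transformation that applies row_fn to each record independently."""
    return Transformation(function=lambda data: [row_fn(row) for row in data])


# Example: clamp every age into [0, 100].
clamp_age = make_row_transform(lambda age: min(max(age, 0), 100))
print(clamp_age.invoke([-5, 42, 130]))
```

Because `row_fn` sees one record at a time, changing a single input record can change at most one output record, which is the property that makes a row-by-row map attractive here. That only holds if `row_fn` is a pure function of its argument, which is why user-defined functions need extra care.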

Dependencies

Data Loading

This is a placeholder for some basic means to get data in/out of the library. Specific instances TBD.

  • Read/write CSV
  • ???
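
As a starting point for discussion, CSV input and output could look roughly like the sketch below. It uses only Python's standard `csv` module; the function names `read_csv` and `write_csv` are illustrative and not part of the library.

```python
import csv
import io
from typing import Iterable, List


def read_csv(text: str) -> List[List[str]]:
    """Parse CSV text into rows of strings (no type inference is attempted)."""
    return [row for row in csv.reader(io.StringIO(text))]


def write_csv(rows: Iterable[Iterable[str]]) -> str:
    """Serialize rows back into CSV text, one line per row."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = read_csv("name,age\nalice,34\nbob,29\n")
print(rows)
print(write_csv(rows))
```

Every value stays a string in this sketch. Parsing columns into typed data would be a separate step.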

Numerical Instability Audit

We need to audit the code for privacy issues caused by numerical instability. Some of this will likely happen as a result of writing proofs for components. But we should also have a system-level view of this.

Where appropriate, we have facilities for doing arbitrary-precision math with MPFR, and some of the mechanisms make use of this via the sampler abstraction.

There will probably be a lot of individual tasks for this. We might want to fork off separate issues for the different components. For now, this issue can serve as a placeholder.
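
To illustrate the kind of behavior such an audit would look for, here is a small Python sketch (not library code) showing that floating-point addition depends on evaluation order. It uses `fractions.Fraction` as an exact reference.

```python
from fractions import Fraction

values = [0.1, 0.2, 0.3]

# The same three floats, summed in two different orders.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

# Exact sum of the stored binary values, with no rounding at all.
exact = sum(Fraction(v) for v in values)

print("orders agree:", left_to_right == right_to_left)
print("error, left to right:", float(abs(Fraction(left_to_right) - exact)))
print("error, right to left:", float(abs(Fraction(right_to_left) - exact)))
```

The two orderings give different results, so an analysis that treats a floating-point sum as the exact real-valued sum can be off by a rounding error that depends on the data layout.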

FFI constructor dispatch for Metrics & Measures

We don't have a way for FFI constructors to dispatch on different metrics. Currently, this is handled in a clumsy way by having separate entry points. (E.g., opendp_trans__make_bounded_sum_l1() & opendp_trans__make_bounded_sum_l2().) This should be cleaned up, so that FFI clients can specify the Metrics/Measures they want.
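
One possible shape for this, sketched in Python rather than at the Rust FFI layer: a single entry point that takes the metric as an argument and looks up the matching constructor in a table. All names below are hypothetical, and the constructors are placeholders that only record their configuration.

```python
from typing import Callable, Dict


# Hypothetical per-metric constructors (placeholders, no real computation).
def _make_bounded_sum_l1(lower: float, upper: float) -> dict:
    return {"op": "bounded_sum", "metric": "L1", "bounds": (lower, upper)}


def _make_bounded_sum_l2(lower: float, upper: float) -> dict:
    return {"op": "bounded_sum", "metric": "L2", "bounds": (lower, upper)}


_BOUNDED_SUM_BY_METRIC: Dict[str, Callable[[float, float], dict]] = {
    "L1": _make_bounded_sum_l1,
    "L2": _make_bounded_sum_l2,
}


def make_bounded_sum(metric: str, lower: float, upper: float) -> dict:
    """Single entry point that dispatches on the requested metric."""
    try:
        constructor = _BOUNDED_SUM_BY_METRIC[metric]
    except KeyError:
        supported = sorted(_BOUNDED_SUM_BY_METRIC)
        raise ValueError(f"unsupported metric {metric!r}; expected one of {supported}") from None
    return constructor(lower, upper)


print(make_bounded_sum("L2", 0.0, 1.0))
```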

Type checking for FFI functions with Measurement/Transformation args

Add type checking for FFI functions with Measurement/Transformation args.

FFI functions like make_chain_mt() don't currently validate the type of their Measurement or Transformation arguments. This is error-prone, because it's easy to supply a Transformation instead of a Measurement, or vice versa. (This was part of the problem in #36.) We should add some type checking like is done for arguments to measurement_invoke() and transformation_invoke().

The naive solution would be to embed the FfiMeasurement or FfiTransformation in an FFIObject, which has a type slot, but that probably won't be workable, because that'll capture the concrete type with all type args resolved. I suspect instead we'll want some way to look at the generic type Measurement<...> or Transformation<...>, not the concrete type.
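
A rough Python model of the second idea: each handle carries a tag for its generic kind only (Measurement or Transformation), with no record of the type arguments. The `Handle`, `Kind`, and `expect_kind` names are hypothetical, and the real check would live on the Rust side of the FFI.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    MEASUREMENT = "Measurement"
    TRANSFORMATION = "Transformation"


@dataclass(frozen=True)
class Handle:
    """Hypothetical opaque handle that records only the generic kind."""
    kind: Kind
    name: str


def expect_kind(handle: Handle, expected: Kind, arg_name: str) -> None:
    """Reject a handle whose generic kind does not match the parameter."""
    if handle.kind is not expected:
        raise TypeError(
            f"{arg_name}: expected a {expected.value}, got a {handle.kind.value}"
        )


def make_chain_mt(measurement: Handle, transformation: Handle) -> Handle:
    """Chain a Measurement after a Transformation; the result is a Measurement."""
    expect_kind(measurement, Kind.MEASUREMENT, "measurement")
    expect_kind(transformation, Kind.TRANSFORMATION, "transformation")
    return Handle(Kind.MEASUREMENT, f"{measurement.name} after {transformation.name}")


noise = Handle(Kind.MEASUREMENT, "laplace")
total = Handle(Kind.TRANSFORMATION, "bounded_sum")
print(make_chain_mt(noise, total))
```

Swapping the two arguments now fails immediately with a `TypeError` instead of crashing later.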

Library User Guide

Create docs to help people developing applications that use the library.

This is a big undertaking. Here are some initial tasks (add more once we have an outline):

FFI Strategy

Strategy for exposing OpenDP Library functionality via FFI, so that bindings can be created for different languages:

  • Work out approach for constructors, combinators and invocation.
  • Define C-compatible data structures
  • Decide on support for generic types and functions
  • Specify policy around memory management
  • Prototype a few examples
  • Write design overview

Python library project structure

Organize the Python code into a rational structure for a library.

Currently, the Python wrapper code is just sitting in bare scripts. This should be reorganized into a proper library project. Proposed layout:

opendp/
    python/
        docs/
        opendp/
            __init__.py
            opendp.py
            ...
        requirements.txt
        tests/
            ....

Library Contributor Guide

Create docs to help people developing contributions to the library.

  • Technical stuff (how the library works, how to structure code)
  • Logistical stuff (how contributions are validated)

Error Handling Strategy

Strategy for error handling in OpenDP, especially across the FFI boundary:

  • Develop general approach for signaling errors from library functions.
  • Define a way to expose this to FFI in a safe manner.
  • Write some example code for how this should be applied in the library.

Inconsistent chain constructors

In the changes for #30, we messed up something, and now test.py is crashing:

/usr/local/bin/python3.8 /Users/av/Projects/opendp/python/test.py
Initialized OpenDP Library
"hello, world!"

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

The crash happens at line 38:

    everything = odp.core.make_chain_tt(composition, parse_dataframe)

Error Handling Glue -- Python

Implement the strategy in #22:

  • Implement a set of Python Exception classes to mirror the Rust error cases.
  • Write glue to check for errors returned through FFI, raise Python Exceptions.
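
The two steps above could be sketched as follows. This is an assumption-heavy illustration: the `FfiResult` shape, the error variant names, and the exception class names are all hypothetical, since the actual Rust error cases are defined by the strategy in #22.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FfiResult:
    """Hypothetical shape of a result returned across the FFI boundary."""
    value: Any = None
    error_variant: Optional[str] = None
    error_message: Optional[str] = None


class LibraryError(Exception):
    """Base class for errors reported by the native library."""


class FailedFunctionError(LibraryError):
    """Hypothetical mirror of a 'FailedFunction' error case."""


class TypeParseError(LibraryError):
    """Hypothetical mirror of a 'TypeParse' error case."""


_EXCEPTION_BY_VARIANT = {
    "FailedFunction": FailedFunctionError,
    "TypeParse": TypeParseError,
}


def unwrap(result: FfiResult) -> Any:
    """Return the value, or raise the Python exception matching the error."""
    if result.error_variant is None:
        return result.value
    exception_class = _EXCEPTION_BY_VARIANT.get(result.error_variant, LibraryError)
    raise exception_class(result.error_message or result.error_variant)


print(unwrap(FfiResult(value=42)))
```

Unknown variants fall back to the base class, so adding a new error case on the Rust side degrades to a generic exception rather than being silently ignored.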

Split combinators into separate module

The amount of code in core.rs is becoming unwieldy. It's not totally clear what the best organization is, but a first step would be to take all the combinator-related stuff (make_chain_xx, make_composition, etc.) and put it into a separate top-level module. Proposed name: comb.rs.

FFI data constructors and accessors

When calling OpenDP FFI Measurements/Transformations/Relations, the system requires that values are wrapped as an FFIObject (which ensures type compatibility). We need convenience functions to construct and access these from primitive values. These will be used a lot in FFI contexts, so we should think carefully about signatures. Perhaps something like this:

pub extern "C" fn opendp_data__new_scalar(type_args: *const c_char, val: *const c_void) -> FfiResult<*mut FfiObject> ...

(This would be analogous to the existing opendp_data__from_string() & opendp_data__to_string() functions, which should be folded into this.)

This should also include convenience wrappers in Python that automatically wrap Python objects.

Remove obsolete module data

Since we went with full-on generics everywhere, the ADT model in module data.rs is now obsolete. This needs a sanity check, but I believe that the entire module (Data, Form, Element, TraitObject, etc) can all be removed. Same for the parallel module in opendp-ffi (though opendp_data__from_string() & opendp_data__to_string() will need to live somewhere, see #39).

End-to-end Python code for all major library APIs

We need Python code that exercises all library entry points. There's a start for this in python/test.py, but it doesn't cover all constructors.

Ideally, this would take the form of an integration test we could run in CI. But something that does a minimal sanity check to make sure we haven't broken any signatures would be a good start.

Rationalize types for LaplaceMechanism and GaussianMechanism

LaplaceMechanism and GaussianMechanism currently support any primitive types (including integers), which is probably not what we want. We need to rationalize this. Simplest solution would be to support f64 only, but we should think this through.

Make DistanceCast properly handle f64 -> f32

DistanceCast properly handles rounding for size changes and int -> float changes. But in the corner case of f64 -> f32, the cast can round down, so the resulting distance may be smaller than the original.
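
The corner case can be reproduced in Python with the `struct` module, which rounds a 64-bit float to the nearest 32-bit float. The helper names below are illustrative, and the round-up cast is only a sketch of one possible fix for positive, finite values that fit in f32. It is not the library's implementation.

```python
import struct


def to_f32(x: float) -> float:
    """Round an f64 to the nearest f32 and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_f32_round_up(x: float) -> float:
    """Cast a positive, finite, in-range f64 to an f32 that is never below x."""
    nearest = to_f32(x)
    if nearest >= x:
        return nearest
    # For non-negative floats, adding 1 to the bit pattern gives the next
    # representable f32 above `nearest`.
    bits = struct.unpack("<I", struct.pack("<f", nearest))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]


distance = 0.7
print("nearest cast is smaller:", to_f32(distance) < distance)
print("round-up cast is smaller:", to_f32_round_up(distance) < distance)
```

A distance that shrinks during the cast understates how far apart two datasets can be, so rounding toward the larger value is the conservative direction.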

Chaining hints

Combinator functions make_chain_mt() and make_chain_tt() need an additional argument for a hint function from the framework paper. This function chooses an intermediate distance so that relations can be chained.

  • Specify APIs
  • Implement chaining logic
  • Add forward/backward map functionality
  • Add convenience constructors using stability constant
  • Update existing operations to generate relations with maps
  • Write unit tests
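
A minimal Python model of chaining with a hint, to make the tasks above concrete. The `Transformation` class, the argument order of `make_chain_tt`, and the signature of the hint are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Transformation:
    """Hypothetical model: a function plus a stability relation."""
    function: Callable[[list], list]
    stability_relation: Callable[[float, float], bool]


def make_chain_tt(
    outer: Transformation,
    inner: Transformation,
    hint: Callable[[float, float], float],
) -> Transformation:
    """Chain outer after inner; the hint picks the intermediate distance."""

    def relation(d_in: float, d_out: float) -> bool:
        d_mid = hint(d_in, d_out)
        return inner.stability_relation(d_in, d_mid) and outer.stability_relation(d_mid, d_out)

    return Transformation(lambda data: outer.function(inner.function(data)), relation)


# inner repeats each row twice (2-stable); outer repeats each row three times (3-stable).
inner = Transformation(lambda rows: [r for r in rows for _ in range(2)], lambda d_in, d_out: d_out >= 2 * d_in)
outer = Transformation(lambda rows: [r for r in rows for _ in range(3)], lambda d_in, d_out: d_out >= 3 * d_in)

# The hint uses the inner stability constant as a forward map.
chained = make_chain_tt(outer, inner, hint=lambda d_in, d_out: 2 * d_in)

print(chained.stability_relation(1, 6))
print(chained.stability_relation(1, 5))
```

With a stability constant available, the hint is just the forward map of the inner transformation, which is what the convenience constructors in the task list would provide.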

FFI Macros and Utilities

Tools to make life easier and code cleaner in FFI layer:

  • Dispatch based on type parameters
  • Marshaling data
  • Memory management

Operation Design

High-level design for Measurements and Transformations from the framework paper:

  • Define Rust data structures
  • Develop strategy for constructors of specific instances
  • Prototype a few examples
  • Write design overview

Basic Composition Combinator -- Implementation

  • Implement function
  • Include comments
  • Write tests

We have a simple implementation of this, but it only accepts exactly two Measurements, and constructs a function returning a 2-tuple. Now that we have the AnyXXX facilities, it should accept an arbitrary number of Measurements, and construct a function returning Vec.
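
The intended behavior, modelled in plain Python. Here a measurement is just a callable, the name `make_basic_composition` is used for illustration, and the privacy relation of the composed measurement is deliberately omitted.

```python
from typing import Any, Callable, List, Sequence

# Assumption for this sketch: a measurement is modelled as a plain function.
Measurement = Callable[[Any], Any]


def make_basic_composition(measurements: Sequence[Measurement]) -> Callable[[Any], List[Any]]:
    """Compose any number of measurements into one that returns a list of releases."""
    measurements = list(measurements)
    if not measurements:
        raise ValueError("at least one measurement is required")

    def function(data: Any) -> List[Any]:
        # Each measurement sees the same input; results keep the input order.
        return [measurement(data) for measurement in measurements]

    return function


# Deterministic placeholders stand in for real (noisy) measurements.
composed = make_basic_composition([len, sum, max])
print(composed([1, 2, 3]))
```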

trait constructor calling convention

  • switch calling convention from individual functions to impls of constructor traits in trans.rs, meas.rs, core.rs
  • refactor trans.rs and meas.rs into folders
  • split trans.rs code between mod.rs and dataframe.rs
  • add num crate and remove OpenDPInto trait

Automated generation of FFI metadata

Create an automated mechanism that can generate FFI metadata from annotations in the code.

Currently, the Python bindings are generated from metadata describing the FFI wrappers. These metadata are contained in JSON files (bootstrap.json). This works very well, but it requires manual creation of the metadata, and duplication of information between Rust code and JSON. It'd be great to have a more robust mechanism.

This could be done with a build script in a couple of ways:

  • Parse the code and annotations in opendp-ffi to get the metadata directly.
  • Parse the code and annotations in opendp to infer the metadata. This is more work, but has better long-term potential.

This could be a first step towards fully automatic generation of everything from the core Rust functions. Issue #131 is for the fuller solution (if we get there).

Data Model Design

High-level design for data objects consumed and produced by Measurements and Transformations:

  • Enumerate common use cases
  • Define Rust data structures
  • Prototype a few examples
  • Write design overview

Error Handling Cleanup -- Rust

Implement the strategy in #22:

  • Make a pass through public APIs, add error annotations as appropriate.
  • Audit code for existing uses of unwrap(), assert(), other things that panic, convert to proper error handling.
  • Add plumbing to expose errors at FFI layer.
  • Write unit tests to check that errors are propagated correctly.

Python Bindings

Python bindings for all library APIs. These should be as close as possible to idiomatic Python code. Ideally, they would be generated automatically from metadata.

  • Calling generic functions
  • Loading data
  • Invoking operations
  • Memory management

Histogram Transformation -- Implementation

  • Implement constructor
  • Write tests
  • Add documentation

Note: This issue originally covered both category-based and stability-based histograms. In the interest of modularity, and because the proofs will presumably be separate, I've split off the stability part into #116.

Untrusted mode

Provide a way to activate "untrusted" mode, where privacy guarantees are loosened, and more features are available. This could be used to enable things outside the strict OpenDP constraints:

  • newly contributed components that haven't been validated yet
  • user-supplied functions (e.g. row_transform)
  • permissive floating point calculations

Need to figure out the mechanics of this. Some things we could leverage:

  • Rust module(s)
  • Rust conditional compilation
  • Separate Rust crate
  • Python package flags that load different versions of the library

Facility for using callbacks implemented in client code

It'd be very nice to have a facility whereby functions implemented in client code (i.e., outside FFI, in Python) could be passed into the library and used as callbacks. This would allow us to support custom transformations and relations. (This would be available only in an explicit "unsafe" mode.)
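
For reference, Python's standard `ctypes` module can already wrap a Python function as a C-compatible function pointer, which is the basic building block such a facility would need. The sketch below invokes the pointer directly from Python as a stand-in for the library calling it. The callback signature is an assumption.

```python
import ctypes

# C signature of the callback: double (*)(double)
RowFn = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)


def double_it(x: float) -> float:
    """Client-side function that native code would call back into."""
    return 2.0 * x


# Keep a reference to the wrapper for as long as native code may call it;
# otherwise the function pointer can be garbage collected.
callback = RowFn(double_it)

# Stand-in for the library side: call through the C function pointer.
result = callback(21.0)
print(result)
```

Lifetime management is the main hazard here: the wrapper object has to outlive every native call that uses the pointer.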

Dependencies

Relations Design

High-level design for Privacy Relations and Stability Relations from the framework paper:

  • Define Rust data structures
  • Prototype a few examples
  • Write design overview
