
pydp's Introduction


PyDP

In today's data-driven world, more and more researchers and data scientists use machine learning to create better models or more innovative solutions for a better future.

These models often tend to handle sensitive or personal data, which can cause privacy issues. For example, some AI models can memorize details about the data they've been trained on and could potentially leak these details later on.

To help measure sensitive data leakage and reduce the possibility of it happening, there is a mathematical framework called differential privacy.

In 2020, OpenMined created a Python wrapper for Google's Differential Privacy project called PyDP. The library provides a set of ε-differentially private algorithms, which can be used to produce aggregate statistics over numeric data sets containing private or sensitive information. Therefore, with PyDP you can control the privacy guarantee and accuracy of your model written in Python.
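For reference, the standard textbook definition of ε-differential privacy is given below. It is stated here as background and is not quoted from the PyDP documentation. A randomized mechanism M is ε-differentially private if, for all data sets D and D' that differ in a single record and for every set of outputs S:

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{M}(D') \in S]
```

A smaller ε forces the two output distributions closer together, so less can be inferred about any single record, at the cost of noisier results.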

Things to remember about PyDP:

  • 🚀 Features differentially private algorithms including: BoundedMean, BoundedSum, Max, Count Above, Percentile, Min, Median, etc.
    • All the computation methods mentioned above use Laplace noise only (other noise mechanisms will be added soon! 😃)
  • 🔥 Compatible with all three types of Operating Systems - Linux, macOS, and Windows 😃
  • ⭐ Use Python 3.x.
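To give a feel for what "Laplace noise" means in the list above, here is a minimal pure-Python sketch of a Laplace-noised bounded sum. It is an illustration only, not PyDP's or Google's implementation: the function names are hypothetical, and this naive floating-point sampler is not suitable for production privacy guarantees.

```python
import random


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_bounded_sum(values, epsilon, lower, upper, rng=random):
    """Clamp each value to [lower, upper], sum, and add Laplace noise.

    Hypothetical helper for illustration; assumes each person contributes
    one value, so the sensitivity of the sum is max(|lower|, |upper|).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    clamped = [min(max(v, lower), upper) for v in values]
    sensitivity = max(abs(lower), abs(upper))
    if sensitivity == 0:
        return float(sum(clamped))
    return sum(clamped) + laplace_noise(sensitivity / epsilon, rng)
```

The noise scale is sensitivity divided by ε, so a smaller ε (stronger privacy) produces a noisier sum.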

Installation

To install PyDP from PyPI, use pip:

pip install python-dp

(If you have pip3 separately for Python 3.x, use pip3 install python-dp.)

Examples

Refer to the curated list of tutorials and sample code to learn more about the PyDP library.

You can also get started with an introduction to PyDP (a Jupyter notebook) and the carrots demo (a Python file).

Example: calculate the Bounded Mean

# Import PyDP
import pydp as dp
# Import the Bounded Mean algorithm
from pydp.algorithms.laplacian import BoundedMean

# Calculate the Bounded Mean
# Basic structure: `BoundedMean(epsilon: float, lower_bound: Union[int, float, None], upper_bound: Union[int, float, None])`
# `epsilon`: a float greater than 0 (values between 0 and 1 are typical) denoting the privacy threshold;
#            it measures the acceptable loss of privacy (smaller values mean less loss is acceptable)
x = BoundedMean(epsilon=0.6, lower_bound=1, upper_bound=10)

# If the lower and upper bounds are not specified,
# PyDP automatically calculates these bounds
# (automatic bounds need a sufficiently large data set)
# x_auto = BoundedMean(epsilon: float)
x_auto = BoundedMean(0.6)

# Calculate the result
# Currently supported data types are integers and floats
# Future versions will support additional data types
# (Refer to https://github.com/OpenMined/PyDP/blob/dev/examples/carrots.py)
# `input_data` is a list of integers or floats; the values below are illustrative
input_data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x.quick_result(input_data)

Learning Resources

Go to resources to learn more about differential privacy.

Support and Community on Slack

If you have questions about the PyDP library, join OpenMined's Slack and check the #lib_pydp channel. To follow the code source changes, join #code_dp_python.

Contributing

To contribute to the PyDP project, read the guidelines.

Pull requests are welcome. If you want to introduce major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

Apache License 2.0

pydp's People

Contributors

8bitmp3, alejandrosame, benjamindev, brendanschell, cereallarceny, chinmayshah99, codeboy5, divyanshugit, dnabanita7, dvadym, festusdrakon, frank-7, gilangrilhami, hellomynameisjiji, iamtrask, jabertuhin, jandremarais, jeamick, leclair-7, levzlotnik, madhavajay, mhosankalp, nileshpant1999, paulkarikari, replomancer, shaistha24, shivaylamba, simcof, systemshift, vishalsubbiah


pydp's Issues

Add Support for Windows

Expected Behavior

Add a script that generates a .dll file as part of the build process. setup.py also needs to be edited to include the generated .dll file, which will be produced by bazel build.

Current Behavior

Right now, pydp only supports Linux systems, because the build only generates the .so file.
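As a rough sketch of the platform-dependent part, the helper below picks the file name that the compiled module would need on each operating system. The function name is hypothetical and this is not the project's setup.py. It relies on the fact that CPython imports extension modules from .pyd files on Windows (a .pyd is a DLL with a different suffix) and from .so files on Linux and macOS.

```python
import platform


def extension_filename(module_name="pydp", system=None):
    """Return the file name CPython expects for a compiled extension module.

    Hypothetical helper: `system` defaults to the current platform and can
    be overridden for testing ("Windows", "Linux", "Darwin").
    """
    system = system or platform.system()
    suffix = ".pyd" if system == "Windows" else ".so"
    return module_name + suffix
```

A setup.py could then copy the Bazel output to this name before packaging, so that `from .pydp import *` resolves on every platform.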

Logs

>>> import pydp as dp
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\ProgramData\Anaconda3\lib\site-packages\pydp\__init__.py", line 1, in <module>
    from .pydp import *
ModuleNotFoundError: No module named 'pydp.pydp'

System information

  • PyDP version: 0.1.0dev
  • Operating System: Any Windows system

Carrots demo

Bind enough of the library to execute the carrots demo found in the examples folder of Google's DP library.

Running docker build fails

Expected Behavior

Building a docker image from the Dockerfile should not require changes on the host machine.

Current Behavior

Currently, building a Docker image from the Dockerfile fails unless the submodules are first initialized on the host:

git submodule init
git submodule update

Steps to Reproduce Behavior

From the project directory run:
sudo docker build -t pydb .

Logs

grep: ./third_party/pybind11_bazel/python_configure.bzl: No such file or directory
sed: can't read ./third_party/pybind11_bazel/python_configure.bzl: No such file or directory
sed: can't read ./third_party/pybind11_bazel/python_configure.bzl: No such file or directory
sed: can't read ./third_party/pybind11_bazel/python_configure.bzl: No such file or directory
sed: can't read ./third_party/pybind11_bazel/python_configure.bzl: No such file or directory
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
Loading: 
Loading: 0 packages loaded
ERROR: error loading package '': Label '//third_party/pybind11_bazel:python_configure.bzl' is invalid because 'third_party/pybind11_bazel' is not a package; perhaps you meant to put the colon here: '//third_party:pybind11_bazel/python_configure.bzl'?
ERROR: error loading package '': Label '//third_party/pybind11_bazel:python_configure.bzl' is invalid because 'third_party/pybind11_bazel' is not a package; perhaps you meant to put the colon here: '//third_party:pybind11_bazel/python_configure.bzl'?
INFO: Elapsed time: 3.884s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
FAILED: Build did NOT complete successfully (0 packages loaded)
cp: cannot stat './bazel-bin/src/bindings/pydp.so': No such file or directory
cp: cannot stat './bazel-bin/src/bindings/pydp.so': No such file or directory
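The log above fails because a file from an uninitialized submodule is missing. A small pre-build check along the following lines could report that early with a clearer message. This is a sketch, not part of the project's build scripts: the function name is hypothetical, and the list of required files contains only the one path that appears in the log.

```python
from pathlib import Path

# Only the path seen in the failing log; a real check would list more files.
REQUIRED_FILES = ["third_party/pybind11_bazel/python_configure.bzl"]


def missing_submodule_files(project_root, required=REQUIRED_FILES):
    """Return the required files that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in required if not (root / rel).is_file()]
```

If the returned list is non-empty, the build script can stop and ask the user to run `git submodule init` and `git submodule update` instead of failing later inside Bazel.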

Integration of PyDP with TensorFlow

Expected Behavior

The PyDP package should be easily usable in a typical TensorFlow based project

Current Behavior

No integration

Steps to Reproduce Behavior

N/A - no integrations yet

Logs

N/A - no integrations yet

System information

  • PyDP version: current
  • Python version: 3
  • Bazel version: as per current PyDP release
  • Operating System: as per current PyDP release

base\status

elements to bind

  • Status constructor
  • GetPayload
  • SetPayload
  • Erase Payload
  • StatusCode (scoped enumerator)
  • StatusCodeToString
  • Status operator
  • StatusCode operator

Remove old submodules

The following submodules need to be purged from .git config cache:
/PyBind11 (it has been moved to third_party/PyBind11)
/diffferential-privacy (it has been moved to third_party/differential-privacy)
/third_party/abseil-cpp (this is now configured as a bazel git repository)
/third_party/boringssl (not needed anymore)
/third_party/protobuf (this is now configured as a bazel git repository)

Docker image doesn't build

Expected Behavior

docker build -t pydp:test .

should build the image from Dockerfile.

Current Behavior

Due to the package name change, the build process ends with:

adding 'python_dp-0.1.0.dist-info/RECORD'
removing build/bdist.linux-x86_64/wheel
WARNING: Requirement 'dist/pydp-0.1.0-py2.py3-none-any.whl' looks like a filename, but the file does not exist
Processing ./dist/pydp-0.1.0-py2.py3-none-any.whl
ERROR: Could not install packages due to an EnvironmentError: [Errno 2] No such file or directory: '/root/PyDP/dist/pydp-0.1.0-py2.py3-none-any.whl'
The command '/bin/sh -c bash build_PyDP.sh &&     python3 setup.py sdist bdist_wheel &&     pip install dist/pydp-0.1.0-py2.py3-none-any.whl &&     pip install -r requirements_dev.txt' returned a non-zero code: 1

After changing the package path to dist/python_dp-0.1.0-py2.py3-none-any.whl in the Dockerfile, I get another error:

Sending build context to Docker daemon  2.168MB
Step 1/12 : FROM python:3.6-slim-buster
 ---> 8bf54e6af8e1
Step 2/12 : ENV HOME /root
 ---> Using cache
 ---> 5d67399a725e
Step 3/12 : ENV PATH "/root/bin:${PATH}"
 ---> Using cache
 ---> a0ec8b25b647
Step 4/12 : WORKDIR /root
 ---> Using cache
 ---> daa89cb03d18
Step 5/12 : RUN     apt-get update &&     apt-get -y install software-properties-common     sudo     wget     unzip     gcc     g++     build-essential     python3-distutils     pkg-config     zip     zlib1g-dev     git &&     wget https://github.com/bazelbuild/bazel/releases/download/2.1.0/bazel-2.1.0-installer-linux-x86_64.sh &&     chmod +x bazel-2.1.0-installer-linux-x86_64.sh &&     ./bazel-2.1.0-installer-linux-x86_64.sh --user &&     export PATH="$PATH:$HOME/bin" &&     rm bazel-2.1.0-installer-linux-x86_64.sh
 ---> Using cache
 ---> 8b0d41e5433a
Step 6/12 : WORKDIR /tmp/third_party
 ---> Using cache
 ---> e1f2ccd7e9e9
Step 7/12 : RUN git clone https://github.com/google/differential-privacy.git &&     git clone https://github.com/pybind/pybind11_bazel.git
 ---> Using cache
 ---> dccd66076d9c
Step 8/12 : WORKDIR /root/PyDP
 ---> Using cache
 ---> 479c29e85c61
Step 9/12 : COPY . /root/PyDP
 ---> Using cache
 ---> a9dd0ef391a7
Step 10/12 : RUN cp -r /tmp/third_party/* /root/PyDP/third_party
 ---> Running in f42db9ebf3a1
cp: cannot overwrite non-directory '/root/PyDP/third_party/differential-privacy/.git' with directory '/tmp/third_party/differential-privacy/.git'
cp: cannot overwrite non-directory '/root/PyDP/third_party/pybind11_bazel/.git' with directory '/tmp/third_party/pybind11_bazel/.git'
The command '/bin/sh -c cp -r /tmp/third_party/* /root/PyDP/third_party' returned a non-zero code: 1

Workaround:

docker build --no-cache -t pydp:test .
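The first failure in this issue comes from a wheel file name that is hard-coded in the Dockerfile and no longer matches the package name. One way to avoid that class of error is to discover the wheel instead of naming it. The sketch below is an assumption about how that could look, with a hypothetical helper name; it is not taken from the repository.

```python
from pathlib import Path


def find_built_wheel(dist_dir="dist"):
    """Return the single .whl file in dist_dir, or raise if it is ambiguous."""
    wheels = sorted(Path(dist_dir).glob("*.whl"))
    if len(wheels) != 1:
        raise FileNotFoundError(
            f"expected exactly one wheel in {dist_dir}, found {len(wheels)}"
        )
    return wheels[0]
```

The install step can then pass the returned path to pip, so a package rename does not require editing the Dockerfile.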

Steps to Reproduce Behavior

How can we reproduce this?

docker build -t pydp:test .

System information

  • PyDP version: commit 2c1d36b
  • Python version: (Docker base image is python:3.6-slim-buster)
  • Bazel version: (Bazel 2.1.0 is installed during image building)
  • Operating System: (Docker base image is python:3.6-slim-buster)

Add template for common tasks

We need a template that shows how to handle structures that recur throughout the PyDP library. This will help us move faster through the wrapping process.

For example: scoped enums, C++ templates, etc.

Deployment to pypi

Expected Behavior

Package should be available on PyPI (e.g. pip install python-dp)

Current Behavior

Package has not been deployed

Steps to Reproduce Behavior

N/A

Logs

N/A

System information

  • PyDP version: all

pydp scenarios

Expected Behavior

A set of scenarios that describe how users will consume the pydp library. This should include the full data lifecycle (prepare, explore, compute, output etc...)

These scenarios will be placed in the docs folder and will be the basis of the examples built to accompany the library.

Current Behavior

No scenarios are described.

Steps to Reproduce Behavior

Look for scenarios in the docs folder! (they aren't there)

Logs

N/A

System information

  • PyDP version: N/A
  • Python version: N/A
  • Bazel version: N/A
  • Operating System: N/A

Add Dockerfile

Since the environment setup is very complex, Docker would be the easiest way to set it up.
This will also boost cross-platform development, especially among our friends using Windows!

Configure packaging for PyPI

Currently we can build the wheel for pydp but cannot access the pydp package once it is installed. We need to learn how to bundle the output from pybind11 into the Python packaging system.

Failing test: module 'pydp' has no attribute 'status_code_to_string'

When I run make test-all I see 10 passing tests and 1 failure:

base/test_logging.py .....                                                                                                                                                                               [ 45%]
base/test_status.py ..F...                                                                                                                                                                               [100%]

=================================================================================================== FAILURES ===================================================================================================
________________________________________________________________________________________ TestStatus.test_code_to_string ________________________________________________________________________________________

self = <tests.base.test_status.TestStatus object at 0x7f438b720588>

>   ???
E   AttributeError: module 'pydp' has no attribute 'status_code_to_string'

/home/lukasz/src/OpenMind/PyDP/tests/base/test_status.py:39: AttributeError

Steps to Reproduce Behavior

Run make test-all.

System information

  • PyDP version: commit b6abd75
  • Python version: 3.6.9
  • Bazel version: 2.1.0
  • Operating System: Ubuntu 18.04

Maintain Python3 support in all files

In Makefile and tox.ini, the command invoked is python file.py.

The problem is that in a conda environment, python defaults to Python 3, but on systems with both Python 2 and Python 3 installed, python invokes Python 2.

What can be done:

Make sure the correct Python version is invoked when python is called, and change these two files accordingly.
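Independently of how the Makefile and tox.ini are changed, a script can also guard against being run by the wrong interpreter. The following is a minimal sketch with a hypothetical function name, not code from the repository:

```python
import sys


def require_python3(version_info=sys.version_info):
    """Raise a clear error when the running interpreter is not Python 3."""
    if version_info[0] < 3:
        raise RuntimeError(
            "Python 3 is required, but this interpreter is Python "
            + ".".join(str(part) for part in version_info[:3])
        )
    return True
```

Calling `require_python3()` at the top of a build or test script turns a confusing downstream failure into an immediate, readable error.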

Add documentation for Testing Framework

We are using pytest for testing and tox to automate the process.

While we have set up the scripts, we need to add documentation on how to write unit tests and establish standards for writing them.

Workflow for publishing directly on PYPI

Expected Behavior

Create a GitHub workflow that automatically builds the package and uploads it to PyPI when a release is created.

We will need three separate workflows, for Linux, macOS, and Windows.

Current Behavior

As of now, we have to upload the package manually with twine from a local system whenever we make a release.

Python Package Structure

Expected Behavior

When we import the pydp package and print the type() of any object, it should be pydp.item.

Current Behavior

Currently, when we import the pydp package and print the type() of any object, it's pydp.pydp.item
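The name shown by type() comes from the class's `__module__` attribute, which is why objects defined in the compiled submodule are reported under pydp.pydp. The pure-Python sketch below only illustrates that mechanism with a stand-in class. It is not the project's fix, and whether the same approach suits the pybind11-generated classes is an open assumption.

```python
class StatusCode:
    """Stand-in for a class exported by the compiled pydp.pydp submodule."""


# Simulate a class that lives in the inner compiled module.
StatusCode.__module__ = "pydp.pydp"
before = repr(StatusCode)

# Re-home the class to the top-level package name.
StatusCode.__module__ = "pydp"
after = repr(StatusCode)

print(before)
print(after)
```

After the reassignment, the printed class name no longer contains the inner module path.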

Steps to Reproduce Behavior

>>> import pydp as pd
>>> pd.Status.StatusCode(0)
StatusCode.kOk
>>> type(pd.Status.StatusCode(0))
<class 'pydp.pydp.Status.StatusCode'>

System information

  • Python version: 3.6

PyDP does not work on Python 3.5, 3.7

Expected Behavior

It should work on all python3 interpreters.

Current Behavior

When we import pydp on an interpreter other than Python 3.6, it reports that the module was compiled for Python 3.6 and that the interpreter is incompatible.

Logs

image

System information

  • PyDP version: 0.1.0dev
  • Python version: 3.5, 3.7

Ensure code adheres to coding standards.

Expected Behavior

As decided on the Slack channel, we follow PEP 8 for Python and the Google style for C++ code formatting. This was pushed in #98.

Current Behavior

The current code does not adhere to these standards and fails.

How to make it work

clang-format -i src/bindings/PyDP/*/*.cpp
clang-format -i src/bindings/PyDP/*/*.hpp

Error handling for Result, statusor

Expected Behavior

When we call the result function, which in turn calls ValueOrDie(), a failure comes with a status code.
We need to take that status code from StatusOr, integrate it with base/status.cpp, and raise an error in Python.
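From the Python side, the expected behavior could look roughly like the sketch below: a failed status becomes an exception that carries the code and message instead of aborting the process. All names here (`StatusError`, `value_or_raise`) are hypothetical and chosen for illustration. The actual binding would be written in C++ with pybind11.

```python
class StatusError(RuntimeError):
    """Hypothetical exception carrying a status code and its message."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def value_or_raise(ok, value=None, code=None, message=""):
    """Return the value on success, otherwise raise StatusError."""
    if not ok:
        raise StatusError(code, message)
    return value
```

Callers can then wrap the result call in try/except and inspect `exc.code`, rather than losing the interpreter to a core dump.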

Current Behavior

When we call the result function, which in turn calls ValueOrDie(), and it fails, the program crashes.

Steps to Reproduce Behavior

>>> import pydp as dp
>>> x = dp.BoundedMean(0.5)
>>> x.result([2,4,4,8])

Logs

>>> import pydp as dp
>>> x = dp.BoundedMean(0.5)
>>> x.result([2,4,4,8])
2020-05-02 00:05:16  FATAL  statusor.cc : 38 : Attempting to fetch value instead of handling error kInvalidArgument: Bin count threshold was too large to find approximate bounds. Either run over a larger dataset or decrease success_probability and try again.
2020-05-02 00:05:16  FATAL  statusor.cc : 38 : Attempting to fetch value instead of handling error kInvalidArgument: Bin count threshold was too large to find approximate bounds. Either run over a larger dataset or decrease success_probability and try again.
Aborted (core dumped)

System information

  • PyDP version: 0.1.0dev
  • Python version: 3.6

Wheel tag doesn't match the actual package

Expected Behavior

The distributed wheel is tagged 'py2.py3-none-any', which means it should work on all platforms with both Python 2 and Python 3. However, the wheel includes the pydp.so file, which is compiled for one specific platform and for Python 3.6 exactly.

Current Behavior

The package can't run on a different platform or on a Python version other than 3.6.

Steps to Reproduce Behavior

Install pydp for a Python version other than 3.6, or install it on a platform other than Linux (which the pydp.so file is compiled for).
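To make the tag mismatch concrete, the sketch below splits a wheel file name into its Python, ABI, and platform tags, following the standard wheel naming scheme (name-version-python-abi-platform.whl). The helper names are hypothetical, and the second file name in the usage note is an illustrative example of a platform-specific tag, not a file the project ships.

```python
def wheel_tags(filename):
    """Return (python_tag, abi_tag, platform_tag) from a wheel file name."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError(f"unexpected wheel file name: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def claims_pure_python(filename):
    """True when the tags claim the wheel has no compiled, platform-bound code."""
    _, abi_tag, platform_tag = wheel_tags(filename)
    return abi_tag == "none" and platform_tag == "any"
```

`claims_pure_python("pydp-0.1.0-py2.py3-none-any.whl")` is true, which is the incorrect claim described in this issue. A name such as `pydp-0.1.0-cp36-cp36m-linux_x86_64.whl` would describe a build that contains a compiled pydp.so.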

algorithms\util

  • XorStrings
  • GetNextPowerOfTwo
  • Qnorm
  • Clamp
  • SafeAdd
  • SafeSquare
  • SafeSubtract
  • Mean
  • StandardDev
  • OrderStatistic
  • Correlation
  • VectorFilter
  • VectorToString

Establish Testing Framework

  • [ ] Document testing methodology
  • [ ] Update scripts to automate testing
  • [ ] Implement a fully worked example of testing
  • [ ] Create any new issues required to improve quality
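As a starting point for the "fully worked example" item, here is a minimal pytest-style layout. pytest collects functions named `test_*` from files named `test_*.py` and treats a failing `assert` as a test failure. The `clamp` function is a pure-Python stand-in written for this sketch, and its argument order is an assumption. A real test would import the bound function from pydp instead.

```python
def clamp(low, high, value):
    """Stand-in for a wrapped utility: limit value to the range [low, high]."""
    return max(low, min(high, value))


def test_clamp_keeps_value_inside_range():
    assert clamp(1, 10, 5) == 5


def test_clamp_raises_value_below_range():
    assert clamp(1, 10, -3) == 1


def test_clamp_lowers_value_above_range():
    assert clamp(1, 10, 42) == 10
```

Each test checks one behavior and has a descriptive name, which keeps failure reports readable when the suite runs under tox.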

Documentation for PyDP

We need to start the documentation in the /docs folder, accessible here.

Before we start the documentation:

  1. We need to decide on the documentation structure and template (for example, should we take inspiration from how TensorFlow is documented, or follow some other model?).

Once we decide on the template, we should start with the documentation on how to use the library.

Additionally, #75 also needs to be added to the same documentation structure, in the contribution guidelines section.
