datadiligence

Respect generative AI opt-outs in your ML training pipeline.

datadiligence aims to make it simple for ML practitioners to respect opt-outs in their training by providing a consistent interface for checking whether a given work is opted out under any known method. The goal is to make respecting opt-outs as painless as possible while staying flexible enough to support new opt-out methods as they are developed.

Why is this needed?

ML training datasets are often harvested without the consent of the data or content owners, so any model trained on them may violate the wishes of content creators about how their content is used. In the absence of an opt-out standard, many platforms and individuals have devised their own methods of stating their consent.

Additionally, consent can change over time, while static datasets cannot. A work whose owner consented when the dataset was created may be opted out by the time of training. Keeping up with the current state of opt-outs is unrealistic for most practitioners, so this project aims to make it as easy as possible to respect opt-outs in your training pipeline.

Basic Usage

To install:

pip install datadiligence

Add bulk pre-processing for URLs to your pipeline (requires a Spawning API key):

>>> import datadiligence as dd
>>> urls = ["https://www.example.com/art-123456789.jpg", "https://www.example.com/art-987654321.jpg"]
>>> dd.filter_allowed(urls=urls)
['https://www.example.com/art-123456789.jpg']
>>> dd.is_allowed(urls=urls)
[True, False]
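
For long URL lists it may help to run the check in fixed-size batches rather than in one call. The sketch below is an illustration under stated assumptions, not part of datadiligence: `filter_in_batches` and `fake_check` are hypothetical names, and `check_batch` stands in for any callable that returns one boolean per URL, as `dd.is_allowed(urls=...)` does in the example above.

```python
from typing import Callable, Iterable, Iterator, List


def _allowed(batch: List[str],
             check_batch: Callable[[List[str]], List[bool]]) -> List[str]:
    """Return the URLs in one batch that the checker flags as allowed."""
    flags = check_batch(batch)
    if len(flags) != len(batch):
        raise ValueError("check_batch must return one flag per URL")
    return [url for url, ok in zip(batch, flags) if ok]


def filter_in_batches(
    urls: Iterable[str],
    check_batch: Callable[[List[str]], List[bool]],
    batch_size: int = 1000,
) -> Iterator[str]:
    """Yield only the URLs that check_batch reports as allowed."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for url in urls:
        batch.append(url)
        if len(batch) == batch_size:
            yield from _allowed(batch, check_batch)
            batch = []
    if batch:  # flush the final, possibly short, batch
        yield from _allowed(batch, check_batch)


def fake_check(batch: List[str]) -> List[bool]:
    # Hypothetical stand-in for a real opt-out checker, used only so the
    # sketch runs offline: treats one example URL as opted out.
    return ["987654321" not in url for url in batch]
```

With the library installed and an API key configured, `check_batch` could plausibly be `lambda batch: dd.is_allowed(urls=batch)`; the batch size of 1000 is an arbitrary default, not a documented limit.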

Check HTTP responses in post-processing:

>>> import requests
>>> response = requests.get("https://www.example.com/art-123456789.jpg")
>>> is_allowed = dd.is_allowed(response=response)
>>> is_allowed
True
>>> if is_allowed:
...     process_image(response.content)
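
When many responses pass through post-processing, the same check can gate a whole stream and record how many items were dropped. This is a minimal sketch under assumptions: `GateResult` and `gate_responses` are hypothetical names, and `is_allowed` is any callable that takes one response and returns a boolean, such as a thin wrapper around `dd.is_allowed(response=...)`.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List


@dataclass
class GateResult:
    """Responses that passed the opt-out check, plus a count of the rest."""
    kept: List[Any] = field(default_factory=list)
    skipped: int = 0


def gate_responses(
    responses: Iterable[Any],
    is_allowed: Callable[[Any], bool],
) -> GateResult:
    """Split responses into kept items and a count of opted-out items."""
    result = GateResult()
    for response in responses:
        if is_allowed(response):
            result.kept.append(response)
        else:
            # Opted-out content is dropped, never passed downstream.
            result.skipped += 1
    return result
```

Keeping a count of skipped items makes it easier to log how much of a dataset was excluded on each run, which matters because opt-out status can change between runs.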

Full documentation is available on Read the Docs.

Respected Opt-Out Methods

This project currently supports the following opt-out methods:

With these opt-out methods coming soon:

Contributing

See contribution guidelines here.

Contributors

padge91
