Private Detector

This is the repo for Bumble's Private Detector™ model - an image classifier that can detect lewd images.

The internal repo has been heavily refactored and released as a fully open-source project so that the wider community can use and fine-tune a Private Detector model of their own. You can download the pretrained SavedModel, Frozen Model and checkpoint here.

Model

The SavedModel can be found in saved_model/ within private_detector.zip above.

The model is based on EfficientNetV2 and trained on our internal dataset of lewd images - more information can be found in the whitepaper here or here.

Inference

Inference is straightforward, and inference.py provides an example. The model is released as a SavedModel, so it can be deployed in many different ways; here is a quick run-through of one way to get it working, for those less familiar with Python/TensorFlow.

First, install Python and Conda on your system and open the Terminal/Command Prompt on your machine.

Then use the environment.yaml file to install the packages needed to run inference:

conda env create -f environment.yaml
conda activate private_detector

Once that's set up, you can run the inference script. Simply replace the sample .jpg file paths below with your own:

python3 inference.py \
    --model saved_model/ \
    --image_paths \
        Yes_samples/1.jpg \
        Yes_samples/2.jpg \
        Yes_samples/3.jpg \
        Yes_samples/4.jpg \
        Yes_samples/5.jpg \
        No_samples/1.jpg \
        No_samples/2.jpg \
        No_samples/3.jpg \
        No_samples/4.jpg \
        No_samples/5.jpg

Sample Output

Probability: 93.71% - Yes_samples/1.jpg
Probability: 93.43% - Yes_samples/2.jpg
Probability: 94.06% - Yes_samples/3.jpg
Probability: 94.08% - Yes_samples/4.jpg
Probability: 91.01% - Yes_samples/5.jpg
Probability: 9.76% - No_samples/1.jpg
Probability: 7.14% - No_samples/2.jpg
Probability: 8.83% - No_samples/3.jpg
Probability: 4.87% - No_samples/4.jpg
Probability: 5.29% - No_samples/5.jpg
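
The printed lines are easy to post-process. The sketch below parses output in the format shown above and lists the images whose probability exceeds a cutoff. The function names are hypothetical, and the 0.5 cutoff is an assumption (it matches the default eval_threshold of the training script, but the right value depends on your use case).

```python
import re
from typing import List, Tuple

# Matches lines such as "Probability: 93.71% - Yes_samples/1.jpg"
LINE_RE = re.compile(r"^Probability:\s*([0-9.]+)%\s*-\s*(.+)$")


def parse_output(text: str) -> List[Tuple[str, float]]:
    """Return (image_path, probability in [0, 1]) for each matching line."""
    results = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            results.append((match.group(2), float(match.group(1)) / 100.0))
    return results


def flag_images(results: List[Tuple[str, float]], threshold: float = 0.5) -> List[str]:
    """Return the paths whose probability is above the (assumed) threshold."""
    return [path for path, prob in results if prob > threshold]


sample = """Probability: 93.71% - Yes_samples/1.jpg
Probability: 9.76% - No_samples/1.jpg"""
parsed = parse_output(sample)
print(flag_images(parsed))
```

Lines that do not match the expected format are skipped, so the parser can be fed the script's full console output.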

Serving

See the TensorFlow Serving example.

Additional Training

You can fine-tune the model on your own data. The process is fairly simple, though you will need the checkpoint files, which can be found in saved_checkpoint/ in private_detector.zip.

Set up a JSON file with links to your image path lists for each class:

{
    "Yes": {
        "path": "/home/sofarrell/private_detector/Yes.txt",
        "label": 0
    },
    "No": {
        "path": "/home/sofarrell/private_detector/No.txt",
        "label": 1
    }
}

Each .txt file lists the paths to the images of that class, one per line:

/home/sofarrell/private_detector_images/Yes/1093840880_309463828.jpg
/home/sofarrell/private_detector_images/Yes/657954182_3459624.jpg
/home/sofarrell/private_detector_images/Yes/1503714421_3048734.jpg
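
If your images are already sorted into one directory per class, the list files and the JSON file can be generated rather than written by hand. This is a minimal sketch, not part of the repository: the function name, the set of accepted file extensions, and the directory layout are all assumptions.

```python
import json
from pathlib import Path
from typing import Dict

# Assumed set of image extensions; adjust to match your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_class_files(
    class_dirs: Dict[str, Path],
    labels: Dict[str, int],
    out_dir: Path,
    json_name: str,
) -> Path:
    """Write one <class>.txt path list per class plus a JSON file describing them."""
    out_dir.mkdir(parents=True, exist_ok=True)
    classes = {}
    for name, image_dir in class_dirs.items():
        # Collect absolute paths of image files directly inside the class directory.
        paths = sorted(
            p.resolve()
            for p in Path(image_dir).iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
        list_file = out_dir / f"{name}.txt"
        list_file.write_text("".join(f"{p}\n" for p in paths))
        classes[name] = {"path": str(list_file.resolve()), "label": labels[name]}
    json_path = out_dir / json_name
    json_path.write_text(json.dumps(classes, indent=4))
    return json_path
```

For example, calling build_class_files({"Yes": Path("images/Yes"), "No": Path("images/No")}, {"Yes": 0, "No": 1}, Path("lists"), "train_classes.json") would produce lists/Yes.txt, lists/No.txt and lists/train_classes.json in the layout shown above; the images/ and lists/ paths here are placeholders.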

You can create the training environment with conda:

conda env create -f environment.yaml
conda activate private_detector

And then retrain like so:

python3 ./train.py \
    --train_json /home/sofarrell/private_detector/train_classes.json \
    --eval_json /home/sofarrell/private_detector/eval_classes.json \
    --checkpoint_dir saved_checkpoint/ \
    --train_id retrained_private_detector

The training script has several parameters that can be tweaked:

| Command | Description | Type | Default |
| --- | --- | --- | --- |
| train_id | ID for this particular training run | str | |
| train_json | JSON file(s) which describes classes and contains lists of filenames of data files | List[str] | |
| eval_json | Validation json file which describes classes and contains lists of filenames of data files | str | |
| num_epochs | Number of epochs to train for | int | |
| batch_size | Number of images to process in a batch | int | 64 |
| checkpoint_dir | Directory to store checkpoints in | str | |
| model_dir | Directory to store graph in | str | . |
| data_format | Data format: [channels_first, channels_last] | str | channels_last |
| initial_learning_rate | Initial learning rate | float | 1e-4 |
| min_learning_rate | Minimal learning rate | float | 1e-6 |
| min_eval_metric | Minimal evaluation metric to start saving models | float | 0.01 |
| float_dtype | Float Dtype to use in image tensors: [16, 32] | int | 16 |
| steps_per_train_epoch | Number of steps per train epoch | int | 800 |
| steps_per_eval_epoch | Number of steps per evaluation epoch | int | 1 |
| reset_on_lr_update | Whether to reset to the best model after learning rate update | bool | False |
| rotation_augmentation | Rotation augmentation angle, value <= 0 disables it | float | 0 |
| use_augmentation | Add speckle, v0, random or color distortion augmentation | str | |
| scale_crop_augmentation | Resize image to the model's size times this scale and then randomly crop needed size | float | 1.4 |
| reg_loss_weight | L2 regularization weight | float | 0 |
| skip_saving_epochs | Do not save good checkpoint and update best metric for this number of the first epochs | int | 0 |
| sequential | Use sequential run over randomly shuffled filenames vs equal sampling from each class | bool | False |
| eval_threshold | Threshold above which to consider a prediction positive for evaluation | float | 0.5 |
| epochs_lr_update | Maximum number of epochs without improvement used to reset/decrease learning rate | int | 20 |
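
When launching several runs with different settings, it can help to assemble the command from a dictionary instead of editing the shell line by hand. The helper below is a hypothetical sketch: it assumes every parameter listed above is passed as `--name value`, as in the retraining example. How train.py parses the bool parameters (reset_on_lr_update, sequential) is not covered here, so check the script before passing those. The batch_size and initial_learning_rate values are arbitrary illustrations.

```python
import shlex
from typing import Dict, List, Union

Value = Union[str, int, float, List[str]]


def build_train_command(options: Dict[str, Value], script: str = "./train.py") -> List[str]:
    """Turn {"batch_size": 32} into ["python3", "./train.py", "--batch_size", "32"]."""
    command = ["python3", script]
    for name, value in options.items():
        command.append(f"--{name}")
        if isinstance(value, (list, tuple)):
            # List-valued options (e.g. several train_json files) become several arguments.
            command.extend(str(v) for v in value)
        else:
            command.append(str(value))
    return command


command = build_train_command({
    "train_json": ["/home/sofarrell/private_detector/train_classes.json"],
    "eval_json": "/home/sofarrell/private_detector/eval_classes.json",
    "checkpoint_dir": "saved_checkpoint/",
    "train_id": "retrained_private_detector",
    "batch_size": 32,                # illustrative value, default is 64
    "initial_learning_rate": 1e-5,   # illustrative value, default is 1e-4
})
print(shlex.join(command))
```

The resulting list can be passed directly to subprocess.run(command, check=True) from inside the activated conda environment.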

private-detector's People

Contributors

punkerpunker, ss18, steeeephen, xxaier

private-detector's Issues

What are the output node names?

I tried to convert the model to a frozen graph, but couldn't find the output node names needed by the freeze_graph tool:

freeze_graph --input_saved_model_dir=saved_model --output_node_names= --output_graph=frozen_graph.pb

thanks a lot!

pre-trained model license

Dear Bumble tech, I can see the code is under the Apache 2.0 license, but what about the pre-trained model? You state it is trained on 'private' data; what is the license for the pre-trained model?

base64 input

Hello, I was wondering if it would be possible to add functionality to serve this model straight away in a containerized environment. Specifically, it would be nice if the model could accept a standard image representation (e.g. base64) instead of a tensor representation, since that wouldn't require clients to implement the conversion on their end. Thank you!
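
One way a thin wrapper service could accept base64 input is to decode the string back to raw image bytes before the usual preprocessing. The sketch below shows only that encode/decode step using the standard library. The function names are hypothetical, and turning the decoded bytes into the tensor the model expects is left to the serving code.

```python
import base64
import binascii


def encode_image_bytes(raw: bytes) -> str:
    """Client side: encode raw image bytes as an ASCII base64 string."""
    return base64.b64encode(raw).decode("ascii")


def decode_image_payload(payload: str) -> bytes:
    """Server side: decode a base64 string, rejecting malformed input."""
    try:
        return base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("payload is not valid base64") from exc
```

A client would call encode_image_bytes on the contents of the image file and send the resulting string in its request; the service would call decode_image_payload and then apply the same decoding and resizing steps that inference.py applies to files read from disk.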

Upload Model to Hugging Face Hub

Hey there, it would be awesome to see this model on the Hugging Face Model Hub. :)

I added a copy to my profile real quick to show you how to do it, and how easy it is to load once it's up there...

import tensorflow as tf
from huggingface_hub import snapshot_download

model = tf.saved_model.load(snapshot_download(repo_id='nateraw/bumble-private-detector'))

I'd love to move this to an official org for bumble-tech and have you folks fill out the model card. What do you think?

ONNX model, please

I am writing to seek assistance with converting the model into the ONNX format. I have encountered some unresolved issues during the conversion process, and I am hoping to receive your guidance in order to successfully convert the model to the ONNX format.

Is it possible to convert the private-detector model to a CoreML model?

Dear Bumble tech,
I've been attempting to convert the private-detector's saved_model into a CoreML model. However, after the conversion, it seems unable to successfully identify NSFW images. I suspect there might be an issue during the conversion process. Could you guide me on how to correctly convert a saved_model.pb into a CoreML .mlPackage? Thanks a lot!
Here's my code:

import coremltools as ct
mlmodel_from_tf = ct.convert(model="/Path/To/private_detector/saved_model",
                           inputs=[ct.ImageType(shape=(1,480,480,3))],
                           source="tensorflow",
                           compute_precision=ct.precision.FLOAT32)

Results of testing the CoreML model:

from PIL import Image
img = Image.open('/Path/To/Desktop/dick3.png')
img = img.resize((480,480)) 
if img.mode != 'RGB':
    img = img.convert('RGB')
out_dict = mlmodel_from_tf.predict({"model_input_images": img})
print(out_dict) 
# {'Identity': array([[0.00386974, 0.9961302 ]], dtype=float32)}
# I believe the first element of the array represents the "confidence level that the content is NSFW." or I'm misunderstanding something?

Multi-Label classification

Hi, Is it possible to extend this to do multi-label classification to detect what type of nudity is shown? Or is it just not designed for that?

Thanks.

Is it possible to get Tensorflow lite model?

Hi, I would like to inquire if you could provide a TensorFlow Lite private detector model with a dType of float32.
I've attempted to convert an existing saved_model.pb to .tflite for use on Android platform mobile phones. Android platform does not support Float16. Ultimately, I found a way to obtain a Float32 model, which is by retraining and setting the dType from Float16 to Float32 during the training process. However, this method does not seem to be orthodox, so I was wondering if you could provide a .tflite file with a dType of float32. Thank you.
