bumble-tech / private-detector

Bumble's Private Detector - a pretrained model for detecting lewd images

Home Page: https://medium.com/bumble-tech/bumble-inc-open-sources-private-detector-and-makes-another-step-towards-a-safer-internet-for-women-8e6cdb111d81

License: Apache License 2.0

Python 100.00%

bumble efficientnet image-classification tensorflow

private-detector's Introduction

Private Detector

This is the repo for Bumble's Private Detector™ model - an image classifier that can detect lewd images.

The internal repo has been heavily refactored and released as a fully open-source project so that the wider community can use and finetune a Private Detector model of their own. You can download the pretrained SavedModel, Frozen Model and checkpoint here.

Model

The SavedModel can be found in saved_model/ within private_detector.zip above.

The model is based on EfficientNet-v2 and trained on our internal dataset of lewd images - more information can be found in the whitepaper here or here.

Inference

Inference is pretty simple and an example has been given in inference.py. The model is released as a SavedModel so it can be deployed in many different ways, but here's a quick run-through of one way to get it working for those less familiar with Python/TensorFlow.

First, install Python and Conda on your system and open a terminal (Terminal or Command Prompt) on your machine.

Then you can use the environment.yaml file to install the necessary packages to run the inference.

conda env create -f environment.yaml
conda activate private_detector

Once that's set up, you can run the inference script. Simply replace the sample .jpg file paths below with your own:

python3 inference.py \
    --model saved_model/ \
    --image_paths \
        Yes_samples/1.jpg \
        Yes_samples/2.jpg \
        Yes_samples/3.jpg \
        Yes_samples/4.jpg \
        Yes_samples/5.jpg \
        No_samples/1.jpg \
        No_samples/2.jpg \
        No_samples/3.jpg \
        No_samples/4.jpg \
        No_samples/5.jpg

Sample Output


Probability: 93.71% - Yes_samples/1.jpg
Probability: 93.43% - Yes_samples/2.jpg
Probability: 94.06% - Yes_samples/3.jpg
Probability: 94.08% - Yes_samples/4.jpg
Probability: 91.01% - Yes_samples/5.jpg
Probability: 9.76% - No_samples/1.jpg
Probability: 7.14% - No_samples/2.jpg
Probability: 8.83% - No_samples/3.jpg
Probability: 4.87% - No_samples/4.jpg
Probability: 5.29% - No_samples/5.jpg
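
If you want to act on these scores programmatically, the printed lines can be parsed and compared against a cut-off. The following is a minimal sketch, not part of the repo: `flag_images` is a hypothetical helper, it assumes the output keeps the exact `Probability: NN.NN% - path` format shown above, and the 0.5 cut-off simply mirrors the default of the `eval_threshold` training parameter rather than anything inference.py applies itself.

```python
import re

# Matches lines such as "Probability: 93.71% - Yes_samples/1.jpg"
LINE_RE = re.compile(r"^Probability:\s+([0-9.]+)%\s+-\s+(.+)$")


def flag_images(output_text, threshold=0.5):
    """Map each image path to True when its probability exceeds threshold.

    threshold is a fraction in [0, 1]; 0.5 is an illustrative choice.
    """
    flags = {}
    for line in output_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip blank or unrelated lines
        probability = float(match.group(1)) / 100.0
        flags[match.group(2)] = probability > threshold
    return flags


sample = (
    "Probability: 93.71% - Yes_samples/1.jpg\n"
    "Probability: 9.76% - No_samples/1.jpg\n"
)
print(flag_images(sample))
# {'Yes_samples/1.jpg': True, 'No_samples/1.jpg': False}
```

Pick the threshold to suit your own tolerance for false positives versus false negatives; 0.5 is only a starting point.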

Serving

See the TensorFlow Serving example in deployments/tensorflow-serving.
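
For orientation, a client talks to TensorFlow Serving's REST API by POSTing JSON to a `:predict` endpoint. The sketch below only builds such a request and does not send it. It rests on several assumptions: the model name `private_detector` and the helper `build_predict_request` are hypothetical, port 8501 is TensorFlow Serving's usual REST port, and the content of `instances` must match whatever input tensor the served model expects, which is not specified here.

```python
import json
import urllib.request


def build_predict_request(instances, host="localhost", port=8501,
                          model_name="private_detector"):
    """Build (but do not send) a TensorFlow Serving REST predict request.

    model_name is a placeholder; use the name your server was started with.
    """
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; in the row format used here, the server replies with a JSON object containing a `predictions` list.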

Additional Training

You can finetune the model on your own data, and doing so is fairly simple. You will need the checkpoint files, which can be found in saved_checkpoint/ in private_detector.zip.

Set up a JSON file with links to your image path lists for each class:

{
    "Yes": {
        "path": "/home/sofarrell/private_detector/Yes.txt",
        "label": 0
    },
    "No": {
        "path": "/home/sofarrell/private_detector/No.txt",
        "label": 1
    }
}

Each .txt file lists the paths to your images, one per line:

/home/sofarrell/private_detector_images/Yes/1093840880_309463828.jpg
/home/sofarrell/private_detector_images/Yes/657954182_3459624.jpg
/home/sofarrell/private_detector_images/Yes/1503714421_3048734.jpg
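
Writing these files by hand gets tedious for large datasets, so they can be generated. The sketch below is one possible way to do it and is not part of the repo: `write_class_lists` is a hypothetical helper, and it assumes your images are .jpg files stored in one sub-directory per class (for example `Yes/` and `No/`) under a common root.

```python
import json
from pathlib import Path


def write_class_lists(image_root, out_dir, labels, json_name="train_classes.json"):
    """Write one <class>.txt per class plus a JSON file that points at them.

    Assumes image_root/<class>/ holds that class's .jpg files.
    labels maps class name to integer label, e.g. {"Yes": 0, "No": 1}.
    """
    image_root, out_dir = Path(image_root), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    classes = {}
    for name, label in labels.items():
        # Absolute paths, sorted so the output is reproducible
        paths = sorted(str(p.resolve()) for p in (image_root / name).glob("*.jpg"))
        list_file = out_dir / f"{name}.txt"
        list_file.write_text("".join(path + "\n" for path in paths))
        classes[name] = {"path": str(list_file.resolve()), "label": label}
    json_path = out_dir / json_name
    json_path.write_text(json.dumps(classes, indent=4))
    return json_path
```

For the layout used in this README, a call such as `write_class_lists("private_detector_images", "private_detector", {"Yes": 0, "No": 1})` would produce Yes.txt, No.txt and train_classes.json in the output directory.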

You can create the training environment with conda:

conda env create -f environment.yaml
conda activate private_detector

And then retrain like so:

python3 ./train.py \
    --train_json /home/sofarrell/private_detector/train_classes.json \
    --eval_json /home/sofarrell/private_detector/eval_classes.json \
    --checkpoint_dir saved_checkpoint/ \
    --train_id retrained_private_detector

The training script has several parameters that can be tweaked:

Command	Description	Type	Default
`train_id`	ID for this particular training run	str
`train_json`	JSON file(s) which describes classes and contains lists of filenames of data files	List[str]
`eval_json`	Validation json file which describes classes and contains lists of filenames of data files	str
`num_epochs`	Number of epochs to train for	int
`batch_size`	Number of images to process in a batch	int	`64`
`checkpoint_dir`	Directory to store checkpoints in	str
`model_dir`	Directory to store graph in	str	`.`
`data_format`	Data format: [channels_first, channels_last]	str	`channels_last`
`initial_learning_rate`	Initial learning rate	float	`1e-4`
`min_learning_rate`	Minimal learning rate	float	`1e-6`
`min_eval_metric`	Minimal evaluation metric to start saving models	float	`0.01`
`float_dtype`	Float Dtype to use in image tensors: [16, 32]	int	`16`
`steps_per_train_epoch`	Number of steps per train epoch	int	`800`
`steps_per_eval_epoch`	Number of steps per evaluation epoch	int	`1`
`reset_on_lr_update`	Whether to reset to the best model after learning rate update	bool	`False`
`rotation_augmentation`	Rotation augmentation angle, value <= 0 disables it	float	`0`
`use_augmentation`	Add speckle, v0, random or color distortion augmentation	str
`scale_crop_augmentation`	Resize image to the model's size times this scale and then randomly crop needed size	float	`1.4`
`reg_loss_weight`	L2 regularization weight	float	`0`
`skip_saving_epochs`	Do not save good checkpoint and update best metric for this number of the first epochs	int	`0`
`sequential`	Use sequential run over randomly shuffled filenames vs equal sampling from each class	bool	`False`
`eval_threshold`	Threshold above which to consider a prediction positive for evaluation	float	`0.5`
`epochs_lr_update`	Maximum number of epochs without improvement used to reset/decrease learning rate	int	`20`
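
When experimenting with several of these parameters, it can help to assemble the command in code. The sketch below is an illustration only: `build_train_command` is a hypothetical helper, and it assumes every parameter in the table is passed as `--name value`, which this README only shows for `train_json`, `eval_json`, `checkpoint_dir` and `train_id`. How train.py parses boolean parameters is not covered here, so check the script before passing them this way.

```python
import shlex


def build_train_command(train_json, eval_json, checkpoint_dir, train_id, **overrides):
    """Return the argument list for train.py; nothing is executed here."""
    command = [
        "python3", "./train.py",
        "--train_json", train_json,
        "--eval_json", eval_json,
        "--checkpoint_dir", checkpoint_dir,
        "--train_id", train_id,
    ]
    # Assumption: remaining parameters use the same "--name value" form
    for name, value in overrides.items():
        command += [f"--{name}", str(value)]
    return command


command = build_train_command(
    "train_classes.json",
    "eval_classes.json",
    "saved_checkpoint/",
    "retrained_private_detector",
    batch_size=32,
    initial_learning_rate=1e-5,
)
# Print a copy-pasteable shell command (shlex.join needs Python 3.8+)
print(shlex.join(command))
```

The returned list can also be handed to `subprocess.run(command, check=True)` once the environment is activated.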

private-detector's Issues

What's the output node names?

I tried to convert the model to a frozen graph, but couldn't find the output node names needed by the freeze_graph tool:

freeze_graph --input_saved_model_dir=saved_model --output_node_names= --output_graph=frozen_graph.pb

thanks a lot!

pre-trained model license

Dear Bumble tech, I can see the code is under the Apache 2.0 license, but what about the pre-trained model? You state it is trained on 'private' data, so what is the license for the pre-trained model?

base64 input

Hello, I was wondering if it would be possible to serve this model straight away in some kind of containerized environment. Specifically, it would be nice if the model could accept a standard image representation (e.g. base64) instead of a tensor representation, since clients would then not need to implement the conversion on their end. Thank you!

Is it possible to install with Pip?

Is it possible to install with Pip? I use venv and don't want to use conda.

Upload Model to Hugging Face Hub

Hey there, it would be awesome to see this model on the Hugging Face Model Hub. :)

I added a copy to my profile real quick to show you how to do it, and how easy it is to load once it's up there...

import tensorflow as tf
from huggingface_hub import snapshot_download

model = tf.saved_model.load(snapshot_download(repo_id='nateraw/bumble-private-detector'))

I'd love to move this to an official org for bumble-tech and have you folks fill out the model card. What do you think?

ONNX model, please

I am writing to seek assistance with converting the model into the ONNX format. I have encountered some unresolved issues during the conversion process, and I am hoping to receive your guidance in order to successfully convert the model to the ONNX format.

Is it possible to convert the private-detector model to a CoreML model?

Dear Bumble tech,
I've been attempting to convert the private-detector's saved_model into a CoreML model. However, after the conversion, it seems unable to successfully identify NSFW images. I suspect there might be an issue during the conversion process. Could you guide me on how to correctly convert a saved_model.pb into a CoreML .mlPackage? Thanks a lot!
Here's my code:

import coremltools as ct
mlmodel_from_tf = ct.convert(model="/Path/To/private_detector/saved_model",
                           inputs=[ct.ImageType(shape=(1,480,480,3))],
                           source="tensorflow",
                           compute_precision=ct.precision.FLOAT32)

Results of testing the CoreML model:

from PIL import Image
img = Image.open('/Path/To/Desktop/dick3.png')
img = img.resize((480,480)) 
if img.mode != 'RGB':
    img = img.convert('RGB')
out_dict = mlmodel_from_tf.predict({"model_input_images": img})
print(out_dict) 
# {'Identity': array([[0.00386974, 0.9961302 ]], dtype=float32)}
# I believe the first element of the array represents the "confidence level that the content is NSFW." or I'm misunderstanding something?

Tensorflow serving Dockerfile

Would it make sense to include the Dockerfile for the vazhega/private-detector image to https://github.com/bumble-tech/private-detector/tree/main/deployments/tensorflow-serving?

put frozen model in download package

I am an AI newbie struggling to freeze the model, and I have not managed it so far.

It would be very nice to put frozen model in the zipped model package as well.

issue 6

Multi-Label classification

Hi, Is it possible to extend this to do multi-label classification to detect what type of nudity is shown? Or is it just not designed for that?

Thanks.

Is it possible to get Tensorflow lite model?

Hi, I would like to inquire if you could provide a TensorFlow Lite private detector model with a dType of float32.
I've attempted to convert an existing saved_model.pb to .tflite for use on Android platform mobile phones. Android platform does not support Float16. Ultimately, I found a way to obtain a Float32 model, which is by retraining and setting the dType from Float16 to Float32 during the training process. However, this method does not seem to be orthodox, so I was wondering if you could provide a .tflite file with a dType of float32. Thank you.

Any effects on Private-Detector with TensorFlow Addon Wind Down?

Hello guys,
I hope this is the right place for that. TensorFlow Addons will be discontinued as of May 2024. I wondered if this will have any effect on this project? And if so what could those be?