
mobile-semantic-segmentation's Introduction

Real-Time Semantic Segmentation on Mobile Devices

This project is an example of real-time semantic segmentation for a mobile app.

The architecture is inspired by MobileNetV2 and U-Net.

LFW (Labeled Faces in the Wild) is used as the dataset.

The goal of this project is to detect hair segments with reasonable accuracy and speed on mobile devices. Currently, it achieves 0.89 IoU.

More details about the speed vs. accuracy trade-off are available in my post.

Example of a predicted image.

Example application

  • iOS
  • Android (TODO)

Requirements

  • Python 3.8
  • pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html
  • CoreML for iOS app.

About Model

At this time, there is only one model in this repository, MobileNetV2_unet. Like a typical U-Net architecture, it has encoder and decoder parts, which consist of the depthwise conv blocks proposed by MobileNets.

The input image is encoded down to 1/32 of its size and then decoded back up to 1/2. Finally, a scoring layer produces the mask, which is resized to the original resolution.
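For orientation, here is a schematic PyTorch sketch of that encoder-decoder flow. The module layout and channel counts are illustrative assumptions, not the actual MobileNetV2_unet (which uses MobileNetV2 blocks and U-Net style concatenation; additive skips are used below only to keep the sketch short):

    # Schematic sketch of the encode-to-1/32, decode-to-1/2, upsample-to-full flow.
    # Channel counts and the additive skip connections are simplifying assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class UNetFlowSketch(nn.Module):
        def __init__(self):
            super().__init__()
            # Encoder: five stride-2 stages take H x W down to H/32 x W/32.
            self.enc = nn.ModuleList([
                nn.Conv2d(ci, co, 3, stride=2, padding=1)
                for ci, co in [(3, 16), (16, 32), (32, 64), (64, 128), (128, 256)]
            ])
            # Decoder: four stride-2 transposed convs bring it back to H/2 x W/2.
            self.dec = nn.ModuleList([
                nn.ConvTranspose2d(ci, co, 2, stride=2)
                for ci, co in [(256, 128), (128, 64), (64, 32), (32, 16)]
            ])
            self.score = nn.Conv2d(16, 1, 1)  # 1-channel mask logits

        def forward(self, x):
            full_size = x.shape[2:]
            skips = []
            for stage in self.enc:
                x = F.relu(stage(x))
                skips.append(x)
            for stage, skip in zip(self.dec, reversed(skips[:-1])):
                x = F.relu(stage(x)) + skip   # skip connection (U-Net concatenates)
            x = self.score(x)                 # logits at H/2 x W/2
            return F.interpolate(x, size=full_size, mode='bilinear', align_corners=False)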

Steps for training

Data Preparation

Data is available at LFW. To get the mask images, refer to issue #11 for more details. Once you have the images and masks, put the face images and masks as shown below.

data/
  lfw/
    raw/
      images/
        0001.jpg
        0002.jpg
      masks/
        0001.ppm
        0002.ppm
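A quick sanity check for this layout, sketched here as a convenience (not a script from the repository), verifies that every image has a matching mask:

    # Sketch: verify every image under data/lfw/raw/images has a .ppm mask.
    from pathlib import Path

    img_dir = Path('data/lfw/raw/images')
    mask_dir = Path('data/lfw/raw/masks')

    for img_path in sorted(img_dir.glob('*.jpg')):
        mask_path = mask_dir / (img_path.stem + '.ppm')
        if not mask_path.exists():
            print(f'missing mask for {img_path.name}')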

Training

If you use 224 x 224 as the input size, pre-trained weights of MobileNetV2 are available. They will be downloaded automatically when you train the model with the following command.

cd src
python run_train.py params/002.yaml

The Dice coefficient is used as the loss function.
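For reference, a common Dice-based loss looks like the PyTorch sketch below; the repository's exact formulation may differ:

    # Illustrative Dice loss; the repository's exact implementation may differ.
    import torch

    def dice_loss(pred, target, eps=1e-7):
        # pred, target: (N, 1, H, W); pred holds probabilities in [0, 1].
        pred = pred.flatten(1)
        target = target.flatten(1)
        inter = (pred * target).sum(dim=1)
        dice = (2 * inter + eps) / (pred.sum(dim=1) + target.sum(dim=1) + eps)
        return 1 - dice.mean()  # perfect overlap gives loss 0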

Pretrained model

Input size  IoU   Download
224         0.89  Google Drive

Converting

Since the purpose of this project is to run the model on mobile devices, this repository contains some scripts to convert the models for iOS and Android.
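For the iOS side, a typical PyTorch-to-Core ML conversion with coremltools looks roughly like the sketch below. The stand-in model and file name are assumptions; the repository's own conversion scripts are the authoritative reference:

    # Rough Core ML conversion sketch; substitute a trained MobileNetV2_unet
    # for the stand-in model. See the repository's scripts for the real flow.
    import torch
    import torch.nn as nn
    import coremltools as ct

    model = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1), nn.Sigmoid()).eval()
    example = torch.rand(1, 3, 224, 224)
    traced = torch.jit.trace(model, example)

    mlmodel = ct.convert(traced, inputs=[ct.TensorType(name='input', shape=example.shape)])
    mlmodel.save('Sketch.mlmodel')  # newer coremltools may require an .mlpackage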

TBD

  • Report speed vs. accuracy on mobile devices.
  • Convert PyTorch to Android using TensorFlow Lite.

mobile-semantic-segmentation's People

Contributors

akirasosa, chnghia


mobile-semantic-segmentation's Issues

Script to evaluate MobileUNet

Hi, could you provide a script to evaluate MobileUNet? I have trained the model and want to evaluate it. Thanks.

About the versions of Keras and TensorFlow

python train_full.py --img_file=data/images-128.npy --mask_file=data/masks-128.npy

... ...
File "/home/.local/lib/python2.7/site-packages/tensorflow/python/layers/base.py", line 1376, in init
layer.outbound_nodes.append(self)
AttributeError: 'Activation' object has no attribute 'outbound_nodes'

Hi, I ran the training code above and got the error shown. Could you tell me which versions of Keras and TensorFlow you used? Thank you!
My versions: Keras 2.2.0, TensorFlow 1.8.0

Training Issue

When I run this command:

python train_full.py \
  --img_file=/path/to/images.npy \
  --mask_file=/path/to/masks.npy

I got this error:

Traceback (most recent call last):
  File "train_full.py", line 96, in <module>
    train(**vars(args))
  File "train_full.py", line 21, in train
    train_gen, validation_gen, img_shape = load_data(img_file, mask_file)
  File "/home/lafi/Desktop/HairSegmentation/mobile-semantic-segmentation-master/data.py", line 60, in load_data
    train_img_gen.fit(images)
  File "/usr/local/lib/python2.7/dist-packages/keras/preprocessing/image.py", line 675, in fit
    'Got array with shape: ' + str(x.shape))
ValueError: Input to .fit() should have rank 4. Got array with shape: (0,)
Please, how can I fix it?
Thank you,
Lafi

How to calculate the mean and std value?

Hello, in the data.py script, the standardize function is as below:
def standardize(images, mean=None, std=None):
    if mean is None:
        # These values are available from all images.
        mean = [[[29.24429131, 29.24429131, 29.24429131]]]
    if std is None:
        # These values are available from all images.
        std = [[[69.8833313, 63.37436676, 61.38568878]]]
    x = (images - np.array(mean)) / (np.array(std) + 1e-7)
    return x
But when I calculate the per-channel mean and std of all the images with np.mean() and np.std(), the results do not match the values above. Does anyone know how these mean and std values were calculated?
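For reference, per-channel statistics over a stack of images shaped (N, H, W, 3) are usually computed as in this sketch; whether this reproduces the hard-coded values in data.py is unknown:

    # Per-channel mean/std over images shaped (N, H, W, 3); whether this matches
    # the values hard-coded in data.py is unknown.
    import numpy as np

    images = np.random.randint(0, 256, size=(10, 128, 128, 3)).astype(np.float64)
    mean = images.mean(axis=(0, 1, 2))  # one value per channel
    std = images.std(axis=(0, 1, 2))
    print(mean, std)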

ImportError: cannot import name 'relu6'

Hi,

with Keras 2.2.2, there is a problem importing DepthwiseConv2D and relu6 in MobileUNet.py.

DepthwiseConv2D can probably be imported from keras.layers, but I don't know where to find relu6.
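One common workaround, sketched here and untested against this repository, is to define relu6 via the Keras backend and pass it as a custom object when loading models:

    # Workaround sketch: define relu6 yourself instead of importing it.
    from keras import backend as K
    from keras.layers import DepthwiseConv2D  # lives in keras.layers in Keras 2.2.x

    def relu6(x):
        return K.relu(x, max_value=6)

    # When loading a saved model, pass it along:
    # load_model(path, custom_objects={'relu6': relu6, 'DepthwiseConv2D': DepthwiseConv2D})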

Face segmentation

Hi

The .ppm files of the LFW dataset have the face in green, the hair in red, and the remaining area in blue.

Suppose I want to train the model for face segmentation; what changes would need to be made?
Please let me know.

Regards
Gopi. J
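As a sketch (not code from the repository), one way to retarget training at the face class is to build binary masks from the green channel instead of the red one:

    # Sketch: binary face mask from an LFW part-labels .ppm; channel 0 = hair
    # (red), 1 = face (green), 2 = background (blue).
    import numpy as np
    from PIL import Image

    label = np.array(Image.open('data/lfw/raw/masks/0001.ppm'))
    face_mask = (label.argmax(axis=-1) == 1).astype(np.uint8)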

I trained with my own dataset; the input size is 224x224, but the output is 112x112

Hi, I would like to use your training method on my own dataset and replace the Hair mlmodel in the iOS sample with my own mlmodel.
However, after training and converting to an mlmodel, the input is 224 x 224 color but the output is 112 x 112 color. May I ask how to make the output the same as the iOS sample hair mlmodel (MultiArray (Double 1 x 224 x 224))?

Have you tried the model on real-world images?

Hi,
I have tried running the model produced by training on sample images of normal real-world scenes, and I'm getting very bad results. Any idea why? The training and validation scores seem alright.

pred.py question

When I run pred.py and save the plot, I get predicted images that are not clean like the ones in your GitHub.

Any idea?

Why are the predicted values strange?

I trained a model with my own data and tested it with pred.py.

It seems there is something wrong. What could be the reason?

I trained with my own data; my masks' pixel values are 0, 1, 2, 3, 4.

dataset transform?

Hi,
why is the same color jitter transform applied to the mask? As far as I know, the mask is just the label that stands for per-pixel classification.

README Training Instructions Broken

@akirasosa I tried to follow the instructions in the repo README and received the following error:
AttributeError: Can't pickle local object 'dice_loss.<locals>.fn'
Am I doing something incorrectly, or is this a common error?

image size not correct

#1 Documentation issue
The correct command line to set the image size is

python data.py --img_size=128

not

python data.py --image_size=128

#2 Training issue
When the data is generated with the default size of 192 and I start training, the training process gets killed after this log:

lr: 0.100000
Epoch 1/250

Variable classes

I can't see any way to define the number of classes detected. How would I increase the number of classes?

EDIT: It seems to me it should be the last Conv2D layer in the model; that is, changing Conv2D(1, (1,1) ...)(b18) to Conv2D(num_classes, (1,1) ...)(b18). Of course, this gives me a shape mismatch error. Ideally, I'd like my output to be a (1, height, width) matrix where each value indicates the inferred class for that pixel (i.e., the most highly activated channel for that pixel). I'm pretty new to Keras, so any thoughts are appreciated.
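For illustration, a multi-class head along the lines described in the EDIT might look like this sketch; b18's shape, num_classes, and the training details are assumptions, untested against this repository:

    # Sketch of a multi-class segmentation head; the Input stands in for b18.
    import numpy as np
    from keras.layers import Conv2D, Input
    from keras.models import Model

    num_classes = 5
    feat = Input(shape=(128, 128, 16))  # stand-in for the last decoder features
    probs = Conv2D(num_classes, (1, 1), activation='softmax')(feat)
    model = Model(feat, probs)

    # Per-pixel class = most activated channel, giving a (1, height, width) map.
    out = model.predict(np.zeros((1, 128, 128, 16)))
    classes = out.argmax(axis=-1)

The masks would then need to be one-hot encoded (or the model trained with sparse categorical cross-entropy) rather than being a single binary channel.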

integrate with Lightning ecosystem CI

Hello, and we're so happy to see you use PyTorch Lightning! 🎉
Just wondering if you have already heard about the quite new PyTorch Lightning (PL) ecosystem CI, which we would like to invite you to. You can check out our blog post about it: Stay Ahead of Breaking Changes with the New Lightning Ecosystem CI.
As you use the PL framework for your cool project, we would like to enhance your experience and offer you safe updates to our future releases. At the moment you run tests with a particular PL version, but it may accidentally happen that the next version is incompatible with your project... 😕 We do not intend to change anything on our project side, but we do have a solution: the ecosystem CI tests both your latest development head and ours, so we can find incompatibilities very early and prevent eventually releasing a bad version. 👍

What needs to be done?

What will you get?

  • scheduled nightly testing configured for development/stable versions
  • Slack notification if something goes wrong, so you can investigate
  • testing also on multi-GPU machine as our gift to you 🐰

cc: @Borda

Android app

Hello,
I'm asking about the Android app because I want to develop it. Could you please give me some details about what I have to do and which scripts need to be ported from Python to Java?
Thanks
Lafi

About the edge

Hi @akirasosa ,
Your results look good. How did you process the edges? Bilinear interpolation will smooth out the edge detail, right?

Negative Loss

Thank you for posting this repo; it's been very helpful.
I am trying to run full training on the LFW dataset, and I'm seeing a negative loss value.

[==============================] - 17s 2s/step - loss: -0.0873

Is this expected?

Matching shapes

First of all thanks for this great repo and project.

I'm using images of size 240 instead of 128, and when running the train_full script (or any other training method) I receive the following error during net instantiation (building):

ValueError: Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 16, 16, 512), (None, 15, 15, 512)]

This error happens when the following layer is constructed:

up1 = concatenate([
        Conv2DTranspose(filters, (2, 2), strides=(2, 2), padding='same')(b13),
        b11,
    ], axis=3)

Any idea why? or how to solve it?

Furthermore, how would you suggest using the net for an image of arbitrary size (w, h) where w != h?
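For context, this kind of mismatch typically appears when the input size is not a multiple of the network's total downsampling factor (240 is not divisible by 32, so one path rounds up and upsamples back to 16 while the skip at 1/16 resolution is 15). One common workaround, sketched under the assumption that the effective stride is 32, is to pad the input up to the next multiple:

    # Sketch: pad an image to the next multiple of 32 so encoder/decoder
    # feature maps align; the factor 32 is an assumption about this network.
    import numpy as np

    def pad_to_multiple(img, multiple=32):
        h, w = img.shape[:2]
        ph, pw = (-h) % multiple, (-w) % multiple
        return np.pad(img, ((0, ph), (0, pw), (0, 0)), mode='reflect')

    print(pad_to_multiple(np.zeros((240, 240, 3))).shape)  # (256, 256, 3)

This also suggests an answer for non-square (w, h) inputs: pad each side independently to a multiple of the stride.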

Please consider Hydra

Hi,
I am the author of Hydra and OmegaConf, which you are already using.
I think Hydra is much better suited for a project like this one than OmegaConf alone; please take a look.

How to run the model?

Hi, what are the steps to run the file? I tried to load the model in run_eval.py, but it's throwing an error. Is there any other way to run it?

How long to train the model?

I'm planning on mimicking your architecture on the Cityscapes dataset. Do you mind sharing some details on the training phase (hardware, and how long it took)?

how to fine-tune

Hey, after running train_full.py I got a good result, but not good enough.
When I tried

python train_top_model.py \
  --img_file=/tmp/images.npy \
  --mask_file=/tmp/masks.npy

python train_fine_tune.py \
  --img_file=/tmp/images.npy \
  --mask_file=/tmp/masks.npy

I got a terrible result... I want to know the correct training method.

CoreML model crashes in real device.

It generates a non-trained CoreML model.

python coreml-converter-bench.py

If I run the generated model in the iOS simulator, it works. But if I run it on a real device, it crashes.

I use the following Swift unit test code to run it.

    func testConvExample() {
        guard let input_data = try? MLMultiArray(shape:[3, 128, 128], dataType:MLMultiArrayDataType.float32) else {
            fatalError("Unexpected runtime error. MLMultiArray")
        }
        let model = mu_128_1_025()
        do {
            try model.prediction(data: input_data)
            print("predicted------------")
        } catch  {
            print("error------------")
        }
    }

LFW related

Hi

I see the comments below in the README:

Data is available at LFW. Put the images of faces and masks as shown below.

data/
  raw/
    images/
      0001.jpg
      0002.jpg
    masks/
      0001.ppm
      0002.ppm

But from the link http://vis-www.cs.umass.edu/lfw/part_labels/#download, I can download files like lfw-funneled.tgz. If I extract it, I get folders with person names and the corresponding jpg files.
Do I need to manually flatten the folder structure and rearrange those files as *0001.jpg, *0002.jpg, etc., similar to the ppm file list?

FYI, I got the ppm files from https://drive.google.com/file/d/1TbQ24nIc3GGNWzV_GGX_D-1WpI2KOGii/view (it would be great if you could share the method/code used to generate the ppm files).

Regards
Gopi. J
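A sketch of that flattening step (paths and naming are assumptions drawn from the question, not instructions from the author):

    # Sketch: flatten the per-person LFW folders into a single images/ directory.
    import shutil
    from pathlib import Path

    src = Path('lfw_funneled')         # extracted from lfw-funneled.tgz
    dst = Path('data/lfw/raw/images')
    dst.mkdir(parents=True, exist_ok=True)

    for jpg in src.rglob('*.jpg'):
        shutil.copy(jpg, dst / jpg.name)  # names like Aaron_Eckhart_0001.jpg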

Loading model with custom loss in coreml

I use the following loss in Keras:

def matting_loss(x):
    def loss(y_true, y_pred):
        La = dice_coef_loss(y_true, y_pred)
        mul_true = K.concatenate([y_true, y_true, y_true], axis=-1) * x
        mul_pred = K.concatenate([y_pred, y_pred, y_pred], axis=-1) * x
        Lcolor = K.mean(K.sqrt(K.square(mul_true - mul_pred) + K.epsilon()))
        return La + 0.5 * Lcolor
    return loss

but I cannot convert it to CoreML. Can anyone help me?

output size of the converted model is wrong

Hi,
I tried your coreml_converter.py script and the output layer (identifier 597) is:
tensor_type {
  elem_type: FLOAT
  shape {
    dim { dim_value: 1 }
    dim { dim_value: 3 }
    dim { dim_value: 224 }
    dim { dim_value: 224 }
  }
}

The identifier is 597 and not 595 as you said, because I enabled your interpolate layer (it was commented out in your original code):
x = interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)

Now, I have two questions:
(1) Why is the output 3x224x224 and not 1x224x224?
(2) The interpolate layer raises an error when I try to convert. Is there another way to upsample from 112x112 to 224x224?

Thanks.

How to verify a check point model?

I am trying to verify a checkpoint model by first converting it to pb format.

python tf-converter.py --input_model_path artifacts/checkpoint_weights.10--0.51.h5

but I got an error like the one below:

Using TensorFlow backend.
Traceback (most recent call last):
  File "tf-converter.py", line 58, in <module>
    main(**vars(args))
  File "tf-converter.py", line 24, in main
    model = load_model(input_model_path, custom_objects=custom_objects())
  File "/home/ubuntu/.pyenv/versions/keras/lib/python3.5/site-packages/keras/models.py", line 238, in load_model
    raise ValueError('No model found in config file.')
ValueError: No model found in config file.

Do you know what could be wrong?

Is there any easy way to verify the h5 model directly on the PC?
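For what it's worth, "No model found in config file" usually means the .h5 file contains only weights, not the architecture. The usual workaround, sketched below with a hypothetical import path and constructor, is to rebuild the model in code and load the weights into it:

    # Sketch: rebuild the architecture, then load a weights-only .h5 file.
    # The import path and constructor arguments are hypothetical.
    import numpy as np
    from nets.MobileUNet import MobileUNet

    model = MobileUNet(input_shape=(128, 128, 3))
    model.load_weights('artifacts/checkpoint_weights.10--0.51.h5')

    # Quick PC-side check: predict on a dummy batch and inspect the shape.
    print(model.predict(np.zeros((1, 128, 128, 3))).shape)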

Any Inference Speed Record in CPU Mode

I am looking for a semantic segmentation model that can run fast in CPU mode.
I appreciate that your post offers a very detailed analysis of the runtime and accuracy of MobileNetV2-UNet variants.
However, the mobile devices you experimented on all have an embedded GPU. I am more concerned with runtime and accuracy in CPU mode (hopefully the model can run at >= 10 FPS).
Did you do any similar analysis before? Or are you confident that the model (MobileNetV2-UNet) can reach at least 10 FPS on a normal MacBook Air (8 GB memory, i7 CPU)?

FYI, I tried your model (image size 224 x 224) on my laptop and found that inference takes ~0.3 s. I'm not sure how I could optimize the speed.
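For reproducible numbers, a simple timing loop like the sketch below is one way to measure per-frame CPU latency; the stand-in model is an assumption, so substitute the actual MobileNetV2_unet:

    # Sketch: average CPU inference latency over repeated runs.
    import time
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1)).eval()  # stand-in
    x = torch.rand(1, 3, 224, 224)

    with torch.no_grad():
        for _ in range(5):                # warm-up runs
            model(x)
        t0 = time.perf_counter()
        n = 50
        for _ in range(n):
            model(x)
        print(f'{(time.perf_counter() - t0) / n * 1000:.1f} ms / frame')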

the last depthwise_conv_block(b18)

I noticed that the last depthwise_conv_block (b18) is commented out, and you used a standard conv layer to replace it. This standard conv layer costs about 20% of the computation. So why not use a depthwise_conv_block like the layers above?
