

Segment-Anything-U-Specify

Use the SAM and CLIP models to segment exactly the instances you want. This repo lets you segment any instance in a picture with text prompts.

The main network architecture is as follows:

CLIP Model Architecture (figure)

SAM Model Architecture (figure)

Installation

Install the Python packages with:

pip3 install -r requirements.txt

Download pretrained model weights

cd PROJECT_ROOT_DIR
bash scripts/download_pretrained_ckpt.sh

Instance Segmentation With Text Prompts

The instance segmentor first uses the SAM model to obtain masks for all objects in the input image. It then uses the CLIP model to classify each mask by comparing the mask's image features with the features of your text prompts.

cd PROJECT_ROOT_DIR
export PYTHONPATH=$PWD:$PYTHONPATH
python tools/sam_clip_text_seg.py --input_image_path ./data/test_images/test_bear.jpg --text bear
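The classification step described above can be sketched as follows. This is a minimal NumPy sketch that assumes CLIP embeddings have already been computed, one per SAM mask crop and one per text prompt. It scores each mask against every prompt by cosine similarity and applies a softmax. `classify_masks` and the `logit_scale=100.0` default (CLIP's usual temperature) are assumptions, not this repo's actual code.

```python
import numpy as np


def classify_masks(mask_feats: np.ndarray, text_feats: np.ndarray,
                   logit_scale: float = 100.0):
    """Assign each mask the text prompt whose embedding is closest (hypothetical helper).

    mask_feats: (N, D) image embeddings, one per masked object crop.
    text_feats: (K, D) text embeddings, one per prompt.
    Returns (labels, scores): best prompt index and its softmax probability per mask.
    """
    # L2-normalize so the dot product equals cosine similarity.
    m = mask_feats / np.linalg.norm(mask_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = logit_scale * (m @ t.T)              # (N, K) scaled similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability for exp
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)     # softmax over the K prompts
    labels = probs.argmax(axis=1)
    scores = probs[np.arange(len(labels)), labels]
    return labels, scores
```

With only one prompt, the softmax score is always 1.0. A real single-prompt run therefore needs some other criterion, such as a raw cosine threshold or extra background prompts. The README does not say which one the repo uses.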

Bear Instance Segmentation Result, Text Prompt: bear

Athlete Instance Segmentation Result, Text Prompt: athlete

Horse Instance Segmentation Result, Text Prompt: horse

Dog Instance Segmentation Result, Text Prompt: dog

Fish Instance Segmentation Result, Text Prompt: fish

Strawberry Instance Segmentation Result, Text Prompt: strawberry

Glasses Instance Segmentation Result, Text Prompt: glasses

TV Instance Segmentation Result, Text Prompt: television

Shoes Instance Segmentation Result, Text Prompt: shoe

Bridge Instance Segmentation Result, Text Prompt: bridge

Airplane Instance Segmentation Result, Text Prompt: airplane

Support for Multi-Class Segmentation All at Once: YOSO (You Only Segment Once)

cd PROJECT_ROOT_DIR
export PYTHONPATH=$PWD:$PYTHONPATH
python tools/sam_clip_text_seg.py --input_image_path ./data/test_images/test_horse.jpg --text "horse,mountain,grass,sky,clouds,tree" --cls_score_thresh 0.5 --use_text_prefix
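The multi-class command passes a comma-separated `--text` list, an optional `--use_text_prefix`, and a `--cls_score_thresh`. The sketch below shows one plausible way those three could be handled. The prefix template `"a photo of a {}"` is a hypothetical stand-in, because the repo's actual prefix is not documented here. Keeping a mask only when its best-class score reaches the threshold is also an assumption. Both function names are illustrative.

```python
from typing import List, Sequence, Tuple

# Hypothetical prompt template; the repo's real prefix may differ.
DEFAULT_PREFIX = "a photo of a {}"


def build_prompts(text: str, use_text_prefix: bool = False,
                  prefix: str = DEFAULT_PREFIX) -> List[str]:
    """Split a comma-separated --text value into one prompt per class."""
    names = [name.strip() for name in text.split(",") if name.strip()]
    if use_text_prefix:
        return [prefix.format(name) for name in names]
    return names


def filter_by_score(labels: Sequence[int], scores: Sequence[float],
                    cls_score_thresh: float) -> List[Tuple[int, int]]:
    """Keep (mask_index, class_index) pairs whose score reaches the threshold."""
    return [(i, int(label))
            for i, (label, score) in enumerate(zip(labels, scores))
            if score >= cls_score_thresh]
```

For example, `build_prompts("horse,mountain,grass,sky,clouds,tree", use_text_prefix=True)` yields one templated prompt per class, starting with `"a photo of a horse"`.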

Horse Instance Segmentation Result, Text Prompt: horse,mountain,grass,sky,clouds,tree

TV Instance Segmentation Result, Text Prompt: television,audio system,tape recorder,box

Strawberry Instance Segmentation Result, Text Prompt: strawberry,grapefruit,spoon,wolfberry,oatmeal

Frog Instance Segmentation Result, Text Prompt: frog,turtle,snail,eye

Instance Segmentation Improvements

2023-04-21: Improved background segmentation

Before Optimization (figure) / After Optimization (figure)

Unsupervised Clustering of Semantic Objects from the SAM Model

The clustering pipeline has four steps:

1. Use the SAM model to obtain masks for all objects in the input image.
2. Use the CLIP model to extract image features for each object.
3. Compute the feature distance between every pair of objects.
4. Apply a similarity threshold to group the objects into clusters.

To test the clustering, run:

cd PROJECT_ROOT_DIR
export PYTHONPATH=$PWD:$PYTHONPATH
python tools/cluster_sam.py --input_image_path ./data/test_images/test_bear.jpg --simi_thresh 0.82
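The thresholding step above can be sketched as follows. The README does not say how above-threshold pairs are grouped, so this sketch makes two assumptions: cosine similarity as the measure, and single-linkage grouping. Under single linkage, objects joined by any chain of pairs with similarity of at least `simi_thresh` share a cluster. The grouping is implemented with union-find. `cluster_by_similarity` is a hypothetical name, and `simi_thresh` mirrors the CLI flag.

```python
from typing import List

import numpy as np


def cluster_by_similarity(feats: np.ndarray, simi_thresh: float = 0.82) -> List[int]:
    """Return a cluster id per object (hypothetical helper, single linkage)."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T  # (N, N) cosine similarity of every object pair
    n = len(f)
    parent = list(range(n))

    def find(i: int) -> int:
        # Walk to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair whose similarity reaches the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] >= simi_thresh:
                parent[find(j)] = find(i)

    # Relabel roots as consecutive ids in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]
```

Single linkage can chain dissimilar objects together through intermediate ones. A stricter rule, such as complete linkage, is one of the alternatives the TODO list's "different clustering methods" could cover.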

Bear Cluster Result

Horse Cluster Result

Each row shows the source image, the original SAM mask, the original masked image, the clustered mask, and the clustered masked image.
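The "masked image" columns presumably come from blanking everything outside an object's mask before the crop is passed to CLIP. The minimal NumPy sketch below assumes an HxWx3 image and a boolean HxW mask. Cropping to the mask's bounding box, the `masked_crop` name, and its `pad` parameter are all hypothetical.

```python
import numpy as np


def masked_crop(image: np.ndarray, mask: np.ndarray, pad: int = 0) -> np.ndarray:
    """Zero out pixels outside `mask`, then crop to the mask's bounding box."""
    if image.shape[:2] != mask.shape:
        raise ValueError("mask must match the image height and width")
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask is empty")
    # Broadcast the (H, W) mask over the channel axis.
    masked = image * mask[..., None].astype(image.dtype)
    # Bounding box of the mask, optionally padded and clipped to the image.
    y0 = max(int(ys.min()) - pad, 0)
    y1 = min(int(ys.max()) + 1 + pad, image.shape[0])
    x0 = max(int(xs.min()) - pad, 0)
    x1 = min(int(xs.max()) + 1 + pad, image.shape[1])
    return masked[y0:y1, x0:x1]
```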

UPDATES

2023-07-04: Integrated MobileSAM

MobileSAM is now integrated into the pipeline for lighter and faster inference. To segment your images with MobileSAM, edit ./config/sam.yaml and make two changes:

  • Set the model name field to vit_t.
  • Set the model weight file path to ./pretrained/sam/mobile_sam.pt.

TODO

  • Test different clustering methods
  • Use clustering results as input prompts to re-segment the image with the SAM model
  • Merge the embedding features of the global image and the masked image

Acknowledgement

Most of this repo's code is borrowed from OpenAI's CLIP repo and Facebook's segment-anything repo.


Contact


segment-anything-u-specify's Issues

Suggestion - Integrate MobileSAM into the pipeline for lightweight and faster inference

Reference: https://github.com/ChaoningZhang/MobileSAM

Our project performs on par with the original SAM and keeps exactly the same pipeline, except for a change to the image encoder. It is therefore easy to integrate into any project.

MobileSAM is around 60 times smaller and around 50 times faster than the original SAM. It is also around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarized as follows:

Best Wishes,

Qiao

inference on geotiff and output as geojson

Dear team,

Firstly, I would like to extend my gratitude for the incredible work that you have done on this project. It is truly fascinating and I appreciate the effort you have put into it.

I have a question about running inference on a multispectral satellite image with more than 3 channels (bands). I tried it with a 4-band image, but it did not work. Can you suggest a workaround?

Furthermore, for geospatial data from satellite imagery, I would also like to request a GeoJSON of the polygonized mask in addition to the mask itself, if possible.

Thank you again for your fantastic work and I look forward to your response.

Best regards,

ModuleNotFoundError: No module named 'local_utils'

Hello, I followed your installation steps.
When I run python tools/sam_clip_text_seg.py --input_image_path ./data/test_images/test_bear.jpg --text bear
it shows:
Traceback (most recent call last):
File "tools/sam_clip_text_seg.py", line 17, in <module>
from local_utils.log_util import init_logger
ModuleNotFoundError: No module named 'local_utils'

I can't tell what went wrong.

CUDA out of memory

First of all, great work, and thank you for open-sourcing all of the code! I tried to test the model with the bear example, but it raises the following error: "RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 3.94 GiB total capacity; 2.92 GiB already allocated; 364.19 MiB free; 2.97 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF". What kind of GPU are you using? It seems surprising that segmenting a single image needs that much CUDA memory. Thanks!

segmentation

My target object is not segmented when I use the corresponding prompt. What should I do?

finetune code

Did you do any fine-tuning? Could you release the fine-tuning code?

No module named 'local_utils'

I'm trying to test this repo, and I'm getting this error:
from local_utils.log_util import init_logger
ModuleNotFoundError: No module named 'local_utils'
