maybeshewill-cv / segment-anything-u-specify

Use CLIP and SAM to segment any instance you specify with a text prompt naming that instance

License: MIT License

Python 99.58% Shell 0.42%

deep-learning instance-segmentation object-detection segment-anything sam-model

segment-anything-u-specify's Introduction

Segment-Anything-U-Specify

Use the SAM and CLIP models to segment the specific instances you want. With this repo you can segment any instance in an image using text prompts.

The main network architecture is as follows:

CLIP Model Architecture

SAM Model Architecture

Installation

Install the Python packages with:

pip3 install -r requirements.txt

Download the pretrained model weights:

cd PROJECT_ROOT_DIR
bash scripts/download_pretrained_ckpt.sh

Instance Segmentation With Text Prompts

The instance segmentor first uses the SAM model to get the masks of all objects in the input image. It then uses the CLIP model to classify each mask by comparing the mask's image features with the features of your text prompts.

cd PROJECT_ROOT_DIR
export PYTHONPATH=$PWD:$PYTHONPATH
python tools/sam_clip_text_seg.py --input_image_path ./data/test_images/test_bear.jpg --text bear
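The second step, scoring each SAM mask against the text prompts, can be sketched as follows. This is a minimal illustration rather than the repo's actual code: it assumes the CLIP image and text encoders have already produced one embedding per mask crop and per prompt (toy vectors stand in for them here). The function names, the logit scale of 100, and the threshold value are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_masks(mask_feats, text_feats, labels,
                   score_thresh=0.5, logit_scale=100.0):
    """Assign each mask its best-matching label, or None below the threshold.

    mask_feats: one embedding per mask crop (stand-in for CLIP image features).
    text_feats: one embedding per prompt (stand-in for CLIP text features).
    """
    results = []
    for feat in mask_feats:
        logits = [logit_scale * cosine_similarity(feat, t) for t in text_feats]
        probs = softmax(logits)
        best = max(range(len(labels)), key=lambda i: probs[i])
        label = labels[best] if probs[best] >= score_thresh else None
        results.append((label, probs[best]))
    return results


# Toy embeddings standing in for real CLIP features (assumed values).
labels = ["bear", "background"]
text_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
mask_feats = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.5, 0.5, 0.0]]
predictions = classify_masks(mask_feats, text_feats, labels, score_thresh=0.6)
for label, score in predictions:
    print(label, round(score, 3))
```

In this sketch the third mask is equally similar to both prompts, so its best score falls below the threshold and it receives no label. Because the softmax is taken over the prompts, a single prompt would always score 1.0; a scheme like this needs at least one contrasting label, such as the "background" entry used here.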

Bear Instance Segmentation Result, Text Prompt: bear

Athlete Instance Segmentation Result, Text Prompt: athlete

Horse Instance Segmentation Result, Text Prompt: horse

Dog Instance Segmentation Result, Text Prompt: dog

Fish Instance Segmentation Result, Text Prompt: fish

Strawberry Instance Segmentation Result, Text Prompt: strawberry

Glasses Instance Segmentation Result, Text Prompt: glasses

TV Instance Segmentation Result, Text Prompt: television

Shoes Instance Segmentation Result, Text Prompt: shoe

Bridge Instance Segmentation Result, Text Prompt: bridge

Airplane Instance Segmentation Result, Text Prompt: airplane

Multi-Class Segmentation All at Once: YOSO (You Only Segment Once)

cd PROJECT_ROOT_DIR
export PYTHONPATH=$PWD:$PYTHONPATH
python tools/sam_clip_text_seg.py --input_image_path ./data/test_images/test_horse.jpg --text "horse,mountain,grass,sky,clouds,tree" --cls_score_thresh 0.5 --use_text_prefix
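The --text argument above carries several class names in one comma-separated string. A minimal sketch of how such a string might be turned into one prompt per class is shown below. The function name and the prefix "a photo of a" are assumptions for illustration; the prefix that --use_text_prefix actually applies is not stated here.

```python
def build_prompts(text, use_text_prefix=False, prefix="a photo of a"):
    """Split a comma-separated class string into one prompt per class.

    The default prefix is an assumed example, not necessarily the repo's.
    """
    # Drop surrounding whitespace and ignore empty entries such as "a,,b".
    names = [name.strip() for name in text.split(",") if name.strip()]
    if use_text_prefix:
        return [f"{prefix} {name}" for name in names]
    return names


prompts = build_prompts("horse,mountain,grass,sky,clouds,tree",
                        use_text_prefix=True)
print(prompts)
```

Each resulting prompt would then be encoded separately, so every mask is scored against all classes in a single pass over the image.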

Horse Instance Segmentation Result, Text Prompt: horse,mountain,grass,sky,clouds,tree

TV Instance Segmentation Result, Text Prompt: television,audio system,tape recorder,box

Strawberry Instance Segmentation Result, Text Prompt: strawberry,grapefruit,spoon,wolfberry,oatmeal

Frog Instance Segmentation Result, Text Prompt: frog,turtle,snail,eye

Instance Segmentation Improvement

2023-04-21 Improved background segmentation

Before Optimization, After Optimization

Unsupervised Clustering of Semantic Objects From the SAM Model

The clustering first uses the SAM model to get the masks of all objects in the input image. Second, it uses the CLIP model to extract image features for each object. Third, it calculates the feature distance between every pair of objects. Finally, it applies a similarity threshold to cluster the source objects.

To test the clustering, simply run:

cd PROJECT_ROOT_DIR
export PYTHONPATH=$PWD:$PYTHONPATH
python tools/cluster_sam.py --input_image_path ./data/test_images/test_bear.jpg --simi_thresh 0.82
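The clustering steps described above can be sketched as follows. This is an illustration under stated assumptions, not the repo's implementation: toy vectors stand in for the CLIP features of each object, cosine similarity is assumed as the pairwise measure, and objects are merged with a union-find structure so that any chain of sufficiently similar pairs ends up in one cluster. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cluster_by_similarity(features, simi_thresh=0.82):
    """Return one cluster id per feature vector.

    Two objects are linked when their cosine similarity is at least
    simi_thresh; linked objects share a cluster (assumed merging rule).
    """
    parent = list(range(len(features)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair of objects once and merge the similar ones.
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if cosine_similarity(features[i], features[j]) >= simi_thresh:
                parent[find(j)] = find(i)

    # Relabel roots as consecutive ids in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(features))]


# Toy features standing in for per-object CLIP embeddings (assumed values).
features = [[1.0, 0.0], [0.95, 0.05], [0.0, 1.0], [0.1, 0.9]]
cluster_ids = cluster_by_similarity(features, simi_thresh=0.82)
print(cluster_ids)
```

Raising the threshold splits the objects into more, tighter clusters, while lowering it merges more objects together.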

Bear Cluster Result

Horse Cluster Result

Each row shows, in order: the source image, the original SAM mask, the original masked image, the clustered mask, and the cluster-masked image.

UPDATES

2023-07-04 Integrate MobileSAM

MobileSAM is integrated into the pipeline for lightweight, faster inference. To segment your images with MobileSAM, edit the ./config/sam.yaml file: set the model name field to vit_t and set the model weight file path to ./pretrained/sam/mobile_sam.pt

TODO

Test different clustering methods
Use the cluster results as input prompts to re-segment the image with the SAM model
Merge the embedding features of the global image and the masked image

Acknowledgement

Most of this repo's code is borrowed from OpenAI's CLIP repo and Facebook's segment-anything repo.

segment-anything-u-specify's Issues

Suggestion - Integrate MobileSAM into the pipeline for lightweight and faster inference

Reference: https://github.com/ChaoningZhang/MobileSAM

Our project performs on par with the original SAM and keeps exactly the same pipeline as the original SAM except for a change to the image encoder, so it is easy to integrate into any project.

MobileSAM is around 60 times smaller and around 50 times faster than the original SAM, and it is around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarized as follows:

Best Wishes,

Qiao

Inference on GeoTIFF and output as GeoJSON

Dear team,

Firstly, I would like to extend my gratitude for the incredible work that you have done on this project. It is truly fascinating and I appreciate the effort you have put into it.

I have a query regarding running inference on a multispectral satellite image with more than 3 channels or bands. I attempted this with a 4-band image, but it did not work. Can you suggest a workaround for this issue?

Furthermore, for geospatial data related to satellite imagery, I would also like to request a GeoJSON of the polygonized mask, in addition to the mask, if possible.

Thank you again for your fantastic work and I look forward to your response.

Best regards,

ModuleNotFoundError: No module named 'local_utils'

Hello author, I installed everything following your steps.
Running python tools/sam_clip_text_seg.py --input_image_path ./data/test_images/test_bear.jpg --text bear
shows:
Traceback (most recent call last):
File "tools/sam_clip_text_seg.py", line 17, in <module>
from local_utils.log_util import init_logger
ModuleNotFoundError: No module named 'local_utils'

I'm not sure what went wrong.

CUDA out of memory

First of all, great work, and thank you for open-sourcing all of the code! I was trying to test the model with the bear example, but it raises this RuntimeError: "RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 3.94 GiB total capacity; 2.92 GiB already allocated; 364.19 MiB free; 2.97 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF". What kind of GPU are you using? It doesn't seem right that segmenting a single image needs that much CUDA memory. Thanks!

ftfy
loguru=
matplotlib

The below pull request should get rid of it.

No module named 'local_utils'

I'm trying to test this repo and I'm getting this error:
from local_utils.log_util import init_logger
ModuleNotFoundError: No module named 'local_utils'

What is the architecture of this work? Is it a repro of what SAM had said?

Hi, what is the architecture of this work? Is it a repro of what SAM had said?