Segment-Anything-U-Specify

Use the SAM and CLIP models to segment the specific instances you want. You can use this repo to segment any instance in an image with text prompts.

The main network architecture is as follows:

CLIP Model Architecture

SAM Model Architecture

Installation

Install the Python packages with:

pip3 install -r requirements.txt

Download the pretrained model weights:

cd PROJECT_ROOT_DIR
bash scripts/download_pretrained_ckpt.sh

Instance Segmentation With Text Prompts

The instance segmentor first uses the SAM model to get masks for all objects in the input image. It then uses the CLIP model to classify each mask by comparing its image features with the features of your text prompts.

cd PROJECT_ROOT_DIR
export PYTHONPATH=$PWD:$PYTHONPATH
python tools/sam_clip_text_seg.py --input_image_path ./data/test_images/test_bear.jpg --text bear
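The classification step described above can be sketched as follows. This is a minimal illustration, not the repo's actual implementation: the function names `l2_normalize` and `classify_masks` are hypothetical, the feature vectors are toy stand-ins for real CLIP embeddings, and the similarity scale of 100 is an assumption modeled on common CLIP zero-shot usage.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def classify_masks(mask_feats: np.ndarray, text_feats: np.ndarray,
                   scale: float = 100.0) -> np.ndarray:
    """Return an (n_masks, n_prompts) matrix of softmax scores.

    `scale` is an assumed temperature, not a value taken from the repo.
    """
    logits = scale * l2_normalize(mask_feats) @ l2_normalize(text_feats).T
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)


# Toy 3-d vectors standing in for CLIP embeddings (hypothetical values).
text_feats = np.array([[1.0, 0.0, 0.0],   # prompt 0, e.g. "bear"
                       [0.0, 1.0, 0.0]])  # prompt 1, a hypothetical second prompt
mask_feats = np.array([[0.9, 0.1, 0.0],   # features of the first masked crop
                       [0.1, 0.8, 0.2]])  # features of the second masked crop

scores = classify_masks(mask_feats, text_feats)
labels = scores.argmax(axis=-1)  # index of the best-matching prompt per mask
```

In a real run, `mask_feats` would come from encoding each SAM-masked crop with CLIP's image encoder and `text_feats` from encoding the prompts with its text encoder.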

Bear Instance Segmentation Result, Text Prompt: bear

Athlete Instance Segmentation Result, Text Prompt: athlete

Horse Instance Segmentation Result, Text Prompt: horse

Dog Instance Segmentation Result, Text Prompt: dog

Fish Instance Segmentation Result, Text Prompt: fish

Strawberry Instance Segmentation Result, Text Prompt: strawberry

Glasses Instance Segmentation Result, Text Prompt: glasses

TV Instance Segmentation Result, Text Prompt: television

Shoes Instance Segmentation Result, Text Prompt: shoe

Bridge Instance Segmentation Result, Text Prompt: bridge

Airplane Instance Segmentation Result, Text Prompt: airplane

Support Multi-Class Segmentation All At Once ---- YOSO ---- You Only Segment Once

cd PROJECT_ROOT_DIR
export PYTHONPATH=$PWD:$PYTHONPATH
python tools/sam_clip_text_seg.py --input_image_path ./data/test_images/test_horse.jpg --text "horse,mountain,grass,sky,clouds,tree" --cls_score_thresh 0.5 --use_text_prefix
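One plausible reading of the `--text`, `--use_text_prefix` and `--cls_score_thresh` options in the command above is sketched below. Everything here is an assumption for illustration: the helper names `build_prompts` and `assign_labels` are hypothetical, the prefix string "a photo of a " is a guess at a typical CLIP prompt template, and the score matrix is made up rather than produced by the repo.

```python
from typing import List, Optional

import numpy as np


def build_prompts(text: str, use_text_prefix: bool,
                  prefix: str = "a photo of a ") -> List[str]:
    """Split a comma-separated class string; optionally prepend a prefix.

    The default prefix is an assumed template, not the repo's actual one.
    """
    names = [name.strip() for name in text.split(",") if name.strip()]
    return [prefix + name if use_text_prefix else name for name in names]


def assign_labels(scores: np.ndarray, names: List[str],
                  thresh: float) -> List[Optional[str]]:
    """Pick the best class per mask; masks scoring below thresh stay unlabeled."""
    best = scores.argmax(axis=-1)
    return [names[j] if scores[i, j] >= thresh else None
            for i, j in enumerate(best)]


class_text = "horse,mountain,grass,sky,clouds,tree"
names = build_prompts(class_text, use_text_prefix=False)
prompts = build_prompts(class_text, use_text_prefix=True)

# Hypothetical per-mask class scores (each row sums to 1).
scores = np.array([
    [0.91, 0.03, 0.02, 0.02, 0.01, 0.01],  # confident match
    [0.30, 0.25, 0.20, 0.10, 0.10, 0.05],  # ambiguous, falls below 0.5
])
labels = assign_labels(scores, names, thresh=0.5)
```

Under this reading, a higher threshold leaves more masks unlabeled, trading coverage for fewer wrong assignments.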

Horse Instance Segmentation Result, Text Prompt: horse,mountain,grass,sky,clouds,tree

TV Instance Segmentation Result, Text Prompt: television,audio system,tape recorder,box

Strawberry Instance Segmentation Result, Text Prompt: strawberry,grapefruit,spoon,wolfberry,oatmeal

Frog Instance Segmentation Result, Text Prompt: frog,turtle,snail,eye

Instance Segmentation Improvement

2023-04-21 Improved background segmentation

Before Optimization

After Optimization

Unsupervised Clustering of Semantic Objects From the SAM Model

Clustering first uses the SAM model to get masks for all objects in the input image. Second, it uses the CLIP model to extract image features for each object. Third, it computes the feature distance between every pair of objects. Finally, it applies a similarity threshold to cluster the source objects.

To test the clustering, simply run:

cd PROJECT_ROOT_DIR
export PYTHONPATH=$PWD:$PYTHONPATH
python tools/cluster_sam.py --input_image_path ./data/test_images/test_bear.jpg --simi_thresh 0.82
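The pairwise-distance and thresholding steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the repo's code: it assumes cosine similarity as the feature distance and merges objects into connected components with union-find, which is only one of several ways to turn a similarity threshold into clusters. The function name `cluster_by_similarity` and the toy feature vectors are hypothetical.

```python
from typing import Dict, List

import numpy as np


def cluster_by_similarity(feats: np.ndarray, simi_thresh: float) -> List[int]:
    """Group objects whose pairwise cosine similarity reaches simi_thresh.

    Returns one cluster id per object (connected components via union-find).
    """
    unit = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    simi = unit @ unit.T  # cosine similarity for every object pair
    parent = list(range(len(feats)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            if simi[i, j] >= simi_thresh:
                parent[find(i)] = find(j)  # merge the two groups

    # Relabel union-find roots as consecutive cluster ids starting at 0.
    remap: Dict[int, int] = {}
    return [remap.setdefault(find(i), len(remap)) for i in range(len(feats))]


# Toy 2-d vectors standing in for per-object CLIP image features.
feats = np.array([[1.0, 0.0], [0.95, 0.1], [0.0, 1.0], [0.1, 0.9]])
cluster_ids = cluster_by_similarity(feats, simi_thresh=0.82)
```

With connected components, two objects can share a cluster through a chain of similar neighbours even if they are not directly above the threshold, so a stricter `simi_thresh` yields more, smaller clusters.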

Bear Cluster Result

Horse Cluster Result

Each row shows, in order: the source image, the original SAM mask, the original masked image, the clustered mask, and the cluster-masked image.

UPDATES

2023-07-04 Integrate MobileSAM

MobileSAM is integrated into the pipeline for lighter and faster inference. To segment images with MobileSAM, edit the ./config/sam.yaml file: set the model name field to vit_t and the model weight file path to ./pretrained/sam/mobile_sam.pt

TODO

  • Test different clustering methods
  • Use the clustering result as input prompts to re-segment the image with the SAM model
  • Merge the embedding features of the global image and the masked image

Acknowledgement

Most of this repo's code is borrowed from OpenAI's CLIP repo and Facebook's segment-anything repo.

