casia-iva-lab / fastsam Goto Github PK

Fast Segment Anything

License: GNU Affero General Public License v3.0

Python 100.00%

fastsam's Introduction

Fast Segment Anything

[📕Paper] [🤗HuggingFace Demo] [Colab demo] [Replicate demo & API] [~~OpenXLab Demo~~] [Model Zoo] [BibTeX] [Video Demo]

The Fast Segment Anything Model(FastSAM) is a CNN Segment Anything Model trained using only 2% of the SA-1B dataset published by SAM authors. FastSAM achieves comparable performance with the SAM method at 50× higher run-time speed.

🍇 Updates

2023/11/28 Recommendation: Semantic FastSAM, which add the semantic class labels to FastSAM. Thanks to KBH00 for this valuable contribution.
2023/09/11 Release Training and Validation Code.
2023/08/17 Release OpenXLab Demo. Thanks to OpenXLab Team for help.
2023/07/06 Added to Ultralytics (YOLOv8) Model Hub. Thanks to Ultralytics for help 🌹.
2023/06/29 Support text mode in HuggingFace Space. Thanks a lot to gaoxinge for help 🌹.
2023/06/29 Release FastSAM_Awesome_TensorRT. Thanks a lot to ChuRuaNh0 for providing the TensorRT model of FastSAM 🌹.
2023/06/26 Release FastSAM Replicate Online Demo. Thanks a lot to Chenxi for providing this nice demo 🌹.
2023/06/26 Support points mode in HuggingFace Space. Better and faster interaction will come soon!
2023/06/24 Thanks a lot to Grounding-SAM for Combining Grounding-DINO with FastSAM in Grounded-FastSAM 🌹.

Installation

Clone the repository locally:

git clone https://github.com/CASIA-IVA-Lab/FastSAM.git

Create the conda env. The code requires python>=3.7, as well as pytorch>=1.7 and torchvision>=0.8. Please follow the instructions here to install both PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.

conda create -n FastSAM python=3.9
conda activate FastSAM

Install the packages:

cd FastSAM
pip install -r requirements.txt

Install CLIP:

pip install git+https://github.com/openai/CLIP.git

Getting Started

First download a model checkpoint.

Then, you can run the scripts to try the everything mode and three prompt modes.

# Everything mode
python Inference.py --model_path ./weights/FastSAM.pt --img_path ./images/dogs.jpg

# Text prompt
python Inference.py --model_path ./weights/FastSAM.pt --img_path ./images/dogs.jpg  --text_prompt "the yellow dog"

# Box prompt (xywh)
python Inference.py --model_path ./weights/FastSAM.pt --img_path ./images/dogs.jpg --box_prompt "[[570,200,230,400]]"

# Points prompt
python Inference.py --model_path ./weights/FastSAM.pt --img_path ./images/dogs.jpg  --point_prompt "[[520,360],[620,300]]" --point_label "[1,0]"

You can use the following code to generate all masks, make mask selection based on prompts, and visualize the results.

from fastsam import FastSAM, FastSAMPrompt

model = FastSAM('./weights/FastSAM.pt')
IMAGE_PATH = './images/dogs.jpg'
DEVICE = 'cpu'
everything_results = model(IMAGE_PATH, device=DEVICE, retina_masks=True, imgsz=1024, conf=0.4, iou=0.9,)
prompt_process = FastSAMPrompt(IMAGE_PATH, everything_results, device=DEVICE)

# everything prompt
ann = prompt_process.everything_prompt()

# bbox default shape [0,0,0,0] -> [x1,y1,x2,y2]
ann = prompt_process.box_prompt(bboxes=[[200, 200, 300, 300]])

# text prompt
ann = prompt_process.text_prompt(text='a photo of a dog')

# point prompt
# points default [[0,0]] [[x1,y1],[x2,y2]]
# point_label default [0] [1,0] 0:background, 1:foreground
ann = prompt_process.point_prompt(points=[[620, 360]], pointlabel=[1])

prompt_process.plot(annotations=ann,output_path='./output/dog.jpg',)

You are also welcomed to try our Colab demo: FastSAM_example.ipynb.

Different Inference Options

We provide various options for different purposes, details are in MORE_USAGES.md.

Training or Validation

Training from scratch or validation: Training and Validation Code.

Web demo

Gradio demo

We also provide a UI for testing our method that is built with gradio. You can upload a custom image, select the mode and set the parameters, click the segment button, and get a satisfactory segmentation result. Currently, the UI supports interaction with the 'Everything mode' and 'points mode'. We plan to add support for additional modes in the future. Running the following command in a terminal will launch the demo:

# Download the pre-trained model in "./weights/FastSAM.pt"
python app_gradio.py

This demo is also hosted on HuggingFace Space.

Replicate demo

Replicate demo has supported all modes, you can experience points/box/text mode.

Model Checkpoints

Two model versions of the model are available with different sizes. Click the links below to download the checkpoint for the corresponding model type.

default or FastSAM: YOLOv8x based Segment Anything Model | Baidu Cloud (pwd: 0000).
FastSAM-s: YOLOv8s based Segment Anything Model.

Results

All result were tested on a single NVIDIA GeForce RTX 3090.

1. Inference time

Running Speed under Different Point Prompt Numbers(ms).

method	params	1	10	100	E(16x16)	E(32x32*)	E(64x64)
SAM-H	0.6G	446	464	627	852	2099	6972
SAM-B	136M	110	125	230	432	1383	5417
FastSAM	68M	40	40	40	40	40	40

2. Memory usage

Dataset	Method	GPU Memory (MB)
COCO 2017	FastSAM	2608
COCO 2017	SAM-H	7060
COCO 2017	SAM-B	4670

3. Zero-shot Transfer Experiments

Edge Detection

Test on the BSDB500 dataset.

method	year	ODS	OIS	AP	R50
HED	2015	.788	.808	.840	.923
SAM	2023	.768	.786	.794	.928
FastSAM	2023	.750	.790	.793	.903

Object Proposals

COCO

method	AR10	AR100	AR1000	AUC
SAM-H E64	15.5	45.6	67.7	32.1
SAM-H E32	18.5	49.5	62.5	33.7
SAM-B E32	11.4	39.6	59.1	27.3
FastSAM	15.7	47.3	63.7	32.2

LVIS

bbox AR@1000

method	all	small	med.	large
ViTDet-H	65.0	53.2	83.3	91.2
zero-shot transfer methods
SAM-H E64	52.1	36.6	75.1	88.2
SAM-H E32	50.3	33.1	76.2	89.8
SAM-B E32	45.0	29.3	68.7	80.6
FastSAM	57.1	44.3	77.1	85.3

Instance Segmentation On COCO 2017

method	AP	APS	APM	APL
ViTDet-H	.510	.320	.543	.689
SAM	.465	.308	.510	.617
FastSAM	.379	.239	.434	.500

4. Performance Visualization

Several segmentation results:

Natural Images

Text to Mask

5.Downstream tasks

The results of several downstream tasks to show the effectiveness.

Anomaly Detection

Salient Object Detection

Building Extracting

License

The model is licensed under the Apache 2.0 license.

Acknowledgement

Segment Anything provides the SA-1B dataset and the base codes.
YOLOv8 provides codes and pre-trained models.
YOLACT provides powerful instance segmentation method.
Grounded-Segment-Anything provides a useful web demo template.

Contributors

Our project wouldn't be possible without the contributions of these amazing people! Thank you all for making this project better.

Citing FastSAM

If you find this project useful for your research, please consider citing the following BibTeX entry.

@misc{zhao2023fast,
      title={Fast Segment Anything},
      author={Xu Zhao and Wenchao Ding and Yongqi An and Yinglong Du and Tao Yu and Min Li and Ming Tang and Jinqiao Wang},
      year={2023},
      eprint={2306.12156},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

fastsam's People

Contributors

Stargazers

Watchers

Forkers

cat-stack-boop fictiverse msanov trizko hmidani-abdelilah matthewgard1 healthonrails vietanhdev zoq samigroup treksis realsky-lgh vn-os huangwenjunlovedy mcx andykeh710 adam-aalah spinkoo damlacoskun karayakar russ76 xymfei standardgalactic dariush-bahrami sandeewestgate habujhabn chenxwh uglierh f901107 qscuio iman yiranvang ezhangle kyrie10favor dralkebsi fkarionue hsaigroup evdcush linhong00316 robinmarily 2132660698 kkd4soei rizwanmunawar aust-hansen zhangjf2018 yanxg eltociear nstoa137 sirius1002 frederic33366 roderickgj123 ruyu37 marenan nemonameless ebiness ssahgal siennaknox lasyka qianlong2 artemardashev hyeyqwq yuqianf larasmithhh aarnrvera mustostark1 neerajkanhere gen-ai-experts henrykndr crazyboy9103 ytang67 perfyperfect sunkaianna rockystevejobs hhy5277 autogyro geocng kopigreenx mahimairaja iamleon121 admirind1 shaunwei rogerclarkgc sagum1 soon14 1bill2 solololololo faucet10i celum2 karynaur gierry tamnguyenvan floatingpoint64 weifj0212 bigshipai aojdfff tomdeaneight tezeragee soundwazzack gavinljj amkev101

fastsam's Issues

Can you help me passing a video for inferencing using cv2 library? and also export the output into the video or get a output frame on every image inference?

strategy for training data selection?

Thanks for your work. Could you please provide the details of how the 2% of the training data was selected? Did you use any specific strategy or just random selection ?

Thanks

Is only one box_prompt used in FastSAM?

Hi, thank you very much for your research.
Does box_prompt currently support only one box?

how finetune fastsam on my datasets?

Object crop or template image as prompt support any plans?

Fine Tuning Code

Hey,
Huge thanks for such a great repo that you have made, just wanted to know if it can be fine tuned for specific purposes, it would be great if you could provide code to fine tune the model on COCO dataset.

Minor error

Hello,

I am running the cat example given in the image folder. On MAC-M1, I get the following error:

img_array = np.frombuffer(buf, dtype=np.uint8).reshape(rows, cols, 3)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: cannot reshape array of size 24883200 into shape (1920,1080,3)

can someone help?

Mask Label

Since FastSAM uses Yolo detector, is it possible to get the mask labels?

CoreML export with variable Input Size

Hi,
Can someone help in understanding how to give variable input image sizes to FastSAM model exported in onnx format, currently it takes only (1024,1024) size image which is leading to mismatch in desired outcome

Interface compatibility

Nice work!

I'd like to suggest that you create a compatible interface to SAM itself, i.e. a drop-in replacement to SamAutomaticMaskGenerator which would make it easier for people to start using this.

Training code

Hi author,
Will you release the code so we can have a try?

RuntimeError: expected scalar type long int but found float

I get following error when I run the model

File /root/FastSAM/fastsam/utils.py:17, in adjust_bboxes_to_image_border(boxes, image_shape, threshold)
     14 h, w = image_shape
     16 # Adjust boxes
---> 17 boxes[:, 0] = torch.where(boxes[:, 0] < threshold, 0, boxes[:, 0])  # x1
     18 boxes[:, 1] = torch.where(boxes[:, 1] < threshold, 0, boxes[:, 1])  # y1
     19 boxes[:, 2] = torch.where(boxes[:, 2] > w - threshold, w, boxes[:, 2])  # x2

RuntimeError: expected scalar type long int but found float

Can this project be exported to ONNX?

I want to use this pro to deploy.

Why my output image has blank parts around result image but with the correct same size

How can i remove the blank bounds in the output image.I am not familiar with matplotlib.Is there anything I should change?

it can only detact (and segment) by minor prompts if applying with text mode?

I tried to test some other prompt-words such as black eyes, wood, sands in text mode with the sample picture in huggingface, but wrong results were given. Does it is possibly because the sample prompt-words such as yellow dog and blcak dog were ever applied in the prompts of training dataset?

Errro:

Hello,

I am trying the mode for the first time. I get the following message on Apple M1. Can someone help?
File "/Users/Projects/FastSAM/gitsrc/utils/tools.py", line 179, in fast_process
img_array = np.fromstring(buf, dtype=np.uint8).reshape(rows, cols, 3)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: cannot reshape array of size 7756992 into shape (603,1072,3)

License question

Thanks for you interesting work.

I have license question: How can FastSAM model be released under Apache license when the SA-1B training dataset is licensed under research only license + ultralytics yolo codebase under AGPL?

Is Yolact anchor based or anchor free ?

Thanks for the great engineering application research ！

I see the Yolo-seg series models completely borrow the idea of coefficientized masks from Yolact.

In your paper 3.2 section saying about yolov8: The updated Head module embraces a decoupled structure, separating classification and detection heads, and shifts from Anchor-Based to Anchor-Free.
And I'm wondering, does the threshold crop step in Yolact makes the yolo-seg architecture back to an Anchor-Based method?

Thank you very much for your excellent work, can fastsam be integrated into label-studio?

Can I use my own YOLOv8 model weights with FastSAM ?

I have tried to use a trained YOLOv8 models weights with FastSAM but it is raising an error about mismatching. I know you guys working on a fine-tune/train code right now but I was just curious about whether replacing the weights possible or not ?

Thanks for a great work!

can you provide a model download link of baidu cloud？

Suggestion - replace yolo8 with YoloNas for better performance

Thanks for the great work!
Can you consider replacing yolo8 with YoloNas for better speed+performance?

Can FastSAM be integrated into CVAT?

As you know, CVAT integrated SAM via serverless mode. @gaoxinge @zxDeepDiver

the model outputs only masks, no labels?

Are you also planning combining this model to Grounding-DINO by IDEA?

Output from onnx format

Hi, output from onnx format is of shape ([(1, 37, 21504), (1, 32, 256, 256)]. If I post process them using below method
where conf = 0.4, iou -> 0,=.9, and agnostic_nms = False like in the FastSAM .pt model but it doesn't return masks fo same length.

Can someone explain the outputs from onnx format FastSAM model and how to postprocess them .
def postprocess(preds, conf, iou, agnostic_nms=False):
"""TODO: filter by classes."""

p = ops.non_max_suppression(preds[0],
                            conf,
                            iou,
                            agnostic_nms,
                            max_det=100,
                            nc=1)



results = []
proto = preds[1]  # second output is len 3 if pt, but only 1 if exported
for i, pred in enumerate(p):

    pred[:, :4] = ops.scale_boxes(torch.Size([1024,1024]), pred[:, :4],(1024,1024))
    masks = ops.process_mask_native(proto[i], pred[:, 6:], pred[:, :4],(1024,1024))  # HWC
    return masks

Suggestion - Integrate MobileSAM into the pipeline for lightweight and faster inference

Reference: https://github.com/ChaoningZhang/MobileSAM

Our project performs on par with the original SAM and keeps exactly the same pipeline as the original SAM except for a change on the image encode, therefore, it is easy to Integrate into any project.

MobileSAM is around 60 times smaller and around 50 times faster than original SAM, and it is around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarzed as follows:

Best Wishes,
Qiao

Great work! would you like to add this to ultralytics models HUB?

Hey guys,
great work on this. I'm one of the co-authors of Ultralytics YOLOv8 and was wondering if you'd like to add support for fast SAM to Ultralytics models HUB here -> https://docs.ultralytics.com/models/
I'd be happy to help. Thanks!

Why is FastSAM worse than MobileSAM with points as the prompt?

We have just released MobileSAM project https://github.com/ChaoningZhang/MobileSAM. We found that FastSAM seems to perform much worse than MobileSAM with points as the prompt, especially when the foreground point and background point as set close. Can you give thoughts on what might be the reason? Thank you for your help in advance.

Hello, when exporting to tensorrt (engine), an error is reported: export running on CPU but must be on GPU. How to fix it?

Training Cost

Hi! Thanks for the great repo! I really like it.
Could you please provide more details about training costs, such as GPUs / training time? Besides, why are only 2% of images from SA-1B used for training? Is there some reason for the setting?

can you share the checkpoint through baidu yun?

Output Image - Colab

Why does the example script used in the Colab notebook save the final image in a downsized version? When I use my own custom photo, the resulting image dimensions are considerably smaller.

cog是啥库?

向作者致敬，predict文件中import cog，这个cog是什么？也是pip安装的嘛

Output Image

Hi there,

I passed a 19201080 image into the model for segmenation, but the output image doesn't stay in the same resolution. The image with masks is scaled down into around 1328741 (a rough number) in equal proportions.

Any measures to keep the ourput image same size as the original image?

Thanks

about the train code

Thanks for your work, does the training code mean that I can use my labeled segmented data containing one class to train a model that segments only that one class?

感谢您的工作，请问训练代码是指可以使用我包含一个类的已标注分割数据训练一个只分割这一个类的模型吗？

Convert FastSAM to onnx and coreml format

Hi,

Has someone convert FastSAM to the onnx or coreml format?
Since, FastSam is based on YOLOV8 model and takes img.path as input, how to get an image trace for this and convert to core ml ormat?
Also, how to convert it to coreml format such that output can be of variable size ?

FigureCanvasTkAgg

Hello,

python Inference.py --model_path FastSAM-x.pt --img_path images/dogs.jpg

I got the following error when I ran it.

AttributeError: 'FigureCanvasTkAgg' object has no attribute 'renderer'

If anyone encounters this problem, they can add the following line to tools.py (line 171).

buf = fig.canvas.draw()
buf = fig.canvas.tostring_rgb()

Spliting model into Encoder and Decoder

Hello! I really like this project.
Do you plan to support splitting this model into Encoder and Decoder like the original SAM?
In that way, the Decoder part can be run very fast, and we can apply it to some applications like AnyLabeling.
I'd love to help integrate into AnyLabeling if we can find a way to split the model.
Thank you very much!

FileNotFoundError: [Errno 2] No such file or directory: 'weights\\FastSAM.pt'

I need help.I don't know why.

Tensor rt converter

Hi
Can you share the code how to convert the model to trt?

Read multiple images at once

Can I read all the pictures in the folder at one time and use only one text prompt? I tried to do this, but an opencv error occurred. Is there a mistake in my operation or other reasons, because this operation is more realistic significance

qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "/usr/local/lib/python3.8/site-packages/cv2/qt/plugins" even though it was found

tensorrt inference?

Multiple objects text prompt

Is it possible with the text prompt that it detects multiple objects ? (e.g., "people" will find all people in the image)

Can't a 12GB GPU run?

Can't a 12GB GPU run?

Inference has the following error：

Install Error

I'm getting this error when I'm trying to run pip.

Installing collected packages: seaborn, ultralytics
ERROR: Could not install packages due to an OSError: [WinError 2] The system cannot find the file specified: 'C:\Python311\Scripts\ultralytics.exe' -> 'C:\Python311\Scripts\ultralytics.exe.deleteme'

FastSAM batch inference?

Does FastSAM support batch inference on segment everything mode?

TypeError: torch._VariableFunctionsClass.meshgrid() got multiple values for keyword argument 'indexing'

Run the example script.
python Inference.py --model_path FastSAM-x.pt --img_path .\examples\dogs.jpg

File "d:\ProgramData\Anaconda3\envs\fastsam\lib\site-packages\torch\functional.py", line 504, in _meshgrid
return _VF.meshgrid(tensors, **kwargs, indexing ='ij') # type: ignore[attr-defined]
TypeError: torch._VariableFunctionsClass.meshgrid() got multiple values for keyword argument 'indexing'
Any one know how to solve it?

bounding box interaction option

Thank you so much for your contribution. Will you support bounding box interaction option in the future?

Some questions about full graph segmentation

Thank you for your efforts!! I would like to ask if this whole image segmentation can get the label of each split block, or if there is any good solution to get the label.

casia-iva-lab / fastsam Goto Github PK

fastsam's Introduction

Fast Segment Anything

Installation

Getting Started

Different Inference Options

Training or Validation

Web demo

Gradio demo

Replicate demo

Model Checkpoints

Results

1. Inference time

2. Memory usage

3. Zero-shot Transfer Experiments

Edge Detection

Object Proposals

COCO

LVIS

Instance Segmentation On COCO 2017

4. Performance Visualization

Natural Images

Text to Mask

5.Downstream tasks

Anomaly Detection

Salient Object Detection

Building Extracting

License

Acknowledgement

Contributors

Citing FastSAM

fastsam's People

Contributors

Stargazers

Watchers

Forkers

fastsam's Issues

Recommend Projects

Recommend Topics

Recommend Org