
Painter's Introduction

Painter → SegGPT: Vision Foundation Models from BAAI

[Demo video: final_video.mp4]

  • Painter (CVPR 2023) - Images Speak in Images: A Generalist Painter for In-Context Visual Learning

  • SegGPT (ICCV 2023) - SegGPT: Segmenting Everything In Context

News

  • 2023.4 Inference code and model of SegGPT are available here.
  • 2023.4 We combined SAM and SegGPT to enable one-touch segmentation of everything in any image, as well as segmentation of anything in a video. Check it out here.
  • 2023.4 Enjoy the SegGPT demo.
  • 2023.3 Code and model of Painter are available.

Contact

  • We are hiring at all levels on the BAAI Vision Team, including full-time researchers, engineers, and interns. If you are interested in working with us on foundation models, visual perception, and multimodal learning, please contact Xinlong Wang ([email protected]) and Yue Cao ([email protected]).

License

The content of this project is licensed under the terms in LICENSE.


Painter's People

Contributors

caoyue10 · encounter1997 · wxinlong · zhangxiaosong18


Painter's Issues

Question about evaluation on the denoising task

Great work! I have some questions about the evaluation on the denoising task.

The input image size is 448×448 by default. However, in the standard evaluation on the SIDD dataset, PSNR and SSIM are computed on cropped 256×256 image patches. In this work, is the evaluation conducted on the 256×256 patches (resized to 448×448 as input) or on 448×448 patches (re-cropped from the original image)?
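For concreteness, the two protocols contrasted above could look like the following sketch. This is purely illustrative, with a placeholder model callable standing in for the Painter denoising forward pass; it is not the authors' evaluation code.

    import torch
    import torch.nn.functional as F

    def eval_on_256_crops(patch_256: torch.Tensor, model) -> torch.Tensor:
        """Protocol A: standard 256x256 SIDD crops, resized to the 448 input size."""
        x = F.interpolate(patch_256.unsqueeze(0), size=(448, 448), mode="bicubic")
        y = model(x)  # denoised 448x448 prediction
        return F.interpolate(y, size=(256, 256), mode="bicubic").squeeze(0)

    def eval_on_448_crops(image: torch.Tensor, model) -> torch.Tensor:
        """Protocol B: re-crop the original image into non-overlapping 448x448 patches."""
        out = image.clone()
        _, h, w = image.shape
        for top in range(0, h - 447, 448):
            for left in range(0, w - 447, 448):
                crop = image[:, top:top + 448, left:left + 448].unsqueeze(0)
                out[:, top:top + 448, left:left + 448] = model(crop).squeeze(0)
        return out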

Thanks!

Painter won't run

Thanks for sharing this latest research; it is truly impressive.
Due to limited hardware, I don't have a 24 GB GPU, but I do have a desktop with 32 GB of RAM, so I tried running the toy dataset on CPU. The result is strange: the loss increases instead of decreasing.
I modified the code in many places, mainly around automatic mixed precision and distributed computation.
I hope the research team can provide code for running on CPU. Many thanks.

A question about the task prompt

An excellent paper, but I am a little confused about this passage:
“The second baseline is to generate a task prompt. We define the task prompt as the learnable tensors, freeze the whole model, and then use the training loss to optimize the task prompts.”
Could you give us more details about training the task prompt?
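For concreteness, a minimal sketch of how such a baseline is commonly implemented: freeze every model parameter and optimize only a learnable prompt tensor with the ordinary training loss. The prompt shape and the loss-returning forward signature are assumptions for illustration, not the authors' code.

    import torch

    model = load_pretrained_painter()        # placeholder loader, not a real API
    for p in model.parameters():
        p.requires_grad_(False)              # freeze the whole model

    # Learnable task prompt: a prompt image/target pair, e.g. 2 x 3 x 448 x 448.
    prompt = torch.nn.Parameter(torch.zeros(2, 3, 448, 448))
    optimizer = torch.optim.AdamW([prompt], lr=1e-3)

    for images, targets in train_loader:       # task-specific data (assumed)
        loss = model(prompt, images, targets)  # hypothetical loss-returning forward
        optimizer.zero_grad()
        loss.backward()                        # gradients flow only into `prompt`
        optimizer.step()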

Random coloring scheme in SegGPT

Thanks for sharing the awesome work!

I do not fully understand the random color scheme in Sec 3.1.

  • randomly sampling another image that shares a similar context with the input image; Q1: is this step the same as in Painter?
  • randomly sample a set of colors from the target image and map each color to a random one (see the sketch after this list); Q2: what do you mean by sampling colors from the target image? Q3: how is a color mapped onto the randomly sampled image?
  • two pairs of images, which are defined as an in-context pair; Q4: in Painter, I understand the two pairs are the prompt image with its mask and the target image with its mask, but what are the two pairs here?
  • we introduce the mix-context training method, which trains the model using mixed examples. This involves stitching together multiple images with the same color mapping; the resulting image is then randomly cropped and resized to form a mixed-context training sample. Q5: compared to Painter, do you stitch more than two prompt images in SegGPT?
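For Q2/Q3 specifically, one plausible reading is sketched below: enumerate the colors present in the target map and remap each to a freshly sampled random color. This illustrates the idea only and is not SegGPT's implementation.

    import torch

    def random_recolor(target: torch.Tensor) -> torch.Tensor:
        """target: H x W x 3 uint8 segmentation map; returns a recolored copy."""
        colors = torch.unique(target.reshape(-1, 3), dim=0)  # colors in the target
        recolored = target.clone()
        for c in colors:
            new_c = torch.randint(0, 256, (3,), dtype=target.dtype)
            pixel_mask = (target == c).all(dim=-1)           # pixels of this color
            recolored[pixel_mask] = new_c
        return recolored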

Looking forward to your reply:)

Question about the learnable image tensor in SegGPT's in-context tuning

Hi there, thanks for your amazing work. After reading the SegGPT paper, I'm a little confused about in-context tuning. During training, SegGPT treats a learnable image tensor as the learnable prompt. But in normal training, the input is a pair of in-context images with their masks, e.g. image1-mask1 and image2-mask2. So is the learnable image tensor a random image-mask pair? With image3-mask3 from the dataset, is the whole input the learnable image-mask (prompt) plus image3-mask3? Since the mask of the learnable pair is random, there is no label for computing the loss and backpropagating gradients, so how is it trained? Please tell me more and help me sort this out. Thanks!

Questions about in-context inference

Hi,

I found your work to be very interesting, but I was not entirely clear on how you predict the segmentation mask from a blank mask during inference. Does the model take a sequence of mask tokens and reconstruct all the mask patches simultaneously, or is it an auto-regressive approach? Could you elaborate?
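For what it's worth, the masked-image-modeling framing suggests MAE-style, non-autoregressive decoding, sketched below under assumed names: masked positions are filled with a shared learned mask token, and every patch is predicted in a single bidirectional pass.

    import torch

    def single_pass_decode(tokens, bool_masked_pos, mask_token, encoder, head):
        """tokens: B x N x D patch embeddings; bool_masked_pos: B x N booleans."""
        mask = bool_masked_pos.unsqueeze(-1).type_as(tokens)
        x = tokens * (1.0 - mask) + mask_token * mask  # fill masked slots
        x = encoder(x)   # one bidirectional transformer pass, no causal ordering
        return head(x)   # all masked patches are regressed simultaneously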

Thanks!

Are there plans to release the SegGPT training code?

Question as in the title.
Follow-up questions: 1. At inference time, SegGPT merges each prompt image with the query image, re-extracts features, and then combines them, which is inefficient.
Could the implementation be changed so that prompt features are extracted separately, without merging with the query image, and then combined? Features would then not need to be re-extracted for every new query image, which would greatly improve inference efficiency.
2. Using multiple prompt images does improve segmentation somewhat, but in some scenarios more prompts are not necessarily better, especially in unfamiliar settings, e.g. when the target occupies a small fraction of the image or the scene is unusual. Is there any further research on problems of this kind?

Question about few-shot segmentation experiment

I am a bit confused about SegGPT's few-shot segmentation (FSS) experiments. I would like to know whether SegGPT's training already uses all the categories of PASCAL and COCO.
As far as I know, in past FSS experiments the training categories and the test categories do not overlap.
The paper says: 'For a fair comparison, we also evaluate specialist models on in-domain categories marked by *. * indicates that the categories in training cover the categories in testing.'
Based on this description, SegGPT should mainly be compared against the methods marked with '*'. So for SegGPT as well, do 'the categories in training cover the categories in testing'?

Error configuring the SegGPT environment on Windows

When I run pip install -r requirements.txt, I get the following error. What could be the cause?
(SegGPT) D:\pythonproject\Painter-main\Painter-main\SegGPT\SegGPT_inference>pip install -r requirements.txt
Collecting git+https://github.com/facebookresearch/detectron2.git (from -r requirements.txt (line 6))
Cloning https://github.com/facebookresearch/detectron2.git to c:\users\pc\appdata\local\temp\pip-req-build-knw03leb
Running command git clone --filter=blob:none --quiet https://github.com/facebookresearch/detectron2.git 'C:\Users\PC\AppData\Local\Temp\pip-req-build-knw03leb'
fatal: unable to access 'https://github.com/facebookresearch/detectron2.git/': Failed to connect to github.com port 443: Timed out
error: subprocess-exited-with-error

× git clone --filter=blob:none --quiet https://github.com/facebookresearch/detectron2.git 'C:\Users\PC\AppData\Local\Temp\pip-req-build-knw03leb' did not run successfully.
│ exit code: 128
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

(SegGPT) D:\pythonproject\Painter-main\Painter-main\SegGPT\SegGPT_inference>

A question about fine-tuning

I am very interested in this project and have a few questions; can anybody help me?
1. What is the relationship between Painter and SegGPT?
2. If I want to fine-tune SegGPT on my own dataset, are there any suggestions?
3. Could the authors provide a script similar to Painter's for developers to verify and use?

Question about pretraining and segmentation evaluation.

Nice work! I have some questions about the details. Hope you can help.

  1. Is the model trained from scratch or pretrained?
  2. Since the model has a fixed input size of 448×448, how do you perform segmentation evaluation on images with various aspect ratios: pad with a constant, or perform sliding inference (see the sketch after this list)?
  3. Does the training loss mask out ignored pixels, as well as padded pixels?
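As a reference point for question 2, generic sliding-window evaluation with a fixed 448×448 window might look like the sketch below; model.num_classes, the stride, and constant padding are illustrative assumptions, not the authors' protocol.

    import torch
    import torch.nn.functional as F

    def sliding_inference(image, model, win=448, stride=224):
        """image: 1 x 3 x H x W; returns per-pixel logits averaged over windows."""
        _, _, h, w = image.shape
        image = F.pad(image, (0, max(0, win - w), 0, max(0, win - h)))  # pad small inputs
        _, _, H, W = image.shape
        logits = torch.zeros(1, model.num_classes, H, W)  # assumed attribute
        count = torch.zeros(1, 1, H, W)
        tops = list(range(0, H - win + 1, stride))
        lefts = list(range(0, W - win + 1, stride))
        if tops[-1] != H - win:
            tops.append(H - win)    # make sure the bottom edge is covered
        if lefts[-1] != W - win:
            lefts.append(W - win)   # make sure the right edge is covered
        for top in tops:
            for left in lefts:
                crop = image[:, :, top:top + win, left:left + win]
                logits[:, :, top:top + win, left:left + win] += model(crop)
                count[:, :, top:top + win, left:left + win] += 1
        return (logits / count)[:, :, :h, :w]  # strip padding before scoring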

mmdet_custom & mmpose_custom failed

The versions differ: mmdet and mmpose are not compatible with data/mmdet_custom and data/mmpose_custom.
Preparing coco-pose and coco-inst fails in ./tools/train.py.

Aborted (core dumped) problem

I got this message after I ran the first example.

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)

Questions on Painter's inference without groundtruths

Hello! I notice in your code that the model's input stays the same during training and inference, i.e., paired images imgs, paired labels tgts, and the mask bool_masked_pos. During forward(), the model can see the information of the test labels before the labels get masked (with patch_embed); see the screenshot below:

[screenshot omitted]

This is acceptable during masked-image-modeling training, but what about during inference, when the test image has no label?

I mean, since the information of the test labels should not be seen by Painter, do the paired labels tgts already contain all-zero values on the pixels belonging to the test labels, or do you have another preprocessing strategy for tgts during inference?

I drew a sketch which will hopefully make my question clearer:

[sketch omitted]
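For the record, the preprocessing the question hypothesizes would look roughly like the sketch below: zero the half of tgts belonging to the test image and mark exactly those patches as masked. The top/bottom layout and helper names are assumptions, not Painter's actual code.

    import torch

    def build_inference_inputs(prompt_tgt: torch.Tensor, num_patches: int):
        """prompt_tgt: B x 3 x H x W label image of the prompt pair."""
        blank = torch.zeros_like(prompt_tgt)          # no label information leaks
        tgts = torch.cat([prompt_tgt, blank], dim=2)  # stack along height: 2H tall
        bool_masked_pos = torch.zeros(num_patches, dtype=torch.bool)
        bool_masked_pos[num_patches // 2:] = True     # mask the query's label half
        return tgts, bool_masked_pos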

Poorly Defined Requirements

I am unable to run SegGPT inference. Your instructions on the environment are extremely vague, and following them does not allow me to run the code. Would it be possible for you to be a bit more specific, e.g., which version of Python you are using, whether torch should be installed with CUDA support, etc.?

Questions about the prompt selection

Thank you for your excellent work, but I am confused about the details of obtaining a prompt via selection:
“The first baseline is to obtain a better prompt via selection, that we traverse the whole training set in a heuristic manner and select the best-performing example pair for each task.”
What does "a heuristic manner" mean? My understanding is to concatenate sample-specific features with all the prompt features and then conduct the selection. Could you provide more details about how the selection process works? Is each prompt weighted?
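One straightforward reading of the quoted baseline is an exhaustive search, sketched below: try each candidate example pair as the prompt, score it on held-out data, and keep the best. The metric and model signature are placeholders, not the authors' heuristic.

    def select_best_prompt(candidates, val_set, model, metric):
        """candidates: iterable of (example image, example target) pairs."""
        best_pair, best_score = None, float("-inf")
        for pair in candidates:
            scores = [metric(model(pair, image), target) for image, target in val_set]
            score = sum(scores) / len(scores)   # mean validation performance
            if score > best_score:
                best_pair, best_score = pair, score
        return best_pair                        # one fixed prompt per task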

Questions about num_frames

Hi! I don't quite understand the num_frames parameter for videos. Does it have anything to do with multiple prompts, and does video inference support multiple prompts? Thanks!

SegGPT to ONNX

Hello, could you please provide the code to convert the model to ONNX?

Computational cost

Hi there, I would like to ask about the computational resources and training time used to train the SegGPT model. Could you please provide some information on this?

Training settings for a single task (semantic segmentation on ADE20K) in Painter

Hi, what are the specific training settings for the single task of semantic segmentation on ADE20K?
Currently our settings are: batch size = 8 (per GPU), accumulate iterations = 16, nodes = 2, base lr = 1e-3 (we also tried an actual lr of 1e-3, which did not work), epochs = 300, warmup epochs = 20, layer decay = 0.8.
However, this hyper-parameter setting does not work. Is there something we're missing?
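One thing worth checking: MAE-derived codebases usually compute the actual learning rate from the base LR with the linear scaling rule, so base lr = 1e-3 does not imply actual lr = 1e-3. A sketch of the arithmetic, assuming 8 GPUs per node (an illustrative guess; adjust to your setup):

    batch_per_gpu = 8
    accum_iter = 16
    nodes = 2
    gpus_per_node = 8                      # assumption, not from the question
    eff_batch = batch_per_gpu * accum_iter * nodes * gpus_per_node  # 2048
    base_lr = 1e-3
    actual_lr = base_lr * eff_batch / 256  # linear scaling rule -> 8e-3
    print(actual_lr)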
Thank you very much!

Questions about Painter Paper

Hi there 👋

We are a community of CV engineers, and we were reading Visual Prompting via Image Inpainting.

We would like to ask a couple of questions:

Why did you create the dataset in that way? It is not similar to the final input, and you could have created something much simpler by taking standard CV segmentation datasets and composing the grid image.

This is the image in the paper for training, yet it is unclear how you do inference. Can you give us some pseudo code, assuming we take as input $x$ (the image) and $m$ (the mask part that we will have to fill)? (A sketch follows after these questions.)

[figure from the paper omitted]

How did you find the right $z_i$ for each "patch" token coming from MAE?

Could you give us the intuition for why you train this way instead of directly predicting the patch tokens on the missing parts?
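Since pseudo code was requested above, here is a hedged sketch of generic inpainting-style inference with input $x$ and mask $m$. It follows the usual masked-image-modeling recipe, with every helper a placeholder; it is not code from the paper.

    def inpaint_inference(x, m, patchify, patch_mask, encoder, decoder,
                          unpatchify, mask_token):
        """x: image; m: binary pixel mask, 1 where content must be filled."""
        tokens = patchify(x)             # N x D patch embeddings (placeholder)
        masked = patch_mask(m)           # N booleans: patches covered by m
        tokens[masked] = mask_token      # replace with the shared learned token
        pred = decoder(encoder(tokens))  # one bidirectional pass, no autoregression
        y = unpatchify(pred)             # predicted pixels for every patch
        return x * (1 - m) + y * m       # keep visible pixels, fill under m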

Thank you

Cheers,

Fra

How can I apply the model to my own dataset?

Is there a pre-trained model for fine-tuning?
If I want to train on my own dataset, what should I do, and which file should I start from?
Thank you for your amazing work!

Question about input of evaluation

Hello,

Thanks for your great work!
What is the format of the input during evaluation: the input image directly, or the stitched input?

How do I write inference code for a single image?

1. I want to achieve the effect shown in rainbow.gif. How should I write the inference code?
2. I used /Painter/eval/ade20k_semantic/painter_inference_segm.py with an input consisting of a single-class mask and an image, yet at inference time everything is recognized as car-like objects. Why?

Suggestion - Integrate MobileSAM into the pipeline for lightweight and faster inference

Reference: https://github.com/ChaoningZhang/MobileSAM

Our project performs on par with the original SAM and keeps exactly the same pipeline as the original SAM except for a change to the image encoder; therefore, it is easy to integrate into any project.

MobileSAM is around 60 times smaller and around 50 times faster than the original SAM, and it is around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarized as follows:

[comparison figures omitted]

Best Wishes,

Qiao

Backbone

Have you trained SegGPT on ViT-B? How is its performance? I don't have enough GPUs to reproduce it on ViT-L.

Combining Painter + SAM code

Hey,
Thanks a lot for your effort building this project! We appreciate it!
Are you going to share the code for combining Painter with SAM?

Panoptic segmentation example for SegGPT

I have the inference demos from the README working for SegGPT with my own data, producing a binary mask for an input image or video. Is there an example of how to use the model for panoptic segmentation? Are per-instance input masks provided as separate input files, or as a single image with a different value per instance ID? And what is the output format?

Question about keypoint detection

Hello, your work is very innovative. I have some implementation questions regarding keypoint detection, and I hope to get your answers:

  1. Why do keypoints need to be described using squares? Also, why are squares of different sizes used in the cls and loc channels, and how are the sizes determined?
  2. Is the output square size 17×17 during inference?

Negative targets

Hi, can I give SegGPT negative image targets to avoid confusion among similar objects?
Thanks.
