
Painter's Introduction

Painter → SegGPT: Vision Foundation Models from BAAI

[Demo video: final_video.mp4]

  • Painter (CVPR 2023) - Images Speak in Images: A Generalist Painter for In-Context Visual Learning

  • SegGPT (ICCV 2023) - SegGPT: Segmenting Everything In Context

News

  • 2023.4 Inference code and model of SegGPT are available here.
  • 2023.4 We combined SAM and SegGPT to enable one-touch segmentation of everything in any image, as well as segmentation of anything in a video. Check it out here.
  • 2023.4 Enjoy the SegGPT demo.
  • 2023.3 Code and model of Painter are available.

Contact

  • We are hiring at all levels on the BAAI Vision Team, including full-time researchers, engineers, and interns. If you are interested in working with us on foundation models, visual perception, and multimodal learning, please contact Xinlong Wang ([email protected]) and Yue Cao ([email protected]).

License

The content of this project is licensed under the terms in LICENSE.


Painter's People

Contributors

caoyue10 · encounter1997 · wxinlong · zhangxiaosong18


Painter's Issues

Question about evaluation on the denoising task

Great work! I have some questions about the evaluation on the denoising task.

The input image size is 448×448 by default. However, in the standard evaluation on the SIDD dataset, PSNR and SSIM are computed on cropped 256×256 image patches. In this work, is the evaluation conducted on the 256×256 patches (resized to 448×448 as input) or on 448×448 patches (re-cropped from the original image)?
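For concreteness, the two protocols contrasted above could look like the following sketch. This is purely illustrative, with a placeholder model callable standing in for the Painter denoising forward pass; it is not the authors' evaluation code.

    import torch
    import torch.nn.functional as F

    def eval_on_256_crops(patch_256: torch.Tensor, model) -> torch.Tensor:
        """Protocol A: standard 256x256 SIDD crops, resized to the 448 input size."""
        x = F.interpolate(patch_256.unsqueeze(0), size=(448, 448), mode="bicubic")
        y = model(x)  # denoised 448x448 prediction
        return F.interpolate(y, size=(256, 256), mode="bicubic").squeeze(0)

    def eval_on_448_crops(image: torch.Tensor, model) -> torch.Tensor:
        """Protocol B: re-crop the original image into non-overlapping 448x448 patches."""
        out = image.clone()
        _, h, w = image.shape
        for top in range(0, h - 447, 448):
            for left in range(0, w - 447, 448):
                crop = image[:, top:top + 448, left:left + 448].unsqueeze(0)
                out[:, top:top + 448, left:left + 448] = model(crop).squeeze(0)
        return out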

Thanks!

Painter won't run

Thanks for sharing this latest research; it is truly impressive.
Due to limited hardware, I don't have a 24 GB GPU, but I do have a desktop with 32 GB of RAM, so I tried running the toy dataset on CPU. The result is strange: the loss increases instead of decreasing.
I modified the code in many places, mainly around automatic mixed precision and distributed computation.
I hope the research team can provide code for running on CPU. Many thanks.

A question about the task prompt

An excellent paper, but I am a little confused about this passage:
“The second baseline is to generate a task prompt. We define the task prompt as the learnable tensors, freeze the whole model, and then use the training loss to optimize the task prompts.”
Could you give us more details about training the task prompt?
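For concreteness, a minimal sketch of how such a baseline is commonly implemented: freeze every model parameter and optimize only a learnable prompt tensor with the ordinary training loss. The prompt shape and the loss-returning forward signature are assumptions for illustration, not the authors' code.

    import torch

    model = load_pretrained_painter()        # placeholder loader, not a real API
    for p in model.parameters():
        p.requires_grad_(False)              # freeze the whole model

    # Learnable task prompt: a prompt image/target pair, e.g. 2 x 3 x 448 x 448.
    prompt = torch.nn.Parameter(torch.zeros(2, 3, 448, 448))
    optimizer = torch.optim.AdamW([prompt], lr=1e-3)

    for images, targets in train_loader:       # task-specific data (assumed)
        loss = model(prompt, images, targets)  # hypothetical loss-returning forward
        optimizer.zero_grad()
        loss.backward()                        # gradients flow only into `prompt`
        optimizer.step()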

Random coloring scheme in SegGPT

Thanks for sharing the awesome work!

I do not fully understand the random color scheme in Sec 3.1.

  • randomly sampling another image that shares a similar context with the input image; Q1: is this step the same as in Painter?
  • randomly sample a set of colors from the target image and map each color to a random one (see the sketch after this list); Q2: what do you mean by sampling colors from the target image? Q3: how is a color mapped onto the randomly sampled image?
  • two pairs of images, which are defined as an in-context pair; Q4: in Painter, I understand the two pairs are the prompt image with its mask and the target image with its mask, but what are the two pairs here?
  • we introduce the mix-context training method, which trains the model using mixed examples. This involves stitching together multiple images with the same color mapping; the resulting image is then randomly cropped and resized to form a mixed-context training sample. Q5: compared to Painter, do you stitch more than two prompt images in SegGPT?
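For Q2/Q3 specifically, one plausible reading is sketched below: enumerate the colors present in the target map and remap each to a freshly sampled random color. This illustrates the idea only and is not SegGPT's implementation.

    import torch

    def random_recolor(target: torch.Tensor) -> torch.Tensor:
        """target: H x W x 3 uint8 segmentation map; returns a recolored copy."""
        colors = torch.unique(target.reshape(-1, 3), dim=0)  # colors in the target
        recolored = target.clone()
        for c in colors:
            new_c = torch.randint(0, 256, (3,), dtype=target.dtype)
            pixel_mask = (target == c).all(dim=-1)           # pixels of this color
            recolored[pixel_mask] = new_c
        return recolored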

Looking forward to your reply:)

Question about the learnable image tensor in SegGPT's in-context tuning

Hi there, thanks for your amazing work. After reading the SegGPT paper, I'm a little confused about in-context tuning. During training, SegGPT treats a learnable image tensor as the learnable prompt. But in normal training, the input is a pair of in-context images with their masks, e.g. image1-mask1 and image2-mask2. So is the learnable image tensor a random image-mask pair? With image3-mask3 from the dataset, is the whole input the learnable image-mask (prompt) plus image3-mask3? Since the mask of the learnable pair is random, there is no label for computing the loss and backpropagating gradients, so how is it trained? Please tell me more and help me sort this out. Thanks!

Questions about in-context inference

Hi,

I found your work to be very interesting, but I was not entirely clear on how you predict the segmentation mask from a blank mask during inference. Does the model take a sequence of mask tokens and reconstruct all the mask patches simultaneously, or is it an auto-regressive approach? Could you elaborate?
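For what it's worth, the masked-image-modeling framing suggests MAE-style, non-autoregressive decoding, sketched below under assumed names: masked positions are filled with a shared learned mask token, and every patch is predicted in a single bidirectional pass.

    import torch

    def single_pass_decode(tokens, bool_masked_pos, mask_token, encoder, head):
        """tokens: B x N x D patch embeddings; bool_masked_pos: B x N booleans."""
        mask = bool_masked_pos.unsqueeze(-1).type_as(tokens)
        x = tokens * (1.0 - mask) + mask_token * mask  # fill masked slots
        x = encoder(x)   # one bidirectional transformer pass, no causal ordering
        return head(x)   # all masked patches are regressed simultaneously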

Thanks!

Are there plans to release the SegGPT training code?

Question as in the title.
Follow-up questions: 1. At inference time, SegGPT merges each prompt image with the query image, re-extracts features, and then combines them, which is inefficient.
Could the implementation be changed so that prompt features are extracted separately, without merging with the query image, and then combined? Features would then not need to be re-extracted for every new query image, which would greatly improve inference efficiency.
2. Using multiple prompt images does improve segmentation somewhat, but in some scenarios more prompts are not necessarily better, especially in unfamiliar settings, e.g. when the target occupies a small fraction of the image or the scene is unusual. Is there any further research on problems of this kind?

Question about few-shot segmentation experiment

I am a bit confused about SegGPT's few-shot segmentation (FSS) experiments. I would like to know whether SegGPT's training already uses all the categories of PASCAL and COCO.
As far as I know, in past FSS experiments the training categories and the test categories do not overlap.
The paper says: 'For a fair comparison, we also evaluate specialist models on in-domain categories marked by *. * indicates that the categories in training cover the categories in testing.'
Based on this description, SegGPT should mainly be compared against the methods marked with '*'. So for SegGPT as well, do 'the categories in training cover the categories in testing'?

Error configuring the SegGPT environment on Windows

When I run pip install -r requirements.txt, I get the following error. What could be the cause?
(SegGPT) D:\pythonproject\Painter-main\Painter-main\SegGPT\SegGPT_inference>pip install -r requirements.txt
Collecting git+https://github.com/facebookresearch/detectron2.git (from -r requirements.txt (line 6))
Cloning https://github.com/facebookresearch/detectron2.git to c:\users\pc\appdata\local\temp\pip-req-build-knw03leb
Running command git clone --filter=blob:none --quiet https://github.com/facebookresearch/detectron2.git 'C:\Users\PC\AppData\Local\Temp\pip-req-build-knw03leb'
fatal: unable to access 'https://github.com/facebookresearch/detectron2.git/': Failed to connect to github.com port 443: Timed out
error: subprocess-exited-with-error

× git clone --filter=blob:none --quiet https://github.com/facebookresearch/detectron2.git 'C:\Users\PC\AppData\Local\Temp\pip-req-build-knw03leb' did not run successfully.
│ exit code: 128
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

(SegGPT) D:\pythonproject\Painter-main\Painter-main\SegGPT\SegGPT_inference>

A question about fine-tuning

I am very interested in this project and have a few questions; can anybody help me?
1. What is the relationship between Painter and SegGPT?
2. If I want to fine-tune SegGPT on my own dataset, are there any suggestions?
3. Could the authors provide a script similar to Painter's for developers to verify and use?

Question about pretraining and segmentation evaluation.

Nice work! I have some questions about the details. Hope you can help.

  1. Is the model trained from scratch or pretrained?
  2. Since the model has a fixed input size of 448×448, how do you perform segmentation evaluation on images with various aspect ratios: pad with a constant, or perform sliding inference (see the sketch after this list)?
  3. Does the training loss mask out ignored pixels, as well as padded pixels?
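As a reference point for question 2, generic sliding-window evaluation with a fixed 448×448 window might look like the sketch below; model.num_classes, the stride, and constant padding are illustrative assumptions, not the authors' protocol.

    import torch
    import torch.nn.functional as F

    def sliding_inference(image, model, win=448, stride=224):
        """image: 1 x 3 x H x W; returns per-pixel logits averaged over windows."""
        _, _, h, w = image.shape
        image = F.pad(image, (0, max(0, win - w), 0, max(0, win - h)))  # pad small inputs
        _, _, H, W = image.shape
        logits = torch.zeros(1, model.num_classes, H, W)  # assumed attribute
        count = torch.zeros(1, 1, H, W)
        tops = list(range(0, H - win + 1, stride))
        lefts = list(range(0, W - win + 1, stride))
        if tops[-1] != H - win:
            tops.append(H - win)    # make sure the bottom edge is covered
        if lefts[-1] != W - win:
            lefts.append(W - win)   # make sure the right edge is covered
        for top in tops:
            for left in lefts:
                crop = image[:, :, top:top + win, left:left + win]
                logits[:, :, top:top + win, left:left + win] += model(crop)
                count[:, :, top:top + win, left:left + win] += 1
        return (logits / count)[:, :, :h, :w]  # strip padding before scoring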

mmdet_custom & mmpose_custom failed

The versions differ: mmdet and mmpose are not compatible with data/mmdet_custom and data/mmpose_custom.
Preparing coco-pose and coco-inst fails in ./tools/train.py.

Aborted (core dumped) problem

I got this message after I ran the first example.

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)

Questions on Painter's inference without groundtruths

Hello! I notice in your code that the model's input stays the same during training and inference, i.e., paired images imgs, paired labels tgts, and the mask bool_masked_pos. During forward(), the model can see the information of the test labels before the labels get masked (with patch_embed); see the screenshot below:

[screenshot omitted]

This is acceptable during masked-image-modeling training, but what about during inference, when the test image has no label?

I mean, since the information of the test labels should not be seen by Painter, do the paired labels tgts already contain all-zero values on the pixels belonging to the test labels, or do you have another preprocessing strategy for tgts during inference?

I drew a sketch which will hopefully make my question clearer:

[sketch omitted]
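For the record, the preprocessing the question hypothesizes would look roughly like the sketch below: zero the half of tgts belonging to the test image and mark exactly those patches as masked. The top/bottom layout and helper names are assumptions, not Painter's actual code.

    import torch

    def build_inference_inputs(prompt_tgt: torch.Tensor, num_patches: int):
        """prompt_tgt: B x 3 x H x W label image of the prompt pair."""
        blank = torch.zeros_like(prompt_tgt)          # no label information leaks
        tgts = torch.cat([prompt_tgt, blank], dim=2)  # stack along height: 2H tall
        bool_masked_pos = torch.zeros(num_patches, dtype=torch.bool)
        bool_masked_pos[num_patches // 2:] = True     # mask the query's label half
        return tgts, bool_masked_pos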

Poorly Defined Requirements

I am unable to run SegGPT inference. Your instructions on the environment are extremely vague, and following them does not allow me to run the code. Would it be possible for you to be a bit more specific, e.g., which version of Python you are using, whether torch should be installed with CUDA support, etc.?

Questions about the prompt selection

Thank you for your excellent work, but I am confused about the details of obtaining a prompt via selection:
“The first baseline is to obtain a better prompt via selection, that we traverse the whole training set in a heuristic manner and select the best-performing example pair for each task.”
What does "a heuristic manner" mean? My understanding is to concatenate sample-specific features with all the prompt features and then conduct the selection. Could you provide more details about how the selection process works? Is each prompt weighted?
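One straightforward reading of the quoted baseline is an exhaustive search, sketched below: try each candidate example pair as the prompt, score it on held-out data, and keep the best. The metric and model signature are placeholders, not the authors' heuristic.

    def select_best_prompt(candidates, val_set, model, metric):
        """candidates: iterable of (example image, example target) pairs."""
        best_pair, best_score = None, float("-inf")
        for pair in candidates:
            scores = [metric(model(pair, image), target) for image, target in val_set]
            score = sum(scores) / len(scores)   # mean validation performance
            if score > best_score:
                best_pair, best_score = pair, score
        return best_pair                        # one fixed prompt per task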

Questions about num_frames

Hi! I don't quite understand the num_frames parameter for videos. Does it have anything to do with multiple prompts, and does video inference support multiple prompts? Thanks!

SegGPT to ONNX

Hello, could you please provide the code to convert the model to ONNX?

Computational cost

Hi there, I would like to ask about the computational resources and training time used to train the SegGPT model. Could you please provide some information on this?

Training settings for a single task (semantic segmentation on ADE20K) in Painter

Hi, what are the specific training settings for the single task of semantic segmentation on ADE20K?
Currently our settings are: batch size = 8 (per GPU), accumulate iterations = 16, nodes = 2, base lr = 1e-3 (we also tried an actual lr of 1e-3, which did not work), epochs = 300, warmup epochs = 20, layer decay = 0.8.
However, this hyper-parameter setting does not work. Is there something we're missing?
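One thing worth checking: MAE-derived codebases usually compute the actual learning rate from the base LR with the linear scaling rule, so base lr = 1e-3 does not imply actual lr = 1e-3. A sketch of the arithmetic, assuming 8 GPUs per node (an illustrative guess; adjust to your setup):

    batch_per_gpu = 8
    accum_iter = 16
    nodes = 2
    gpus_per_node = 8                      # assumption, not from the question
    eff_batch = batch_per_gpu * accum_iter * nodes * gpus_per_node  # 2048
    base_lr = 1e-3
    actual_lr = base_lr * eff_batch / 256  # linear scaling rule -> 8e-3
    print(actual_lr)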
Thank you very much!

Questions about Painter Paper

Hi there 👋

We are a community of CV engineers, and we were reading Visual Prompting via Image Inpainting.

We would like to ask a couple of questions:

Why did you create the dataset in that way? It is not similar to the final input, and you could have created something much simpler by taking standard CV segmentation datasets and composing the grid image.

This is the image in the paper for training, yet it is unclear how you do inference. Can you give us some pseudo code, assuming we take as input $x$ (the image) and $m$ (the mask part that we will have to fill)? (A sketch follows after these questions.)

[figure from the paper omitted]

How did you find the right $z_i$ for each "patch" token coming from MAE?

Could you give us the intuition for why you train this way instead of directly predicting the patch tokens on the missing parts?
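Since pseudo code was requested above, here is a hedged sketch of generic inpainting-style inference with input $x$ and mask $m$. It follows the usual masked-image-modeling recipe, with every helper a placeholder; it is not code from the paper.

    def inpaint_inference(x, m, patchify, patch_mask, encoder, decoder,
                          unpatchify, mask_token):
        """x: image; m: binary pixel mask, 1 where content must be filled."""
        tokens = patchify(x)             # N x D patch embeddings (placeholder)
        masked = patch_mask(m)           # N booleans: patches covered by m
        tokens[masked] = mask_token      # replace with the shared learned token
        pred = decoder(encoder(tokens))  # one bidirectional pass, no autoregression
        y = unpatchify(pred)             # predicted pixels for every patch
        return x * (1 - m) + y * m       # keep visible pixels, fill under m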

Thank you

Cheers,

Fra

How can I apply the model to my own dataset?

Is there a pre-trained model for fine-tuning?
If I want to train on my own dataset, what should I do, and which file should I start from?
Thank you for your amazing work!

Question about input of evaluation

Hello,

Thanks for your great work!
What is the format of the input during evaluation: the input image directly, or the stitched input?

How do I write inference code for a single image?

1. I want to achieve the effect shown in rainbow.gif. How should I write the inference code?
2. I used /Painter/eval/ade20k_semantic/painter_inference_segm.py with an input consisting of a single-class mask and an image, yet at inference time everything is recognized as car-like objects. Why?

Suggestion - Integrate MobileSAM into the pipeline for lightweight and faster inference

Reference: https://github.com/ChaoningZhang/MobileSAM

Our project performs on par with the original SAM and keeps exactly the same pipeline as the original SAM except for a change to the image encoder; therefore, it is easy to integrate into any project.

MobileSAM is around 60 times smaller and around 50 times faster than the original SAM, and it is around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarized as follows:

[comparison figures omitted]

Best Wishes,

Qiao

Backbone

Have you trained SegGPT on ViT-B? How is its performance? I don't have enough GPUs to reproduce it on ViT-L.

Combining Painter + SAM code

Hey,
Thanks a lot for your effort building this project! We appreciate it!
Are you going to share the code for combining Painter with SAM?

Panoptic segmentation example for SegGPT

I have the inference demos from the README working for SegGPT with my own data, producing a binary mask for an input image or video. Is there an example of how to use the model for panoptic segmentation? Are per-instance input masks provided as separate input files, or as a single image with a different value per instance ID? And what is the output format?

Question about keypoint detection

Hello, your work is very innovative. I have some implementation questions regarding keypoint detection, and I hope to get your answers:

  1. Why do keypoints need to be described using squares? Also, why are squares of different sizes used in the cls and loc channels, and how are the sizes determined?
  2. Is the output square size 17×17 during inference?

Negative targets

Hi, can I give SegGPT negative image targets to avoid confusion among similar objects?
Thanks.
