baaivision / Painter
Painter & SegGPT Series: Vision Foundation Models from BAAI
License: MIT License
Nice work! I have some questions about the details. Hope you can help.
Great work! I have some questions about the evaluation on the denoising task.
The input image size is 448x448 by default. However, in the standard evaluation of the SIDD dataset, PSNR and SSIM are computed on cropped 256x256 image patches. I would like to know whether, in this work, the evaluation is conducted on the 256x256 patches (resized to 448x448 as input) or on 448x448 patches (re-cropping the original image into 448x448 patches)?
Thanks!
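The two evaluation options the question distinguishes only differ in what the metric is computed on; the metric itself is fixed. A minimal numpy sketch of PSNR (the SIDD file handling and the Painter inference call are deliberately left out; the toy images below are made up):

```python
import numpy as np

def psnr(pred, gt, data_range=255.0):
    # Peak signal-to-noise ratio between two images of the same shape.
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

# Option A (as asked): crop the official 256x256 SIDD patch, resize to
# 448x448 for the model input, resize the output back, score on 256x256.
# Option B: re-crop the original image into 448x448 patches and score there.
# Toy demonstration of the metric on a single-pixel error of 10:
gt = np.full((256, 256, 3), 128, dtype=np.uint8)
noisy = gt.copy()
noisy[0, 0, 0] = 138
print(round(psnr(noisy, gt), 2))
```

Whichever option the authors used, the resize step in option A would slightly change the measured PSNR, which is presumably why the question matters.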
How can I get the mask result after prediction with prompt image?
I am unable to run SegGPT inference. Your instructions on the environment are extremely vague, and following them does not allow me to run the code. Would it be possible for you to be a bit more specific, i.e., which version of Python you are using, whether torch should be installed with CUDA support, etc.?
Thanks for your amazing work! When will the SegGPT training code be open-sourced?
I am very interested in this project and have a few questions; can anybody help me?
1. What is the relationship between Painter and SegGPT?
2. If I want to fine-tune SegGPT on my own dataset, are there any suggestions?
3. Could the authors provide a script, similar to Painter's, for developers to verify and use?
Thank you for your excellent work. But I am confused about the details of obtaining a prompt via selection:
“The first baseline is to obtain a better prompt via selection, that we traverse the whole training set in a heuristic manner and select the best-performing example pair for each task.”
What does "a heuristic manner" mean? My understanding is to concatenate sample-specific features with all the prompt features and then conduct the selection process. Could you provide more details about how the selection process works? Is each prompt weighted?
Hi, can I give SegGPT negative image targets to avoid confusion among similar objects?
Thanks.
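The "traverse the whole training set and select the best-performing example pair" baseline quoted above can be sketched generically as an exhaustive search over candidate prompts. This is purely an illustrative sketch: `run_model` and `score` stand in for Painter's inference and metric, and the toy values are made up:

```python
def select_best_prompt(candidates, val_pairs, run_model, score):
    """Try each candidate example pair as the prompt and keep the one
    with the best average metric over held-out (input, target) pairs."""
    best_prompt, best_score = None, float("-inf")
    for prompt in candidates:
        s = sum(score(run_model(prompt, img), gt) for img, gt in val_pairs)
        s /= len(val_pairs)
        if s > best_score:
            best_prompt, best_score = prompt, s
    return best_prompt, best_score

# Toy usage: the "model" just echoes the prompt, the metric rewards
# predictions close to the target, so the middle candidate should win.
run_model = lambda prompt, img: prompt
score = lambda pred, gt: -abs(pred - gt)
prompt, s = select_best_prompt([1, 5, 9], [(0, 4), (0, 6)], run_model, score)
print(prompt)
```

The per-task cost is one full evaluation per candidate, which matches the "traverse the whole training set" phrasing; no weighting across prompts is implied by this reading.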
I read the paper but could not understand whether SegGPT supports instance segmentation. Can it predict a category ID and a mask for each instance of the categories present in the input image?
Thanks for sharing the awesome work!
I do not fully understand the random color scheme in Sec 3.1.
Looking forward to your reply:)
Hello, your work is very innovative. I have some implementation questions regarding keypoint detection and hope to get your answers:
Thanks for your amazing work!! Does SegGPT support multi-category prompting at inference?
The question is as titled.
Follow-up questions: 1. At inference time, SegGPT needs to merge each prompt image with the image to be segmented, re-run feature extraction, and then combine the features, which is inefficient.
Could the implementation be changed so that the prompt image's features are extracted separately, without merging with the target image, and then combined? That way, features would not need to be re-extracted for every new target image, and inference efficiency could be greatly improved.
2. Using multiple prompt images does improve segmentation somewhat, but in some scenarios more prompts are not always better, especially for unfamiliar scenes, e.g., when the target occupies a small fraction of the image, or for non-everyday scenes. Is there more research on this kind of problem?
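The speed-up proposed in that issue (encode each prompt once, reuse its features across target images) could be sketched as a simple cache. Note that this changes SegGPT's joint prompt-target encoding, so it is a hypothetical design, not the repository's implementation; `encode` and `combine` are placeholders:

```python
class PromptFeatureCache:
    """Cache per-prompt features so each prompt is encoded only once;
    only the target image is encoded per query."""
    def __init__(self, encode):
        self.encode = encode      # assumed feature extractor
        self._cache = {}

    def features(self, prompt_id, prompt):
        if prompt_id not in self._cache:
            self._cache[prompt_id] = self.encode(prompt)
        return self._cache[prompt_id]

    def infer(self, prompt_id, prompt, target, combine):
        # combine() fuses cached prompt features with fresh target features.
        return combine(self.features(prompt_id, prompt), self.encode(target))

# Toy usage with a counting "encoder": one prompt, three targets.
calls = {"n": 0}
def encode(x):
    calls["n"] += 1
    return x * 2
cache = PromptFeatureCache(encode)
for target in [1, 2, 3]:
    cache.infer("p0", 10, target, combine=lambda p, t: p + t)
print(calls["n"])  # 4 encodes: the prompt once, plus three targets
```

Whether such decoupled encoding preserves accuracy is exactly the open question, since the model was trained with prompt and target stitched together.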
Thanks for sharing the latest research results; they are truly impressive.
Due to limited experimental resources, I do not have a 24GB GPU, but I have a desktop with 32GB of RAM, so I tried running the toy dataset on CPU. The results were strange: the loss increased instead of decreasing.
I modified the code in many places, mainly around automatic mixed precision and distributed computation.
I hope the research team can provide CPU-runnable code. Many thanks!
Hello,
Thanks for your great work!
What is the format of the input during evaluation: the input image directly, or the stitched input?
Hi, what are the specific training settings for single-task semantic segmentation on ADE20K?
Currently our settings are: batch size=8 (per GPU), accumulate iterations=16, nodes=2, base lr=1e-3 (we also tried an actual lr of 1e-3, which did not work), epochs=300, warmup epochs=20, layer decay=0.8.
However, this hyper-parameter setting does not work. Is there something we're missing?
Thank you very much!
Hi there 👋
We are a community of CV engineers, and we have been reading Visual Prompting via Image Inpainting.
We would like to ask a couple of questions:
Why did you create the dataset in that way? It is not similar to the final input, and you could have created something much simpler by taking standard CV segmentation datasets and composing the grid image.
This is the image in the paper for training, yet it is unclear how you do inference. Can you give me some pseudo code assuming we take as input
How did you find the right z_i for each "patch" token coming from MAE?
Could you give us the intuition on why you are doing the training in this way and not directly predicting the patch tokens on the missing parts?
Thank you
Cheers,
Fra
I got this message after running the first example:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)
I am a bit confused about SegGPT's FSS experiments. I would like to know whether SegGPT's training already uses all the categories of Pascal and COCO.
To my knowledge, in past FSS experiments, the training-set categories and the test-set categories do not overlap.
But the paper says: "For a fair comparison, we also evaluate specialist models on in-domain categories marked by *. * indicates that the
categories in training cover the categories in testing."
According to this description, SegGPT should mainly be compared with the methods marked with "*". So is SegGPT also in the setting where "the categories in training cover the categories in testing"?
Thanks for your great work. I am looking forward to the code for SegGPT; when will you release it?
Hi there, I would like to inquire about the computational resources and training time used by the authors to train the SegGPT model. Can you please provide some information on this?
A perfect paper. But I am a little confused about this:
“The second baseline is to generate a task prompt. We define the task prompt as the learnable tensors, freeze the whole model, and then use the training loss to optimize the task prompts.”
Could you give us more details about training the task prompt?
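My reading of the quoted baseline is that the model weights are frozen and only the prompt tensor receives gradients. A toy numpy sketch of that idea, with a frozen linear "model" standing in for Painter (everything here is illustrative, not the authors' code):

```python
import numpy as np

W = np.array([[2.0, 0.0],
              [0.0, 1.0]])         # frozen "model" weights (never updated)
target = np.array([4.0, 3.0])      # training signal for one task
prompt = np.zeros(2)               # learnable task prompt tensor

for _ in range(200):
    pred = W @ prompt                     # forward pass through frozen W
    grad = 2 * W.T @ (pred - target)      # d/dprompt of ||W p - target||^2
    prompt -= 0.05 * grad                 # only the prompt is updated

print(np.round(prompt, 3))  # approaches [2. 3.], since W @ [2, 3] = target
```

In the real setting the "target" would be the usual training loss over the dataset, and the prompt tensor would live in image (or feature) space, but the optimization structure is the same: freeze the network, backpropagate into the prompt.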
Hi, it seems window_block_indexes is set as a nested list when building the model (Painter/Painter/models_painter.py, line 481 in 07c8a2a), in which case window_size in this line is always 0 (Painter/Painter/models_painter.py, line 307 in 07c8a2a). Is this expected?
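If the nesting is indeed the issue, a membership test like `i in window_block_indexes` would always be False for integer block indices, leaving window_size at 0 for every block. A generic sketch of the suspected problem and fix (the index values below are made up, not the repo's config):

```python
# Suspected situation: the config produces a list of ranges, i.e. nested lists.
window_block_indexes = [list(range(0, 2)), list(range(3, 5))]

i = 1
print(i in window_block_indexes)  # False: the members are lists, not ints

# Flattening restores the intended per-block membership test:
flat = [idx for sub in window_block_indexes for idx in sub]
print(flat)          # [0, 1, 3, 4]
print(1 in flat)     # True, so this block would get a nonzero window_size
```

Whether the training runs actually hit this path (i.e., whether global attention was silently used everywhere) is the question for the maintainers.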
Have you trained SegGPT on ViT-B? How is its performance? I don't have enough GPUs to reproduce the ViT-L results.
I have the inference demos from the README working for SegGPT and with my own data, producing a binary mask for an input image or video. Is there an example of how to use the model for panoptic segmentation? Are per-instance input masks provided as separate input files or a single image with different values per instance ID? And what is the output format?
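On the input-format part of that question, one common convention (an assumption here, not confirmed for SegGPT) is a single image whose pixel values are instance IDs; splitting such an ID map into per-instance binary masks is straightforward:

```python
import numpy as np

def split_instances(id_map, ignore=0):
    # One boolean mask per instance ID found in the map (0 = background).
    return {int(i): (id_map == i) for i in np.unique(id_map) if i != ignore}

# Toy ID map with two instances on a 2x3 canvas.
id_map = np.array([[0, 1, 1],
                   [2, 2, 0]])
masks = split_instances(id_map)
print(sorted(masks))                    # [1, 2]
print(masks[1].sum(), masks[2].sum())   # 2 2
```

The inverse direction (separate binary-mask files merged into one ID map) is equally easy, so the real question is which of the two the SegGPT demo scripts expect, and what the panoptic output encodes.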
Hi! I don't quite understand the num_frames parameter for video. Does it have anything to do with multiple prompts? Does video support multiple prompts? Thanks.
The versions are different: mmdet and mmpose are not compatible with data/mmdet_custom and data/mmpose_custom.
Preparing coco-pose and coco-inst failed in ./tools/train.py.
Your work is so interesting! But when I evaluate the results, I can't find "sem_seg_loading_fn" in your ADE20kSemSegEvaluatorCustom.py.
import models_seggpt
It seems this module is currently missing from the inference code?
1. I want to achieve the effect shown in rainbow.gif; how should I write the inference code?
2. I used /Painter/eval/ade20k_semantic/painter_inference_segm.py with an input of only a single-category mask and an image, but at inference everything is recognized as car-like objects?
Hello, could you please provide the code to convert the model to ONNX?
Hello! I notice in your code that the model's input remains consistent during training and inference, i.e., paired images imgs, paired labels tgts, and mask bool_masked_pos. During forward(), the model can see the information of the test labels before the labels get masked (via patch_embed); see the screenshot below:
This is acceptable during masked image modeling training, but what about during inference (the test image has no label)?
I mean, since the information of the test labels should not be seen by Painter, do the paired labels tgts already have all-zero values on the pixels belonging to test labels? Or do you have another preprocessing strategy for tgts during inference?
I drew a sketch, and hopefully this makes my question clearer:
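My own guess at the inference-time preprocessing being asked about, sketched with numpy: the known prompt label fills the top half of the stitched tgts and the unknown test-label half is zeroed. This is purely an assumption about how tgts might be prepared, and the shapes are illustrative:

```python
import numpy as np

def prepare_tgts(prompt_label, H, W, C=3):
    """Stack the prompt label on top of a zero canvas for the unknown
    test label, mirroring the stitched tgts layout."""
    canvas = np.zeros((2 * H, W, C), dtype=prompt_label.dtype)
    canvas[:H] = prompt_label      # known prompt label in the top half
    # The bottom half stays zero: the model never sees the test label.
    return canvas

prompt_label = np.ones((2, 2, 3), dtype=np.float32)
tgts = prepare_tgts(prompt_label, H=2, W=2)
print(tgts.shape, float(tgts[2:].sum()))  # (4, 2, 3) 0.0
```

If instead the real tgts carries the ground-truth test label into forward() before masking, that would be exactly the leakage the question worries about, so it would be good to have the authors confirm which it is.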
Reference: https://github.com/ChaoningZhang/MobileSAM
Our project performs on par with the original SAM and keeps exactly the same pipeline as the original SAM except for a change to the image encoder; therefore, it is easy to integrate into any project.
MobileSAM is around 60 times smaller and around 50 times faster than the original SAM, and it is around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarized as follows:
Best Wishes,
Qiao
I'm trying to test the model on COCO semantic segmentation; how should I process the data? And any ideas on how to get the result shown in the rainbow demo?
When I run pip install -r requirements.txt, I get the following error. What is the cause?
(SegGPT) D:\pythonproject\Painter-main\Painter-main\SegGPT\SegGPT_inference>pip install -r requirements.txt
Collecting git+https://github.com/facebookresearch/detectron2.git (from -r requirements.txt (line 6))
Cloning https://github.com/facebookresearch/detectron2.git to c:\users\pc\appdata\local\temp\pip-req-build-knw03leb
Running command git clone --filter=blob:none --quiet https://github.com/facebookresearch/detectron2.git 'C:\Users\PC\AppData\Local\Temp\pip-req-build-knw03leb'
fatal: unable to access 'https://github.com/facebookresearch/detectron2.git/': Failed to connect to github.com port 443: Timed out
error: subprocess-exited-with-error
× git clone --filter=blob:none --quiet https://github.com/facebookresearch/detectron2.git 'C:\Users\PC\AppData\Local\Temp\pip-req-build-knw03leb' did not run successfully.
│ exit code: 128
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× git clone --filter=blob:none --quiet https://github.com/facebookresearch/detectron2.git 'C:\Users\PC\AppData\Local\Temp\pip-req-build-knw03leb' did not run successfully.
│ exit code: 128
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
(SegGPT) D:\pythonproject\Painter-main\Painter-main\SegGPT\SegGPT_inference>
Hi ,
Painter is a great project. We tried to launch it for daily tasks and found that in app_gradio.py the request is posted to another server, which makes any customized development impossible:
r = requests.post("http://120.92.79.209/painter/runVideo", files = files)
Is it possible to launch Painter with the source code on our local server?
Thanks very much!!
Hey,
Thanks a lot for your effort building this project!! Much appreciated!
Are you going to share the code for combining Painter with SAM?
Hello,
Thanks for your great work!
I find that the prompt is fixed in the evaluation; how do you choose the prompt?
Hi, I'm very interested in your work. May I ask when the code will be uploaded?
Hi,
I found your work very interesting, but I was not entirely clear on how you predict the segmentation mask from a blank mask during inference. Does it just take a sequence of mask tokens and reconstruct all the mask patches simultaneously, or is it an auto-regressive approach? Could you elaborate?
Thanks!
Hi there, thanks for your amazing work. After reading the SegGPT paper, I'm a little confused about in-context tuning. In the paper, during training, SegGPT treats a learnable image tensor as a learnable prompt. But in normal training, the input is a pair of in-context images with their masks, e.g., image1-mask1 and image2-mask2. So is the learnable image tensor a random image-mask pair? With image3-mask3 from the dataset, is the whole input the learnable image-mask (prompt) plus image3-mask3? Since the mask of that random image-mask pair is random, there is no label for loss computation and gradient backpropagation, so how is it trained? Please tell me more and help me work this out. Thanks!
Is there any pre-trained model for fine-tuning?
If I want to train on my own dataset, what should I do? Which file should I start from?
Thank you for your amazing work!