
image2paragraph's Introduction


Image.txt: Transform Image Into Unique Paragraph

Hugging Face Spaces | Open In Colab

Project Website

(The Hugging Face demo sometimes does not work in Safari; use Chrome.)

Demo

News

  • 17/April/2023. In addition to Semantic Segment Anything, we use Edit Anything to get region-level semantics. All models now take less than 20 s on an 8 GB GPU card (10x faster than the previous CPU version).
  • 17/April/2023. Our project is now online on Hugging Face. Have a try!
  • 14/April/2023. Our project is very popular on Twitter. See the posted tweet for details.

(Runs on an 8 GB GPU within 20 s!)


Main Pipeline

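The flow below is a minimal sketch of how the stages compose, using the class and method names from this repo's main.py (the argument set is abbreviated, and the flag spellings, including contolnet_device, follow the repo's CLI):

# Sketch only: BLIP2 caption + GRIT dense caption + region semantics are fused
# by an LLM into a paragraph, which Canny ControlNet renders back into an image.
from argparse import Namespace
from models.image_text_transformation import ImageTextTransformation

# Abbreviated arguments; OPENAI_KEY must be set in the environment (see "2. Start").
args = Namespace(
    image_src="examples/3.jpg",
    image_caption_device="cuda",     # BLIP2
    dense_caption_device="cuda",     # GRIT
    semantic_segment_device="cuda",  # Segment Anything + region semantics
    contolnet_device="cuda",         # the repo spells this flag "contolnet"
)

processor = ImageTextTransformation(args)            # load all models
paragraph = processor.image_to_text(args.image_src)  # fuse captions into a paragraph
image = processor.text_to_image(paragraph)           # Canny ControlNet re-render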

Reasoning Details


To Do List

Done

  • GRIT example.
  • ControlNet, BLIP2.
  • Semantic Segment Anything.
  • Segment Anything for fine-grained semantics.
  • Gradio.
  • Integrate GRIT into our code.
  • Support GPT4 API.
  • Notebook/Huggingface Space.
  • Region Semantic Classification from Edit-Anything.
  • Make the model lightweight.

Doing

  • Replace ChatGPT with our own trained LLM.
  • Use other grounded text-to-image models instead of Canny ControlNet.
  • Show retrieval results in Gradio.

Visualization

The text-to-image model is ControlNet with Canny edge conditioning, from the diffusers library.
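Below is a minimal sketch of Canny-conditioned generation with diffusers. The ControlNet checkpoint "fusing/stable-diffusion-v1-5-controlnet-canny" is the one this repo loads; the base model id, prompt, and Canny thresholds are illustrative assumptions:

import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Extract Canny edges from the reference image to use as the control signal.
image = np.array(Image.open("examples/3.jpg"))
edges = cv2.Canny(image, 100, 200)                         # illustrative thresholds
control = Image.fromarray(np.stack([edges] * 3, axis=-1))  # 3-channel edge map

controlnet = ControlNetModel.from_pretrained(
    "fusing/stable-diffusion-v1-5-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

result = pipe(
    "a black and white dog sitting on a porch beside a red bike",  # generated paragraph
    image=control,
).images[0]
result.save("output/controlnet_canny.jpg")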


1. Installation

Please find installation instructions in install.md.

2. Start

Simple visualization

export OPENAI_KEY=[YOUR KEY HERE]
python main.py  --image_src [image_path] --out_image_name [out_file_name]

If your GPU memory is smaller than 8 GB:

python main.py --image_caption_device cpu --semantic_segment_device cpu

If you have no GPU available:

python main.py --image_caption_device cpu --semantic_segment_device cpu --dense_caption_device cpu  --contolnet_device cpu

For example:

python main.py --image_src "examples/3.jpg" --out_image_name "output/3_result.jpg"

Note: If you have a GPU card with more than 15 GB of memory, set all devices to GPU for fast inference.

The generated text and image are saved to "output/".

Note: Use GPT-4 for good results, as GPT-3.5 sometimes misses the position information.

Use Gradio directly

python main_gradio.py

If you have more than 20 GB of GPU memory, use device='cuda' as the default.

3. Visualization

Input: (image)
BLIP2 Image Caption: "A dog sitting on a porch with a bike."
GRIT Dense Caption: (image)
Semantic Segment Anything: (image)

The final generated paragraph with ChatGPT is:

  This image depicts a black and white dog sitting on a porch beside a red bike. The dense caption mentions other objects in the scene, such as a white car parked on the street and a red bike parked on the side of the road. The region semantic provides more specific information, including the porch, floor, wall, and trees. The dog can be seen sitting on the floor beside the bike, and there is also a parked bicycle and tree in the background. The wall is visible on one side of the image, while the street and trees can be seen in the other direction. 

4. Retrieval Result on COCO

Method                     Trainable Parameters   Running Time   IR@1   TR@1
Image-text                 230M                   9h             43.8   33.2
Generated Paragraph-text   0                      5m             49.7   36.1

Interestingly, we find that compressing an image into a paragraph gives retrieval results that are even better than using the source image itself.
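As a rough illustration of the zero-trainable-parameter setup, the sketch below ranks candidate captions against generated paragraphs with an off-the-shelf text encoder; the encoder name is an illustrative assumption, since this README does not specify the text-matching model:

import numpy as np
from sentence_transformers import SentenceTransformer  # assumed dependency

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative text encoder

paragraphs = ["This image depicts a black and white dog sitting on a porch beside a red bike."]
captions = ["A dog sitting on a porch with a bike.",
            "A red car parked on a street."]

p_emb = model.encode(paragraphs, normalize_embeddings=True)
c_emb = model.encode(captions, normalize_embeddings=True)
scores = p_emb @ c_emb.T       # cosine similarity, text-to-text
top1 = scores.argmax(axis=1)   # best-matching caption per paragraph
print(top1)                    # TR@1 counts how often this index is correct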

Others

If you have more suggestions or functions you would like implemented in this codebase, feel free to email me at awinyimg dot gmail dot com or open an issue.

Acknowledgment

This work is based on ChatGPT, Edit_Anything, BLIP2, GRIT, OFA, Segment-Anything, Semantic-Segment-Anything, and ControlNet.

image2paragraph's People

Contributors

fingerrec, zhaohengyuan1


image2paragraph's Issues

Suggestion - Integrate MobileSAM into the pipeline for lightweight and faster inference

Reference: https://github.com/ChaoningZhang/MobileSAM

Our project performs on par with the original SAM and keeps exactly the same pipeline as the original SAM, except for a change in the image encoder; therefore, it is easy to integrate into any project.

MobileSAM is around 60 times smaller and around 50 times faster than the original SAM, and it is around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarized as follows:

(comparison images)
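A minimal sketch of the drop-in swap this suggestion describes, following the usage documented in the MobileSAM repo (untested against this codebase; the checkpoint path is an assumption):

import cv2
from mobile_sam import sam_model_registry, SamAutomaticMaskGenerator

# Load the lightweight MobileSAM weights (the "vit_t" image encoder).
mobile_sam = sam_model_registry["vit_t"](checkpoint="./weights/mobile_sam.pt")
mobile_sam.to(device="cuda")
mobile_sam.eval()

# Same automatic-mask interface as the original SAM, so the downstream
# region-semantic step should not need to change.
mask_generator = SamAutomaticMaskGenerator(mobile_sam)
image = cv2.cvtColor(cv2.imread("examples/3.jpg"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)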

Best Wishes,

Qiao

run question

Running python main.py --image_src "examples/3.jpg" --out_image_name "output/3_result.jpg" does not work.

Comment instructions outdated

The comment instructions in the README.md seem to be outdated. The line numbers listed don't seem to be meaningful, if I am understanding this correctly.

About the visualization

Thanks for this great work and the open-sourced repo. I haven't worked on segmentation tasks before, and I am wondering how to visualize the dense segment image (3_semantic_segment_anything) the way you show in the repo.

install does not work with python = 3.8.16

Hi,
Thanks for the great work.
Trying to replicate your work here, I created a new env and ran pip install -r requirements.txt, but it gives me an error.

ERROR: Could not find a version that satisfies the requirement torch==1.9.0+cu111 (from versions: 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0)
ERROR: No matching distribution found for torch==1.9.0+cu111

I also tried installing torch using conda, but it does not seem to work at the implementation stage.

Any suggestions are appreciated.

Region Semantic Models Do Not Work Well

First of all, thanks for the great work.

The image caption and dense caption modules all work fine here; however, the region caption module does not seem to work well. I tested both the edit_anything and ssa models.

For edit_anything model, it returns obviously wrong object descriptions. The following the the test image I input.
image
And the Region Segment module returns

a dog is walking on the floor in a room: [0, 50, 383, 165]; a person riding a skateboard down a street: [234, 49, 149, 166]; a piece of paper with a black background: [0, 0, 64, 110]; a white light switch with a black light: [312, 0, 53, 80]; the moon is seen over the city skyline: [116, 0, 56, 38]; 

There are clearly no dogs or skateboards in the picture.

For the ssa model, when I add the --region_classify_model ssa option and change the region_semantic method to use ssa, it errors out with

│ /share/data/ripl/fjd/Image2Paragraph/models/segment_models/semantic_segment_anything_model.py:14 │
│ 7 in semantic_class_w_mask                                                                       │
│                                                                                                  │
│   144 │   │   │                                                                                  │
│   145 │   │   │   valid_mask_large_crop = mmcv.imcrop(valid_mask.numpy(), np.array([bbox[0], b   │
│   146 │   │   │   scale_large)                                                                   │
│ ❱ 147 │   │   │   top_1_patch_large = torch.bincount(class_ids_patch_large[torch.tensor(valid_   │
│   148 │   │   │   top_1_mask_category = mask_categories[top_1_patch_large.item()]                │
│   149 │   │   │                                                                                  │
│   150 │   │   │   ann['class_name'] = str(top_1_mask_category)                                   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
IndexError: The shape of the mask [3, 23] at index 0 does not match the shape of the indexed tensor [23, 3] at index 0

I wonder if you have a good way to use the region segment methods.

Possible Langchain integration?

Hey! Your model is just awesome; I'm wondering if you have any plans for LangChain integration? Currently they support blip-image-captioning for image captioning, but your variant honestly looks much more useful! Great work!

Dense Caption always returns empty

Hi,

Thanks for sharing this work. Very interesting and potentially very impactful.

I encountered this issue while running python main.py --image_src "/Code/Image2Paragraph/examples/3.jpg" --out_image_name "output/3_result.jpg".

"Dense Cpation" always returns "/", and the program processes without error. I was able to get the generated text at the end along with the style-transferred image, but the caption is a bit off potentially due to the missing dense caption.


Retrieval Result on COCO

Hi, thanks for your interesting work. Could you explain more clearly why the Image2Paragraph method achieves better retrieval results on COCO?

pip install markupsafe not compatible with project source code

I just ran

pip install -r requirements.txt

in a conda environment with python 3.8.10

I got this error:

ImportError: cannot import name 'soft_unicode' from 'markupsafe'

MarkupSafe 2.1.0 doesn't have soft_unicode, so a temporary solution might be adding

MarkupSafe<=2.0.1

to requirements.txt.

pydantic

in pydantic.fields.ModelField._type_analysis:550
TypeError: issubclass() arg 1 must be a class

Can anyone with this issue advise how to resolve it?

DenseCaptioning contains hardcoded paths to local env

    def __init__(self) -> None:
        self.grit_working_directory = "../GRiT/"  # hardcoded relative path outside this repo
        self.grit_env_python = '/home/aiops/wangjp/anaconda3/envs/grit/bin/python'  # hardcoded path to the author's local conda env
        self.grit_script = 'image_dense_captions.py'
        self.model_weights = 'models/grit_b_densecap_objectdet.pth'

TypeError: issubclass() arg 1 must be a class

Hello. Thanks for the great project.

I faced an error, "TypeError: issubclass() arg 1 must be a class",
when I use "python main.py --image_src [image_path] --out_image_name [out_file_name]".

I don't know how to solve it. Please give me some advice.

I used these commands to create the environment.

  • conda create -n i2p python=3.8
  • pip install Pillow==9.5
  • pip install requests
  • pip install -r requirements.txt

Full error output below. ↓

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/matsuzaki.takumi/workspace/nissan/Image2Paragraph/main.py:2 in │
│ │
│ 1 import argparse │
│ ❱ 2 from models.image_text_transformation import ImageTextTransformation │
│ 3 from utils.util import display_images_and_text │
│ 4 │
│ 5 if __name__ == '__main__': │
│ │
│ /home/matsuzaki.takumi/workspace/nissan/Image2Paragraph/models/image_text_transformation.py:5 in │
│ │
│ │
│ 2 from models.grit_model import DenseCaptioning │
│ 3 from models.gpt_model import ImageToText │
│ 4 from models.controlnet_model import TextToImage │
│ ❱ 5 from models.region_semantic import RegionSemantic │
│ 6 from utils.util import read_image_width_height, display_images_and_text, resize_long_edg │
│ 7 import argparse │
│ 8 from PIL import Image │
│ │
│ /home/matsuzaki.takumi/workspace/nissan/Image2Paragraph/models/region_semantic.py:2 in │
│ │
│ 1 from models.segment_models.semgent_anything_model import SegmentAnything │
│ ❱ 2 from models.segment_models.semantic_segment_anything_model import SemanticSegment │
│ 3 from models.segment_models.edit_anything_model import EditAnything │
│ 4 │
│ 5 │
│ │
│ /home/matsuzaki.takumi/workspace/nissan/Image2Paragraph/models/segment_models/semantic_segment_a │
│ nything_model.py:16 in │
│ │
│ 13 from utils.util import resize_long_edge, resize_long_edge_cv2 │
│ 14 # from mmdet.core.visualization.image import imshow_det_bboxes # comment this line if yo │
│ 15 │
│ ❱ 16 nlp = spacy.load('en_core_web_sm') │
│ 17 │
│ 18 class SemanticSegment(): │
│ 19 │ def __init__(self, device): │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/__init__.py:50 in load │
│ │
│ 47 │ │ keyed by section values in dot notation. │
│ 48 │ RETURNS (Language): The loaded nlp object. │
│ 49 │ """ │
│ ❱ 50 │ return util.load_model( │
│ 51 │ │ name, vocab=vocab, disable=disable, exclude=exclude, config=config │
│ 52 │ ) │
│ 53 │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/util.py:324 in │
│ load_model │
│ │
│ 321 │ │ if name.startswith("blank:"): # shortcut for blank model │
│ 322 │ │ │ return get_lang_class(name.replace("blank:", ""))() │
│ 323 │ │ if is_package(name): # installed as package │
│ ❱ 324 │ │ │ return load_model_from_package(name, **kwargs) │
│ 325 │ │ if Path(name).exists(): # path to model data directory │
│ 326 │ │ │ return load_model_from_path(Path(name), **kwargs) │
│ 327 │ elif hasattr(name, "exists"): # Path or Path-like to model data │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/util.py:357 in │
│ load_model_from_package │
│ │
│ 354 │ RETURNS (Language): The loaded nlp object. │
│ 355 │ """ │
│ 356 │ cls = importlib.import_module(name) │
│ ❱ 357 │ return cls.load(vocab=vocab, disable=disable, exclude=exclude, config=config) │
│ 358 │
│ 359 │
│ 360 def load_model_from_path( │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/en_core_web_sm/__init__.py:10 │
│ in load │
│ │
│ 7 │
│ 8 │
│ 9 def load(**overrides): │
│ ❱ 10 │ return load_model_from_init_py(__file__, **overrides) │
│ 11 │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/util.py:517 in │
│ load_model_from_init_py │
│ │
│ 514 │ data_path = model_path / data_dir │
│ 515 │ if not model_path.exists(): │
│ 516 │ │ raise IOError(Errors.E052.format(path=data_path)) │
│ ❱ 517 │ return load_model_from_path( │
│ 518 │ │ data_path, │
│ 519 │ │ vocab=vocab, │
│ 520 │ │ meta=meta, │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/util.py:392 in │
│ load_model_from_path │
│ │
│ 389 │ config_path = model_path / "config.cfg" │
│ 390 │ overrides = dict_to_dot(config) │
│ 391 │ config = load_config(config_path, overrides=overrides) │
│ ❱ 392 │ nlp = load_model_from_config(config, vocab=vocab, disable=disable, exclude=exclude) │
│ 393 │ return nlp.from_disk(model_path, exclude=exclude, overrides=overrides) │
│ 394 │
│ 395 │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/util.py:429 in │
│ load_model_from_config │
│ │
│ 426 │ # This will automatically handle all codes registered via the languages │
│ 427 │ # registry, including custom subclasses provided via entry points │
│ 428 │ lang_cls = get_lang_class(nlp_config["lang"]) │
│ ❱ 429 │ nlp = lang_cls.from_config( │
│ 430 │ │ config, │
│ 431 │ │ vocab=vocab, │
│ 432 │ │ disable=disable, │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/language.py:1672 in │
│ from_config │
│ │
│ 1669 │ │ │ │ │ factory = pipe_cfg.pop("factory") │
│ 1670 │ │ │ │ │ # The pipe name (key in the config) here is the unique name │
│ 1671 │ │ │ │ │ # of the component, not necessarily the factory │
│ ❱ 1672 │ │ │ │ │ nlp.add_pipe( │
│ 1673 │ │ │ │ │ │ factory, │
│ 1674 │ │ │ │ │ │ name=pipe_name, │
│ 1675 │ │ │ │ │ │ config=pipe_cfg, │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/language.py:774 in │
│ add_pipe │
│ │
│ 771 │ │ │ │ │ lang=util.get_object_name(self), │
│ 772 │ │ │ │ │ lang_code=self.lang, │
│ 773 │ │ │ │ ) │
│ ❱ 774 │ │ │ pipe_component = self.create_pipe( │
│ 775 │ │ │ │ factory_name, │
│ 776 │ │ │ │ name=name, │
│ 777 │ │ │ │ config=config, │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/language.py:660 in │
│ create_pipe │
│ │
│ 657 │ │ cfg = {factory_name: config} │
│ 658 │ │ # We're calling the internal _fill here to avoid constructing the │
│ 659 │ │ # registered functions twice │
│ ❱ 660 │ │ resolved = registry.resolve(cfg, validate=validate) │
│ 661 │ │ filled = registry.fill({"cfg": cfg[factory_name]}, validate=validate)["cfg"] │
│ 662 │ │ filled = Config(filled) │
│ 663 │ │ filled["factory"] = factory_name │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/thinc/config.py:746 in │
│ resolve │
│ │
│ 743 │ │ overrides: Dict[str, Any] = {}, │
│ 744 │ │ validate: bool = True, │
│ 745 │ ) -> Dict[str, Any]: │
│ ❱ 746 │ │ resolved, _ = cls._make( │
│ 747 │ │ │ config, schema=schema, overrides=overrides, validate=validate, resolve=True │
│ 748 │ │ ) │
│ 749 │ │ return resolved │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/thinc/config.py:795 in _make │
│ │
│ 792 │ │ orig_config = config │
│ 793 │ │ if not is_interpolated: │
│ 794 │ │ │ config = Config(orig_config).interpolate() │
│ ❱ 795 │ │ filled, _, resolved = cls._fill( │
│ 796 │ │ │ config, schema, validate=validate, overrides=overrides, resolve=resolve │
│ 797 │ │ ) │
│ 798 │ │ filled = Config(filled, section_order=section_order) │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/thinc/config.py:850 in _fill │
│ │
│ 847 │ │ │ │ │ field = schema.fields[key] │
│ 848 │ │ │ │ │ schema.fields[key] = copy_model_field(field, Any) │
│ 849 │ │ │ │ promise_schema = cls.make_promise_schema(value, resolve=resolve) │
│ ❱ 850 │ │ │ │ filled[key], validation[v_key], final[key] = cls._fill( │
│ 851 │ │ │ │ │ value, │
│ 852 │ │ │ │ │ promise_schema, │
│ 853 │ │ │ │ │ validate=validate, │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/thinc/config.py:849 in _fill │
│ │
│ 846 │ │ │ │ │ # validation if it doesn't receive the function return value │
│ 847 │ │ │ │ │ field = schema.fields[key] │
│ 848 │ │ │ │ │ schema.fields[key] = copy_model_field(field, Any) │
│ ❱ 849 │ │ │ │ promise_schema = cls.make_promise_schema(value, resolve=resolve) │
│ 850 │ │ │ │ filled[key], validation[v_key], final[key] = cls._fill( │
│ 851 │ │ │ │ │ value, │
│ 852 │ │ │ │ │ promise_schema, │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/thinc/config.py:1057 in │
│ make_promise_schema │
│ │
│ 1054 │ │ │ │ name = RESERVED_FIELDS.get(param.name, param.name) │
│ 1055 │ │ │ │ sig_args[name] = (annotation, default) │
│ 1056 │ │ sig_args["config"] = _PromiseSchemaConfig │
│ ❱ 1057 │ │ return create_model("ArgModel", **sig_args) │
│ 1058 │
│ 1059 │
│ 1060 __all__ = ["Config", "registry", "ConfigValidationError"] │
│ │
│ in pydantic.main.create_model:990 │
│ │
│ in pydantic.main.ModelMetaclass.__new__:299 │
│ │
│ in pydantic.fields.ModelField.infer:411 │
│ │
│ in pydantic.fields.ModelField.__init__:342 │
│ │
│ in pydantic.fields.ModelField.prepare:451 │
│ │
│ in pydantic.fields.ModelField._type_analysis:550 │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/typing.py:774 in __subclasscheck__ │
│ │
│ 771 │ def __subclasscheck__(self, cls): │
│ 772 │ │ if self._special: │
│ 773 │ │ │ if not isinstance(cls, _GenericAlias): │
│ ❱ 774 │ │ │ │ return issubclass(cls, self.__origin__) │
│ 775 │ │ │ if cls._special: │
│ 776 │ │ │ │ return issubclass(cls.__origin__, self.__origin__) │
│ 777 │ │ raise TypeError("Subscripted generics cannot be used with" │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: issubclass() arg 1 must be a class

Unable to load weights from checkpoint file

Hi, this is nice work. I followed install.md to build the virtual env with spacy==3.0.0. But when I run the example with python main.py --image_src "examples/3.jpg" --out_image_name "output/3_result.jpg", there is an OSError as follows:
------This is time-consuming, please wait...------
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /miniconda3/envs/i2p/lib/python3.8/site-packages/diffusers/models/modeling_utils.py │
│ :109 in load_state_dict │
│ │
│ 106 │ │ if os.path.basename(checkpoint_file) == _add_variant(WEIGHTS_NAME, variant): │
│ 107 │ │ │ return torch.load(checkpoint_file, map_location="cpu") │
│ 108 │ │ else: │
│ ❱ 109 │ │ │ return safetensors.torch.load_file(checkpoint_file, device="cpu") │
│ 110 │ except Exception as e: │
│ 111 │ │ try: │
│ 112 │ │ │ with open(checkpoint_file) as f: │
│ │
│ /miniconda3/envs/i2p/lib/python3.8/site-packages/safetensors/torch.py:261 in │
│ load_file │
│ │
│ 258 │ result = {} │
│ 259 │ with safe_open(filename, framework="pt", device=device) as f: │
│ 260 │ │ for k in f.keys(): │
│ ❱ 261 │ │ │ result[k] = f.get_tensor(k) │
│ 262 │ return result │
│ 263 │
│ 264 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: module 'torch' has no attribute 'frombuffer'

During handling of the above exception, another exception occurred:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /miniconda3/envs/i2p/lib/python3.8/site-packages/diffusers/models/modeling_utils.py │
│ :113 in load_state_dict │
│ │
│ 110 │ except Exception as e: │
│ 111 │ │ try: │
│ 112 │ │ │ with open(checkpoint_file) as f: │
│ ❱ 113 │ │ │ │ if f.read().startswith("version"): │
│ 114 │ │ │ │ │ raise OSError( │
│ 115 │ │ │ │ │ │ "You seem to have cloned a repository without having git-lfs ins │
│ 116 │ │ │ │ │ │ "git-lfs and run git lfs install followed by git lfs pull in │
│ │
│/miniconda3/envs/i2p/lib/python3.8/codecs.py:322 in decode │
│ │
│ 319 │ def decode(self, input, final=False): │
│ 320 │ │ # decode input (taking the buffer into account) │
│ 321 │ │ data = self.buffer + input │
│ ❱ 322 │ │ (result, consumed) = self._buffer_decode(data, self.errors, final) │
│ 323 │ │ # keep undecoded input until the next call │
│ 324 │ │ self.buffer = data[consumed:] │
│ 325 │ │ return result │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbc in position 0: invalid start byte

During handling of the above exception, another exception occurred:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /LOG/realman/LLM/Image2Paragraph/main.py:23 in │
│ │
│ 20 │ │
│ 21 │ args = parser.parse_args() │
│ 22 │ │
│ ❱ 23 │ processor = ImageTextTransformation(args) │
│ 24 │ generated_text = processor.image_to_text(args.image_src) │
│ 25 │ generated_image = processor.text_to_image(generated_text) │
│ 26 │ ## then text to image │
│ │
│ /LOG/realman/LLM/Image2Paragraph/models/image_text_transformation.py:24 in __init__
│ │
│ 21 │ def __init__(self, args): │
│ 22 │ │ # Load your big model here │
│ 23 │ │ self.args = args │
│ ❱ 24 │ │ self.init_models() │
│ 25 │ │ self.ref_image = None │
│ 26 │ │
│ 27 │ def init_models(self): │
│ │
│ /LOG/realman/LLM/Image2Paragraph/models/image_text_transformation.py:38 in init_models │
│ │
│ 35 │ │ self.image_caption_model = ImageCaptioning(device=self.args.image_caption_device │
│ 36 │ │ self.dense_caption_model = DenseCaptioning(device=self.args.dense_caption_device │
│ 37 │ │ self.gpt_model = ImageToText(openai_key) │
│ ❱ 38 │ │ self.controlnet_model = TextToImage(device=self.args.contolnet_device) │
│ 39 │ │ self.region_semantic_model = RegionSemantic(device=self.args.semantic_segment_de │
│ 40 │ │ print('\033[1;32m' + "Model initialization finished!".center(50, '-') + '\033[0m │
│ 41 │
│ │
│ /LOG/realman/LLM/Image2Paragraph/models/controlnet_model.py:15 in __init__
│ │
│ 12 class TextToImage: │
│ 13 │ def __init__(self, device): │
│ 14 │ │ self.device = device │
│ ❱ 15 │ │ self.model = self.initialize_model() │
│ 16 │ │
│ 17 │ def initialize_model(self): │
│ 18 │ │ if self.device == 'cpu': │
│ │
│ /LOG/realman/LLM/Image2Paragraph/models/controlnet_model.py:22 in initialize_model │
│ │
│ 19 │ │ │ self.data_type = torch.float32 │
│ 20 │ │ else: │
│ 21 │ │ │ self.data_type = torch.float16 │
│ ❱ 22 │ │ controlnet = ControlNetModel.from_pretrained( │
│ 23 │ │ │ "fusing/stable-diffusion-v1-5-controlnet-canny", │
│ 24 │ │ │ torch_dtype=self.data_type, │
│ 25 │ │ │ map_location=self.device, # Add this line │
│ │
│ /miniconda3/envs/i2p/lib/python3.8/site-packages/diffusers/models/modeling_utils.py │
│ :602 in from_pretrained │
│ │
│ 599 │ │ │ │ # if device_map is None, load the state dict and move the params from me │
│ 600 │ │ │ │ if device_map is None: │
│ 601 │ │ │ │ │ param_device = "cpu" │
│ ❱ 602 │ │ │ │ │ state_dict = load_state_dict(model_file, variant=variant) │
│ 603 │ │ │ │ │ model._convert_deprecated_attention_blocks(state_dict) │
│ 604 │ │ │ │ │ # move the params from meta device to cpu │
│ 605 │ │ │ │ │ missing_keys = set(model.state_dict().keys()) - set(state_dict.keys( │
│ │
│ /miniconda3/envs/i2p/lib/python3.8/site-packages/diffusers/models/modeling_utils.py │
│ :125 in load_state_dict │
│ │
│ 122 │ │ │ │ │ │ "model. Make sure you have saved the model properly." │
│ 123 │ │ │ │ │ ) from e │
│ 124 │ │ except (UnicodeDecodeError, ValueError): │
│ ❱ 125 │ │ │ raise OSError( │
│ 126 │ │ │ │ f"Unable to load weights from checkpoint file for '{checkpoint_file}' " │
│ 127 │ │ │ │ f"at '{checkpoint_file}'. " │
│ 128 │ │ │ │ "If you tried to load a PyTorch model from a TF 2.0 checkpoint, please s │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
OSError: Unable to load weights from checkpoint file for
'/.cache/huggingface/hub/models--fusing--stable-diffusion-v1-5-controlnet-canny/snapshots/7f2f69197050967007f6bbd23ab5e52f0384162a/d
iffusion_pytorch_model.safetensors' at
'/.cache/huggingface/hub/models--fusing--stable-diffusion-v1-5-controlnet-canny/snapshots/7f2f69197050967007f6bbd23ab5e52f0384162a/d
iffusion_pytorch_model.safetensors'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

To debug, I tried building a new virtual env following install.sh, deleted the cached model files, and re-downloaded them by running main.py again. But the error still happens.
How can I deal with this bug?
My torch version is as follows:
torch 1.9.0+cu111
torchaudio 0.9.0
torchvision 0.10.0+cu111

Data Generation

May I ask whether it would be convenient for you to make the generated results public, for example, the images and the corresponding descriptive data?

Out of Memory Issue in Semantic Segmentation

Why is it that, when working on semantic segmentation, I constantly encounter out-of-memory errors, even though I have two GPUs with 15 GB each? Is it possible to distribute the model workload across the GPUs in parallel?

The GRIT integration doesn't work

Hi, I think there may be an issue with the GRIT integration; there's no .gitmodules file, so the repo doesn't know about GRIT at all, and running the code as-is returns the error:

ModuleNotFoundError: No module named 'models.grit.image_dense_captions'
