
image2paragraph's Introduction


Image.txt: Transform Image Into Unique Paragraph

Hugging Face Spaces | Open In Colab

Project Website

(The Hugging Face demo sometimes does not work in Safari; use Chrome.)

Demo

News

  • 17/April/2023. In addition to Semantic Segment Anything, we use Edit Anything to get region-level semantics. All models now take less than 20 s on an 8 GB GPU card (10x faster than the previous CPU version).
  • 17/April/2023. Our project is now online on Hugging Face. Have a try!
  • 14/April/2023. Our project is very popular on Twitter. See the posted tweet for details.

(Runs on an 8 GB GPU within 20 s!)


Main Pipeline

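The flow below is a minimal sketch of how the stages compose, using the class and method names from this repo's main.py (the argument set is abbreviated, and the flag spellings, including contolnet_device, follow the repo's CLI):

# Sketch only: BLIP2 caption + GRIT dense caption + region semantics are fused
# by an LLM into a paragraph, which Canny ControlNet renders back into an image.
from argparse import Namespace
from models.image_text_transformation import ImageTextTransformation

# Abbreviated arguments; OPENAI_KEY must be set in the environment (see "2. Start").
args = Namespace(
    image_src="examples/3.jpg",
    image_caption_device="cuda",     # BLIP2
    dense_caption_device="cuda",     # GRIT
    semantic_segment_device="cuda",  # Segment Anything + region semantics
    contolnet_device="cuda",         # the repo spells this flag "contolnet"
)

processor = ImageTextTransformation(args)            # load all models
paragraph = processor.image_to_text(args.image_src)  # fuse captions into a paragraph
image = processor.text_to_image(paragraph)           # Canny ControlNet re-render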

Reasoning Details


To Do List

Done

  • GRIT example.
  • ControlNet, BLIP2.
  • Semantic Segment Anything.
  • Segment Anything for fine-grained semantics.
  • Gradio.
  • Integrate GRIT into our code.
  • Support GPT4 API.
  • Notebook/Huggingface Space.
  • Region Semantic Classification from Edit-Anything.
  • Make the model lightweight.

Doing

  • Replace ChatGPT with our own trained LLM.
  • Use other grounded text-to-image models instead of Canny ControlNet.
  • Show retrieval results in Gradio.

Visualization

The text-to-image model is ControlNet with Canny edge conditioning, from the diffusers library.
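Below is a minimal sketch of Canny-conditioned generation with diffusers. The ControlNet checkpoint "fusing/stable-diffusion-v1-5-controlnet-canny" is the one this repo loads; the base model id, prompt, and Canny thresholds are illustrative assumptions:

import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Extract Canny edges from the reference image to use as the control signal.
image = np.array(Image.open("examples/3.jpg"))
edges = cv2.Canny(image, 100, 200)                         # illustrative thresholds
control = Image.fromarray(np.stack([edges] * 3, axis=-1))  # 3-channel edge map

controlnet = ControlNetModel.from_pretrained(
    "fusing/stable-diffusion-v1-5-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

result = pipe(
    "a black and white dog sitting on a porch beside a red bike",  # generated paragraph
    image=control,
).images[0]
result.save("output/controlnet_canny.jpg")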


1. Installation

Please find installation instructions in install.md.

2. Start

Simple visualization

export OPENAI_KEY=[YOUR KEY HERE]
python main.py  --image_src [image_path] --out_image_name [out_file_name]

If your GPU memory is smaller than 8 GB:

python main.py --image_caption_device cpu --semantic_segment_device cpu

If you have no GPU available:

python main.py --image_caption_device cpu --semantic_segment_device cpu --dense_caption_device cpu  --contolnet_device cpu

For example:

python main.py --image_src "examples/3.jpg" --out_image_name "output/3_result.jpg"

Note: If you have a GPU card with more than 15 GB of memory, set all devices to GPU for fast inference.

The generated text and image are saved to "output/".

Note: Use GPT-4 for good results, as GPT-3.5 sometimes misses the position information.

Use Gradio directly

python main_gradio.py

If you have more than 20 GB of GPU memory, use device='cuda' as the default.

3. Visualization

Input: (image)
BLIP2 Image Caption: "A dog sitting on a porch with a bike."
GRIT Dense Caption: (image)
Semantic Segment Anything: (image)

The final generated paragraph with ChatGPT is:

  This image depicts a black and white dog sitting on a porch beside a red bike. The dense caption mentions other objects in the scene, such as a white car parked on the street and a red bike parked on the side of the road. The region semantic provides more specific information, including the porch, floor, wall, and trees. The dog can be seen sitting on the floor beside the bike, and there is also a parked bicycle and tree in the background. The wall is visible on one side of the image, while the street and trees can be seen in the other direction. 

4. Retrieval Result on COCO

Method                     Trainable Parameters   Running Time   IR@1   TR@1
Image-text                 230M                   9h             43.8   33.2
Generated Paragraph-text   0                      5m             49.7   36.1

Interestingly, we find that compressing an image into a paragraph gives retrieval results that are even better than using the source image itself.
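As a rough illustration of the zero-trainable-parameter setup, the sketch below ranks candidate captions against generated paragraphs with an off-the-shelf text encoder; the encoder name is an illustrative assumption, since this README does not specify the text-matching model:

import numpy as np
from sentence_transformers import SentenceTransformer  # assumed dependency

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative text encoder

paragraphs = ["This image depicts a black and white dog sitting on a porch beside a red bike."]
captions = ["A dog sitting on a porch with a bike.",
            "A red car parked on a street."]

p_emb = model.encode(paragraphs, normalize_embeddings=True)
c_emb = model.encode(captions, normalize_embeddings=True)
scores = p_emb @ c_emb.T       # cosine similarity, text-to-text
top1 = scores.argmax(axis=1)   # best-matching caption per paragraph
print(top1)                    # TR@1 counts how often this index is correct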

Others

If you have more suggestions or functions you would like implemented in this codebase, feel free to email me at awinyimg dot gmail dot com or open an issue.

Acknowledgment

This work is based on ChatGPT, Edit_Anything, BLIP2, GRIT, OFA, Segment-Anything, Semantic-Segment-Anything, and ControlNet.

image2paragraph's People

Contributors

fingerrec, zhaohengyuan1


image2paragraph's Issues

Suggestion - Integrate MobileSAM into the pipeline for lightweight and faster inference

Reference: https://github.com/ChaoningZhang/MobileSAM

Our project performs on par with the original SAM and keeps exactly the same pipeline as the original SAM, except for a change in the image encoder; therefore, it is easy to integrate into any project.

MobileSAM is around 60 times smaller and around 50 times faster than the original SAM, and it is around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarized as follows:

(comparison images)
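A minimal sketch of the drop-in swap this suggestion describes, following the usage documented in the MobileSAM repo (untested against this codebase; the checkpoint path is an assumption):

import cv2
from mobile_sam import sam_model_registry, SamAutomaticMaskGenerator

# Load the lightweight MobileSAM weights (the "vit_t" image encoder).
mobile_sam = sam_model_registry["vit_t"](checkpoint="./weights/mobile_sam.pt")
mobile_sam.to(device="cuda")
mobile_sam.eval()

# Same automatic-mask interface as the original SAM, so the downstream
# region-semantic step should not need to change.
mask_generator = SamAutomaticMaskGenerator(mobile_sam)
image = cv2.cvtColor(cv2.imread("examples/3.jpg"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)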

Best Wishes,

Qiao

run question

Running python main.py --image_src "examples/3.jpg" --out_image_name "output/3_result.jpg" does not work.

Comment instructions outdated

The comment instructions in the README.md seem to be outdated. The line numbers listed don't seem to be meaningful, if I am understanding this correctly.

About the visualization

Thanks for this great work and the open-sourced repo. I haven't worked on segmentation tasks before, and I am wondering how to visualize the dense segment image (3_semantic_segment_anything) the way you show in the repo.

install does not work with python = 3.8.16

Hi,
Thanks for the great work.
Trying to replicate your work here, I created a new env and ran pip install -r requirements.txt, but it gives me an error.

ERROR: Could not find a version that satisfies the requirement torch==1.9.0+cu111 (from versions: 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0)
ERROR: No matching distribution found for torch==1.9.0+cu111

I also tried installing torch using conda, but it does not seem to work at the implementation stage.

Any suggestions are appreciated.

Region Semantic Models Do Not Work Well

First of all, thanks for the great work.

The image caption and dense caption modules all work fine here; however, the region caption module does not seem to work well. I tested both the edit_anything and ssa models.

For edit_anything model, it returns obviously wrong object descriptions. The following the the test image I input.
image
And the Region Segment module returns

a dog is walking on the floor in a room: [0, 50, 383, 165]; a person riding a skateboard down a street: [234, 49, 149, 166]; a piece of paper with a black background: [0, 0, 64, 110]; a white light switch with a black light: [312, 0, 53, 80]; the moon is seen over the city skyline: [116, 0, 56, 38]; 

There are clearly no dogs or skateboards in the picture.

For the ssa model, when I add the --region_classify_model ssa option and change the region_semantic method to use ssa, it errors out with

│ /share/data/ripl/fjd/Image2Paragraph/models/segment_models/semantic_segment_anything_model.py:14 │
│ 7 in semantic_class_w_mask                                                                       │
│                                                                                                  │
│   144 │   │   │                                                                                  │
│   145 │   │   │   valid_mask_large_crop = mmcv.imcrop(valid_mask.numpy(), np.array([bbox[0], b   │
│   146 │   │   │   scale_large)                                                                   │
│ ❱ 147 │   │   │   top_1_patch_large = torch.bincount(class_ids_patch_large[torch.tensor(valid_   │
│   148 │   │   │   top_1_mask_category = mask_categories[top_1_patch_large.item()]                │
│   149 │   │   │                                                                                  │
│   150 │   │   │   ann['class_name'] = str(top_1_mask_category)                                   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
IndexError: The shape of the mask [3, 23] at index 0 does not match the shape of the indexed tensor [23, 3] at index 0

I wonder if you have a good way to use the region segment methods.

Possible Langchain integration?

Hey! Your model is just awesome; I'm wondering if you have any plans for LangChain integration? Currently they support blip-image-captioning for image captioning, but your variant honestly looks much more useful! Great work!

Dense Caption always returns empty

Hi,

Thanks for sharing this work. Very interesting and potentially very impactful.

I encountered this issue while running python main.py --image_src "/Code/Image2Paragraph/examples/3.jpg" --out_image_name "output/3_result.jpg".

"Dense Cpation" always returns "/", and the program processes without error. I was able to get the generated text at the end along with the style-transferred image, but the caption is a bit off potentially due to the missing dense caption.


Retrieval Result on COCO

Hi, thanks for your interesting work. Could you explain more clearly why the Image2Paragraph method achieves better retrieval results on COCO?

pip install markupsafe not compatible with project source code

I just ran

pip install -r requirements.txt

in a conda environment with python 3.8.10

I got this error:

ImportError: cannot import name 'soft_unicode' from 'markupsafe'

MarkupSafe 2.1.0 doesn't have soft_unicode, so a temporary solution might be adding

MarkupSafe<=2.0.1

to requirements.txt.

pydantic

in pydantic.fields.ModelField._type_analysis:550
TypeError: issubclass() arg 1 must be a class

Can anyone with this issue advise how to resolve it?

DenseCaptioning contains hardcoded paths to local env

    def __init__(self) -> None:
        self.grit_working_directory = "../GRiT/"  # hardcoded relative path outside this repo
        self.grit_env_python = '/home/aiops/wangjp/anaconda3/envs/grit/bin/python'  # hardcoded path to the author's local conda env
        self.grit_script = 'image_dense_captions.py'
        self.model_weights = 'models/grit_b_densecap_objectdet.pth'

TypeError: issubclass() arg 1 must be a class

Hello. Thanks for the great project.

I faced an error, "TypeError: issubclass() arg 1 must be a class",
when I use "python main.py --image_src [image_path] --out_image_name [out_file_name]".

I don't know how to solve it. Please give me some advice.

I used these commands to create the environment.

  • conda create -n i2p python=3.8
  • pip install Pillow==9.5
  • pip install requests
  • pip install -r requirements.txt

Full error output below. ↓

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/matsuzaki.takumi/workspace/nissan/Image2Paragraph/main.py:2 in │
│ │
│ 1 import argparse │
│ ❱ 2 from models.image_text_transformation import ImageTextTransformation │
│ 3 from utils.util import display_images_and_text │
│ 4 │
│ 5 if __name__ == '__main__': │
│ │
│ /home/matsuzaki.takumi/workspace/nissan/Image2Paragraph/models/image_text_transformation.py:5 in │
│ │
│ │
│ 2 from models.grit_model import DenseCaptioning │
│ 3 from models.gpt_model import ImageToText │
│ 4 from models.controlnet_model import TextToImage │
│ ❱ 5 from models.region_semantic import RegionSemantic │
│ 6 from utils.util import read_image_width_height, display_images_and_text, resize_long_edg │
│ 7 import argparse │
│ 8 from PIL import Image │
│ │
│ /home/matsuzaki.takumi/workspace/nissan/Image2Paragraph/models/region_semantic.py:2 in │
│ │
│ 1 from models.segment_models.semgent_anything_model import SegmentAnything │
│ ❱ 2 from models.segment_models.semantic_segment_anything_model import SemanticSegment │
│ 3 from models.segment_models.edit_anything_model import EditAnything │
│ 4 │
│ 5 │
│ │
│ /home/matsuzaki.takumi/workspace/nissan/Image2Paragraph/models/segment_models/semantic_segment_a │
│ nything_model.py:16 in │
│ │
│ 13 from utils.util import resize_long_edge, resize_long_edge_cv2 │
│ 14 # from mmdet.core.visualization.image import imshow_det_bboxes # comment this line if yo │
│ 15 │
│ ❱ 16 nlp = spacy.load('en_core_web_sm') │
│ 17 │
│ 18 class SemanticSegment(): │
│ 19 │ def __init__(self, device): │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/__init__.py:50 in load │
│ │
│ 47 │ │ keyed by section values in dot notation. │
│ 48 │ RETURNS (Language): The loaded nlp object. │
│ 49 │ """ │
│ ❱ 50 │ return util.load_model( │
│ 51 │ │ name, vocab=vocab, disable=disable, exclude=exclude, config=config │
│ 52 │ ) │
│ 53 │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/util.py:324 in │
│ load_model │
│ │
│ 321 │ │ if name.startswith("blank:"): # shortcut for blank model │
│ 322 │ │ │ return get_lang_class(name.replace("blank:", ""))() │
│ 323 │ │ if is_package(name): # installed as package │
│ ❱ 324 │ │ │ return load_model_from_package(name, **kwargs) │
│ 325 │ │ if Path(name).exists(): # path to model data directory │
│ 326 │ │ │ return load_model_from_path(Path(name), **kwargs) │
│ 327 │ elif hasattr(name, "exists"): # Path or Path-like to model data │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/util.py:357 in │
│ load_model_from_package │
│ │
│ 354 │ RETURNS (Language): The loaded nlp object. │
│ 355 │ """ │
│ 356 │ cls = importlib.import_module(name) │
│ ❱ 357 │ return cls.load(vocab=vocab, disable=disable, exclude=exclude, config=config) │
│ 358 │
│ 359 │
│ 360 def load_model_from_path( │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/en_core_web_sm/__init__.py:10 │
│ in load │
│ │
│ 7 │
│ 8 │
│ 9 def load(**overrides): │
│ ❱ 10 │ return load_model_from_init_py(__file__, **overrides) │
│ 11 │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/util.py:517 in │
│ load_model_from_init_py │
│ │
│ 514 │ data_path = model_path / data_dir │
│ 515 │ if not model_path.exists(): │
│ 516 │ │ raise IOError(Errors.E052.format(path=data_path)) │
│ ❱ 517 │ return load_model_from_path( │
│ 518 │ │ data_path, │
│ 519 │ │ vocab=vocab, │
│ 520 │ │ meta=meta, │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/util.py:392 in │
│ load_model_from_path │
│ │
│ 389 │ config_path = model_path / "config.cfg" │
│ 390 │ overrides = dict_to_dot(config) │
│ 391 │ config = load_config(config_path, overrides=overrides) │
│ ❱ 392 │ nlp = load_model_from_config(config, vocab=vocab, disable=disable, exclude=exclude) │
│ 393 │ return nlp.from_disk(model_path, exclude=exclude, overrides=overrides) │
│ 394 │
│ 395 │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/util.py:429 in │
│ load_model_from_config │
│ │
│ 426 │ # This will automatically handle all codes registered via the languages │
│ 427 │ # registry, including custom subclasses provided via entry points │
│ 428 │ lang_cls = get_lang_class(nlp_config["lang"]) │
│ ❱ 429 │ nlp = lang_cls.from_config( │
│ 430 │ │ config, │
│ 431 │ │ vocab=vocab, │
│ 432 │ │ disable=disable, │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/language.py:1672 in │
│ from_config │
│ │
│ 1669 │ │ │ │ │ factory = pipe_cfg.pop("factory") │
│ 1670 │ │ │ │ │ # The pipe name (key in the config) here is the unique name │
│ 1671 │ │ │ │ │ # of the component, not necessarily the factory │
│ ❱ 1672 │ │ │ │ │ nlp.add_pipe( │
│ 1673 │ │ │ │ │ │ factory, │
│ 1674 │ │ │ │ │ │ name=pipe_name, │
│ 1675 │ │ │ │ │ │ config=pipe_cfg, │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/language.py:774 in │
│ add_pipe │
│ │
│ 771 │ │ │ │ │ lang=util.get_object_name(self), │
│ 772 │ │ │ │ │ lang_code=self.lang, │
│ 773 │ │ │ │ ) │
│ ❱ 774 │ │ │ pipe_component = self.create_pipe( │
│ 775 │ │ │ │ factory_name, │
│ 776 │ │ │ │ name=name, │
│ 777 │ │ │ │ config=config, │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/language.py:660 in │
│ create_pipe │
│ │
│ 657 │ │ cfg = {factory_name: config} │
│ 658 │ │ # We're calling the internal _fill here to avoid constructing the │
│ 659 │ │ # registered functions twice │
│ ❱ 660 │ │ resolved = registry.resolve(cfg, validate=validate) │
│ 661 │ │ filled = registry.fill({"cfg": cfg[factory_name]}, validate=validate)["cfg"] │
│ 662 │ │ filled = Config(filled) │
│ 663 │ │ filled["factory"] = factory_name │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/thinc/config.py:746 in │
│ resolve │
│ │
│ 743 │ │ overrides: Dict[str, Any] = {}, │
│ 744 │ │ validate: bool = True, │
│ 745 │ ) -> Dict[str, Any]: │
│ ❱ 746 │ │ resolved, _ = cls._make( │
│ 747 │ │ │ config, schema=schema, overrides=overrides, validate=validate, resolve=True │
│ 748 │ │ ) │
│ 749 │ │ return resolved │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/thinc/config.py:795 in _make │
│ │
│ 792 │ │ orig_config = config │
│ 793 │ │ if not is_interpolated: │
│ 794 │ │ │ config = Config(orig_config).interpolate() │
│ ❱ 795 │ │ filled, _, resolved = cls._fill( │
│ 796 │ │ │ config, schema, validate=validate, overrides=overrides, resolve=resolve │
│ 797 │ │ ) │
│ 798 │ │ filled = Config(filled, section_order=section_order) │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/thinc/config.py:850 in _fill │
│ │
│ 847 │ │ │ │ │ field = schema.fields[key] │
│ 848 │ │ │ │ │ schema.fields[key] = copy_model_field(field, Any) │
│ 849 │ │ │ │ promise_schema = cls.make_promise_schema(value, resolve=resolve) │
│ ❱ 850 │ │ │ │ filled[key], validation[v_key], final[key] = cls._fill( │
│ 851 │ │ │ │ │ value, │
│ 852 │ │ │ │ │ promise_schema, │
│ 853 │ │ │ │ │ validate=validate, │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/thinc/config.py:849 in _fill │
│ │
│ 846 │ │ │ │ │ # validation if it doesn't receive the function return value │
│ 847 │ │ │ │ │ field = schema.fields[key] │
│ 848 │ │ │ │ │ schema.fields[key] = copy_model_field(field, Any) │
│ ❱ 849 │ │ │ │ promise_schema = cls.make_promise_schema(value, resolve=resolve) │
│ 850 │ │ │ │ filled[key], validation[v_key], final[key] = cls._fill( │
│ 851 │ │ │ │ │ value, │
│ 852 │ │ │ │ │ promise_schema, │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/thinc/config.py:1057 in │
│ make_promise_schema │
│ │
│ 1054 │ │ │ │ name = RESERVED_FIELDS.get(param.name, param.name) │
│ 1055 │ │ │ │ sig_args[name] = (annotation, default) │
│ 1056 │ │ sig_args["config"] = _PromiseSchemaConfig │
│ ❱ 1057 │ │ return create_model("ArgModel", **sig_args) │
│ 1058 │
│ 1059 │
│ 1060 __all__ = ["Config", "registry", "ConfigValidationError"] │
│ │
│ in pydantic.main.create_model:990 │
│ │
│ in pydantic.main.ModelMetaclass.__new__:299 │
│ │
│ in pydantic.fields.ModelField.infer:411 │
│ │
│ in pydantic.fields.ModelField.__init__:342 │
│ │
│ in pydantic.fields.ModelField.prepare:451 │
│ │
│ in pydantic.fields.ModelField._type_analysis:550 │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/typing.py:774 in __subclasscheck__ │
│ │
│ 771 │ def __subclasscheck__(self, cls): │
│ 772 │ │ if self._special: │
│ 773 │ │ │ if not isinstance(cls, _GenericAlias): │
│ ❱ 774 │ │ │ │ return issubclass(cls, self.__origin__) │
│ 775 │ │ │ if cls._special: │
│ 776 │ │ │ │ return issubclass(cls.__origin__, self.__origin__) │
│ 777 │ │ raise TypeError("Subscripted generics cannot be used with" │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: issubclass() arg 1 must be a class

Unable to load weights from checkpoint file

Hi, this is nice work. I followed install.md to build the virtual env with spacy==3.0.0. But when I run the example with python main.py --image_src "examples/3.jpg" --out_image_name "output/3_result.jpg", there is an OSError as follows:
------This is time-consuming, please wait...------
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /miniconda3/envs/i2p/lib/python3.8/site-packages/diffusers/models/modeling_utils.py │
│ :109 in load_state_dict │
│ │
│ 106 │ │ if os.path.basename(checkpoint_file) == _add_variant(WEIGHTS_NAME, variant): │
│ 107 │ │ │ return torch.load(checkpoint_file, map_location="cpu") │
│ 108 │ │ else: │
│ ❱ 109 │ │ │ return safetensors.torch.load_file(checkpoint_file, device="cpu") │
│ 110 │ except Exception as e: │
│ 111 │ │ try: │
│ 112 │ │ │ with open(checkpoint_file) as f: │
│ │
│ /miniconda3/envs/i2p/lib/python3.8/site-packages/safetensors/torch.py:261 in │
│ load_file │
│ │
│ 258 │ result = {} │
│ 259 │ with safe_open(filename, framework="pt", device=device) as f: │
│ 260 │ │ for k in f.keys(): │
│ ❱ 261 │ │ │ result[k] = f.get_tensor(k) │
│ 262 │ return result │
│ 263 │
│ 264 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: module 'torch' has no attribute 'frombuffer'

During handling of the above exception, another exception occurred:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /miniconda3/envs/i2p/lib/python3.8/site-packages/diffusers/models/modeling_utils.py │
│ :113 in load_state_dict │
│ │
│ 110 │ except Exception as e: │
│ 111 │ │ try: │
│ 112 │ │ │ with open(checkpoint_file) as f: │
│ ❱ 113 │ │ │ │ if f.read().startswith("version"): │
│ 114 │ │ │ │ │ raise OSError( │
│ 115 │ │ │ │ │ │ "You seem to have cloned a repository without having git-lfs ins │
│ 116 │ │ │ │ │ │ "git-lfs and run git lfs install followed by git lfs pull in │
│ │
│/miniconda3/envs/i2p/lib/python3.8/codecs.py:322 in decode │
│ │
│ 319 │ def decode(self, input, final=False): │
│ 320 │ │ # decode input (taking the buffer into account) │
│ 321 │ │ data = self.buffer + input │
│ ❱ 322 │ │ (result, consumed) = self._buffer_decode(data, self.errors, final) │
│ 323 │ │ # keep undecoded input until the next call │
│ 324 │ │ self.buffer = data[consumed:] │
│ 325 │ │ return result │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbc in position 0: invalid start byte

During handling of the above exception, another exception occurred:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /LOG/realman/LLM/Image2Paragraph/main.py:23 in │
│ │
│ 20 │ │
│ 21 │ args = parser.parse_args() │
│ 22 │ │
│ ❱ 23 │ processor = ImageTextTransformation(args) │
│ 24 │ generated_text = processor.image_to_text(args.image_src) │
│ 25 │ generated_image = processor.text_to_image(generated_text) │
│ 26 │ ## then text to image │
│ │
│ /LOG/realman/LLM/Image2Paragraph/models/image_text_transformation.py:24 in __init__
│ │
│ 21 │ def __init__(self, args): │
│ 22 │ │ # Load your big model here │
│ 23 │ │ self.args = args │
│ ❱ 24 │ │ self.init_models() │
│ 25 │ │ self.ref_image = None │
│ 26 │ │
│ 27 │ def init_models(self): │
│ │
│ /LOG/realman/LLM/Image2Paragraph/models/image_text_transformation.py:38 in init_models │
│ │
│ 35 │ │ self.image_caption_model = ImageCaptioning(device=self.args.image_caption_device │
│ 36 │ │ self.dense_caption_model = DenseCaptioning(device=self.args.dense_caption_device │
│ 37 │ │ self.gpt_model = ImageToText(openai_key) │
│ ❱ 38 │ │ self.controlnet_model = TextToImage(device=self.args.contolnet_device) │
│ 39 │ │ self.region_semantic_model = RegionSemantic(device=self.args.semantic_segment_de │
│ 40 │ │ print('\033[1;32m' + "Model initialization finished!".center(50, '-') + '\033[0m │
│ 41 │
│ │
│ /LOG/realman/LLM/Image2Paragraph/models/controlnet_model.py:15 in __init__
│ │
│ 12 class TextToImage: │
│ 13 │ def __init__(self, device): │
│ 14 │ │ self.device = device │
│ ❱ 15 │ │ self.model = self.initialize_model() │
│ 16 │ │
│ 17 │ def initialize_model(self): │
│ 18 │ │ if self.device == 'cpu': │
│ │
│ /LOG/realman/LLM/Image2Paragraph/models/controlnet_model.py:22 in initialize_model │
│ │
│ 19 │ │ │ self.data_type = torch.float32 │
│ 20 │ │ else: │
│ 21 │ │ │ self.data_type = torch.float16 │
│ ❱ 22 │ │ controlnet = ControlNetModel.from_pretrained( │
│ 23 │ │ │ "fusing/stable-diffusion-v1-5-controlnet-canny", │
│ 24 │ │ │ torch_dtype=self.data_type, │
│ 25 │ │ │ map_location=self.device, # Add this line │
│ │
│ /miniconda3/envs/i2p/lib/python3.8/site-packages/diffusers/models/modeling_utils.py │
│ :602 in from_pretrained │
│ │
│ 599 │ │ │ │ # if device_map is None, load the state dict and move the params from me │
│ 600 │ │ │ │ if device_map is None: │
│ 601 │ │ │ │ │ param_device = "cpu" │
│ ❱ 602 │ │ │ │ │ state_dict = load_state_dict(model_file, variant=variant) │
│ 603 │ │ │ │ │ model._convert_deprecated_attention_blocks(state_dict) │
│ 604 │ │ │ │ │ # move the params from meta device to cpu │
│ 605 │ │ │ │ │ missing_keys = set(model.state_dict().keys()) - set(state_dict.keys( │
│ │
│ /miniconda3/envs/i2p/lib/python3.8/site-packages/diffusers/models/modeling_utils.py │
│ :125 in load_state_dict │
│ │
│ 122 │ │ │ │ │ │ "model. Make sure you have saved the model properly." │
│ 123 │ │ │ │ │ ) from e │
│ 124 │ │ except (UnicodeDecodeError, ValueError): │
│ ❱ 125 │ │ │ raise OSError( │
│ 126 │ │ │ │ f"Unable to load weights from checkpoint file for '{checkpoint_file}' " │
│ 127 │ │ │ │ f"at '{checkpoint_file}'. " │
│ 128 │ │ │ │ "If you tried to load a PyTorch model from a TF 2.0 checkpoint, please s │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
OSError: Unable to load weights from checkpoint file for
'/.cache/huggingface/hub/models--fusing--stable-diffusion-v1-5-controlnet-canny/snapshots/7f2f69197050967007f6bbd23ab5e52f0384162a/d
iffusion_pytorch_model.safetensors' at
'/.cache/huggingface/hub/models--fusing--stable-diffusion-v1-5-controlnet-canny/snapshots/7f2f69197050967007f6bbd23ab5e52f0384162a/d
iffusion_pytorch_model.safetensors'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

To debug, I tried building a new virtual env following install.sh, deleted the cached model files, and re-downloaded them by running main.py again. But the error still happens.
How can I deal with this bug?
My torch version is as follows:
torch 1.9.0+cu111
torchaudio 0.9.0
torchvision 0.10.0+cu111

Data Generation

May I ask whether it would be convenient for you to make the generated results public, for example, the images and the corresponding descriptive data?

Out of Memory Issue in Semantic Segmentation

Why is it that, when working on semantic segmentation, I constantly encounter out-of-memory errors, even though I have two GPUs with 15 GB each? Is it possible to distribute the model workload across the GPUs in parallel?

The GRIT integration doesn't work

Hi, I think there may be an issue with the GRIT integration; there's no .gitmodules file, so the repo doesn't know about GRIT at all, and running the code as-is returns the error:

ModuleNotFoundError: No module named 'models.grit.image_dense_captions'
