
Image2Paragraph's Issues

TypeError: issubclass() arg 1 must be a class

Hello. Thanks for the great project.

I get the error "TypeError: issubclass() arg 1 must be a class"
when I run python main.py --image_src [image_path] --out_image_name [out_file_name].

I don't know how to solve it. Please give me some advice.

I used these commands to set up the environment:

  • conda create -n i2p python=3.8
  • pip install Pillow==9.5
  • pip install requests
  • pip install -r requirements.txt

Full error output below:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/matsuzaki.takumi/workspace/nissan/Image2Paragraph/main.py:2 in │
│ │
│ 1 import argparse │
│ ❱ 2 from models.image_text_transformation import ImageTextTransformation │
│ 3 from utils.util import display_images_and_text │
│ 4 │
│ 5 if __name__ == '__main__': │
│ │
│ /home/matsuzaki.takumi/workspace/nissan/Image2Paragraph/models/image_text_transformation.py:5 in │
│ │
│ │
│ 2 from models.grit_model import DenseCaptioning │
│ 3 from models.gpt_model import ImageToText │
│ 4 from models.controlnet_model import TextToImage │
│ ❱ 5 from models.region_semantic import RegionSemantic │
│ 6 from utils.util import read_image_width_height, display_images_and_text, resize_long_edg │
│ 7 import argparse │
│ 8 from PIL import Image │
│ │
│ /home/matsuzaki.takumi/workspace/nissan/Image2Paragraph/models/region_semantic.py:2 in │
│ │
│ 1 from models.segment_models.semgent_anything_model import SegmentAnything │
│ ❱ 2 from models.segment_models.semantic_segment_anything_model import SemanticSegment │
│ 3 from models.segment_models.edit_anything_model import EditAnything │
│ 4 │
│ 5 │
│ │
│ /home/matsuzaki.takumi/workspace/nissan/Image2Paragraph/models/segment_models/semantic_segment_a │
│ nything_model.py:16 in │
│ │
│ 13 from utils.util import resize_long_edge, resize_long_edge_cv2 │
│ 14 # from mmdet.core.visualization.image import imshow_det_bboxes # comment this line if yo │
│ 15 │
│ ❱ 16 nlp = spacy.load('en_core_web_sm') │
│ 17 │
│ 18 class SemanticSegment(): │
│ 19 │ def __init__(self, device): │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/__init__.py:50 in load │
│ │
│ 47 │ │ keyed by section values in dot notation. │
│ 48 │ RETURNS (Language): The loaded nlp object. │
│ 49 │ """ │
│ ❱ 50 │ return util.load_model( │
│ 51 │ │ name, vocab=vocab, disable=disable, exclude=exclude, config=config │
│ 52 │ ) │
│ 53 │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/util.py:324 in │
│ load_model │
│ │
│ 321 │ │ if name.startswith("blank:"): # shortcut for blank model │
│ 322 │ │ │ return get_lang_class(name.replace("blank:", ""))() │
│ 323 │ │ if is_package(name): # installed as package │
│ ❱ 324 │ │ │ return load_model_from_package(name, **kwargs) │
│ 325 │ │ if Path(name).exists(): # path to model data directory │
│ 326 │ │ │ return load_model_from_path(Path(name), **kwargs) │
│ 327 │ elif hasattr(name, "exists"): # Path or Path-like to model data │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/util.py:357 in │
│ load_model_from_package │
│ │
│ 354 │ RETURNS (Language): The loaded nlp object. │
│ 355 │ """ │
│ 356 │ cls = importlib.import_module(name) │
│ ❱ 357 │ return cls.load(vocab=vocab, disable=disable, exclude=exclude, config=config) │
│ 358 │
│ 359 │
│ 360 def load_model_from_path( │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/en_core_web_sm/__init__.py:10 │
│ in load │
│ │
│ 7 │
│ 8 │
│ 9 def load(**overrides): │
│ ❱ 10 │ return load_model_from_init_py(__file__, **overrides) │
│ 11 │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/util.py:517 in │
│ load_model_from_init_py │
│ │
│ 514 │ data_path = model_path / data_dir │
│ 515 │ if not model_path.exists(): │
│ 516 │ │ raise IOError(Errors.E052.format(path=data_path)) │
│ ❱ 517 │ return load_model_from_path( │
│ 518 │ │ data_path, │
│ 519 │ │ vocab=vocab, │
│ 520 │ │ meta=meta, │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/util.py:392 in │
│ load_model_from_path │
│ │
│ 389 │ config_path = model_path / "config.cfg" │
│ 390 │ overrides = dict_to_dot(config) │
│ 391 │ config = load_config(config_path, overrides=overrides) │
│ ❱ 392 │ nlp = load_model_from_config(config, vocab=vocab, disable=disable, exclude=exclude) │
│ 393 │ return nlp.from_disk(model_path, exclude=exclude, overrides=overrides) │
│ 394 │
│ 395 │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/util.py:429 in │
│ load_model_from_config │
│ │
│ 426 │ # This will automatically handle all codes registered via the languages │
│ 427 │ # registry, including custom subclasses provided via entry points │
│ 428 │ lang_cls = get_lang_class(nlp_config["lang"]) │
│ ❱ 429 │ nlp = lang_cls.from_config( │
│ 430 │ │ config, │
│ 431 │ │ vocab=vocab, │
│ 432 │ │ disable=disable, │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/language.py:1672 in │
│ from_config │
│ │
│ 1669 │ │ │ │ │ factory = pipe_cfg.pop("factory") │
│ 1670 │ │ │ │ │ # The pipe name (key in the config) here is the unique name │
│ 1671 │ │ │ │ │ # of the component, not necessarily the factory │
│ ❱ 1672 │ │ │ │ │ nlp.add_pipe( │
│ 1673 │ │ │ │ │ │ factory, │
│ 1674 │ │ │ │ │ │ name=pipe_name, │
│ 1675 │ │ │ │ │ │ config=pipe_cfg, │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/language.py:774 in │
│ add_pipe │
│ │
│ 771 │ │ │ │ │ lang=util.get_object_name(self), │
│ 772 │ │ │ │ │ lang_code=self.lang, │
│ 773 │ │ │ │ ) │
│ ❱ 774 │ │ │ pipe_component = self.create_pipe( │
│ 775 │ │ │ │ factory_name, │
│ 776 │ │ │ │ name=name, │
│ 777 │ │ │ │ config=config, │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/spacy/language.py:660 in │
│ create_pipe │
│ │
│ 657 │ │ cfg = {factory_name: config} │
│ 658 │ │ # We're calling the internal _fill here to avoid constructing the │
│ 659 │ │ # registered functions twice │
│ ❱ 660 │ │ resolved = registry.resolve(cfg, validate=validate) │
│ 661 │ │ filled = registry.fill({"cfg": cfg[factory_name]}, validate=validate)["cfg"] │
│ 662 │ │ filled = Config(filled) │
│ 663 │ │ filled["factory"] = factory_name │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/thinc/config.py:746 in │
│ resolve │
│ │
│ 743 │ │ overrides: Dict[str, Any] = {}, │
│ 744 │ │ validate: bool = True, │
│ 745 │ ) -> Dict[str, Any]: │
│ ❱ 746 │ │ resolved, _ = cls._make( │
│ 747 │ │ │ config, schema=schema, overrides=overrides, validate=validate, resolve=True │
│ 748 │ │ ) │
│ 749 │ │ return resolved │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/thinc/config.py:795 in _make │
│ │
│ 792 │ │ orig_config = config │
│ 793 │ │ if not is_interpolated: │
│ 794 │ │ │ config = Config(orig_config).interpolate() │
│ ❱ 795 │ │ filled, _, resolved = cls._fill( │
│ 796 │ │ │ config, schema, validate=validate, overrides=overrides, resolve=resolve │
│ 797 │ │ ) │
│ 798 │ │ filled = Config(filled, section_order=section_order) │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/thinc/config.py:850 in _fill │
│ │
│ 847 │ │ │ │ │ field = schema.fields[key] │
│ 848 │ │ │ │ │ schema.fields[key] = copy_model_field(field, Any) │
│ 849 │ │ │ │ promise_schema = cls.make_promise_schema(value, resolve=resolve) │
│ ❱ 850 │ │ │ │ filled[key], validation[v_key], final[key] = cls._fill( │
│ 851 │ │ │ │ │ value, │
│ 852 │ │ │ │ │ promise_schema, │
│ 853 │ │ │ │ │ validate=validate, │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/thinc/config.py:849 in _fill │
│ │
│ 846 │ │ │ │ │ # validation if it doesn't receive the function return value │
│ 847 │ │ │ │ │ field = schema.fields[key] │
│ 848 │ │ │ │ │ schema.fields[key] = copy_model_field(field, Any) │
│ ❱ 849 │ │ │ │ promise_schema = cls.make_promise_schema(value, resolve=resolve) │
│ 850 │ │ │ │ filled[key], validation[v_key], final[key] = cls._fill( │
│ 851 │ │ │ │ │ value, │
│ 852 │ │ │ │ │ promise_schema, │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/site-packages/thinc/config.py:1057 in │
│ make_promise_schema │
│ │
│ 1054 │ │ │ │ name = RESERVED_FIELDS.get(param.name, param.name) │
│ 1055 │ │ │ │ sig_args[name] = (annotation, default) │
│ 1056 │ │ sig_args["config"] = _PromiseSchemaConfig │
│ ❱ 1057 │ │ return create_model("ArgModel", **sig_args) │
│ 1058 │
│ 1059 │
│ 1060 __all__ = ["Config", "registry", "ConfigValidationError"] │
│ │
│ in pydantic.main.create_model:990 │
│ │
│ in pydantic.main.ModelMetaclass.__new__:299 │
│ │
│ in pydantic.fields.ModelField.infer:411 │
│ │
│ in pydantic.fields.ModelField.__init__:342 │
│ │
│ in pydantic.fields.ModelField.prepare:451 │
│ │
│ in pydantic.fields.ModelField._type_analysis:550 │
│ │
│ /home/matsuzaki.takumi/.conda/envs/i2p/lib/python3.8/typing.py:774 in __subclasscheck__ │
│ │
│ 771 │ def __subclasscheck__(self, cls): │
│ 772 │ │ if self._special: │
│ 773 │ │ │ if not isinstance(cls, _GenericAlias): │
│ ❱ 774 │ │ │ │ return issubclass(cls, self.__origin__) │
│ 775 │ │ │ if cls._special: │
│ 776 │ │ │ │ return issubclass(cls.__origin__, self.__origin__) │
│ 777 │ │ raise TypeError("Subscripted generics cannot be used with" │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: issubclass() arg 1 must be a class
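
For anyone else hitting this: a failure inside pydantic.fields.ModelField._type_analysis like the above is a known symptom of a version mismatch between typing-extensions and pydantic 1.x (typing_extensions 4.6+ broke older pydantic 1.x releases). If that is what is happening here, pinning one of the two may help:

  • pip install "typing-extensions<4.6.0"
  • or: pip install "pydantic>=1.10.8,<2"

This is a guess from the traceback, not a confirmed fix for this repo's exact dependency set.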

Unable to load weights from checkpoint file

Hi, this is nice work. I followed the install.md to build the virtual env with spacy==3.0.0. But when I run the example with python main.py --image_src "examples/3.jpg" --out_image_name "output/3_result.jpg", there is an OSError as follows:
------This is time-consuming, please wait...------
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /miniconda3/envs/i2p/lib/python3.8/site-packages/diffusers/models/modeling_utils.py │
│ :109 in load_state_dict │
│ │
│ 106 │ │ if os.path.basename(checkpoint_file) == _add_variant(WEIGHTS_NAME, variant): │
│ 107 │ │ │ return torch.load(checkpoint_file, map_location="cpu") │
│ 108 │ │ else: │
│ ❱ 109 │ │ │ return safetensors.torch.load_file(checkpoint_file, device="cpu") │
│ 110 │ except Exception as e: │
│ 111 │ │ try: │
│ 112 │ │ │ with open(checkpoint_file) as f: │
│ │
│ /miniconda3/envs/i2p/lib/python3.8/site-packages/safetensors/torch.py:261 in │
│ load_file │
│ │
│ 258 │ result = {} │
│ 259 │ with safe_open(filename, framework="pt", device=device) as f: │
│ 260 │ │ for k in f.keys(): │
│ ❱ 261 │ │ │ result[k] = f.get_tensor(k) │
│ 262 │ return result │
│ 263 │
│ 264 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: module 'torch' has no attribute 'frombuffer'

During handling of the above exception, another exception occurred:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /miniconda3/envs/i2p/lib/python3.8/site-packages/diffusers/models/modeling_utils.py │
│ :113 in load_state_dict │
│ │
│ 110 │ except Exception as e: │
│ 111 │ │ try: │
│ 112 │ │ │ with open(checkpoint_file) as f: │
│ ❱ 113 │ │ │ │ if f.read().startswith("version"): │
│ 114 │ │ │ │ │ raise OSError( │
│ 115 │ │ │ │ │ │ "You seem to have cloned a repository without having git-lfs ins │
│ 116 │ │ │ │ │ │ "git-lfs and run git lfs install followed by git lfs pull in │
│ │
│/miniconda3/envs/i2p/lib/python3.8/codecs.py:322 in decode │
│ │
│ 319 │ def decode(self, input, final=False): │
│ 320 │ │ # decode input (taking the buffer into account) │
│ 321 │ │ data = self.buffer + input │
│ ❱ 322 │ │ (result, consumed) = self._buffer_decode(data, self.errors, final) │
│ 323 │ │ # keep undecoded input until the next call │
│ 324 │ │ self.buffer = data[consumed:] │
│ 325 │ │ return result │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbc in position 0: invalid start byte

During handling of the above exception, another exception occurred:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /LOG/realman/LLM/Image2Paragraph/main.py:23 in │
│ │
│ 20 │ │
│ 21 │ args = parser.parse_args() │
│ 22 │ │
│ ❱ 23 │ processor = ImageTextTransformation(args) │
│ 24 │ generated_text = processor.image_to_text(args.image_src) │
│ 25 │ generated_image = processor.text_to_image(generated_text) │
│ 26 │ ## then text to image │
│ │
│ /LOG/realman/LLM/Image2Paragraph/models/image_text_transformation.py:24 in __init__ │
│ │
│ 21 │ def __init__(self, args): │
│ 22 │ │ # Load your big model here │
│ 23 │ │ self.args = args │
│ ❱ 24 │ │ self.init_models() │
│ 25 │ │ self.ref_image = None │
│ 26 │ │
│ 27 │ def init_models(self): │
│ │
│ /LOG/realman/LLM/Image2Paragraph/models/image_text_transformation.py:38 in init_models │
│ │
│ 35 │ │ self.image_caption_model = ImageCaptioning(device=self.args.image_caption_device │
│ 36 │ │ self.dense_caption_model = DenseCaptioning(device=self.args.dense_caption_device │
│ 37 │ │ self.gpt_model = ImageToText(openai_key) │
│ ❱ 38 │ │ self.controlnet_model = TextToImage(device=self.args.contolnet_device) │
│ 39 │ │ self.region_semantic_model = RegionSemantic(device=self.args.semantic_segment_de │
│ 40 │ │ print('\033[1;32m' + "Model initialization finished!".center(50, '-') + '\033[0m │
│ 41 │
│ │
│ /LOG/realman/LLM/Image2Paragraph/models/controlnet_model.py:15 in __init__ │
│ │
│ 12 class TextToImage: │
│ 13 │ def __init__(self, device): │
│ 14 │ │ self.device = device │
│ ❱ 15 │ │ self.model = self.initialize_model() │
│ 16 │ │
│ 17 │ def initialize_model(self): │
│ 18 │ │ if self.device == 'cpu': │
│ │
│ /LOG/realman/LLM/Image2Paragraph/models/controlnet_model.py:22 in initialize_model │
│ │
│ 19 │ │ │ self.data_type = torch.float32 │
│ 20 │ │ else: │
│ 21 │ │ │ self.data_type = torch.float16 │
│ ❱ 22 │ │ controlnet = ControlNetModel.from_pretrained( │
│ 23 │ │ │ "fusing/stable-diffusion-v1-5-controlnet-canny", │
│ 24 │ │ │ torch_dtype=self.data_type, │
│ 25 │ │ │ map_location=self.device, # Add this line │
│ │
│ /miniconda3/envs/i2p/lib/python3.8/site-packages/diffusers/models/modeling_utils.py │
│ :602 in from_pretrained │
│ │
│ 599 │ │ │ │ # if device_map is None, load the state dict and move the params from me │
│ 600 │ │ │ │ if device_map is None: │
│ 601 │ │ │ │ │ param_device = "cpu" │
│ ❱ 602 │ │ │ │ │ state_dict = load_state_dict(model_file, variant=variant) │
│ 603 │ │ │ │ │ model._convert_deprecated_attention_blocks(state_dict) │
│ 604 │ │ │ │ │ # move the params from meta device to cpu │
│ 605 │ │ │ │ │ missing_keys = set(model.state_dict().keys()) - set(state_dict.keys( │
│ │
│ /miniconda3/envs/i2p/lib/python3.8/site-packages/diffusers/models/modeling_utils.py │
│ :125 in load_state_dict │
│ │
│ 122 │ │ │ │ │ │ "model. Make sure you have saved the model properly." │
│ 123 │ │ │ │ │ ) from e │
│ 124 │ │ except (UnicodeDecodeError, ValueError): │
│ ❱ 125 │ │ │ raise OSError( │
│ 126 │ │ │ │ f"Unable to load weights from checkpoint file for '{checkpoint_file}' " │
│ 127 │ │ │ │ f"at '{checkpoint_file}'. " │
│ 128 │ │ │ │ "If you tried to load a PyTorch model from a TF 2.0 checkpoint, please s │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
OSError: Unable to load weights from checkpoint file for
'/.cache/huggingface/hub/models--fusing--stable-diffusion-v1-5-controlnet-canny/snapshots/7f2f69197050967007f6bbd23ab5e52f0384162a/diffusion_pytorch_model.safetensors' at
'/.cache/huggingface/hub/models--fusing--stable-diffusion-v1-5-controlnet-canny/snapshots/7f2f69197050967007f6bbd23ab5e52f0384162a/diffusion_pytorch_model.safetensors'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

To debug, I built a new virtual env following install.sh, deleted the cached model files, and re-downloaded them by running main.py again. But the error still happens.
How can I deal with this bug?
My torch version is as follows:
torch 1.9.0+cu111
torchaudio 0.9.0
torchvision 0.10.0+cu111
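
One clue in the chained tracebacks: the first exception is AttributeError: module 'torch' has no attribute 'frombuffer', and torch.frombuffer was only added in PyTorch 1.10, so safetensors cannot load checkpoints under torch 1.9.0. Upgrading torch is the clean fix; alternatively, newer diffusers versions accept use_safetensors=False to prefer the .bin weights instead. A minimal sketch of that workaround (untested against this repo's pinned diffusers version):

    import torch
    from diffusers import ControlNetModel

    # Hypothetical workaround: skip the .safetensors file entirely so
    # safetensors (which needs torch.frombuffer) is never invoked.
    controlnet = ControlNetModel.from_pretrained(
        "fusing/stable-diffusion-v1-5-controlnet-canny",
        torch_dtype=torch.float16,
        use_safetensors=False,
    )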

Suggestion - Integrate MobileSAM into the pipeline for lightweight and faster inference

Reference: https://github.com/ChaoningZhang/MobileSAM

Our project performs on par with the original SAM and keeps exactly the same pipeline as the original SAM except for a change to the image encoder; therefore, it is easy to integrate into any project.

MobileSAM is around 60 times smaller and around 50 times faster than the original SAM, and it is around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarized as follows:

[comparison images]

Best Wishes,

Qiao

Data Generation

May I ask if it would be convenient for you to make the generated results public? For example, the images and the corresponding descriptive data?

Comment instructions outdated

The comment instructions in the README.md seem to be outdated. The line numbers listed don't seem to be meaningful, if I am understanding this correctly.

Install does not work with Python 3.8.16

Hi,
Thanks for the great work.
Trying to replicate your work here, I created a new env and ran pip install -r requirements.txt, but it gives me an error:

ERROR: Could not find a version that satisfies the requirement torch==1.9.0+cu111 (from versions: 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0)
ERROR: No matching distribution found for torch==1.9.0+cu111

I also tried installing torch using conda, but it does not seem to work at runtime.

Any suggestions are appreciated.
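
The +cu111 builds are not hosted on PyPI, which is why pip cannot find them; they come from PyTorch's own wheel index. The command PyTorch documented for this version combination (assuming a CUDA 11.1 machine) is:

  • pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html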

Out of Memory Issue in Semantic Segmentation

Why do I constantly encounter out-of-memory errors when running semantic segmentation, even though I have two GPUs with 15 GB each? Is it possible to distribute the model workload across the GPUs in parallel?
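
Not a fix for the memory footprint itself, but the tracebacks elsewhere on this page show that the models are initialized with per-component device arguments (self.args.image_caption_device, self.args.dense_caption_device, self.args.semantic_segment_device, and self.args.contolnet_device, keeping the repo's spelling). Assuming the argparse flags match those attribute names, something like this should split the models across your two GPUs:

  • python main.py --image_src "examples/3.jpg" --out_image_name "output/3_result.jpg" --image_caption_device cuda:0 --dense_caption_device cuda:0 --semantic_segment_device cuda:1 --contolnet_device cuda:1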

Region Semantic Models Do Not Work Well

First of all, thanks for the great work.

The image caption and dense caption modules all work fine here; however, the region caption module does not seem to work well. I tested both the edit_anything and ssa models.

For the edit_anything model, it returns obviously wrong object descriptions. The following is the test image I input:
[test image]
And the Region Segment module returns:

a dog is walking on the floor in a room: [0, 50, 383, 165]; a person riding a skateboard down a street: [234, 49, 149, 166]; a piece of paper with a black background: [0, 0, 64, 110]; a white light switch with a black light: [312, 0, 53, 80]; the moon is seen over the city skyline: [116, 0, 56, 38]; 

There are clearly no dogs or skateboards in the picture.

For the ssa model, when I add the --region_classify_model ssa option and change the region_semantic method to use ssa, the method errors out with:

│ /share/data/ripl/fjd/Image2Paragraph/models/segment_models/semantic_segment_anything_model.py:14 │
│ 7 in semantic_class_w_mask                                                                       │
│                                                                                                  │
│   144 │   │   │                                                                                  │
│   145 │   │   │   valid_mask_large_crop = mmcv.imcrop(valid_mask.numpy(), np.array([bbox[0], b   │
│   146 │   │   │   scale_large)                                                                   │
│ ❱ 147 │   │   │   top_1_patch_large = torch.bincount(class_ids_patch_large[torch.tensor(valid_   │
│   148 │   │   │   top_1_mask_category = mask_categories[top_1_patch_large.item()]                │
│   149 │   │   │                                                                                  │
│   150 │   │   │   ann['class_name'] = str(top_1_mask_category)                                   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
IndexError: The shape of the mask [3, 23] at index 0 does not match the shape of the indexed tensor [23, 3] at index 0

I wonder if you have a good way to use the region segment methods.
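
The shapes in that IndexError ([3, 23] versus [23, 3]) look like a simple transposition between the boolean mask and the tensor being indexed. A minimal, self-contained sketch of the suspected mismatch and a possible workaround (the variable names are stand-ins, not the repo's code):

    import torch

    class_ids = torch.randint(0, 10, (23, 3))  # stands in for class_ids_patch_large
    valid_mask = torch.rand(3, 23) > 0.5       # stands in for the cropped mask

    # Transpose the mask so its shape matches the tensor it indexes.
    if valid_mask.shape != class_ids.shape:
        valid_mask = valid_mask.T

    selected = class_ids[valid_mask]           # boolean indexing now succeeds
    top_1 = torch.bincount(selected).argmax()  # most frequent class id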

pip install markupsafe not compatible with project source code

I just ran

pip install -r requirements.txt

in a conda environment with Python 3.8.10.

I got the error:

ImportError: cannot import name 'soft_unicode' from 'markupsafe'

MarkupSafe 2.1.0 doesn't have soft_unicode, so a temporary solution might be adding

MarkupSafe<=2.0.1

to requirements.txt
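
For context, this is expected: soft_unicode was removed in MarkupSafe 2.1.0, and the older Jinja2 that the requirements typically pull in still imports it. Pinning before installing the requirements works as described above:

  • pip install "MarkupSafe<=2.0.1"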

Dense Caption always returns empty

Hi,

Thanks for sharing this work. Very interesting and potentially very impactful.

I encounter this issue while running python main.py --image_src "/Code/Image2Paragraph/examples/3.jpg" --out_image_name "output/3_result.jpg"

"Dense Cpation" always returns "/", and the program processes without error. I was able to get the generated text at the end along with the style-transferred image, but the caption is a bit off potentially due to the missing dense caption.


Retrieval Result on COCO

Hi, thanks for your interesting work. Could you explain more clearly why the Image2Paragraph method achieves better retrieval results on COCO?

run question

Running python main.py --image_src "examples/3.jpg" --out_image_name "output/3_result.jpg" does not work.
[error screenshot]

pydantic

in pydantic.fields.ModelField._type_analysis:550
TypeError: issubclass() arg 1 must be a class

Has anyone run into this issue and can advise how to resolve it?
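
If this is the same failure as the "TypeError: issubclass() arg 1 must be a class" issue at the top of this page (the traceback frame pydantic.fields.ModelField._type_analysis:550 matches), the typing-extensions/pydantic pin suggested there may resolve it as well.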

DenseCaptioning contains hardcoded paths to local env

    def __init__(self) -> None:
        self.grit_working_directory = "../GRiT/"
        self.grit_env_python = '/home/aiops/wangjp/anaconda3/envs/grit/bin/python'
        self.grit_script = 'image_dense_captions.py'
        self.model_weights = 'models/grit_b_densecap_objectdet.pth'
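
A minimal sketch of how these could be made configurable instead of hardcoded, keeping the current values as fallbacks (the environment-variable names are hypothetical, not something the repo defines):

    import os

    class DenseCaptioning:
        def __init__(self) -> None:
            # Hypothetical env vars; the defaults preserve the current behavior.
            self.grit_working_directory = os.environ.get("GRIT_DIR", "../GRiT/")
            self.grit_env_python = os.environ.get(
                "GRIT_PYTHON", "/home/aiops/wangjp/anaconda3/envs/grit/bin/python"
            )
            self.grit_script = "image_dense_captions.py"
            self.model_weights = os.environ.get(
                "GRIT_WEIGHTS", "models/grit_b_densecap_objectdet.pth"
            )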

About the visualization

Thanks for this great work and the open-sourced repo. I haven't touched segmentation tasks before, and I am wondering how to visualize the dense segmentation image like the one you show in the repo.
[example image: 3_semantic_segment_anything]

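Not the author, but overlays like that are usually produced by alpha-blending a random color per mask onto the image and labeling each mask at its centroid. A minimal sketch under that assumption (none of these names come from the repo):

    import numpy as np
    import matplotlib.pyplot as plt

    def show_semantic_masks(image, masks, class_names):
        """image: HxWx3 uint8 array; masks: list of HxW bool arrays; class_names: list of str."""
        overlay = image.astype(np.float32).copy()
        rng = np.random.default_rng(0)
        for mask, name in zip(masks, class_names):
            color = rng.uniform(0, 255, size=3)
            overlay[mask] = 0.5 * overlay[mask] + 0.5 * color  # alpha-blend the mask color
            ys, xs = np.nonzero(mask)
            plt.text(xs.mean(), ys.mean(), name, color="white", fontsize=8)
        plt.imshow(overlay.astype(np.uint8))
        plt.axis("off")
        plt.show()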

Possible Langchain integration?

Hey! Your model is just awesome; I'm wondering if you have any plans for LangChain integration? Currently they support blip-image-captioning for image captioning, but your variant honestly looks much more useful! Great work!

The GRIT integration doesn't work

Hi, I think there may be an issue with the GRiT integration: there is no .gitmodules file, so the repo doesn't know about GRiT at all, and running the code as-is returns the error:

ModuleNotFoundError: No module named 'models.grit.image_dense_captions'
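
Until a proper submodule is added, one workaround, assuming the import path models.grit.image_dense_captions means the code expects GRiT checked out under models/grit, is to clone it there yourself:

  • git clone https://github.com/JialianW/GRiT.git models/grit

(GRiT also has its own dependencies, e.g. detectron2, which need to be installed per its README.)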
