
inst's People


zyxelsa



inst's Issues



Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

          This may be caused by the code at line 1112 in, so please change the code like this (but I'm not sure). Please let me see the whole issue report.


Originally posted by @neverenough7 in #13 (comment)

This is my issue. I don't know how to find its cause. Please help me solve it. Thank you!

Load pre-trained model:CLIP transformer encoder

While running InST.ipynb, I encountered the following problem. It seems to fail when loading CLIPProcessor.
Has anyone had a similar problem? Please help, thanks!

Traceback (most recent call last):
File "", line 68, in
model = load_model_from_config(config, f"{ckpt}")
File "", line 39, in load_model_from_config
model = instantiate_from_config(config.model)
File "/home/mll/DeepLearning/InST/ldm/", line 85, in instantiate_from_config
return get_obj_from_str(config["target"])(**config.get("params", dict()), **kwargs)
File "/home/mll/DeepLearning/InST/ldm/models/diffusion/", line 477, in __init__
File "/home/mll/DeepLearning/InST/ldm/models/diffusion/", line 553, in instantiate_cond_stage
model = instantiate_from_config(config)
File "/home/mll/DeepLearning/InST/ldm/", line 85, in instantiate_from_config
return get_obj_from_str(config["target"])(**config.get("params", dict()), **kwargs)
File "/home/mll/DeepLearning/InST/ldm/modules/encoders/", line 169, in __init__
self.processor = CLIPProcessor.from_pretrained(version)
File "/home/mll/miniconda3/envs/inst/lib/python3.8/site-packages/transformers/", line 186, in from_pretrained
args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
File "/home/mll/miniconda3/envs/inst/lib/python3.8/site-packages/transformers/", line 230, in _get_arguments_from_pretrained
args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
File "/home/mll/miniconda3/envs/inst/lib/python3.8/site-packages/transformers/", line 308, in from_pretrained
feature_extractor_dict, kwargs = cls.get_feature_extractor_dict(pretrained_model_name_or_path, **kwargs)
File "/home/mll/miniconda3/envs/inst/lib/python3.8/site-packages/transformers/", line 436, in get_feature_extractor_dict
raise EnvironmentError(
OSError: We couldn't connect to '' to load this model, couldn't find it in the cached files and it looks like openai/clip-vit-large-patch14 is not the path to a directory containing a preprocessor_config.json file.
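The OSError means transformers could neither download `openai/clip-vit-large-patch14` nor find it in its local cache. A common workaround (an assumption here, not something the InST authors document) is to download the model once on a machine with network access, for example with `CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")` followed by `.save_pretrained(local_dir)`, copy that directory over, and pass its path wherever the encoder receives `version`. The hypothetical helper below checks that such a directory contains the tokenizer and processor files named in these errors; the file names follow the usual transformers conventions for CLIP and should be treated as an assumption. It does not check the text-model weights, which the encoder also loads.

```python
from pathlib import Path

# File names CLIPTokenizer and the CLIP image processor look for in a local
# directory (assumption based on transformers' usual naming for CLIP).
TOKENIZER_FILES = ("vocab.json", "merges.txt")
PROCESSOR_FILES = ("preprocessor_config.json",)


def missing_clip_files(local_dir, need_processor=True):
    """Return the expected file names that are absent from local_dir.

    Hypothetical helper: an empty list means the directory at least has the
    tokenizer/processor files, so it is a reasonable candidate to pass as
    `version` instead of the hub id 'openai/clip-vit-large-patch14'.
    """
    root = Path(local_dir)
    expected = TOKENIZER_FILES + (PROCESSOR_FILES if need_processor else ())
    return [name for name in expected if not (root / name).is_file()]
```

Calling `missing_clip_files("./clip-vit-large-patch14")` on the copied directory before training gives a clearer message than the OSError raised deep inside `from_pretrained`.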


I am currently facing an issue with the torch._six module.
Current torch version =
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

import torch

ModuleNotFoundError Traceback (most recent call last)
Cell In[7], line 10
8 sys.path.append(".")
9 sys.path.append('./taming-transformers')
---> 10 from taming.models import vqgan

File ~/Desktop/Difussion_InST/InST/scripts/./taming-transformers/taming/models/
2 import torch.nn.functional as F
3 import pytorch_lightning as pl
----> 5 from main import instantiate_from_config
7 from taming.modules.diffusionmodules.model import Encoder, Decoder
8 from taming.modules.vqvae.quantize import VectorQuantizer2 as VectorQuantizer

File ~/Desktop/Difussion_InST/InST/scripts/./taming-transformers/
11 from pytorch_lightning.callbacks import ModelCheckpoint, Callback, LearningRateMonitor
12 from pytorch_lightning.utilities import rank_zero_only
---> 14 from import custom_collate
17 def get_obj_from_str(string, reload=False):
18 module, cls = string.rsplit(".", 1)

File ~/Desktop/Difussion_InST/InST/scripts/./taming-transformers/taming/data/
10 from import Annotation
11 # from torch._six import str as string_classes
---> 12 import torch._six.string_classes as string_classess
13 from import np_str_obj_array_pattern, default_collate_err_msg_format
14 from tqdm import tqdm

ModuleNotFoundError: No module named 'torch._six'
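`torch._six` was removed in PyTorch 2.0, so any import from it fails on newer versions. The edited line 12 above would also fail on PyTorch 1.x, because `string_classes` is a tuple rather than a module, and `import a.b.c` only works when `c` is a module. One hedged workaround, assuming the taming-transformers data module only uses `string_classes` in `isinstance` checks, is a guarded import with a fallback:

```python
# torch._six no longer exists in PyTorch >= 2.0. Fall back to the plain
# tuple (str, bytes), which is what PyTorch 1.x defined as string_classes.
try:
    from torch._six import string_classes  # PyTorch 1.x
except ImportError:  # PyTorch >= 2.0, or torch not installed at all
    string_classes = (str, bytes)


def is_string_like(obj):
    """Mirror the isinstance(elem, string_classes) checks in collate code."""
    return isinstance(obj, string_classes)
```

Replacing line 12 with the `try`/`except` block (and referring to `string_classes` rather than `string_classess`) keeps such checks working on either PyTorch version; pinning a PyTorch 1.x release is the other option.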

NameError: name 'trainer' is not defined

I don't know how to solve this problem. These are the arguments I ran with: python --base configs/stable-diffusion/v1-finetune.yaml -t --actual_resume ./models/sd/sd-v1-4.ckpt -n myrun --gpus 0, --data_root .\images
NameError: name 'trainer' is not defined


Can you tell me which dataset you use?

Style doesn't transfer at all when running InST.ipynb with empty prompt.

Thank you for your paper, the results look amazing!!

I tried to run InST.ipynb to reproduce the results published in the Comparison Data section.

For the style reference, I specified the following parameters: style_dir=andre-derain.jpg, embeddings=andre-derain_embeddings.pt


Other parameters: content_dir=pexels-anastasiya-lobanovskaya-804952.jpg, and prompt=''

After running it with n_iter=3, I got the following results:


So it seems that the style wasn't transferred at all in my case.

Could you please tell me whether I am running InST.ipynb with the wrong parameters, or what else might be going wrong in my setup?

How to separate style and content in the reference image?

If one uses textual inversion to describe the style image, how are style and content separated in the reference image? I understand that "content" is part of the "style" (as the paper mentions, semantic elements are part of style). However, in Fig. 4, 2nd row, the InST result is entirely a robot, like the reference image, while the content is a human. So I am confused about the evaluation: which elements of the style image do we actually want to transfer to the content image?

How about your training dataset?

Hi expert,
As you mentioned in the paper, "The training process takes about 20 minutes each image on one NVIDIA GeForce RTX3090 with a batch size of 1".
So, what dataset did you use as training data? I would like to compare against some state-of-the-art style transfer methods.

the results are bad

I trained the and used it in InST.ipynb to generate pictures, but the results are bad, not as good as the ones the author provided.
I want to ask what caused this. Were there other details during training?

using prompt

Hi ,
Can I use your network to transform an image of myself into a different style (anime, for example), also use a text prompt (for example, 'riding a horse'), and get an image of myself in anime style riding a horse?
How can I do this? I trained the network on the anime style and tried adding a prompt at evaluation time, but the prompt didn't affect the image.
Thank you very much in advance,



During training, I encountered the following error:

pytorch_lightning.utilities.exceptions.MisconfigurationException: No test_step() method defined to

It seems the training has completed, but there is no way to test it. What should I do? Do I need to modify the code?
I would also like to know whether the training data and test data are the same during training, since the run command only has --data_root.

Thanks a lot!

During the training process, I encountered the following issue. Has anyone encountered a similar problem? Please help me to solve it, thank you!

(ldm) lz@manager-Precision-7920-Tower:~/Documents/InST$ python --base configs/stable-diffusion/v1-finetune.yaml -t --actual_resume ./models/sd/sd-v1-4.ckpt -n log1_shuimo --gpus 0, --data_root /home/lz/Documents/InST/style
Global seed set to 23
Running on GPUs 0,
Loading model from ./models/sd/sd-v1-4.ckpt
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Traceback (most recent call last):
File "", line 582, in
model = load_model_from_config(config, opt.actual_resume)
File "", line 29, in load_model_from_config
model = instantiate_from_config(config.model)
File "/home/lz/Documents/InST/ldm/", line 85, in instantiate_from_config
return get_obj_from_str(config["target"])(**config.get("params", dict()), **kwargs)
File "/home/lz/Documents/InST/ldm/models/diffusion/", line 477, in __init__
File "/home/lz/Documents/InST/ldm/models/diffusion/", line 561, in instantiate_cond_stage
model = instantiate_from_config(config)
File "/home/lz/Documents/InST/ldm/", line 85, in instantiate_from_config
return get_obj_from_str(config["target"])(**config.get("params", dict()), **kwargs)
File "/home/lz/Documents/InST/ldm/modules/encoders/", line 166, in __init__
self.tokenizer = CLIPTokenizer.from_pretrained(version)
File "/home/lz/anaconda3/envs/ldm/lib/python3.8/site-packages/transformers/", line 1764, in from_pretrained
raise EnvironmentError(
OSError: Can't load tokenizer for 'openai/clip-vit-large-patch14'. If you were trying to load it from '', make sure you don't have a local directory with the same name. Otherwise, make sure 'openai/clip-vit-large-patch14' is the correct path to a directory containing all relevant files for a CLIPTokenizer tokenizer.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "", line 795, in
if trainer.global_rank == 0:
NameError: name 'trainer' is not defined
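This traceback shows that `NameError: name 'trainer' is not defined` is a follow-on error: the first exception (the `OSError` from `CLIPTokenizer.from_pretrained`) was raised before `trainer` was ever assigned, and code that runs while that exception is being handled then read `trainer.global_rank`. The error to fix is therefore the tokenizer download, not the NameError. The sketch below reproduces the pattern with a hypothetical `run_training` function (a simplification, not the project's actual training script) and shows a guard that lets the original exception surface.

```python
from types import SimpleNamespace


def run_training(load_model):
    """Hypothetical, simplified stand-in for a main.py-style training script."""
    trainer = None  # bind the name first so cleanup code can always test it
    try:
        model = load_model()  # may raise, e.g. an OSError from the tokenizer
        trainer = SimpleNamespace(global_rank=0, model=model)  # placeholder
        return "trained"
    finally:
        # A bare `if trainer.global_rank == 0:` here would raise a second
        # error (NameError if the name was never bound) and bury the real
        # one. The None check lets the original exception propagate.
        if trainer is not None and trainer.global_rank == 0:
            print("rank 0 cleanup, e.g. printing a profiler summary")
```

With the guard, a failed model load surfaces as the original `OSError`, which points directly at the missing CLIP files.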

Error when running: $conda env create -f environment.yaml

When trying to install the packages in the project folder, an error occurs.
The error message is:

(base) PS C:\Users\bchan\Desktop\project\creativity-transfer> conda env create -f environment.yaml
Collecting package metadata (repodata.json): done
Solving environment: done
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Installing pip dependencies: / Ran pip subprocess with arguments:
['C:\\Users\\bchan\\Miniconda3\\envs\\ldm\\python.exe', '-m', 'pip', 'install', '-U', '-r', 'C:\\Users\\bchan\\Desktop\\project\\creativity-transfer\\condaenv.v7jnhji2.requirements.txt']
Pip subprocess output:

Pip subprocess error:
ERROR: File "" not found. Directory cannot be installed in editable mode: C:\Users\bchan\Desktop\project\creativity-transfer


CondaEnvException: Pip failed

After some research online, I found that removing the last line of environment.yaml, " - -e .", avoids this error and installs the above dependencies successfully.
OS: windows 11 22H2
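The workaround above (deleting the `- -e .` entry) can be done by hand. For scripted setups, here is a minimal sketch using a hypothetical `drop_editable_install` helper that removes only the editable install of the current directory and keeps every other pip entry, including other `-e` lines that point at Git URLs:

```python
import re

# Matches a YAML list item asking pip for an editable install of the
# current directory, e.g. "    - -e ." (indentation may vary).
EDITABLE_LINE = re.compile(r"^\s*-\s+-e\s+\.\s*$")


def drop_editable_install(yaml_text):
    """Return environment.yaml text without the '- -e .' pip entry."""
    kept = [
        line
        for line in yaml_text.splitlines(keepends=True)
        if not EDITABLE_LINE.match(line)
    ]
    return "".join(kept)
```

Usage: `path.write_text(drop_editable_install(path.read_text()))` with `path = pathlib.Path("environment.yaml")`. The trade-off is that the repository itself is no longer installed as a package, so its modules are importable only when the repository root is on `sys.path` (for example, when running from the repository root).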

bug in embedding_manager

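When saving and loading the model, embedding_manager uses initial_embeddings, which is set to requires_grad=False. I tested it, and this parameter is not updated during training; only the attention network is updated. Looking at the InST code, shouldn't it save string_to_token_dict instead of initial_embeddings?

, data) in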

code:, data) in
error: RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
I don't know how to solve the problem.

training time for ""

The paper says "The training process takes about 20 minutes each image on one NVIDIA GeForce RTX3090 with a batch size of 1."
I used two RTX3090s. However, it has already taken three hours and doesn't seem to stop.

For "v1-finetune.yaml", which parameter specifies the number of epochs in the training process?

Trained results are not so good

Thank you for your great work!
I trained on an image and got results with and CreativityTransfer.ipynb, but I can't get results similar to those in the paper. Can you give the specific steps or parameters in
Below are the content image and my result.
src001-_-embeddings_gs-549 pt-0052  3

Request for exemplar images

Hi, thank you for sharing the code!

Can you also share some exemplar images so that we can check whether we are on the right track with training and evaluation?

Also, I tried to transfer a sample image to the newly trained TheStarryNight style and got the following.

Is this an expected result?

What should I do to change this old man into Domhnall Gleeson?


Will you release your model?

  1. I trained one myself, but the results were poor.
  2. Can this be achieved: train a model from one image, then transfer a video (or a scene), not just a single image (like the training image)?


Train step: the following errors occur

  1. ConnectionRefusedError: [Errno 111] Connection refused
  2. urllib.error.URLError: <urlopen error [Errno 111] Connection refused>
  3. NameError: name 'trainer' is not defined
    I don't know how to solve these problems. I would like to know whether this project contains any code that connects to web pages; I don't actually see any such code.

No "models/sd/sd-v1-4.ckpt"

Hello, thank you for sharing your code. But when I ran InST.ipynb for testing, I found that the corresponding trained model was missing. Could you please share this test model?

Clarification on style-transfer details

First, a really nice idea and great results. However, for the style transfer case where a content reference image is given, the paper does not clearly explain how the content is preserved. Is the stochastic inversion related to this? If so, how?

Training doesn't work


I ran the training code with a reference image, but it doesn't seem to work. When I use the trained model to stylize an image, it always generates the same images no matter which style image I provide. And in the logs, the sampled images are all noise, like this:

conda install error report

Really fantastic work! Respect firstly :-)

When I ran conda env create -f environment.yaml, an error occurred:

Installing pip dependencies: / Ran pip subprocess with arguments:
['/home/miniconda3/envs/ldm/bin/python', '-m', 'pip', 'install', '-U', '-r', '/data/application/creativity-transfer/condaenv.hx046q8u.requirements.txt']
Pip subprocess output:

Pip subprocess error:
ERROR: File "" not found. Directory cannot be installed in editable mode: /data/application/creativity-transfer

CondaEnvException: Pip failed

OS: Centos7

Maybe because of missing of in repo folder?

Unable to reproduce paper results

By training embeddings.pt on the style screenshots in the paper, I cannot achieve results that look as good as the paper's, which makes me doubt the feasibility of this method. Could you provide the embeddings.pt files from the author's training? Any one of them would be sufficient. Thank you very much!



Different style_dir output the same result

When training with, how many images are typically placed in the data_root folder? Is it one image per style? Why do I get the same output when setting different style_dir values in InST.ipynb?

Any difference from textual inversion?

In your code, I saw that you mainly used the textual inversion framework. Is the main difference that you use only a single image to embed the style, while the textual inversion method uses 3~5 images?

Question about why the predicted noise is used?

InST is great work! Thanks for your work and for releasing your code. But I'm a little confused: the predicted noise is added to the content image, so why is the predicted noise used instead of stochastic noise? Thanks for your reply.
