Comments (14)
@Laidawang you can detect the face in the image and crop it.
from ip-adapter.
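For the face-cropping step mentioned above, a minimal sketch (the bounding box is assumed to come from any face detector, e.g. a Haar cascade or RetinaFace; the 20% margin is an arbitrary choice, not a value from the IP-Adapter code):

```python
import numpy as np

def crop_face(image: np.ndarray, bbox, margin: float = 0.2) -> np.ndarray:
    """Crop a detected face region, expanded by a margin and clamped
    to the image bounds. bbox is (x, y, w, h) from a face detector."""
    x, y, w, h = bbox
    mx, my = int(w * margin), int(h * margin)
    x0, y0 = max(0, x - mx), max(0, y - my)
    x1 = min(image.shape[1], x + w + mx)
    y1 = min(image.shape[0], y + h + my)
    return image[y0:y1, x0:x1]

# Example: a 512x512 image with a (hypothetical) detected face at (100, 120, 128, 128)
img = np.zeros((512, 512, 3), dtype=np.uint8)
face = crop_face(img, (100, 120, 128, 128))
```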
@Laidawang you are right, but building such a dataset requires a certain amount of work; of course, it would make the IP-Adapter more powerful (in fact, that is in our plan). By the way, we have trained an IP-Adapter that uses a face image as the image prompt (https://github.com/tencent-ailab/IP-Adapter/blob/main/ip_adapter-plus-face_demo.ipynb). During training, we use the face as the image condition, but the full image as the ground truth (GT).
That's to say, the batch["clip_image"] in the training script corresponds to the cropped image, and the batch["images"] corresponds to the full image?
Yes.
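To make that mapping concrete, a hypothetical sketch of assembling one training example (the two dict keys follow the batch keys discussed above; the bounding box is assumed to come from a face detector, and real code would also resize and normalize both images):

```python
import numpy as np

def make_example(full_image: np.ndarray, face_bbox) -> dict:
    """Build one training example: the cropped face is the image
    condition fed to the CLIP encoder, the full image is the GT."""
    x, y, w, h = face_bbox
    face_crop = full_image[y:y + h, x:x + w]
    return {
        "clip_image": face_crop,   # condition: cropped face
        "images": full_image,      # ground truth: full image
    }

example = make_example(np.zeros((512, 512, 3), dtype=np.uint8),
                       (100, 120, 128, 128))
```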
@Laidawang hi, the IP-Adapter only needs to be trained on SD 1.5, but it can be used with most community models. For training, you need to prepare image-text pairs and convert the data into a JSON file:
[
    {"text": "A dog", "image_file": "dog.jpg"},
    {"text": "A cat", "image_file": "cat.jpg"}
]
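As a sketch, such a JSON file can be written and read back with the standard library alone (the filename train_data.json is just an example; image paths would be relative to your data root):

```python
import json

# Hypothetical image-text pairs in the format shown above.
pairs = [
    {"text": "A dog", "image_file": "dog.jpg"},
    {"text": "A cat", "image_file": "cat.jpg"},
]

# Write the dataset description to disk...
with open("train_data.json", "w") as f:
    json.dump(pairs, f, indent=2)

# ...and load it back, as a training script would.
with open("train_data.json") as f:
    data = json.load(f)
```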
Dear xiaohu,
May I ask one question about the JSON file for the training data? Is the 'text' similar to the captions we use when training a LoRA model? I mean, if my images contain many elements, such as tables, chairs, carpets, etc., how should I prepare the 'text'?
Looking forward to your reply.
Best,
@xiaohu2015, thank you for your help. So we use this image both as the input to CLIP and as the ground truth; won't this limit the variety of the image embedding? In my experiments, when the scale is high (0.9 or above) it basically reconstructs the image completely, but when it is low (0.3) it produces some empty scenes.
I'm trying to use inpainting to create a background for some small objects with this technique.
@Laidawang you could adjust the scale and add some text prompts to get good results. For now, we just use the same image as both the condition and the ground truth, which may limit its generation ability. In addition, we are also exploring better solutions.
@xiaohu2015 I think you use a semantically consistent prompt and image during training, which will cause problems when the input image and the prompt are semantically inconsistent.
Maybe we can try training like this, for example: prompt: a cat; image: an empty scene; GT: a cat in that scene. Or vice versa: prompt: describes the scene; image: a cat; GT: the cat in the scene. I think this would separate the influence of the prompt and the input image at the embedding level.
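The proposed triplet could be encoded as an extension of the JSON format shown earlier; a sketch (the field names here are hypothetical, not keys that the training script actually uses):

```python
# One hypothetical training record that decouples the prompt, the image
# condition, and the ground truth, instead of using a single image for
# both condition and GT.
triplet = {
    "text": "a cat",                       # prompt: describes the subject
    "cond_image_file": "empty_scene.jpg",  # image condition: the scene only
    "gt_image_file": "cat_in_scene.jpg",   # ground truth: subject in scene
}
```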
Wow, that's really nice!
In that case, how do we build such a dataset? Can you give an example?
Thank you for such a quick reply! I've tried the ip-adapter-plus-face_sd15.bin model, and I find it's still hard to preserve human face likeness, as discussed in #5. Do you think it would be better to replace the original CLIP with a face-specific CLIP model like FaRL, or do you have a better suggestion?
@hkunzhe I think you can give it a try. For CLIP models, I found that they can only learn the rough structure of the face. Hence, I think using a face-specific model is more promising. However, my early experiments using features from face recognition models did not work well; they are hard to train and learn from using only the diffusion loss.
@xiaohu2015 Thanks for your great work!
Did you pre-train the face recognition model before training it with the diffusion model, or did you train them simultaneously?
Could you briefly describe your previous experiments?
Thanks a lot for your help in advance! Have a nice day :)