Comments (51)
@shaform https://drive.google.com/file/d/0Bx_f0Ep2RFzhb1ktVkRnTENlQjA/view?usp=sharing
Here is the result
from makegirlsmoe_web.
@lllyasviel I've already crawled all the images, but since the face detector is not very good, cleaning up the dataset requires a huge manual effort. Anyway, you can find the scripts to crawl the images here: https://github.com/shaform/GirlsManifold.
Sorry, I do not own the copyright for them. You should know that copyright-related issues are very sensitive in Japan. I am in Japan and work for a company that may collaborate with those copyright owners in the future, so publishing the training dataset online is almost impossible for us.
I think the problem is that lbpcascade_animeface has poor precision/recall; especially for male characters, the recall is lower than 40% with the default settings.
The lack of a labeled dataset is an obstacle to building a powerful anime-face detection model.
Weakly supervised methods might work, but I am not sure about their performance.
I see.
That falls under what I mean by weakly supervised methods.
Anyway, we need experiments to validate the idea and measure the performance.
Could you release the URLs, labels, and bounding boxes used to crop the faces?
@danielwaterworth I will take it into consideration. It will take some time, since all images must be checked before release (e.g., NSFW images must be removed).
@Aixile, Thanks, I appreciate your consideration and thanks for publishing the project!
@Aixile Is it possible to simply release the SQL query results from ErogameScape,
so others can crawl the images themselves? It appears that ErogameScape has blocked IPs from other countries.
@Aixile Thanks a lot!!
Oh, thank you!
@shaform Face detection is not a problem in 2017 lol. How many pictures have you downloaded? Can you give me a sample image of the dataset?
We can detect anime faces with or without machine learning. With machine learning, we can do it with or without training.
Without ML, traditional pattern recognition works well.
Without training, Illustration2Vec has labels related to eyes or mouths. Hack the predicted eye or mouth result, then trace back to the input to get an activation map.
With training, .......
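The "trace back to the input" step can be sketched as a CAM-style weighted sum of feature maps. Everything below uses toy numbers; a real setup would take the maps and tag weights from a classifier such as Illustration2Vec.

```python
# Sketch of a CAM-style activation map: weight each conv feature map by
# the tag's channel weight and sum. Toy data only; a real setup would
# take feature maps and tag weights from a trained tag classifier.

def activation_map(feature_maps, tag_weights):
    """Weighted sum of per-channel feature maps for one tag."""
    h = len(feature_maps[0])
    w = len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, wt in zip(feature_maps, tag_weights):
        for y in range(h):
            for x in range(w):
                cam[y][x] += wt * fmap[y][x]
    return cam

# Two hypothetical 2x2 feature maps and the tag's channel weights.
fmaps = [[[1.0, 0.0], [0.0, 0.0]],
         [[0.0, 2.0], [0.0, 0.0]]]
weights = [0.5, 1.0]
cam = activation_map(fmaps, weights)
# The peak of the map indicates where the "eye" evidence came from.
peak = max((v, (y, x)) for y, row in enumerate(cam) for x, v in enumerate(row))
```

The peak location of the map then localizes the feature (e.g., an eye), which gives a weak face-region hint without any detection training.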
- I've downloaded 48,144 images.
- 37,556 of them are from after 2005.
- 33,190 faces larger than 80x80 were detected in those 37,556 images.
I am not sure about the exact parameters of the face detector, so the detected results might differ slightly from the original settings. The parameters I used are in the repo I provided.
Sample detection results can be seen here: https://imgur.com/a/9Saf3; as you can see, some of them are false positives.
BTW, someone has tried to improve anime face detection but failed: https://qiita.com/homulerdora/items/9a9af1481bf63470731a
@shaform Oh, nowadays people rely so much on CNNs and ML! It is a simple problem that can be tackled with traditional PR filters... Maybe I will write some sample code in C++ later...
I've tried to train the SRResNet-like architecture. While the discriminator is okay, once I use the SRResNet generator, the GAN stops learning and produces corrupted results. Has anyone successfully replicated the results?
- It depends on your situation.
If your generator does not even appear to learn, you should try a lower learning rate.
If your generator looks fine but the loss suddenly grows to a very large value (about 1e6 in my case) in fewer than 10 iterations, the generator will crash.
In my experience, this phenomenon sometimes happens with DRAGAN.
I feel that the problem is due to the numerical precision of the gradient penalty calculation.
It can be solved by loading a saved snapshot and restarting training from 1000~2000 iterations before the crash point. Anything that changes the randomness will help you get past the crash point and continue training.
Using a lower learning rate makes the phenomenon happen less often.
- I apologize that I made a small mistake in the discriminator architecture in the published manuscript. The mistake is not critical, and I think it will not influence the performance.
In my experience, the discriminator architecture is less sensitive as long as you add a gradient penalty term to enforce the Lipschitz constraint.
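The snapshot-and-restart trick can be illustrated with a toy loop. The "loss explosion" below is simulated randomness, not a real GAN; the point is only the mechanics of rolling back to a snapshot and reseeding.

```python
import random

# Toy illustration of snapshot-and-restart training recovery.
# "Training" is simulated: each step may randomly "blow up"; on a
# blow-up we roll back to the last snapshot and reseed, which changes
# the randomness and lets training pass the crash point.

def train(steps, snapshot_every=5, seed=0):
    rng = random.Random(seed)
    state, snapshots = 0, {0: 0}   # snapshots: step -> saved state
    step = 0
    while step < steps:
        if rng.random() < 0.05:            # simulated loss explosion
            # roll back to a snapshot taken before the crash point
            step = max(s for s in snapshots if s <= step)
            state = snapshots[step]
            rng = random.Random(rng.random())  # reseed: new randomness
            continue
        state += 1                          # one successful update
        step += 1
        if step % snapshot_every == 0:
            snapshots[step] = state
    return state

final = train(50)
```

In a real training script the "state" would be model and optimizer checkpoints, and reseeding could be as simple as a different data-shuffling seed.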
@Aixile Thanks! I'll try your suggestions.
@shaform
Why not add a VAE?
Paper: https://arxiv.org/abs/1706.04987
code: https://github.com/victor-shepardson/alpha-GAN
If you have ever played with alphaGAN, you will find it extremely hard to train.
Not if you use Bayesian-calculated channels instead of casually stacked res blocks and a heavy-handed GP.
@lllyasviel Thanks! Since I haven't read the paper yet, I am wondering whether the "Bayesian calculated channels"
concept is already in that paper, or whether you are referring to other techniques?
It is a concept from the very beginning of the first GAN. The quantity of channels is critical for DCGAN and some GANs proposed by Google Brain. In some cases the channels should be limited to the dimensions of the latent space vector, because it is Bayesian. For example, the state-of-the-art face generative model BEGAN has a G shaped as 1x1x128 -> 8x8x128 -> 16x16x128 -> 32x32x128 -> 64x64x128 -> 128x128x128 -> 128x128x3. It seems strange that the channels stay at 128 through the whole procedure, but BEGAN produces very impressive, incredible results in face generation. Another example is HyperGAN, which can generate 256x256 images without any label/mask/pair or other conditional hints; the secret of its success is also Bayesian-calculated channels, and you can check it at their repo.
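To make the constant-channel pattern explicit, the quoted BEGAN generator shape can be written out and checked. A trivial sketch, with shapes as (H, W, C):

```python
# The BEGAN-style generator shape quoted above, written out so the
# constant channel count is easy to see. Shapes are (H, W, C).
began_g = [(1, 1, 128), (8, 8, 128), (16, 16, 128), (32, 32, 128),
           (64, 64, 128), (128, 128, 128), (128, 128, 3)]

# Every stage except the output keeps C == 128, the latent dimension;
# only the final layer projects to 3 RGB channels.
hidden_channels = {c for (_, _, c) in began_g[:-1]}
```

The claim being made in the comment is exactly this invariant: no hidden layer's channel count exceeds the 128-dimensional latent vector.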
@lllyasviel Thank you. It's very insightful. I'll try it in my experiments.
"The channels in some cases should be limited to the dimensions of the latent space vector because it is Bayesian."
What do you mean by "Bayesian"?
Indeed, I am also curious. It appears that HyperGAN is not using a fixed number of channels: 128x128x3 -> 64x64x16 -> 32x32x32 -> 16x16x48 -> 8x8x64 -> 4x4x80. Is this still Bayesian?
Yes.
You can try this to understand: replace 128x128x3 -> 64x64x16 -> 32x32x32 -> 16x16x48 -> 8x8x64 -> 4x4x80 with 128x128x3 -> 64x64x64 -> 32x32x128 -> 16x16x256 -> 8x8x512 -> 4x4x512 (or 4x4x1024) -> 1x1x128 (or 256, or 512). Then you will get only noise maps as GAN results, whatever your training data is.
Because the training of MakeGirlsMoe is supervised by conditional hints, channels, layers, and resnets can be stacked casually without much consideration. But if all hints are removed, a proper channel quantity and depth are of critical importance. Some think alphaGAN is difficult to train because nowadays supervised GAN training makes people pay less attention to channels and depth, resorting to resnets and GP.
@lllyasviel Thanks, but DCGAN uses 64x64x64 -> 32x32x128 -> 16x16x256 -> 8x8x512 -> 4x4x1024. So are these not optimal settings?
Are there any guidelines for choosing the number of channels? Is it enough to just make sure that the number of channels doesn't exceed the dimensions of the latent space vector, as indicated by your previous comment?
"The channels in some cases should be limited to the dimensions of the latent space vector."
I am talking about the resolution of 256; DCGAN works at 64x64 with that architecture.
Oh, it is my mistake. I meant replacing the 256x256x3 -> 128x128x16 -> 64x64x32 -> 32x32x48 -> 16x16x64 -> 8x8x80 in HyperGAN.
"Not with Bayesian-calculated channels instead of casually stacked res blocks and violent GP."
@lllyasviel The same question: what exactly are you referring to by "Bayesian"?
It would be helpful to make explicit references to the relevant publications in the discussion.
I do not agree with you.
Vanilla DCGAN can work on large images.
Here is a result I trained with vanilla DCGAN:
192x192x3 -> 96x96x32 -> 48x48x64 -> 24x24x128 -> 12x12x256 -> 6x6x512
"Because the training of MakeGirlsMoe is supervised by conditional hints, channels and layers or resnets can be stacked casually without so much consideration."
No, the model works well without conditional hints.
The resblock is not critical; the model can work without any resblock, but adding resblocks made the quality slightly better in my experiments. As the resblocks are only added to low-resolution feature maps, they are not the bottleneck of the computation.
GP is critical because it enforces the discriminator to be Lipschitz and makes the optimization problem well-behaved.
But the exact form of the GP is not important in my experiments.
I can only offer some personal experience, because many papers hold different ideas. But what is certain is that the channels must be taken into consideration.
For example, human face generation: why are the channels of BEGAN 128? Why 128 and not 256 or 512? This can be tested in the following way:
step 1: make an encoder and a decoder, and link them with a 1x1xN layer. N should be large enough, for example 1024.
step 2: train the network to copy faces, using an L1 loss.
step 3: you will see a blurry face as output.
step 4: cut down N by a number like 128 or 256 and train again.
step 5: repeat step 4 until the blurry face becomes so blurry that we can no longer make out a face.
In my BEGAN experiment, the final N was 128, the same value as Google Brain's.
Then we know one thing: a human face can be coded into a 128-dimensional vector. (And if you think about it carefully, you will see this is why BEGAN works terribly on LSUN bedrooms.)
So we get an important number, 128: it is the source of a face, and anything generated should be traceable back to that vector in a Bayesian way. I mean that no layer should exceed this limitation; layers can decorate the features from the previous layer, but we should not give a layer the ability to disorder the main latent features and disturb the next layer's Bayesian backtracing. For example, 1x1x128 -> 8x8x4096 -> 8x8x4096 is a bad choice.
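The step 1-5 search can be sketched as a loop. `recon_error` below is a stub standing in for "train the encoder/decoder and measure how blurry the copied faces are"; its numbers are made up for illustration.

```python
# Sketch of the step 1-5 procedure: shrink the 1x1xN bottleneck until
# reconstruction degrades past a threshold. recon_error is a stub for
# "train an encoder/decoder with L1 loss and measure the blurriness";
# the numbers are hypothetical.

def recon_error(n):
    # Hypothetical: error stays low until N drops below the data's
    # intrinsic dimensionality, then rises sharply.
    return 0.02 if n >= 128 else 0.5

def find_bottleneck(start_n=1024, step=128, threshold=0.1):
    n = start_n
    while n - step > 0 and recon_error(n - step) < threshold:
        n -= step          # step 4: cut N down and "train again"
    return n               # smallest N that still reconstructs a face

best_n = find_bottleneck()
```

With the stub above, the loop stops at N = 128, mirroring the BEGAN observation in the comment.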
Here is an example of a vanilla GAN I trained on 256x256 images, where I use
8x8x1024 -> 16x16x512 -> 32x32x256 -> 64x64x128 -> 128x128x64 -> 256x256x32 -> 256x256x1.
If the architecture does not appear to converge, using a lower learning rate solves most problems.
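The stage list above follows a simple "double the resolution, halve the channels" rule, which can be generated mechanically. A sketch only; layer types and kernel sizes are omitted:

```python
# Generate the generator stage shapes listed above: each upsampling
# step doubles the resolution and halves the channels, then a final
# convolution maps to 1 output channel at the same resolution.

def make_stages(start=(8, 8, 1024), steps=5, out_channels=1):
    stages = [start]
    h, w, c = start
    for _ in range(steps):
        h, w, c = h * 2, w * 2, c // 2
        stages.append((h, w, c))
    stages.append((h, w, out_channels))  # final conv, same resolution
    return stages

stages = make_stages()
```

This reproduces 8x8x1024 -> 16x16x512 -> 32x32x256 -> 64x64x128 -> 128x128x64 -> 256x256x32 -> 256x256x1 exactly.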
@Aixile Wow, this is so realistic. It seems like we don't really need the complex StackGANs.
These are the results of a DCGAN with 8x8x1024 -> 16x16x512 -> 32x32x256 -> 64x64x128 -> 128x128x64 -> 256x256x32 -> 256x256x1?
@lllyasviel Yes
OK, these flowers are impressive and have defeated me. I will delete my StackGAN and reimplement your architecture.
@shaform We have new results on end-to-end training of high-resolution images. I am working on the paper. Code will be released after that.
@Aixile Thanks~ I am looking forward to reading the paper.
And then have a war with StackGAN++?
https://github.com/hanzhanggit/StackGAN-v2
The above DCGAN is trained with a learning rate starting at 0.0001 and decreasing with a decay rate of 0.8 every 3000 iterations after 30000 iterations. (Batch size 64)
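Assuming "decay rate of 0.8 every 3000 iterations after 30000 iterations" means one multiplicative decay per 3000-iteration window past 30000, the schedule looks like this. A sketch of my reading of the comment, not code from the repo:

```python
# Step-decay learning-rate schedule as described above (my reading:
# multiply by 0.8 once per full 3000-iteration window past 30000).

def learning_rate(iteration, base=1e-4, decay=0.8,
                  decay_every=3000, decay_start=30000):
    if iteration < decay_start:
        return base
    return base * decay ** ((iteration - decay_start) // decay_every)

lr_0 = learning_rate(0)        # before decay starts: 1e-4
lr_33k = learning_rate(33000)  # one decay step: 8e-5
```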
@lllyasviel
Since you mention HyperGAN, have you ever tried to reproduce their 256x256 results?
It seems that they did not mention how the 256x256 results were trained.
I ran their code with the default settings, which use SELU and LSGAN on 192x192 CelebA images, but the training failed.
(As CelebA images have a width of 178, I center-crop to 178x178 and upscale to 192x192. Since the full-scale images are used, the dataset images contain more noise from the background, which would make generating high-quality images harder. It seems that they didn't use the full-scale CelebA.)
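For reference, the crop arithmetic described above, assuming the aligned CelebA images (178x218). Only the box computation is shown; the resize itself would be done by an image library:

```python
# Center-crop box arithmetic for aligned CelebA (178x218 assumed)
# cropped to 178x178, then upscaled to 192x192.

def center_crop_box(width, height, size):
    left = (width - size) // 2
    top = (height - size) // 2
    return (left, top, left + size, top + size)

box = center_crop_box(178, 218, 178)  # (left, top, right, bottom)
scale = 192 / 178                     # upscale factor, ~1.079
```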
Just saw this:
https://github.com/tkarras/progressive_growing_of_gans
Welp, my friend forwarded this to me this morning, and I was totally blown away.
Holy shit.
And their G looks so natural and excellent, without any glaring artifacts.
Many things can be reconsidered now, including cGAN, I think.
Almost everything they used differs from the GAN literature...
Weight scaling, pixel normalization, smoothed generator weights, and an additional regularization for WGAN-GP, which makes it extremely difficult to catch up with their progress.
I quickly implemented a weight scaling + pixel normalization based 32x32 generator, but I failed to train it in an end2end manner.
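As I understand the paper, pixel normalization divides each pixel's feature vector by the root-mean-square of its channel values. A minimal sketch for a single pixel:

```python
import math

# Pixelwise feature normalization (as in the Progressive GAN paper, to
# my understanding): divide each pixel's feature vector by the RMS of
# its channel values, plus a small epsilon for stability.

def pixel_norm(features, eps=1e-8):
    """features: list of per-channel values at one pixel."""
    rms = math.sqrt(sum(v * v for v in features) / len(features) + eps)
    return [v / rms for v in features]

normed = pixel_norm([3.0, 4.0])
# After normalization the RMS of the feature vector is 1.
```

In a full implementation this is applied at every pixel of every feature map after each conv layer in G, which bounds feature magnitudes and may be what keeps gradients from exploding.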
I think the "additional regularization for WGAN-GP" is not so critical, and the architecture can be reimplemented without it.
In my opinion, the "weight scaling, pixel normalization, smoothed generator weight" tricks can be replaced by a very unreasonable method that should nevertheless work: just lock the weights of different layers during different training phases.
I am busy with exams now, but I will try it once I have time.
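The "lock the weights" idea can be sketched as an update step that simply skips frozen layers. Plain SGD on scalar "layers", purely illustrative:

```python
# Sketch of weight locking: during a given training phase, only
# unlocked layers receive gradient updates. Scalar "layers" stand in
# for real parameter tensors.

def sgd_step(params, grads, locked, lr=0.1):
    return {name: (value if name in locked
                   else value - lr * grads[name])
            for name, value in params.items()}

params = {"layer1": 1.0, "layer2": 1.0}
grads = {"layer1": 0.5, "layer2": 0.5}
# Later phase of progressive training: layer1 is already trained, so lock it.
params = sgd_step(params, grads, locked={"layer1"})
```

In a real framework the same effect comes from freezing parameters (e.g., excluding them from the optimizer), so newly added layers cannot disturb the trained ones.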
The main objective of these weight regularizations, I think, is to prevent the trained weights from being disturbed by newly initialized weights.
Then why not lock those weights directly? I have not tried it...
BTW, I read the paper again, and I think maybe the results can be achieved directly without the "progressive" part, because their methods make no improvements to the GAN system or to G's performance. Maybe the "progressive" scheme is nothing more than an accelerator? Maybe without their methods I could also achieve these results in one year of training, and with their methods in 40 days?
The so-called weight scaling (equalized learning rate) looks very similar to weight normalization, and someone has shown that WN works well with GANs (1). I personally dislike BN very much, so I have been using WN with GANs for some time. I feel that when BN is removed, GANs appear more likely to encounter gradient explosion. Perhaps their pixel normalization mitigates this issue very well.
(1): https://arxiv.org/abs/1704.03971
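For reference, weight normalization reparameterizes w = g * v / ||v||, decoupling the weight's norm (g) from its direction (v). A minimal sketch for a single weight vector:

```python
import math

# Weight normalization: w = g * v / ||v||. The learned scalar g sets
# the norm of the effective weight; v only sets its direction.

def weight_norm(v, g):
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]

w = weight_norm([3.0, 4.0], g=2.0)  # direction (0.6, 0.8) scaled to norm 2
```

The equalized-learning-rate trick in the Progressive GAN paper similarly rescales weights at runtime, which is presumably why the two feel so alike.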
@Aixile Did you release the new code for training on high-resolution images? I'm looking forward to seeing the paper. Thanks!