
Comments (51)

Aixile commented on May 24, 2024

@shaform https://drive.google.com/file/d/0Bx_f0Ep2RFzhb1ktVkRnTENlQjA/view?usp=sharing
Here is the result

from makegirlsmoe_web.

shaform commented on May 24, 2024

@lllyasviel I've already crawled all the images, but since the face detector is not very good, it requires a huge manual effort to clean up the dataset. Anyway, you can find the scripts to crawl the images here: https://github.com/shaform/GirlsManifold.

from makegirlsmoe_web.

Aixile commented on May 24, 2024

Sorry, I do not own the copyright to them. You should know that copyright-related issues are very sensitive in Japan; I am in Japan and working for a company that may collaborate with those copyright owners in the future. Publishing the training dataset online is almost impossible for us.

from makegirlsmoe_web.

Aixile commented on May 24, 2024

I think the problem is that lbpcascade_animeface has poor precision/recall; especially for male characters, the recall is lower than 40% at the default settings.
The lack of a labeled dataset is an obstacle to building a powerful anime-face detection model.
Weakly supervised methods might work, but I am not sure about their performance.

from makegirlsmoe_web.

Aixile commented on May 24, 2024

I see.
That falls under what I mean by weakly supervised methods.
Anyway, we need experiments to prove the idea and measure the performance.

from makegirlsmoe_web.

danielwaterworth commented on May 24, 2024

Could you release the URLs, labels, and bounding boxes used for cropping?

from makegirlsmoe_web.

Aixile commented on May 24, 2024

@danielwaterworth I will take it into consideration. It will take some time, since all images have to be checked before release (e.g. NSFW images must be removed).

from makegirlsmoe_web.

danielwaterworth commented on May 24, 2024

@Aixile, thanks, I appreciate your consideration, and thanks for publishing the project!

from makegirlsmoe_web.

shaform commented on May 24, 2024

@Aixile Is it possible to simply release the SQL query results on ErogameScape so others can crawl them by themselves? It appears that ErogameScape has blocked IPs from other countries.

from makegirlsmoe_web.

shaform commented on May 24, 2024

@Aixile Thanks a lot!!

from makegirlsmoe_web.

lllyasviel commented on May 24, 2024

oh thank you

from makegirlsmoe_web.

lllyasviel commented on May 24, 2024

@shaform Face detection is not a problem in 2017 lol. How many pictures have you downloaded? Can you give me a sample image of the dataset?

from makegirlsmoe_web.

lllyasviel commented on May 24, 2024

We can detect anime faces with or without machine learning. With machine learning, we can do it with or without training.
Without ML, traditional pattern recognition works well.
Without training: Illustration2Vec has labels related to eyes and mouths. Hack the predicted eye or mouth result, then trace back to the input to get an activation map.
With training, .......
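
The "trace back to the input" idea can be approximated without any framework internals by occlusion sensitivity: mask image regions and watch the tag score drop. A minimal sketch with a toy scorer; all names here are hypothetical stand-ins, not Illustration2Vec's API:

```python
import numpy as np

def occlusion_map(image, score_fn, patch=8, stride=4):
    """Occlusion sensitivity: slide a gray patch over the image and record
    how much the tag score drops at each position. Large drops mark the
    regions the scorer relies on -- a crude activation map."""
    h, w = image.shape[:2]
    base = score_fn(image)
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            occluded = image.copy()
            occluded[i*stride:i*stride+patch, j*stride:j*stride+patch] = 0.5
            heat[i, j] = base - score_fn(occluded)
    return heat

# Toy stand-in for an "eye" tag scorer: responds to brightness in the
# upper-left 16x16 region.
def toy_eye_score(img):
    return float(img[:16, :16].mean())

img = np.full((32, 32), 0.5)
img[:16, :16] = 1.0                       # bright "eye" region
heat = occlusion_map(img, toy_eye_score)  # hottest cells sit over that region
```

Thresholding such a heat map would give rough eye/mouth locations, from which a face box could be inferred.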

from makegirlsmoe_web.

shaform commented on May 24, 2024

@lllyasviel

  • I've downloaded 48,144 images.
  • 37,556 of them are from after year 2005.
  • 33,190 faces larger than 80x80 were detected in the 37,556 images.

I am not sure about the exact parameters of the face detector, so the detected results might be a little different from the original settings. The parameters I used are in the repo I provided.

Sample detection results can be seen here: https://imgur.com/a/9Saf3. As you can see, some of them are false positives.

from makegirlsmoe_web.

shaform commented on May 24, 2024

BTW, someone has tried to improve on anime face detection but failed: https://qiita.com/homulerdora/items/9a9af1481bf63470731a

from makegirlsmoe_web.

lllyasviel commented on May 24, 2024

@shaform Oh, nowadays people rely so much on CNNs and ML! This is a simple problem that can be tackled with traditional PR filters.... Maybe I will write some sample code in C++ later...

from makegirlsmoe_web.

shaform commented on May 24, 2024

I've tried to train the SRResNet-like architecture. While the discriminator is okay, once I use the SRResNet generator, the GAN stops learning and generates corrupted results. I am wondering whether anyone has successfully replicated the results.

from makegirlsmoe_web.

Aixile commented on May 24, 2024

@shaform

  1. It depends on your situation.

If your generator does not even appear to learn, you should try a lower learning rate.

If your generator looks fine but the loss suddenly grows to a very large value (about 1e6 in my situation) in fewer than 10 iterations, causing the generator to crash: in my experience, this phenomenon sometimes happens with DRAGAN.
I feel that the problem is due to the numerical precision of the gradient penalty calculation.
It can be solved by loading a saved snapshot and restarting training from 1000~2000 iterations before the crash point. Anything that changes the randomness will help you get past the crash point and continue training.
Using a lower learning rate makes the phenomenon happen less often.

  2. I apologize that I made a small mistake in my discriminator architecture in the published manuscript. The mistake is not critical, and I think it will not influence the performance.
    In my experience, the discriminator architecture is not very sensitive as long as you add a gradient penalty term to enforce the Lipschitz constraint.
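
The snapshot-and-restart recovery above can be sketched as a training loop; all names here are made up for illustration, not from the actual training code:

```python
import copy
import random

def train_with_rollback(params, train_step, n_iters, snap_every=1000,
                        explode_above=1e6, max_retries=3):
    """Snapshot periodically; when the loss explodes, reload a snapshot
    from ~2000 iterations earlier and reseed, so the randomness differs
    on the retry and training can get past the crash point."""
    snapshots = {}
    retries = 0
    i = 0
    while i < n_iters:
        if i % snap_every == 0:
            snapshots[i] = copy.deepcopy(params)   # periodic snapshot
        loss = train_step(params, i)
        if loss > explode_above:                   # gradient-penalty blow-up
            if retries >= max_retries:
                raise RuntimeError("training keeps crashing")
            retries += 1
            i = max(0, i - 2 * snap_every)         # go back ~2000 iterations
            i -= i % snap_every                    # ...to a snapshot boundary
            params = copy.deepcopy(snapshots[i])
            random.seed()                          # change the randomness
            continue
        i += 1
    return params
```

In a real framework the "params" would be the full optimizer/model state, and reshuffling the data loader also changes the randomness.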

from makegirlsmoe_web.

shaform commented on May 24, 2024

@Aixile Thanks! I'll try your suggestions.

from makegirlsmoe_web.

lllyasviel commented on May 24, 2024

@shaform
Why not add a VAE?
Paper: https://arxiv.org/abs/1706.04987
Code: https://github.com/victor-shepardson/alpha-GAN

from makegirlsmoe_web.

Aixile commented on May 24, 2024

If you have ever played with AlphaGAN, you will find it is extremely hard to train.

from makegirlsmoe_web.

lllyasviel commented on May 24, 2024

Not with Bayesian-calculated channels instead of casually stacked res blocks and violent GP.

from makegirlsmoe_web.

shaform commented on May 24, 2024

@lllyasviel Thanks! Since I haven't read the paper yet, I am wondering whether the Bayesian-calculated-channels concept is already in that paper, or whether you are referring to other techniques?

from makegirlsmoe_web.

lllyasviel commented on May 24, 2024

It is a concept from the very beginning of the first GAN. The quantity of channels is critical for DCGAN and some GANs proposed by Google Brain. The channels in some cases should be limited to the dimensions of the latent space vector because it is Bayesian. For example, the state-of-the-art face generative model BEGAN has a G shaped as 1x1x128->8x8x128->16x16x128->32x32x128->64x64x128->128x128x128->128x128x3; it seems strange that the channels stay at 128 throughout the whole procedure, but BEGAN has very impressive and incredible results in face generation. Another example is HyperGAN, which can generate 256x256 images without any label/mask/pair or other conditional hints; the secret of its success is also Bayesian-calculated channels, and you can check it in their repo.

from makegirlsmoe_web.

shaform commented on May 24, 2024

@lllyasviel Thank you. It's very insightful. I'll try it in my experiments.

from makegirlsmoe_web.

Aixile commented on May 24, 2024

The channels in some cases should be limited to the dimensions of the latent space vector because it is Bayesian.

What do you mean by Bayesian?

from makegirlsmoe_web.

shaform commented on May 24, 2024

Indeed, I am also curious. It appears that HyperGAN is not using a fixed number of channels: 128x128x3 -> 64x64x16 -> 32x32x32 -> 16x16x48 -> 8x8x64 -> 4x4x80. Is this still Bayesian?

from makegirlsmoe_web.

lllyasviel commented on May 24, 2024

Yes.
You can try this to understand: replace 128x128x3 -> 64x64x16 -> 32x32x32 -> 16x16x48 -> 8x8x64 -> 4x4x80 with 128x128x3 -> 64x64x64 -> 32x32x128 -> 16x16x256 -> 8x8x512 -> 4x4x512 (or 4x4x1024) -> 1x1x128 (or 256 or 512). Then you will get only noise maps as GAN results, whatever your training data is.
Because the training of MakeGirlsMoe is supervised by conditional hints, channels, layers, and resnets can be stacked casually without much consideration. But if all hints are removed, a proper channel quantity and depth are of critical importance. Some think AlphaGAN is difficult to train because nowadays supervised GAN training makes people pay less attention to channels and depth, resorting to resnets and GP.

from makegirlsmoe_web.

shaform commented on May 24, 2024

@lllyasviel Thanks, but DCGAN uses 64x64x64 -> 32x32x128 -> 16x16x256 -> 8x8x512 -> 4x4x1024. So these are not the optimal settings?
Are there any guidelines for choosing the number of channels? Is it enough to just make sure that the number of channels doesn't exceed the dimensions of the latent space vector, as indicated by your previous comment?

The channels in some cases should be limited to the dimensions of the latent space vector

from makegirlsmoe_web.

lllyasviel commented on May 24, 2024

I am talking about a resolution of 256; DCGAN works at 64x64 with this architecture.
Oh, it is my mistake. I meant: replace the 256x256x3 -> 128x128x16 -> 64x64x32 -> 32x32x48 -> 16x16x64 -> 8x8x80 in HyperGAN.

from makegirlsmoe_web.

alantian commented on May 24, 2024

Not with Bayesian-calculated channels instead of casually stacked res blocks and violent GP.

@lllyasviel The same question: what exactly are you referring to by "Bayesian"?
It would be helpful to make explicit references to relevant publications in the discussion.

from makegirlsmoe_web.

Aixile commented on May 24, 2024

I do not agree with you.
A vanilla DCGAN can work on large images.

Here is a result I trained with a vanilla DCGAN:
192x192x3 -> 96x96x32 -> 48x48x64 -> 24x24x128 -> 12x12x256 -> 6x6x512

Because the training of MakeGirlsMoe is supervised by conditional hints, channels, layers, and resnets can be stacked casually without much consideration.

No, the model works well even without conditional hints.
The resblock is not critical: the model can work without any resblocks, but adding them made the quality slightly better in my experiments. As the resblocks are only added to the low-resolution feature maps, they are not the bottleneck of the computation.
GP is critical because it enforces the discriminator to be Lipschitz and makes the optimization problem well-behaved.
But the exact form of the GP was not important in my experiments.

from makegirlsmoe_web.

lllyasviel commented on May 24, 2024

I can only give some personal experience, because many papers hold different ideas. But what is sure is that the channels must be taken into consideration.
For example, human face generation. How come the number of channels in BEGAN is 128? Why 128 and not 256 or 512? This can be tested in the following way:
step 1: make an encoder and a decoder, and link them with a 1x1xN layer. N should be large enough, for example 1024.
step 2: train the network to copy faces, using an L1 loss.
step 3: you will see a blurry face as output.
step 4: cut down N by a number like 128 or 256 and train again.
step 5: repeat step 4 until the blurry face becomes so blurry that we can no longer see the figure of a face.
Then finally, in my experiment with BEGAN, N is 128, the same value as Google Brain's.
Then we know one thing: a human face can be coded into a 128-dimensional vector. (And if you think about it carefully, you will see this is why BEGAN works terribly on LSUN bedrooms.)
So we get an important number, 128; this number is the source of a face, and anything generated should be traceable back to that vector in a Bayesian way. I mean that no layer should exceed this limitation: layers can decorate the features from the previous layer, but we should not give a layer the ability to disorder the main latent features and disturb the next layer's Bayesian backtracing. For example, 1x1x128->8x8x4096->8x8x4096 is a bad choice.

from makegirlsmoe_web.

Aixile commented on May 24, 2024

Here is an example of a vanilla GAN that I trained on 256x256 images.

I used
8x8x1024 -> 16x16x512 -> 32x32x256 -> 64x64x128 -> 128x128x64 -> 256x256x32 -> 256x256x1

If the architecture does not appear to converge, using a lower learning rate solves most problems.
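
For reference, the resolution doubling at each stage of a stack like this typically comes from stride-2 transposed convolutions. Assuming the classic DCGAN choice of kernel 4, stride 2, padding 1 (not stated in the comment), each layer exactly doubles the spatial size:

```python
def deconv_out(size, kernel=4, stride=2, pad=1):
    """Output size of a transposed convolution (no output_padding)."""
    return (size - 1) * stride - 2 * pad + kernel

# Walk the 8x8 -> 256x256 generator one stride-2 deconv at a time.
sizes = [8]
while sizes[-1] < 256:
    sizes.append(deconv_out(sizes[-1]))
# sizes == [8, 16, 32, 64, 128, 256]
```

The final 256x256x32 -> 256x256x1 stage would then be a stride-1 convolution that only changes the channel count.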

from makegirlsmoe_web.

shaform commented on May 24, 2024

@Aixile Wow, this is so realistic. It seems like we don't really need the complex StackGANs.

from makegirlsmoe_web.

lllyasviel commented on May 24, 2024

These are the results of a DCGAN with 8x8x1024 -> 16x16x512 -> 32x32x256 -> 64x64x128 -> 128x128x64 -> 256x256x32 -> 256x256x1?

from makegirlsmoe_web.

Aixile commented on May 24, 2024

@lllyasviel Yes

from makegirlsmoe_web.

lllyasviel commented on May 24, 2024

OK, these flowers are impressive and have defeated me. I will delete my StackGAN and reimplement your architecture.

from makegirlsmoe_web.

Aixile commented on May 24, 2024

@shaform We have new results on end-to-end training of high-resolution images. I am working on the paper. Code will be released after that.

from makegirlsmoe_web.

shaform commented on May 24, 2024

@Aixile Thanks~ I am looking forward to reading the paper.

from makegirlsmoe_web.

lllyasviel commented on May 24, 2024

And then have a war with StackGAN++?
https://github.com/hanzhanggit/StackGAN-v2

from makegirlsmoe_web.

Aixile commented on May 24, 2024

The above DCGAN is trained with a learning rate starting at 0.0001 and decaying by a factor of 0.8 every 3000 iterations after 30000 iterations. (Batch size 64.)
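
That schedule reads as a step decay, which can be written as a small function. Note the exact iteration at which the first decay applies is an interpretation, not stated in the comment:

```python
def learning_rate(iteration, base_lr=1e-4, decay=0.8,
                  decay_every=3000, decay_start=30000):
    """Step-decay schedule: constant for the first `decay_start` iterations,
    then multiplied by `decay` every `decay_every` iterations.
    (Whether the first decay lands exactly at 30000 is an assumption.)"""
    if iteration < decay_start:
        return base_lr
    steps = (iteration - decay_start) // decay_every + 1
    return base_lr * decay ** steps
```

For example, the learning rate stays at 1e-4 until iteration 30000, then drops to 8e-5, 6.4e-5, and so on every 3000 iterations.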

from makegirlsmoe_web.

Aixile commented on May 24, 2024

@lllyasviel
As you mention HyperGAN, have you ever tried to reproduce their 256x256 results?
It seems that they did not mention how the 256x256 results were trained.
I ran their code with the default settings, which use SELU and LSGAN on 192x192 CelebA images, but the training failed.

(As CelebA images have a width of 178, I center-crop the 178x178 image and upscale it to 192x192. Since the full-scale images are used, the dataset contains more noise from the background, which makes generating high-quality images harder. It seems that they didn't use the full-scale CelebA.)
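
That preprocessing can be sketched in a few lines. Nearest-neighbour upscaling is used here only to keep the sketch dependency-free; a real pipeline would resize with PIL or OpenCV:

```python
import numpy as np

def center_crop_and_upscale(img, crop=178, out=192):
    """Center-crop to crop x crop, then nearest-neighbour upscale to
    out x out (stand-in for a proper PIL/OpenCV resize)."""
    h, w = img.shape[:2]
    top, left = (h - crop) // 2, (w - crop) // 2
    patch = img[top:top + crop, left:left + crop]
    idx = np.arange(out) * crop // out          # nearest-neighbour indices
    return patch[np.ix_(idx, idx)]

face = np.zeros((218, 178))        # CelebA images are 218x178
big = center_crop_and_upscale(face)  # -> shape (192, 192)
```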

from makegirlsmoe_web.

shaform commented on May 24, 2024

Just saw this:
https://github.com/tkarras/progressive_growing_of_gans

from makegirlsmoe_web.

Aixile commented on May 24, 2024

Yep, my friend forwarded this to me this morning, and I was totally blown away.

from makegirlsmoe_web.

lllyasviel commented on May 24, 2024

holy shiiit.
And their G looks so natural and excellent, without any flaring artifacts.
Many things can be reconsidered now, including cGAN I think.

from makegirlsmoe_web.

Aixile commented on May 24, 2024

Almost everything they used is different from the GAN literature...
Weight scaling, pixel normalization, smoothed generator weights, additional regularization for WGAN-GP: all of this makes it extremely difficult to catch up with their progress.
I quickly implemented a weight scaling + pixel normalization based 32x32 generator, but I failed to train it in an end-to-end manner.

from makegirlsmoe_web.

lllyasviel commented on May 24, 2024

I think the "additional regularization for WGAN-GP" is not so critical, and the architecture can be reimplemented without it.
In my opinion, the "Weight Scaling, Pixel normalization, Smoothed Generator Weight" can be replaced by a very unreasonable method, but it should work: just lock the weights of different layers during the different training phases.
I am busy with exams now, but I will try it once I have time.

from makegirlsmoe_web.

lllyasviel commented on May 24, 2024

The main objective of these weight regulations, I think, is to avoid the trained weights being disturbed by newly initialized weights.
Then why not lock those weights directly? I have not tried it....
BTW, I read the paper again, and I think maybe the results can be achieved without the "progressive" part, because their methods make no improvement to the GAN system or to G's performance. Maybe the "progressive" schedule does nothing other than act as an accelerator? Maybe without their methods I could also achieve these results in one year of training, while with their methods I could get them in 40 days?
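
The "lock the weights" idea amounts to an update rule that simply skips frozen layers. A toy sketch, with layer names and scalar "weights" as placeholders for real tensors:

```python
def sgd_update(params, grads, frozen, lr=1e-3):
    """Plain SGD that skips every layer listed in `frozen`: earlier-resolution
    layers keep their trained values while newly added layers keep training."""
    return {name: (w if name in frozen else w - lr * grads[name])
            for name, w in params.items()}

params = {"block_8x8": 1.0, "block_16x16": 2.0}  # scalar stand-ins for tensors
grads = {"block_8x8": 0.5, "block_16x16": 0.5}
new = sgd_update(params, grads, frozen={"block_8x8"})
# new["block_8x8"] is unchanged; only block_16x16 moved.
```

In a real framework the same effect comes from setting `requires_grad=False` (or the equivalent) on the earlier blocks when a new resolution is added.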

from makegirlsmoe_web.

shaform commented on May 24, 2024

The so-called weight scaling (equalized learning rate) looks very similar to weight normalization, and someone has shown that WN works well with GANs (1). I personally dislike BN very much, so I have been using WN with GANs for some time. I feel that when BN is removed, GANs seem more likely to encounter gradient explosions. Perhaps their pixel normalization mitigates this issue very well.

(1): https://arxiv.org/abs/1704.03971
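
For reference, weight normalization (Salimans & Kingma) reparameterizes each weight vector as w = g · v/||v||, so direction and scale are learned separately; a minimal numpy sketch:

```python
import numpy as np

def weight_norm(v, g):
    """Weight normalization: w = g * v / ||v||, one (v, g) pair per
    output unit, so the direction and scale are decoupled."""
    return g * v / np.linalg.norm(v, axis=1, keepdims=True)

rng = np.random.default_rng(0)
v = rng.normal(size=(4, 16))   # 4 output units, 16 inputs each
g = np.ones((4, 1))            # learned per-unit scale
w = weight_norm(v, g)          # every row of w now has norm g
```

Equalized learning rate differs in that the per-layer scale is a fixed constant derived from the fan-in rather than a learned parameter.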

from makegirlsmoe_web.

vonhathanh commented on May 24, 2024

@Aixile did you release the new code for training high-resolution images? I'm looking forward to seeing the paper. Thanks

from makegirlsmoe_web.
