dall-e-baby's Introduction

Hi there 👋

My name is Yash Bonde, AI researcher @ Tune AI.

Experience: I have worked across the domains of deep learning (CV, NLP, documents, audio) and now work on RL and industry-applied AI. I regularly blog on topics that interest me. Open to freelancing gigs; connect with me on LinkedIn.

I used to stream my work on YouTube.

You should also check out the largest collection of utils I've built over the years, in tuneapi.utils.


Here is a meme on AI

dall-e-baby's People

Contributors

yashbonde

dall-e-baby's Issues

Dude, Come help us

I help out with the DALLE-pytorch project by @lucidrains. Someone linked me to this page yesterday and wow - you've done a lot of useful work here.

Would you be interested in helping us to get a large enough dataset to train on? Now that we have a pretrained VAE from OpenAI (and an awesome 1024 token pretrained VAE from the taming-transformers research), momentum has really picked up and we're basically ready to train. We just need the dataset. It's gotta be humongous per the OpenAI paper. I've done some mild training myself and it has a lot of trouble generalizing without: a huge batch_size (they use 512 in the paper) and a very large dataset. That's kind of the key insight from their recent efforts - big enough data generalizes.

Also - WIT came out today as well, and it's going to be highly similar to what OpenAI scraped from Wikipedia themselves. Exciting times.

At any rate, it would be awesome if you could help us with the datasets. It seems like you've figured out how to generate a good deal of captions even for datasets that don't include captions, which I now realize OpenAI must have done a good deal of. Feel free to stop by:

https://github.com/lucidrains/DALLE-pytorch/issues

Code that would be great to include in DALLE-pytorch

Hey @yashbonde, I'm having a lot of trouble navigating DALLE-pytorch discussions right now so I'm just gonna leave this here. I'm gonna spend a bit perusing your code for promising candidates to include in DALLE-pytorch.

lucidrains is a pretty busy dude. I don't believe he's really working on DALLE-pytorch this week as he has about fifty other repositories he keeps up with and is (understandably) more concerned with his protein folding efforts which arguably have a great impact.

Anyway, that's a lengthy way of me saying that I've found the best way to get something into one of his repositories is to just build it and make a pull request. Then he'll be able to discuss any ideas with you on the PR.

I'll make another post here in a bit and I guess try communicating with me either here or in the Issues section of DALLE-pytorch. I'm starting to loathe Github's implementation of "Discussions".

Error detected in SoftmaxBackward

Really cool project that I tried to recreate. Unfortunately, after testing, discrete_vae.py gives the following error because of an in-place operation I cannot find:

Model is now CUDA!
[TRAIN - 0] GS: 1, Loss: 1.19789:   0%|▊                                                                                                                                                        | 1/202 [00:06<21:26,  6.40s/it]:: Entering Testing Mode
[TEST - 0]:   0%|                                                                                                                                                                                         | 0/1 [00:02<?, ?it/s]
:::: Loss: 1.2889413833618164                                                                                                                                                                             | 0/1 [00:01<?, ?it/s]
[TRAIN - 0] GS: 2, Loss: 1.2198:   1%|โ–ˆโ–Œ                                                                                                                                                        | 2/202 [00:10<16:54,  5.07s/it][W ..\torch\csrc\autograd\python_anomaly_mode.cpp:104] Warning: Error detected in SoftmaxBackward. Traceback of forward call that caused the error:
    trainer.train(
    _, loss, _=model(d)
    result = self.forward(*input, **kwargs)
    return self.module(*inputs[0], **kwargs[0])
    result = self.forward(*input, **kwargs)
    softmax = F.softmax(encoding, dim = 1)
    ret = input.softmax(dim)
 (function _print_stack)
[TRAIN - 0] GS: 2, Loss: 1.2198:   1%|█▌                                                                                                                                                        | 2/202 [00:11<18:45,  5.63s/it]
Traceback (most recent call last):
    trainer.train(
    loss.backward()
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [64, 3000, 16, 16]], which is output 0 of SoftmaxBackward, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Could you give me a hint as to where the in-place operation might be? I'm running this code on Windows with Python 3.8.7 64-bit, if it helps!
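For anyone hitting the same message: the traceback points at the F.softmax call, and the error text says the softmax output "was changed in there or anywhere later". A minimal sketch of how that can happen is below. It is not taken from discrete_vae.py; the tensor shape, the 1e-8 offset, and the function name backward_error are assumptions made up for illustration. It only shows that editing a softmax output in place breaks backward, while the out-of-place form of the same edit does not.

```python
import torch
import torch.nn.functional as F


def backward_error(modify_in_place: bool):
    """Return the autograd error message, or None if backward succeeds."""
    # Small stand-in for the [64, 3000, 16, 16] tensor named in the error (assumed shape).
    encoding = torch.randn(2, 5, 4, 4, requires_grad=True)
    softmax = F.softmax(encoding, dim=1)

    if modify_in_place:
        # In-place edit: bumps the tensor's version counter from 0 to 1,
        # but softmax's backward still needs the output it saved at version 0.
        softmax += 1e-8
    else:
        # Out-of-place edit: builds a new tensor and leaves the saved one untouched.
        softmax = softmax + 1e-8

    try:
        softmax.log().sum().backward()
    except RuntimeError as err:
        return str(err)
    return None


print(backward_error(modify_in_place=True))   # prints the RuntimeError message
print(backward_error(modify_in_place=False))  # None
```

So a reasonable place to look is any line after the softmax that uses an operator such as +=, *=, or a method ending in an underscore (for example add_ or clamp_) on that tensor or on a view of it.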

Generate and use CLIP generate images

Hello again! I have trained my model now and I'm wondering if I have overlooked something:

Does your code provide a way to generate images (e.g. 512, like in the original DALL-E paper) and pick the 12 best generated images with the CLIP method? If yes, in which file is it hidden?

Thanks a lot again for your help!
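To make the question concrete, here is a rough sketch of the reranking step being asked about: score every generated image against the caption and keep the top 12. It is not code from this repository. The function name rerank_top_k is hypothetical, and the random tensors are placeholders for the outputs of a CLIP text encoder and image encoder, which are assumed to produce embeddings of the same dimension.

```python
import torch
import torch.nn.functional as F


def rerank_top_k(text_emb: torch.Tensor, image_embs: torch.Tensor, k: int = 12):
    """Return (indices, scores) of the k images most similar to the caption.

    text_emb:   [D] caption embedding (stand-in for a CLIP text encoder output).
    image_embs: [N, D] image embeddings (stand-in for CLIP image encoder outputs).
    """
    # Normalize so the dot product below is a cosine similarity.
    text_emb = F.normalize(text_emb, dim=-1)
    image_embs = F.normalize(image_embs, dim=-1)
    scores = image_embs @ text_emb  # shape [N]
    # topk returns the k largest scores in descending order.
    top = torch.topk(scores, k=k)
    return top.indices, top.values


torch.manual_seed(0)
text_emb = torch.randn(512)         # placeholder caption embedding
image_embs = torch.randn(512, 512)  # placeholder embeddings for 512 generated images
idx, scores = rerank_top_k(text_emb, image_embs, k=12)
print(idx.shape)  # torch.Size([12])
```

The indices in idx would then be used to pull the corresponding 12 images out of the generated batch.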
