nuwa's Introduction

This is the official repo for the following papers:

Update 2022/7/13: NUWA-Infinity

[Project page] [Paper]

NUWA-Infinity is a generative model for infinite visual synthesis, which is defined as the task of generating arbitrarily-sized high-resolution images or long-duration videos.

Update 2021/11/26: NÜWA

NÜWA is a unified multimodal pre-trained model that can generate new or manipulate existing visual data (i.e., images and videos) for 8 visual synthesis tasks (as shown above).

nuwa's People

Contributors

chenfei-wu, microsoft-github-policy-service[bot], nanduan

nuwa's Issues

PeppaPig Dataset

Is there any plan to release the PeppaPig dataset? I hope to reproduce some of the work on this dataset.

Pretrained models

Thanks for this project, it's simply amazing.
Any plans to share the pre-trained model(s)? That would be super helpful for comparing it against CLIP, DALL-E, VQGAN, and ensemble combinations of these models.

Thanks a lot

You can find the codes here

As mentioned in #2, #4, #5, #6, #7, #9, #14, #15, #16, #19, the code has been requested many times, and there is no sign that the authors intend to release it.

You can use the following code to achieve the corresponding tasks (much of it also achieves superior quality).
And remember to cite them (instead of NUWA of course) to support their efforts in making their code publicly available.

code

When will the code of [NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation] be released?

Missing performance numbers in the paper

First off, congratulations on this amazing work. I think you have managed to close the gap that kept generative deep learning from being relevant for real-world applications, rather than just a nice toy as in previous work in this area.

However, to truly judge the performance of your approach, I was a bit disappointed that the paper contains not a single note on execution time, either for training or, more crucially, for sampling a single final image.

Would you be able to provide some numbers on how long sample generation takes for a 4k x 1k image with a 256^2 patch size, and on which setup?

If possible, could you also shed some light on training times and the setup that was used?

Thank you!

Code

The code of this paper will be extremely helpful in evaluating it against other techniques and fine tuning as well.

This work is simply too great!

A very polite question: when will the code and model be released?
I would appreciate it if you could make the code and the model available.

Code

Can the source code be made public? I want to reproduce your work because the task sounds interesting :)

Question about paper

As the paper says in the appendix:
"For example, for long videos or high-resolution frames with large h, w, s, usually (e^h)(e^w)(e^s) < (h + w + s)"
Is there any situation in which (e^h)(e^w)(e^s) < (h + w + s)?
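
For anyone wanting to check the inequality numerically, here is a minimal sketch that tests it under two readings of the notation. Both readings are assumptions, since the quoted sentence alone does not say which one the paper intends: (a) e^h is the exponential of h, and (b) e^h, e^w, e^s are integer extents where the superscript is only an axis label. The function names, the extent value 3, and the sample sizes are all hypothetical.

```python
import math

def holds_exponential(h, w, s):
    """Reading (a), an assumption: e^h is the exponential function applied to h."""
    return math.exp(h) * math.exp(w) * math.exp(s) < h + w + s

def holds_extent(eh, ew, es, h, w, s):
    """Reading (b), an assumption: eh, ew, es are integer extents labelled by axis."""
    return eh * ew * es < h + w + s

# Hypothetical (h, w, s) sizes, from tiny to large.
sizes = [(1, 1, 1), (16, 16, 10), (64, 64, 100)]

exp_results = [holds_exponential(*hws) for hws in sizes]
ext_results = [holds_extent(3, 3, 3, *hws) for hws in sizes]

print("exponential reading:", exp_results)
print("extent reading:     ", ext_results)
```

Under reading (a) the inequality cannot hold for positive h, w, s, because e^x > x for every real x. Under reading (b) it holds once h + w + s exceeds the product of the extents, which would fit the "large h, w, s" wording in the quote.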

Paper - Possible (minor) error

In this paper, we show that simply using 2D VQ-GAN to encode each frame of a video can also generate temporal consistency videos and at the same time benefit from both image and video data.

In the paper, I believe you mean "temporally consistent" here. Subtle change in wording.

[Documentation] Video Prediction Labeled as a V2V process, despite taking only 1 frame

Judging by the results, the transformer is taking in a single frame, and would be considered an Image to Video process.
Something like video inpainting or camera FOV extrapolation (like in FGVC) would be input video -> output video.
Am I missing something in the documentation that maybe shows it as some sort of sparse video interpolation where it can input more than a (D1, D2, single frame); or was it called V2V in order to match the I2I label on the inpainting/image completion counterparts?

Additionally, there isn't a direct link to the paper, which documents that the V2V model only takes in a single image.
https://arxiv.org/abs/2111.12417
