genai

Description

D-Soft: Research and implement an image-to-video application

Approaches

  1. Variational Autoencoder (VAE): Encode images into a compressed latent representation, then decode them back to the original size, while learning the distribution of the data.

  2. Generative Adversarial Network (GAN): Train two networks, a Generator and a Discriminator, against each other. The Generator learns to produce data that looks real, while the Discriminator learns to tell real data from generated data; this competition pushes both to improve.

  3. Flow-based Generative Model: Learn an invertible mapping between the data and a simple base distribution, so the model can both generate new data similar to its training data and compute the exact likelihood of a given output.

  4. Auto-Regressive Model: Model the conditional probability of each pixel given the previous pixels, then sample from these distributions one pixel at a time to generate new data.

  5. Diffusion Model: Systematically and slowly destroy structure in the data distribution through an iterative forward diffusion process, then learn a reverse diffusion process that restores structure in the data, yielding a highly flexible and tractable generative model of the data.
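The forward diffusion process in item 5 has a convenient closed form: any noisy step x_t can be drawn directly from the clean data x_0 without simulating every intermediate step. The sketch below assumes the linear noise schedule from the Denoising Diffusion Probabilistic Models paper (beta from 1e-4 to 0.02 over 1000 steps); the function names `linear_beta_schedule` and `q_sample` are illustrative, not taken from this project.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T, increasing linearly (assumed DDPM-style range)."""
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(x0, t, betas, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form (t is 0-indexed).

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, I),
    where alpha_bar_t is the running product of (1 - beta_s) for s <= t.
    """
    alpha_bar = np.cumprod(1.0 - betas)
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))      # toy "image" scaled to [-1, 1]
x_early, _ = q_sample(x0, 0, betas, rng)      # almost identical to x0
x_late, _ = q_sample(x0, 999, betas, rng)     # almost pure Gaussian noise
```

The returned `eps` is what a denoising network is usually trained to predict, which is how the reverse process is learned.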

SOTA models

  1. Stable Video Diffusion (SVD)

  2. Latent Flow Diffusion Model (LFDM)

  3. Cascaded Diffusion Model (CDM)

Model Architecture (U-Net)

  1. Encoder: Extract features from the input image while progressively reducing its spatial resolution, compressing the image into a smaller representation.

  2. Decoder: Upsample the features back to the original size and restore the spatial detail.

  3. Skip Connections: Connect each encoder layer to the decoder layer at the same resolution, preserving spatial information that would otherwise be lost during downsampling.

  4. Output Layer: Produce the final output (a segmentation map in the original U-Net) with the same spatial dimensions as the input image.
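The four parts above can be traced in a shape-level sketch. This is a toy, assuming NumPy only: the learned convolution blocks are replaced by random 1x1 channel mixing plus ReLU (`conv1x1`), pooling is 2x2 averaging, and upsampling is nearest-neighbour. It shows how spatial size shrinks in the encoder, grows in the decoder, and how skip connections concatenate encoder features into the decoder; it is not a trainable model.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, out_ch):
    """Stand-in for a conv block: random 1x1 channel mixing + ReLU. x has shape (C, H, W)."""
    w = rng.standard_normal((out_ch, x.shape[0])) / np.sqrt(x.shape[0])
    return np.maximum(np.einsum("oc,chw->ohw", w, x), 0.0)

def downsample(x):
    """2x2 average pooling: halves H and W."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample(x):
    """Nearest-neighbour upsampling: doubles H and W."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def tiny_unet(image, base_ch=8, out_ch=1):
    # Encoder: extract features and shrink the spatial size
    e1 = conv1x1(image, base_ch)                        # (8, H, W)
    e2 = conv1x1(downsample(e1), base_ch * 2)           # (16, H/2, W/2)
    bottleneck = conv1x1(downsample(e2), base_ch * 4)   # (32, H/4, W/4)
    # Decoder: upsample, then concatenate the matching encoder features (skip connections)
    d2 = conv1x1(np.concatenate([upsample(bottleneck), e2], axis=0), base_ch * 2)
    d1 = conv1x1(np.concatenate([upsample(d2), e1], axis=0), base_ch)
    # Output layer: linear projection to the output channels, same H and W as the input
    w_out = rng.standard_normal((out_ch, base_ch)) / np.sqrt(base_ch)
    return np.einsum("oc,chw->ohw", w_out, d1)

x = rng.standard_normal((3, 32, 32))   # a 3-channel 32x32 input
y = tiny_unet(x)
print(y.shape)  # (1, 32, 32)
```

In a real U-Net each `conv1x1` would be a stack of learned 3x3 convolutions, but the data flow between encoder, decoder, and skip connections is the same.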

Embedding

The U-Net can take additional information in the form of embeddings:

  • Time embedding: Encodes the diffusion timestep, and therefore the noise level

  • Context embedding: Controls the content of the generated image
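One common way to build these two embeddings is a sinusoidal encoding of the timestep and a one-hot (or learned) vector for the context. The sketch below assumes both; it also assumes, purely for illustration, that they are injected into a feature map by scaling with the context projection and shifting by the time projection. The projection matrices `W_t` and `W_c` are random stand-ins for learned linear layers.

```python
import numpy as np

def sinusoidal_time_embedding(t, dim):
    """Map integer timesteps t (shape (B,)) to (B, dim) vectors of sines and cosines."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)   # geometric frequency ladder
    args = t[:, None].astype(np.float64) * freqs[None, :]
    return np.concatenate([np.sin(args), np.cos(args)], axis=1)

def one_hot_context(labels, num_classes):
    """Simplest possible context embedding: a one-hot class vector."""
    return np.eye(num_classes)[labels]

rng = np.random.default_rng(0)
B, C, H, W = 2, 16, 8, 8
features = rng.standard_normal((B, C, H, W))                     # a U-Net feature map
t_emb = sinusoidal_time_embedding(np.array([10, 500]), dim=32)   # (2, 32)
c_emb = one_hot_context(np.array([0, 3]), num_classes=5)         # (2, 5)
W_t = rng.standard_normal((32, C)) * 0.1    # stand-in for a learned linear layer
W_c = rng.standard_normal((5, C)) * 0.1     # stand-in for a learned linear layer
t_proj = (t_emb @ W_t)[:, :, None, None]    # (B, C, 1, 1), broadcast over H and W
c_proj = (c_emb @ W_c)[:, :, None, None]
conditioned = c_proj * features + t_proj    # assumed scheme: scale by context, shift by time
print(conditioned.shape)  # (2, 16, 8, 8)
```

Other conditioning schemes (for example cross-attention on text embeddings) are also widely used; the scale-and-shift form here is just the smallest one to show.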

Fine-Tuning Pre-trained Model

  1. Download Pretrained Model

  2. Load Pretrained Model

  3. Fine Tuning

  4. Sampling Function

  5. Generation
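For step 4, a sampling function for a noise-predicting diffusion model typically follows the DDPM ancestral sampler: start from pure Gaussian noise and apply the learned reverse step T times. The sketch below assumes a DDPM-style model; `predict_noise` is a hypothetical placeholder for the fine-tuned U-Net (it returns zeros here, so the output is not a meaningful image), and the variance choice sigma_t^2 = beta_t is one of the options from the DDPM paper.

```python
import numpy as np

def predict_noise(x_t, t):
    """HYPOTHETICAL placeholder for the fine-tuned U-Net's noise prediction eps_theta(x_t, t)."""
    return np.zeros_like(x_t)

def ddpm_sample(shape, betas, rng, noise_model=predict_noise):
    """Ancestral DDPM sampling: start from pure noise and denoise step by step."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal(shape)                    # x_T ~ N(0, I)
    for t in reversed(range(len(betas))):             # t = T-1, ..., 0 (0-indexed)
        eps = noise_model(x, t)
        # Mean of p(x_{t-1} | x_t), computed from the predicted noise
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)  # sigma_t^2 = beta_t
        else:
            x = mean                                  # no noise is added at the final step
    return x

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 50)   # short schedule, only to keep the demo fast
sample = ddpm_sample((1, 8, 8), betas, rng)
print(sample.shape)  # (1, 8, 8)
```

Step 5 (generation) then amounts to calling a function like this with the real model in place of `predict_noise` and mapping the result from [-1, 1] back to pixel values.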

We can follow this tutorial to fine-tune the model: Colab

Applications

OpenAI API

  • Authentication: Controls access to API endpoints, services, and resources
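The OpenAI REST API authenticates each request with an API key sent as a bearer token in the `Authorization` header, and the key is conventionally kept in the `OPENAI_API_KEY` environment variable rather than in source code. The sketch below uses only the Python standard library and builds, but does not send, such a request; `build_openai_request` is a hypothetical helper name and `sk-example` is a placeholder, not a real key.

```python
import os
import urllib.request
from typing import Optional

def build_openai_request(path: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated request to the OpenAI REST API.

    The key is read from OPENAI_API_KEY unless passed explicitly,
    and is sent as a bearer token in the Authorization header.
    """
    key = api_key or os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY or pass api_key explicitly")
    return urllib.request.Request(
        url=f"https://api.openai.com/v1/{path.lstrip('/')}",
        headers={"Authorization": f"Bearer {key}"},
    )

req = build_openai_request("models", api_key="sk-example")  # placeholder key
print(req.full_url)                     # https://api.openai.com/v1/models
print(req.get_header("Authorization"))  # Bearer sk-example
```

Sending it with `urllib.request.urlopen(req)` would require a valid key; the official `openai` Python package performs the same header setup internally.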

Papers

U-Net: Convolutional Networks for Biomedical Image Segmentation [Paper]

High Resolution Image Synthesis and Semantic Manipulation with Conditional GANs [Paper]

Stable Video Diffusion [Paper]

Latent Flow Diffusion Models [Paper]

Denoising Diffusion Probabilistic Models [Paper]

Contributors

nguyenhao2k