Flare

Warning

EARLY PROTOTYPE, WORK IN PROGRESS 😴

Concept

The initial idea was to replicate DALL-E 3's chat-like iterative drawing pipeline.
Flare draws inspiration from the "Anything" family of projects, such as Inpaint Anything, IEA, Segment Anything for Stable Diffusion WebUI, Grounded-Segment-Anything, Edit Anything and others, but its core concept is different: an application for purely text-guided sequential image editing, similar to DALL-E 3, enriched with features DALL-E 3 lacks, such as pixel-perfect inpainting and object removal.

Features

โ˜‘๏ธ Text-to-image
โ˜‘๏ธ Text-guided inpainting
โ˜‘๏ธ Text-guided object removal
โŒ› Text-guided image resize
โŒ› Text-guided object injection
โŒ› Text-guided style transfer
โŒ› Text-guided outpainting
โŒ› Text-guided upscaling
โŒ› Image-based inpainting
โŒ› Text-guided image merge
โŒ› Text-guided object editing
โŒ› Text-guided composition control
โŒ› Text-guided object extraction
โŒ› Voice recognition
โŒ› Fine-tuning LLM for enhanced prompt comprehension

Example

img

Install

You need Git (2.43), Python (3.10), Poetry (1.7) and Node.js (21.6) installed. Then:

git clone https://github.com/seruva19/flare
cd flare
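
The prerequisites listed above can be checked with a short script before going further. This is only a sketch: the version numbers are copied from the list, but treating them as minimums (rather than exact tested versions) is an assumption, and so is the executable name `python` (on some systems it is `python3`).

```python
import re
import shutil
import subprocess

# Versions copied from the prerequisites list.
# ASSUMPTION: they are treated as minimums, which the README does not state.
REQUIRED = {
    "git": (2, 43),
    "python": (3, 10),
    "poetry": (1, 7),
    "node": (21, 6),
}


def parse_version(text):
    """Return (major, minor) from the first 'X.Y' found in text, or None."""
    match = re.search(r"(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def installed_version(tool):
    """Run `<tool> --version` and parse its output; None if the tool is not on PATH."""
    path = shutil.which(tool)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return parse_version(result.stdout + result.stderr)


def check_prerequisites():
    """Map each tool name to True if it is installed and at least the listed version."""
    report = {}
    for tool, minimum in REQUIRED.items():
        found = installed_version(tool)
        report[tool] = found is not None and found >= minimum
    return report


if __name__ == "__main__":
    for tool, ok in check_prerequisites().items():
        print(f"{tool}: {'ok' if ok else 'missing or too old'}")
```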

Install core:

poetry install
poetry lock

Install plugins:

poetry run get-default-plugins
poetry run merge

poetry install

Install client:

npm install
npm run build
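
The install steps above can also be collected into one script. This is a sketch, not something shipped with Flare: the commands are copied in order from this section, while the function name `run_install`, the dry-run default and the assumption that every command runs from the repository root are illustrative.

```python
import shutil
import subprocess

# Commands copied in order from the Install section.
INSTALL_STEPS = [
    ("core", ["poetry", "install"]),
    ("core", ["poetry", "lock"]),
    ("plugins", ["poetry", "run", "get-default-plugins"]),
    ("plugins", ["poetry", "run", "merge"]),
    ("plugins", ["poetry", "install"]),
    ("client", ["npm", "install"]),
    ("client", ["npm", "run", "build"]),
]


def run_install(cwd=".", dry_run=True, runner=subprocess.run):
    """Run each step in order and return the list of steps as readable strings.

    With dry_run=True nothing is executed. Otherwise each command is run with
    check=True, so the first failing step raises and stops the sequence.
    ASSUMPTION: all commands are run from the repository root (cwd).
    """
    executed = []
    for stage, command in INSTALL_STEPS:
        executed.append(f"[{stage}] {' '.join(command)}")
        if not dry_run:
            # Resolve the executable first so wrappers such as npm.cmd work on Windows.
            exe = shutil.which(command[0]) or command[0]
            runner([exe, *command[1:]], cwd=cwd, check=True)
    return executed


if __name__ == "__main__":
    for line in run_install(dry_run=True):
        print(line)
```

Calling `run_install(dry_run=False)` from inside the cloned `flare` directory would execute the steps for real.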

Launch

poetry run flare

Then open http://localhost:8000/ in a browser.
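
Since the server may need some time before it responds, a small helper can poll the address and open the browser once it answers. This is a sketch using only the standard library; the URL comes from this section, while the function names, retry count and delay are assumptions.

```python
import time
import urllib.error
import urllib.request
import webbrowser

URL = "http://localhost:8000/"


def is_up(url, timeout=2.0):
    """Return True if anything answers HTTP at url, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a success status.
        return True
    except OSError:
        # Covers URLError, refused connections and timeouts.
        return False


def wait_until_up(url=URL, attempts=30, delay=1.0, probe=is_up, sleep=time.sleep):
    """Poll url up to `attempts` times; return True as soon as the probe succeeds."""
    for _ in range(attempts):
        if probe(url):
            return True
        sleep(delay)
    return False


if __name__ == "__main__":
    if wait_until_up():
        webbrowser.open(URL)
    else:
        print(f"No response from {URL}")
```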

FAQ

โ“ What are system requirements?
๐Ÿ‘‰ I'm not sure, it has only been tested on an RTX 3090. Although there is an Offload models after use option in Settings tab, enabling which may help to decrease VRAM consumption.
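
The general idea behind offloading a model after use can be sketched as a context manager. This is not Flare's implementation of the setting, only an illustration: `on_device` is a hypothetical helper, and it assumes the model object exposes a PyTorch-style `.to(device)` method.

```python
from contextlib import contextmanager


@contextmanager
def on_device(model, device="cuda", offload_to="cpu", offload=True):
    """Move model to `device` for the duration of the block, then optionally move it back.

    ASSUMPTION: `model` has a PyTorch-style .to(device) method.
    """
    model.to(device)
    try:
        yield model
    finally:
        if offload:
            model.to(offload_to)
            try:
                import torch  # optional; only used to release cached GPU memory

                if torch.cuda.is_available():
                    torch.cuda.empty_cache()
            except ImportError:
                pass
```

With this pattern a model occupies VRAM only while it is in use, at the cost of a transfer each time it is needed.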

โ“ Can prompt comprehension be improved?
๐Ÿ‘‰ Currently, Flare utilizes in-context learning for the vanilla Phi-2 model. While this model is quite capable, its capacity for providing concise instruction interpretation is limited. However, I am confident that additional fine-tuning with a custom instruction dataset will allow Flare to achieve a level of comprehension comparable to DALL-E 3. This is already part of my roadmap.
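
In-context learning here means giving the model a few worked examples inside the prompt instead of changing its weights. A minimal sketch of the idea follows. The example instructions, the action names and the JSON shape are hypothetical and are not Flare's actual prompt or schema.

```python
import json

# HYPOTHETICAL examples and action names, for illustration only.
FEW_SHOT = [
    ("draw a cat on a sofa", {"action": "text_to_image", "prompt": "a cat on a sofa"}),
    ("remove the lamp", {"action": "remove_object", "target": "lamp"}),
    ("replace the sofa with a bench", {"action": "inpaint", "target": "sofa", "prompt": "a bench"}),
]


def build_prompt(instruction, examples=FEW_SHOT):
    """Build a few-shot prompt that ends where the model should write the next action."""
    lines = ["Convert each instruction into a JSON action."]
    for text, action in examples:
        lines.append(f"Instruction: {text}")
        lines.append(f"Action: {json.dumps(action)}")
    lines.append(f"Instruction: {instruction}")
    lines.append("Action:")
    return "\n".join(lines)


def parse_action(completion):
    """Return the first JSON object found in the model's completion, or None."""
    start = completion.find("{")
    if start == -1:
        return None
    try:
        action, _ = json.JSONDecoder().raw_decode(completion[start:])
    except json.JSONDecodeError:
        return None
    return action if isinstance(action, dict) else None


if __name__ == "__main__":
    print(build_prompt("make the sky purple"))
```

The string returned by `build_prompt` would be sent to the language model, and `parse_action` would be applied to whatever text it generates.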

โ“ Why not use 7B/8B models like Mistral/Llama etc.?
๐Ÿ‘‰ I am considering this, but it might increase system requirements even more, especially considering the fact that I am planning to use Stable Diffusion 3 as the primary image generator (upd. 18.06.2024: maybe I will stick to PixArt Sigma instead). And I think small models like Phi and Gemma must not be underestimated.

โ“ Why not use vision models like LLaVA?
๐Ÿ‘‰ While it's entirely feasible, I found it unnecessary for the prototype. I might explore this option later on. Because of Flare's fully modular design, experimenting with different pipelines would be effortless.

โ“ Looks like reinvention of TaskMatrix or InstructPix2Pix?
๐Ÿ‘‰ Probably, but Flare's primary focus is on text-guided drawing with the utilization of open-source language models instead of ChatGPT and without using dedicated instruction-trained image editing model. Additionally, one of long-term objectives is to empower users to expand its functionality through plugins written in natural language.

โ“ Isn't it the same concept as DiffusionGPT?
๐Ÿ‘‰ Likely, but I initiated Flare's development before discovering this project. Honestly, the idea of multistage processing itself is basic, and numerous comparable applications are anticipated to emerge soon, particularly with SD3 release and projects like ELLA and EMILIE gaining traction.

โ“ Now, when Omost exists, does it make sense to continue developing Flare?
๐Ÿ‘‰ Yes and no. Currently Omost too is far from the concept I have in mind when I started Flare, but who knows? Another option worth reviewing is to integrate Omost as backend into Flare. I haven't decided yet.

Credits

🔥 Transformers
🔥 Guidance
🔥 Diffusers
🔥 Segment Anything
🔥 Grounding DINO
🔥 LaMa
⚡ PixArt-Σ
⚡ Stable Diffusion XL
⚡ Stable Diffusion 3 Medium
⚡ Phi-2
💧 FastAPI
💧 React
💧 MantineUI

Colab

Open In Colab

For now, it doesn't work on free Colab 😓
