jbilcke-hf / clapper
Clapper.app, a video synthesizer and sequencer designed for the age of AI cinema
Home Page: https://clapper.app
License: GNU General Public License v3.0
Describe the Issue:
When I try to import a script with 2 locations, it cuts off the script and starts from the first location. I've included a zip with two scripts, one working and one not working.
This works
CUT TO:
CLIPPY, the Microsoft assistant paperclip, dressed in army fatigues, sits on a throne surrounded by a cult of various outdated AI bots and programs. The scene resembles
Marlon Brando's Colonel Kurtz in "Apocalypse Now."
CLIPPY
(whispers ominously)
The horror... the horror...
This doesn't work:
CUT TO:
EXT. DENSE JUNGLE - DAY <-- this line is where the script will start from if this is added
CLIPPY, the Microsoft assistant paperclip, dressed in army fatigues, sits on a throne surrounded by a cult of various outdated AI bots and programs. The scene resembles
Marlon Brando's Colonel Kurtz in "Apocalypse Now."
CLIPPY
(whispers ominously)
The horror... the horror...
Hello,
Currently, cloning the repo under Windows results in:
Cloning into 'clapper'...
remote: Enumerating objects: 1173, done.
remote: Counting objects: 100% (1173/1173), done.
remote: Compressing objects: 100% (725/725), done.
remote: Total 1173 (delta 539), reused 969 (delta 335), pack-reused 0
Receiving objects: 100% (1173/1173), 3.41 MiB | 1.12 MiB/s, done.
Resolving deltas: 100% (539/539), done.
error: invalid path 'src/app/CL.iconset/Icon?'
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'
After doing the same under WSL (Ubuntu), I realized it's probably a macOS Icon file and not needed in the iconset:
There is currently a bug with the GitHub Actions: for some reason the Next.js build doesn't work here:
https://github.com/jbilcke-hf/clapper/actions/runs/9990325696/job/27610785218
▲ Next.js 14.2.5
Creating an optimized production build ...
Failed to compile.
./logos:2
Error: Image import "./anthropic.png" is not a valid image file. The image may be corrupted or an unsupported format.
./logos:3
Error: Image import "./cohere.png" is not a valid image file. The image may be corrupted or an unsupported format.
./logos:4
Error: Image import "./elevenlabs.png" is not a valid image file. The image may be corrupted or an unsupported format.
./logos:5
Error: Image import "./everartai.png" is not a valid image file. The image may be corrupted or an unsupported format.
./logos:6
Error: Image import "./falai.png" is not a valid image file. The image may be corrupted or an unsupported format.
> Build failed because of webpack errors
I tried adding image/png in the next.config.js (in the "images": { "formats": [....] } array), but this is not the right place: it's meant to define the output of the optimized images, so it only supports modern formats like image/avif and image/webp.
As a workaround (to try to disable the optimization), I've tried to put unoptimized on the <Image /> component of the compute provider logos, but this didn't work either.
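For reference, here is a minimal sketch of what that images block in next.config.js accepts (this is an illustration of the constraint, not Clapper's actual config):

// next.config.js - illustration only: "formats" configures the *output* of the
// image optimizer, so only modern output formats are accepted here
/** @type {import('next').NextConfig} */
const nextConfig = {
  images: {
    formats: ['image/avif', 'image/webp'], // 'image/png' is not a valid value in this array
  },
}
module.exports = nextConfig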
Install fails on missing script:
➜ npm run dev
npm error Missing script: "dev"
What is the license for this project?
To make Clapper a proper video creation tool, we need the ability to create content directly in the editor.
The first step in this is being able to create new clips. Here are the requirements:
Content creation within the clips will be covered in a separate PR, as this pertains to the entity editor. Primarily this feature should enable tracks and clips to be dynamically created on the timeline.
The current screenplay editor isn't very useful: there are some syncing bugs, and editing the screenplay (i.e. making changes to it) has no effect.
Also, only screenplays in the "Hollywood-style" format are supported, so any text without the correct indentation will fail.
The goal of this ticket is to implement various changes (we can also create sub-tickets). See the tasks at the end of the ticket.
Saving a screenplay can be computationally intensive since it has to be converted back to segments.
Even if we add or remove only one line, this can have a cascade effect (eg. shift everything left or right).
To optimize this, we should not run the whole screenplay through the editor, but instead only the current line, and do a localized update (similar to how, today, resolving a segment can change its length, but the cascade effect is handled in an efficient way).
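A minimal sketch of what such a localized update could look like (all names below are hypothetical, not Clapper's actual types or API):

// Sketch: when a single screenplay line changes, only the segment containing that
// line needs re-parsing; everything after it is just shifted.
interface SegmentLike {
  startLine: number
  endLine: number
  startTimeInMs: number
}

function applyLocalizedLineEdit(
  segments: SegmentLike[],
  editedLine: number,
  lineCountDelta: number, // +1 if a line was inserted, -1 if removed, 0 if edited in place
  timeDeltaInMs: number   // how much the edited segment grew or shrank
): SegmentLike[] {
  return segments.map((segment) => {
    // segments entirely before the edit are untouched
    if (segment.endLine < editedLine) return segment
    // segments entirely after the edit are only shifted, never re-parsed
    if (segment.startLine > editedLine) {
      return {
        ...segment,
        startLine: segment.startLine + lineCountDelta,
        endLine: segment.endLine + lineCountDelta,
        startTimeInMs: segment.startTimeInMs + timeDeltaInMs,
      }
    }
    // the segment containing the edited line is the only one that needs re-parsing
    return segment
  })
}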
I think we can keep the idea of having only one format inside the screenplay editor (and use import/export functions to convert it to other formats). This format can be Fountain.
For people who want to generate a story from something else (eg. just a sentence, or a synopsis), we can add a little UI workflow to do that (eg. a "create new project" menu entry, and a dialog to show a wizard assistant)
For the task of taking a sentence and converting it to changes in the timeline's segments, I think we can use an LLM whenever a scene needs to be updated. It will do a better job than if/else heuristics.
To be completed..
Ticket specification to be completed..
https://x.com/fffiloni/status/1815424286460354753
(note: we already have some code to call Gradio spaces)
EDIT: hmm, I have tested it, but it seems to be very slow, and it only works on square images and conventional face poses.
So, I don't know!
Exporting to an existing video editor has many benefits.
Many people want to be able to edit the AI-generated video precisely, and it also allows us to perform complex operations. These could be added to our VLC-based .clap-to-mp4 exporter with extra work, but full-featured video editors usually make it much easier to perform transitions and apply effects, filters, etc.
I propose to start with Shotcut, an open-source video editor, which means the file format it uses (.MLT) is non-proprietary, so we can read the spec and generate it easily (it is based on XML; see their official docs for more info).
When we export a .MLT from within Shotcut, we can see various extra stuff in the XML (like metadata about the codec, hash signatures, etc.).
However, I do not believe those are strictly required; they act more like metadata. My guess is that we should always try to put the bare minimal info whenever possible, and let Shotcut figure out the rest during import (codec, bitrates etc).
I've started to work on the implementation here: useIO.generateMLT()
It is working at about 50% today: the timings look wrong (I think that is due to how the playlist works; it seems to always chain videos together even if there are gaps in the timeline, which can happen with a real .clap file), and sometimes the export doesn't work.
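Here is a minimal sketch of how the gaps could be handled in the generated XML, assuming MLT's <blank> playlist entries (the clip fields and frame rate below are made up for this example):

// Sketch: gaps in the timeline become <blank> entries, so clips are not chained back to back
interface ClipLike {
  filePath: string
  startTimeInMs: number
  durationInMs: number
}

// MLT expresses positions in frames, so convert from milliseconds first
function toFrames(ms: number, fps = 25): number {
  return Math.round((ms / 1000) * fps)
}

function generateMinimalMlt(clips: ClipLike[], fps = 25): string {
  const producers: string[] = []
  const entries: string[] = []
  let cursorMs = 0

  clips.forEach((clip, i) => {
    if (clip.startTimeInMs > cursorMs) {
      // a gap before this clip: insert a blank instead of shifting the clip left
      entries.push(`<blank length="${toFrames(clip.startTimeInMs - cursorMs, fps)}"/>`)
    }
    producers.push(
      `<producer id="producer${i}"><property name="resource">${clip.filePath}</property></producer>`
    )
    entries.push(
      `<entry producer="producer${i}" in="0" out="${toFrames(clip.durationInMs, fps) - 1}"/>`
    )
    cursorMs = clip.startTimeInMs + clip.durationInMs
  })

  return [
    `<?xml version="1.0" encoding="utf-8"?>`,
    `<mlt>`,
    ...producers,
    `<playlist id="playlist0">`,
    ...entries,
    `</playlist>`,
    `</mlt>`,
  ].join('\n')
}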
If you wish to improve this code, feel free to:
We should be able to seamlessly export a project to .MLT with all the audio and video assets (and images layers as well) and successfully load it in Shotcut
Currently segments are stored as Array<TimelineSegment>, with various temporary data structures to handle time splicing (see BufferedSegments or activeSegments in the codebase).
I would like to simplify the code and improve performance by reducing the number of .find() and .filter() operations needed.
Each data structure has benefits and drawbacks: we probably should run benchmarks to really check if there are performance improvements by switching the data structure (and where the bottlenecks are)
I do not exclude the possibility of using multiple data structures (a bit like an index for a database), but this increases complexity (do we sync on each little change, or do we use cache invalidation strategies, etc.)
This is a deep, core change: we should probably add unit tests first.
Our problem can be represented by a Segment Tree:
https://en.wikipedia.org/wiki/Segment_tree
although (to quote wikipedia):
The segment tree is less efficient than the interval tree for range queries in one dimension, due to its higher storage requirement: O(n log n) against the O(n) of the interval tree.
I was also thinking of maybe using a sorted B Tree:
https://www.npmjs.com/package/sorted-btree#features
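For illustration, a minimal sketch of the sorted B-tree idea, assuming the sorted-btree package linked above (note this only answers "which segments start in a given range"; finding everything active at a given time is the harder interval problem mentioned in the quote):

import BTree from 'sorted-btree'

interface SegmentLike {
  id: string
  startTimeInMs: number
}

// key: start time, value: the segments starting at that time
const segmentsByStartTime = new BTree<number, SegmentLike[]>()

function addSegment(segment: SegmentLike) {
  const existing = segmentsByStartTime.get(segment.startTimeInMs) ?? []
  segmentsByStartTime.set(segment.startTimeInMs, [...existing, segment])
}

function getSegmentsStartingInRange(fromMs: number, toMs: number): SegmentLike[] {
  const results: SegmentLike[] = []
  // forRange only walks the keys inside [fromMs, toMs], instead of filtering the whole array
  segmentsByStartTime.forRange(fromMs, toMs, true, (_startTime, segments) => {
    results.push(...segments)
  })
  return results
}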
Existing features of the app should still work as usual (the problem is that right now, we don't have automated tests to verify this)
In Clapper, characters and locations are represented under the same umbrella concept of entity (the TS type is ClapEntity).
This generic concept allows us to manipulate not only human actors but also weird things: animals, hybrid location/characters, eg. vehicles, aliens, robots, talking spaceships, haunted objects, a catbus, a mecha..
Each of those entities can be represented by an audio and visual "identity" card, together with textual descriptors for the age, origin, appearance etc.
eg. "haunted temple ruins, 3000 years old, Himalayas, snowy, derelict"
The goal of this ticket is to implement a nice editor to allow:
Let's keep it generic for now.
We can still introduce specialized editing tools for human characters (face, body parts) later on.
There is already a basic editor, but it's ugly.
When I use
extend-compute-provider.ts
import { ComputeProvider } from '@aitube/clapper-services'

declare module '@aitube/clapper-services' {
  export enum ComputeProvider {
    MyComputeProvider = 'xiaosi',
  }
}

console.log('check whether the extension took effect', ComputeProvider.MyComputeProvider)
and in main.tsx I test:
import '@/store/extend-compute-provider'

import('@aitube/clapper-services')
  .then((module) => {
    console.log(module) // log the whole module object
    console.log(module.ComputeProvider) // output ComputeProvider
    console.log(module.ComputeProvider?.MyComputeProvider, 'module imported successfully') // <- this logs undefined
  })
  .catch((error) => {
    console.error('module import failed:', error)
  })
The output is undefined. What should I do?
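A possible explanation (an assumption, not verified against the Clapper codebase): declare module augmentation is type-level only, so the enum object emitted at runtime by @aitube/clapper-services is never modified, and the new member has no value. A sketch of a workaround using a local object:

// Sketch: enums exist at runtime, but "declare module" augmentation does not add
// runtime values, which would explain why ComputeProvider.MyComputeProvider is undefined.
import { ComputeProvider } from '@aitube/clapper-services'

export const ExtendedComputeProvider = {
  ...ComputeProvider,          // copy the real enum's runtime values
  MyComputeProvider: 'xiaosi', // add the custom provider as a plain value
} as const

console.log(ExtendedComputeProvider.MyComputeProvider) // 'xiaosi'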
ComfyUI is all the rage right now, and I would like to use it as one of the supported backends for Clapper.
Currently, Clapper only has ultra-basic support for ComfyUI cloud providers, such as Comfy.icu (see this screenshot, the ugly text field at the very bottom):
At minimum, it should be able to use an existing integration running locally or in the cloud
But what I have in mind is more something like this:
I'm not a ComfyUI developer (or user), so it's gonna be a fun ticket!
For now I've only added some basic code that doesn't really work (search for "comfy" and "ComfyIcu" - it's related to comfy.icu - in the codebase)
Being able to run any ComfyUI workflow, for: image, video, and why not also sound, voice and music.
I've seen some closed-source AI storyboarding tools supporting Adobe Premiere Pro project file format.
It would be nice if Clapper could support it, too (for export only - it wouldn't make much sense to import them since Clapper isn't a traditional editor).
This could also remove the need for a public/shared rendering server, which is an external dependency and point of failure in Clapper (it is separate, I can't make it scale, and currently isn't really suited to processing a lot of requests, or long video projects)
I do not know if there are any licensing costs involved when implementing such proprietary file format.
export async function clapToAdobePremierePro(
  clap: ClapProject,
  onProgress?: (progressPercent: number) => void
): Promise<Blob> {
  // ...
  // ... do something that is not blocking the UI ... regularly call onProgress ...
  // ...
  return blob
}
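A minimal sketch of the "non-blocking + regular onProgress" part (the chunking is illustrative; the actual conversion work would go inside the loop):

async function convertWithProgress<T>(
  items: T[],
  convertOne: (item: T) => void,
  onProgress?: (progressPercent: number) => void
): Promise<void> {
  for (let i = 0; i < items.length; i++) {
    convertOne(items[i])
    if (i % 50 === 0) {
      onProgress?.(Math.round((i / items.length) * 100))
      // yield to the event loop so the UI stays responsive
      await new Promise((resolve) => setTimeout(resolve, 0))
    }
  }
  onProgress?.(100)
}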
I think at some point we should/will put those utilities in external modules (@aitube-<THING>-to-clap)
integrate it into useIO
integrate it into the menu
Note: I do not have the budget to buy Premiere at the moment, so I won't be able to test the implementation.
Sharp worked and.. doesn't work anymore!
this seems to be a known issue:
https://sharp.pixelplumbing.com/install#npm-v10
⚠️ npm package-lock.json files can cause installation problems due to npm bug #4828
a lot of people are complaining about this:
npm/cli#4828
Typing into a chatbot UI is annoying, I want to be able to just talk to it:
"freeze frame, go back, change the car's color to grey, add clouds and make it rainy, play again etc"
Let's add voice input to the chatbot assistant!
Today there are already various options to perform speech recognition on the client side (whisper, web speech api..) with decent results. I don't think we need to add options to connect to a remote Whisper API etc.
I prefer to encourage the use of web standards, so the Web Speech API: that way we can leverage existing frontend libraries based on it and make it future-proof.
Problem: it doesn't work on all browsers: https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition#browser_compatibility
In my opinion, an elegant solution would be to still use this web standard, but with a polyfill (eg. based on transformers.js with whisper or an equivalent model) for incompatible browsers.
Let's go use the web standard <3
We can build our own thing or use a library to do this. For instance react-speech-recognition (but it is 2 years old).
Then we need a protocol to capture the user's command:
Continuous listening is tempting. I want to sit in my sofa and control Clapper running on the TV in full-screen video mode 🍿 🛋️
I don't mind having multiple modalities for the best experience, to make it easier for users accustomed to a specific interaction mode. Could be customizable in the settings.
The transcript of the speech should then be used as a message to the chat assistant (see useAssistant).
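A minimal sketch of that path using the Web Speech API (sendToAssistant is a hypothetical stand-in for whatever useAssistant exposes; the constructor is still prefixed in Chromium, hence the webkit fallback):

const SpeechRecognitionImpl =
  (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition

function startVoiceInput(sendToAssistant: (message: string) => void) {
  if (!SpeechRecognitionImpl) {
    // unsupported browser: this is where a transformers.js/whisper polyfill would plug in
    console.warn('Web Speech API is not available in this browser')
    return
  }
  const recognition = new SpeechRecognitionImpl()
  recognition.continuous = true      // keep listening ("sofa mode")
  recognition.interimResults = false // only forward final transcripts
  recognition.lang = 'en-US'
  recognition.onresult = (event: any) => {
    const lastResult = event.results[event.results.length - 1]
    if (lastResult.isFinal) {
      sendToAssistant(lastResult[0].transcript)
    }
  }
  recognition.start()
}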
Then the elephant in the room: other browsers! The idea is to implement a polyfill for browsers not supporting the Web Speech API.
Unless someone is already working on a public one (?), this probably requires its own ticket I'm afraid due to the complexity.
Today, using Clapper for the first time can be a bit confusing, because we don't know where to set up API keys etc.
I haven't worked a lot on this part since key setup is a bit of a "do it once, then forget about it" process that the user doesn't need to do often - but this is still important!
Here is the current flow:
We should work on an improved onboarding process. One solution could be to display a popup the first time we try to generate, or when there is an error: "You need to set up your API key, click here" etc.
As there are now code bounties, new developers might join the project but be a bit lost due to the lack of documentation.
Today I'm pretty happy to help and answer questions in the Clapper Discord (https://discord.gg/AEruz9B92B) but it would be better to have documentation to make it scale.
So we need to work on:
T.B.D.
Create a diagram with Excalidraw to show the interactions between the services and NPM modules
Write a series of videos to explain each aspect of the project to developers (how screenplays are converted, data structure, the webgl timeline, rendering, AI assistant.. etc)
Write a text document to explain a bit the architecture of the services
Make sure all functions and components are documented (TypeScript code comments)
Improve the README.md
Documentation is sufficient for a newcomer to run the app / make changes
Today there exists a project to convert a .clap file to an MP4 file [code, demo].
It is used by my AI Stories Factory Hugging Face Space, but not by Clapper.
I propose to embed the MP4 exporter directly within Clapper.
Using a WASM port for FFmpeg could be an option I guess (eg. running within web workers, with shared memory access, ability to manipulate large amounts of memory)
If not possible, then on the server-side (eg. generate a .clap in the client-side and send it to the server-side, but this adds some extra delays)
To be determined, but until then here are some useful links:
example: https://ffmpegwasm.netlify.app/docs/getting-started/usage#transcode-video-with-progress-experimental
(not sure the WASM version will give us a nice progress update, they warn us about it)
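For reference, a minimal sketch with @ffmpeg/ffmpeg 0.12.x (assuming that is the package behind the link above; file names and the concat command line are illustrative):

import { FFmpeg } from '@ffmpeg/ffmpeg'
import { fetchFile } from '@ffmpeg/util'

export async function concatenateToMp4(
  videoBlobs: Blob[],
  onProgress?: (progressPercent: number) => void
): Promise<Blob> {
  const ffmpeg = new FFmpeg()
  // progress is reported as a 0..1 ratio (and is flagged as experimental upstream)
  ffmpeg.on('progress', ({ progress }) => onProgress?.(Math.round(progress * 100)))
  await ffmpeg.load()

  const listLines: string[] = []
  for (let i = 0; i < videoBlobs.length; i++) {
    const name = `clip_${i}.mp4`
    await ffmpeg.writeFile(name, await fetchFile(videoBlobs[i]))
    listLines.push(`file '${name}'`)
  }
  await ffmpeg.writeFile('list.txt', listLines.join('\n'))

  // concat demuxer: stitch the clips together without re-encoding
  await ffmpeg.exec(['-f', 'concat', '-safe', '0', '-i', 'list.txt', '-c', 'copy', 'output.mp4'])

  const data = await ffmpeg.readFile('output.mp4')
  return new Blob([data], { type: 'video/mp4' })
}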
In useIO() there are a few examples of loading something and calling useTasks:
const task = useTasks.getState().add({
category: TaskCategory.EXPORT,
visibility: TaskVisibility.BLOCKER, // <- or duplicate the .clap and turn the task into non-blocker
initialMessage: `Generating MP4 file..`,
successMessage: `Successfully generated the MP4 file!`,
value: 0,
})
task.setProgress({
message: "Preparing export..",
value: 10
})
We should be able to render a full project to a MP4, with a progress bar (even if it's just an estimate)