jbilcke-hf / clapper
Clapper.app, a video synthesizer and sequencer designed for the age of AI cinema
Home Page: https://clapper.app
License: GNU General Public License v3.0
Describe the Issue:
When I try to import a script with 2 locations, it cuts off the script and starts from the first location. I've included a zip with two scripts, one working and one not working.
This works
CUT TO:
CLIPPY, the Microsoft assistant paperclip, dressed in army fatigues, sits on a throne surrounded by a cult of various outdated AI bots and programs. The scene resembles
Marlon Brando's Colonel Kurtz in "Apocalypse Now."
CLIPPY
(whispers ominously)
The horror... the horror...
This doesn't work:
CUT TO:
EXT. DENSE JUNGLE - DAY <-- this line is where the script will start from if this is added
CLIPPY, the Microsoft assistant paperclip, dressed in army fatigues, sits on a throne surrounded by a cult of various outdated AI bots and programs. The scene resembles
Marlon Brando's Colonel Kurtz in "Apocalypse Now."
CLIPPY
(whispers ominously)
The horror... the horror...
Hello,
Currently, cloning the repo under Windows results in:
Cloning into 'clapper'...
remote: Enumerating objects: 1173, done.
remote: Counting objects: 100% (1173/1173), done.
remote: Compressing objects: 100% (725/725), done.
remote: Total 1173 (delta 539), reused 969 (delta 335), pack-reused 0
Receiving objects: 100% (1173/1173), 3.41 MiB | 1.12 MiB/s, done.
Resolving deltas: 100% (539/539), done.
error: invalid path 'src/app/CL.iconset/Icon?'
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'
After doing the same under WSL (Ubuntu), I realized it's probably a macOS Icon file and not needed in the iconset:
There is currently a bug with the GitHub Actions: for some reason the Next.js build doesn't work here:
https://github.com/jbilcke-hf/clapper/actions/runs/9990325696/job/27610785218
▲ Next.js 14.2.5
Creating an optimized production build ...
Failed to compile.
./logos:2
Error: Image import "./anthropic.png" is not a valid image file. The image may be corrupted or an unsupported format.
./logos:3
Error: Image import "./cohere.png" is not a valid image file. The image may be corrupted or an unsupported format.
./logos:4
Error: Image import "./elevenlabs.png" is not a valid image file. The image may be corrupted or an unsupported format.
./logos:5
Error: Image import "./everartai.png" is not a valid image file. The image may be corrupted or an unsupported format.
./logos:6
Error: Image import "./falai.png" is not a valid image file. The image may be corrupted or an unsupported format.
> Build failed because of webpack errors
I tried adding image/png in the next.config.js (in the "images": { "formats": [....] } array), but this is not the right place: it's meant to define the output of the optimized images, so it only supports modern formats like image/avif and image/webp.
As a workaround (to try to disable the optimization), I've tried to put unoptimized on the <Image /> component of the compute provider logos, but this didn't work either.
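For reference, here is a minimal sketch of what that images block in next.config.js accepts (this is an illustration of the constraint, not Clapper's actual config):

// next.config.js - illustration only: "formats" configures the *output* of the
// image optimizer, so only modern output formats are accepted here
/** @type {import('next').NextConfig} */
const nextConfig = {
  images: {
    formats: ['image/avif', 'image/webp'], // 'image/png' is not a valid value in this array
  },
}
module.exports = nextConfig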
Install fails on missing script:
➜ npm run dev
npm error Missing script: "dev"
What is the license for this project?
To make Clapper a proper video creation tool, we need the ability to create content directly in the editor.
The first step in this is being able to create new clips. Here are the requirements:
Content creation within the clips will be covered in a separate PR, as this pertains to the entity editor. Primarily this feature should enable tracks and clips to be dynamically created on the timeline.
The current screenplay editor isn't very useful: there are some syncing bugs, and editing the screenplay (i.e. making changes to it) has no effect.
Also, only screenplays in the "Hollywood-style" format are supported, so any text without the correct indentation will fail.
The goal of this ticket is to implement various changes (we can also create sub-tickets). See the tasks at the end of the ticket.
Saving a screenplay can be computationally intensive since it has to be converted back to segments.
Even if we add or remove only one line, this can have a cascade effect (eg. shift everything left or right).
To optimize this, we should not run the whole screenplay through the editor, but instead only the current line, and do a localized update (similar to how, today, resolving a segment can change its length, but the cascade effect is handled in an efficient way).
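A minimal sketch of what such a localized update could look like (all names below are hypothetical, not Clapper's actual types or API):

// Sketch: when a single screenplay line changes, only the segment containing that
// line needs re-parsing; everything after it is just shifted.
interface SegmentLike {
  startLine: number
  endLine: number
  startTimeInMs: number
}

function applyLocalizedLineEdit(
  segments: SegmentLike[],
  editedLine: number,
  lineCountDelta: number, // +1 if a line was inserted, -1 if removed, 0 if edited in place
  timeDeltaInMs: number   // how much the edited segment grew or shrank
): SegmentLike[] {
  return segments.map((segment) => {
    // segments entirely before the edit are untouched
    if (segment.endLine < editedLine) return segment
    // segments entirely after the edit are only shifted, never re-parsed
    if (segment.startLine > editedLine) {
      return {
        ...segment,
        startLine: segment.startLine + lineCountDelta,
        endLine: segment.endLine + lineCountDelta,
        startTimeInMs: segment.startTimeInMs + timeDeltaInMs,
      }
    }
    // the segment containing the edited line is the only one that needs re-parsing
    return segment
  })
}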
I think we can keep the idea of having only one format inside the screenplay editor (and use import/export functions to convert it to other formats). This format can be Fountain.
For people who want to generate a story from something else (eg. just a sentence, or a synopsis), we can add a little UI workflow to do that (eg. a "create new project" menu entry, and a dialog to show a wizard assistant)
For the task of taking a sentence and converting it to changes in the timeline's segments, I think we can use an LLM whenever a scene needs to be updated. It will do a better job than if/else heuristics.
To be completed..
Ticket specification to be completed..
https://x.com/fffiloni/status/1815424286460354753
(note: we already have some code to call Gradio spaces)
EDIT: hmm, I have tested it, but it seems to be very slow, and it only works on square images and conventional face poses.
So, I don't know!
Exporting to an existing video editor has many benefits.
Many people want to be able to edit the AI-generated video precisely, and it also allows us to perform complex operations. These could be added to our VLC-based .clap-to-mp4 exporter with extra work, but full-featured video editors usually make it much easier to perform transitions and apply effects, filters, etc.
I propose to start with Shotcut, an open-source video editor, which means the file format it uses (.MLT) is non-proprietary, so we can read the spec and generate it easily (it is based on XML; see their official docs for more info).
When we export a .MLT from within Shotcut, we can see various extra stuff in the XML (like metadata about the codec, hash signatures, etc.).
However, I do not believe those are strictly required; they act more like metadata. My guess is that we should always try to put the bare minimal info whenever possible, and let Shotcut figure out the rest during import (codec, bitrates etc).
I've started to work on the implementation here: useIO.generateMLT()
It is working at about 50% today: the timings look wrong (I think that is due to how the playlist works; it seems to always chain videos together even if there are gaps in the timeline, which can happen with a real .clap file), and sometimes the export doesn't work.
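Here is a minimal sketch of how the gaps could be handled in the generated XML, assuming MLT's <blank> playlist entries (the clip fields and frame rate below are made up for this example):

// Sketch: gaps in the timeline become <blank> entries, so clips are not chained back to back
interface ClipLike {
  filePath: string
  startTimeInMs: number
  durationInMs: number
}

// MLT expresses positions in frames, so convert from milliseconds first
function toFrames(ms: number, fps = 25): number {
  return Math.round((ms / 1000) * fps)
}

function generateMinimalMlt(clips: ClipLike[], fps = 25): string {
  const producers: string[] = []
  const entries: string[] = []
  let cursorMs = 0

  clips.forEach((clip, i) => {
    if (clip.startTimeInMs > cursorMs) {
      // a gap before this clip: insert a blank instead of shifting the clip left
      entries.push(`<blank length="${toFrames(clip.startTimeInMs - cursorMs, fps)}"/>`)
    }
    producers.push(
      `<producer id="producer${i}"><property name="resource">${clip.filePath}</property></producer>`
    )
    entries.push(
      `<entry producer="producer${i}" in="0" out="${toFrames(clip.durationInMs, fps) - 1}"/>`
    )
    cursorMs = clip.startTimeInMs + clip.durationInMs
  })

  return [
    `<?xml version="1.0" encoding="utf-8"?>`,
    `<mlt>`,
    ...producers,
    `<playlist id="playlist0">`,
    ...entries,
    `</playlist>`,
    `</mlt>`,
  ].join('\n')
}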
If you wish to improve this code, feel free to:
We should be able to seamlessly export a project to .MLT with all the audio and video assets (and images layers as well) and successfully load it in Shotcut
Currently segments are stored as Array<TimelineSegment>, with various temporary data structures to handle time splicing (see BufferedSegments or activeSegments in the codebase).
I would like to simplify the code and improve performance by reducing the number of .find() and .filter() operations needed.
Each data structure has benefits and drawbacks: we probably should run benchmarks to really check if there are performance improvements by switching the data structure (and where the bottlenecks are)
I do not exclude the possibility of using multiple data structures (a bit like an index for a database), but this increases complexity (do we sync on each little change, or do we use cache invalidation strategies, etc.)
This is a deep, core change: we should probably add unit tests first.
Our problem can be represented by a Segment Tree:
https://en.wikipedia.org/wiki/Segment_tree
although (to quote wikipedia):
The segment tree is less efficient than the interval tree for range queries in one dimension, due to its higher storage requirement: O(n log n) against the O(n) of the interval tree.
I was also thinking of maybe using a sorted B Tree:
https://www.npmjs.com/package/sorted-btree#features
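For illustration, a minimal sketch of the sorted B-tree idea, assuming the sorted-btree package linked above (note this only answers "which segments start in a given range"; finding everything active at a given time is the harder interval problem mentioned in the quote):

import BTree from 'sorted-btree'

interface SegmentLike {
  id: string
  startTimeInMs: number
}

// key: start time, value: the segments starting at that time
const segmentsByStartTime = new BTree<number, SegmentLike[]>()

function addSegment(segment: SegmentLike) {
  const existing = segmentsByStartTime.get(segment.startTimeInMs) ?? []
  segmentsByStartTime.set(segment.startTimeInMs, [...existing, segment])
}

function getSegmentsStartingInRange(fromMs: number, toMs: number): SegmentLike[] {
  const results: SegmentLike[] = []
  // forRange only walks the keys inside [fromMs, toMs], instead of filtering the whole array
  segmentsByStartTime.forRange(fromMs, toMs, true, (_startTime, segments) => {
    results.push(...segments)
  })
  return results
}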
Existing features of the app should still work as usual (the problem is that right now, we don't have automated tests to verify this)
In Clapper, characters and locations are represented under the same umbrella concept of entity (the TS type is ClapEntity).
This generic concept allows us to manipulate not only human actors but also weird things: animals, hybrid location/characters, eg. vehicles, aliens, robots, talking spaceships, haunted objects, a catbus, a mecha..
Each of those entities can be represented by an audio and visual "identity" card, together with textual descriptors for the age, origin, appearance etc.
eg. "haunted temple ruins, 3000 years old, Himalayas, snowy, derelict"
The goal of this ticket is to implement a nice editor to allow:
Let's keep it generic for now.
We can still introduce specialized editing tools for human characters (face, body parts) later on.
There is already a basic editor, but it's ugly.
When I use
extend-compute-provider.ts
import { ComputeProvider } from '@aitube/clapper-services'

declare module '@aitube/clapper-services' {
  export enum ComputeProvider {
    MyComputeProvider = 'xiaosi',
  }
}

console.log('check whether the extension took effect', ComputeProvider.MyComputeProvider)
and in main.tsx I test:
import '@/store/extend-compute-provider'

import('@aitube/clapper-services')
  .then((module) => {
    console.log(module) // log the whole module object
    console.log(module.ComputeProvider) // output ComputeProvider
    console.log(module.ComputeProvider?.MyComputeProvider, 'module imported successfully') // <- this logs undefined
  })
  .catch((error) => {
    console.error('module import failed:', error)
  })
The output is undefined. What should I do?
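A possible explanation (an assumption, not verified against the Clapper codebase): declare module augmentation is type-level only, so the enum object emitted at runtime by @aitube/clapper-services is never modified, and the new member has no value. A sketch of a workaround using a local object:

// Sketch: enums exist at runtime, but "declare module" augmentation does not add
// runtime values, which would explain why ComputeProvider.MyComputeProvider is undefined.
import { ComputeProvider } from '@aitube/clapper-services'

export const ExtendedComputeProvider = {
  ...ComputeProvider,          // copy the real enum's runtime values
  MyComputeProvider: 'xiaosi', // add the custom provider as a plain value
} as const

console.log(ExtendedComputeProvider.MyComputeProvider) // 'xiaosi'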
ComfyUI is all the rage right now, and I would like to use it as one of the supported backends for Clapper.
Currently, Clapper only has ultra-basic support for ComfyUI cloud providers, such as Comfy.icu (see this screenshot, the ugly text field at the very bottom):
At minimum, it should be able to use an existing integration running locally or in the cloud
But what I have in mind is more something like this:
I'm not a ComfyUI developer (or user), so it's gonna be a fun ticket!
For now I've only added some basic code that doesn't really work (search for "comfy" and "ComfyIcu" - it's related to comfy.icu - in the codebase)
Being able to run any ComfyUI workflow, for: image, video, and why not also sound, voice and music.
I've seen some closed-source AI storyboarding tools supporting Adobe Premiere Pro project file format.
It would be nice if Clapper could support it, too (for export only - it wouldn't make much sense to import them since Clapper isn't a traditional editor).
This could also remove the need for a public/shared rendering server, which is an external dependency and point of failure in Clapper (it is separate, I can't make it scale, and currently isn't really suited to processing a lot of requests, or long video projects)
I do not know if there are any licensing costs involved when implementing such proprietary file format.
export async function clapToAdobePremierePro(
  clap: ClapProject,
  onProgress?: (progressPercent: number) => void
): Promise<Blob> {
  // ...
  // ... do something that is not blocking the UI ... regularly call onProgress ...
  // ...
  return blob
}
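A minimal sketch of the "non-blocking + regular onProgress" part (the chunking is illustrative; the actual conversion work would go inside the loop):

async function convertWithProgress<T>(
  items: T[],
  convertOne: (item: T) => void,
  onProgress?: (progressPercent: number) => void
): Promise<void> {
  for (let i = 0; i < items.length; i++) {
    convertOne(items[i])
    if (i % 50 === 0) {
      onProgress?.(Math.round((i / items.length) * 100))
      // yield to the event loop so the UI stays responsive
      await new Promise((resolve) => setTimeout(resolve, 0))
    }
  }
  onProgress?.(100)
}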
I think at some point we should/will put those utilities in external modules (@aitube-<THING>-to-clap)
integrate it into useIO
integrate it into the menu
Note: I do not have the budget to buy Premiere at the moment, so I won't be able to test the implementation.
Sharp worked and.. doesn't work anymore!
this seems to be a known issue:
https://sharp.pixelplumbing.com/install#npm-v10
⚠️ npm package-lock.json files can cause installation problems due to npm bug #4828
a lot of people are complaining about this:
npm/cli#4828
Typing into a chatbot UI is annoying, I want to be able to just talk to it:
"freeze frame, go back, change the car's color to grey, add clouds and make it rainy, play again etc"
Let's add voice input to the chatbot assistant!
Today there are already various options to perform speech recognition on the client side (whisper, web speech api..) with decent results. I don't think we need to add options to connect to a remote Whisper API etc.
I prefer to encourage the use of web standards, so the Web Speech API: that way we can leverage existing frontend libraries based on it and make it future-proof.
Problem: it doesn't work on all browsers: https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition#browser_compatibility
In my opinion, an elegant solution would be to still use this web standard, but with a polyfill (eg. based on transformers.js with whisper or an equivalent model) for incompatible browsers.
Let's go use the web standard <3
We can build our own thing or use a library to do this. For instance react-speech-recognition (but it is 2 years old).
Then we need a protocol to capture the user's command:
Continuous listening is tempting. I want to sit in my sofa and control Clapper running on the TV in full-screen video mode 🍿 🛋️
I don't mind having multiple modalities for the best experience, to make it easier for users accustomed to a specific interaction mode. Could be customizable in the settings.
The transcript of the speech should then be used as a message to the chat assistant (see useAssistant).
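A minimal sketch of that path using the Web Speech API (sendToAssistant is a hypothetical stand-in for whatever useAssistant exposes; the constructor is still prefixed in Chromium, hence the webkit fallback):

const SpeechRecognitionImpl =
  (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition

function startVoiceInput(sendToAssistant: (message: string) => void) {
  if (!SpeechRecognitionImpl) {
    // unsupported browser: this is where a transformers.js/whisper polyfill would plug in
    console.warn('Web Speech API is not available in this browser')
    return
  }
  const recognition = new SpeechRecognitionImpl()
  recognition.continuous = true      // keep listening ("sofa mode")
  recognition.interimResults = false // only forward final transcripts
  recognition.lang = 'en-US'
  recognition.onresult = (event: any) => {
    const lastResult = event.results[event.results.length - 1]
    if (lastResult.isFinal) {
      sendToAssistant(lastResult[0].transcript)
    }
  }
  recognition.start()
}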
Then the elephant in the room: other browsers! The idea is to implement a polyfill for browsers not supporting the Web Speech API.
Unless someone is already working on a public one (?), this probably requires its own ticket I'm afraid due to the complexity.
Today, using Clapper for the first time can be a bit confusing, because we don't know where to set up API keys etc.
I haven't worked a lot on this part since key setup is a bit of a "do it once, then forget about it" process that the user doesn't need to do often - but this is still important!
Here is the current flow:
We should work on an improved onboarding process. One solution could be to display a popup the first time we try to generate, or when there is an error: "You need to set up your API key, click here" etc.
As there are now code bounties, new developers might join the project but be a bit lost due to the lack of documentation.
Today I'm pretty happy to help and answer questions in the Clapper Discord (https://discord.gg/AEruz9B92B) but it would be better to have documentation to make it scale.
So we need to work on:
T.B.D.
Create a diagram with Excalidraw to show the interactions between the services and NPM modules
Write a series of videos to explain each aspect of the project to developers (how screenplays are converted, data structure, the webgl timeline, rendering, AI assistant.. etc)
Write a text document to explain a bit the architecture of the services
Make sure all functions and components are documented (TypeScript code comments)
Improve the README.md
Documentation is sufficient for a newcomer to run the app / make changes
Today there exists a project to convert a .clap file to an MP4 file [code, demo].
It is used by my AI Stories Factory Hugging Face Space, but not by Clapper.
I propose to embed the MP4 exporter directly within Clapper.
Using a WASM port for FFmpeg could be an option I guess (eg. running within web workers, with shared memory access, ability to manipulate large amounts of memory)
If not possible, then on the server-side (eg. generate a .clap in the client-side and send it to the server-side, but this adds some extra delays)
To be determined, but until then here are some useful links:
example: https://ffmpegwasm.netlify.app/docs/getting-started/usage#transcode-video-with-progress-experimental
(not sure the WASM version will give us a nice progress update, they warn us about it)
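For reference, a minimal sketch with @ffmpeg/ffmpeg 0.12.x (assuming that is the package behind the link above; file names and the concat command line are illustrative):

import { FFmpeg } from '@ffmpeg/ffmpeg'
import { fetchFile } from '@ffmpeg/util'

export async function concatenateToMp4(
  videoBlobs: Blob[],
  onProgress?: (progressPercent: number) => void
): Promise<Blob> {
  const ffmpeg = new FFmpeg()
  // progress is reported as a 0..1 ratio (and is flagged as experimental upstream)
  ffmpeg.on('progress', ({ progress }) => onProgress?.(Math.round(progress * 100)))
  await ffmpeg.load()

  const listLines: string[] = []
  for (let i = 0; i < videoBlobs.length; i++) {
    const name = `clip_${i}.mp4`
    await ffmpeg.writeFile(name, await fetchFile(videoBlobs[i]))
    listLines.push(`file '${name}'`)
  }
  await ffmpeg.writeFile('list.txt', listLines.join('\n'))

  // concat demuxer: stitch the clips together without re-encoding
  await ffmpeg.exec(['-f', 'concat', '-safe', '0', '-i', 'list.txt', '-c', 'copy', 'output.mp4'])

  const data = await ffmpeg.readFile('output.mp4')
  return new Blob([data], { type: 'video/mp4' })
}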
In useIO() there are a few examples of loading something and calling useTasks:
const task = useTasks.getState().add({
category: TaskCategory.EXPORT,
visibility: TaskVisibility.BLOCKER, // <- or duplicate the .clap and turn the task into non-blocker
initialMessage: `Generating MP4 file..`,
successMessage: `Successfully generated the MP4 file!`,
value: 0,
})
task.setProgress({
message: "Preparing export..",
value: 10
})
We should be able to render a full project to a MP4, with a progress bar (even if it's just an estimate)