
Comments (30)

iurimatias avatar iurimatias commented on August 29, 2024 11

It seems to require 27GB of VRAM currently

from tooncrafter.

user-vm avatar user-vm commented on August 29, 2024 9

OOM on 4090 Running the default example included in run.sh

Got it to work by using FP16

can you please tell me what that means? how do i do this?

He changed the code to use half precision instead of full precision (fewer bytes per float). This patch seems to have worked for me on a 24GB 3090 (both the gradio demo and scripts/run.sh).

index 334a984..f44f805 100644
--- a/scripts/evaluation/inference.py
+++ b/scripts/evaluation/inference.py
@@ -329,6 +329,10 @@ def run_inference(args, gpu_num, gpu_no):
                 videos = torch.stack(videos, dim=0).to("cuda")
             else:
                 videos = videos.unsqueeze(0).to("cuda")
+                ^M
+            model = model.half()  # Convert model to half precision^M
+            videos = videos.half()  # Ensure inputs are also in half precision^M
+^M
 
             batch_samples = image_guided_synthesis(model, prompts, videos, noise_shape, args.n_samples, args.ddim_steps, args.ddim_eta, \
                                 args.unconditional_guidance_scale, args.cfg_img, args.frame_stride, args.text_input, args.multiple_cond_cfg, args.loop, args.interp, args.timestep_spacing, args.guidance_rescale)
@@ -382,4 +386,4 @@ if __name__ == '__main__':
     
     seed_everything(args.seed)
     rank, gpu_num = 0, 1
-    run_inference(args, gpu_num, rank)
\ No newline at end of file
+    run_inference(args, gpu_num, rank)^M
diff --git a/scripts/gradio/i2v_test_application.py b/scripts/gradio/i2v_test_application.py
index 68f6480..7fd09f9 100644
--- a/scripts/gradio/i2v_test_application.py
+++ b/scripts/gradio/i2v_test_application.py
@@ -48,7 +48,7 @@ class Image2Video():
         if steps > 60:
             steps = 60 
         model = self.model_list[gpu_id]
-        model = model.cuda()
+        model = model.cuda().half()^M
         batch_size=1
         channels = model.model.diffusion_model.out_channels
         frames = model.temporal_length
@@ -60,7 +60,7 @@ class Image2Video():
             text_emb = model.get_learned_conditioning([prompt])
 
             # img cond
-            img_tensor = torch.from_numpy(image).permute(2, 0, 1).float().to(model.device)
+            img_tensor = torch.from_numpy(image).permute(2, 0, 1).float().to(model.device).half()^M
             img_tensor = (img_tensor / 255. - 0.5) * 2
 
             image_tensor_resized = transform(img_tensor) #3,h,w
@@ -72,7 +72,7 @@ class Image2Video():
             
 
 
-            img_tensor2 = torch.from_numpy(image2).permute(2, 0, 1).float().to(model.device)
+            img_tensor2 = torch.from_numpy(image2).permute(2, 0, 1).float().to(model.device).half()^M
             img_tensor2 = (img_tensor2 / 255. - 0.5) * 2
             image_tensor_resized2 = transform(img_tensor2) #3,h,w
             videos2 = image_tensor_resized2.unsqueeze(0).unsqueeze(2) # bchw

GitHub isn't letting me attach this as a file. If you're not familiar with patch files, you can either save it as tooncrafter_half.patch and run git apply tooncrafter_half.patch in the ToonCrafter directory, or manually change the lines with a - in front of them to the ones with a + in front of them in the specified files (scripts/evaluation/inference.py and scripts/gradio/i2v_test_application.py).
The VRAM usage still seems to fluctuate wildly; it may be because torch.cuda.amp.autocast changes some things back to full precision (but removing it throws errors from half and full precision getting mixed together). So it could probably use some more work.
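
For anyone who wants to see the idea outside of ToonCrafter, here is a minimal, self-contained PyTorch sketch (a toy model, not the actual diffusion model) of the two approaches being discussed: converting weights and inputs with .half(), versus leaving weights in FP32 and running ops under torch.cuda.amp.autocast.

# Toy illustration only; not ToonCrafter code.
import torch
import torch.nn as nn

def make_model():
    return nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.SiLU(), nn.Conv2d(64, 3, 3, padding=1))

x = torch.randn(1, 3, 256, 256)

if torch.cuda.is_available():
    # Approach 1: explicit half precision, as in the patch above. Weights and
    # activations take half the memory of FP32, but inputs must match the weight dtype.
    model = make_model().cuda().half()
    with torch.no_grad():
        out = model(x.cuda().half())
    print(out.dtype)  # torch.float16

    # Approach 2: mixed precision via autocast. Weights stay FP32 and some ops
    # still run in FP32, which is one reason peak VRAM can keep fluctuating.
    model = make_model().cuda()
    with torch.no_grad(), torch.cuda.amp.autocast(dtype=torch.float16):
        out = model(x.cuda())
    print(out.dtype)  # torch.float16 for the conv outputs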

from tooncrafter.

DorotaLuna avatar DorotaLuna commented on August 29, 2024 6

Played for a few hours on 24GB and it's OK,
no crash, no out-of-memory.

from tooncrafter.

adbrasi avatar adbrasi commented on August 29, 2024 3

bro, just: https://huggingface.co/Kijai/DynamiCrafter_pruned/resolve/main/tooncrafter_512_interp-fp16.safetensors

fp16, 5gb only :) kijai legend
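
If you just want to sanity-check what's in that file, here is a small sketch using the safetensors library (illustration only; it loads the raw state dict and says nothing about whether the repo's own checkpoint loader accepts it in place of model.ckpt):

# Sketch: inspect the pruned FP16 checkpoint with the safetensors library.
from safetensors.torch import load_file

state_dict = load_file("tooncrafter_512_interp-fp16.safetensors")
print(len(state_dict), "tensors")
total_bytes = sum(t.numel() * t.element_size() for t in state_dict.values())
print(f"~{total_bytes / 1e9:.1f} GB of weights (mostly fp16)")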

from tooncrafter.

flrngel avatar flrngel commented on August 29, 2024 3
  1. Copy the diff below into a file named patch
  2. Run git apply patch
diff --git a/gradio_app.py b/gradio_app.py
index 5b8c17c..a6ffab2 100644
--- a/gradio_app.py
+++ b/gradio_app.py
@@ -78,5 +78,5 @@ if __name__ == "__main__":
     result_dir = os.path.join('./', 'results')
     dynamicrafter_iface = dynamicrafter_demo(result_dir)
     dynamicrafter_iface.queue(max_size=12)
-    dynamicrafter_iface.launch(max_threads=1)
-    # dynamicrafter_iface.launch(server_name='0.0.0.0', server_port=80, max_threads=1)
\ No newline at end of file
+    #dynamicrafter_iface.launch(max_threads=1)
+    dynamicrafter_iface.launch(server_name='0.0.0.0', server_port=8080, max_threads=1)
diff --git a/scripts/evaluation/inference.py b/scripts/evaluation/inference.py
index 334a984..41f1900 100644
--- a/scripts/evaluation/inference.py
+++ b/scripts/evaluation/inference.py
@@ -285,7 +285,7 @@ def run_inference(args, gpu_num, gpu_no):
     ## set use_checkpoint as False as when using deepspeed, it encounters an error "deepspeed backend not set"
     model_config['params']['unet_config']['params']['use_checkpoint'] = False
     model = instantiate_from_config(model_config)
-    model = model.cuda(gpu_no)
+    model = model.half().cuda(gpu_no)
     model.perframe_ae = args.perframe_ae
     assert os.path.exists(args.ckpt_path), "Error: checkpoint Not Found!"
     model = load_model_checkpoint(model, args.ckpt_path)
@@ -382,4 +382,4 @@ if __name__ == '__main__':
     
     seed_everything(args.seed)
     rank, gpu_num = 0, 1
-    run_inference(args, gpu_num, rank)
\ No newline at end of file
+    run_inference(args, gpu_num, rank)
diff --git a/scripts/gradio/i2v_test_application.py b/scripts/gradio/i2v_test_application.py
index 68f6480..d16cf7f 100644
--- a/scripts/gradio/i2v_test_application.py
+++ b/scripts/gradio/i2v_test_application.py
@@ -48,7 +48,7 @@ class Image2Video():
         if steps > 60:
             steps = 60 
         model = self.model_list[gpu_id]
-        model = model.cuda()
+        model = model.half().cuda()
         batch_size=1
         channels = model.model.diffusion_model.out_channels
         frames = model.temporal_length
@@ -60,7 +60,7 @@ class Image2Video():
             text_emb = model.get_learned_conditioning([prompt])
 
             # img cond
-            img_tensor = torch.from_numpy(image).permute(2, 0, 1).float().to(model.device)
+            img_tensor = torch.from_numpy(image).permute(2, 0, 1).float().half().to(model.device)
             img_tensor = (img_tensor / 255. - 0.5) * 2
 
             image_tensor_resized = transform(img_tensor) #3,h,w
@@ -72,7 +72,7 @@ class Image2Video():
             
 
 
-            img_tensor2 = torch.from_numpy(image2).permute(2, 0, 1).float().to(model.device)
+            img_tensor2 = torch.from_numpy(image2).permute(2, 0, 1).float().half().to(model.device)
             img_tensor2 = (img_tensor2 / 255. - 0.5) * 2
             image_tensor_resized2 = transform(img_tensor2) #3,h,w
             videos2 = image_tensor_resized2.unsqueeze(0).unsqueeze(2) # bchw
@@ -142,4 +142,4 @@ class Image2Video():
 if __name__ == '__main__':
     i2v = Image2Video()
     video_path = i2v.get_image('prompts/art.png','man fishing in a boat at sunset')
-    print('done', video_path)
\ No newline at end of file
+    print('done', video_path)

from tooncrafter.

mycodeiscat avatar mycodeiscat commented on August 29, 2024 1

OOM on 4090
Running the default example included in run.sh

from tooncrafter.

mycodeiscat avatar mycodeiscat commented on August 29, 2024 1

OOM on 4090 Running the default example included in run.sh

Got it to work by using FP16

from tooncrafter.

supraxylon avatar supraxylon commented on August 29, 2024 1

These changes can be found in my fork of this repository at supraxylon/ToonCrafter-Low-VRAM, I will be adding more changes as time progresses. I was able to get this to run (slowly) on 6GB of VRAM.

from tooncrafter.

FemBoxbrawl avatar FemBoxbrawl commented on August 29, 2024

It seems to require 27GB of VRAM currently

I wish it would work on 6, barely :(

from tooncrafter.

byakuyamira avatar byakuyamira commented on August 29, 2024

I just tried it with 24GB VRAM and it worked, but it crashed on a few runs too
╮( ̄▽ ̄"")╭

from tooncrafter.

protector131090 avatar protector131090 commented on August 29, 2024

4090 OOM.

from tooncrafter.

felixgao avatar felixgao commented on August 29, 2024

On a 4090 I got:

sh scripts/run.sh 
@DynamiCrafter cond-Inference: 2024-05-31-19-06-37
Global seed set to 123
AE working on z of shape (1, 4, 32, 32) = 4096 dimensions.
>>> model checkpoint loaded.
Inference with 16 frames
Prompts testing [rank:0] 3/3 samples loaded.
Sample Batch: 1it [00:39, 39.44s/it]
Traceback (most recent call last):
  File "/usr/media/shared/ToonCrafter/scripts/evaluation/inference.py", line 385, in <module>
    run_inference(args, gpu_num, rank)
  File "/usr/media/shared/ToonCrafter/scripts/evaluation/inference.py", line 333, in run_inference
    batch_samples = image_guided_synthesis(model, prompts, videos, noise_shape, args.n_samples, args.ddim_steps, args.ddim_eta, \
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/media/shared/ToonCrafter/scripts/evaluation/inference.py", line 262, in image_guided_synthesis
    batch_images = model.decode_first_stage(samples, **additional_decode_kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/media/shared/ToonCrafter/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/media/shared/ToonCrafter/scripts/evaluation/../../lvdm/models/ddpm3d.py", line 683, in decode_first_stage
    return self.decode_core(z, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/media/shared/ToonCrafter/scripts/evaluation/../../lvdm/models/ddpm3d.py", line 671, in decode_core
    out = self.first_stage_model.decode(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/media/shared/ToonCrafter/scripts/evaluation/../../lvdm/models/autoencoder.py", line 115, in decode
    dec = self.decoder(z, **kwargs)  ##change for SVD decoder by adding **kwargs
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/media/shared/ToonCrafter/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/media/shared/ToonCrafter/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/media/shared/ToonCrafter/scripts/evaluation/../../lvdm/models/autoencoder_dualref.py", line 507, in forward
    h = self.up[i_level].block[i_block](h, temb, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/media/shared/ToonCrafter/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/media/shared/ToonCrafter/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/media/shared/ToonCrafter/scripts/evaluation/../../lvdm/models/autoencoder_dualref.py", line 898, in forward
    x = super().forward(x, temb)
        ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/media/shared/ToonCrafter/scripts/evaluation/../../lvdm/models/autoencoder_dualref.py", line 75, in forward
    h = nonlinearity(h)
        ^^^^^^^^^^^^^^^
  File "/usr/media/shared/ToonCrafter/scripts/evaluation/../../lvdm/models/autoencoder_dualref.py", line 26, in nonlinearity
    return x * torch.sigmoid(x)
           ~~^~~~~~~~~~~~~~~~~~

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.50 GiB. GPU 0 has a total capacty of 23.55 GiB of which 2.04 GiB is free. Including non-PyTorch memory, this process has 21.49 GiB memory in use. Of the allocated memory 19.20 GiB is allocated by PyTorch, and 1.83 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Would be awesome to be able to use it on this card.

from tooncrafter.

protector131090 avatar protector131090 commented on August 29, 2024

OOM on 4090 Running the default example included in run.sh

Got it to work by using FP16

can you please tell me what that means? how do i do this?

from tooncrafter.

dzupin avatar dzupin commented on August 29, 2024

Running on MS Windows. I can confirm that a 4090 with 24GB is NOT enough VRAM. I am able to run without an OOM crash, but only because the missing VRAM is substituted with much slower system RAM by my NVIDIA driver v555.85 (some NVIDIA drivers do this substitution; others just crash with an OOM error).

I am also very interested in reducing the VRAM requirement by running it with an FP16 model, as mentioned by mycodeiscat. I assume there is an FP16 checkpoint of the model somewhere; it would be great if somebody shared a link to the ToonCrafter FP16 version.

UPDATE:
Thanks to @user-vm for the info on updating the scripts to use half precision. His script changes (message below) work great. On my 4090 I can now fit everything into VRAM. Inference time on my machine dropped from 1 minute (when system RAM was used) to half a minute (everything runs in VRAM).

from tooncrafter.

Light-7777 avatar Light-7777 commented on August 29, 2024

Can I run it with multiple GPUs to meet the VRAM requirement? I have 2x 3060.

from tooncrafter.

UberStorm avatar UberStorm commented on August 29, 2024

bro, just: https://huggingface.co/Kijai/DynamiCrafter_pruned/resolve/main/tooncrafter_512_interp-fp16.safetensors

fp16, 5gb only :) kijai legend

Should I just put it instead of model.ckpt?

from tooncrafter.

C0nsumption avatar C0nsumption commented on August 29, 2024

OOM on 4090 Running the default example included in run.sh

Got it to work by using FP16

can you please tell me what that means? how do i do this?

He changed the code to use half precision instead of full precision (fewer bytes per float). This patch seems to have worked for me on a 24GB 3090 (both the gradio demo and scripts/run.sh).

index 334a984..f44f805 100644
--- a/scripts/evaluation/inference.py
+++ b/scripts/evaluation/inference.py
@@ -329,6 +329,10 @@ def run_inference(args, gpu_num, gpu_no):
                 videos = torch.stack(videos, dim=0).to("cuda")
             else:
                 videos = videos.unsqueeze(0).to("cuda")
+                ^M
+            model = model.half()  # Convert model to half precision^M
+            videos = videos.half()  # Ensure inputs are also in half precision^M
+^M
 
             batch_samples = image_guided_synthesis(model, prompts, videos, noise_shape, args.n_samples, args.ddim_steps, args.ddim_eta, \
                                 args.unconditional_guidance_scale, args.cfg_img, args.frame_stride, args.text_input, args.multiple_cond_cfg, args.loop, args.interp, args.timestep_spacing, args.guidance_rescale)
@@ -382,4 +386,4 @@ if __name__ == '__main__':
     
     seed_everything(args.seed)
     rank, gpu_num = 0, 1
-    run_inference(args, gpu_num, rank)
\ No newline at end of file
+    run_inference(args, gpu_num, rank)^M
diff --git a/scripts/gradio/i2v_test_application.py b/scripts/gradio/i2v_test_application.py
index 68f6480..7fd09f9 100644
--- a/scripts/gradio/i2v_test_application.py
+++ b/scripts/gradio/i2v_test_application.py
@@ -48,7 +48,7 @@ class Image2Video():
         if steps > 60:
             steps = 60 
         model = self.model_list[gpu_id]
-        model = model.cuda()
+        model = model.cuda().half()^M
         batch_size=1
         channels = model.model.diffusion_model.out_channels
         frames = model.temporal_length
@@ -60,7 +60,7 @@ class Image2Video():
             text_emb = model.get_learned_conditioning([prompt])
 
             # img cond
-            img_tensor = torch.from_numpy(image).permute(2, 0, 1).float().to(model.device)
+            img_tensor = torch.from_numpy(image).permute(2, 0, 1).float().to(model.device).half()^M
             img_tensor = (img_tensor / 255. - 0.5) * 2
 
             image_tensor_resized = transform(img_tensor) #3,h,w
@@ -72,7 +72,7 @@ class Image2Video():
             
 
 
-            img_tensor2 = torch.from_numpy(image2).permute(2, 0, 1).float().to(model.device)
+            img_tensor2 = torch.from_numpy(image2).permute(2, 0, 1).float().to(model.device).half()^M
             img_tensor2 = (img_tensor2 / 255. - 0.5) * 2
             image_tensor_resized2 = transform(img_tensor2) #3,h,w
             videos2 = image_tensor_resized2.unsqueeze(0).unsqueeze(2) # bchw

GitHub isn't letting me attach this as a file. If you're not familiar with patch files, you can either save it as tooncrafter_half.patch and run git apply tooncrafter_half.patch in the ToonCrafter directory, or manually change the lines with a - in front of them to the ones with a + in front of them in the specified files (scripts/evaluation/inference.py and scripts/gradio/i2v_test_application.py). The VRAM usage still seems to fluctuate wildly; it may be because torch.cuda.amp.autocast changes some things back to full precision (but removing it throws errors from half and full precision getting mixed together). So it could probably use some more work.

This worked smoothly, thank you. If you don't mind me asking:
is there a trade-off to using this versus KJ's FP16 variant of the model? In the sense that I know reducing to half precision loses some quality, but isn't that what his model is doing to a certain extent?

Just trying to understand the inner workings is all. I notice we either reduce to half or use an FP16 variant for pretty much every ML repo.

from tooncrafter.

TinyForge avatar TinyForge commented on August 29, 2024

Can I run it with multiple GPUs to meet the VRAM requirement? I have 2x 3060.

I've been trying to get this working unsuccessfully so far. If I make any progress, I'll make a post and let you know.

from tooncrafter.

UberStorm avatar UberStorm commented on August 29, 2024
img_tensor2 = torch.from_numpy(image2).permute(2, 0, 1).float().to(model.device).half()

These changes made it work on my 4070 Ti inside the tooncrafter-for-windows fork.
It takes 3 minutes to infer.

from tooncrafter.

fusillustrator avatar fusillustrator commented on August 29, 2024

OOM on 4090 Running the default example included in run.sh

Got it to work by using FP16

can you please tell me what that means? how do i do this?

He changed the code to use half precision instead of full precision (fewer bytes per float). This patch seems to have worked for me on a 24GB 3090 (both the gradio demo and scripts/run.sh).

index 334a984..f44f805 100644
--- a/scripts/evaluation/inference.py
+++ b/scripts/evaluation/inference.py
@@ -329,6 +329,10 @@ def run_inference(args, gpu_num, gpu_no):
                 videos = torch.stack(videos, dim=0).to("cuda")
             else:
                 videos = videos.unsqueeze(0).to("cuda")
+                ^M
+            model = model.half()  # Convert model to half precision^M
+            videos = videos.half()  # Ensure inputs are also in half precision^M
+^M
 
             batch_samples = image_guided_synthesis(model, prompts, videos, noise_shape, args.n_samples, args.ddim_steps, args.ddim_eta, \
                                 args.unconditional_guidance_scale, args.cfg_img, args.frame_stride, args.text_input, args.multiple_cond_cfg, args.loop, args.interp, args.timestep_spacing, args.guidance_rescale)
@@ -382,4 +386,4 @@ if __name__ == '__main__':
     
     seed_everything(args.seed)
     rank, gpu_num = 0, 1
-    run_inference(args, gpu_num, rank)
\ No newline at end of file
+    run_inference(args, gpu_num, rank)^M
diff --git a/scripts/gradio/i2v_test_application.py b/scripts/gradio/i2v_test_application.py
index 68f6480..7fd09f9 100644
--- a/scripts/gradio/i2v_test_application.py
+++ b/scripts/gradio/i2v_test_application.py
@@ -48,7 +48,7 @@ class Image2Video():
         if steps > 60:
             steps = 60 
         model = self.model_list[gpu_id]
-        model = model.cuda()
+        model = model.cuda().half()^M
         batch_size=1
         channels = model.model.diffusion_model.out_channels
         frames = model.temporal_length
@@ -60,7 +60,7 @@ class Image2Video():
             text_emb = model.get_learned_conditioning([prompt])
 
             # img cond
-            img_tensor = torch.from_numpy(image).permute(2, 0, 1).float().to(model.device)
+            img_tensor = torch.from_numpy(image).permute(2, 0, 1).float().to(model.device).half()^M
             img_tensor = (img_tensor / 255. - 0.5) * 2
 
             image_tensor_resized = transform(img_tensor) #3,h,w
@@ -72,7 +72,7 @@ class Image2Video():
             
 
 
-            img_tensor2 = torch.from_numpy(image2).permute(2, 0, 1).float().to(model.device)
+            img_tensor2 = torch.from_numpy(image2).permute(2, 0, 1).float().to(model.device).half()^M
             img_tensor2 = (img_tensor2 / 255. - 0.5) * 2
             image_tensor_resized2 = transform(img_tensor2) #3,h,w
             videos2 = image_tensor_resized2.unsqueeze(0).unsqueeze(2) # bchw

GitHub isn't letting me attach this as a file. If you're not familiar with patch files, you can either save it as tooncrafter_half.patch and run git apply tooncrafter_half.patch in the ToonCrafter directory, or manually change the lines with a - in front of them to the ones with a + in front of them in the specified files (scripts/evaluation/inference.py and scripts/gradio/i2v_test_application.py). The VRAM usage still seems to fluctuate wildly; it may be because torch.cuda.amp.autocast changes some things back to full precision (but removing it throws errors from half and full precision getting mixed together). So it could probably use some more work.

How to fix?

from tooncrafter.

TinyForge avatar TinyForge commented on August 29, 2024

@Light-7777 I couldn't get multi-GPU support working, but I found a ComfyUI node that can run it on one of my 3060's
https://github.com/kijai/ComfyUI-DynamiCrafterWrapper
With using the fp16 model found here: tooncrafter_512_interp-fp16.safetensors
https://huggingface.co/Kijai/DynamiCrafter_pruned/tree/main

from tooncrafter.

fusillustrator avatar fusillustrator commented on August 29, 2024

@Light-7777 I couldn't get multi-GPU support working, but I found a ComfyUI node that can run it on one of my 3060's https://github.com/kijai/ComfyUI-DynamiCrafterWrapper With using the fp16 model found here: tooncrafter_512_interp-fp16.safetensors https://huggingface.co/Kijai/DynamiCrafter_pruned/tree/main

Bro, can you tell me how to run it? I don't get the process -_-

from tooncrafter.

shiyeshu avatar shiyeshu commented on August 29, 2024

These changes can be found in my fork of this repository at supraxylon/ToonCrafter-Low-VRAM, I will be adding more changes as time progresses. I was able to get this to run (slowly) on 6GB of VRAM.

Simply altering the code without changing model.ckpt to tooncrafter_512_interp-fp16.safetensors will be sufficient, correct?

from tooncrafter.

UsernamesAreSilly4 avatar UsernamesAreSilly4 commented on August 29, 2024

I ain't no expert, but would installing xformers reduce VRAM usage?

from tooncrafter.

UsernamesAreSilly4 avatar UsernamesAreSilly4 commented on August 29, 2024

These changes can be found in my fork of this repository at supraxylon/ToonCrafter-Low-VRAM, I will be adding more changes as time progresses. I was able to get this to run (slowly) on 6GB of VRAM.

I don't see anything in that repo that has an effect on VRAM.

from tooncrafter.

usamimeri avatar usamimeri commented on August 29, 2024

I tested on an A100 and it takes 30s on average to generate.

from tooncrafter.

TrombeCryB avatar TrombeCryB commented on August 29, 2024
  1. Copy the diff below into a file named patch
  2. Run git apply patch
diff --git a/gradio_app.py b/gradio_app.py
index 5b8c17c..a6ffab2 100644
--- a/gradio_app.py
+++ b/gradio_app.py
@@ -78,5 +78,5 @@ if __name__ == "__main__":
     result_dir = os.path.join('./', 'results')
     dynamicrafter_iface = dynamicrafter_demo(result_dir)
     dynamicrafter_iface.queue(max_size=12)
-    dynamicrafter_iface.launch(max_threads=1)
-    # dynamicrafter_iface.launch(server_name='0.0.0.0', server_port=80, max_threads=1)
\ No newline at end of file
+    #dynamicrafter_iface.launch(max_threads=1)
+    dynamicrafter_iface.launch(server_name='0.0.0.0', server_port=8080, max_threads=1)
diff --git a/scripts/evaluation/inference.py b/scripts/evaluation/inference.py
index 334a984..41f1900 100644
--- a/scripts/evaluation/inference.py
+++ b/scripts/evaluation/inference.py
@@ -285,7 +285,7 @@ def run_inference(args, gpu_num, gpu_no):
     ## set use_checkpoint as False as when using deepspeed, it encounters an error "deepspeed backend not set"
     model_config['params']['unet_config']['params']['use_checkpoint'] = False
     model = instantiate_from_config(model_config)
-    model = model.cuda(gpu_no)
+    model = model.half().cuda(gpu_no)
     model.perframe_ae = args.perframe_ae
     assert os.path.exists(args.ckpt_path), "Error: checkpoint Not Found!"
     model = load_model_checkpoint(model, args.ckpt_path)
@@ -382,4 +382,4 @@ if __name__ == '__main__':
     
     seed_everything(args.seed)
     rank, gpu_num = 0, 1
-    run_inference(args, gpu_num, rank)
\ No newline at end of file
+    run_inference(args, gpu_num, rank)
diff --git a/scripts/gradio/i2v_test_application.py b/scripts/gradio/i2v_test_application.py
index 68f6480..d16cf7f 100644
--- a/scripts/gradio/i2v_test_application.py
+++ b/scripts/gradio/i2v_test_application.py
@@ -48,7 +48,7 @@ class Image2Video():
         if steps > 60:
             steps = 60 
         model = self.model_list[gpu_id]
-        model = model.cuda()
+        model = model.half().cuda()
         batch_size=1
         channels = model.model.diffusion_model.out_channels
         frames = model.temporal_length
@@ -60,7 +60,7 @@ class Image2Video():
             text_emb = model.get_learned_conditioning([prompt])
 
             # img cond
-            img_tensor = torch.from_numpy(image).permute(2, 0, 1).float().to(model.device)
+            img_tensor = torch.from_numpy(image).permute(2, 0, 1).float().half().to(model.device)
             img_tensor = (img_tensor / 255. - 0.5) * 2
 
             image_tensor_resized = transform(img_tensor) #3,h,w
@@ -72,7 +72,7 @@ class Image2Video():
             
 
 
-            img_tensor2 = torch.from_numpy(image2).permute(2, 0, 1).float().to(model.device)
+            img_tensor2 = torch.from_numpy(image2).permute(2, 0, 1).float().half().to(model.device)
             img_tensor2 = (img_tensor2 / 255. - 0.5) * 2
             image_tensor_resized2 = transform(img_tensor2) #3,h,w
             videos2 = image_tensor_resized2.unsqueeze(0).unsqueeze(2) # bchw
@@ -142,4 +142,4 @@ class Image2Video():
 if __name__ == '__main__':
     i2v = Image2Video()
     video_path = i2v.get_image('prompts/art.png','man fishing in a boat at sunset')
-    print('done', video_path)
\ No newline at end of file
+    print('done', video_path)

You are so cool! Thank you very much.

from tooncrafter.

rotarzen avatar rotarzen commented on August 29, 2024

In case this helps: I have created a simplified/optimized implementation of ToonCrafter for inference here. It should avoid OOM issues on 24GB GPUs, while being substantially faster.

from tooncrafter.

Volcanicus avatar Volcanicus commented on August 29, 2024
img_tensor2 = torch.from_numpy(image2).permute(2, 0, 1).float().half().to(model.device)

Basically, if you follow the diffs above, the lines marked with - (shown in red on GitHub) are deletions and the lines marked with + (green) are additions.
The header of each diff gives the file name/path. Go there, then search for the - lines and replace them with the + lines.
E.g.:
--- a/scripts/gradio/i2v_test_application.py
+++ b/scripts/gradio/i2v_test_application.py
This means go to the scripts folder, then gradio, and open the i2v_test_application.py file.
In it, replace the lines as shown: typically changing a .cuda() call to .half().cuda(), or adding .half() to a .float() tensor conversion.
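
If it helps to see the tensor-side change in isolation, here is a tiny runnable sketch (dummy image, not ToonCrafter code) of what those .float().half().to(device) lines do:

import numpy as np
import torch

image = np.zeros((320, 512, 3), dtype=np.uint8)            # dummy HWC uint8 image
device = "cuda" if torch.cuda.is_available() else "cpu"

# Same pattern as the "+" lines in the diffs: channels-first float16 tensor on the device.
img_tensor = torch.from_numpy(image).permute(2, 0, 1).float().half().to(device)
img_tensor = (img_tensor / 255. - 0.5) * 2                  # normalize to [-1, 1]
print(img_tensor.shape, img_tensor.dtype)                    # torch.Size([3, 320, 512]) torch.float16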

from tooncrafter.

Volcanicus avatar Volcanicus commented on August 29, 2024

As an aside, the creators should add different precision options. 27GB of VRAM is a joke... there are no standard GPUs with that much short of the 32GB ones, and frankly, with precision reduction you can definitely run this at 16GB or lower. And 27GB of VRAM for resolutions below 512x512 is... laughable for the precision you get.

from tooncrafter.
