It seems that the Llava 34b doesn't work with the current prompt forms. I'm not certai

<|im_start|>system Answer the questions.<|im_end|>

Llava 34b issues about comfyui_vlm_nodes HOT 3 CLOSED

gokayfem commented on May 29, 2024

Llava 34b issues

from comfyui_vlm_nodes.

Comments (3)

gokayfem commented on May 29, 2024

I think this model was instruction-tuned somewhat differently from other models. Unfortunately I can't try it; my VRAM is not enough to iterate on this issue.

gokayfem commented on May 29, 2024

Can you try this in the prompt?
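
For reference, the model metadata in the log further down shows a chat template starting with `<|im_` and an EOS token of `<|im_end|>`, which suggests a ChatML-style prompt. A minimal sketch of assembling such a prompt is below. The helper name `build_chatml_prompt` and the user message are hypothetical, and where the image placeholder belongs depends on the node, so it is left out here.

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a ChatML-style prompt that ends with an open assistant turn.

    Assumption: the model expects <|im_start|>role ... <|im_end|> blocks,
    as hinted by the chat template and EOS token in the model metadata.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


# "Describe the image." is a placeholder user message, not from the thread.
prompt = build_chatml_prompt("Answer the questions.", "Describe the image.")
print(prompt)
```

The prompt ends with an open assistant turn so that generation stops at the model's own `<|im_end|>` token.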

dicksensei69 commented on May 29, 2024

I'm still getting funny output. Thanks for your help.

Here is a link to the model that I've been using. Could it be nonfunctional? It looks like it is about 10 days older than the ones posted by cjpais on Hugging Face.
https://huggingface.co/cmp-nct/llava-1.6-gguf
https://huggingface.co/cjpais/llava-v1.6-34B-gguf/

Finally, here is the terminal output. I'm running this on Linux Mint, if it matters, and as you can see from the output I have 2x 3090. I don't think that messes anything up.

ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   yes
ggml_init_cublas: CUDA_USE_TENSOR_CORES: no
ggml_init_cublas: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
  Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
llama_model_loader: loaded meta data with 23 key-value pairs and 543 tensors from /home/dick/proj/ComfyUI/models/LLavacheckpoints/ggml-yi-34b-f16-q_5_k.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 7168
llama_model_loader: - kv   4:                          llama.block_count u32              = 60
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 20480
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 56
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 5000000.000000
llama_model_loader: - kv  11:                          general.file_type u32              = 17
llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,64000]   = ["<unk>", "<|startoftext|>", "<|endof...
llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,64000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,64000]   = [2, 3, 3, 3, 3, 3, 1, 1, 1, 3, 3, 3, ...
llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 7
llama_model_loader: - kv  18:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  19:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  20:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  21:                    tokenizer.chat_template str              = {% for message in messages %}{{'<|im_...
llama_model_loader: - kv  22:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  121 tensors
llama_model_loader: - type q5_K:  361 tensors
llama_model_loader: - type q6_K:   61 tensors
llm_load_vocab: mismatch in special tokens definition ( 498/64000 vs 267/64000 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 64000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 7168
llm_load_print_meta: n_head           = 56
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 60
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_gqa            = 7
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 20480
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 5000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 30B
llm_load_print_meta: model ftype      = Q5_K - Medium
llm_load_print_meta: model params     = 34.39 B
llm_load_print_meta: model size       = 22.65 GiB (5.66 BPW) 
llm_load_print_meta: general.name     = LLaMA v2
llm_load_print_meta: BOS token        = 1 '<|startoftext|>'
llm_load_print_meta: EOS token        = 7 '<|im_end|>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: PAD token        = 0 '<unk>'
llm_load_print_meta: LF token         = 315 '<0x0A>'
llm_load_tensors: ggml ctx size       =    0.21 MiB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: system memory used  = 13024.31 MiB
llm_load_tensors: VRAM used           = 10169.58 MiB
llm_load_tensors: offloading 27 repeating layers to GPU
llm_load_tensors: offloaded 27/61 layers to GPU
...................................................................................................
llama_new_context_with_model: n_ctx      = 320
llama_new_context_with_model: freq_base  = 5000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: VRAM kv self = 33.75 MB
llama_new_context_with_model: KV self size  =   75.00 MiB, K (f16):   37.50 MiB, V (f16):   37.50 MiB
llama_build_graph: non-view tensors processed: 1264/1264
llama_new_context_with_model: compute buffer total size = 90.06 MiB
llama_new_context_with_model: VRAM scratch buffer: 86.88 MiB
llama_new_context_with_model: total VRAM used: 10290.20 MiB (model: 10169.58 MiB, context: 120.62 MiB)
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
Llama.generate: prefix-match hit

llama_print_timings:        load time =    9620.25 ms
llama_print_timings:      sample time =      22.11 ms /    46 runs   (    0.48 ms per token,  2080.98 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =   47613.29 ms /    46 runs   ( 1035.07 ms per token,     0.97 tokens per second)
llama_print_timings:       total time =   47778.26 ms
Prompt executed in 64.51 seconds
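
As a sanity check on the numbers in this log, the KV cache sizes and the generation speed can be reproduced from the printed metadata. This is only a rough sketch: it assumes an f16 cache with one K and one V tensor per layer, and that the GPU share of the cache follows the 27 offloaded layers out of 60 repeating layers. The variable names are my own.

```python
# Values copied from the log above.
n_ctx = 320          # llama_new_context_with_model: n_ctx
n_layer = 60         # llm_load_print_meta: n_layer
n_embd = 7168        # llm_load_print_meta: n_embd
n_head = 56          # llm_load_print_meta: n_head
n_head_kv = 8        # llm_load_print_meta: n_head_kv (grouped-query attention)
gpu_layers = 27      # "offloading 27 repeating layers to GPU"
bytes_per_f16 = 2

head_dim = n_embd // n_head        # 128, same as n_rot in the log
kv_width = n_head_kv * head_dim    # values stored per token per layer

# Size of the K cache alone, then K + V together, in MiB.
k_mib = n_ctx * n_layer * kv_width * bytes_per_f16 / 2**20
kv_mib = 2 * k_mib

# Assumption: cache memory on the GPU is proportional to offloaded layers.
kv_on_gpu_mib = kv_mib * gpu_layers / n_layer

# Generation speed from the "eval time" line.
eval_ms, eval_tokens = 47613.29, 46
tokens_per_second = eval_tokens / (eval_ms / 1000)

print(f"K cache:        {k_mib:.2f} MiB")
print(f"K + V cache:    {kv_mib:.2f} MiB")
print(f"KV on GPU:      {kv_on_gpu_mib:.2f} MiB")
print(f"eval speed:     {tokens_per_second:.2f} tokens/s")
```

These come out to 37.50 MiB, 75.00 MiB, 33.75 MiB and 0.97 tokens/s, which agrees with the `KV self size`, `VRAM kv self` and `eval time` lines. The cache size scales linearly with `n_ctx`, so a larger context would cost proportionally more memory.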
