Code Monkey home page Code Monkey logo

Comments (3)

gokayfem avatar gokayfem commented on May 29, 2024 1

i think this model instruction tuned somewhat different than other models. unfortunately i cant try it, my vram is not enough to iterate on this issue.

from comfyui_vlm_nodes.

gokayfem avatar gokayfem commented on May 29, 2024

<|im_start|>system\nAnswer the questions.<|im_end|><|im_start|>user\n[img-10]\nDescribe the image<|im_end|><|im_start|>assistant\n

can you try this in the prompt

from comfyui_vlm_nodes.

dicksensei69 avatar dicksensei69 commented on May 29, 2024

I'm still getting funny output. Thanks for your help.

Screenshot_2024-03-07_18-57-00

Here is a link to the model that I've been using. It could be nonfunctional? It looks like it is about 10 days older than one ones posted by cjpais on huggingface.
https://huggingface.co/cmp-nct/llava-1.6-gguf
https://huggingface.co/cjpais/llava-v1.6-34B-gguf/

Finally here is the terminal output. I'm running this on linux mint if it matters and as you can see from the output i have 2x3090. I don't think that mess anything up.

ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   yes
ggml_init_cublas: CUDA_USE_TENSOR_CORES: no
ggml_init_cublas: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
  Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
llama_model_loader: loaded meta data with 23 key-value pairs and 543 tensors from /home/dick/proj/ComfyUI/models/LLavacheckpoints/ggml-yi-34b-f16-q_5_k.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 7168
llama_model_loader: - kv   4:                          llama.block_count u32              = 60
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 20480
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 56
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 5000000.000000
llama_model_loader: - kv  11:                          general.file_type u32              = 17
llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,64000]   = ["<unk>", "<|startoftext|>", "<|endof...
llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,64000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,64000]   = [2, 3, 3, 3, 3, 3, 1, 1, 1, 3, 3, 3, ...
llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 7
llama_model_loader: - kv  18:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  19:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  20:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  21:                    tokenizer.chat_template str              = {% for message in messages %}{{'<|im_...
llama_model_loader: - kv  22:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  121 tensors
llama_model_loader: - type q5_K:  361 tensors
llama_model_loader: - type q6_K:   61 tensors
llm_load_vocab: mismatch in special tokens definition ( 498/64000 vs 267/64000 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 64000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 7168
llm_load_print_meta: n_head           = 56
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 60
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_gqa            = 7
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 20480
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 5000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 30B
llm_load_print_meta: model ftype      = Q5_K - Medium
llm_load_print_meta: model params     = 34.39 B
llm_load_print_meta: model size       = 22.65 GiB (5.66 BPW) 
llm_load_print_meta: general.name     = LLaMA v2
llm_load_print_meta: BOS token        = 1 '<|startoftext|>'
llm_load_print_meta: EOS token        = 7 '<|im_end|>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: PAD token        = 0 '<unk>'
llm_load_print_meta: LF token         = 315 '<0x0A>'
llm_load_tensors: ggml ctx size       =    0.21 MiB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: system memory used  = 13024.31 MiB
llm_load_tensors: VRAM used           = 10169.58 MiB
llm_load_tensors: offloading 27 repeating layers to GPU
llm_load_tensors: offloaded 27/61 layers to GPU
...................................................................................................
llama_new_context_with_model: n_ctx      = 320
llama_new_context_with_model: freq_base  = 5000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: VRAM kv self = 33.75 MB
llama_new_context_with_model: KV self size  =   75.00 MiB, K (f16):   37.50 MiB, V (f16):   37.50 MiB
llama_build_graph: non-view tensors processed: 1264/1264
llama_new_context_with_model: compute buffer total size = 90.06 MiB
llama_new_context_with_model: VRAM scratch buffer: 86.88 MiB
llama_new_context_with_model: total VRAM used: 10290.20 MiB (model: 10169.58 MiB, context: 120.62 MiB)
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
Llama.generate: prefix-match hit

llama_print_timings:        load time =    9620.25 ms
llama_print_timings:      sample time =      22.11 ms /    46 runs   (    0.48 ms per token,  2080.98 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =   47613.29 ms /    46 runs   ( 1035.07 ms per token,     0.97 tokens per second)
llama_print_timings:       total time =   47778.26 ms
Prompt executed in 64.51 seconds

from comfyui_vlm_nodes.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.