
llamachat's Introduction

hey, I'm alex 👋

I'm a full-stack engineer and creative 🔮

I've been building for the Apple ecosystem for a long time, in Objective-C/C++ and Swift, and for AppKit, UIKit and SwiftUI. I've always been driven to ship polished projects with clean APIs, and back in the day I built PXSourceList and PXListView which shipped in some pretty big apps, including early versions of Sketch.

These days I'm excited about Swift, SwiftUI and TypeScript, and I'm focusing on building fun, interesting projects that make the most of all the incredible tools and technologies out there.

I've worked for various companies, both big and small, and I'm currently available for freelance projects 🙌

If you've ever used any of my open-source projects, tips are always appreciated 🙏🏼, and you can also sponsor my work on GitHub.

Buy Me A Coffee

Find me around the web:

llamachat's People

Contributors

alexrozanski, eltociear


llamachat's Issues

Control answer length?

Hi, thanks for this! I'm using it with GPT4All. How do I control how long the answers are? Currently they're quite short...

Feature Request: Add API endpoint

I love this! Nice work!

This lets me quickly test my local models and see how I want to configure them. Once I've confirmed (through the GUI) that everything is how I want it, it would be great to expose this as an API too, so it's not only a client but can also operate as a LlamaServer.

Is this something that you'd be willing to consider?
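As a rough illustration of the shape this could take (every name below is hypothetical, not part of LlamaChat), the app could wrap its existing inference layer in a small JSON request handler that an HTTP listener would call:

import Foundation

// Hypothetical sketch only — LlamaChat exposes no such API today.
// `ModelSession` stands in for whatever LlamaChat uses to run inference.
struct CompletionRequest: Codable {
  let prompt: String
  let maxTokens: Int
}

struct CompletionResponse: Codable {
  let text: String
}

protocol ModelSession {
  func predict(prompt: String, maxTokens: Int) -> String
}

struct LlamaServerHandler {
  let session: ModelSession

  // Decode a JSON request body, run the model, and encode a JSON response.
  // An HTTP server front-end would call this once per incoming request.
  func handle(body: Data) throws -> Data {
    let request = try JSONDecoder().decode(CompletionRequest.self, from: body)
    let text = session.predict(prompt: request.prompt, maxTokens: request.maxTokens)
    return try JSONEncoder().encode(CompletionResponse(text: text))
  }
}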

Syntax Error in ChatModel.swift

Hey @alexrozanski, at line 73 of ChatModel.swift, you have this:

guard let lastChatContext else {
  canClearContext = false
  return
}

It seems like a condition is missing:

guard let _ = lastChatContext else {
  canClearContext = false
  return
}

Shouldn't it look something like the above?
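For reference, the first form is valid Swift: Swift 5.7 introduced shorthand optional binding (SE-0345), so guard let lastChatContext else { ... } is equivalent to guard let lastChatContext = lastChatContext else { ... }. A minimal, self-contained sketch (ChatContext and the property names are stand-ins, not LlamaChat's real types):

// Sketch of Swift 5.7+ shorthand optional binding (SE-0345).
struct ChatContext { let tokenCount: Int }

func canClear(lastChatContext: ChatContext?) -> Bool {
  // Shorthand: binds a non-optional `lastChatContext` from the optional
  // of the same name; no explicit `= lastChatContext` is needed.
  guard let lastChatContext else {
    return false
  }
  return lastChatContext.tokenCount > 0
}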

LlamaChat is spouting gibberish

System: MacBook Pro (2019)
Installation method: Homebrew
Version: ==> llamachat: 1.2.0 (auto_updates)

I downloaded the 7B model via BitTorrent and imported it into LlamaChat. I tried saying hello to my new friend, but all I get in return is gibberish. How do I debug this issue?

Screenshot 2023-11-16 at 12.30.49 PM

It appears to convert the model without any complaints:

python3 -u /var/folders/tj/hdvn6t_x1lb_27qt5h9xx5300000gn/T/6A1F66D4-970D-4452-9D5A-F8D5231D098F/convert-pth-to-ggml.py /Users/jblack/Library/Application Support/com.alexrozanski.LlamaChat/models/47EECB0E-23AA-4001-A0F3-9548E6C73A71/7B 1
Loading model file /Users/jblack/Library/Application Support/com.alexrozanski.LlamaChat/models/47EECB0E-23AA-4001-A0F3-9548E6C73A71/7B/consolidated.00.pth
Loading vocab file /Users/jblack/Library/Application Support/com.alexrozanski.LlamaChat/models/47EECB0E-23AA-4001-A0F3-9548E6C73A71/tokenizer.model
Writing vocab...
[ 1/291] Writing tensor tok_embeddings.weight | size 32000 x 4096 | type UnquantizedDataType(name='F16')
[ 2/291] Writing tensor norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[ 3/291] Writing tensor output.weight | size 32000 x 4096 | type UnquantizedDataType(name='F16')
[ 4/291] Writing tensor layers.0.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')
[ 5/291] Writing tensor layers.0.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')
[ 6/291] Writing tensor layers.0.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')
[ 7/291] Writing tensor layers.0.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')
[ 8/291] Writing tensor layers.0.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[ 9/291] Writing tensor layers.0.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')
[ 10/291] Writing tensor layers.0.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')
[ 11/291] Writing tensor layers.0.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')
[ 12/291] Writing tensor layers.0.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[ 13/291] … [282/291] (identical per-layer tensor entries, repeated for layers.1 through layers.30)
[283/291] Writing tensor layers.31.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')
[284/291] Writing tensor layers.31.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')
[285/291] Writing tensor layers.31.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')
[286/291] Writing tensor layers.31.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')
[287/291] Writing tensor layers.31.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[288/291] Writing tensor layers.31.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')
[289/291] Writing tensor layers.31.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')
[290/291] Writing tensor layers.31.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')
[291/291] Writing tensor layers.31.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
Wrote /Users/jblack/Library/Application Support/com.alexrozanski.LlamaChat/models/47EECB0E-23AA-4001-A0F3-9548E6C73A71/7B/ggml-model-f16.bin

test -f /Users/jblack/Library/Application Support/com.alexrozanski.LlamaChat/models/47EECB0E-23AA-4001-A0F3-9548E6C73A71/7B/ggml-model-f16.bin

Can the questions not affect each other?

I've observed that earlier questions affect the answers to later questions; the questions are not independent of one another. Is there a way to make them independent of each other?
Thank you.

Suggestion: add a feature to LlamaChat to upload any type of document

Is it possible to add a feature to LlamaChat that allows users to upload documents of any type, including multiple documents (PDF, long PDF, Word docs, Excel, etc.), and then chat about them with a locally running LLM? This would enhance privacy compared to online services like the OpenAI API, which may not be safe for confidential company or project documents such as PDFs or budget reports in Excel.

Error when converting model

I have downloaded the Llama-2 (13B) models and I have them in .pth format.

The environment check passes, as do the setup and dependency checks. When it reaches the conversion step, I see this:

Writing vocab...

Traceback (most recent call last):
  File "/var/folders/m0/872knsrj62s8zllz2f18nhpr0000gn/T/E7FD53F2-B53D-4439-9AE7-898FE0607B18/convert-pth-to-ggml.py", line 11, in <module>
    convert.main(['--outtype', 'f16' if args.ftype == 1 else 'f32', '--', args.dir_model])
  File "/private/var/folders/m0/872knsrj62s8zllz2f18nhpr0000gn/T/E7FD53F2-B53D-4439-9AE7-898FE0607B18/convert.py", line 1144, in main
    OutputFile.write_all(outfile, params, model, vocab)
  File "/private/var/folders/m0/872knsrj62s8zllz2f18nhpr0000gn/T/E7FD53F2-B53D-4439-9AE7-898FE0607B18/convert.py", line 953, in write_all
    for i, ((name, lazy_tensor), ndarray) in enumerate(zip(model.items(), ndarrays)):
  File "/private/var/folders/m0/872knsrj62s8zllz2f18nhpr0000gn/T/E7FD53F2-B53D-4439-9AE7-898FE0607B18/convert.py", line 875, in bounded_parallel_map
    result = futures.pop(0).result()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 438, in result
    return self.__get_result()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 390, in __get_result
    raise self._exception
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/private/var/folders/m0/872knsrj62s8zllz2f18nhpr0000gn/T/E7FD53F2-B53D-4439-9AE7-898FE0607B18/convert.py", line 950, in do_item
    return lazy_tensor.load().to_ggml().ndarray
  File "/private/var/folders/m0/872knsrj62s8zllz2f18nhpr0000gn/T/E7FD53F2-B53D-4439-9AE7-898FE0607B18/convert.py", line 489, in load
    ret = self._load()
  File "/private/var/folders/m0/872knsrj62s8zllz2f18nhpr0000gn/T/E7FD53F2-B53D-4439-9AE7-898FE0607B18/convert.py", line 497, in load
    return self.load().astype(data_type)
  File "/private/var/folders/m0/872knsrj62s8zllz2f18nhpr0000gn/T/E7FD53F2-B53D-4439-9AE7-898FE0607B18/convert.py", line 489, in load
    ret = self._load()
  File "/private/var/folders/m0/872knsrj62s8zllz2f18nhpr0000gn/T/E7FD53F2-B53D-4439-9AE7-898FE0607B18/convert.py", line 549, in load
    ndarrays = [load_unquantized(tensor) for tensor in lazy_tensors]
  File "/private/var/folders/m0/872knsrj62s8zllz2f18nhpr0000gn/T/E7FD53F2-B53D-4439-9AE7-898FE0607B18/convert.py", line 549, in <listcomp>
    ndarrays = [load_unquantized(tensor) for tensor in lazy_tensors]
  File "/private/var/folders/m0/872knsrj62s8zllz2f18nhpr0000gn/T/E7FD53F2-B53D-4439-9AE7-898FE0607B18/convert.py", line 297, in load_unquantized
    tensor = lazy_tensor.load()
  File "/private/var/folders/m0/872knsrj62s8zllz2f18nhpr0000gn/T/E7FD53F2-B53D-4439-9AE7-898FE0607B18/convert.py", line 489, in load
    ret = self._load()
  File "/private/var/folders/m0/872knsrj62s8zllz2f18nhpr0000gn/T/E7FD53F2-B53D-4439-9AE7-898FE0607B18/convert.py", line 695, in load
    return UnquantizedTensor(storage.load(storage_offset, elm_count).reshape(size))
  File "/private/var/folders/m0/872knsrj62s8zllz2f18nhpr0000gn/T/E7FD53F2-B53D-4439-9AE7-898FE0607B18/convert.py", line 679, in load
    raise Exception("tensor stored in unsupported format")
Exception: tensor stored in unsupported format

Am I doing something wrong?

Add ability to configure runtime and model hyperparameters

Some essential ones that come to mind (a sketch of how these could be carried in the app follows the list):

  • -t {n} to specify the number of threads. From what I can tell, LlamaChat uses 3 threads by default, but on my machine 8 threads gives the best performance.
  • --temp and the rest of the useful sampling parameters
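To illustrate the shape of such a setting (names here are hypothetical; LlamaChat's internals may differ), the app could carry these values in a small settings struct that the inference layer reads:

// Hypothetical sketch of user-configurable runtime/model hyperparameters.
import Foundation

struct InferenceSettings {
  // Number of CPU threads used for generation.
  var threadCount: Int = max(1, ProcessInfo.processInfo.activeProcessorCount - 1)
  // Sampling parameters commonly exposed by llama.cpp-style runtimes
  // (defaults shown are illustrative, not LlamaChat's).
  var temperature: Double = 0.8
  var topK: Int = 40
  var topP: Double = 0.95
}

// Example: what a user on an 8-core machine might pick.
var settings = InferenceSettings()
settings.threadCount = 8
settings.temperature = 0.7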

Support downloadable models

Any chance we could just select the model and have LlamaChat download the necessary files automatically?

Support configuring whether to load the entire model into memory or use mmap

Greetings,
Love the application and UX!

I noticed llama.cpp running on my M1 was flushing the model from memory during and after each generation, causing slower-than-expected output. This can be fixed by passing the --mlock argument, which massively boosts Mac M1 performance by locking the model into memory.

LlamaChat currently has a similar issue, and I believe it can be fixed by passing --mlock as well. In fact, I suggest leaving it on by default for a seamless beginner experience on M1s.

Please also consider an advanced feature that allows users to change these parameters.
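For reference, 2023-era llama.cpp exposes this as a flag on its context parameters; assuming llama.h is bridged into Swift, a wrapper might set it roughly like this (function and field names follow the 2023 C API and may differ in newer versions):

// Sketch, assuming llama.cpp's C header is bridged into Swift (2023-era API).
func makeContext(modelPath: String, lockInMemory: Bool) -> OpaquePointer? {
  var params = llama_context_default_params()
  params.use_mlock = lockInMemory  // keep model pages resident in RAM
  return llama_init_from_file(modelPath, params)
}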

Failed to load model for eachadea/ggml-vicuna-7b-1.1

After I downloaded eachadea/ggml-vicuna-7b-1.1's ggml-vicuna-7b-1.1-q4_0.bin model from https://huggingface.co/eachadea/ggml-vicuna-7b-1.1/tree/main, I was able to add it as a Chat Source successfully. However, during conversation, a "Failed to load model" error occurred.
I also tried llama.cpp, and I could only load the model after updating to the latest version; the llama.cpp from 5 days ago would also fail to load it. I'm not sure whether the ggml format in llama.cpp has been modified in some way.

[Feature Request] Support InternLM

Dear LlamaChat developer,

Greetings! I am vansinhu, a community developer and volunteer at InternLM. Your work has been immensely beneficial to me, and I believe it could be used effectively with InternLM as well. You are welcome to join our Discord at https://discord.gg/gF9ezcmtM3; I hope to get in touch with you.

Best regards,
vansinhu

Error using pth format model

Please forgive my ignorance...
Problem:

Exception: tensor stored in unsupported format

Full error that popped up:

Traceback (most recent call last):
  File "/var/folders/dl/8yzjfxfn26sf347jggtzb30c0000gn/T/0D6EEEEF-7FD3-417C-AFC4-71D9A9E81116/convert-pth-to-ggml.py", line 11, in <module>
    convert.main(['--outtype', 'f16' if args.ftype == 1 else 'f32', '--', args.dir_model])
  File "/private/var/folders/dl/8yzjfxfn26sf347jggtzb30c0000gn/T/0D6EEEEF-7FD3-417C-AFC4-71D9A9E81116/convert.py", line 1144, in main
    OutputFile.write_all(outfile, params, model, vocab)
  File "/private/var/folders/dl/8yzjfxfn26sf347jggtzb30c0000gn/T/0D6EEEEF-7FD3-417C-AFC4-71D9A9E81116/convert.py", line 953, in write_all
    for i, ((name, lazy_tensor), ndarray) in enumerate(zip(model.items(), ndarrays)):
  File "/private/var/folders/dl/8yzjfxfn26sf347jggtzb30c0000gn/T/0D6EEEEF-7FD3-417C-AFC4-71D9A9E81116/convert.py", line 875, in bounded_parallel_map
    result = futures.pop(0).result()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 438, in result
    return self.__get_result()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 390, in __get_result
    raise self._exception
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/private/var/folders/dl/8yzjfxfn26sf347jggtzb30c0000gn/T/0D6EEEEF-7FD3-417C-AFC4-71D9A9E81116/convert.py", line 950, in do_item
    return lazy_tensor.load().to_ggml().ndarray
  File "/private/var/folders/dl/8yzjfxfn26sf347jggtzb30c0000gn/T/0D6EEEEF-7FD3-417C-AFC4-71D9A9E81116/convert.py", line 489, in load
    ret = self._load()
  File "/private/var/folders/dl/8yzjfxfn26sf347jggtzb30c0000gn/T/0D6EEEEF-7FD3-417C-AFC4-71D9A9E81116/convert.py", line 497, in load
    return self.load().astype(data_type)
  File "/private/var/folders/dl/8yzjfxfn26sf347jggtzb30c0000gn/T/0D6EEEEF-7FD3-417C-AFC4-71D9A9E81116/convert.py", line 489, in load
    ret = self._load()
  File "/private/var/folders/dl/8yzjfxfn26sf347jggtzb30c0000gn/T/0D6EEEEF-7FD3-417C-AFC4-71D9A9E81116/convert.py", line 695, in load
    return UnquantizedTensor(storage.load(storage_offset, elm_count).reshape(size))
  File "/private/var/folders/dl/8yzjfxfn26sf347jggtzb30c0000gn/T/0D6EEEEF-7FD3-417C-AFC4-71D9A9E81116/convert.py", line 679, in load
    raise Exception("tensor stored in unsupported format")
Exception: tensor stored in unsupported format

Request for support of the official OpenAI API for ChatGPT/GPT-4

I am reaching out to request support for the official OpenAI API for ChatGPT/GPT-4 in LlamaChat. I understand that the current LLaMA models are designed to run offline, but I believe that adding support for faster, hosted APIs would make the software feel more native on the Mac and, ultimately, more useful.

I kindly ask you to consider adding support for the official OpenAI API for ChatGPT/GPT-4 to LlamaChat. Thank you for your time and consideration.

Build error

'/System/Library/Frameworks/Sparkle.framework/Versions/B/Sparkle' (no such file, not in dyld cache),
'/Library/Frameworks/Sparkle.framework/Versions/B/Sparkle' (no such file)

Support for AMD GPUs (macOS)

Maybe there could be support for AMD GPUs on macOS, since they can run MPS. Or maybe add a fallback mode like PyTorch's, rather than running fully on the CPU.

Sources don't persist

After closing the app and reopening it, the chat sources are cleared out and need to be re-added.

My steps:

  1. Opened app, and added Alpaca in GGML format.
  2. Used the app for a bit.
  3. Closed the app.
  4. Opened app, and got prompted to add a source again. Previous addition disappeared.
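A minimal sketch of how sources could be persisted across launches (ChatSource and its fields are hypothetical stand-ins for LlamaChat's real model types):

import Foundation

// Hypothetical persisted representation of a chat source.
struct ChatSource: Codable {
  let name: String
  let modelPath: URL
}

// Save/load sources as JSON in Application Support so they survive relaunches.
enum ChatSourceStore {
  static var fileURL: URL {
    let dir = FileManager.default.urls(for: .applicationSupportDirectory,
                                       in: .userDomainMask)[0]
    return dir.appendingPathComponent("sources.json")
  }

  static func save(_ sources: [ChatSource]) throws {
    try JSONEncoder().encode(sources).write(to: fileURL, options: .atomic)
  }

  static func load() -> [ChatSource] {
    guard let data = try? Data(contentsOf: fileURL) else { return [] }
    return (try? JSONDecoder().decode([ChatSource].self, from: data)) ?? []
  }
}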

Chinese LLaMA/Alpaca support

Dear LlamaChat Maintainer,

Greetings from Yiming Cui, the Chinese-LLaMA/Alpaca maintainer. I would like to thank you for your efforts in making LLaMA-like models more accessible to the community. I just conducted a quick test, loading our Chinese-Alpaca-7B/13B model (ggml format), and it functioned without any errors. The system outputs closely resemble those generated by llama.cpp, and I believe there are no major issues concerning the support of Chinese models. I plan to include a description of LlamaChat on our project page and I am looking forward to future updates of LlamaChat.
Thank you.

Best regards,
Yiming

P.S. As a quick suggestion, some users might be interested in using advanced hyper-parameter settings (such as temperature, top-k, top-p, threads, etc.) to generate more diverse outputs. (forgive me if this feature is already implemented in LlamaChat)

Screenshot:
WX20230417-083210@2x

Ensure messages view is scrolled to bottom when new messages are added

When you're at the bottom of the chat, the view should keep the generated text in sight as it's being generated; that is, it should automatically scroll down as each new line is produced.

Currently, newly generated messages go off-screen at the bottom, and you have to keep scrolling manually to read each line as it's generated.

The same goes for showing your own message at the bottom when pressing Enter.
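In SwiftUI this is typically done with ScrollViewReader, scrolling to the last message whenever the list changes; a minimal sketch (not LlamaChat's actual view code):

import SwiftUI

// Minimal sketch: keep a messages list pinned to the bottom as items arrive.
struct MessagesView: View {
  let messages: [String]

  var body: some View {
    ScrollViewReader { proxy in
      ScrollView {
        LazyVStack(alignment: .leading) {
          ForEach(messages.indices, id: \.self) { index in
            Text(messages[index]).id(index)
          }
        }
      }
      // Scroll when a message is appended; streamed token updates would
      // need a similar trigger keyed on the last message's content.
      .onChange(of: messages.count) { _ in
        withAnimation { proxy.scrollTo(messages.count - 1, anchor: .bottom) }
      }
    }
  }
}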

Feature request: custom initial prompt

Really cool project, thank you so much!

Could the chat context be customized by the user?
Each conversation could have its own context configuration, tunable from the settings.

Selected Model is Unsupported

Using Alpaca and gpt4all

I have the following:

  • ggml-alpaca-7b-q4.bin
  • gpt4all-lora-quantized.bin

After selecting the path, it shows "Selected model is of an unsupported version".


Installation environment problem

First of all, thank you so much for sharing. This project's interface and installation method are impressively simple and powerful. I really want to use it!

During my installation, I encountered the following problems:

When performing the second step ("setting up environment"), the following error occurred:

Could not install packages due to an EnvironmentError: [Errno 13] Permission denied: '/Library/Python/3.7'
Consider using the `--user` option or check the permissions.

You are using pip version 19.0.3, however version 23.2.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

After some searching, I tried the following:

  1. I ran the python -u -m pip install numpy sentencepiece torch --user command, and it reported a successful installation (except for a warning: "WARNING: Error parsing requirements for yapf: [Errno 2] No such file or directory: '/Users/user/anaconda3/lib/python3.7/site-packages/yapf-0.29.0.dist-info/METADATA'").
  2. I checked via pip --version that my pip version is 23.2.1, but the pip version shown in the error message above is 19.0.3, so I suspect that the environment LlamaChat uses is not the one I expected.

What could be causing this, and how can I fix it? Thank you so much!

Feature request: Automatically stop generation

It would be nice to have an option to automatically stop generation after a random number of 1-n sentences, delimited by punctuation followed by whitespace. Additionally, it may be helpful to have a "Stop generation at end of sentence" button as an alternative to the current control in the chat window.
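A sketch of the proposed heuristic (standalone, not LlamaChat code): pick a random target in 1...n, then count sentence boundaries — punctuation followed by whitespace — in the streamed text and stop once the target is reached:

import Foundation

// Stop generation after a random number of sentences, where a sentence
// ends at '.', '!' or '?' followed by whitespace.
struct SentenceLimiter {
  let targetSentences: Int

  init(maxSentences: Int) {
    targetSentences = Int.random(in: 1...max(1, maxSentences))
  }

  func shouldStop(generatedText: String) -> Bool {
    let regex = try! NSRegularExpression(pattern: "[.!?]\\s")
    let range = NSRange(generatedText.startIndex..., in: generatedText)
    return regex.numberOfMatches(in: generatedText, options: [], range: range)
      >= targetSentences
  }
}

// Usage: check after each streamed token and stop generating when true.
let limiter = SentenceLimiter(maxSentences: 3)
var output = ""
for token in ["Hello", " world. ", "How", " are", " you? ", "Fine."] {
  output += token
  if limiter.shouldStop(generatedText: output) { break }
}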
