Can you use Wireshark on the loopback device to capture the traffic and show what happens in the communication between the client and GPT4All?
Maybe these code examples help you:
Node.js:
const OpenAI = require('openai');

const openai = new OpenAI({
  baseURL: 'http://127.0.0.1:4891/v1',
  apiKey: 'not needed for a local LLM',
});

async function main() {
  const text = await openai.chat.completions.create(
    {
      messages: [{ role: 'user', content: 'Hello' }],
      model: 'Nous Hermes 2 Mistral DPO',
      max_tokens: 1024,
      n: 1,
      stop: null,
      temperature: 0.35,
      top_p: 0.75,
      stream: false,
    },
    { maxRetries: 5 },
  );
  console.log(text);
  console.log(text.choices);
} // main

main();
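A note on the object returned by chat.completions.create above: it follows the OpenAI chat-completions shape, so the generated text sits in choices[0].message.content. A minimal sketch of pulling it out (the sample object below is illustrative, not a real server response):

```javascript
// Extract the assistant reply from a chat completion object
// (field names per the OpenAI chat-completions format the server mimics).
function replyText(completion) {
  const choice = completion.choices && completion.choices[0];
  return choice && choice.message ? choice.message.content : null;
}

// Example with a response shaped like the ones GPT4All returns:
const sample = {
  choices: [{ finish_reason: 'length', index: 0,
    message: { role: 'assistant', content: 'Hello there!' } }],
};
console.log(replyText(sample)); // → 'Hello there!'
```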
In the browser, with fetch:
const json_completion = JSON.stringify({
  stream: false,
  temperature: 0.6,
  max_tokens: 100,
  messages: [{ role: 'user', content: 'Hello' }],
  model: 'Nous Hermes 2 Mistral DPO',
});

const completions = await fetch('http://127.0.0.1:4891/v1/chat/completions', {
  keepalive: true,
  method: 'POST',
  mode: 'no-cors',
  // With this mode the request does get a response, but for security reasons
  // JavaScript in the browser cannot access the result of "await completions.json()".
  headers: {
    Accept: 'application/json',
    'Content-Type': 'application/json',
    // Note: Access-Control-Allow-* are *response* headers; setting them on the
    // request has no effect -- the server would have to send them.
    'Access-Control-Allow-Origin': '*',
    'Access-Control-Allow-Headers': '*',
  },
  body: json_completion,
});

const completionjson = await completions.json();
/* Here lies the problem: with mode "no-cors" the browser treats the response
 * as opaque, so calling "completions.json()" after the request finishes
 * results in an error like
 *   Uncaught (in promise) SyntaxError: JSON.parse: unexpected end of data at line 1 column 1 of the JSON data
 * see https://stackoverflow.com/questions/54896998/how-to-process-fetch-response-from-an-opaque-type
 * Without mode "no-cors" you will get an error like
 *   XHROPTIONS http://127.0.0.1:4891/v1/chat/completions CORS Preflight Did Not Succeed
 */
console.log(completionjson);
from gpt4all.
... not a templating/parameters issue, as the model works just fine in GPT4All, and furthermore the issue inside Continue's chat is that it does not output anything, whatever the prompt.
Alright then, but are you sure? I'm not all that familiar with the GUI's API server, but I've spent a bit of time with that recently. It's certainly possible that it's not entirely compatible and something that's expected by the continue plugin is not actually returned by the server.
That is, it definitely doesn't mimic the OpenAI API in full.
However, looking at the output of your previous comment again:
GPT4All response excerpt:
... "usage":{"completion_tokens":16,"prompt_tokens":20,"total_tokens":36}
ollama response excerpt:
... "usage":{"prompt_tokens":76,"completion_tokens":91,"total_tokens":167}
Note how many more prompt_tokens it says it has used for the ollama prompt, although your own input is the same in both cases. My hunch here is that ollama adds templates, whereas in GPT4All you'd have to do that manually.
It's entirely possible that this isn't the only issue standing in the way of getting everything to work, though. You might also want to run curl -v once in case there's a problem with the HTTP headers (or use a web API tool which shows more details).
I'll probably have a look at the continue plugin when I have some time.
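The manual templating hypothesized above could look roughly like this. The ChatML-style markers are an assumption; the real template for your model is whatever the GPT4All settings dialog shows, and should be copied exactly:

```javascript
// Sketch: wrapping a user message in a model's prompt template by hand.
// The ChatML-style template here is an assumption -- check the prompt
// template shown in GPT4All's settings for your model.
function applyChatTemplate(userMessage) {
  return '<|im_start|>user\n' + userMessage + '<|im_end|>\n' +
         '<|im_start|>assistant\n';
}

console.log(applyChatTemplate('Hello! What is your name?'));
```

A backend like ollama presumably does this wrapping server-side, which would also explain its larger prompt_tokens count.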
I tested a few different backends and I think the issue is that the server doesn't support streaming responses, while the continuedev extension requires them. Every backend that worked returned a streaming response.
There is also the parameter stream: true in the incoming data:
{"messages":[{"role":"user","content":"hello"}],"model":"Llama 3 Instruct","max_tokens":1024,"stream":true}
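With stream: true, an OpenAI-compatible server replies with Server-Sent Events: one "data: {...}" chunk per token and a final "data: [DONE]". A rough sketch of what a client like the extension has to parse (the chunk shape follows the OpenAI streaming convention; whether a given backend emits exactly this is an assumption worth verifying):

```javascript
// Sketch: collecting the generated text from an SSE chat-completion stream.
// Each chunk carries a partial message in choices[0].delta.content.
function collectDeltas(sseBody) {
  let out = '';
  for (const line of sseBody.split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice(6).trim();
    if (payload === '[DONE]') break; // end-of-stream sentinel
    const delta = JSON.parse(payload).choices[0].delta;
    if (delta.content) out += delta.content;
  }
  return out;
}

// Example with two chunks as a streaming server would send them:
const body =
  'data: {"choices":[{"delta":{"content":"Hel"}}]}\n\n' +
  'data: {"choices":[{"delta":{"content":"lo"}}]}\n\n' +
  'data: [DONE]\n\n';
console.log(collectDeltas(body)); // → 'Hello'
```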
The GPT4ALL v3.0.0 client has a "Server Chat" section which correctly shows the response to queries received from VSCode/Continue as they arrive, but I can confirm that when configured as the OP suggests at least, these responses don't make it back into Continue.
Sorry, last time I tried to really look into it I got held up, so I shelved it for a while.
I tested a few different backends and I think the issue is that the server doesn't support streaming responses, while the continuedev extension requires them. Every backend that worked returned a streaming response.
True, the server mode currently doesn't implement streaming responses. If that's a hard requirement, then I guess this is the problem here.
will there be any fix for this?
I can't really say what the plans are right now, sorry. Improvements to the server mode are mentioned on the roadmap, however.
Thank you @zwilch. I am quite rusty with Wireshark, so I'm going to need some time to debug it adequately this way.
Nevertheless, I tried curl, an alternative to your two other suggested solutions, and I think this already sheds some light on the issue.
Here is what GPT4All spits out:
$ curl http://localhost:5001/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "deepseek-coder-6.7b-instruct.Q8_0.gguf",
"messages": [{"role": "user", "content": "Hello! What is your name?"}]
}'
{"choices":[{"finish_reason":"length","index":0,"message":{"content":"It seems like you forgot to say anything, could you please tell me again how","role":"assistant"}}],"created":1712489905,"id":"foobarbaz","model":"deepseek-coder-6.7b-instruct.Q8_0.gguf","object":"text_completion","usage":{"completion_tokens":16,"prompt_tokens":20,"total_tokens":36}}
I wrote before that it worked with curl. It indeed appears to do so, but only on the surface: looking at the exact output, the quality is very much subpar compared to what we could expect, with frequent gibberish sentences that end mid-sentence.
For comparison, here is what GPT4All outputs when the same model is queried from the GUI:
As an artificial intelligence, I don't have personal experiences or emotions like human beings do. Therefore, I am not named after individuals but rather by the programmers who designed me. My purpose is to assist users in providing information and answering questions based on my programming knowledge base. How can I help you today?
And here is what ollama outputs with the same model and prompt:
$ curl http://localhost:11434/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "deepseek-coder:6.7b-instruct-Q8_0",
"messages": [{"role": "user", "content": "Hello! What is your name?"}]
}'
{"id":"chatcmpl-721","object":"chat.completion","created":1712479256,"model":"deepseek-coder:latest","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":"As an AI Programming Assistant based on DeepSeek's model \"Deepseek Coder\", I don’t have a personal identity so it can be any person who has access to my features or services, such as the ability to respond in many languages. My design is focused around providing help and information related to computer science topics within this context of AI programming assistant service. How may I assist you with your coding needs today?\n"},"finish_reason":"stop"}],"usage":{"prompt_tokens":76,"completion_tokens":91,"total_tokens":167}}
So it seems that it's not just a formatting issue: the GPT4All OpenAI-like API server does not respond to queries the same way. Maybe it drops the default parameters? It outputs total gibberish and often stops mid-sentence.
So this issue is not only related to continuedev; the whole OpenAI-like API server functionality seems to be affected.
I am trying to test my hypothesis above that parameters are missing, but for the moment, when I try to pass the parameters explicitly, generation takes forever.
from gpt4all.
Sorry, I haven't read through everything here, but it might be a templates/parameters issue, so:
Note that many models don't work all that well if you don't provide them with the expected templates. I don't think these are added automatically to any of the web API endpoints. The parameters can have a big influence, too.
What you should try:
- First of all, use the chat GUI itself with a simple example prompt; for the provided models, the chat application automatically downloads and uses appropriate templates.
- Set temperature to zero during your tests, so that an example conversation is reproducible.
- Check what templates are in use in the settings, adapt them to an API call.
- While testing, make sure to set the same options when making calls through the web API, especially temperature zero.
- Test with the web API until you get it right and it produces the same output as in the chat GUI (this can be done with curl, I think).
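The checklist above boils down to sending the same deterministic request through the API that you test in the GUI. A minimal sketch of such a request body (model name and prompt are placeholders; the template, if needed, would be applied to the message content):

```javascript
// Sketch: a request body for reproducible GUI-vs-API comparisons.
// Model name and prompt are placeholders -- substitute your own.
function buildTestRequest(model, prompt) {
  return {
    model,
    messages: [{ role: 'user', content: prompt }],
    temperature: 0, // deterministic, so GUI and API outputs can be compared
    max_tokens: 1024,
    stream: false,
  };
}

const body = buildTestRequest('Nous Hermes 2 Mistral DPO', 'Hello!');
console.log(JSON.stringify(body));
// POST this to http://127.0.0.1:4891/v1/chat/completions (e.g. with curl or fetch).
```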
@cosmic-snow Thank you for your suggestions, and although I will implement them in future tests to improve replicability, this is not a templating/parameters issue, as the model works just fine in GPT4All, and furthermore the issue inside Continue's chat is that it does not output anything, whatever the prompt.
(PS: I know how to edit continue config file, I made it work with several models in koboldcpp including the same model I am trying to use in gpt4all -- koboldcpp is also not supported by default in continue and must be manually configured as an OpenAI-like API server)
I see, I missed this detail. I'll try to debug this further, but it is getting a bit beyond my current abilities; I need to skill up first, and I'm not sure when I'll have time for that... But at least your indications are pointing me in the right direction; I'll post further comments if I find anything.
(NB: I wanted to use HTTP Toolkit but it didn't work, then I tried Wireshark but for some reason I cannot see the exchange, so I must be doing something wrong; what remains is Frida.re -- I think it would be more effective if I could catch and manipulate all the exchanges.)
it should do streaming:
https://docs.gpt4all.io/gpt4all_python.html#chatting-with-gpt4all