Can you use Wireshark on the loopback device to capture the traffic and show what happens in the communication between the client and GPT4All?
Maybe these code examples help you:
Node.js:
const OpenAI = require('openai');

const openai = new OpenAI({
  baseURL: 'http://127.0.0.1:4891/v1',
  apiKey: 'not needed for a local LLM',
});

async function main() {
  const text = await openai.chat.completions.create(
    {
      messages: [{ role: 'user', content: 'Hello' }],
      model: 'Nous Hermes 2 Mistral DPO',
      max_tokens: 1024,
      n: 1,
      stop: null,
      temperature: 0.35,
      top_p: 0.75,
      stream: false,
    },
    { maxRetries: 5 },
  );
  console.log(text);
  console.log(text.choices);
} // main

main();
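A note on the object returned by chat.completions.create above: it follows the OpenAI chat-completions shape, so the generated text sits in choices[0].message.content. A minimal sketch of pulling it out (the sample object below is illustrative, not a real server response):

```javascript
// Extract the assistant reply from a chat completion object
// (field names per the OpenAI chat-completions format the server mimics).
function replyText(completion) {
  const choice = completion.choices && completion.choices[0];
  return choice && choice.message ? choice.message.content : null;
}

// Example with a response shaped like the ones GPT4All returns:
const sample = {
  choices: [{ finish_reason: 'length', index: 0,
    message: { role: 'assistant', content: 'Hello there!' } }],
};
console.log(replyText(sample)); // → 'Hello there!'
```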
In the browser, with fetch:
const json_completion = JSON.stringify({
  stream: false,
  temperature: 0.6,
  max_tokens: 100,
  messages: [{ role: 'user', content: 'Hello' }],
  model: 'Nous Hermes 2 Mistral DPO',
});

const completions = await fetch('http://127.0.0.1:4891/v1/chat/completions', {
  keepalive: true,
  method: 'POST',
  mode: 'no-cors',
  // With this mode the request does get a response, but for security reasons
  // JavaScript in the browser cannot access the result of "await completions.json()".
  headers: {
    Accept: 'application/json',
    'Content-Type': 'application/json',
    // Note: Access-Control-Allow-* are *response* headers; setting them on the
    // request has no effect -- the server would have to send them.
    'Access-Control-Allow-Origin': '*',
    'Access-Control-Allow-Headers': '*',
  },
  body: json_completion,
});

const completionjson = await completions.json();
/* Here lies the problem: with mode "no-cors" the browser treats the response
 * as opaque, so calling "completions.json()" after the request finishes
 * results in an error like
 *   Uncaught (in promise) SyntaxError: JSON.parse: unexpected end of data at line 1 column 1 of the JSON data
 * see https://stackoverflow.com/questions/54896998/how-to-process-fetch-response-from-an-opaque-type
 * Without mode "no-cors" you will get an error like
 *   XHROPTIONS http://127.0.0.1:4891/v1/chat/completions CORS Preflight Did Not Succeed
 */
console.log(completionjson);
from gpt4all.
... not a templating/parameters issue, as the model works just fine in GPT4All, and furthermore the issue inside Continue's chat is that it does not output anything, whatever the prompt.
Alright then, but are you sure? I'm not all that familiar with the GUI's API server, but I've spent a bit of time with that recently. It's certainly possible that it's not entirely compatible and something that's expected by the continue plugin is not actually returned by the server.
That is, it definitely doesn't mimic the OpenAI API in full.
However, looking at the output of your previous comment again:
GPT4All response excerpt:
... "usage":{"completion_tokens":16,"prompt_tokens":20,"total_tokens":36}
ollama response excerpt:
... "usage":{"prompt_tokens":76,"completion_tokens":91,"total_tokens":167}
Note how many more prompt_tokens it says it has used for the ollama prompt, although your own input is the same in both cases. My hunch here is that ollama adds templates, whereas in GPT4All you'd have to do that manually.
It's entirely possible that this isn't the only issue standing in the way of getting everything to work, though. You might also want to run curl -v once in case there's a problem with the HTTP headers (or use a web API tool which shows more details).
I'll probably have a look at the continue plugin when I have some time.
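The manual templating hypothesized above could look roughly like this. The ChatML-style markers are an assumption; the real template for your model is whatever the GPT4All settings dialog shows, and should be copied exactly:

```javascript
// Sketch: wrapping a user message in a model's prompt template by hand.
// The ChatML-style template here is an assumption -- check the prompt
// template shown in GPT4All's settings for your model.
function applyChatTemplate(userMessage) {
  return '<|im_start|>user\n' + userMessage + '<|im_end|>\n' +
         '<|im_start|>assistant\n';
}

console.log(applyChatTemplate('Hello! What is your name?'));
```

A backend like ollama presumably does this wrapping server-side, which would also explain its larger prompt_tokens count.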
I tested a few different backends and I think the issue is that the server doesn't support streaming responses, while the continuedev extension requires them. Every backend that worked returned a streaming response.
There is also the parameter stream: true in the incoming data:
{"messages":[{"role":"user","content":"hello"}],"model":"Llama 3 Instruct","max_tokens":1024,"stream":true}
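With stream: true, an OpenAI-compatible server replies with Server-Sent Events: one "data: {...}" chunk per token and a final "data: [DONE]". A rough sketch of what a client like the extension has to parse (the chunk shape follows the OpenAI streaming convention; whether a given backend emits exactly this is an assumption worth verifying):

```javascript
// Sketch: collecting the generated text from an SSE chat-completion stream.
// Each chunk carries a partial message in choices[0].delta.content.
function collectDeltas(sseBody) {
  let out = '';
  for (const line of sseBody.split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice(6).trim();
    if (payload === '[DONE]') break; // end-of-stream sentinel
    const delta = JSON.parse(payload).choices[0].delta;
    if (delta.content) out += delta.content;
  }
  return out;
}

// Example with two chunks as a streaming server would send them:
const body =
  'data: {"choices":[{"delta":{"content":"Hel"}}]}\n\n' +
  'data: {"choices":[{"delta":{"content":"lo"}}]}\n\n' +
  'data: [DONE]\n\n';
console.log(collectDeltas(body)); // → 'Hello'
```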
The GPT4ALL v3.0.0 client has a "Server Chat" section which correctly shows the response to queries received from VSCode/Continue as they arrive, but I can confirm that when configured as the OP suggests at least, these responses don't make it back into Continue.
Sorry, last time I tried to really look into it I got held up, so I shelved it for a while.
I tested a few different backends and I think the issue is that the server doesn't support streaming responses, while the continuedev extension requires them. Every backend that worked returned a streaming response.
True, the server mode currently doesn't implement streaming responses. If that's a hard requirement, then I guess this is the problem here.
will there be any fix for this?
I can't really say what the plans are right now, sorry. Improvements to the server mode are mentioned on the roadmap, however.
Thank you @zwilch. I am quite rusty with Wireshark, so I'm going to need some time to debug it adequately this way.
Nevertheless, I tried curl, an alternative to your two other suggested solutions, and I think this already sheds some light on the issue.
Here is what GPT4All spits out:
$ curl http://localhost:5001/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "deepseek-coder-6.7b-instruct.Q8_0.gguf",
"messages": [{"role": "user", "content": "Hello! What is your name?"}]
}'
{"choices":[{"finish_reason":"length","index":0,"message":{"content":"It seems like you forgot to say anything, could you please tell me again how","role":"assistant"}}],"created":1712489905,"id":"foobarbaz","model":"deepseek-coder-6.7b-instruct.Q8_0.gguf","object":"text_completion","usage":{"completion_tokens":16,"prompt_tokens":20,"total_tokens":36}}
I wrote before that it worked with curl. It indeed appears to do so, but only on the surface: looking at the exact output, the quality is very much subpar compared to what we could expect, with frequent gibberish sentences that end mid-sentence.
For comparison, here is what GPT4All outputs when the same model is queried from the GUI:
As an artificial intelligence, I don't have personal experiences or emotions like human beings do. Therefore, I am not named after individuals but rather by the programmers who designed me. My purpose is to assist users in providing information and answering questions based on my programming knowledge base. How can I help you today?
And here is what ollama outputs with the same model and prompt:
$ curl http://localhost:11434/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "deepseek-coder:6.7b-instruct-Q8_0",
"messages": [{"role": "user", "content": "Hello! What is your name?"}]
}'
{"id":"chatcmpl-721","object":"chat.completion","created":1712479256,"model":"deepseek-coder:latest","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":"As an AI Programming Assistant based on DeepSeek's model \"Deepseek Coder\", I don’t have a personal identity so it can be any person who has access to my features or services, such as the ability to respond in many languages. My design is focused around providing help and information related to computer science topics within this context of AI programming assistant service. How may I assist you with your coding needs today?\n"},"finish_reason":"stop"}],"usage":{"prompt_tokens":76,"completion_tokens":91,"total_tokens":167}}
So it seems that it's not just a formatting issue: the GPT4All OpenAI-like API server does not respond to queries the same way. Maybe it drops the default parameters? It outputs total gibberish and often stops mid-sentence.
So this issue is not only related to continuedev; the whole OpenAI-like API server functionality seems to be affected.
I am trying to test my hypothesis above that parameters are missing, but for the moment, when I try to pass the parameters explicitly, generation takes forever.
from gpt4all.
Sorry, I haven't read through everything here, but it might be a templates/parameters issue, so:
Note that many models don't work all that well if you don't provide them with the expected templates. I don't think these are added automatically to any of the web API endpoints. The parameters can have a big influence, too.
What you should try:
- First of all, use the chat GUI itself with a simple example prompt; for the provided models, the chat application automatically downloads and uses appropriate templates.
- Set temperature to zero during your tests, so that an example conversation is reproducible.
- Check what templates are in use in the settings, adapt them to an API call.
- While testing, make sure to set the same options when making calls through the web API, especially temperature zero.
- Test with the web API until you get it right and it produces the same output as in the chat GUI (this can be done with curl, I think).
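The checklist above boils down to sending the same deterministic request through the API that you test in the GUI. A minimal sketch of such a request body (model name and prompt are placeholders; the template, if needed, would be applied to the message content):

```javascript
// Sketch: a request body for reproducible GUI-vs-API comparisons.
// Model name and prompt are placeholders -- substitute your own.
function buildTestRequest(model, prompt) {
  return {
    model,
    messages: [{ role: 'user', content: prompt }],
    temperature: 0, // deterministic, so GUI and API outputs can be compared
    max_tokens: 1024,
    stream: false,
  };
}

const body = buildTestRequest('Nous Hermes 2 Mistral DPO', 'Hello!');
console.log(JSON.stringify(body));
// POST this to http://127.0.0.1:4891/v1/chat/completions (e.g. with curl or fetch).
```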
@cosmic-snow Thank you for your suggestions, and although I will implement them in future tests to improve replicability, this is not a templating/parameters issue, as the model works just fine in GPT4All, and furthermore the issue inside Continue's chat is that it does not output anything, whatever the prompt.
(PS: I know how to edit continue config file, I made it work with several models in koboldcpp including the same model I am trying to use in gpt4all -- koboldcpp is also not supported by default in continue and must be manually configured as an OpenAI-like API server)
I see, I missed this detail. I'll try to debug this further, but it is getting a bit beyond my current abilities; I need to skill up first, and I'm not sure when I'll have time for that... But at least your indications are pointing me in the right direction; I'll post further comments if I find anything.
(NB: I wanted to use HTTP Toolkit but it didn't work, then I tried Wireshark but for some reason I cannot see the exchange, so I must be doing something wrong; what remains is Frida.re -- I think it would be more effective if I could catch and manipulate all the exchanges.)
it should do streaming:
https://docs.gpt4all.io/gpt4all_python.html#chatting-with-gpt4all