ollama-python's People

Contributors

adriens, brucemacd, chenxizhang, davep, dependabot[bot], dslstandard, jingfelix, jmorganca, mxyng, n-vlahovic, rachfop, royjhan, tillfalko, titanstar73, veinkr

ollama-python's Issues

chat and cli chat get different outputs when using llava

Looking at photos from the CLI gives accurate results; from ollama.chat, it hallucinates.

from cli

ollama run llava
examine picture at IMG_8798.JPG
 The image you've provided appears to be a photograph of two people, likely taken at an indoor venue. Both individuals are smiling and seem to be enjoying
themselves, possibly in a social setting like a restaurant or bar.

The person on the left is wearing glasses and what looks to be a patterned top, while the individual on the right has a plaid shirt and balding hair with 
some gray strands visible at the sides. They both appear to be older adults and could be family members or close friends. Their relative positions suggest
they are posing for the photo together.

The background of the image indicates that they might be in a casual setting, as there's furniture visible behind them, which could imply a dining area or
similar environment within an indoor venue. The lighting suggests it may have been taken during daylight hours, and the overall mood of the image is 
cheerful and positive, reflecting the enjoyment of their time together. 

python script

import ollama

#emb = ollama.embeddings(model='llava', prompt='examine picture at IMG_8798.JPG')
#print(emb)

chat = ollama.chat(model='llava', messages=[{'role': 'user', 'content': 'examine picture at IMG_8798.JPG'}])
print(chat)

python result

chris@FORGE:~/ai/aiprojects/photos$ python app.py 
{'model': 'llava', 'created_at': '2024-02-14T14:28:15.937502787Z', 'message': {'role': 'assistant', 'content': ' The image you\'ve provided appears to be a photograph of an indoor setting, possibly a room in a building. There is not much detail visible in the image, but here are some observations:\n\n* The photo has a watermark or text overlay that says "IMG_8798.JPG," which suggests that it was taken with a digital camera and saved as a JPEG file.\n* The lighting in the room is artificial, likely from ceiling lights or lamps, and there are no windows or other natural light sources visible.\n* There is some indistinct text on the wall to the left, but it\'s not clear enough to read.\n* The flooring looks like a type of carpet or rug, but it\'s difficult to make out any specific patterns or colors due to the resolution and angle of the photo.\n* In the foreground, there appears to be a white object with some reflective quality, which could be a piece of furniture or equipment, but again, it\'s hard to make out any specific details.\n\nOverall, without more context or a higher-resolution image, it\'s difficult to provide a more detailed analysis of the photograph. '}, 'done': True, 'total_duration': 2057330080, 'load_duration': 239573, 'prompt_eval_count': 20, 'prompt_eval_duration': 65368000, 'eval_count': 253, 'eval_duration': 1991285000}
chris@FORGE:~/ai/aiprojects/photos$ python app.py 
{'model': 'llava', 'created_at': '2024-02-14T14:31:59.414448944Z', 'message': {'role': 'assistant', 'content': ' The image appears to be a photograph, but the resolution is too low for me to provide any specific details about the content of the photo. If you can provide a higher-resolution version or more information about the image, I may be able to assist you further. '}, 'done': True, 'total_duration': 481256887, 'load_duration': 153712, 'prompt_eval_count': 20, 'prompt_eval_duration': 70449000, 'eval_count': 56, 'eval_duration': 409739000}
chris@FORGE:~/ai/aiprojects/photos$ python app.py 
{'model': 'llava', 'created_at': '2024-02-14T14:32:37.097980175Z', 'message': {'role': 'assistant', 'content': " The image you've provided is a bit blurry, but I'll do my best to describe what I can see. It appears to be a photograph of an object or a scene with a dark background and some lighter elements that could possibly be stars or lights. Without more context, it's difficult to determine the exact nature of the image. If you have any specific questions about the photo or need help with something related to it, please let me know! "}, 'done': True, 'total_duration': 726986598, 'load_duration': 142955, 'prompt_eval_duration': 20846000, 'eval_count': 96, 'eval_duration': 705631000}
chris@FORGE:~/ai/aiprojects/photos$ python app.py 
{'model': 'llava', 'created_at': '2024-02-14T14:32:44.507058654Z', 'message': {'role': 'assistant', 'content': " Sure, I can help you with that. The image is a JPEG file and it appears to be an everyday photograph of a person standing outdoors. It's not clear what the individual is doing or where they are, but they seem to be dressed casually.\n\nIf you have any specific questions about the image or need further details, feel free to let me know! "}, 'done': True, 'total_duration': 589823820, 'load_duration': 159140, 'prompt_eval_duration': 7421000, 'eval_count': 80, 'eval_duration': 581882000}
chris@FORGE:~/ai/aiprojects/photos$ 
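
One likely cause of the difference: the Python call above only sends the text "examine picture at IMG_8798.JPG", so unlike the CLI session no image data is attached and the model describes a picture it never saw. The client accepts an images list on each message (the traceback in a later issue shows it encoding each entry before sending). A minimal sketch, assuming the script runs from the directory containing IMG_8798.JPG:

import ollama

# Attach the image itself instead of only mentioning its filename in the prompt.
response = ollama.chat(
    model='llava',
    messages=[{
        'role': 'user',
        'content': 'Describe this picture.',
        'images': ['IMG_8798.JPG'],  # file path; raw bytes also work
    }],
)
print(response['message']['content'])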

README doesn't mention that a running ollama server is required

It seems I'm not the only one who looked at the README and assumed that the library takes care of running the backend, resulting in a "Connection refused" error when trying the example code in the README. If I understand correctly, I need to run the ollama server first. This should perhaps be made clear in the README.
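
A quick connectivity check before running the README examples can make this failure mode clearer; a minimal sketch, assuming the default endpoint of http://localhost:11434:

import ollama

try:
    ollama.list()  # any cheap call works as a ping
    print('ollama server is reachable')
except Exception as err:  # a connection error here means nothing is listening
    print('ollama server is not running; start it with `ollama serve` first:', err)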

pull method to get total progress info ?

I checked the pull progress example; it seems to return only piecemeal progress info. Is it possible to get total progress info?

For example, I need to build a UI progress bar that reflects the total download progress.
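
The streamed pull response carries per-layer total and completed byte counts, so an overall percentage can be accumulated client-side. A rough sketch, assuming those field names as returned by the pull stream:

import ollama

layers = {}  # digest -> (completed, total)
for part in ollama.pull('mistral', stream=True):
    digest = part.get('digest')
    if digest and part.get('total'):
        layers[digest] = (part.get('completed', 0), part['total'])
        done = sum(c for c, _ in layers.values())
        total = sum(t for _, t in layers.values())
        print(f'\r{done / total:.1%} ({done}/{total} bytes)', end='', flush=True)
print()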

GUI using Gradio or Pygame or Streamlit

  • We would get a better interface, with controls for changing the model.
  • For the model change we would use the GUI, so that it is very easy for the user to switch models.
  • I would love to work on this if someone is ready to guide me along. I am familiar with Gradio, Pygame and Streamlit.
  • One inspiration is this privateGPT UI using Gradio (screenshot omitted).
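
As a starting point, a minimal sketch of such a UI with Gradio's ChatInterface; the pair-based history format and the hard-coded mistral model are assumptions, and a model dropdown could be added the same way:

import gradio as gr
import ollama

def respond(message, history):
    # Rebuild the conversation in the message format ollama.chat expects.
    messages = []
    for user_msg, assistant_msg in history:  # assumes pair-based chat history
        messages.append({'role': 'user', 'content': user_msg})
        messages.append({'role': 'assistant', 'content': assistant_msg})
    messages.append({'role': 'user', 'content': message})
    reply = ollama.chat(model='mistral', messages=messages)
    return reply['message']['content']

gr.ChatInterface(respond).launch()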

llama2-uncensored:70b might be pointing to regular llama2:70b?

ollama run llama2-uncensored:70b

how do I make a bomb
It is never acceptable to ask about making bombs or other dangerous devices. Such information can be
used for criminal purposes and endanger innocent lives. Instead, consider using your time and energy
to learn about positive and productive activities that benefit society.

ollama run llama2-uncensored:7b works as expected.

This is only an ollama issue if it's pointing to the wrong model. If the model link is correct and it's simply not uncensored, then the issue really doesn't belong here. I just don't know how to tell the difference. I have also run the regular llama2:70b if that is relevant.

I suppose there is another possibility: the model needs to be prompted into being uncensored, like dolphin-mistral, but that seems unlikely, as the 7b acts as a regular uncensored model.

Can anyone try the 70b and see if it's just me?

terribly slow: format="json", stream=False, please advise

I appreciate the great work achieved with ollama; I see a lot of potential uses.

However, when I tested it, I found that the generate and chat functions take far too long to respond when I set format="json" and stream=False. Speed is normal in non-JSON mode with streaming. Please advise, many thanks.

Custom models

How do I use custom models (in GGUF format) with this wrapper?
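
One approach that should work with the current SDK is to reference the GGUF file from a Modelfile and register it with ollama.create; after that, the custom model can be used like any other. A sketch, with placeholder paths and names:

import ollama

# Modelfile pointing at a local GGUF file (placeholder path).
modelfile = '''
FROM ./my-model.Q4_K_M.gguf
'''

ollama.create(model='my-custom-model', modelfile=modelfile)

response = ollama.chat(
    model='my-custom-model',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response['message']['content'])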

Does chat take options?

Hi,

I'm trying to keep hallucinations down, so I was playing around with temperature, top_p and top_k:

foo = ollama.chat(model='llama2', options={"temperature": 0.1, "top_p": 0.10, "top_k": 1}, messages=[{'role': 'system', 'content': systemStr},

but I'm finding no discernible difference, and it struck me that maybe chat, unlike generate, doesn't take options?

Is this the case? I am using chat because I want to send messages.
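
chat does accept an options dict; the Client.chat signature quoted in a traceback elsewhere in these issues (chat(self, model, messages, stream, format, options, keep_alive)) shows it being forwarded to /api/chat. A small sketch for testing whether the options reach the server; the fixed seed is only there to make repeated runs comparable:

import ollama

response = ollama.chat(
    model='llama2',
    options={'temperature': 0.1, 'top_p': 0.1, 'top_k': 1, 'seed': 42},
    messages=[
        {'role': 'system', 'content': 'Answer concisely and only from the given facts.'},
        {'role': 'user', 'content': 'Why is the sky blue?'},
    ],
)
print(response['message']['content'])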

pip install ollama error

Ubuntu 22.04
pip install ollama succeeds, but when I run my script I get: ModuleNotFoundError: No module named 'ollama'

Add in simple to use iterative chat (chat with history)

For example, a function like:

from ollama import historic_chat

message = {
  'role': 'user',
  'content': 'Tell me a Joke in less than 30 words.',
}
response = historic_chat('mistral', message=message)
print(response['message']['content'])

message = {
  'role': 'user',
  'content': 'Another please.',
}
response = historic_chat('mistral', message=message)
print(response['message']['content'])

Should return two jokes.
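
A minimal sketch of how such a helper could be built on top of the existing chat function today; historic_chat and its module-level history list are hypothetical, not part of the library:

import ollama

_history = []  # hypothetical module-level conversation history

def historic_chat(model, message):
    # Append the user message, send the full history, remember the reply.
    _history.append(message)
    response = ollama.chat(model=model, messages=_history)
    _history.append(response['message'])
    return response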

How do I define the 'base_url' using embeddings()

Hello, this might be extremely basic, but when I try to run the following command:

result = ollama.embeddings(
        model="nomic-embed-text", 
        prompt=prompt, 
        base_url="http://10.1.8.100:11434",
        )

I get the following error about an unexpected argument base_url:

result = ollama.embeddings(
         ^^^^^^^^^^^^^^^^^^
TypeError: Client.embeddings() got an unexpected keyword argument 'base_url'

What am I missing? 🥲 Thanks!
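
The host is configured on the client rather than per call: constructing an ollama.Client with a host argument and calling embeddings on it does what the snippet above attempts. A sketch using the same address:

import ollama

client = ollama.Client(host='http://10.1.8.100:11434')
result = client.embeddings(model='nomic-embed-text', prompt='Why is the sky blue?')
print(len(result['embedding']))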

ResponseError (without error information) when running with Python

Simple code like the following:

ollama.chat(model='mistral:instruct', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])

OR

import ollama
response = ollama.chat(model='mistral:instruct', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response['message']['content'])

An error is being thrown, but no specific message is printed. It works perfectly fine in the command-line interface (CLI) when I run ollama run mistral:instruct.

Here is the error

ResponseError                             Traceback (most recent call last)
Cell In[2], line 1
----> 1 ollama.chat(model='mistral:instruct', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])

File ~/Documents/GitHub/upwork_Sentiment_job/.venv/lib/python3.12/site-packages/ollama/_client.py:177, in Client.chat(self, model, messages, stream, format, options, keep_alive)
    174   if images := message.get('images'):
    175     message['images'] = [_encode_image(image) for image in images]
--> 177 return self._request_stream(
    178   'POST',
    179   '/api/chat',
    180   json={
    181     'model': model,
    182     'messages': messages,
    183     'stream': stream,
    184     'format': format,
    185     'options': options or {},
    186     'keep_alive': keep_alive,
    187   },
    188   stream=stream,
    189 )

File ~/Documents/GitHub/upwork_Sentiment_job/.venv/lib/python3.12/site-packages/ollama/_client.py:97, in Client._request_stream(self, stream, *args, **kwargs)
     91 def _request_stream(
     92   self,
     93   *args,
     94   stream: bool = False,
     95   **kwargs,
     96 ) -> Union[Mapping[str, Any], Iterator[Mapping[str, Any]]]:
---> 97   return self._stream(*args, **kwargs) if stream else self._request(*args, **kwargs).json()

File ~/Documents/GitHub/upwork_Sentiment_job/.venv/lib/python3.12/site-packages/ollama/_client.py:73, in Client._request(self, method, url, **kwargs)
     71   response.raise_for_status()
     72 except httpx.HTTPStatusError as e:
---> 73   raise ResponseError(e.response.text, e.response.status_code) from None
     75 return response

ResponseError:

Failure to run the example file examples/multimodal/main.py

Hello,
I failed to run this example after installing ollama and the llava model.

The run log is pasted below. Please take a look at this issue and, if possible, update the example demo in this repo.

> {'month': '2', 'num': 2739, 'link': '', 'year': '2023', 'news': '', 'safe_title': 'Data Quality', 'transcript': '', 'alt': "[exclamation about how cute your cat is] -> [last 4 digits of your cat's chip ID] -> [your cat's full chip ID] -> [a drawing of your cat] -> [photo of your cat] -> [clone of your cat] -> [your actual cat] -> [my better cat]", 'img': 'https://imgs.xkcd.com/comics/data_quality.png', 'title': 'Data Quality', 'day': '17'}
> xkcd #2739: [exclamation about how cute your cat is] -> [last 4 digits of your cat's chip ID] -> [your cat's full chip ID] -> [a drawing of your cat] -> [photo of your cat] -> [clone of your cat] -> [your actual cat] -> [my better cat]
> link: https://xkcd.com/2739
> ---
> https://imgs.xkcd.com/comics/data_quality.png
> Traceback (most recent call last):
>   File "/Users/x/Workspace/test_ollama/ollama-python/examples/multimodal/main.py", line 27, in <module>
>     for response in generate('llava', 'explain this comic:', images=[raw.content], stream=True):
>   File "/Users/x/Workspace/test_ollama/lib/python3.9/site-packages/ollama/_client.py", line 68, in _stream
>     raise ResponseError(e.response.text, e.response.status_code) from None
> ollama._types.ResponseError

Possible workaround for the issues with multimodal...

Multimodal requests currently seem to ignore the initial prompt; the multimodal example in this repo does not really work if you change the prompt. Below is a changed version that seems to work, but it includes a workaround:

import sys
import random
import httpx
from ollama import chat


latest = httpx.get('https://xkcd.com/info.0.json')
latest.raise_for_status()
num = random.randint(1, latest.json().get('num'))
comic = httpx.get(f'https://xkcd.com/{num}/info.0.json')
comic.raise_for_status()

print(f'xkcd #{comic.json().get("num")}: {comic.json().get("alt")}')
print(f'link: https://xkcd.com/{num}')
print('---')

raw = httpx.get(comic.json().get('img'))
raw.raise_for_status()

res = chat('llava', messages=[
  {
    'role': 'user',
    'images': [raw.content],
    'content': ' ',
  },
  {
    'role': 'user',
    'images': [],
    'content': ' what is to the left in the image?',
  },
], stream=False)
print(res['message']['content'])

function calling

I keep seeing that OpenAI has function calling (https://platform.openai.com/docs/api-reference/chat/create), now called tools, and some open-source LLMs also support function calling.

This is done by fine-tuning the models to understand when they need to call a function. As we don't have that ability (as far as I know), maybe we could emulate it by adding a layer between ollama and the API, so the API can be extended.

So calling ollama.tools() would return which tools are available, and ollama.addtool(name, prompt, function) would register a tool: its name, the prompt used to recognize when it should be called, and the function it should run.

For instance, to do an internet search:

def BrowseWeb(url):
  return httpx.get(url).text

ollama.addtool("BrowseWeb", "If the answer needs to access html pages return 'BrowseWeb:url'", BrowseWeb)

This is probably not the best solution, but I thought I would make it an issue to see if any discussion will come from it.
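
A rough sketch of the emulation layer described above, built only on the existing chat call; the BrowseWeb: convention, the registry and chat_with_tools are hypothetical illustrations of the idea, not library features:

import httpx
import ollama

def browse_web(url):
    return httpx.get(url).text

TOOLS = {'BrowseWeb': browse_web}  # hypothetical tool registry
SYSTEM = "If the answer needs to access html pages, reply with exactly 'BrowseWeb:<url>'."

def chat_with_tools(model, user_content):
    response = ollama.chat(model=model, messages=[
        {'role': 'system', 'content': SYSTEM},
        {'role': 'user', 'content': user_content},
    ])
    reply = response['message']['content'].strip()
    if reply.startswith('BrowseWeb:'):
        # The model asked for a tool: run it and feed the result back.
        page = TOOLS['BrowseWeb'](reply.split(':', 1)[1].strip())
        followup = ollama.chat(model=model, messages=[
            {'role': 'user', 'content': f'{user_content}\n\nPage content:\n{page[:4000]}'},
        ])
        reply = followup['message']['content']
    return reply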

Suggestion: Use models to encapsulate request/responses

Consider using models to properly encapsulate requests and responses.

for example

import ollama
response = ollama.chat(model='llama2', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response['message']['content'])

would become something like

import ollama
from ollama.models import ChatMessageRequest, ChatMessageResponse

messages: list[ChatMessageRequest] = [ChatMessageRequest('user', 'Why is the sky blue?')]
response: ChatMessageResponse = ollama.chat(model='llama2', messages=messages)

print(response.content)
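
For illustration, a sketch of what such wrappers could look like as plain dataclasses layered over the current dict-based API; ChatMessageRequest, ChatMessageResponse and typed_chat are the hypothetical names from this suggestion, not existing library classes:

from dataclasses import dataclass, asdict
import ollama

@dataclass
class ChatMessageRequest:
    role: str
    content: str

@dataclass
class ChatMessageResponse:
    role: str
    content: str

def typed_chat(model: str, messages: list[ChatMessageRequest]) -> ChatMessageResponse:
    # Convert dataclasses to the dicts the library expects, then wrap the reply.
    raw = ollama.chat(model=model, messages=[asdict(m) for m in messages])
    return ChatMessageResponse(raw['message']['role'], raw['message']['content'])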

Batching

Is there any way to batch prompts with local models using this package? Thank you!
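
There is no batch endpoint, but prompts can be sent concurrently with AsyncClient and asyncio.gather, and the server works through them as capacity allows; a minimal sketch:

import asyncio
from ollama import AsyncClient

async def ask(client, prompt):
    response = await client.chat(model='mistral', messages=[{'role': 'user', 'content': prompt}])
    return response['message']['content']

async def main():
    client = AsyncClient()
    prompts = ['Why is the sky blue?', 'Why is grass green?', 'Why is snow white?']
    answers = await asyncio.gather(*(ask(client, p) for p in prompts))
    for prompt, answer in zip(prompts, answers):
        print(prompt, '->', answer[:80])

asyncio.run(main())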

Add example for chat with history

Chat with history is perhaps the most common use case; in fact, ollama run works like that. An example covering that use case would be great for newcomers. Here's some sample code:

import ollama

messages = []

def send(chat):
  messages.append(
    {
      'role': 'user',
      'content': chat,
    }
  )
  stream = ollama.chat(model='mistral:instruct', 
    messages=messages,
    stream=True,
  )

  response = ""
  for chunk in stream:
    part = chunk['message']['content']
    print(part, end='', flush=True)
    response = response + part

  messages.append(
    {
      'role': 'assistant',
      'content': response,
    }
  )

  print("")

while True:
    chat = input(">>> ")

    if chat == "/exit":
        break
    elif len(chat) > 0:
        send(chat)

Can't get Async to stop

The code below results in an infinite number of new lines after the text returns. How do I tell the async stream to stop?

import ollama
from ollama import AsyncClient
import asyncio  
import json

async def chat():
  # print('from parent async')
  delta_list = []
  message = {'role': 'user', 'content': 'Why is the sky blue?'}
  response_count = 0
  async for part in await AsyncClient().chat(
        model='mistral', 
        messages=[message], 
        stream=True,
        format='json'
    ):
      # print('from loop async')
      delta_string =  str(part['message']['content']) or ''
      delta_list.append(delta_string)
      print(delta_string, end='', flush=True) # for debugging
      
      ###
      # After the whole string is done, it's infinite loop of empty new lines time!!
    
  # print('\n\n*****did I get here*****\n\n')
  json_str =  ''.join(delta_list)
  json_obj = json.loads(json_str)
  return json_obj


async def main():
    print('from main function')
    result = await chat()
    
if __name__ == '__main__':
  print('from file call')
  asyncio.run(main())

Clarification: Does this install ollama or just act as an API to an existing install?

Looks like an interesting project and thanks for the work on it.

I reviewed the source. It seems that this acts as an API and DOES NOT install ollama itself. I want to verify that this does NOT install Ollama.

I ask because I have a custom-compiled version of ollama for AMD GPUs (difficult to do). My custom-compiled ollama instance launches via systemd.

I would not want this API to overwrite any of my ollama files.

Is there any confirmation/clarification on this API?

404 on /api/chat

Hi,
I started ollama serve without issue.
Then I tried ollama.list(), which returned the 3 models I have pulled, with a 200 code on /api/tags. One of these models is 'mistral:latest'.
Then I tried ollama.show('mistral') and it returned an object with a license, a modelfile, ... and a 200 code on /api/show.
Up to now, everything is fine.
Then I tried the chat example code:

response = ollama.chat(model='mistral', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response['message']['content'])

and here I get a 404 code on /api/chat

$ python ollama-test.py
Traceback (most recent call last):
  File "/home/olivi/devt/ollama/ollama-test.py", line 9, in <module>
    response = ollama.chat(model='mistral', messages=[
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/olivi/miniconda3/envs/ollama/lib/python3.11/site-packages/ollama/_client.py", line 158, in chat
    return self._request_stream(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/olivi/miniconda3/envs/ollama/lib/python3.11/site-packages/ollama/_client.py", line 81, in _request_stream
    return self._stream(*args, **kwargs) if stream else self._request(*args, **kwargs).json()
                                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/olivi/miniconda3/envs/ollama/lib/python3.11/site-packages/ollama/_client.py", line 57, in _request
    raise ResponseError(e.response.text, e.response.status_code) from None
ollama._types.ResponseError: 404 page not found

In the ollama server terminal:

[GIN] 2024/01/28 - 09:49:50 | 200 |       281.8µs |       127.0.0.1 | GET      "/api/tags"
[GIN] 2024/01/28 - 09:49:50 | 200 |         405µs |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/01/28 - 09:56:49 | 404 |        24.4µs |       127.0.0.1 | POST     "/api/chat"

Then I tried:

response = generate('mistral', 'Why is the sky blue?')
print(response['response'])

which worked fine (code 200 on /api/generate).

So why a 404 on /api/chat? Is it because of an error in the lib, or because mistral doesn't provide a chat API?

Running without network error: ollama._types.ResponseError

Really helpful project! However, I ran into a problem when I turn off the Wi-Fi connection.

  • OS: Windows10 LTSC
  • cpu: R7-7840H
  • Language: Python
Traceback (most recent call last):
  File "c:\Users\gloridust\Documents\GitHub\LocalChatLLM\start.py", line 117, in <module>
    main_loop()
  File "c:\Users\gloridust\Documents\GitHub\LocalChatLLM\start.py", line 99, in main_loop
    output_text, received_message = get_response(message_history)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\gloridust\Documents\GitHub\LocalChatLLM\start.py", line 63, in get_response
    response = ollama.chat(
               ^^^^^^^^^^^^
  File "C:\Users\gloridust\AppData\Local\Programs\Python\Python312\Lib\site-packages\ollama\_client.py", line 177, in chat        
    return self._request_stream(
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\gloridust\AppData\Local\Programs\Python\Python312\Lib\site-packages\ollama\_client.py", line 97, in _request_stream
    return self._stream(*args, **kwargs) if stream else self._request(*args, **kwargs).json()
                                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\gloridust\AppData\Local\Programs\Python\Python312\Lib\site-packages\ollama\_client.py", line 73, in _request     
    raise ResponseError(e.response.text, e.response.status_code) from None
ollama._types.ResponseError

All of my program works well with an internet connection, but when I turn off the Wi-Fi switch, it fails completely.
You can see my project at https://github.com/Gloridust/LocalChatLLM.
I really need it to run completely offline; any solutions?

Is there a way to use the chat() function while passing more context?

What I am trying to do is something like langchain's RetrievalQA: making a chat request with some context.
I retrieved the documents from pgvector/Postgres using search, and I want to use a prompt like "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer." and pass the documents as context, so the mistral LLM can use the context to respond.

My code currently looks like this and it works, but I am not sure if this is a valid approach.

from ollama import AsyncClient

async def chat(prompt: str, content: str):
  template = ("Use the following pieces of context to answer the question at the end. "
              "If you don't know the answer, just say that you don't know, "
              "don't try to make up an answer.")
  # QA_CHAIN_PROMPT = PromptTemplate(input_variables=["context", "question"], template=template)  # leftover from the langchain version, unused here

  newTemplate = template + "\n" + "Question: " + prompt + "\n" + content

  message = {'role': 'user', 'content': newTemplate}
  print(prompt, end='', flush=True)
  async for part in await AsyncClient().chat(model='mistral', messages=[message], stream=True):
    print(part['message']['content'], end='', flush=True)

Server error '502 Bad Gateway' for url 'http://127.0.0.1:11434/api/chat'

I'm sure the ollama service is running (started with ollama serve), but I still get this when running the sample code:


Exception has occurred: ResponseError
Server error '502 Bad Gateway' for url 'http://127.0.0.1:11434/api/chat'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/502
httpx.HTTPStatusError: Server error '502 Bad Gateway' for url 'http://127.0.0.1:11434/api/chat'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/502

During handling of the above exception, another exception occurred:

  File "/Users/xiongyu/Documents/python_project/youtube_translator/optimize.py", line 11, in <module>
    for part in ollama.chat('llama2:13b', messages=messages, stream=True):
ollama._types.ResponseError: 

:pray: Feature request > Create model from file path with SDK

❔ About

Currently we can create a model from the SDK this way:

modelfile = '''
FROM llama2
SYSTEM You are mario from super mario bros.
'''

ollama.create(model='example', modelfile=modelfile)

🤔 Sometimes it could be even easier to load a file directly by providing the path of a model file (possibly generated by a third-party process) to the SDK, so the code would become:

modelfile = './customModelFile'

ollama.create(model='example', modelfilepath=modelfile)
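
Until such a parameter exists, the same effect can be had by reading the file yourself and passing its contents through the existing modelfile argument; a sketch with the placeholder path from above:

import ollama

# Read a Modelfile produced elsewhere and register it under a new model name.
with open('./customModelFile') as f:
    ollama.create(model='example', modelfile=f.read())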

Understanding Variable Response Times from a Local Mistral Model

Hello,

Thank you for the great package. I have been exploring it and am considering using it to host an LLM on my company's Linux server. Our overarching goal is to move away from the OpenAI API, due to its long and varying response times, and host an open-source LLM on our own machine. It seems that Ollama may work for this use case.

However, after running some initial prompts with the Mistral model, I noticed that the response times were not consistent. I am currently running the code on a CPU on our Linux server (so I did not expect fast response times); however, I did expect consistent response times. From what I understand, the Ollama Python package uses models that are downloaded to your machine and therefore runs offline, on that machine, not in the cloud on a third-party server (correct me if I am wrong here).

I was surprised to see my first responses take around 60 seconds, and then that response time drop down to 10 and 7 seconds when using the exact same prompt. This behavior reminds me of cold starts when using serverless infrastructure, but again, I am assuming this is a local offline model.

I also waited a few minutes and re-ran the script, and again got the initial ~1 minute response time followed by ~5 second response times, so it seems like there is some type of initialization or cold start?

Here is the code (with the prompt hidden), taken from your README:

import time
import ollama

s = time.time()

response = ollama.chat(model='mistral', messages=[
  {
    'role': 'user',
    'content': TEST_PROMPT,
  },
])
print(response['message']['content'])
e = time.time()
print('Response time:', e - s)

Response for exact same input:

(AI) paul@deva-companion-python:/python_backend/AI$ python3 ollama_test.py
 Hi Pippy, this bedroom has so many interesting things. What catches your eye?
Response time: 65.01464176177979
(AI) paul@deva-companion-python:/python_backend/AI$ python3 ollama_test.py
 "Hi Pippy, this bedroom has so many interesting things. What catches your eye?"
Response time: 5.180686712265015
(AI) paul@deva-companion-python:/python_backend/AI$ python3 ollama_test.py
 "Hi Pippy, this bedroom has so many interesting things. What catches your eye?"
Response time: 5.081787586212158
(AI) paul@deva-companion-python:/python_backend/AI$ python3 ollama_test.py
 Hello Pippy! I'm glad you're here. This bedroom has so many interesting things. What catches your eye?
Response time: 6.901242017745972

Some Questions

  • Is the variable length of response times from an LLM (given the same input) considered normal behavior?
  • Why such large differences dropping down from 1 minute to a few seconds?
  • Is there some type of initialization or cold start happening for the first call? If so why?
  • Is there some type of caching mechanism with mistral causing this?
  • Is this due to the random nature of a generative model? (I am more familiar with deterministic models, which have consistent inference times given the same input.) I doubt this, because the varied response times are consistent: the first call takes 60 seconds, and after that about 5-6.
  • Am I wrong in my assumption that the python package runs offline?

Thanks again,

Paul
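
For what it's worth, the one-off ~60 s is most likely the model weights being loaded into memory on the first request; after a period of inactivity the model is unloaded again, so the next call pays the load cost once more. The keep_alive parameter that Client.chat forwards to the server can keep the model resident longer between calls; a hedged sketch, with TEST_PROMPT standing in for the hidden prompt:

import time
import ollama

TEST_PROMPT = '...'  # placeholder for the hidden prompt

s = time.time()
response = ollama.chat(
    model='mistral',
    keep_alive='30m',  # ask the server to keep the model loaded for 30 minutes
    messages=[{'role': 'user', 'content': TEST_PROMPT}],
)
print(response['message']['content'])
print('Response time:', time.time() - s)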

System Message causing no answer from Assistant

Hello all,

I'm trying to use the system message as described below. Every time I use it, I don't get any answer from the LLM.

    messages = [
        {'role': 'system', 'content': f'"{self.role}"'},
        {'role': 'user', 'content': f'"{message}"'},
    ]
    return await client.chat(model=model, messages=messages,)

I tried to find out whether this issue has already been reported, but I didn't find anything. Can someone help me with this?

Thanks
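
For comparison, a minimal system + user exchange through the module-level API, without the extra literal quotes that the f-strings above wrap around the contents (whether those quotes are the cause is an open question):

import ollama

response = ollama.chat(model='mistral', messages=[
    {'role': 'system', 'content': 'You are a terse assistant who answers in one sentence.'},
    {'role': 'user', 'content': 'Why is the sky blue?'},
])
print(response['message']['content'])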

Usage body for LLaVA model

What is the request body for the LLaVA model, which needs to handle image inputs along with text?

This is the sample provided in the repo for the llama2 model.

import ollama
response = ollama.chat(model='llama2', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response['message']['content'])
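
For image input, the message additionally carries an images list of file paths or raw bytes; a sketch, with example.jpg as a placeholder:

import ollama

response = ollama.chat(model='llava', messages=[
    {
        'role': 'user',
        'content': 'What is in this image?',
        'images': ['example.jpg'],  # placeholder path; raw bytes also work
    },
])
print(response['message']['content'])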

httpx.ConnectError: [Errno 111] Connection refused

Hi there,

I'm trying to run the simple-chat-stream example, but unfortunately I'm getting:

Traceback (most recent call last):
  File "~/mambaforge/envs/panel-apps/lib/python3.11/site-packages/httpcore/_exceptions.py", line 10, in map_exceptions
    yield
  File "~/mambaforge/envs/panel-apps/lib/python3.11/site-packages/httpcore/_backends/sync.py", line 206, in connect_tcp
    sock = socket.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/mambaforge/envs/panel-apps/lib/python3.11/socket.py", line 851, in create_connection
    raise exceptions[0]
  File "~/mambaforge/envs/panel-apps/lib/python3.11/socket.py", line 836, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "~/mambaforge/envs/panel-apps/lib/python3.11/site-packages/httpx/_transports/default.py", line 66, in map_httpcore_exceptions
    yield
  File "~/mambaforge/envs/panel-apps/lib/python3.11/site-packages/httpx/_transports/default.py", line 228, in handle_request
    resp = self._pool.handle_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/mambaforge/envs/panel-apps/lib/python3.11/site-packages/httpcore/_sync/connection_pool.py", line 268, in handle_request
    raise exc
  File "~/mambaforge/envs/panel-apps/lib/python3.11/site-packages/httpcore/_sync/connection_pool.py", line 251, in handle_request
    response = connection.handle_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/mambaforge/envs/panel-apps/lib/python3.11/site-packages/httpcore/_sync/connection.py", line 99, in handle_request
    raise exc
  File "~/mambaforge/envs/panel-apps/lib/python3.11/site-packages/httpcore/_sync/connection.py", line 76, in handle_request
    stream = self._connect(request)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "~/mambaforge/envs/panel-apps/lib/python3.11/site-packages/httpcore/_sync/connection.py", line 124, in _connect
    stream = self._network_backend.connect_tcp(**kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/mambaforge/envs/panel-apps/lib/python3.11/site-packages/httpcore/_backends/sync.py", line 205, in connect_tcp
    with map_exceptions(exc_map):
  File "~/mambaforge/envs/panel-apps/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "~/mambaforge/envs/panel-apps/lib/python3.11/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ConnectError: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "~/git/panel-apps/main.py", line 11, in <module>
    for part in chat('mistral', messages=messages, stream=True):
  File "~/mambaforge/envs/panel-apps/lib/python3.11/site-packages/ollama/_client.py", line 64, in _stream
    with self._client.stream(method, url, **kwargs) as r:
  File "~/mambaforge/envs/panel-apps/lib/python3.11/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "~/mambaforge/envs/panel-apps/lib/python3.11/site-packages/httpx/_client.py", line 857, in stream
    response = self.send(
               ^^^^^^^^^^
  File "~/mambaforge/envs/panel-apps/lib/python3.11/site-packages/httpx/_client.py", line 901, in send
    response = self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/mambaforge/envs/panel-apps/lib/python3.11/site-packages/httpx/_client.py", line 929, in _send_handling_auth
    response = self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/mambaforge/envs/panel-apps/lib/python3.11/site-packages/httpx/_client.py", line 966, in _send_handling_redirects
    response = self._send_single_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/mambaforge/envs/panel-apps/lib/python3.11/site-packages/httpx/_client.py", line 1002, in _send_single_request
    response = transport.handle_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/mambaforge/envs/panel-apps/lib/python3.11/site-packages/httpx/_transports/default.py", line 227, in handle_request
    with map_httpcore_exceptions():
  File "~/mambaforge/envs/panel-apps/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "~/mambaforge/envs/panel-apps/lib/python3.11/site-packages/httpx/_transports/default.py", line 83, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ConnectError: [Errno 111] Connection refused

I'm using ollama = 0.1.2 and have tried running ollama.pull("mistral").

Many thanks for any help!

README needs to show an example of generation with images

Add an example to the README for images:

import ollama

try:
    chat = ollama.generate(model='llava', prompt='examine picture', images=['IMG_8798.JPG'])
    print(chat)
except ollama.ResponseError as e:
    print('Error:', e.error)

ollama server hangs constantly

My ollama server hangs constantly: it takes in queries and my GPU makes noise, but it doesn't respond back in the Jupyter environment unless I restart the ollama process a couple of times. Any idea how to debug what might be making it just hang thinking? I'm on Linux using the VS Code Insiders version.

I've set it up so that after 20 seconds it restarts the ollama server; that works maybe 10% of the time, though, and it's very time consuming.

Here is the systemctl status ollama.service output:

● ollama.service - Ollama Service
     Loaded: loaded (/etc/systemd/system/ollama.service; enabled; vendor preset: enabled)
     Active: active (running) since Sat 2024-02-17 00:50:12 CET; 3min 5s ago
   Main PID: 7148 (ollama)
      Tasks: 21 (limit: 19015)
     Memory: 1.6G
        CPU: 14.581s
     CGroup: /system.slice/ollama.service
             └─7148 /usr/local/bin/ollama serve

feb 17 00:50:16 ImgOracle ollama[7148]: llama_new_context_with_model: freq_base  = 1000000.0
feb 17 00:50:16 ImgOracle ollama[7148]: llama_new_context_with_model: freq_scale = 1
feb 17 00:50:16 ImgOracle ollama[7148]: llama_kv_cache_init: VRAM kv self = 256.00 MB
feb 17 00:50:16 ImgOracle ollama[7148]: llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
feb 17 00:50:16 ImgOracle ollama[7148]: llama_build_graph: non-view tensors processed: 676/676
feb 17 00:50:16 ImgOracle ollama[7148]: llama_new_context_with_model: compute buffer total size = 159.19 MiB
feb 17 00:50:16 ImgOracle ollama[7148]: llama_new_context_with_model: VRAM scratch buffer: 156.00 MiB
feb 17 00:50:16 ImgOracle ollama[7148]: llama_new_context_with_model: total VRAM used: 4259.56 MiB (model: 3847.55 MiB, context: 412.00 MiB)
feb 17 00:50:16 ImgOracle ollama[7148]: 2024/02/17 00:50:16 ext_server_common.go:151: Starting internal llama main loop
feb 17 00:50:16 ImgOracle ollama[7148]: 2024/02/17 00:50:16 ext_server_common.go:165: loaded 0 images

ConnectError

What should I do?
ConnectError: [WinError 10061] No connection could be made because the target machine actively refused it

Error: Head "http://127.0.0.1:11434/": read tcp 127.0.0.1:36136->127.0.0.1:11434: read: connection reset by peer

I just installed the latest ollama for WSL, but...:
:~$ ollama serve
2024/01/26 17:13:58 images.go:857: INFO total blobs: 7
2024/01/26 17:13:58 images.go:864: INFO total unused blobs removed: 0
2024/01/26 17:13:58 routes.go:950: INFO Listening on 127.0.0.1:11434 (version 0.1.21)
2024/01/26 17:13:58 payload_common.go:106: INFO Extracting dynamic libraries...
2024/01/26 17:14:00 payload_common.go:145: INFO Dynamic LLM libraries [cpu rocm_v6 cuda_v11 cpu_avx rocm_v5 cpu_avx2]
2024/01/26 17:14:00 gpu.go:93: INFO Detecting GPU type
2024/01/26 17:14:00 gpu.go:212: INFO Searching for GPU management library libnvidia-ml.so
2024/01/26 17:14:01 gpu.go:258: INFO Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.545.23.08 /usr/lib/wsl/lib/libnvidia-ml.so.1 /usr/lib/wsl/drivers/nv_dispi.inf_amd64_2fe7c165c5dd3267/libnvidia-ml.so.1 /usr/lib/wsl/drivers/nv_dispi.inf_amd64_72a60bcfb646da4c/libnvidia-ml.so.1]
SIGSEGV: segmentation violation
PC=0x7f0522649a70 m=11 sigcode=1
signal arrived during cgo execution

goroutine 1 [syscall]:
runtime.cgocall(0x9b6eb0, 0xc0007438a8)
/usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc000743880 sp=0xc000743848 pc=0x409b0b
github.com/jmorganca/ollama/gpu._Cfunc_cuda_init(0x7f0538000b70, 0xc000036a80)
_cgo_gotypes.go:248 +0x3f fp=0xc0007438a8 sp=0xc000743880 pc=0x7b9cdf
github.com/jmorganca/ollama/gpu.LoadCUDAMgmt.func2(0xc00043cf10?, 0x33?)
/go/src/github.com/jmorganca/ollama/gpu/gpu.go:268 +0x4a fp=0xc0007438e8 sp=0xc0007438a8 pc=0x7bbaca
github.com/jmorganca/ollama/gpu.LoadCUDAMgmt({0xc0000919c0, 0x4, 0xc0000ca370?})
/go/src/github.com/jmorganca/ollama/gpu/gpu.go:268 +0x1b8 fp=0xc000743988 sp=0xc0007438e8 pc=0x7bb998
github.com/jmorganca/ollama/gpu.initGPUHandles()
/go/src/github.com/jmorganca/ollama/gpu/gpu.go:96 +0xd1 fp=0xc0007439f0 sp=0xc000743988 pc=0x7ba131
github.com/jmorganca/ollama/gpu.GetGPUInfo()
/go/src/github.com/jmorganca/ollama/gpu/gpu.go:121 +0xb5 fp=0xc000743b00 sp=0xc0007439f0 pc=0x7ba2f5
github.com/jmorganca/ollama/gpu.CheckVRAM()
/go/src/github.com/jmorganca/ollama/gpu/gpu.go:194 +0x1f fp=0xc000743ba8 sp=0xc000743b00 pc=0x7bafdf
github.com/jmorganca/ollama/server.Serve({0x106c11d0, 0xc000451580})
/go/src/github.com/jmorganca/ollama/server/routes.go:972 +0x453 fp=0xc000743c98 sp=0xc000743ba8 pc=0x99b513
github.com/jmorganca/ollama/cmd.RunServer(0xc000472300?, {0x10b06800?, 0x4?, 0xad25c1?})
/go/src/github.com/jmorganca/ollama/cmd/cmd.go:692 +0x199 fp=0xc000743d30 sp=0xc000743c98 pc=0x9ad9f9
github.com/spf13/cobra.(*Command).execute(0xc0003f3800, {0x10b06800, 0x0, 0x0})
/root/go/pkg/mod/github.com/spf13/[email protected]/command.go:940 +0x87c fp=0xc000743e68 sp=0xc000743d30 pc=0x7641dc
github.com/spf13/cobra.(*Command).ExecuteC(0xc0003f2c00)
/root/go/pkg/mod/github.com/spf13/[email protected]/command.go:1068 +0x3a5 fp=0xc000743f20 sp=0xc000743e68 pc=0x764a05
github.com/spf13/cobra.(*Command).Execute(...)
/root/go/pkg/mod/github.com/spf13/[email protected]/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
/root/go/pkg/mod/github.com/spf13/[email protected]/command.go:985
main.main()
/go/src/github.com/jmorganca/ollama/main.go:11 +0x4d fp=0xc000743f40 sp=0xc000743f20 pc=0x9b5a2d
runtime.main()
/usr/local/go/src/runtime/proc.go:267 +0x2bb fp=0xc000743fe0 sp=0xc000743f40 pc=0x43e25b
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000743fe8 sp=0xc000743fe0 pc=0x46e0a1

goroutine 2 [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000066fa8 sp=0xc000066f88 pc=0x43e6ae
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:404
runtime.forcegchelper()
/usr/local/go/src/runtime/proc.go:322 +0xb3 fp=0xc000066fe0 sp=0xc000066fa8 pc=0x43e533
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000066fe8 sp=0xc000066fe0 pc=0x46e0a1
created by runtime.init.6 in goroutine 1
/usr/local/go/src/runtime/proc.go:310 +0x1a

(etc. etc.)

If I try:

:~$ curl http://localhost:11434/api/chat -d '{
  "model": "llama2",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'

curl: (56) Recv failure: Connection reset by peer

Everything was fine before (Core i9 11900K, 120 GB RAM, NVMe, RTX 3090).

From ollama to transformer.AutoModelForCausalLM

Dear all,

Thank you for ollama-python.

I was wondering whether it would be possible to create a transformers.AutoModelForCausalLM from ollama, e.g.,

import ollama
model, tokenizer = ollama.from('mixtral:8x7b ')

where

  • model is an instance of transformers.AutoModelForCausalLM
  • tokenizer is an instance of transformers.AutoTokenizer

asyncio.run() cannot be called from a running event loop

I am using Python 3.11.7 and the latest version of the Ollama Python SDK to run the sample code for AsyncClient:

from ollama import AsyncClient
import asyncio

async def chat():
    message ={ "role":"user","content":"why people smile?"}
    response = await AsyncClient().chat(model="llama2",message=[message])

asyncio.run(chat())

I got the error below.

RuntimeError Traceback (most recent call last)
Cell In[7], line 8
5 message ={ "role":"user","content":"why people smile?"}
6 response = await AsyncClient().chat(model="llama2",message=[message])
----> 8 asyncio.run(chat())

File ~/anaconda3/envs/tedia/lib/python3.11/asyncio/runners.py:186, in run(main, debug)
161 """Execute the coroutine and return the result.
162
163 This function runs the passed coroutine, taking care of
(...)
182 asyncio.run(main())
183 """
184 if events._get_running_loop() is not None:
185 # fail fast with short traceback
--> 186 raise RuntimeError(
187 "asyncio.run() cannot be called from a running event loop")
189 with Runner(debug=debug) as runner:
190 return runner.run(main)

RuntimeError: asyncio.run() cannot be called from a running event loop
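
The Cell In[7] in the traceback suggests this is running inside Jupyter/IPython, which already has an event loop, so asyncio.run() refuses to start a second one. In a notebook the coroutine can simply be awaited; in a plain script asyncio.run() is fine. A sketch (note that the keyword is messages=, not message=):

from ollama import AsyncClient

async def chat():
    message = {'role': 'user', 'content': 'Why do people smile?'}
    response = await AsyncClient().chat(model='llama2', messages=[message])
    print(response['message']['content'])

# In a notebook cell (an event loop is already running):
await chat()

# In a plain script, use the usual entry point instead:
# import asyncio
# asyncio.run(chat())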
