
ai00_server's Introduction

💯AI00 RWKV Server



AI00 RWKV Server is an inference API server for the RWKV language model based upon the web-rwkv inference engine.

It supports Vulkan-based parallel and concurrent batched inference and runs on any GPU that supports Vulkan. No Nvidia card is required: AMD cards and even integrated graphics can be accelerated!

No bulky PyTorch, CUDA, or other runtime environments are needed; it is compact and ready to use out of the box!

Compatible with OpenAI's ChatGPT API interface.

100% open source and commercially usable, under the MIT license.

If you are looking for a fast, efficient, and easy-to-use LLM API server, then AI00 RWKV Server is your best choice. It can be used for various tasks, including chatbots, text generation, translation, and Q&A.

Join the AI00 RWKV Server community now and experience the charm of AI!

QQ Group for communication: 30920262

💥Features

  • Based on the RWKV model, it has high performance and accuracy
  • Supports Vulkan inference acceleration, so you get GPU acceleration without CUDA. Works with AMD cards, integrated graphics, and any GPU that supports Vulkan
  • No bulky PyTorch, CUDA, or other runtime environments are needed; it is compact and ready to use out of the box!
  • Compatible with OpenAI's ChatGPT API interface

⭕Usages

  • Chatbots
  • Text generation
  • Translation
  • Q&A
  • Any other task that an LLM can do

👻Other

Installation, Compilation, and Usage

📦Download Pre-built Executables

  1. Download the latest version from Release

  2. Download a model and place it in the assets/models/ path, for example, assets/models/RWKV-x060-World-3B-v2-20240228-ctx4096.st

  3. Optionally modify assets/Config.toml for model configurations like model path, quantization layers, etc.

  4. Run in the command line

    $ ./ai00_rwkv_server
  5. Open the browser and visit the WebUI https://localhost:65530

📜(Optional) Build from Source

  1. Install Rust

  2. Clone this repository

    $ git clone https://github.com/cgisky1980/ai00_rwkv_server.git
    $ cd ai00_rwkv_server
  3. Download a model and place it in the assets/models/ path, for example, assets/models/RWKV-x060-World-3B-v2-20240228-ctx4096.st

  4. Compile

    $ cargo build --release
  5. After compilation, run

    $ cargo run --release
  6. Open the browser and visit the WebUI https://localhost:65530

📒Convert the Model

Only Safetensors models with the .st extension are currently supported. Models saved by torch with the .pth extension need to be converted before use.

  1. Download the .pth model

  2. In the Release you can find an executable called converter. Run

    $ ./converter --input /path/to/model.pth
  3. If you are building from source, run

    $ cargo run --release --bin converter -- --input /path/to/model.pth
  4. As in the steps above, place the converted .st model in the assets/models/ path and modify the model path in assets/Config.toml
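
The conversion rule above can be sketched as a small helper that decides whether a model file needs converting and, if so, which of the two converter commands to run. This is only an illustration: the function name `converter_command` and its `from_source` parameter are hypothetical, and the commands are the ones listed in the steps above.

```python
from pathlib import Path


def converter_command(model_path, from_source=False):
    """Return the converter argv for a .pth model, or None if it is already .st.

    Hypothetical helper: only the commands themselves come from the README.
    """
    suffix = Path(model_path).suffix
    if suffix == ".st":
        return None  # already a Safetensors model, nothing to convert
    if suffix != ".pth":
        raise ValueError(f"unsupported model extension: {suffix!r}")
    if from_source:
        # Run the converter binary through cargo when building from source.
        return ["cargo", "run", "--release", "--bin", "converter",
                "--", "--input", str(model_path)]
    # Use the prebuilt converter executable shipped in the Release.
    return ["./converter", "--input", str(model_path)]


print(converter_command("/path/to/model.pth"))
```

The returned list can be passed to `subprocess.run` from the directory that contains the converter.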

📝Supported Arguments

  • --config: Configuration file path (default: assets/Config.toml)
  • --ip: The IP address the server is bound to
  • --port: The port the server listens on
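
As a rough sketch, the three arguments above can be assembled into a launch command like this. The helper name `server_command` is hypothetical, and the binary name is taken from the run step earlier in this document; adjust it if your executable is named differently.

```python
def server_command(config="assets/Config.toml", ip=None, port=None,
                   binary="./ai00_rwkv_server"):
    """Build the server argv from the documented --config, --ip and --port flags."""
    argv = [binary, "--config", config]
    if ip is not None:
        argv += ["--ip", ip]
    if port is not None:
        argv += ["--port", str(port)]
    return argv


print(server_command(ip="127.0.0.1", port=65530))
```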

📙Currently Available APIs

The API service listens on port 65530, and the input and output data formats follow the OpenAI API specification.

  • /api/oai/v1/models
  • /api/oai/models
  • /api/oai/v1/chat/completions
  • /api/oai/chat/completions
  • /api/oai/v1/completions
  • /api/oai/completions
  • /api/oai/v1/embeddings
  • /api/oai/embeddings

The following Python example calls ai00 through a ready-to-use helper class:

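# Uses the legacy openai Python SDK interface (openai<1.0):
# openai.Completion and openai.ChatCompletion were removed in 1.0.
import openai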

class Ai00:
    def __init__(self,model="model",port=65530,api_key="JUSTSECRET_KEY") :
        openai.api_base = f"http://127.0.0.1:{port}/api/oai"
        openai.api_key = api_key
        self.ctx = []
        self.params = {
            "system_name": "System",
            "user_name": "User", 
            "assistant_name": "Assistant",
            "model": model,
            "max_tokens": 4096,
            "top_p": 0.6,
            "temperature": 1,
            "presence_penalty": 0.3,
            "frequency_penalty": 0.3,
            "half_life": 400,
            "stop": ['\x00','\n\n']
        }
        
    def set_params(self,**kwargs):
        self.params.update(kwargs)
        
    def clear_ctx(self):
        self.ctx = []
        
    def get_ctx(self):
        return self.ctx
    
    def continuation(self, message):
        response = openai.Completion.create(
            model=self.params['model'],
            prompt=message,
            max_tokens=self.params['max_tokens'],
            half_life=self.params['half_life'],
            top_p=self.params['top_p'],
            temperature=self.params['temperature'],
            presence_penalty=self.params['presence_penalty'],
            frequency_penalty=self.params['frequency_penalty'],
            stop=self.params['stop']
        )
        result = response.choices[0].text
        return result
    
    def append_ctx(self,role,content):
        self.ctx.append({
            "role": role,
            "content": content
        })
        
    def send_message(self, message,role="user"):
        self.ctx.append({
            "role": role,
            "content": message
        })
        result = openai.ChatCompletion.create(
            model=self.params['model'],
            messages=self.ctx,
            names={
                "system": self.params['system_name'],
                "user": self.params['user_name'],
                "assistant": self.params['assistant_name']
            },
            max_tokens=self.params['max_tokens'],
            half_life=self.params['half_life'],
            top_p=self.params['top_p'],
            temperature=self.params['temperature'],
            presence_penalty=self.params['presence_penalty'],
            frequency_penalty=self.params['frequency_penalty'],
            stop=self.params['stop']
        )
        result = result.choices[0].message['content']
        self.ctx.append({
            "role": "assistant",
            "content": result
        })
        return result
    
ai00 = Ai00()
ai00.set_params(
    max_tokens = 4096,
    top_p = 0.55,
    temperature = 2,
    presence_penalty = 0.3,
    frequency_penalty = 0.8,
    half_life = 400,
    stop = ['\x00','\n\n']
)
print(ai00.send_message("how are you?"))
print(ai00.send_message("me too!"))
print(ai00.get_ctx())
ai00.clear_ctx()
print(ai00.continuation("i like"))
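
For readers who do not have the legacy SDK installed, here is a minimal sketch of the same chat call using only the Python standard library. It assumes the server is reachable at 127.0.0.1 and that the response has the OpenAI-style shape `choices[0].message.content`; the function names `build_chat_request` and `chat` are made up for this example.

```python
import json
import urllib.request


def build_chat_request(messages, port=65530, model="model", max_tokens=256):
    """Build a POST request for the /api/oai/v1/chat/completions endpoint."""
    url = f"http://127.0.0.1:{port}/api/oai/v1/chat/completions"
    body = {"model": model, "messages": messages, "max_tokens": max_tokens}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(messages, **kwargs):
    """Send the request and return the reply text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(messages, **kwargs)) as resp:
        reply = json.load(resp)
    # Assumed OpenAI-style response layout.
    return reply["choices"][0]["message"]["content"]


req = build_chat_request([{"role": "user", "content": "how are you?"}])
print(req.full_url)
```

Building the request separately from sending it makes the payload easy to inspect before any network traffic happens.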

📙WebUI Screenshots

Chat Feature

Continuation Feature

Paper Writing Feature

📝TODO List

  • Support for text_completions and chat_completions
  • Support for sse push
  • Add embeddings
  • Integrate basic front-end
  • Parallel inference via batch serve
  • Support for int8 quantization
  • Support for NF4 quantization
  • Support for LoRA model
  • Hot loading and switching of LoRA model

👥Join Us

We are always looking for people interested in helping us improve the project. If you are interested in any of the following, please join us!

  • 💀Writing code
  • 💬Providing feedback
  • 🔆Proposing ideas or needs
  • 🔍Testing new features
  • ✏Translating documentation
  • 📣Promoting the project
  • 🏅Anything else that would be helpful to us

No matter your skill level, we welcome you to join us. You can join us in the following ways:

  • Join our Discord channel
  • Join our QQ group
  • Submit issues or pull requests on GitHub
  • Leave feedback on our website

We can't wait to work with you to make this project better! We hope the project is helpful to you!

Thank you to these insightful and outstanding individuals for their support and selfless dedication to the project:

  • 顾真牛: 📖 💻 🖋 🎨 🧑‍🏫
  • 研究社交: 💻 💡 🤔 🚧 👀 📦
  • josc146: 🐛 💻 🤔 🔧
  • l15y: 🔧 🔌 💻
  • Cahya Wirawan: 🐛
  • yuunnn_w: 📖 ⚠️
  • longzou: 💻 🛡️

Stargazers over time


ai00_server's People

Contributors

allcontributors[bot], cahya-wirawan, cgisky1980, cryscan, cuijinsen, dependabot[bot], jekaxv, josstorer, l15y, nranphy, seikaijyu, yuunnn-w


ai00_server's Issues

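How do I add a local knowledge base?

Specific requirement: this is mainly for looking up document and book titles. Given a passage of text, the AI needs to find which category in the local knowledge base the content belongs to (for example, top-level categories A, B, C, D, and so on). Each top-level category also has a special category, which comprises three options: a, b, and c. If the content falls under the special category, the AI should then pick the matching option and output it. In other words, content that belongs to the special category outputs only the special category name ( a | b | c ), and content that does not outputs the top-level category name.

I would like to use ai00 for this, but I don't know how to add a local knowledge base.

A small request: could there be a theme-switching feature?

My Chinese is not good, so I will try to explain clearly; please bear with me.
There are only plain black and white themes, which is too monotonous; my favorite, for example, is solarized light. Also, the continuation text box does not follow the theme and defaults to white. Could related options be added?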

OpenAI api seems to not be working

I'm trying to run this script, but I keep getting nonsense output from the openai api

import requests
import json
from typing import Optional

# Function to call the OpenAI API with optional logit_bias
def test_logit_bias(api_key: str, prompt: str, max_tokens: int, logit_bias: Optional[dict] = None):
    # Endpoint URL for OpenAI API
    openai_api_base = "http://0.0.0.0:65530/api/oai"
    api_url = f"{openai_api_base}/v1/completions"
    
    headers = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {api_key}'
    }
    
    # Payload with and without logit_bias
    payload = {
        "prompt": prompt,
        "max_tokens": 3, 
        "model": "rwkv",
        "temperature": 0
    }
    
    if logit_bias:
        payload["logit_bias"] = logit_bias
    
    response = requests.post(api_url, headers=headers, json=payload)
    
    if response.status_code == 200:
        print(json.dumps(response.json(), indent=2))
    else:
        print(f"Error {response.status_code}: {response.text}")

# Usage
API_KEY = "your-api-key"  # Replace with your actual OpenAI API key

# Test with logit_bias
test_logit_bias(
    api_key=API_KEY,
    prompt="A profression that involves going to space is an ",
    max_tokens=100,
    logit_bias={15311: -100}  # Replace with actual token values
)

# Test without logit_bias
print("Without")
test_logit_bias(
    api_key=API_KEY,
    prompt="A profression that involves going to space is an ",
    max_tokens=100
)

This is the response I get.

{
  "object": "text_completion",
  "model": "assets/models/rwkv-4.safetensors",
  "choices": [
    {
      "text": "\ud83d\ude80",
      "index": 0,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 11,
    "completion_tokens": 3,
    "total_tokens": 14
  }
}
Without
{
  "object": "text_completion",
  "model": "assets/models/rwkv-4.safetensors",
  "choices": [
    {
      "text": "\ud83d\ude80",
      "index": 0,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 11,
    "completion_tokens": 3,
    "total_tokens": 14
  }
}

The weird thing is, I'm able to use the chat page in the UI just fine.
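
One possible reading of the response above, offered as a sketch rather than a diagnosis: `"\ud83d\ude80"` is the JSON escape for a single rocket emoji, printed in escaped form because `json.dumps` escapes non-ASCII characters by default. Note also that the script's payload hardcodes `"max_tokens": 3` instead of using its `max_tokens` argument, which matches the reported `completion_tokens` of 3 and `finish_reason` of `length`. The snippet only demonstrates the escaping behaviour of Python's `json` module.

```python
import json

# The text field exactly as it appears in the printed response.
raw = r'{"text": "\ud83d\ude80"}'

decoded = json.loads(raw)["text"]
print(decoded)       # the surrogate pair decodes to one emoji
print(len(decoded))  # a single code point

# Default json.dumps re-escapes it; ensure_ascii=False prints it readably.
print(json.dumps({"text": decoded}))
print(json.dumps({"text": decoded}, ensure_ascii=False))
```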

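API usage problem

When the IP is set to 127.0.0.1, the WebUI opens in a browser on the local machine and chat works. chatgptnextweb can also call the API successfully for chat.

After setting the IP to the machine's real IP address, both the local machine and other machines can use the WebUI in a browser, but chatgptnextweb cannot access the API correctly from either the local machine or other machines.

I am using the latest version on Windows.
I have checked the network traffic with F12, and I am using http://ip:65530/api/oai as the API address.

Could the author please look into this and explain what the problem is?

Use pre-quantized models

Caching the quantization result and loading the already-quantized model, instead of spending time quantizing on every load, would be better, I think.

Is there a CUDA-based RWKV server?

I use some professional cards (A100, V100, P100, T4) that do not support Vulkan, which makes this project hard to use. Is there an RWKV server implementation that supports CUDA, or does this project plan to support it?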

The server crashed

Hi,
Since the precompiled version doesn't work, as mentioned in #15, I compiled the source code and ran it, but it crashed after I submitted any text in the web user interface. Here are the messages in the console:

$ cargo run --release -- --model assets/models/RWKV-4-World-0.4B-v1-20230529-ctx4096.st 
    Finished release [optimized] target(s) in 1.53s
     Running `target/release/ai00_server --model assets/models/RWKV-4-World-0.4B-v1-20230529-ctx4096.st`
2023-08-01T01:24:06.717Z WARN  [wgpu_core::instance] Missing downlevel flags: DownlevelFlags(SURFACE_VIEW_FORMATS)
The underlying API or device in use does not support enough features to be a fully compliant implementation of WebGPU. A subset of the features can still be used. If you are running this program on native and not in a browser and wish to limit the features you use to the supported subset, call Adapter::downlevel_properties or Device::downlevel_properties to get a listing of the features the current platform supports.
WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
2023-08-01T01:24:06.768Z INFO  [ai00_server] AdapterInfo {
    name: "llvmpipe (LLVM 12.0.0, 256 bits)",
    vendor: 65541,
    device: 0,
    device_type: Cpu,
    driver: "llvmpipe",
    driver_info: "Mesa 21.2.6 (LLVM 12.0.0)",
    backend: Vulkan,
}
2023-08-01T01:24:08.293Z INFO  [ai00_server] ModelInfo {
    num_layers: 24,
    num_emb: 1024,
    num_vocab: 65536,
}
2023-08-01T01:24:08.554Z INFO  [ai00_server] server started at http://0.0.0.0:65530
2023-08-01T01:24:21.065Z TRACE [ai00_server] Sampler {
    top_p: 0.5,
    temperature: 1.0,
    presence_penalty: 0.3,
    frequency_penalty: 0.3,
}
2023-08-01T01:24:21.065Z TRACE [ai00_server] state cache miss
2023-08-01T01:24:21.065Z TRACE [ai00_server] User: 现在的时间是2023 8月 1日 星期二 早上

Assistant: 好的我知道了!

User: 你是谁?

Assistant: Hello, I am your AI assistant. If you have any questions or instructions, please let me know!

User: test

Assistant:
Segmentation fault

[bug] Penalty decay argument seems to be compulsory?

Tested with

curl 'http://127.0.0.1:65530/api/oai/completions' \
  -H 'Content-Type: application/json' \
  --data-raw '{"prompt":["# Hello Editor"],"max_tokens":1000,"temperature":1,"top_p":0.5,"presence_penalty":0.3,"frequency_penalty":0.3,"penalty_decay":0.9982686325973925,"stop":["\n\n","\nQ:","\nUser:","\nQuestion:","\n\nQ:","\n\nUser:","\n\nQuestion:","Q:","User:","Question:"],"stream":true}' \
  --insecure

VS

curl 'http://127.0.0.1:65530/api/oai/completions' \
  -H 'Content-Type: application/json' \
  --data-raw '{"prompt":["# Hello Editor"],"max_tokens":1000,"temperature":1,"top_p":0.5,"presence_penalty":0.3,"frequency_penalty":0.3,"stop":["\n\n","\nQ:","\nUser:","\nQuestion:","\n\nQ:","\n\nUser:","\n\nQuestion:","Q:","User:","Question:"],"stream":true}' \
  --insecure

Which will give the following error
Failed to deserialize the JSON body into the target type: data did not match any variant of untagged enum SamplerParams at line 1 column 245
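
Until the field becomes optional, a client-side workaround is to always send `penalty_decay`. The sketch below builds the same body as the first curl call; it assumes, as the report implies, that the request is accepted when the field is present. The helper name `completion_payload` is hypothetical, and the default value is simply the one used in the curl command.

```python
import json


def completion_payload(prompt, penalty_decay=0.9982686325973925, **overrides):
    """Build a completions body that always carries penalty_decay."""
    payload = {
        "prompt": [prompt],
        "max_tokens": 1000,
        "temperature": 1,
        "top_p": 0.5,
        "presence_penalty": 0.3,
        "frequency_penalty": 0.3,
        "penalty_decay": penalty_decay,
        "stream": True,
    }
    payload.update(overrides)  # callers may override any field, e.g. stop
    return payload


body = json.dumps(completion_payload("# Hello Editor", stop=["\n\n"]))
print(body)
```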

Mobile platform support?

This project seems to use Vulkan to provide hardware acceleration, which means it can be easily ported to mobile platforms. Do you currently support mobile platforms, or do you have plans to support them in the future?

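The model's replies are empty, with only an occasional normal output; after I switched the language to Chinese, the conversation order was scrambled at first and then everything was normal

1. Launch command and operating system information:
./ai00_server --model ~/GPT/RWKV-model/RWKV-4-World-CHNtuned-7B-v1-20230709-ctx4096.st --quant 32
System: Ubuntu 22
GPU: AMD RX6700XT 12GB

2. Terminal log: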
`terryjay@TJUbuntu:/GPT/ai00$ ./ai00_server --model ./RWKV-4-World-CHNtuned-7B-v1-20230709-ctx4096.st --quant 32
✔ Please select an adapter · AMD Radeon RX 6700 XT (RADV NAVI22) (Vulkan)
2023-09-16T03:12:15.772Z INFO [ai00_server] server started at http://0.0.0.0:65530
2023-09-16T03:14:59.251Z INFO [ai00_server] AdapterInfo {
name: "AMD Radeon RX 6700 XT (RADV NAVI22)",
vendor: 4098,
device: 29663,
device_type: DiscreteGpu,
driver: "radv",
driver_info: "Mesa 22.2.5-0ubuntu0.1~22.04.3",
backend: Vulkan,
}
2023-09-16T03:14:59.251Z INFO [ai00_server] ModelInfo {
num_layers: 32,
num_emb: 4096,
num_vocab: 65536,
}
2023-09-16T03:14:59.251Z INFO [ai00_server] chosen head chunk size: large (8192)
2023-09-16T03:17:08.866Z TRACE [ai00_server] Sampler {
top_p: 0.5,
temperature: 1.0,
presence_penalty: 0.3,
frequency_penalty: 0.3,
}
2023-09-16T03:17:08.866Z TRACE [ai00_server] state cache miss
2023-09-16T03:17:08.866Z TRACE [ai00_server] User: 现在的时间是2023 9月 16日 星期六 上午

Assistant: 好的我知道了!

User: 你是谁?

Assistant: Hello, I am your AI assistant. If you have any questions or instructions, please let me know!

User: 你好

Assistant:

User: hello

Assistant:

User: 你好

Assistant:

[DONE]
2023-09-16T03:17:09.823Z TRACE [ai00_server] state cache evicted: 0
2023-09-16T03:17:29.249Z TRACE [ai00_server] Sampler {
top_p: 0.5,
temperature: 1.0,
presence_penalty: 0.3,
frequency_penalty: 0.3,
}
2023-09-16T03:17:29.249Z TRACE [ai00_server] state cache hit: 1
2023-09-16T03:17:29.249Z TRACE [ai00_server] User: hello!

Assistant:
User:[DONE]
2023-09-16T03:17:29.405Z TRACE [ai00_server] state cache evicted: 0
2023-09-16T03:17:33.687Z TRACE [ai00_server] Sampler {
top_p: 0.5,
temperature: 1.0,
presence_penalty: 0.3,
frequency_penalty: 0.3,
}
2023-09-16T03:17:33.688Z TRACE [ai00_server] state cache hit: 1
2023-09-16T03:17:33.688Z TRACE [ai00_server]

User: 111

Assistant:
User

[DONE]
2023-09-16T03:17:33.853Z TRACE [ai00_server] state cache evicted: 0
2023-09-16T03:17:37.943Z TRACE [ai00_server] Sampler {
top_p: 0.5,
temperature: 1.0,
presence_penalty: 0.3,
frequency_penalty: 0.3,
}
2023-09-16T03:17:37.943Z TRACE [ai00_server] state cache hit: 1
2023-09-16T03:17:37.943Z TRACE [ai00_server] User: ???

Assistant:
I'm sorry, but I'm not sure what you mean by "111". Can you please provide more context or information?

[DONE]
2023-09-16T03:17:39.044Z TRACE [ai00_server] state cache evicted: 0
2023-09-16T03:17:52.410Z TRACE [ai00_server] Sampler {
top_p: 0.5,
temperature: 1.0,
presence_penalty: 0.3,
frequency_penalty: 0.3,
}
2023-09-16T03:17:52.410Z TRACE [ai00_server] state cache hit: 1
2023-09-16T03:17:52.410Z TRACE [ai00_server] User: 今天是什么日子

Assistant:
User:[DONE]
2023-09-16T03:17:52.618Z TRACE [ai00_server] state cache evicted: 0
2023-09-16T03:27:47.835Z TRACE [ai00_server] Sampler {
top_p: 0.5,
temperature: 1.0,
presence_penalty: 0.3,
frequency_penalty: 0.3,
}
2023-09-16T03:27:47.835Z TRACE [ai00_server] state cache hit: 1
2023-09-16T03:27:47.836Z TRACE [ai00_server]

User: 你叫什么

Assistant:
User

[DONE]
2023-09-16T03:27:48.017Z TRACE [ai00_server] state cache evicted: 0
2023-09-16T03:27:54.082Z TRACE [ai00_server] Sampler {
top_p: 0.5,
temperature: 1.0,
presence_penalty: 0.3,
frequency_penalty: 0.3,
}
2023-09-16T03:27:54.082Z TRACE [ai00_server] state cache hit: 1
2023-09-16T03:27:54.082Z TRACE [ai00_server] User: ?

Assistant:
我是一个AI助手,你可以称呼我为“你”或者“小助手”。

[DONE]
2023-09-16T03:27:55.263Z TRACE [ai00_server] state cache evicted: 0
2023-09-16T03:27:59.982Z TRACE [ai00_server] Sampler {
top_p: 0.5,
temperature: 1.0,
presence_penalty: 0.3,
frequency_penalty: 0.3,
}
2023-09-16T03:27:59.982Z TRACE [ai00_server] state cache hit: 1
2023-09-16T03:27:59.982Z TRACE [ai00_server] User: 111

Assistant:
好的,有什么问题或任务需要我帮忙吗?

[DONE]
2023-09-16T03:28:00.806Z TRACE [ai00_server] state cache evicted: 0
2023-09-16T03:28:06.045Z TRACE [ai00_server] Sampler {
top_p: 0.5,
temperature: 1.0,
presence_penalty: 0.3,
frequency_penalty: 0.3,
}
2023-09-16T03:28:06.045Z TRACE [ai00_server] state cache hit: 1
2023-09-16T03:28:06.045Z TRACE [ai00_server] User: OK

Assistant:
如果您有任何问题或需要帮助,请随时告诉我。

[DONE]
2023-09-16T03:28:07.027Z TRACE [ai00_server] state cache evicted: 0
2023-09-16T03:28:13.035Z TRACE [ai00_server] Sampler {
top_p: 0.5,
temperature: 1.0,
presence_penalty: 0.3,
frequency_penalty: 0.3,
}
2023-09-16T03:28:13.036Z TRACE [ai00_server] state cache hit: 1
2023-09-16T03:28:13.036Z TRACE [ai00_server] User: 小助手

Assistant:
是的,有什么我可以帮您的吗?User:[DONE]
2023-09-16T03:28:13.867Z TRACE [ai00_server] state cache evicted: 0
`

3. Screenshots of the web chat page:
Image

Image

ambiguous finish reason

The finish reason returned a string "null", which should be a regular null, because many frontends use if (finish_reason) to determine the end, and in this case "null" will be considered as ended. OpenAI's API also uses regular null.
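
The difference is easy to reproduce. The sketch below uses Python, whose truthiness rules match the `if (finish_reason)` check for these values: JSON `null` becomes `None` (falsy), while the string `"null"` is a non-empty string (truthy). The chunk layout and the helper name `is_finished` are simplified assumptions for illustration.

```python
import json


def is_finished(chunk_json):
    """Mimic a frontend that treats any truthy finish_reason as the end."""
    choice = json.loads(chunk_json)["choices"][0]
    return bool(choice["finish_reason"])


print(is_finished('{"choices": [{"finish_reason": null}]}'))    # False
print(is_finished('{"choices": [{"finish_reason": "null"}]}'))  # True
print(is_finished('{"choices": [{"finish_reason": "stop"}]}'))  # True
```

With the string form, the very first streamed chunk already looks like the end of the response.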

error: slice 59522..59523 out of range for dimension size 50277 for Raven 7B

I'm getting this error with the RWKV-4-Raven-7B-v12 models.
I don't see these errors with the world-4-ARAfintuned 7B.

info
binary: 2.0.0
vulkan driver
AMD RX580

 ./ai00_server --port 8081  --quant 32  --model assets/models/RWKV-4-Raven-7B-v12-Eng98%-Other2%-20230521-ctx8192.st --tokenizer assets/rwkv_vocab_v20230424.json 
MESA-INTEL: warning: Haswell Vulkan support is incomplete
✔ Please select an adapter · Radeon RX 580 Series (Vulkan)
2023-09-13T15:31:43.806Z INFO  [ai00_server] server started at http://0.0.0.0:8081
2023-09-13T15:33:14.036Z INFO  [ai00_server] AdapterInfo {
    name: "Radeon RX 580 Series",
    vendor: 4098,
    device: 26591,
    device_type: DiscreteGpu,
    driver: "AMD open-source driver",
    driver_info: "2023.Q1.2 (LLPC)",
    backend: Vulkan,
}
2023-09-13T15:33:14.036Z INFO  [ai00_server] ModelInfo {
    num_layers: 32,
    num_emb: 4096,
    num_vocab: 50277,
}
2023-09-13T15:33:14.036Z INFO  [ai00_server] chosen head chunk size: large (8192)
2023-09-13T15:34:28.531Z TRACE [ai00_server] Sampler {
    top_p: 0.5,
    temperature: 1.0,
    presence_penalty: 0.3,
    frequency_penalty: 0.3,
}
2023-09-13T15:34:28.532Z TRACE [ai00_server] state cache miss
2023-09-13T15:34:28.532Z TRACE [ai00_server] User: 现在的时间是2023 9月 13日 星期三 晚上
Assistant: 好的我知道了!
User: 你是谁?

User: hello

Assistant:
2023-09-13T15:34:29.924Z ERROR [ai00_server] slice 59522..59523 out of range for dimension size 50277
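
The log reports `num_vocab: 50277` and a failing slice at index 59522. One possible explanation, which is a guess and not confirmed in this report, is that the tokenizer produced a token id larger than the model's vocabulary. The hypothetical helper below shows the kind of bounds check that would flag such ids before they reach the model.

```python
def out_of_range_tokens(token_ids, num_vocab):
    """Return the token ids that cannot index a vocabulary of size num_vocab."""
    return [t for t in token_ids if not 0 <= t < num_vocab]


# 59522 and 50277 are the values from the log; the other ids are made up.
print(out_of_range_tokens([10, 59522, 50276], num_vocab=50277))  # [59522]
```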

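How do I call the API?

Could the documentation briefly list the input and output formats and the header information? Compared with the OpenAI API, the differences seem fairly large. Thanks.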

The generated text starts to be gibberish after around 1000 tokens

Hi,

I fine-tuned the 1.5B and 7B World models on an Indonesian dataset and then used them with this repo. The generated text is very good, but after around 1000 tokens it starts to concatenate words without spaces and then generates gibberish.

I don't have this problem if I use the cryscan repo directly (using the chat.rs and gen.rs scripts). Both scripts generate good text even after 10K tokens. I also tested my model with BlinkDL's script https://github.com/BlinkDL/ChatRWKV/blob/main/API_DEMO_WORLD.py, and it also generates very good text even after 10K tokens. So I am wondering what the difference between this repo and the cryscan repo could be, since this repo is based on the cryscan repo.

Could it be an overflow issue after 1000 tokens?

Thanks.

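Speed up state generation for very long texts?

In some fine-tunes, the effective context length already exceeds 65k and even reaches 128k.
Generating the state for such very long contexts takes a long time,
while actually generating the output takes very little time.
Could state generation for very long texts be sped up?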

`GLIBC_2.33' not found

Hi,
I tried the compiled version and ran ./ai00_server --model assets/models/RWKV-4-World-0.4B-v1-20230529-ctx4096.st
but I get the following error message:

$ ./ai00_server --model assets/models/RWKV-4-World-0.4B-v1-20230529-ctx4096.st
./ai00_server: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by ./ai00_server)
./ai00_server: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by ./ai00_server)
./ai00_server: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by ./ai00_server)
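
A quick way to compare what the binary requires with what the system provides is sketched below. It parses the `GLIBC_x.y` versions out of the error text and reads the local C library version through `platform.libc_ver()` from the standard library; the helper names are hypothetical, and `installed_glibc` returns `None` on systems that do not use glibc.

```python
import platform
import re


def required_glibc(error_text):
    """Return the highest GLIBC version named in the loader error, or None."""
    versions = re.findall(r"GLIBC_(\d+)\.(\d+)", error_text)
    return max(((int(a), int(b)) for a, b in versions), default=None)


def installed_glibc():
    """Return the local glibc version as (major, minor), or None if unknown."""
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        return None
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


error_text = """
version `GLIBC_2.33' not found (required by ./ai00_server)
version `GLIBC_2.32' not found (required by ./ai00_server)
version `GLIBC_2.34' not found (required by ./ai00_server)
"""
print(required_glibc(error_text))  # (2, 34)
print(installed_glibc())
```

If the installed version is lower than the required one, the prebuilt binary cannot load, and building from source on that machine is the option described earlier in this document.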

invalid Zip archive: Could not find central directory end',

I tried to run the server on my Mac Mini M2, but it failed with the error message:

cahya@Cahyas-Mac-mini ai00_rwkv_server-main % cargo run --release -- --model ./wkv-1b5.st 
    Finished release [optimized] target(s) in 0.11s
     Running `target/release/ai00_server --model ./rwkv-1b5.st`
✔ Please select an adapter · Apple M2 Pro (Metal)
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: invalid Zip archive: Could not find central directory end', src/main.rs:483:44
stack backtrace:
   0:        0x10448e4b4 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h523fafbfdb8f0857
   1:        0x1043ab0a0 - core::fmt::write::hf94a55b5a3abd106
   2:        0x10446def8 - std::io::Write::write_fmt::hc7c6bf1da111b052
   3:        0x104491e58 - std::sys_common::backtrace::print::h68ede8fb1e716cba
   4:        0x104491a90 - std::panicking::default_hook::{{closure}}::hba2205c2705c60bb
   5:        0x10449297c - std::panicking::rust_panic_with_hook::h8654c51ef9980a29
   6:        0x1044924f4 - std::panicking::begin_panic_handler::{{closure}}::hd188a636b3b90298
   7:        0x104492468 - std::sys_common::backtrace::__rust_end_short_backtrace::hc331d455ac21f427
   8:        0x10449245c - _rust_begin_unwind
   9:        0x10467d12c - core::panicking::panic_fmt::h4f2054f72ff905b1
  10:        0x10467d448 - core::result::unwrap_failed::ha6ab1074250e7550
  11:        0x104320940 - ai00_server::main::{{closure}}::h7ed615bedac00e08.1388
  12:        0x1042fcb54 - ai00_server::main::h35029858114b2f14
  13:        0x104326b20 - std::sys_common::backtrace::__rust_begin_short_backtrace::h09a5428b4246a891
  14:        0x1042fb214 - _main

The model ./wkv-1b5.st is fine, because I tested it with web-rwkv on this Mac mini and it runs properly.
Thanks.

server start problems

I am trying to run the server on a Debian 12 laptop with Vulkan and an integrated GPU.

Using the latest release, 0.3.11, I get the following error:

2024-01-06T19:11:43.640Z ERROR [ai00_server] receiving on a closed channel
2024-01-06T19:11:43.640Z ERROR [ai00_server] receiving on a closed channel
2024-01-06T19:11:43.640Z ERROR [ai00_server] receiving on a closed channel
2024-01-06T19:11:43.640Z ERROR [ai00_server] receiving on a closed channel
2024-01-06T19:11:43.640Z ERROR [ai00_server] receiving on a closed channel
2024-01-06T19:11:43.640Z ERROR [ai00_server] receiving on a closed channel
2024-01-06T19:11:43.640Z ERROR [ai00_server] receiving on a closed channel
2024-01-06T19:11:43.640Z ERROR [ai00_server] receiving on a closed channel
2024-01-06T19:11:43.640Z ERROR [ai00_server] receiving on a closed channel
2024-01-06T19:11:43.640Z ^C

This error was introduced with 0.3.5.

Going back to release 0.3.4, I get a different error,
even though Config.toml is there:

RUST_BACKTRACE=full ./ai00_server
2024-01-06T19:11:46.871Z INFO  [ai00_server] reading config assets/configs/Config.toml...
thread 'main' panicked at src/main.rs:585:49:
load frontend failed: No such file or directory (os error 2)
stack backtrace:
   0:     0x560fcffd84ff - std::backtrace_rs::backtrace::libunwind::trace::ha69d38c49f1bf263
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x560fcffd84ff - std::backtrace_rs::backtrace::trace_unsynchronized::h93125d0b85fd543c
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x560fcffd84ff - std::sys_common::backtrace::_print_fmt::h8d65f438e8343444
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/sys_common/backtrace.rs:67:5
   3:     0x560fcffd84ff - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h41751d2af6c8033a
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x560fcfdded6c - core::fmt::rt::Argument::fmt::h5db2f552d8a28f63
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/core/src/fmt/rt.rs:138:9
   5:     0x560fcfdded6c - core::fmt::write::h99465148a27e4883
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/core/src/fmt/mod.rs:1114:21
   6:     0x560fcffa87ad - std::io::Write::write_fmt::hee8dfd57bd179ab2
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/io/mod.rs:1763:15
   7:     0x560fcffd9a7e - std::sys_common::backtrace::_print::h019a3cee3e814da4
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/sys_common/backtrace.rs:47:5
   8:     0x560fcffd9a7e - std::sys_common::backtrace::print::h55694121c2ddf918
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/sys_common/backtrace.rs:34:9
   9:     0x560fcffd9663 - std::panicking::default_hook::{{closure}}::h29cbe3da3891b0b0
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/panicking.rs:272:22
  10:     0x560fcffda629 - std::panicking::default_hook::h881e76b2b8c74280
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/panicking.rs:292:9
  11:     0x560fcffda629 - std::panicking::rust_panic_with_hook::hcc36e25b6e33969c
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/panicking.rs:731:13
  12:     0x560fcffda15c - std::panicking::begin_panic_handler::{{closure}}::ha415efb0f69f41f9
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/panicking.rs:609:13
  13:     0x560fcffda0b6 - std::sys_common::backtrace::__rust_end_short_backtrace::h395fe90f99451e4e
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/sys_common/backtrace.rs:170:18
  14:     0x560fcffda0a1 - rust_begin_unwind
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/panicking.rs:597:5
  15:     0x560fcfb86ed4 - core::panicking::panic_fmt::h452a83e54ecd764e
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/core/src/panicking.rs:72:14
  16:     0x560fcfb87442 - core::result::unwrap_failed::hed0fccbe07e724fc
                               at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/core/src/result.rs:1652:5
  17:     0x560fcfc375e5 - tokio::runtime::context::blocking::BlockingRegionGuard::block_on::hd0f8bbbac1569650
  18:     0x560fcfcac48f - ai00_server::main::had5bd28e0605f84b
  19:     0x560fcfd729f3 - std::sys_common::backtrace::__rust_begin_short_backtrace::h0e7061c307683ba8
  20:     0x560fcfcabf55 - main
  21:     0x7f772f96d1ca - <unknown>
  22:     0x7f772f96d285 - __libc_start_main
  23:     0x560fcfbbb7b5 - _start
  24:                0x0 - <unknown>

Error while the wgpu library compiles a compute shader

2023-08-18T04:24:01.957Z WARN [wgpu::backend::direct] Shader translation error for stage ShaderStages(COMPUTE): HLSL: Unimplemented("write_expr_math Unpack4x8unorm")
2023-08-18T04:24:01.958Z WARN [wgpu::backend::direct] Please report it to https://github.com/gfx-rs/naga
2023-08-18T04:24:01.958Z ERROR [wgpu::backend::direct] Handling wgpu errors as fatal by default
thread 'main' panicked at 'wgpu error: Validation Error

Caused by:
In Device::create_compute_pipeline
note: label = matmul
Internal error: HLSL: Unimplemented("write_expr_math Unpack4x8unorm")

', C:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\wgpu-0.16.3\src\backend\direct.rs:3019:5
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

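The following is the explanation given by OpenAI:
Based on the error message you provided, it appears that an error occurred while compiling a compute shader with the wgpu library.

The error message indicates an HLSL (High-Level Shading Language) compilation error, specifically about an unimplemented feature: "Unimplemented("write_expr_math Unpack4x8unorm")". This may mean that your compute shader uses an operation or function that is not yet implemented in HLSL.

The steps to resolve this issue are as follows:

  1. Report the issue: as the error message suggests, you can report this issue to the developers of the wgpu library so that they know the feature is not yet implemented and can possibly provide a fix or workaround. You can visit https://github.com/gfx-rs/naga and submit a new issue.

  2. Check the compute shader code: review your compute shader code, paying particular attention to where the "Unpack4x8unorm" operation or function is used. If possible, try replacing it with another available operation or function.

  3. Update versions: make sure you are using the latest version of the wgpu library. Someone may already have reported this issue, and it may have been fixed in a newer version.

  4. Backtrace: as the error message suggests, you can set the environment variable RUST_BACKTRACE to 1 to display the full backtrace. This may give more information about where the error occurred and help you troubleshoot.

I am already on the latest driver. How can I fix this? 😔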

Nooby question

where can I download the RWKV models with the .st extension? On HF there are only .pth files, or can your server read those files too? THX in advance

Need help to run server in Kaggle

It runs successfully on Colab, but it fails after I tried migrating to Kaggle. The error is:

2023-12-24T14:33:30.654Z INFO  [ai00_server] reading config assets/configs/Config.toml...
2023-12-24T14:33:30.657Z ERROR [ai00_server] reload model failed: No such device (os error 19)
2023-12-24T14:33:30.756Z INFO  [ai00_server] server started at 0.0.0.0:65530

The rough procedure is as follows:

%cd /kaggle
!git clone https://github.com/cgisky1980/ai00_rwkv_server.git
%cd ai00_rwkv_server

# Must be compiled manually: Kaggle uses Ubuntu 20.04, so the prebuilt binary cannot be used directly
!curl https://sh.rustup.rs -sSf | sh -s -- -y
!$HOME/.cargo/bin/cargo build --release
!cp /kaggle/ai00_rwkv_server/ai00_rwkv_server/ai00_rwkv_server/target/release/ai00_server /kaggle/ai00_rwkv_server/ai00_server

# Prepare dependencies
!apt -y install libnvidia-gl-535
!apt -y install -qq aria2

# Download the model
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/tastypear/RWKV-v5-12B-one-state-chat-16k-safetensors/resolve/main/RWKV-5-12B-one-state-chat-16k.st -d /kaggle/ai00_rwkv_server/assets/models -o RWKV-5-12B-one-state-chat-16k.st
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/tastypear/RWKV-v5-12B-one-state-chat-16k-safetensors/resolve/main/rwkv_vocab_v20230424.txt -d /kaggle/ai00_rwkv_server/assets/tokenizer -o rwkv_vocab_v20230424.txt

# Modify the configuration
%cd /kaggle/ai00_rwkv_server
!sed -i 's/RWKV-4-World-0.4B-v1-20230529-ctx4096.st/RWKV-5-12B-one-state-chat-16k.st/g' assets/configs/Config.toml
!sed -i 's/rwkv_vocab_v20230424.json/rwkv_vocab_v20230424.txt/g' assets/configs/Config.toml

# Start the server
!$HOME/.cargo/bin/cargo run --release

Another question: can the model be split evenly across multiple GPUs, as llama.cpp does?

0.3.24 outputs gibberish

0.3.23 was fine
For example for the prompt "A girl was" the output now is:

asked to ask me, a \item[ Norris, is still quite a bit more than a “ Armeni on the half-Lewy were not going to be like, and said it all the days in the world-Europae4Ugqpg0oIuPzDK, to say nothing like the coming on my fellow- Elias Desc! I'll still can’t have some said, then a 700, and my first serious recon "../Ini ADD.
Akt

Server settings:
tls=false
model RWKV-4-World-0.4B-v1-20230529-ctx4096.st
tokenizer rwkv_vocab_v20230424.json

Request settings:

payload={
    "prompt": 'A girl was',
    "max_tokens": 100,
    "model": "rwkv",
    "temperature": 1,
    "top_p": 0.5,
    "presence_penalty": 0.3,
    "frequency_penalty": 0.3,
    "sampler": "Nucleus",
    "stream": False
}
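For anyone trying to reproduce this, the request above could be sent roughly as follows. This is only a sketch: the base URL is an assumption (port 65530 appears in the server logs elsewhere on this page, and tls=false suggests plain HTTP), the /v1/completions route is taken from the API list in another issue here, and `build_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed base URL; adjust host and port to match your own server.
BASE_URL = "http://localhost:65530"

# Same settings as in the report above.
payload = {
    "prompt": "A girl was",
    "max_tokens": 100,
    "model": "rwkv",
    "temperature": 1,
    "top_p": 0.5,
    "presence_penalty": 0.3,
    "frequency_penalty": 0.3,
    "sampler": "Nucleus",
    "stream": False,
}


def build_request(base_url: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the completions route."""
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply; needs a running server."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# With a server running, one would call:
#   reply = send(build_request(BASE_URL, payload))
```

Running the same script against 0.3.23 and 0.3.24 with a fixed prompt would make the regression easier to compare.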

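v0.4 front-end and back-end requirements summary

Back-end requirements

  • BNF implementation
  • QuIP# quantization implementation
  • LoRAMoE implementation
  • Static quantization files

Front-end requirements

  • Add a stop feature
  • Add saving and loading of settings
  • Add a DEMO of the quantization database and vectorization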

Compilation error on Ubuntu in a 1.76 environment

error[E0382]: use of moved value: `app`
 --> src/salvo_main.rs:226:56
|
138 | let app = app
| --- move occurs because `app` has type `Router`, which does not implement the `Copy` trait
...
145 | let service = Service::new(app)
| --- value moved here
...
226 | salvo::server::Server::new(acceptor).serve(app).await;
| ^^^ value used here after move

For more information about this error, try `rustc --explain E0382`.
error: could not compile `ai00_server` (bin "ai00_server") due to 1 previous error

Error: invalid Zip archive: Could not find central directory end

Hi there,
Thanks for this rust port, pretty neat.
I'm bumping into this error "Error: invalid Zip archive: Could not find central directory end", tried downloading the zip file again, but nothing.
Any idea where I can start looking for a workaround?
Thx!

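Some comments and suggestions on standardizing the front-end interface text

(screenshot of the front-end interface with numbered annotations)

  1. Suggest using standard modern Chinese: “愉快地聊天吧!” ("Let's have a pleasant chat!")
  2. The real model name could be fetched here through the /models endpoint
  3. These labels could also be standardized as "Max Tokens", "Top P", "Presence Penalty" and "Frequency Penalty", to look more professional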

Issue on ARM macOS

I'm doing a fairly clean build on a new macOS machine, and I suspect some steps might be missing.

Below are the logs:

(base) picocreator@MacBook-Pro ai00_rwkv_server % cargo run --release
    Finished release [optimized] target(s) in 0.13s
     Running `target/release/ai00_server`
2023-11-19T01:51:46.831Z INFO  [ai00_server] reading config assets/configs/Config.toml...
2023-11-19T01:51:46.834Z INFO  [ai00_server] server started at http://0.0.0.0:65530
2023-11-19T01:51:46.835Z INFO  [ai00_server] ReloadRequest {
    model_path: "assets/models/RWKV-4-World-0.4B-v1-20230529-ctx4096.st",
    lora: [],
    quant: 0,
    turbo: true,
    token_chunk_size: 32,
    head_chunk_size: 8192,
    max_runtime_batch: 8,
    max_batch: 16,
    embed_layer: 2,
    tokenizer_path: "assets/tokenizer/rwkv_vocab_v20230424.json",
    adapter: Auto,
}
2023-11-19T01:51:47.680Z WARN  [wgpu_hal::metal::device] Naga generated shader:
// language: metal2.4
#include <metal_stdlib>
#include <simd/simd.h>

using metal::uint;
struct DefaultConstructible {
    template<typename T>
    operator T() && {
        return T {};
    }
};

struct _mslBufferSizes {
    uint size2;
};

typedef metal::float4 type_2[1];
constant uint BLOCK_SIZE = 128u;

struct halfInput {
};
kernel void half(
  metal::uint3 invocation_id_1 [[thread_position_in_grid]]
, constant metal::uint4& shape [[buffer(0)]]
, device type_2& output [[buffer(1)]]
, constant _mslBufferSizes& _buffer_sizes [[buffer(2)]]
) {
    uint _e4 = shape.x;
    uint stride = _e4 / 4u;
    uint index = invocation_id_1.x;
    uint token = invocation_id_1.y;
    uint batch = invocation_id_1.z;
    if (index < stride) {
        uint _e14 = shape.y;
        uint bti = (((batch * _e14) + token) * stride) + index;
        metal::float4 _e24 = uint(bti) < 1 + (_buffer_sizes.size2 - 0 - 16) / 16 ? output[bti] : DefaultConstructible();
        if (uint(bti) < 1 + (_buffer_sizes.size2 - 0 - 16) / 16) {
            output[bti] = 0.5 * _e24;
        }
        return;
    } else {
        return;
    }
}

2023-11-19T01:51:47.680Z WARN  [wgpu::backend::direct] Shader translation error for stage ShaderStages(COMPUTE): Metal: program_source:22:13: error: cannot combine with previous 'void' declaration specifier
kernel void half(
            ^
program_source:23:16: error: expected ')'
  metal::uint3 invocation_id_1 [[thread_position_in_grid]]
               ^
program_source:22:17: note: to match this '('
kernel void half(
                ^
program_source:22:1: error: 'kernel' attribute only applies to functions
kernel void half(
^
program_source:23:10: error: program scope variable must reside in constant address space
  metal::uint3 invocation_id_1 [[thread_position_in_grid]]
         ^
program_source:28:5: error: unexpected type name 'uint': expected expression
    uint _e4 = shape.x;
    ^
program_source:28:10: error: expected '}'
    uint _e4 = shape.x;
         ^
program_source:27:3: note: to match this '{'
) {
  ^
program_source:29:10: error: program scope variable must reside in constant address space
    uint stride = _e4 / 4u;
         ^
program_source:29:19: error: use of undeclared identifier '_e4'
    uint stride = _e4 / 4u;
                  ^
program_source:30:10: error: program scope variable must reside in constant address space
    uint index = invocation_id_1.x;
         ^
program_source:30:18: error: use of undeclared identifier 'invocation_id_1'
    uint index = invocation_id_1.x;
                 ^
program_source:31:10: error: program scope variable must reside in constant address space
    uint token = invocation_id_1.y;
         ^
program_source:31:18: error: use of undeclared identifier 'invocation_id_1'
    uint token = invocation_id_1.y;
                 ^
program_source:32:10: error: program scope variable must reside in constant address space
    uint batch = invocation_id_1.z;
         ^
program_source:32:18: error: use of undeclared identifier 'invocation_id_1'
    uint batch = invocation_id_1.z;
                 ^
program_source:33:5: error: expected unqualified-id
    if (index < stride) {
    ^
program_source:41:7: error: expected unqualified-id
    } else {
      ^
program_source:44:1: error: extraneous closing brace ('}')
}
^

2023-11-19T01:51:47.680Z WARN  [wgpu::backend::direct] Please report it to https://github.com/gfx-rs/naga
2023-11-19T01:51:47.680Z ERROR [wgpu::backend::direct] Handling wgpu errors as fatal by default
thread '<unnamed>' panicked at /Users/picocreator/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.18.0/src/backend/direct.rs:3111:5:
wgpu error: Validation Error

Caused by:
    In Device::create_compute_pipeline
      note: label = `half`
    Internal error: Metal: program_source:22:13: error: cannot combine with previous 'void' declaration specifier
kernel void half(
            ^
program_source:23:16: error: expected ')'
  metal::uint3 invocation_id_1 [[thread_position_in_grid]]
               ^
program_source:22:17: note: to match this '('
kernel void half(
                ^
program_source:22:1: error: 'kernel' attribute only applies to functions
kernel void half(
^
program_source:23:10: error: program scope variable must reside in constant address space
  metal::uint3 invocation_id_1 [[thread_position_in_grid]]
         ^
program_source:28:5: error: unexpected type name 'uint': expected expression
    uint _e4 = shape.x;
    ^
program_source:28:10: error: expected '}'
    uint _e4 = shape.x;
         ^
program_source:27:3: note: to match this '{'
) {
  ^
program_source:29:10: error: program scope variable must reside in constant address space
    uint stride = _e4 / 4u;
         ^
program_source:29:19: error: use of undeclared identifier '_e4'
    uint stride = _e4 / 4u;
                  ^
program_source:30:10: error: program scope variable must reside in constant address space
    uint index = invocation_id_1.x;
         ^
program_source:30:18: error: use of undeclared identifier 'invocation_id_1'
    uint index = invocation_id_1.x;
                 ^
program_source:31:10: error: program scope variable must reside in constant address space
    uint token = invocation_id_1.y;
         ^
program_source:31:18: error: use of undeclared identifier 'invocation_id_1'
    uint token = invocation_id_1.y;
                 ^
program_source:32:10: error: program scope variable must reside in constant address space
    uint batch = invocation_id_1.z;
         ^
program_source:32:18: error: use of undeclared identifier 'invocation_id_1'
    uint batch = invocation_id_1.z;
                 ^
program_source:33:5: error: expected unqualified-id
    if (index < stride) {
    ^
program_source:41:7: error: expected unqualified-id
    } else {
      ^
program_source:44:1: error: extraneous closing brace ('}')
}
^



note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Rayon: detected unexpected panic; aborting
zsh: abort      cargo run --release

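How to interrupt a conversation or continuation?

As the title says: the model only stops computing after it has finished its complete output. When a new conversation or continuation is started, or the page has been refreshed, the backend keeps running computation that is no longer needed.

v0.3 front-end and back-end requirements summary

The Batch Inference backend is now basically usable, so the requirements for the new version of the front end are summarized below.
Note: because model loading from the front end is not finished yet, for now please use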

$ cargo r -r -- --model Config.toml

to start the server, and edit Config.toml to change the loading configuration.

Add a model loading feature

Send a POST to /load with the following content:

#[derive(Debug, Deserialize)]
pub struct ReloadRequest {
    /// Path to the model.
    pub path: PathBuf,
    /// Specify layers that need to be quantized.
    pub quant: Vec<usize>,
    /// Maximum tokens to be processed in parallel at once.
    pub token_chunk_size: usize,
    /// The chunk size for each split of the head matrix.
    pub head_chunk_size: usize,
    /// Maximum number of batches that are active at once.
    pub max_runtime_batch: usize,
    /// Number of states that are cached on GPU.
    pub max_batch: usize,
    /// The (reversed) number of the layer whose output is used as the embedding.
    pub embed_layer: usize,
}
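As a rough illustration, a request body matching the struct above might be assembled like this. It assumes the server accepts JSON for /load (the struct only shows that it is deserialized); the values are placeholders borrowed from a log elsewhere on this page, and `encode_reload_request` is a hypothetical helper.

```python
import json

# Field names mirror the ReloadRequest struct above. Every value is a
# placeholder for illustration only, not a recommended setting.
reload_request = {
    "path": "assets/models/RWKV-4-World-0.4B-v1-20230529-ctx4096.st",
    "quant": [],  # Vec<usize>: the layers that need to be quantized
    "token_chunk_size": 32,
    "head_chunk_size": 8192,
    "max_runtime_batch": 8,
    "max_batch": 16,
    "embed_layer": 2,
}

REQUIRED_FIELDS = {
    "path",
    "quant",
    "token_chunk_size",
    "head_chunk_size",
    "max_runtime_batch",
    "max_batch",
    "embed_layer",
}


def encode_reload_request(request: dict) -> bytes:
    """Check that every struct field is present, then encode the body as JSON."""
    missing = REQUIRED_FIELDS - request.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return json.dumps(request).encode("utf-8")
```

The encoded bytes would then be sent as the POST body to /load with a JSON content type.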

Add Penalty Decay to the Sampler

The new Sampler struct is as follows:

#[derive(Debug, Clone)]
pub struct Sampler {
    pub top_p: f32,
    pub temperature: f32,
    pub presence_penalty: f32,
    pub frequency_penalty: f32,
    pub penalty_decay: f32,
}

The default value of penalty_decay can be set to 0.95.
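To make the intent of penalty_decay concrete, here is one common way such a decay is combined with presence and frequency penalties. Whether the server does exactly this is an assumption; `update_counts` and `apply_penalties` are hypothetical names used only for this sketch.

```python
def update_counts(counts: dict, token: int, penalty_decay: float) -> None:
    """Decay every existing occurrence count, then record the new token."""
    for seen in counts:
        counts[seen] *= penalty_decay
    counts[token] = counts.get(token, 0.0) + 1.0


def apply_penalties(
    logits: list,
    counts: dict,
    presence_penalty: float,
    frequency_penalty: float,
) -> list:
    """Return a copy of logits with penalties subtracted for seen tokens."""
    penalized = list(logits)
    for token, count in counts.items():
        penalized[token] -= presence_penalty + frequency_penalty * count
    return penalized


# Example: token 3 was sampled first, token 5 one step later.
counts = {}
update_counts(counts, 3, penalty_decay=0.95)
update_counts(counts, 5, penalty_decay=0.95)
penalized = apply_penalties([0.0] * 8, counts, 0.3, 0.3)
```

With a decay below 1, the frequency part of the penalty for an old token fades over time, so words used long ago are punished less than words used just now.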

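Fix the inability to switch Chat during inference

Currently the chat window cannot be switched while inference is still running. For now the only way to test in parallel is to open multiple browser tabs.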

Issue starting server on Manjaro

2024-03-27T00:11:01.765Z INFO [ai00_server::salvo_main] reading config assets/configs/Config.toml...
thread 'main' panicked at src/salvo_main.rs:49:54:
called Result::unwrap() on an Err value: No such file or directory (os error 2)
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
thread 'tokio-runtime-worker' panicked at src/middleware.rs:344:47:
called Result::unwrap() on an Err value: Disconnected

Running ./ai00_server reports Error: failed to request adaptor

Running ./ai00_server reports Error: failed to request adaptor
Binary version:
ai00_server-v0.1.2-x86_64-unknown-linux-gnu.zip
OS version:
Ubuntu 22.04.2 LTS
Kernel version:
Linux server 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

discord channel?

To avoid spamming the bug tracker with usage questions and other non-bug questions, please put the Discord invitation in the README.

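API request: state and tps

  1. A state API that returns memory usage, CPU and GPU status, etc.
  2. Add tps (tokens per second) statistics to APIs such as the following (a parameter switch could be added to toggle this):

/v1/chat/completions
/chat/completions
/v1/completions
/completions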
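The tps statistic requested here could be measured along these lines. This is a minimal sketch, not how the server does it: `TpsMeter` is a hypothetical helper, and the injectable clock is just a convenience for testing.

```python
import time


class TpsMeter:
    """Count generated tokens and report tokens per second since creation."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._start = clock()
        self._tokens = 0

    def add(self, n: int = 1) -> None:
        """Record that n more tokens have been generated."""
        self._tokens += n

    def tps(self) -> float:
        """Return tokens per second, or 0.0 if no time has elapsed."""
        elapsed = self._clock() - self._start
        return self._tokens / elapsed if elapsed > 0 else 0.0
```

A handler would create one meter per request, call `add()` for each generated token, and attach `tps()` to the response only when the optional switch is enabled.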

Load a model to an explicit GPU

Starting the server by hand and choosing one of the Vulkan GPUs works well, actually very well! Kudos to you!

In auto mode, it tries to load the model to the first reported (integrated) GPU (here: Haswell / Intel Celeron).
As far as I can tell from watching radeontop, it then starts to offload to the GPU, which breaks the system. See the kernel log below.

Further down the road, RWKV-Runner behaves the same way. It is a bit friendlier in that it stops loading the model earlier and stays responsive, but it also cannot load a 7B model to a GPU.

Suggestion:
Implement a solution so that one can load the model to a discrete GPU, e.g.
-a <driver>:<busNo>
-a vulkan:01

or skip loading to GPU0 if it is integrated and discrete GPUs are available.
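The selection rule suggested here could look roughly like this. It is only a sketch of the proposed behaviour, not the server's actual logic: `Adapter` and `pick_adapter` are hypothetical names, and the device type strings are the ones printed by vulkaninfo.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

DISCRETE = "PHYSICAL_DEVICE_TYPE_DISCRETE_GPU"
INTEGRATED = "PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU"


@dataclass(frozen=True)
class Adapter:
    index: int
    name: str
    device_type: str


def pick_adapter(
    adapters: Sequence[Adapter], requested: Optional[int] = None
) -> Adapter:
    """Honour an explicit index; otherwise prefer the first discrete GPU."""
    if not adapters:
        raise ValueError("no adapters found")
    if requested is not None:
        for adapter in adapters:
            if adapter.index == requested:
                return adapter
        raise ValueError(f"no adapter with index {requested}")
    for adapter in adapters:
        if adapter.device_type == DISCRETE:
            return adapter
    # Fall back to the first adapter when only integrated GPUs exist.
    return adapters[0]


# Devices as reported in this issue (GPU2 to GPU4 omitted).
adapters = [
    Adapter(0, "Intel(R) HD Graphics (HSW GT1)", INTEGRATED),
    Adapter(1, "Radeon RX 580 Series", DISCRETE),
]
```

With these two devices, auto mode would land on the RX 580 instead of the Haswell iGPU, while an explicit index still overrides the default.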

Loadorder:

 ./ai00_server --port 8082 --quant 32  --model assets/models/RWKV-4-World-ARAtuned-7B-v1-20230803-ctx4096.st
MESA-INTEL: warning: Haswell Vulkan support is incomplete
? Please select an adapter ›
❯ Intel(R) HD Graphics (HSW GT1) (Vulkan)
  Radeon RX 580 Series (Vulkan)
  Radeon RX 580 Series (Vulkan)
  Radeon RX 580 Series (Vulkan)
  Radeon RX 580 Series (Vulkan)

vulkaninfo:

Devices:
========
GPU0:
        apiVersion         = 1.2.230
        driverVersion      = 22.3.6
        vendorID           = 0x8086
        deviceID           = 0x0402
        deviceType         = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
        deviceName         = Intel(R) HD Graphics (HSW GT1)
        driverID           = DRIVER_ID_INTEL_OPEN_SOURCE_MESA
        driverName         = Intel open-source Mesa driver
        driverInfo         = Mesa 22.3.6
        conformanceVersion = 0.0.0.0
        deviceUUID         = 0f2f1a2f-fc30-f647-758e-bed37906cc4d
        driverUUID         = da807cc5-e5c9-2add-5541-8357feabd0cc
GPU1:
        apiVersion         = 1.3.240
        driverVersion      = 2.0.255
        vendorID           = 0x1002
        deviceID           = 0x67df
        deviceType         = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
        deviceName         = Radeon RX 580 Series
        driverID           = DRIVER_ID_AMD_OPEN_SOURCE
        driverName         = AMD open-source driver
        driverInfo         = 2023.Q1.2 (LLPC)
        conformanceVersion = 1.3.0.0
        deviceUUID         = 00000000-0100-0000-0000-000000000000
        driverUUID         = 414d442d-4c49-4e55-582d-445256000000
GPU2 - GPU4
…

system: is an old crypto RIG with celeron, 8GB & 4 x RX580 (8GB)
OS: debian 12
kernel: Linux jeeves 6.1.0-11-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.38-4 (2023-08-08) x86_64 GNU/Linux
vulkan driver: amdgpu (opensource)

versions:

libegl-mesa0:amd64               22.3.6-1+deb12u1
libgl1-mesa-dri:amd64            22.3.6-1+deb12u1
libglapi-mesa:amd64              22.3.6-1+deb12u1
libglu1-mesa:amd64               9.0.2-1.1
libglx-mesa0:amd64               22.3.6-1+deb12u1
libvulkan1:amd64                 1.3.239.0-1
mesa-common-dev:amd64            22.3.6-1+deb12u1
mesa-opencl-icd:amd64            22.3.6-1+deb12u1
mesa-va-drivers:amd64            22.3.6-1+deb12u1
mesa-vdpau-drivers:amd64         22.3.6-1+deb12u1
mesa-vulkan-drivers:amd64        22.3.6-1+deb12u1
vulkan-amdgpu:amd64              23.10-1620044.22.04
vulkan-tools                     1.3.239.0+dfsg1-1
vulkan-validationlayers:amd64    1.3.239.0-

kernellog:

Sep 07 18:18:31 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 65535
Sep 07 18:18:31 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 65535
Sep 07 18:18:31 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 65535
Sep 07 18:18:31 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 65535
Sep 07 18:18:43 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=11015, emitted seq=11017
Sep 07 18:18:43 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Sep 07 18:18:43 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: GPU reset begin!
Sep 07 18:18:43 jeeves kernel: amdgpu 0000:02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Sep 07 18:18:43 jeeves kernel: [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Sep 07 18:18:44 jeeves kernel: amdgpu: cp is busy, skip halt cp
Sep 07 18:18:44 jeeves kernel: amdgpu: rlc is busy, skip halt rlc
Sep 07 18:18:44 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: BACO reset
Sep 07 18:18:44 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: GPU reset succeeded, trying to resume
Sep 07 18:18:44 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:18:44 jeeves kernel: [drm] VRAM is lost due to GPU reset!
Sep 07 18:18:51 jeeves kernel: perf: interrupt took too long (3146 > 3142), lowering kernel.perf_event_max_sample_rate to 63500
Sep 07 18:18:54 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:19:00 jeeves kernel: amdgpu: SMU Firmware start failed!
Sep 07 18:19:00 jeeves kernel: amdgpu: Failed to load SMU ucode.
Sep 07 18:19:00 jeeves kernel: amdgpu: fw load failed
Sep 07 18:19:00 jeeves kernel: amdgpu: smu firmware loading failed
Sep 07 18:19:00 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:19:00 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:19:00 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: GPU reset(1) failed
Sep 07 18:19:00 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:19:00 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:19:00 jeeves kernel: kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics 730<0
Sep 07 18:19:00 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:19:00 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:19:00 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:19:00 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: GPU reset end with ret = -22
Sep 07 18:19:00 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -22
Sep 07 18:19:03 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:19:09 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:19:16 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:19:22 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:19:28 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:19:29 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=11017, emitted seq=11018
Sep 07 18:19:29 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Sep 07 18:19:29 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: GPU reset begin!
Sep 07 18:19:37 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:19:47 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:19:53 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:20:01 jeeves CRON[136318]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Sep 07 18:20:01 jeeves CRON[136319]: (root) CMD (/bin/ping -qi 10 -c 3 -I zt4mrrjgxa 10.11.1.1 >/dev/null || /usr/sbin/service zerotier-one restart)
Sep 07 18:20:02 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:20:06 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=871, emitted seq=871
Sep 07 18:20:06 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Sep 07 18:20:06 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: GPU reset begin!
Sep 07 18:20:12 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:20:18 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:20:26 jeeves CRON[136318]: pam_unix(cron:session): session closed for user root
Sep 07 18:20:47 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:20:47 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:20:47 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:20:49 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:20:59 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:21:05 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:21:14 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:21:24 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:21:27 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: Guilty job already signaled, skipping HW reset
Sep 07 18:21:27 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: GPU reset(1) succeeded!
Sep 07 18:21:27 jeeves kernel: kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics 730<0
Sep 07 18:21:30 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:21:39 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:21:42 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Sep 07 18:21:48 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:21:52 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=36, emitted seq=39
Sep 07 18:21:52 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process ai00_server pid 134867 thread ai00_server pid 134867
Sep 07 18:21:52 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: GPU reset begin!
Sep 07 18:21:53 jeeves kernel: amdgpu: cp is busy, skip halt cp
Sep 07 18:21:53 jeeves kernel: amdgpu: rlc is busy, skip halt rlc
Sep 07 18:21:53 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: BACO reset
Sep 07 18:22:00 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:22:00 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: GPU reset succeeded, trying to resume
Sep 07 18:22:00 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:22:00 jeeves kernel: [drm] VRAM is lost due to GPU reset!
Sep 07 18:22:22 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:22:22 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:22:22 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:22:22 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:22:22 jeeves kernel: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
Sep 07 18:22:22 jeeves kernel:         (detected by 1, t=5252 jiffies, g=1043809, q=784 ncpus=2)
Sep 07 18:22:22 jeeves kernel: rcu: All QSes seen, last rcu_preempt kthread activity 5240 (4296071681-4296066441), jiffies_till_next_fqs=1, root ->qsmask 0x0
Sep 07 18:22:22 jeeves kernel: rcu: rcu_preempt kthread starved for 5240 jiffies! g1043809 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
Sep 07 18:22:22 jeeves kernel: rcu:         Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
Sep 07 18:22:22 jeeves kernel: rcu: RCU grace-period kthread stack dump:
Sep 07 18:22:22 jeeves kernel: task:rcu_preempt     state:R  running task     stack:0     pid:15    ppid:2      flags:0x00004000
Sep 07 18:22:22 jeeves kernel: Call Trace:
Sep 07 18:22:22 jeeves kernel:  <TASK>
Sep 07 18:22:22 jeeves kernel:  __schedule+0x351/0xa20
Sep 07 18:22:22 jeeves kernel:  ? rcu_gp_cleanup+0x480/0x480
Sep 07 18:22:22 jeeves kernel:  schedule+0x5d/0xe0
Sep 07 18:22:22 jeeves kernel:  schedule_timeout+0x94/0x150
Sep 07 18:22:22 jeeves kernel:  ? __bpf_trace_tick_stop+0x10/0x10
Sep 07 18:22:22 jeeves kernel:  rcu_gp_fqs_loop+0x141/0x4c0
Sep 07 18:22:22 jeeves kernel:  rcu_gp_kthread+0xd0/0x190
Sep 07 18:22:22 jeeves kernel:  kthread+0xe9/0x110
Sep 07 18:22:22 jeeves kernel:  ? kthread_complete_and_exit+0x20/0x20
Sep 07 18:22:22 jeeves kernel:  ret_from_fork+0x22/0x30
Sep 07 18:22:22 jeeves kernel:  </TASK>
Sep 07 18:22:22 jeeves kernel: rcu: Stack dump where RCU GP kthread last ran:
Sep 07 18:22:22 jeeves kernel: CPU: 1 PID: 133662 Comm: kworker/u4:7 Tainted: G          I        6.1.0-11-amd64 #1  Debian 6.1.38-4
Sep 07 18:22:22 jeeves kernel: Hardware name: BIOSTAR Group TB85/TB85, BIOS 4.6.5 08/22/2017
Sep 07 18:22:22 jeeves kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
Sep 07 18:22:22 jeeves kernel: RIP: 0010:amdgpu_device_rreg.part.0+0x2f/0xe0 [amdgpu]
Sep 07 18:22:22 jeeves kernel: Code: 41 54 44 8d 24 b5 00 00 00 00 55 89 f5 53 48 89 fb 4c 3b a7 b8 08 00 00 73 62 83 e2 02 74 21 4c 03 a3 c0 08 00 00 45 8b 24 24 <48> 8b 43 08 0f b7 70 3e 66 90 44 89 e0 5b 5d 41 5c c3 cc cc cc cc
Sep 07 18:22:22 jeeves kernel: RSP: 0018:ffffbe9b430e7b68 EFLAGS: 00000282
Sep 07 18:22:22 jeeves kernel: RAX: ffffffffc0f56c80 RBX: ffff9be629d40000 RCX: 0000000000000000
Sep 07 18:22:22 jeeves kernel: RDX: 0000000000000000 RSI: 0000000000000095 RDI: ffff9be629d40000
Sep 07 18:22:22 jeeves kernel: RBP: 0000000000000095 R08: 0000000000000000 R09: ffffbe9b430e7948
Sep 07 18:22:22 jeeves kernel: R10: 0000000000000003 R11: ffffffffbbcd43a8 R12: 0000000000000000
Sep 07 18:22:22 jeeves kernel: R13: 0000000000000000 R14: 000000000000ffff R15: 0000000000000000
Sep 07 18:22:22 jeeves kernel: FS:  0000000000000000(0000) GS:ffff9be786b00000(0000) knlGS:0000000000000000
Sep 07 18:22:22 jeeves kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 07 18:22:22 jeeves kernel: CR2: 00007faa31959000 CR3: 0000000106522002 CR4: 00000000000706e0
Sep 07 18:22:22 jeeves kernel: Call Trace:
Sep 07 18:22:22 jeeves kernel:  <IRQ>
Sep 07 18:22:22 jeeves kernel:  ? rcu_check_gp_kthread_starvation.cold+0x16c/0x171
Sep 07 18:22:22 jeeves kernel:  ? rcu_sched_clock_irq+0xc9c/0xcd0
Sep 07 18:22:22 jeeves kernel:  ? raw_notifier_call_chain+0x44/0x60
Sep 07 18:22:22 jeeves kernel:  ? update_process_times+0x77/0xb0
Sep 07 18:22:22 jeeves kernel:  ? tick_sched_handle+0x22/0x60
Sep 07 18:22:22 jeeves kernel:  ? tick_sched_timer+0x6f/0x80
Sep 07 18:22:22 jeeves kernel:  ? tick_sched_do_timer+0xa0/0xa0
Sep 07 18:22:22 jeeves kernel:  ? __hrtimer_run_queues+0x112/0x2b0
Sep 07 18:22:22 jeeves kernel:  ? hrtimer_interrupt+0xfe/0x220
Sep 07 18:22:22 jeeves kernel:  ? __sysvec_apic_timer_interrupt+0x7f/0x170
Sep 07 18:22:22 jeeves kernel:  ? sysvec_apic_timer_interrupt+0x99/0xc0
Sep 07 18:22:22 jeeves kernel:  </IRQ>
Sep 07 18:22:22 jeeves kernel:  <TASK>
Sep 07 18:22:22 jeeves kernel:  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
Sep 07 18:22:22 jeeves kernel:  ? amdgpu_cgs_write_register+0x10/0x10 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  ? amdgpu_device_rreg.part.0+0x2f/0xe0 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  phm_wait_for_register_unequal+0x5e/0xa0 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  smu7_send_msg_to_smc+0x91/0x140 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  smum_send_msg_to_smc_with_parameter+0xc7/0x100 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  smu7_update_clock_gatings+0x2c4/0x3f0 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  pp_set_clockgating_by_smu+0x35/0x70 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  amdgpu_dpm_set_clockgating_by_smu+0x4d/0x70 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  vi_common_set_clockgating_state+0x19d/0x310 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  amdgpu_device_set_cg_state+0x92/0xf0 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  ? __irq_put_desc_unlock+0x18/0x40
Sep 07 18:22:22 jeeves kernel:  amdgpu_device_ip_suspend_phase1+0x27/0xe0 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  amdgpu_device_ip_suspend+0x1b/0x70 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  amdgpu_device_pre_asic_reset+0xcf/0x290 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  amdgpu_device_gpu_recover.cold+0x607/0xad4 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  amdgpu_job_timedout+0x1d8/0x220 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  ? psi_group_change+0x145/0x360
Sep 07 18:22:22 jeeves kernel:  ? __switch_to+0x228/0x410
Sep 07 18:22:22 jeeves kernel:  drm_sched_job_timedout+0x76/0x110 [gpu_sched]
Sep 07 18:22:22 jeeves kernel:  process_one_work+0x1c7/0x380
Sep 07 18:22:22 jeeves kernel:  worker_thread+0x4d/0x380
Sep 07 18:22:22 jeeves kernel:  ? _raw_spin_lock_irqsave+0x23/0x50
Sep 07 18:22:22 jeeves kernel:  ? rescuer_thread+0x3a0/0x3a0
Sep 07 18:22:22 jeeves kernel:  kthread+0xe9/0x110
Sep 07 18:22:22 jeeves kernel:  ? kthread_complete_and_exit+0x20/0x20
Sep 07 18:22:22 jeeves kernel:  ret_from_fork+0x22/0x30
Sep 07 18:22:22 jeeves kernel:  </TASK>
Sep 07 18:22:22 jeeves kernel: amdgpu: SMU Firmware start failed!
Sep 07 18:22:22 jeeves kernel: amdgpu: Failed to load SMU ucode.
Sep 07 18:22:22 jeeves kernel: amdgpu: fw load failed
Sep 07 18:22:22 jeeves kernel: amdgpu: smu firmware loading failed
Sep 07 18:22:22 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:22 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:22 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: GPU reset(4) failed
Sep 07 18:22:22 jeeves kernel: kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics 730<0
Sep 07 18:22:22 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:22 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:31 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:31 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:31 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:31 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:31 jeeves kernel: [drm] Skip scheduling IBs!
…
(around 800 repetitions of the same log entry)
…
Sep 07 18:22:31 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:31 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:31 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:31 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:31 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:31 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:31 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=241, emitted seq=241
Sep 07 18:22:31 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Sep 07 18:22:31 jeeves kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
Sep 07 18:22:31 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:31 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:22:31 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:22:31 jeeves kernel: amdgpu: Move buffer fallback to memcpy unavailable
Sep 07 18:22:31 jeeves kernel: [drm] evicting device resources failed
Sep 07 18:22:06 jeeves rtkit-daemon[2698]: The canary thread is apparently starving. Taking action.
Sep 07 18:22:06 jeeves rtkit-daemon[2698]: Demoting known real-time threads.
Sep 07 18:22:06 jeeves rtkit-daemon[2698]: Successfully demoted thread 77389 of process 77039.
Sep 07 18:22:06 jeeves rtkit-daemon[2698]: Successfully demoted thread 75491 of process 75461.
Sep 07 18:22:06 jeeves rtkit-daemon[2698]: Successfully demoted thread 6109 of process 5989.
Sep 07 18:22:06 jeeves rtkit-daemon[2698]: Successfully demoted thread 5989 of process 5989.
Sep 07 18:22:06 jeeves rtkit-daemon[2698]: Demoted 4 threads.
Sep 07 18:22:22 jeeves rtkit-daemon[2698]: The canary thread is apparently starving. Taking action.
Sep 07 18:22:22 jeeves rtkit-daemon[2698]: Demoting known real-time threads.
Sep 07 18:22:22 jeeves rtkit-daemon[2698]: Successfully demoted thread 77389 of process 77039.
Sep 07 18:22:22 jeeves rtkit-daemon[2698]: Successfully demoted thread 75491 of process 75461.
Sep 07 18:22:22 jeeves rtkit-daemon[2698]: Successfully demoted thread 6109 of process 5989.
Sep 07 18:22:22 jeeves rtkit-daemon[2698]: Successfully demoted thread 5989 of process 5989.
Sep 07 18:22:22 jeeves rtkit-daemon[2698]: Demoted 4 threads.
Sep 07 18:22:31 jeeves kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Sep 07 18:22:43 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:22:43 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:22:43 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: GPU reset end with ret = -22
Sep 07 18:22:43 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -22
Sep 07 18:22:43 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=1620, emitted seq=1622
Sep 07 18:22:43 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Sep 07 18:22:43 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: GPU reset begin!
Sep 07 18:22:43 jeeves kernel: amdgpu 0000:04:00.0: amdgpu: Guilty job already signaled, skipping HW reset
Sep 07 18:22:43 jeeves kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset(1) succeeded!
Sep 07 18:22:43 jeeves kernel: kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics 730<0
Sep 07 18:22:43 jeeves rtkit-daemon[2698]: The canary thread is apparently starving. Taking action.
Sep 07 18:22:43 jeeves rtkit-daemon[2698]: Demoting known real-time threads.
Sep 07 18:22:43 jeeves rtkit-daemon[2698]: Successfully demoted thread 77389 of process 77039.
Sep 07 18:22:43 jeeves rtkit-daemon[2698]: Successfully demoted thread 75491 of process 75461.
Sep 07 18:22:43 jeeves rtkit-daemon[2698]: Successfully demoted thread 6109 of process 5989.
Sep 07 18:22:43 jeeves rtkit-daemon[2698]: Successfully demoted thread 5989 of process 5989.
Sep 07 18:22:43 jeeves rtkit-daemon[2698]: Demoted 4 threads.
Sep 07 18:22:56 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:22:56 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:22:56 jeeves kernel: amdgpu: Move buffer fallback to memcpy unavailable
Sep 07 18:22:56 jeeves kernel: [drm] evicting device resources failed
Sep 07 18:22:56 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:22:56 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:23:08 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:23:08 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:23:21 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:23:21 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:23:21 jeeves kernel: ------------[ cut here ]------------
Sep 07 18:23:21 jeeves kernel: WARNING: CPU: 1 PID: 139863 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:2521 dm_suspend+0x1a2/0x1b0 [amdgpu]
Sep 07 18:23:21 jeeves kernel: Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc tun overlay rfkill qrtr binfmt_misc nls_ascii nls_cp437 vfat fat intel_rapl_msr ext4 intel_rapl_common crc16 mbcache jbd2 x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass ghash_clmulni_intel cryptd sha512_ssse3 mei_hdcp mei_wdt sha512_generic snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio rapl snd_hda_codec_hdmi snd_hda_intel intel_cstate at24 intel_uncore iTCO_wdt snd_intel_dspcfg intel_pmc_bxt snd_intel_sdw_acpi iTCO_vendor_support pcspkr watchdog snd_hda_codec snd_hda_core snd_hwdep snd_pcm mei_me snd_timer snd mei soundcore evdev sg msr parport_pc ppdev efi_pstore lp fuse parport loop dm_mod configfs efivarfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_generic uas
Sep 07 18:23:23 jeeves kernel:  usb_storage hid_generic sd_mod t10_pi usbhid hid crc64_rocksoft crc64 crc_t10dif crct10dif_generic amdgpu i915 ahci libahci gpu_sched drm_buddy i2c_algo_bit drm_display_helper libata cec xhci_pci rc_core drm_ttm_helper ttm drm_kms_helper xhci_hcd crct10dif_pclmul crct10dif_common scsi_mod ehci_pci r8169 crc32_pclmul crc32c_intel ehci_hcd scsi_common i2c_i801 i2c_smbus realtek mdio_devres lpc_ich libphy drm usbcore usb_common fan video wmi button
Sep 07 18:23:23 jeeves kernel: CPU: 1 PID: 139863 Comm: kworker/1:0 Tainted: G          I        6.1.0-11-amd64 #1  Debian 6.1.38-4
Sep 07 18:23:23 jeeves kernel: Hardware name: BIOSTAR Group TB85/TB85, BIOS 4.6.5 08/22/2017
Sep 07 18:23:23 jeeves kernel: Workqueue: pm pm_runtime_work
Sep 07 18:23:23 jeeves kernel: RIP: 0010:dm_suspend+0x1a2/0x1b0 [amdgpu]
Sep 07 18:23:23 jeeves kernel: Code: 4c 89 e7 e8 a0 be 1e 00 48 89 ef e8 d8 b4 00 00 4c 89 f7 e8 70 c0 ff ff e9 f0 fe ff ff 4c 89 e6 4c 89 ef e8 20 e3 1e 00 eb d6 <0f> 0b e9 9b fe ff ff e8 32 1b c3 f9 66 90 41 57 49 89 fa 49 89 cf
Sep 07 18:23:23 jeeves kernel: RSP: 0018:ffffbe9b42397c98 EFLAGS: 00010282
Sep 07 18:23:23 jeeves kernel: RAX: 0000000000000000 RBX: ffff9be613177450 RCX: 0000000000000000
Sep 07 18:23:23 jeeves kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9be613160000
Sep 07 18:23:23 jeeves kernel: RBP: ffff9be613160000 R08: 0000000000000001 R09: ffff9be6222bec74
Sep 07 18:23:23 jeeves kernel: R10: 0000000000000003 R11: 0000000000000005 R12: ffff9be613160000
Sep 07 18:23:23 jeeves kernel: R13: 0000000000000000 R14: ffff9be6131754f8 R15: ffff9be600a3f248
Sep 07 18:23:23 jeeves kernel: FS:  0000000000000000(0000) GS:ffff9be786b00000(0000) knlGS:0000000000000000
Sep 07 18:23:23 jeeves kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 07 18:23:23 jeeves kernel: CR2: 0000558ccf368010 CR3: 0000000103144003 CR4: 00000000000706e0
Sep 07 18:23:23 jeeves kernel: Call Trace:
Sep 07 18:23:23 jeeves kernel:  <TASK>
Sep 07 18:23:23 jeeves kernel:  ? __warn+0x7d/0xc0
Sep 07 18:23:23 jeeves kernel:  ? dm_suspend+0x1a2/0x1b0 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  ? report_bug+0xe6/0x170
Sep 07 18:23:23 jeeves kernel:  ? handle_bug+0x41/0x70
Sep 07 18:23:23 jeeves kernel:  ? exc_invalid_op+0x13/0x60
Sep 07 18:23:23 jeeves kernel:  ? asm_exc_invalid_op+0x16/0x20
Sep 07 18:23:23 jeeves kernel:  ? dm_suspend+0x1a2/0x1b0 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  ? dm_suspend+0x32/0x1b0 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  ? vi_common_set_clockgating_state+0x237/0x310 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  amdgpu_device_ip_suspend_phase1+0x75/0xe0 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  amdgpu_device_suspend+0x78/0x150 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  amdgpu_pmops_runtime_suspend+0xba/0x190 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  pci_pm_runtime_suspend+0x66/0x1b0
Sep 07 18:23:23 jeeves kernel:  ? pci_dev_put+0x20/0x20
Sep 07 18:23:23 jeeves kernel:  __rpm_callback+0x44/0x170
Sep 07 18:23:23 jeeves kernel:  ? pci_dev_put+0x20/0x20
Sep 07 18:23:23 jeeves kernel:  rpm_callback+0x5d/0x70
Sep 07 18:23:23 jeeves kernel:  ? pci_dev_put+0x20/0x20
Sep 07 18:23:23 jeeves kernel:  rpm_suspend+0x11a/0x720
Sep 07 18:23:23 jeeves kernel:  pm_runtime_work+0x94/0xa0
Sep 07 18:23:23 jeeves kernel:  process_one_work+0x1c7/0x380
Sep 07 18:23:23 jeeves kernel:  worker_thread+0x4d/0x380
Sep 07 18:23:23 jeeves kernel:  ? _raw_spin_lock_irqsave+0x23/0x50
Sep 07 18:23:23 jeeves kernel:  ? rescuer_thread+0x3a0/0x3a0
Sep 07 18:23:23 jeeves kernel:  kthread+0xe9/0x110
Sep 07 18:23:23 jeeves kernel:  ? kthread_complete_and_exit+0x20/0x20
Sep 07 18:23:23 jeeves kernel:  ret_from_fork+0x22/0x30
Sep 07 18:23:23 jeeves kernel:  </TASK>
Sep 07 18:23:23 jeeves kernel: ---[ end trace 0000000000000000 ]---
Sep 07 18:23:23 jeeves kernel: ------------[ cut here ]------------
Sep 07 18:23:23 jeeves kernel: WARNING: CPU: 1 PID: 140 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:2521 dm_suspend+0x1a2/0x1b0 [amdgpu]
Sep 07 18:23:23 jeeves kernel: Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc tun overlay rfkill qrtr binfmt_misc nls_ascii nls_cp437 vfat fat intel_rapl_msr ext4 intel_rapl_common crc16 mbcache jbd2 x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass ghash_clmulni_intel cryptd sha512_ssse3 mei_hdcp mei_wdt sha512_generic snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio rapl snd_hda_codec_hdmi snd_hda_intel intel_cstate at24 intel_uncore iTCO_wdt snd_intel_dspcfg intel_pmc_bxt snd_intel_sdw_acpi iTCO_vendor_support pcspkr watchdog snd_hda_codec snd_hda_core snd_hwdep snd_pcm mei_me snd_timer snd mei soundcore evdev sg msr parport_pc ppdev efi_pstore lp fuse parport loop dm_mod configfs efivarfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_generic uas
Sep 07 18:23:23 jeeves kernel:  usb_storage hid_generic sd_mod t10_pi usbhid hid crc64_rocksoft crc64 crc_t10dif crct10dif_generic amdgpu i915 ahci libahci gpu_sched drm_buddy i2c_algo_bit drm_display_helper libata cec xhci_pci rc_core drm_ttm_helper ttm drm_kms_helper xhci_hcd crct10dif_pclmul crct10dif_common scsi_mod ehci_pci r8169 crc32_pclmul crc32c_intel ehci_hcd scsi_common i2c_i801 i2c_smbus realtek mdio_devres lpc_ich libphy drm usbcore usb_common fan video wmi button
Sep 07 18:23:23 jeeves kernel: CPU: 1 PID: 140 Comm: kworker/1:3 Tainted: G        W I        6.1.0-11-amd64 #1  Debian 6.1.38-4
Sep 07 18:23:23 jeeves kernel: Hardware name: BIOSTAR Group TB85/TB85, BIOS 4.6.5 08/22/2017
Sep 07 18:23:23 jeeves kernel: Workqueue: pm pm_runtime_work
Sep 07 18:23:23 jeeves kernel: RIP: 0010:dm_suspend+0x1a2/0x1b0 [amdgpu]
Sep 07 18:23:23 jeeves kernel: Code: 4c 89 e7 e8 a0 be 1e 00 48 89 ef e8 d8 b4 00 00 4c 89 f7 e8 70 c0 ff ff e9 f0 fe ff ff 4c 89 e6 4c 89 ef e8 20 e3 1e 00 eb d6 <0f> 0b e9 9b fe ff ff e8 32 1b c3 f9 66 90 41 57 49 89 fa 49 89 cf
Sep 07 18:23:23 jeeves kernel: RSP: 0018:ffffbe9b40437c98 EFLAGS: 00010282
Sep 07 18:23:23 jeeves kernel: RAX: 0000000000000000 RBX: ffff9be612d57450 RCX: 0000000000000000
Sep 07 18:23:23 jeeves kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9be612d40000
Sep 07 18:23:23 jeeves kernel: RBP: ffff9be612d40000 R08: 0000000000000001 R09: ffff9be60ea8baf4
Sep 07 18:23:23 jeeves kernel: R10: 0000000000000003 R11: 000000000000000f R12: ffff9be612d40000
Sep 07 18:23:23 jeeves kernel: R13: 0000000000000000 R14: ffff9be612d554f8 R15: ffff9be600a39248
Sep 07 18:23:23 jeeves kernel: FS:  0000000000000000(0000) GS:ffff9be786b00000(0000) knlGS:0000000000000000
Sep 07 18:23:23 jeeves kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 07 18:23:23 jeeves kernel: CR2: 00007ffde7b07078 CR3: 0000000102b52002 CR4: 00000000000706e0
Sep 07 18:23:23 jeeves kernel: Call Trace:
Sep 07 18:23:23 jeeves kernel:  <TASK>
Sep 07 18:23:23 jeeves kernel:  ? __warn+0x7d/0xc0
Sep 07 18:23:23 jeeves kernel:  ? dm_suspend+0x1a2/0x1b0 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  ? report_bug+0xe6/0x170
Sep 07 18:23:23 jeeves kernel:  ? handle_bug+0x41/0x70
Sep 07 18:23:23 jeeves kernel:  ? exc_invalid_op+0x13/0x60
Sep 07 18:23:23 jeeves kernel:  ? asm_exc_invalid_op+0x16/0x20
Sep 07 18:23:23 jeeves kernel:  ? dm_suspend+0x1a2/0x1b0 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  ? dm_suspend+0x32/0x1b0 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  ? vi_common_set_clockgating_state+0x237/0x310 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  amdgpu_device_ip_suspend_phase1+0x75/0xe0 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  amdgpu_device_suspend+0x78/0x150 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  amdgpu_pmops_runtime_suspend+0xba/0x190 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  pci_pm_runtime_suspend+0x66/0x1b0
Sep 07 18:23:23 jeeves kernel:  ? update_load_avg+0x7e/0x780
Sep 07 18:23:23 jeeves kernel:  ? pci_dev_put+0x20/0x20
Sep 07 18:23:23 jeeves kernel:  __rpm_callback+0x44/0x170
Sep 07 18:23:23 jeeves kernel:  ? pci_dev_put+0x20/0x20
Sep 07 18:23:23 jeeves kernel:  rpm_callback+0x5d/0x70
Sep 07 18:23:23 jeeves kernel:  ? pci_dev_put+0x20/0x20
Sep 07 18:23:23 jeeves kernel:  rpm_suspend+0x11a/0x720
Sep 07 18:23:23 jeeves kernel:  ? _raw_spin_unlock+0x15/0x30
Sep 07 18:23:23 jeeves kernel:  ? finish_task_switch.isra.0+0x9b/0x300
Sep 07 18:23:23 jeeves kernel:  ? __switch_to+0x106/0x410
Sep 07 18:23:23 jeeves kernel:  pm_runtime_work+0x94/0xa0
Sep 07 18:23:23 jeeves kernel:  process_one_work+0x1c7/0x380
Sep 07 18:23:23 jeeves kernel:  worker_thread+0x4d/0x380
Sep 07 18:23:23 jeeves kernel:  ? rescuer_thread+0x3a0/0x3a0
Sep 07 18:23:23 jeeves kernel:  kthread+0xe9/0x110
Sep 07 18:23:23 jeeves kernel:  ? kthread_complete_and_exit+0x20/0x20
Sep 07 18:23:23 jeeves kernel:  ret_from_fork+0x22/0x30
Sep 07 18:23:23 jeeves kernel:  </TASK>
Sep 07 18:23:23 jeeves kernel: ---[ end trace 0000000000000000 ]---
Sep 07 18:23:51 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:23:51 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:23:51 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:23:51 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:23:51 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:23:51 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:23:43 jeeves rtkit-daemon[2698]: The canary thread is apparently starving. Taking action.
Sep 07 18:23:43 jeeves rtkit-daemon[2698]: Demoting known real-time threads.
Sep 07 18:23:43 jeeves rtkit-daemon[2698]: Successfully demoted thread 77389 of process 77039.
Sep 07 18:23:43 jeeves rtkit-daemon[2698]: Successfully demoted thread 75491 of process 75461.
Sep 07 18:23:43 jeeves rtkit-daemon[2698]: Successfully demoted thread 6109 of process 5989.
Sep 07 18:23:43 jeeves rtkit-daemon[2698]: Successfully demoted thread 5989 of process 5989.
Sep 07 18:23:43 jeeves rtkit-daemon[2698]: Demoted 4 threads.
Sep 07 18:24:08 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:24:08 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:24:08 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:24:08 jeeves rtkit-daemon[2698]: The canary thread is apparently starving. Taking action.
Sep 07 18:24:08 jeeves rtkit-daemon[2698]: Demoting known real-time threads.
Sep 07 18:24:08 jeeves rtkit-daemon[2698]: Successfully demoted thread 77389 of process 77039.
Sep 07 18:24:08 jeeves rtkit-daemon[2698]: Successfully demoted thread 75491 of process 75461.
Sep 07 18:24:08 jeeves rtkit-daemon[2698]: Successfully demoted thread 6109 of process 5989.
Sep 07 18:24:08 jeeves rtkit-daemon[2698]: Successfully demoted thread 5989 of process 5989.
Sep 07 18:24:08 jeeves rtkit-daemon[2698]: Demoted 4 threads.
Sep 07 18:24:09 jeeves dbus-daemon[3029]: [session uid=1001 pid=3029] Activating service name='org.xfce.Xfconf' requested by ':1.16' (uid=1001 pid=5900 comm="xfsettingsd")
Sep 07 18:24:09 jeeves dbus-daemon[3029]: [session uid=1001 pid=3029] Successfully activated service 'org.xfce.Xfconf'
Sep 07 18:24:12 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:24:23 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:24:23 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:24:23 jeeves rtkit-daemon[2698]: The canary thread is apparently starving. Taking action.
Sep 07 18:24:23 jeeves rtkit-daemon[2698]: Demoting known real-time threads.
Sep 07 18:24:23 jeeves rtkit-daemon[2698]: Successfully demoted thread 77389 of process 77039.
Sep 07 18:24:23 jeeves rtkit-daemon[2698]: Successfully demoted thread 75491 of process 75461.
Sep 07 18:24:23 jeeves rtkit-daemon[2698]: Successfully demoted thread 6109 of process 5989.
Sep 07 18:24:23 jeeves rtkit-daemon[2698]: Successfully demoted thread 5989 of process 5989.
Sep 07 18:24:23 jeeves rtkit-daemon[2698]: Demoted 4 threads.
Sep 07 18:24:31 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:24:31 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:24:32 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:24:32 jeeves kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 07 18:24:32 jeeves kernel: [drm] VCE initialized successfully.
Sep 07 18:24:32 jeeves kernel: amdgpu 0000:04:00.0: [drm] Cannot find any crtc or sizes
Sep 07 18:24:32 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:24:33 jeeves kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 07 18:24:33 jeeves kernel: [drm] VCE initialized successfully.
Sep 07 18:24:35 jeeves kernel: amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
Sep 07 18:24:41 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:24:58 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:25:24 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:25:24 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:25:24 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:25:24 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:25:24 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:25:24 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:25:24 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:25:24 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:25:24 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:25:24 jeeves kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 07 18:25:24 jeeves kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 07 18:25:24 jeeves kernel: [drm] VCE initialized successfully.
Sep 07 18:25:24 jeeves kernel: [drm] VCE initialized successfully.
Sep 07 18:25:24 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:25:24 jeeves kernel: amdgpu 0000:04:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on comp_1.0.0 (-110).
Sep 07 18:25:24 jeeves kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on uvd (-110).
Sep 07 18:25:24 jeeves CRON[140499]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Sep 07 18:25:24 jeeves CRON[140604]: (root) CMD (/bin/ping -qi 10 -c 3 -I zt4mrrjgxa 10.11.1.1 >/dev/null || /usr/sbin/service zerotier-one restart)
Sep 07 18:25:24 jeeves kernel: [drm:process_one_work] *ERROR* ib ring test failed (-110).
Sep 07 18:25:24 jeeves kernel: [drm:process_one_work] *ERROR* ib ring test failed (-110).
Sep 07 18:25:24 jeeves kernel: amdgpu 0000:04:00.0: [drm] Cannot find any crtc or sizes
Sep 07 18:25:24 jeeves kernel: amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
Sep 07 18:25:33 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:25:33 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:25:33 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:25:42 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:25:42 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:25:42 jeeves kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 07 18:25:42 jeeves kernel: [drm] VCE initialized successfully.
Sep 07 18:25:42 jeeves kernel: amdgpu 0000:04:00.0: [drm] Cannot find any crtc or sizes
Sep 07 18:25:42 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:25:43 jeeves kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 07 18:25:43 jeeves kernel: [drm] VCE initialized successfully.
Sep 07 18:25:43 jeeves kernel: amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
Sep 07 18:25:48 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:26:00 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:26:00 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:26:08 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:26:08 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:26:08 jeeves CRON[140499]: pam_unix(cron:session): session closed for user root
Sep 07 18:26:09 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:26:09 jeeves kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 07 18:26:09 jeeves kernel: [drm] VCE initialized successfully.
Sep 07 18:26:17 jeeves kernel: amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
Sep 07 18:26:17 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:26:18 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:26:50 jeeves kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 07 18:27:06 jeeves kernel: [drm] VCE initialized successfully.
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:04:00.0: [drm] Cannot find any crtc or sizes
Sep 07 18:27:06 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:27:06 jeeves kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 07 18:27:06 jeeves kernel: [drm] VCE initialized successfully.
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:27:06 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:27:06 jeeves kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 07 18:27:06 jeeves kernel: [drm] VCE initialized successfully.
Sep 07 18:27:06 jeeves kernel: INFO: task kworker/u4:7:133662 blocked for more than 126 seconds.
Sep 07 18:27:06 jeeves kernel:       Tainted: G        W I        6.1.0-11-amd64 #1 Debian 6.1.38-4
Sep 07 18:27:06 jeeves kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 07 18:27:06 jeeves kernel: task:kworker/u4:7    state:D stack:0     pid:133662 ppid:2      flags:0x00004000
Sep 07 18:27:06 jeeves kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
Sep 07 18:27:06 jeeves kernel: Call Trace:
Sep 07 18:27:06 jeeves kernel:  <TASK>
Sep 07 18:27:06 jeeves kernel:  __schedule+0x351/0xa20
Sep 07 18:27:06 jeeves kernel:  schedule+0x5d/0xe0
Sep 07 18:27:06 jeeves kernel:  schedule_preempt_disabled+0x14/0x30
Sep 07 18:27:06 jeeves kernel:  __mutex_lock.constprop.0+0x3b4/0x700
Sep 07 18:27:06 jeeves kernel:  ? __schedule+0x359/0xa20
Sep 07 18:27:06 jeeves kernel:  dm_suspend+0xba/0x1b0 [amdgpu]
Sep 07 18:27:06 jeeves kernel:  ? __cond_resched+0x1c/0x30
Sep 07 18:27:06 jeeves kernel:  ? preempt_schedule_common+0x2d/0x70
Sep 07 18:27:06 jeeves kernel:  ? __cond_resched+0x1c/0x30
Sep 07 18:27:06 jeeves kernel:  amdgpu_device_ip_suspend_phase1+0x75/0xe0 [amdgpu]
Sep 07 18:27:06 jeeves kernel:  amdgpu_device_ip_suspend+0x1b/0x70 [amdgpu]
Sep 07 18:27:06 jeeves kernel:  amdgpu_device_pre_asic_reset+0xcf/0x290 [amdgpu]
Sep 07 18:27:06 jeeves kernel:  amdgpu_device_gpu_recover.cold+0x607/0xad4 [amdgpu]
Sep 07 18:27:06 jeeves kernel:  amdgpu_job_timedout+0x1d8/0x220 [amdgpu]
Sep 07 18:27:06 jeeves kernel:  ? psi_group_change+0x145/0x360
Sep 07 18:27:06 jeeves kernel:  ? __switch_to+0x228/0x410
Sep 07 18:27:06 jeeves kernel:  drm_sched_job_timedout+0x76/0x110 [gpu_sched]
Sep 07 18:27:06 jeeves kernel:  process_one_work+0x1c7/0x380
Sep 07 18:27:06 jeeves kernel:  worker_thread+0x4d/0x380
Sep 07 18:27:06 jeeves kernel:  ? _raw_spin_lock_irqsave+0x23/0x50
Sep 07 18:27:06 jeeves kernel:  ? rescuer_thread+0x3a0/0x3a0
Sep 07 18:27:06 jeeves kernel:  kthread+0xe9/0x110
Sep 07 18:27:06 jeeves kernel:  ? kthread_complete_and_exit+0x20/0x20
Sep 07 18:27:06 jeeves kernel:  ret_from_fork+0x22/0x30
Sep 07 18:27:06 jeeves kernel:  </TASK>
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-110).
Sep 07 18:27:06 jeeves kernel: [drm:process_one_work] *ERROR* ib ring test failed (-110).
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:27:06 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:27:06 jeeves kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 07 18:27:06 jeeves kernel: [drm] VCE initialized successfully.
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:04:00.0: [drm] Cannot find any crtc or sizes
Sep 07 18:27:09 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:27:09 jeeves kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 07 18:27:09 jeeves kernel: [drm] VCE initialized successfully.
Sep 07 18:27:09 jeeves kernel: amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
Sep 07 18:27:12 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:28:44 jeeves kernel: INFO: task radeontop:46792 blocked for more than 121 seconds.
Sep 07 18:28:44 jeeves kernel:       Tainted: G        W I        6.1.0-11-amd64 #1 Debian 6.1.38-4
Sep 07 18:28:44 jeeves kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 07 18:28:44 jeeves kernel: task:radeontop       state:D stack:0     pid:46792 ppid:46288  flags:0x00004002
Sep 07 18:28:44 jeeves kernel: Call Trace:
Sep 07 18:28:44 jeeves kernel:  <TASK>
Sep 07 18:28:44 jeeves kernel:  __schedule+0x351/0xa20
Sep 07 18:28:44 jeeves kernel:  schedule+0x5d/0xe0
Sep 07 18:28:44 jeeves kernel:  schedule_preempt_disabled+0x14/0x30
Sep 07 18:28:44 jeeves kernel:  __mutex_lock.constprop.0+0x3b4/0x700
Sep 07 18:28:44 jeeves kernel:  drm_release+0x42/0xd0 [drm]
Sep 07 18:28:44 jeeves kernel:  __fput+0x91/0x250
Sep 07 18:28:44 jeeves kernel:  task_work_run+0x59/0x90
Sep 07 18:28:44 jeeves kernel:  do_exit+0x357/0xb10
Sep 07 18:28:44 jeeves kernel:  ? finish_task_switch.isra.0+0x25e/0x300
Sep 07 18:28:44 jeeves kernel:  ? __switch_to+0x106/0x410
Sep 07 18:28:44 jeeves kernel:  do_group_exit+0x2d/0x80
Sep 07 18:28:44 jeeves kernel:  get_signal+0x96a/0x970
Sep 07 18:28:44 jeeves kernel:  ? _raw_spin_unlock_irqrestore+0x23/0x40
Sep 07 18:28:44 jeeves kernel:  ? hrtimer_try_to_cancel+0x78/0x110
Sep 07 18:28:44 jeeves kernel:  arch_do_signal_or_restart+0x3e/0x840
Sep 07 18:28:44 jeeves kernel:  ? hrtimer_nanosleep+0xc7/0x1b0
Sep 07 18:28:44 jeeves kernel:  exit_to_user_mode_prepare+0x18c/0x1d0
Sep 07 18:28:44 jeeves kernel:  syscall_exit_to_user_mode+0x17/0x40
Sep 07 18:28:44 jeeves kernel:  do_syscall_64+0x67/0xc0
Sep 07 18:28:44 jeeves kernel:  ? do_syscall_64+0x67/0xc0
Sep 07 18:28:44 jeeves kernel:  ? do_syscall_64+0x67/0xc0
Sep 07 18:28:44 jeeves kernel:  entry_SYSCALL_64_after_hwframe+0x69/0xd3
Sep 07 18:28:44 jeeves kernel: RIP: 0033:0x7f351cab9385
Sep 07 18:28:44 jeeves kernel: RSP: 002b:00007f351c5fed50 EFLAGS: 00000293 ORIG_RAX: 00000000000000e6
Sep 07 18:28:44 jeeves kernel: RAX: fffffffffffffdfc RBX: 0000000000000061 RCX: 00007f351cab9385
Sep 07 18:28:44 jeeves kernel: RDX: 00007f351c5fed90 RSI: 0000000000000000 RDI: 0000000000000000
Sep 07 18:28:44 jeeves kernel: RBP: 00007f351c5fedf0 R08: 0000000000000000 R09: 00007f351c5fedec
Sep 07 18:28:44 jeeves kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 00007f35140030f0
Sep 07 18:28:44 jeeves kernel: R13: 0000000000000078 R14: 00007f35140029c0 R15: 00007f3514000b70
Sep 07 18:28:44 jeeves kernel:  </TASK>
Sep 07 18:28:44 jeeves kernel: INFO: task kworker/u4:7:133662 blocked for more than 248 seconds.
Sep 07 18:28:44 jeeves kernel:       Tainted: G        W I        6.1.0-11-amd64 #1 Debian 6.1.38-4
Sep 07 18:28:44 jeeves kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 07 18:28:44 jeeves kernel: task:kworker/u4:7    state:D stack:0     pid:133662 ppid:2      flags:0x00004000
Sep 07 18:28:44 jeeves kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
Sep 07 18:28:44 jeeves kernel: Call Trace:
Sep 07 18:28:44 jeeves kernel:  <TASK>
Sep 07 18:28:44 jeeves kernel:  __schedule+0x351/0xa20
Sep 07 18:28:44 jeeves kernel:  schedule+0x5d/0xe0
Sep 07 18:28:44 jeeves kernel:  schedule_preempt_disabled+0x14/0x30
Sep 07 18:28:44 jeeves kernel:  __mutex_lock.constprop.0+0x3b4/0x700
Sep 07 18:28:44 jeeves kernel:  ? __schedule+0x359/0xa20
Sep 07 18:28:44 jeeves kernel:  dm_suspend+0xba/0x1b0 [amdgpu]
Sep 07 18:28:44 jeeves kernel:  ? __cond_resched+0x1c/0x30
Sep 07 18:28:44 jeeves kernel:  ? preempt_schedule_common+0x2d/0x70
Sep 07 18:28:44 jeeves kernel:  ? __cond_resched+0x1c/0x30
Sep 07 18:28:44 jeeves kernel:  amdgpu_device_ip_suspend_phase1+0x75/0xe0 [amdgpu]
Sep 07 18:28:44 jeeves kernel:  amdgpu_device_ip_suspend+0x1b/0x70 [amdgpu]
Sep 07 18:28:44 jeeves kernel:  amdgpu_device_pre_asic_reset+0xcf/0x290 [amdgpu]
Sep 07 18:28:44 jeeves kernel:  amdgpu_device_gpu_recover.cold+0x607/0xad4 [amdgpu]
Sep 07 18:28:44 jeeves kernel:  amdgpu_job_timedout+0x1d8/0x220 [amdgpu]
Sep 07 18:28:44 jeeves kernel:  ? psi_group_change+0x145/0x360
Sep 07 18:28:44 jeeves kernel:  ? __switch_to+0x228/0x410
Sep 07 18:28:44 jeeves kernel:  drm_sched_job_timedout+0x76/0x110 [gpu_sched]
Sep 07 18:28:44 jeeves kernel:  process_one_work+0x1c7/0x380
Sep 07 18:28:44 jeeves kernel:  worker_thread+0x4d/0x380
Sep 07 18:28:44 jeeves kernel:  ? _raw_spin_lock_irqsave+0x23/0x50
Sep 07 18:28:44 jeeves kernel:  ? rescuer_thread+0x3a0/0x3a0
Sep 07 18:28:44 jeeves kernel:  kthread+0xe9/0x110
Sep 07 18:28:44 jeeves kernel:  ? kthread_complete_and_exit+0x20/0x20
Sep 07 18:28:44 jeeves kernel:  ret_from_fork+0x22/0x30
Sep 07 18:28:44 jeeves kernel:  </TASK>
Sep 07 18:28:44 jeeves kernel: INFO: task ai00_server:135401 blocked for more than 121 seconds.
Sep 07 18:28:44 jeeves kernel:       Tainted: G        W I        6.1.0-11-amd64 #1 Debian 6.1.38-4
Sep 07 18:28:44 jeeves kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 07 18:28:44 jeeves kernel: task:ai00_server     state:D stack:0     pid:135401 ppid:18857  flags:0x00004006
Sep 07 18:28:44 jeeves kernel: Call Trace:
Sep 07 18:28:44 jeeves kernel:  <TASK>
Sep 07 18:28:44 jeeves kernel:  __schedule+0x351/0xa20
Sep 07 18:28:44 jeeves kernel:  schedule+0x5d/0xe0
Sep 07 18:28:44 jeeves kernel:  schedule_timeout+0x118/0x150
Sep 07 18:28:44 jeeves kernel:  dma_fence_default_wait+0x1a5/0x260
Sep 07 18:28:44 jeeves kernel:  ? __bpf_trace_dma_fence+0x10/0x10
Sep 07 18:28:44 jeeves kernel:  dma_fence_wait_timeout+0x108/0x130
Sep 07 18:28:44 jeeves kernel:  amdgpu_vm_fini+0xf7/0x510 [amdgpu]
Sep 07 18:28:44 jeeves kernel:  amdgpu_driver_postclose_kms+0x1e5/0x2d0 [amdgpu]
Sep 07 18:28:44 jeeves kernel:  drm_file_free.part.0+0x207/0x250 [drm]
Sep 07 18:28:44 jeeves kernel:  drm_release+0x64/0xd0 [drm]
Sep 07 18:28:44 jeeves kernel:  __fput+0x91/0x250
Sep 07 18:28:44 jeeves kernel:  task_work_run+0x59/0x90
Sep 07 18:28:44 jeeves kernel:  exit_to_user_mode_prepare+0x1c4/0x1d0
Sep 07 18:28:44 jeeves kernel:  syscall_exit_to_user_mode+0x17/0x40
Sep 07 18:28:44 jeeves kernel:  do_syscall_64+0x67/0xc0
Sep 07 18:28:44 jeeves kernel:  ? exit_to_user_mode_prepare+0x40/0x1d0
Sep 07 18:28:44 jeeves kernel:  ? syscall_exit_to_user_mode+0x17/0x40
Sep 07 18:28:44 jeeves kernel:  ? do_syscall_64+0x67/0xc0
Sep 07 18:28:44 jeeves kernel:  entry_SYSCALL_64_after_hwframe+0x69/0xd3
Sep 07 18:28:44 jeeves kernel: RIP: 0033:0x7f78fb9e27ea
Sep 07 18:28:44 jeeves kernel: RSP: 002b:00007f78b9bb76c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
Sep 07 18:28:44 jeeves kernel: RAX: 0000000000000000 RBX: 000055ab3ee3e150 RCX: 00007f78fb9e27ea
Sep 07 18:28:44 jeeves kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000000d
Sep 07 18:28:44 jeeves kernel: RBP: 000055ab3eed9e90 R08: 0000000000000007 R09: 000055ab3ee40580
Sep 07 18:28:44 jeeves kernel: R10: 7d63c7425ede93d5 R11: 0000000000000293 R12: 000055ab3ed59d88
Sep 07 18:28:44 jeeves kernel: R13: 000055ab3ee34448 R14: 000055ab3ee34648 R15: 000055ab3ee34310
Sep 07 18:28:44 jeeves kernel:  </TASK>
Sep 07 18:30:01 jeeves CRON[146976]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Sep 07 18:30:01 jeeves CRON[146977]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Sep 07 18:30:01 jeeves CRON[146978]: (root) CMD (/bin/ping -qi 10 -c 3 -I zt4mrrjgxa 10.11.1.1 >/dev/null || /usr/sbin/service zerotier-one restart)
Sep 07 18:30:01 jeeves CRON[146979]: (root) CMD ([ -x /etc/init.d/anacron ] && if [ ! -d /run/systemd/system ]; then /usr/sbin/invoke-rc.d anacron start >/dev/null; fi)
Sep 07 18:30:01 jeeves CRON[146976]: pam_unix(cron:session): session closed for user root
Sep 07 18:30:21 jeeves CRON[146977]: pam_unix(cron:session): session closed for user root

Switch models

Could a dropdown be added to the page for switching between different RWKV models?

Add a text conversion API

Add a text conversion API that converts common document formats such as docx and PDF into txt. This would make it easier to build later features such as chatting with documents and a personal knowledge base.

Suggestions for other document formats to support are welcome, and so are PRs.
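As a rough illustration of the request above, here is a minimal sketch of the pieces such an API would need around the actual extraction step: deciding which converter to use and cleaning up the resulting text. Everything in it is hypothetical (`DocFormat`, `detect_format` and `normalize_text` are not part of ai00_server), and the docx/PDF parsing itself is deliberately left out, since it would depend on whichever crates the project chooses.

```rust
use std::path::Path;

/// Input formats the proposed conversion endpoint might accept.
/// Hypothetical: this enum does not exist in ai00_server.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DocFormat {
    Docx,
    Pdf,
    Txt,
}

/// Guess the document format from the file extension (case-insensitive).
/// Returns `None` when the extension is missing or unsupported.
pub fn detect_format(path: &Path) -> Option<DocFormat> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "docx" => Some(DocFormat::Docx),
        "pdf" => Some(DocFormat::Pdf),
        "txt" => Some(DocFormat::Txt),
        _ => None,
    }
}

/// Clean up extracted text before returning it as txt:
/// unify line endings, trim trailing whitespace on each line,
/// collapse runs of blank lines into one, and drop leading and
/// trailing blank lines.
pub fn normalize_text(raw: &str) -> String {
    let mut out: Vec<&str> = Vec::new();
    // Starting as `true` means leading blank lines are skipped.
    let mut prev_blank = true;
    for line in raw.lines() {
        let line = line.trim_end();
        if line.is_empty() {
            if !prev_blank {
                out.push("");
            }
            prev_blank = true;
        } else {
            out.push(line);
            prev_blank = false;
        }
    }
    // Remove a blank line left at the end, if any.
    while out.last() == Some(&"") {
        out.pop();
    }
    out.join("\n")
}
```

A handler built on this would call `detect_format` on the uploaded file name, reject unsupported formats with a clear error, pass the bytes to a format-specific extractor, and run the result through `normalize_text`. Adding another format would then only mean a new enum variant and a new extractor.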
