
chatglm3's Issues

RuntimeError: CUDA error: device-side assert triggered

After deployment, the model handled several hundred calls without problems, but then a further call raised this error:
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in call
return await self.app(scope, receive, send)
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/fastapi/applications.py", line 276, in call
await super().call(scope, receive, send)
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/starlette/applications.py", line 122, in call
await self.middleware_stack(scope, receive, send)
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/starlette/middleware/errors.py", line 184, in call
raise exc
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/starlette/middleware/errors.py", line 162, in call
await self.app(scope, receive, _send)
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 79, in call
raise exc
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 68, in call
await self.app(scope, receive, sender)
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in call
raise e
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in call
await self.app(scope, receive, send)
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/starlette/routing.py", line 718, in call
await route.handle(scope, receive, send)
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/starlette/routing.py", line 66, in app
response = await func(request)
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/fastapi/routing.py", line 237, in app
raw_response = await run_endpoint_function(
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
return await dependant.call(**values)
File "get_api_cuda1.py", line 66, in create_item
response, history = model.chat(tokenizer,
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 1032, in chat
inputs = inputs.to(self.device)
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 758, in to
self.data = {k: v.to(device=device) for k, v in self.data.items()}
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 758, in
self.data = {k: v.to(device=device) for k, v in self.data.items()}
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
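
As the error message itself suggests, the failing kernel can be localized by forcing synchronous launches. A minimal sketch, assuming the variable is set before torch initializes CUDA (e.g. at the very top of get_api_cuda1.py):

```python
import os

# Force synchronous CUDA kernel launches so the device-side assert surfaces
# at the real call site instead of a later, unrelated API call.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # deliberately imported only after the env var is set
```

A device-side assert that appears only after many successful calls is often an out-of-range index (for example, a token id outside the embedding table), which the synchronous traceback should pinpoint.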

Prompt format for image output

I noticed that the prompt documentation shows how to generate an image, as below. Is 【image】 just an arbitrary placeholder?

...
plt.axis('equal')
plt.axis('off')
plt.show()

<|observation|>
```result
【image】
```

<|assistant|>
This is a heart shape. I used parametric equations to describe the shape and plotted it with matplotlib. If you have any other needs or questions, feel free to let me know.
<|user|> # End
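
For reference, the elided plotting code above presumably looks something like the following sketch; the parametric heart curve is my assumption, based on the assistant reply in the template.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative parametric heart curve; the original code is elided above.
t = np.linspace(0, 2 * np.pi, 1000)
x = 16 * np.sin(t) ** 3
y = 13 * np.cos(t) - 5 * np.cos(2 * t) - 2 * np.cos(3 * t) - np.cos(4 * t)

plt.fill(x, y, color="red")
plt.axis('equal')
plt.axis('off')
plt.show()
```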

Model paths in web demos point to a local path

In both web_demo.py and web_demo2.py, the paths to the ChatGLM3 model point to a local path, "/mnt/vepfs/workspace/zxdu/chatglm3-6b", e.g.:

tokenizer = AutoTokenizer.from_pretrained("/mnt/vepfs/workspace/zxdu/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("/mnt/vepfs/workspace/zxdu/chatglm3-6b", trust_remote_code=True).cuda()

They should probably be corrected to the Hugging Face Hub format, e.g. "THUDM/chatglm3-6b", as sketched below.
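
A sketch of the corrected lines, assuming the Hub model id "THUDM/chatglm3-6b":

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).cuda()
```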

ModuleNotFoundError: No module named 'transformers_modules.' when running web_demo2

File "G:\ai\ChatGLM\ChatGLM3\web_demo2.py", line 23, in
tokenizer, model = get_model()
File "G:\ai\ChatGLM\ChatGLM3\venv\lib\site-packages\streamlit\runtime\caching\cache_utils.py", line 212, in wrapper
return cached_func(*args, **kwargs)
File "G:\ai\ChatGLM\ChatGLM3\venv\lib\site-packages\streamlit\runtime\caching\cache_utils.py", line 241, in call
return self._get_or_create_cached_value(args, kwargs)
File "G:\ai\ChatGLM\ChatGLM3\venv\lib\site-packages\streamlit\runtime\caching\cache_utils.py", line 267, in _get_or_create_cached_value
return self._handle_cache_miss(cache, value_key, func_args, func_kwargs)
File "G:\ai\ChatGLM\ChatGLM3\venv\lib\site-packages\streamlit\runtime\caching\cache_utils.py", line 321, in _handle_cache_miss
computed_value = self.info.func(*func_args, **func_kwargs)
File "G:\ai\ChatGLM\ChatGLM3\web_demo2.py", line 14, in get_model
tokenizer = AutoTokenizer.from_pretrained("./models/chatglm3-6b", trust_remote_code=True)
File "G:\ai\ChatGLM\ChatGLM3\venv\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 676, in from_pretrained
tokenizer_class = get_class_from_dynamic_module(class_ref, pretrained_model_name_or_path, **kwargs)
File "G:\ai\ChatGLM\ChatGLM3\venv\lib\site-packages\transformers\dynamic_module_utils.py", line 443, in get_class_from_dynamic_module
return get_class_in_module(class_name, final_module.replace(".py", ""))
File "G:\ai\ChatGLM\ChatGLM3\venv\lib\site-packages\transformers\dynamic_module_utils.py", line 164, in get_class_in_module
module = importlib.import_module(module_path)
File "D:\Program Files\python\lib\importlib_init
.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 992, in _find_and_load_unlocked
File "", line 241, in _call_with_frames_removed
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 992, in _find_and_load_unlocked
File "", line 241, in _call_with_frames_removed
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'transformers_modules.'
However, transformers is in fact installed, version 4.30.2.
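
For what it's worth, an empty name after 'transformers_modules.' often means the dynamic-module name derived from the model folder came out empty, for example because the path ends in a separator. A sketch of a normalization workaround, assuming that is indeed the cause (the local path is taken from the traceback above):

```python
import os

from transformers import AutoTokenizer

# Normalize the path so the derived dynamic-module name
# ("transformers_modules.<folder>") has a non-empty folder component.
model_path = os.path.abspath("./models/chatglm3-6b").rstrip(os.sep)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
```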

Is tool calling what I imagine it to be?

First of all, v3 is a big step up from v2 when running on CPU; it feels noticeably faster than both v1 and v2. Excellent work.

My understanding of tool calling: you provide the AI with a set of API interfaces or launch-parameter descriptions for applications, and the AI then calls those APIs and programs according to my needs.
Is that right?

But I don't know how to do it... I then asked chatglm3-6b, and it didn't seem to know either. Or maybe my question didn't hit the right keywords?

[screenshot: 微信图片_20231027171422]
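
For illustration, a minimal sketch of that flow against ChatGLM3's chat interface. Assumptions: model and tokenizer are loaded as in the web demos, the system-message "tools" field follows the repo's tool demo, and get_weather with its parameters is purely hypothetical.

```python
# Hypothetical tool description; the name and parameters are illustrative.
tools = [{
    "name": "get_weather",
    "description": "Query the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
}]
system_item = {
    "role": "system",
    "content": "Answer the following questions as best as you can. "
               "You have access to the following tools:",
    "tools": tools,
}

# The model either answers directly or emits the tool name plus arguments;
# the caller then executes the tool and feeds the result back as an observation.
response, history = model.chat(tokenizer, "北京今天天气怎么样?", history=[system_item])
print(response)
```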

Tokenizer-related questions

Many thanks for open-sourcing this; great work.
ChatGLM3's tokenizer does not allow injecting special tokens such as <|user|>, so how should template-aligned data be constructed for fine-tuning? Specifically, encode cannot map special tokens like <|user|> to their corresponding ids and instead treats them as plain text. In this situation, how should data for domain-specific fine-tuning be constructed and processed so that the template stays consistent?
FYI: QWen also had anti-injection guards in its early releases; community feedback about the impact was strong, and it later made adjustments (see QWen's approach).

Also, the tokenizer class name (ChatGLMTokenizer) is the same as ChatGLM2's tokenizer class name, even though the details are completely different, which may cause problems for downstream repos during adaptation. Would you consider giving ChatGLM3's tokenizer a new class name?
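
For context, a sketch of the guarded behavior and the workaround I would expect, assuming ChatGLM3's tokenization_chatglm.py exposes get_command() for mapping special tokens to ids:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)

# encode() treats the literal string as plain text (the injection guard):
print(tokenizer.encode("<|user|>", add_special_tokens=False))  # several ordinary ids

# Template-aligned fine-tuning data can insert the command id directly:
ids = [tokenizer.get_command("<|user|>")] + tokenizer.encode("你好", add_special_tokens=False)
```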

Missing jupyter_client module

Installed dependencies: pip install -r requirements.txt

Then ran: streamlit run main.py

Then got the following error:

2023-10-29 00:31:32.581 Uncaught app exception
Traceback (most recent call last):
  File "/home/xxx/.local/share/virtualenvs/ChatGLM3-uTamXjui/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 534, in _run_script
    exec(code, module.__dict__)
  File "/home/xxx/code/github/ChatGLM3/composite_demo/main.py", line 11, in <module>
    import demo_chat, demo_ci, demo_tool
  File "/home/xxx/code/github/ChatGLM3/composite_demo/demo_ci.py", line 9, in <module>
    import jupyter_client
ModuleNotFoundError: No module named 'jupyter_client'

Installing it with pip install jupyter_client fixed the problem.

On-device deployment

How can the model be deployed to mobile devices?

Benchmark result reproducibility

Thanks for releasing such a strong model. How can the benchmark results in the README be reproduced? I tried testing with greedy decoding, using methods similar to the Hugging Face leaderboard and those provided in the original papers/repos. After some debugging, I can reproduce most of the scores reported in the Llama 2 paper. When testing ChatGLM3, however, apart from AGI_Eval, where the score is fairly close, most scores differ by 10-20 points or more; in particular, the GSM8K exact match is only 47.8, and even checking "contains" gives only 51. Could you share how to reproduce the benchmark results? Thanks.
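
For clarity, a sketch of the two scoring rules mentioned above; the whitespace normalization is my assumption:

```python
def exact_match(prediction: str, reference: str) -> bool:
    # Strict comparison after trimming surrounding whitespace.
    return prediction.strip() == reference.strip()

def contains(prediction: str, reference: str) -> bool:
    # Looser check: the reference answer appears anywhere in the output.
    return reference.strip() in prediction
```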

Prompt format for long-text scenarios

Hi, I noticed that this version's prompt format has changed quite a lot. For long texts (e.g. document QA), is the prompt constructed the same way as for ordinary QA? Are there any examples?
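
To frame the question, a sketch of one plausible construction, purely an assumption: the document is simply placed in the user turn under the ordinary chat template.

```python
# Hypothetical document-QA prompt built on the ordinary chat template;
# assumes `model` and `tokenizer` are already loaded.
document = "...long document text..."
question = "What does the document say about X?"
prompt = f"请根据以下文档回答问题。\n\n文档:\n{document}\n\n问题:{question}"
response, history = model.chat(tokenizer, prompt, history=[])
```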

Error when running composite_demo; please help find the cause

In the 🛠️ Tool tab of the composite demo (tabs: 💬 Chat, 🛠️ Tool, 🧑‍💻 Code Interpreter), after entering the query 查询巴黎天气 ("check the weather in Paris"):

RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
Traceback:
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 534, in _run_script
exec(code, module.__dict__)
File "/root/ChatGLM3-main/composite_demo/main.py", line 52, in <module>
demo_tool.main(top_p, temperature, prompt_text)
File "/root/ChatGLM3-main/composite_demo/demo_tool.py", line 111, in main
for response in client.generate_stream(
File "/root/ChatGLM3-main/composite_demo/client.py", line 119, in generate_stream
for new_text, _ in stream_chat(self.model,
File "/root/ChatGLM3-main/composite_demo/client.py", line 69, in stream_chat
for outputs in self.stream_generate(**inputs, past_key_values=past_key_values,
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/root/.cache/huggingface/modules/transformers_modules/modeling_chatglm.py", line 1156, in stream_generate
outputs = self(
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/modeling_chatglm.py", line 937, in forward
transformer_outputs = self.transformer(
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/modeling_chatglm.py", line 830, in forward
hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/modeling_chatglm.py", line 640, in forward
layer_ret = layer(
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/modeling_chatglm.py", line 544, in forward
attention_output, kv_cache = self.self_attention(
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/modeling_chatglm.py", line 376, in forward
mixed_x_layer = self.query_key_value(hidden_states)
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

Function calling

I see that function calling follows the format used by the OpenAI API, which may not be very user-friendly for some POST requests. Is there any solution to this?

OpenAI API

The previous OpenAI API script presumably no longer works. Has anyone written a new one?

How do I solve RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'?

=== History:
[Conversation(role=<Role.USER: 2>, content='1', tool=None, image=None)]
2023-10-28 23:14:33.424 Uncaught app exception
Traceback (most recent call last):
File "E:\ChatGLM3\venv\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 534, in _run_script
exec(code, module.__dict__)
File "E:\ChatGLM3\composite_demo\main.py", line 50, in <module>
demo_chat.main(top_p, temperature, system_prompt, prompt_text)
File "E:\ChatGLM3\composite_demo\demo_chat.py", line 50, in main
for response in client.generate_stream(
File "E:\ChatGLM3\composite_demo\client.py", line 119, in generate_stream
for new_text, _ in stream_chat(self.model,
File "E:\ChatGLM3\composite_demo\client.py", line 69, in stream_chat
for outputs in self.stream_generate(**inputs, past_key_values=past_key_values,
File "E:\ChatGLM3\venv\lib\site-packages\torch\utils_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "C:\Users\zhuya/.cache\huggingface\modules\transformers_modules\THUDM\chatglm3-6b\fc3235f807ef5527af598c05f04f2ffd17f48bab\modeling_chatglm.py", line 1156, in stream_generate
outputs = self(
File "E:\ChatGLM3\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "E:\ChatGLM3\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\zhuya/.cache\huggingface\modules\transformers_modules\THUDM\chatglm3-6b\fc3235f807ef5527af598c05f04f2ffd17f48bab\modeling_chatglm.py", line 937, in forward
transformer_outputs = self.transformer(
File "E:\ChatGLM3\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "E:\ChatGLM3\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\zhuya/.cache\huggingface\modules\transformers_modules\THUDM\chatglm3-6b\fc3235f807ef5527af598c05f04f2ffd17f48bab\modeling_chatglm.py", line 830, in forward
hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
File "E:\ChatGLM3\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "E:\ChatGLM3\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\zhuya/.cache\huggingface\modules\transformers_modules\THUDM\chatglm3-6b\fc3235f807ef5527af598c05f04f2ffd17f48bab\modeling_chatglm.py", line 640, in forward
layer_ret = layer(
File "E:\ChatGLM3\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "E:\ChatGLM3\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\zhuya/.cache\huggingface\modules\transformers_modules\THUDM\chatglm3-6b\fc3235f807ef5527af598c05f04f2ffd17f48bab\modeling_chatglm.py", line 544, in forward
attention_output, kv_cache = self.self_attention(
File "E:\ChatGLM3\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "E:\ChatGLM3\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\zhuya/.cache\huggingface\modules\transformers_modules\THUDM\chatglm3-6b\fc3235f807ef5527af598c05f04f2ffd17f48bab\modeling_chatglm.py", line 376, in forward
mixed_x_layer = self.query_key_value(hidden_states)
File "E:\ChatGLM3\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "E:\ChatGLM3\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in call_impl
return forward_call(*args, **kwargs)
File "E:\ChatGLM3\venv\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: "addmm_impl_cpu
" not implemented for 'Half'
