
chatglm3's Issues

RuntimeError: CUDA error: device-side assert triggered

After deployment, the model handled several hundred calls without problems, but then a further call raised this error:
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in call
return await self.app(scope, receive, send)
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/fastapi/applications.py", line 276, in call
await super().call(scope, receive, send)
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/starlette/applications.py", line 122, in call
await self.middleware_stack(scope, receive, send)
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/starlette/middleware/errors.py", line 184, in call
raise exc
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/starlette/middleware/errors.py", line 162, in call
await self.app(scope, receive, _send)
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 79, in call
raise exc
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 68, in call
await self.app(scope, receive, sender)
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in call
raise e
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in call
await self.app(scope, receive, send)
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/starlette/routing.py", line 718, in call
await route.handle(scope, receive, send)
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/starlette/routing.py", line 66, in app
response = await func(request)
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/fastapi/routing.py", line 237, in app
raw_response = await run_endpoint_function(
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
return await dependant.call(**values)
File "get_api_cuda1.py", line 66, in create_item
response, history = model.chat(tokenizer,
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 1032, in chat
inputs = inputs.to(self.device)
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 758, in to
self.data = {k: v.to(device=device) for k, v in self.data.items()}
File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 758, in
self.data = {k: v.to(device=device) for k, v in self.data.items()}
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
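
As the error message itself suggests, the failing kernel can be localized by forcing synchronous launches. A minimal sketch, assuming the variable is set before torch initializes CUDA (e.g. at the very top of get_api_cuda1.py):

```python
import os

# Force synchronous CUDA kernel launches so the device-side assert surfaces
# at the real call site instead of a later, unrelated API call.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # deliberately imported only after the env var is set
```

A device-side assert that appears only after many successful calls is often an out-of-range index (for example, a token id outside the embedding table), which the synchronous traceback should pinpoint.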

Prompt format for image output

I noticed that the prompt documentation shows how to generate an image, as below. Is 【image】 just an arbitrary placeholder?

...
plt.axis('equal')
plt.axis('off')
plt.show()

<|observation|>
```result
【image】
```

<|assistant|>
This is a heart shape. I used parametric equations to describe the shape and plotted it with matplotlib. If you have any other needs or questions, feel free to let me know.
<|user|> # End
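
For reference, the elided plotting code above presumably looks something like the following sketch; the parametric heart curve is my assumption, based on the assistant reply in the template.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative parametric heart curve; the original code is elided above.
t = np.linspace(0, 2 * np.pi, 1000)
x = 16 * np.sin(t) ** 3
y = 13 * np.cos(t) - 5 * np.cos(2 * t) - 2 * np.cos(3 * t) - np.cos(4 * t)

plt.fill(x, y, color="red")
plt.axis('equal')
plt.axis('off')
plt.show()
```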

Model paths in web demos point to a local path

In both web_demo.py and web_demo2.py, the paths to the ChatGLM3 model point to a local path, "/mnt/vepfs/workspace/zxdu/chatglm3-6b", e.g.:

tokenizer = AutoTokenizer.from_pretrained("/mnt/vepfs/workspace/zxdu/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("/mnt/vepfs/workspace/zxdu/chatglm3-6b", trust_remote_code=True).cuda()

They should probably be corrected to the Hugging Face Hub format, e.g. "THUDM/chatglm3-6b", as sketched below.
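
A sketch of the corrected lines, assuming the Hub model id "THUDM/chatglm3-6b":

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).cuda()
```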

ModuleNotFoundError: No module named 'transformers_modules.' when running web_demo2

File "G:\ai\ChatGLM\ChatGLM3\web_demo2.py", line 23, in
tokenizer, model = get_model()
File "G:\ai\ChatGLM\ChatGLM3\venv\lib\site-packages\streamlit\runtime\caching\cache_utils.py", line 212, in wrapper
return cached_func(*args, **kwargs)
File "G:\ai\ChatGLM\ChatGLM3\venv\lib\site-packages\streamlit\runtime\caching\cache_utils.py", line 241, in call
return self._get_or_create_cached_value(args, kwargs)
File "G:\ai\ChatGLM\ChatGLM3\venv\lib\site-packages\streamlit\runtime\caching\cache_utils.py", line 267, in _get_or_create_cached_value
return self._handle_cache_miss(cache, value_key, func_args, func_kwargs)
File "G:\ai\ChatGLM\ChatGLM3\venv\lib\site-packages\streamlit\runtime\caching\cache_utils.py", line 321, in _handle_cache_miss
computed_value = self.info.func(*func_args, **func_kwargs)
File "G:\ai\ChatGLM\ChatGLM3\web_demo2.py", line 14, in get_model
tokenizer = AutoTokenizer.from_pretrained("./models/chatglm3-6b", trust_remote_code=True)
File "G:\ai\ChatGLM\ChatGLM3\venv\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 676, in from_pretrained
tokenizer_class = get_class_from_dynamic_module(class_ref, pretrained_model_name_or_path, **kwargs)
File "G:\ai\ChatGLM\ChatGLM3\venv\lib\site-packages\transformers\dynamic_module_utils.py", line 443, in get_class_from_dynamic_module
return get_class_in_module(class_name, final_module.replace(".py", ""))
File "G:\ai\ChatGLM\ChatGLM3\venv\lib\site-packages\transformers\dynamic_module_utils.py", line 164, in get_class_in_module
module = importlib.import_module(module_path)
File "D:\Program Files\python\lib\importlib_init
.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 992, in _find_and_load_unlocked
File "", line 241, in _call_with_frames_removed
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 992, in _find_and_load_unlocked
File "", line 241, in _call_with_frames_removed
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'transformers_modules.'
However, transformers is in fact installed, version 4.30.2.
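
For what it's worth, an empty name after 'transformers_modules.' often means the dynamic-module name derived from the model folder came out empty, for example because the path ends in a separator. A sketch of a normalization workaround, assuming that is indeed the cause (the local path is taken from the traceback above):

```python
import os

from transformers import AutoTokenizer

# Normalize the path so the derived dynamic-module name
# ("transformers_modules.<folder>") has a non-empty folder component.
model_path = os.path.abspath("./models/chatglm3-6b").rstrip(os.sep)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
```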

Is tool calling what I imagine it to be?

First of all, v3 is a big step up from v2 when running on CPU; it feels noticeably faster than both v1 and v2. Excellent work.

My understanding of tool calling: you provide the AI with a set of API interfaces or launch-parameter descriptions for applications, and the AI then calls those APIs and programs according to my needs.
Is that right?

But I don't know how to do it... I then asked chatglm3-6b, and it didn't seem to know either. Or maybe my question didn't hit the right keywords?

[screenshot: 微信图片_20231027171422]
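
For illustration, a minimal sketch of that flow against ChatGLM3's chat interface. Assumptions: model and tokenizer are loaded as in the web demos, the system-message "tools" field follows the repo's tool demo, and get_weather with its parameters is purely hypothetical.

```python
# Hypothetical tool description; the name and parameters are illustrative.
tools = [{
    "name": "get_weather",
    "description": "Query the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
}]
system_item = {
    "role": "system",
    "content": "Answer the following questions as best as you can. "
               "You have access to the following tools:",
    "tools": tools,
}

# The model either answers directly or emits the tool name plus arguments;
# the caller then executes the tool and feeds the result back as an observation.
response, history = model.chat(tokenizer, "北京今天天气怎么样?", history=[system_item])
print(response)
```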

Tokenizer-related questions

Many thanks for open-sourcing this; great work.
ChatGLM3's tokenizer does not allow injecting special tokens such as <|user|>, so how should template-aligned data be constructed for fine-tuning? Specifically, encode cannot map special tokens like <|user|> to their corresponding ids and instead treats them as plain text. In this situation, how should data for domain-specific fine-tuning be constructed and processed so that the template stays consistent?
FYI: QWen also had anti-injection guards in its early releases; community feedback about the impact was strong, and it later made adjustments (see QWen's approach).

Also, the tokenizer class name (ChatGLMTokenizer) is the same as ChatGLM2's tokenizer class name, even though the details are completely different, which may cause problems for downstream repos during adaptation. Would you consider giving ChatGLM3's tokenizer a new class name?
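
For context, a sketch of the guarded behavior and the workaround I would expect, assuming ChatGLM3's tokenization_chatglm.py exposes get_command() for mapping special tokens to ids:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)

# encode() treats the literal string as plain text (the injection guard):
print(tokenizer.encode("<|user|>", add_special_tokens=False))  # several ordinary ids

# Template-aligned fine-tuning data can insert the command id directly:
ids = [tokenizer.get_command("<|user|>")] + tokenizer.encode("你好", add_special_tokens=False)
```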

Missing jupyter_client module

Installed dependencies: pip install -r requirements.txt

Then ran: streamlit run main.py

Then got the following error:

2023-10-29 00:31:32.581 Uncaught app exception
Traceback (most recent call last):
  File "/home/xxx/.local/share/virtualenvs/ChatGLM3-uTamXjui/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 534, in _run_script
    exec(code, module.__dict__)
  File "/home/xxx/code/github/ChatGLM3/composite_demo/main.py", line 11, in <module>
    import demo_chat, demo_ci, demo_tool
  File "/home/xxx/code/github/ChatGLM3/composite_demo/demo_ci.py", line 9, in <module>
    import jupyter_client
ModuleNotFoundError: No module named 'jupyter_client'

Installing it with pip install jupyter_client fixed the problem.

On-device deployment

How can the model be deployed to mobile devices?

Benchmark result reproducibility

Thanks for releasing such a strong model. How can the benchmark results in the README be reproduced? I tried testing with greedy decoding, using methods similar to the Hugging Face leaderboard and those provided in the original papers/repos. After some debugging, I can reproduce most of the scores reported in the Llama 2 paper. When testing ChatGLM3, however, apart from AGI_Eval, where the score is fairly close, most scores differ by 10-20 points or more; in particular, the GSM8K exact match is only 47.8, and even checking "contains" gives only 51. Could you share how to reproduce the benchmark results? Thanks.
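
For clarity, a sketch of the two scoring rules mentioned above; the whitespace normalization is my assumption:

```python
def exact_match(prediction: str, reference: str) -> bool:
    # Strict comparison after trimming surrounding whitespace.
    return prediction.strip() == reference.strip()

def contains(prediction: str, reference: str) -> bool:
    # Looser check: the reference answer appears anywhere in the output.
    return reference.strip() in prediction
```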

Prompt format for long-text scenarios

Hi, I noticed that this version's prompt format has changed quite a lot. For long texts (e.g. document QA), is the prompt constructed the same way as for ordinary QA? Are there any examples?
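
To frame the question, a sketch of one plausible construction, purely an assumption: the document is simply placed in the user turn under the ordinary chat template.

```python
# Hypothetical document-QA prompt built on the ordinary chat template;
# assumes `model` and `tokenizer` are already loaded.
document = "...long document text..."
question = "What does the document say about X?"
prompt = f"请根据以下文档回答问题。\n\n文档:\n{document}\n\n问题:{question}"
response, history = model.chat(tokenizer, prompt, history=[])
```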

Error when running composite_demo; please help find the cause

In the 🛠️ Tool tab of the composite demo (tabs: 💬 Chat, 🛠️ Tool, 🧑‍💻 Code Interpreter), after entering the query 查询巴黎天气 ("check the weather in Paris"):

RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
Traceback:
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 534, in _run_script
exec(code, module.__dict__)
File "/root/ChatGLM3-main/composite_demo/main.py", line 52, in <module>
demo_tool.main(top_p, temperature, prompt_text)
File "/root/ChatGLM3-main/composite_demo/demo_tool.py", line 111, in main
for response in client.generate_stream(
File "/root/ChatGLM3-main/composite_demo/client.py", line 119, in generate_stream
for new_text, _ in stream_chat(self.model,
File "/root/ChatGLM3-main/composite_demo/client.py", line 69, in stream_chat
for outputs in self.stream_generate(**inputs, past_key_values=past_key_values,
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/root/.cache/huggingface/modules/transformers_modules/modeling_chatglm.py", line 1156, in stream_generate
outputs = self(
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/modeling_chatglm.py", line 937, in forward
transformer_outputs = self.transformer(
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/modeling_chatglm.py", line 830, in forward
hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/modeling_chatglm.py", line 640, in forward
layer_ret = layer(
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/modeling_chatglm.py", line 544, in forward
attention_output, kv_cache = self.self_attention(
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/modeling_chatglm.py", line 376, in forward
mixed_x_layer = self.query_key_value(hidden_states)
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/chatglm3-demo/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

Function calling

I see that function calling follows the format used by the OpenAI API, which may not be very user-friendly for some POST requests. Is there any solution to this?

OpenAI API

The previous OpenAI API script presumably no longer works. Has anyone written a new one?

How do I solve RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'?

=== History:
[Conversation(role=<Role.USER: 2>, content='1', tool=None, image=None)]
2023-10-28 23:14:33.424 Uncaught app exception
Traceback (most recent call last):
File "E:\ChatGLM3\venv\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 534, in _run_script
exec(code, module.__dict__)
File "E:\ChatGLM3\composite_demo\main.py", line 50, in <module>
demo_chat.main(top_p, temperature, system_prompt, prompt_text)
File "E:\ChatGLM3\composite_demo\demo_chat.py", line 50, in main
for response in client.generate_stream(
File "E:\ChatGLM3\composite_demo\client.py", line 119, in generate_stream
for new_text, _ in stream_chat(self.model,
File "E:\ChatGLM3\composite_demo\client.py", line 69, in stream_chat
for outputs in self.stream_generate(**inputs, past_key_values=past_key_values,
File "E:\ChatGLM3\venv\lib\site-packages\torch\utils_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "C:\Users\zhuya/.cache\huggingface\modules\transformers_modules\THUDM\chatglm3-6b\fc3235f807ef5527af598c05f04f2ffd17f48bab\modeling_chatglm.py", line 1156, in stream_generate
outputs = self(
File "E:\ChatGLM3\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "E:\ChatGLM3\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\zhuya/.cache\huggingface\modules\transformers_modules\THUDM\chatglm3-6b\fc3235f807ef5527af598c05f04f2ffd17f48bab\modeling_chatglm.py", line 937, in forward
transformer_outputs = self.transformer(
File "E:\ChatGLM3\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "E:\ChatGLM3\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\zhuya/.cache\huggingface\modules\transformers_modules\THUDM\chatglm3-6b\fc3235f807ef5527af598c05f04f2ffd17f48bab\modeling_chatglm.py", line 830, in forward
hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
File "E:\ChatGLM3\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "E:\ChatGLM3\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\zhuya/.cache\huggingface\modules\transformers_modules\THUDM\chatglm3-6b\fc3235f807ef5527af598c05f04f2ffd17f48bab\modeling_chatglm.py", line 640, in forward
layer_ret = layer(
File "E:\ChatGLM3\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "E:\ChatGLM3\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\zhuya/.cache\huggingface\modules\transformers_modules\THUDM\chatglm3-6b\fc3235f807ef5527af598c05f04f2ffd17f48bab\modeling_chatglm.py", line 544, in forward
attention_output, kv_cache = self.self_attention(
File "E:\ChatGLM3\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "E:\ChatGLM3\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\zhuya/.cache\huggingface\modules\transformers_modules\THUDM\chatglm3-6b\fc3235f807ef5527af598c05f04f2ffd17f48bab\modeling_chatglm.py", line 376, in forward
mixed_x_layer = self.query_key_value(hidden_states)
File "E:\ChatGLM3\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "E:\ChatGLM3\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in call_impl
return forward_call(*args, **kwargs)
File "E:\ChatGLM3\venv\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: "addmm_impl_cpu
" not implemented for 'Half'
