Hi,
I just downloaded the Colab notebook you provide at this link: https://github.com/mlc-ai/notebooks/blob/main/mlc-llm/tutorial_chat_module_getting_started.ipynb.
It works properly if I use the 7B model; however, if I change the settings to use the 70B model, I receive the following error:
InternalError Traceback (most recent call last)
in <cell line: 4>()
2 from mlc_chat.callback import StreamToStdout
3
----> 4 cm = ChatModule(
5 model="dist/Llama-2-70b-chat-hf-q4f16_1-MLC",
6 model_lib_path="dist/prebuilt_libs/Llama-2-70b-chat-hf/Llama-2-70b-chat-hf-q4f16_1-cuda.so"
5 frames
tvm/_ffi/_cython/./packed_func.pxi in tvm._ffi._cy3.core.PackedFuncBase.call()
tvm/_ffi/_cython/./packed_func.pxi in tvm._ffi._cy3.core.FuncCall()
tvm/_ffi/_cython/./packed_func.pxi in tvm._ffi._cy3.core.FuncCall3()
tvm/_ffi/_cython/./base.pxi in tvm._ffi._cy3.core.CHECK_CALL()
/workspace/mlc-llm/cpp/llm_chat.cc in LoadParams()
InternalError: Traceback (most recent call last):
7: mlc::llm::LLMChatModule::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtrtvm::runtime::Object const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
at /workspace/mlc-llm/cpp/llm_chat.cc:1633
6: mlc::llm::LLMChat::Reload(tvm::runtime::TVMArgValue, tvm::runtime::String, tvm::runtime::String)
at /workspace/mlc-llm/cpp/llm_chat.cc:631
5: LoadParams
at /workspace/mlc-llm/cpp/llm_chat.cc:219
4: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<void (std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, int)>::AssignTypedLambda<void ()(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, int)>(void ()(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, int), std::__cxx11::basic_string<char, std::char_traits, std::allocator >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
3: tvm::runtime::relax_vm::NDArrayCache::Load(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, int)
2: tvm::runtime::relax_vm::NDArrayCacheMetadata::Load(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)
1: tvm::runtime::LoadBinaryFromFile(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator >*)
0: _ZN3tvm7runtime6deta
File "/workspace/tvm/src/runtime/file_utils.cc", line 121
InternalError: Check failed: (!fs.fail()) is false: Cannot open dist/Llama-2-70b-chat-hf-q4f16_1-MLC/ndarray-cache.json
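The final check failure suggests the 70B weight directory under dist/ was never populated, since LoadParams cannot open ndarray-cache.json there. A quick way to see which of the required files is missing before calling ChatModule is a sketch like the following (the helper check_model_artifacts is my own, not an mlc_chat API; the two paths are the ones from the failing cell and the error message):

```python
import os

def check_model_artifacts(model_dir, lib_path):
    """Return the list of required MLC-LLM artifacts that do not exist on disk."""
    required = [
        os.path.join(model_dir, "ndarray-cache.json"),  # weight metadata that LoadParams opens
        lib_path,                                       # compiled model library passed as model_lib_path
    ]
    return [p for p in required if not os.path.isfile(p)]

# The same paths the failing ChatModule call uses:
print(check_model_artifacts(
    "dist/Llama-2-70b-chat-hf-q4f16_1-MLC",
    "dist/prebuilt_libs/Llama-2-70b-chat-hf/Llama-2-70b-chat-hf-q4f16_1-cuda.so",
))
```

If the weight directory shows up as missing, that would mean the 70B weights were never downloaded into dist/, which matches the "Cannot open ... ndarray-cache.json" message above.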