Hi, @santurini
ModuleNotFoundError: No module named 'neural_speed.mistral_cpp'
This means the installation was not successful.
Please reinstall ITREX & NeuralSpeed from source. I have verified that this works:
git clone https://github.com/intel/intel-extension-for-transformers.git
cd intel-extension-for-transformers
pip install -r requirements.txt
python setup.py install
cd ..
git clone https://github.com/intel/neural-speed.git
cd neural-speed
pip install -r requirements.txt
python setup.py install
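After reinstalling, you can confirm that the package and its compiled bindings (the missing neural_speed.mistral_cpp from the error above) actually resolve before running anything. This is a minimal sketch using only the standard library; the helper name is my own:

```python
import importlib.util

def module_available(name: str) -> bool:
    """Return True if `name` can be imported, without actually importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package in a dotted name is itself missing.
        return False

# After a successful build, both of these should be True:
# module_available("neural_speed")
# module_available("neural_speed.mistral_cpp")
```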
Then, use this script:
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM
#model_name = "Intel/neural-chat-7b-v3-1" # Hugging Face model_id or local model
# git lfs install & git clone https://huggingface.co/Intel/neural-chat-7b-v3-1
model_name = "/home/zhenzhong/model/neural-chat-7b-v3-1"
prompt = "Once upon a time, there existed a little girl,"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)
As shown, model_name has been changed to a local path. Everything should work.
BTW, online loading is a known issue for this model. I will fix it tomorrow and let you know.
Thanks.
from neural-speed.
Hi, online loading for Intel/neural-chat-7b-v3-1 should work now if you cherry-pick #132.
Can I ask which OS you are working on? I tried to reproduce your steps using Ubuntu-20.04 on WSL-2, but I get errors during the installation (python setup.py install) of ITREX.
@santurini Hi
I am running ITREX & NeuralSpeed on Linux; Ubuntu-20.04 on WSL-2 should be fine, I think.
Please check your g++ toolchain. I am using gcc 13.2.0 (conda-forge gcc 13.2.0-5):
conda install conda-forge::gxx
Make sure you have first uninstalled all existing copies of ITREX & NeuralSpeed, using pip uninstall neural-speed & pip uninstall intel-extension-for-transformers.
If you update gxx, you also need to delete the build & neural_speed.egg-info directories before reinstalling.
In addition, if you are building from source and installing the package (non-editable installation), please check whether #88 (comment) helps. If that doesn't work, showing the output of the following commands may help:
export ns_dir=$(python -c "import neural_speed; print(neural_speed.__path__[0])")
echo $ns_dir
ls $ns_dir
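For reference, here is a cross-platform Python equivalent of those shell commands (handy on Windows, where `export` is unavailable); the helper name is my own:

```python
import importlib
import os

def package_contents(name: str) -> list[str]:
    """Print an installed package's directory and return its file listing."""
    pkg = importlib.import_module(name)
    pkg_dir = pkg.__path__[0]   # same value as $ns_dir above
    print(pkg_dir)
    return sorted(os.listdir(pkg_dir))

# e.g. package_contents("neural_speed")
```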
Sorry for the late response; after experimenting with the environment a bit, I was able to compile everything correctly.
Should I open a new issue to ask about supported quantization methods?
When trying to load mistral-7b-instruct-v0.1.Q4_K_M.gguf, I get the following error:
error loading model: unrecognized tensor type 12
model_init_from_file: failed to load model
Segmentation fault
@santurini Hi, we don't support Qx_K_M.gguf / Qx_K_S.gguf currently. Please try a q4_0.gguf file, and check the supported models list: https://github.com/intel/neural-speed/blob/main/docs/supported_models.md