Comments (65)

marella commented on June 26, 2024

Thanks a lot @s-kostyaev for helping debug the issue.

marella commented on June 26, 2024

Thanks. Tomorrow I will add a main.cc file to the repo which can be run directly without Python. It should make it easy to debug the issue.

marella commented on June 26, 2024

> > > Maybe this - https://stackoverflow.com/questions/54587052/cmake-on-mac-could-not-find-threads-missing-threads-found
> >
> > I also saw this, but cmake should fail with an error in that case, yet it is building successfully. Maybe it found threads but is simply not printing it. When you build the ggml repo, are you seeing a line which says "Found Threads: TRUE"?
>
> No.

Thanks for checking. I think cmake is just not printing that it found the threads library; otherwise it wouldn't work at all.

marella commented on June 26, 2024

Thanks. I think I found the issue. I will make a new release and let you know in some time.

marella commented on June 26, 2024

> @marella sorry I've been working like crazy. I see @s-kostyaev executed the necessary commands; if you need anything else from my hardware, just let me know. Glad you guys found it.

No worries @bgonzalezfractal


@s-kostyaev I released a fix in the latest version 0.2.1. Please update:

pip install --upgrade ctransformers

and let me know if it works. Please don't set the lib=... option.

Also please try running with different threads (1, 4, 8) and let me know if you see any change in performance.
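
For comparison, a minimal timing sketch (my own illustration, assuming a local starchat model path like the one in the earlier test script; the path is a placeholder):

import time
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "/path/to/starchat-alpha-ggml-q4_0.bin",  # placeholder path
    model_type="starcoder",
)

for threads in (1, 4, 8):
    start = time.time()
    llm("Hi", max_new_tokens=8, threads=threads)  # a few tokens so timing differences are visible
    print(f"threads={threads}: {time.time() - start:.1f}s")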

s-kostyaev commented on June 26, 2024

Finally, it works. The threads parameter works. It even works with conda now. Thank you!

marella commented on June 26, 2024

Hi, ggml recently introduced a breaking change, so existing models have to be re-quantized. This error happens when you use an old model with the new ggml library. If you pull the latest changes from the ggml repo or do a fresh clone, you should get the same error with the example code as well.

Latest quantized models are available in this repo: https://huggingface.co/NeoDim/starcoderbase-GGML/tree/main. If you have already downloaded from this repo, please check that they are the latest, as they were updated just a day ago.

Please ensure you are using the latest version of this library:

pip install --upgrade ctransformers

and then run:

llm = AutoModelForCausalLM.from_pretrained(
    'NeoDim/starcoderbase-GGML',
    model_file='starcoderbase-ggml-q4_0.bin',
    model_type='starcoder',
)

print(llm('Hi', max_new_tokens=1))

The above example downloads the latest model file directly from the Hugging Face repo. Please let me know if this works. The reason I used max_new_tokens=1 is that it is currently slow on Mac M1 (#5 (comment)). If this basic example works, we can look at how to improve the performance.
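
(To confirm which version is actually installed after upgrading, a quick check using only the standard library:)

import importlib.metadata

print(importlib.metadata.version("ctransformers"))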

s-kostyaev commented on June 26, 2024

> Latest quantized models are available in this repo: https://huggingface.co/NeoDim/starcoderbase-GGML/tree/main. If you have already downloaded from this repo, please check that they are the latest, as they were updated just a day ago.

I know, this is my repo :)

It still crashes with a segmentation fault.

marella commented on June 26, 2024

> I know, this is my repo :)

Oh, nice! :)

Can you please try building from source and let me know if it works:

git clone --recurse-submodules https://github.com/marella/ctransformers
cd ctransformers
./scripts/build.sh

The compiled library will be located at build/lib/libctransformers.dylib which can be used as:

llm = AutoModelForCausalLM.from_pretrained(..., lib='/path/to/ctransformers/build/lib/libctransformers.dylib')

s-kostyaev commented on June 26, 2024

Compiled from source, it also crashes with a segmentation fault.

marella commented on June 26, 2024

Thanks for checking. Can you please check with a simpler model, to verify whether it is a starcoder-specific issue or a library issue:

llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml')

Also, were you getting the error while loading the model using from_pretrained() or while generating text using llm()?

Also, can you please share your macOS and Python versions? Since I don't have a Mac, it may take a while to debug this.
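
(To help answer the loading-vs-generation question, a small sketch of my own; the stage prints are just for illustration:)

from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml")
print("loaded")  # reaching this means from_pretrained() works

print(llm("Hi", max_new_tokens=1))
print("generated")  # reaching this means llm() works too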

s-kostyaev commented on June 26, 2024

Unfortunately, it also segfaults.

marella commented on June 26, 2024

Were you getting the error while loading the model using from_pretrained() or while generating text using llm()?

s-kostyaev commented on June 26, 2024

> Also, were you getting the error while loading the model using from_pretrained() or while generating text using llm()?

While generating. Loading is fine. The tokenizer also works.

marella commented on June 26, 2024

Thanks. Can you try running the following and let me know where it is throwing the error:

print('eval', llm.eval([123]))

print('sample', llm.sample())

s-kostyaev commented on June 26, 2024

Sample works fine. Eval leads to a segmentation fault.

s-kostyaev commented on June 26, 2024
% export PYTHONFAULTHANDLER=1
% python modules/test.py     
Fetching 1 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 19239.93it/s]
Fetching 1 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2978.91it/s]
loaded
Fatal Python error: Segmentation fault

Thread 0x00000001711bf000 (most recent call first):
  File "/opt/homebrew/anaconda3/envs/textgen/lib/python3.10/threading.py", line 324 in wait
  File "/opt/homebrew/anaconda3/envs/textgen/lib/python3.10/threading.py", line 607 in wait
  File "/opt/homebrew/anaconda3/envs/textgen/lib/python3.10/site-packages/tqdm/_monitor.py", line 60 in run
  File "/opt/homebrew/anaconda3/envs/textgen/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/opt/homebrew/anaconda3/envs/textgen/lib/python3.10/threading.py", line 973 in _bootstrap

Current thread 0x00000001e4c2db40 (most recent call first):
  File "/opt/homebrew/anaconda3/envs/textgen/lib/python3.10/site-packages/ctransformers/llm.py", line 241 in eval
  File "/opt/homebrew/anaconda3/envs/textgen/lib/python3.10/site-packages/ctransformers/llm.py", line 320 in generate
  File "/opt/homebrew/anaconda3/envs/textgen/lib/python3.10/site-packages/ctransformers/llm.py", line 362 in _stream
  File "/opt/homebrew/anaconda3/envs/textgen/lib/python3.10/site-packages/ctransformers/llm.py", line 453 in __call__
  File "/Users/username/nn/text-generation-webui/modules/test.py", line 11 in <module>

Extension modules: charset_normalizer.md, yaml._yaml (total: 2)
zsh: segmentation fault  python modules/test.py
/opt/homebrew/anaconda3/envs/textgen/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown                     
  warnings.warn('resource_tracker: There appear to be %d '
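
(For reference, the same fault handler can also be enabled from code instead of the environment variable:)

import faulthandler

faulthandler.enable()  # equivalent to running with PYTHONFAULTHANDLER=1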

s-kostyaev commented on June 26, 2024

I'm pretty new to the Python world. I hope this can help debug the issue.

s-kostyaev commented on June 26, 2024

With pdb:

(Pdb) step
> /opt/homebrew/anaconda3/envs/textgen/lib/python3.10/site-packages/ctransformers/llm.py(242)eval()
-> batch_size, threads)
(Pdb) step
> /opt/homebrew/anaconda3/envs/textgen/lib/python3.10/site-packages/ctransformers/llm.py(241)eval()
-> status = self.ctransformers_llm_batch_eval(tokens, n_tokens,
(Pdb) step
Fatal Python error: Segmentation fault

Thread 0x0000000171ac7000 (most recent call first):
  File "/opt/homebrew/anaconda3/envs/textgen/lib/python3.10/threading.py", line 324 in wait
  File "/opt/homebrew/anaconda3/envs/textgen/lib/python3.10/threading.py", line 607 in wait
  File "/opt/homebrew/anaconda3/envs/textgen/lib/python3.10/site-packages/tqdm/_monitor.py", line 60 in run
  File "/opt/homebrew/anaconda3/envs/textgen/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/opt/homebrew/anaconda3/envs/textgen/lib/python3.10/threading.py", line 973 in _bootstrap

Current thread 0x00000001e4c2db40 (most recent call first):
  File "/opt/homebrew/anaconda3/envs/textgen/lib/python3.10/site-packages/ctransformers/llm.py", line 241 in eval
  File "/opt/homebrew/anaconda3/envs/textgen/lib/python3.10/site-packages/ctransformers/llm.py", line 320 in generate
  File "/opt/homebrew/anaconda3/envs/textgen/lib/python3.10/site-packages/ctransformers/llm.py", line 362 in _stream
  File "/opt/homebrew/anaconda3/envs/textgen/lib/python3.10/site-packages/ctransformers/llm.py", line 453 in __call__
  File "/Users/sergeykostyaev/nn/text-generation-webui/modules/test.py", line 11 in <module>
  File "<string>", line 1 in <module>
  File "/opt/homebrew/anaconda3/envs/textgen/lib/python3.10/bdb.py", line 597 in run
  File "/opt/homebrew/anaconda3/envs/textgen/lib/python3.10/pdb.py", line 1592 in _runscript
  File "/opt/homebrew/anaconda3/envs/textgen/lib/python3.10/pdb.py", line 1732 in main
  File "/opt/homebrew/anaconda3/envs/textgen/lib/python3.10/pdb.py", line 1759 in <module>
  File "/opt/homebrew/anaconda3/envs/textgen/lib/python3.10/runpy.py", line 86 in _run_code
  File "/opt/homebrew/anaconda3/envs/textgen/lib/python3.10/runpy.py", line 196 in _run_module_as_main

Extension modules: charset_normalizer.md, yaml._yaml (total: 2)
zsh: segmentation fault  python -m pdb modules/test.py
/opt/homebrew/anaconda3/envs/textgen/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown                     
  warnings.warn('resource_tracker: There appear to be %d '

s-kostyaev commented on June 26, 2024

With lldb I can see:

%  lldb `which python3.10`                  
error: module importing failed: Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'cpython_lldb'
(lldb) target create "/opt/homebrew/anaconda3/envs/textgen/bin/python3.10"
Current executable set to '/opt/homebrew/anaconda3/envs/textgen/bin/python3.10' (arm64).
(lldb) run modules/test.py
Process 60321 launched: '/opt/homebrew/anaconda3/envs/textgen/bin/python3.10' (arm64)
Fetching 1 files: 100% 1/1 [00:00<00:00, 19599.55it/s]
Fetching 1 files: 100% 1/1 [00:00<00:00, 3890.82it/s]
loaded
Process 60321 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16edffff8)
    frame #0: 0x00000001897bade0 libsystem_pthread.dylib`___chkstk_darwin + 60
libsystem_pthread.dylib`:
->  0x1897bade0 <+60>: ldur   x11, [x11, #-0x8]
    0x1897bade4 <+64>: mov    x10, sp
    0x1897bade8 <+68>: cmp    x9, #0x1, lsl #12         ; =0x1000 
    0x1897badec <+72>: b.lo   0x1897bae04               ; <+96>
(lldb) up
frame #1: 0x0000000105386810 libctransformers.dylib`ggml_graph_compute + 128
libctransformers.dylib`ggml_graph_compute:
->  0x105386810 <+128>: mov    x9, sp
    0x105386814 <+132>: sub    x23, x9, x8
    0x105386818 <+136>: mov    sp, x23
    0x10538681c <+140>: mov    x25, #0x0
(lldb) up
frame #2: 0x0000000105352960 libctransformers.dylib`gpt2_eval(gpt2_model const&, int, int, std::__1::vector<int, std::__1::allocator<int>> const&, std::__1::vector<float, std::__1::allocator<float>>&, unsigned long&) + 2252
libctransformers.dylib`gpt2_eval:
->  0x105352960 <+2252>: ldp    x24, x22, [sp, #0x20]
    0x105352964 <+2256>: ldp    x20, x8, [x24]
    0x105352968 <+2260>: sub    x8, x8, x20
    0x10535296c <+2264>: asr    x8, x8, #2
(lldb) 

marella commented on June 26, 2024

Thanks for the detailed info. It looks like you are using Anaconda, and in a different issue (tee-ar-ex/trx-python#23 (comment), not related to this library) someone pointed out that Anaconda could be the cause.
So can you please try installing Python from https://www.python.org/downloads/ and see if it works? Once Python is installed, pip can be installed using python3 -m ensurepip --upgrade.
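
(A quick way to verify which interpreter is actually running, as a small diagnostic sketch using only the standard library:)

import sys

print(sys.executable)  # an Anaconda install typically shows a path under .../anaconda3/
print(sys.version)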

s-kostyaev commented on June 26, 2024

I will try, thanks

s-kostyaev commented on June 26, 2024

Without Anaconda it doesn't segfault. But it's super slow, and the threads parameter does nothing: always 100% CPU. Eleven minutes is not enough to generate a single token on an M1 Max with the model "marella/gpt-2-ggml".

marella commented on June 26, 2024

Can you try building from source and see if it improves?

s-kostyaev commented on June 26, 2024

Sure. Now trying the build from source.

s-kostyaev commented on June 26, 2024

Why is only a single thread running? It also doesn't segfault with the manually built library, but it is also super slow. I'm not sure how long it will take to generate a single token.

s-kostyaev commented on June 26, 2024

16 minutes, starchat-alpha-q4_0, 100% CPU, no output with max_new_tokens=1. File test.py:

from ctransformers import AutoModelForCausalLM
from ctransformers import AutoConfig

config = AutoConfig.from_pretrained(
    "/Users/sergeykostyaev/nn/text-generation-webui/models/starchat-alpha-ggml-q4_0.bin",
    threads=8,
)

llm = AutoModelForCausalLM.from_pretrained(
    "/Users/sergeykostyaev/nn/text-generation-webui/models/starchat-alpha-ggml-q4_0.bin",
    model_type="starcoder",
    lib="/Users/sergeykostyaev/nn/ctransformers/build/lib/libctransformers.dylib",
    config=config,
)
print("loaded")
print(llm("Hi", max_new_tokens=1, threads=8))

It printed only "loaded".

45 minutes later - nothing has changed.

marella commented on June 26, 2024

I think there might be some issue in the library itself. Another user also reported the same issue (#1 (comment)), but I thought it was just running slowly. Before the breaking changes in GGML, an older version of this library was working on M1 Macs, just very slowly. Now it appears to not be working at all.

Can you also try running a LLaMA model which basically uses llama.cpp:

llm = AutoModelForCausalLM.from_pretrained(
    'TheBloke/LLaMa-7B-GGML',
    model_file='llama-7b.ggmlv3.q4_0.bin',
    model_type='llama',
)

s-kostyaev commented on June 26, 2024

I will try with a llama.cpp model.

s-kostyaev commented on June 26, 2024

Also, inference with the example code from the ggml library - https://github.com/ggerganov/ggml/tree/master/examples/starcoder - runs just fine.

s-kostyaev commented on June 26, 2024

> I think there might be some issue in the library itself. Another user also reported the same issue (#1 (comment)), but I thought it was just running slowly. Before the breaking changes in GGML, an older version of this library was working on M1 Macs, just very slowly. Now it appears to not be working at all.
>
> Can you also try running a LLaMA model which basically uses llama.cpp:
>
> llm = AutoModelForCausalLM.from_pretrained(
>     'TheBloke/LLaMa-7B-GGML',
>     model_file='llama-7b.ggmlv3.q4_0.bin',
>     model_type='llama',
> )

Looks like for now llama.cpp models have the same issue on Apple Silicon.

marella commented on June 26, 2024

Thanks for checking patiently. I will debug this later.

Can you please try one last thing: try installing an older version of this library and see if it works:

pip install ctransformers==0.1.2

llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml')

print(llm('Hi', max_new_tokens=1))

s-kostyaev commented on June 26, 2024

> Thanks for checking patiently. I will debug this later.
>
> Can you please try one last thing: try installing an older version of this library and see if it works:
>
> pip install ctransformers==0.1.2
>
> llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml')
>
> print(llm('Hi', max_new_tokens=1))

Sure.

s-kostyaev commented on June 26, 2024

> Thanks for checking patiently. I will debug this later.
>
> Can you please try one last thing: try installing an older version of this library and see if it works:
>
> pip install ctransformers==0.1.2
>
> llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml')
>
> print(llm('Hi', max_new_tokens=1))

Looks like after the downgrade the issue is still here.

marella commented on June 26, 2024

Hi, I added the main.cc file in the debug git branch. Please check if it works:

git clone --recurse-submodules https://github.com/marella/ctransformers
cd ctransformers
git checkout debug

./scripts/build.sh
./build/lib/main <model_type> <model_path> # ./build/lib/main gpt2 /path/to/ggml-model.bin

Also please send the output of both ./scripts/build.sh and ./build/lib/main commands.

s-kostyaev commented on June 26, 2024

Hi. Sure, will test it now.

s-kostyaev commented on June 26, 2024
%  ./build/lib/main starcoder ../text-generation-webui/models/starchat-alpha-ggml-q4_0.bin 

model type : 'starcoder'
model path : '../text-generation-webui/models/starchat-alpha-ggml-q4_0.bin'
prompt     : 'Hi'

load ... ✔
tokenize ... ✔
eval ... ✔
sample ... ✔
detokenize ... ✔
delete ... ✔

response : ''

s-kostyaev commented on June 26, 2024
%  ./scripts/build.sh
-- CTRANSFORMERS_INSTRUCTIONS: avx2
-- ARM detected
-- Accelerate framework found
-- Configuring done (0.0s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/sergeykostyaev/nn/ctransformers/build
[ 60%] Built target ctransformers
[ 80%] Building CXX object CMakeFiles/main.dir/main.cc.o
[100%] Linking CXX executable lib/main
ld: warning: directory not found for option '-L/usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/'
[100%] Built target main

s-kostyaev commented on June 26, 2024
%  ./build/lib/main llama ../text-generation-webui/models/WizardLM-7B-uncensored.ggmlv3.q4_0.bin 

model type : 'llama'
model path : '../text-generation-webui/models/WizardLM-7B-uncensored.ggmlv3.q4_0.bin'
prompt     : 'Hi'

load ... ✔
tokenize ... ✔
eval ... ✔
sample ... ✔
detokenize ... ✔
delete ... ✔

response : '!'

s-kostyaev commented on June 26, 2024
%  ./build/lib/main starcoder ../text-generation-webui/models/starcoder-ggml-q4_0.bin

model type : 'starcoder'
model path : '../text-generation-webui/models/starcoder-ggml-q4_0.bin'
prompt     : 'Hi'

load ... ✔
tokenize ... ✔
eval ... ✔
sample ... ✔
detokenize ... ✔
delete ... ✔

response : ''

marella commented on June 26, 2024
%  ./scripts/build.sh
-- CTRANSFORMERS_INSTRUCTIONS: avx2
-- ARM detected
-- Accelerate framework found
-- Configuring done (0.0s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/sergeykostyaev/nn/ctransformers/build
[ 60%] Built target ctransformers
[ 80%] Building CXX object CMakeFiles/main.dir/main.cc.o
[100%] Linking CXX executable lib/main
ld: warning: directory not found for option '-L/usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/'
[100%] Built target main

Thanks. Is this the entire output of the build script? It should print the line "Found Threads: TRUE".
Also, I'm not sure why ld: warning: directory not found for option '-L/usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/' appears on ARM macOS.

s-kostyaev commented on June 26, 2024

> ld: warning: directory not found for option '-L/usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/'

This is a problem in my configuration; it shouldn't matter here. And yes, this is the entire output.

marella commented on June 26, 2024

I suspect the issue is the threads library not being found, because the errors you posted previously also mention threads in the error message.

When you build the ggml repo, are you seeing a line which says Found Threads: TRUE?

Also, can you please try removing the line set(THREADS_PREFER_PTHREAD_FLAG ON) in CMakeLists.txt, building again, and seeing if threads appears?

s-kostyaev commented on June 26, 2024

After removing the line set(THREADS_PREFER_PTHREAD_FLAG ON) from models/CMakeLists.txt:

%  ./scripts/build.sh         
-- CTRANSFORMERS_INSTRUCTIONS: avx2
-- ARM detected
-- Accelerate framework found
-- Configuring done (0.0s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/sergeykostyaev/nn/ctransformers/build
[ 60%] Built target ctransformers
[100%] Built target main

marella commented on June 26, 2024
%  ./build/lib/main llama ../text-generation-webui/models/WizardLM-7B-uncensored.ggmlv3.q4_0.bin 

model type : 'llama'
model path : '../text-generation-webui/models/WizardLM-7B-uncensored.ggmlv3.q4_0.bin'
prompt     : 'Hi'

load ... ✔
tokenize ... ✔
eval ... ✔
sample ... ✔
detokenize ... ✔
delete ... ✔

response : '!'

At least the LLaMA model is giving some output, so the C++ code is working. So the issue might be in loading the library into Python. I will look into this more and get back to you if I find a solution. Thanks for helping with the debugging.
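
(One thing worth ruling out when a native library misbehaves only from Python on Apple Silicon is an architecture mismatch, for example an x86_64 interpreter under Rosetta loading an arm64 dylib. This is a general diagnostic of my own, not a confirmed cause here; the library path is a placeholder:)

import platform
import subprocess

print(platform.machine())  # "arm64" for native Apple Silicon Python, "x86_64" under Rosetta
print(subprocess.run(
    ["file", "/path/to/ctransformers/build/lib/libctransformers.dylib"],  # placeholder path
    capture_output=True, text=True,
).stdout)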

s-kostyaev commented on June 26, 2024

Maybe this - https://stackoverflow.com/questions/54587052/cmake-on-mac-could-not-find-threads-missing-threads-found

marella commented on June 26, 2024

> Maybe this - https://stackoverflow.com/questions/54587052/cmake-on-mac-could-not-find-threads-missing-threads-found

I also saw this, but cmake should fail with an error in that case, yet it is building successfully. Maybe it found threads but is simply not printing it. When you build the ggml repo, are you seeing a line which says "Found Threads: TRUE"?

s-kostyaev commented on June 26, 2024

Thank you. I'll wait to see if you find a solution.

s-kostyaev commented on June 26, 2024

> > Maybe this - https://stackoverflow.com/questions/54587052/cmake-on-mac-could-not-find-threads-missing-threads-found
>
> I also saw this, but cmake should fail with an error in that case, yet it is building successfully. Maybe it found threads but is simply not printing it. When you build the ggml repo, are you seeing a line which says "Found Threads: TRUE"?

No.

bgonzalezfractal commented on June 26, 2024

Hi @marella, I've been mentioned in #1 and #5. I have been able to run quantized models for starcoder, starchat, llama, whisper and mpt so far. Nonetheless, none of them work in ctransformers:

  • Mac M1, 64GB vRAM
  • Architecture: ARM

I get exactly the same error as @s-kostyaev: the llm object keeps running forever without any change, while using the models natively works just fine. We've been trying to use ctransformers and langchain, but nothing works. Any new information?

I have done everything mentioned in this repo as well; building from source doesn't work.

[screenshot] It works just fine with ggml natively at 79.63 ms/token.

marella commented on June 26, 2024

Hi @bgonzalezfractal, s-kostyaev has been helping me debug the issue, but I couldn't find the reason or a solution yet. So far we have found that:

  • The model loads and tokenize works, but the eval method fails in Python
  • The C++ code works fine natively but not when called from Python

I will keep looking and will let you know on this thread if I find a solution or if I need your help debugging the issue.

Can you also please run the following and share the output:

git clone --recurse-submodules https://github.com/marella/ctransformers
cd ctransformers
git checkout debug

./scripts/build.sh
./build/lib/main <model_type> <model_path> # example: ./build/lib/main gpt2 /path/to/ggml-model.bin

Please share the output of ./scripts/build.sh and ./build/lib/main commands.

s-kostyaev commented on June 26, 2024
 %  ./scripts/build.sh
-- CTRANSFORMERS_INSTRUCTIONS: avx2
-- ARM detected
-- Accelerate framework found
-- Configuring done (0.0s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/sergeykostyaev/nn/ctransformers/build
[ 60%] Built target ctransformers
[ 80%] Building CXX object CMakeFiles/main.dir/main.cc.o
[100%] Linking CXX executable lib/main
ld: warning: directory not found for option '-L/usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/'
[100%] Built target main

s-kostyaev commented on June 26, 2024
%  ./build/lib/main llama ../LocalAI/models/WizardLM-7B-uncensored.ggmlv3.q4_0.bin 

model type : 'llama'
model path : '../LocalAI/models/WizardLM-7B-uncensored.ggmlv3.q4_0.bin'
prompt     : 'Hi'

load ... ✔
tokenize ... ✔
> [ 1 18567 ]
eval ... ✔
sample ... ✔
> 29892
detokenize ... ✔
> ','
delete ... ✔

s-kostyaev commented on June 26, 2024

I see the code has been updated, so this is the output of the new commands.

marella commented on June 26, 2024

Thanks @s-kostyaev, I was actually asking bgonzalezfractal to run it so that I can check and compare the output on their system as well :)

Since you already built it, can you also run ./build/lib/main on a starcoder model? Yesterday it was giving an empty response.

s-kostyaev commented on June 26, 2024

Sure.

%  ./build/lib/main starcoder ../LocalAI/models/starchat-alpha-ggml-q4_0.bin

model type : 'starcoder'
model path : '../LocalAI/models/starchat-alpha-ggml-q4_0.bin'
prompt     : 'Hi'

load ... ✔
tokenize ... ✔
> [ 12575 ]
eval ... ✔
sample ... ✔
> 399
detokenize ... ✔
> ' A'
delete ... ✔

marella commented on June 26, 2024

Thanks. So the C++ code works fine natively and doesn't have any issue. I will have to debug why it is failing from Python.

marella commented on June 26, 2024

@s-kostyaev I found another issue, LibRaw/LibRaw#437 (comment), which looks similar to the error you posted previously (#8 (comment)).
They mention it being a stack size limit issue, which gets worse with multiple threads.
So can you please try using threads=1 after building from source (I added some print statements):

git clone --recurse-submodules https://github.com/marella/ctransformers
cd ctransformers
git checkout debug
./scripts/build.sh

llm = AutoModelForCausalLM.from_pretrained(..., lib='/path/to/ctransformers/build/lib/libctransformers.dylib')

print(llm('Hi', max_new_tokens=1, threads=1))

Also please run with threads=4 and share both the outputs.

In the above thread, they also suggested increasing the stack size limit, but I'm not sure what an ideal limit would be.
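
(For what it's worth, one way to experiment with a larger stack from Python is to run generation on a freshly created thread, since threading.stack_size() only applies to threads created after it is set. A sketch, not a confirmed fix, assuming llm was loaded as in the snippet above:)

import threading

threading.stack_size(64 * 1024 * 1024)  # 64 MB; an arbitrary test value

def run():
    print(llm('Hi', max_new_tokens=1, threads=1))

t = threading.Thread(target=run)  # this new thread gets the enlarged stack
t.start()
t.join()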

s-kostyaev commented on June 26, 2024

Sure. Will test it.

s-kostyaev commented on June 26, 2024

With single thread:

 %  python3 test.py 

ggml_graph_compute: n_threads = 0
ggml_graph_compute: create thread pool
ggml_graph_compute: initialize tasks + work buffer
ggml_graph_compute: allocating work buffer for graph (26048 bytes)
ggml_graph_compute: compute nodes

And it got stuck.

marella commented on June 26, 2024

Are you using threads=1? Because it is printing n_threads = 0!
Can you also please check with threads=4.

s-kostyaev commented on June 26, 2024

Sure.

%  python3 test.py 

ggml_graph_compute: n_threads = 0
ggml_graph_compute: create thread pool
ggml_graph_compute: initialize tasks + work buffer
ggml_graph_compute: allocating work buffer for graph (26048 bytes)
ggml_graph_compute: compute nodes

s-kostyaev commented on June 26, 2024

This is with 4 threads set - and set in two places, the config and the llm eval call.

bgonzalezfractal commented on June 26, 2024

@marella sorry I've been working like crazy. I see @s-kostyaev executed the necessary commands; if you need anything else from my hardware, just let me know. Glad you guys found it.
