
Comments (4)

qwopqwop200 commented on May 20, 2024

If this is true, it's very strange. I wrote the code so that the result doesn't change.


PanQiWei commented on May 20, 2024

@GenTxt can you share your quantization code and model with us so that we can try to reproduce the issue and figure out what went wrong?

Also, you may try the up-to-date commit on the main branch; maybe it will solve your problem.


GenTxt commented on May 20, 2024

https://huggingface.co/kz919/gpt-neox-20b-8k-longtuning/tree/main

Converted the above to safetensors with the text-generation-webui script.
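(Side note: a minimal sketch of an equivalent conversion using the plain transformers API, not the actual webui script; the source and destination paths are just the ones assumed from the command below.)

```python
# Minimal sketch (not the actual text-generation-webui script): re-save a HF
# checkpoint with safetensors serialization. Paths are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "kz919/gpt-neox-20b-8k-longtuning"   # source checkpoint
dst = "models/neox20b_8192_safe"           # output dir used in the command below

model = AutoModelForCausalLM.from_pretrained(src, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(src)

model.save_pretrained(dst, safe_serialization=True)  # writes .safetensors shards
tokenizer.save_pretrained(dst)
```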

CUDA_VISIBLE_DEVICES="0" python quant_with_alpaca.py --pretrained_model_dir models/neox20b_8192_safe --quantized_model_dir 4bit_converted --bits 4 --group_size 128 --fast_tokenizer --save_and_reload
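For reference, a rough sketch of the equivalent flow with the AutoGPTQ library API (quant_with_alpaca.py itself lives in the AutoGPTQ examples); the calibration example here is a placeholder, not the Alpaca data the script actually uses:

```python
# Rough sketch of the underlying AutoGPTQ flow (not quant_with_alpaca.py itself).
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_dir = "models/neox20b_8192_safe"
out_dir = "4bit_converted"

tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)   # --fast_tokenizer
quantize_config = BaseQuantizeConfig(bits=4, group_size=128)          # --bits 4 --group_size 128

model = AutoGPTQForCausalLM.from_pretrained(model_dir, quantize_config)

# Calibration data: placeholder; the real script builds this from the Alpaca dataset.
examples = [tokenizer("auto-gptq is an easy-to-use quantization package.", return_tensors="pt")]

model.quantize(examples)
model.save_quantized(out_dir, use_safetensors=True)
```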

Old models were deleted because the current Triton kernel can cause errors on the refurbished 6000.
triton-lang/triton#1556

For the specific command above, this error:

  • Occurs on NVIDIA GeForce RTX 2080 Ti (similar to the original 6000 - gpu1)
  • Doesn't occur on NVIDIA GeForce RTX 3090 (works fine on same - gpu0)

Quantized with the latest CUDA main branch and am not encountering the error. False alarm. Closing here and will carefully test each update.

Thanks


PeiyuZ-star commented on May 20, 2024

> (quoting @GenTxt's comment above)

Hi, I've also tried gpt-neox-20b quantization. The inference speed I got is 16 tokens/s, which isn't fast enough. May I ask what results you got?
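For comparison, a rough way to measure tokens/s (a sketch assuming the AutoGPTQ `from_quantized` loader; the path, device, and generation settings are placeholders):

```python
# Quick-and-dirty tokens/s measurement for a quantized model; a sketch, not a
# rigorous benchmark.
import time
import torch
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_dir = "4bit_converted"  # assumed output dir from the quantization command
tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(model_dir, device="cuda:0", use_safetensors=True)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to("cuda:0")

torch.cuda.synchronize()
start = time.time()
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
torch.cuda.synchronize()
elapsed = time.time() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/s")
```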

