rahulschand / gpu_poor
Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization
Home Page: https://rahulschand.github.io/gpu_poor/
Hi, thanks for your great work on calculating tokens/s. I read your App.js code and found some magic numbers. Could you please add comments explaining them? I've listed a few below; it would be awesome if you could add comments for all the magic numbers.
let finalPromptTime =
    theoryTimePrompt_in_ms * getFloatRatio_F16(quantType) * 1.8 + // What's the meaning of "1.8"?
    convertByteToMB(2 * memoryTransfer) * (0.008 / 100); // What's the meaning of "0.008 / 100"?
Also, what is the meaning of the extraFactor values 2.0, 1.5, 1.0, ...?
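For readers with the same question, one plausible reading (purely an assumption about intent, not the author's documentation) is a roofline-style estimate: the theoretical time scaled by an empirical efficiency fudge factor, plus a per-MB memory-transfer overhead term. A sketch, reusing the App.js identifiers from the snippet above:

    // Hypothetical reading of the constants; both meanings are assumptions.
    const EFFICIENCY_FUDGE = 1.8;            // assumed: real kernels run ~1.8x slower than theory
    const MS_PER_MB_OVERHEAD = 0.008 / 100;  // assumed: fixed per-MB transfer cost, in ms
    // finalPromptTime = roofline time * quant slowdown * fudge + transfer size * per-MB cost
    let finalPromptTime =
        theoryTimePrompt_in_ms * getFloatRatio_F16(quantType) * EFFICIENCY_FUDGE +
        convertByteToMB(2 * memoryTransfer) * MS_PER_MB_OVERHEAD;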
I like this, great work.
I saw on your page that you mention the code is open source, but I could not find a license (such as MIT or BSD-3). Would it be OK if you added a license file so the terms are clear?
Sorry, I am not very familiar with inference: in fine-tuning/training I only use the concept of max_seq_length. Are [Prompt len] and [Tokens to Generate] the same as max_seq_length? How could they be different?
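For anyone else confused here, the usual relationship (an assumption about this tool's semantics, not confirmed by the author) is that the two inference-side numbers add up to the total sequence length that max_seq_length bounds during training:

    // Sketch of the assumed relationship between the two UI fields and max_seq_length.
    const promptLen = 512;         // tokens fed in before generation starts
    const tokensToGenerate = 256;  // tokens produced autoregressively afterwards
    const totalSeqLen = promptLen + tokensToGenerate; // the KV cache must hold all of these
    console.log(totalSeqLen);      // 768 — this total plays the role of max_seq_length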
Hi, great work! I would like to use it in a terminal environment, so I am wondering if you could release an API or add a terminal interface. Thanks!
If I'm just using it for inference, do I not need to save the intermediate activation values, as for example vLLM does?
How should I understand this activation value?
Different batch sizes don't seem to affect GPU memory usage when set in INFERENCE MODE. That doesn't seem to make sense. Is this normal?
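For context, the KV cache (the main per-token inference state besides the weights) does scale linearly with batch size, so a batch-independent estimate would be surprising. A minimal sketch with assumed 7B-class dimensions:

    // Standard KV-cache size formula; the model dimensions below are illustrative.
    const nLayers = 32, nHeads = 32, headDim = 128;        // assumed 7B-class shape
    const seqLen = 2048, batchSize = 4, bytesPerElem = 2;  // fp16
    const kvCacheBytes = 2 /* K and V */ * nLayers * nHeads * headDim
        * seqLen * batchSize * bytesPerElem;
    console.log((kvCacheBytes / 2 ** 30).toFixed(1) + ' GiB'); // 4.0 GiB, linear in batchSize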
Hi! I want to add some GPU specs to gpu_configs.json. What is the meaning of compute in that file? Is it the TFLOPS at a certain precision?
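If compute is indeed peak TFLOPS at a given precision (my reading of the question, not confirmed by the repo), a new entry might look like the hypothetical sketch below; every field name besides compute is a guess:

    // Hypothetical gpu_configs.json entry; the schema is assumed, not taken from the repo.
    {
        "RTX 4090": {
            "compute": 82.6,    // assumed: peak fp16 TFLOPS (non-tensor-core)
            "bandwidth": 1008   // assumed: memory bandwidth in GB/s
        }
    }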
Thanks for your contribution. When I use this project for fine-tuning with https://github.com/hiyouga/LLaMA-Factory:
I used the Baichuan-13B model for SFT with max_token=800. The actual memory I use is 28 GB (on an A40),
but your project estimates 44241 MB (~43 GB).
What is the problem that causes such a difference?
Looking forward to your reply.
My train script:
CUDA_VISIBLE_DEVICES=0 python ../src/train_bash.py \
    --stage sft \
    --template baichuan \
    --model_name_or_path /container/LLM/Baichuan-13B-Chat \
    --do_train \
    --dataset test_data \
    --dataset_dir ../data \
    --val_size 0.1 \
    --finetuning_type lora \
    --lora_target W_pack \
    --output_dir output_Bc \
    --overwrite_cache \
    --preprocessing_num_workers 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --cutoff_len 800 \
    --max_new_tokens 1400 \
    --logging_steps 10 \
    --save_steps 6 \
    --eval_steps 6 \
    --max_grad_norm 0.5 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --plot_loss \
    --fp16 \
    --overwrite_output_dir \
    --seed 3407
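As a sanity check on those two numbers, a back-of-envelope sketch (assuming fp16 base weights and that the LoRA adapters on W_pack are negligible next to the 13B base model):

    // Rough lower bound: fp16 weights alone for a 13B model.
    const weightGB = 13e9 * 2 / 1e9; // ≈ 26 GB
    // Observed 28 GB ≈ weights + small adapter/optimizer/activation overhead at batch 1,
    // so an estimator that budgets full activation storage can plausibly overshoot to ~43 GB.
    console.log(weightGB);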
Hey @RahulSChand, awesome work on creating this calculator. But I am facing some problems and getting unreliable results. Here are some of the issues:
The configuration I will be using is as follows:
Model: CodeLlama
Param size: 7B
Batch size: 1
Context length: 2048
For LoRA it shows 177 GB, for QLoRA it shows 180 GB, and for full fine-tuning it shows 216 GB.
When I upload the config.json file vs. just entering the parameter count, it shows inconsistent results.
The memory requirement should not be this high. For example, with a batch size of just 1 and a context length of 2048 it shows triple digits for LoRA and QLoRA; now consider this graph. Reference
According to the graph, the memory requirement for LoRA is 16 GB, but the calculator shows 177 GB.
So, could you please address these doubts? If there is any way to fix this, it would be awesome.
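For scale, a rough expectation for 7B LoRA fine-tuning (assuming fp16 base weights and a small rank-8 adapter on the attention projections; the adapter details are illustrative):

    const baseWeightsGB = 7e9 * 2 / 1e9;               // ≈ 14 GB of frozen fp16 weights
    const adapterParams = 8 * (4096 + 4096) * 32 * 2;  // ≈ 4.2M params: r=8, q/v proj, 32 layers
    const adapterStateGB = adapterParams * 16 / 1e9;   // weights + grads + Adam states ≈ 0.07 GB
    // Plus a few GB of activations at batch 1 / 2048 context → roughly 16-20 GB, not 177 GB.
    console.log(baseWeightsGB + adapterStateGB);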
As far as I know, we can set the LoRA rank and target modules to change the number of trainable parameters, which, I think, can change the memory usage. But I didn't find any relevant settings in your project. How do you estimate memory usage without that information?
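For reference, the dependence this question points at is the standard LoRA parameter count (the layer dimensions below are illustrative):

    // LoRA adds A (r x d_in) and B (d_out x r) per adapted matrix: r * (d_in + d_out) params.
    function loraParams(dIn, dOut, r) {
        return r * (dIn + dOut);
    }
    // e.g. a 4096 x 4096 projection with r = 8, adapted in 32 layers for q and v:
    console.log(loraParams(4096, 4096, 8) * 32 * 2); // 4,194,304 trainable params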
If I'm using DeepSpeed + Hugging Face (including ZeRO-1, 2, and 3), is there any difference in memory usage compared with just using 🤗 Transformers? If there is a difference, will it be supported?
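There is a large difference in general: the ZeRO paper's standard per-parameter breakdown for mixed-precision Adam (2 bytes fp16 weights, 2 bytes fp16 gradients, 12 bytes fp32 master weights plus moments) is sharded differently at each stage. A sketch of that rule of thumb, independent of this tool:

    // Per-GPU bytes for model states under ZeRO (standard mixed-precision Adam breakdown).
    function perGpuStateBytes(numParams, nGpus, stage) {
        const p = 2 * numParams;   // fp16 parameters
        const g = 2 * numParams;   // fp16 gradients
        const o = 12 * numParams;  // fp32 master weights + Adam m and v
        if (stage >= 3) return (p + g + o) / nGpus;  // everything sharded
        if (stage === 2) return p + (g + o) / nGpus; // gradients + optimizer sharded
        if (stage === 1) return p + g + o / nGpus;   // optimizer states sharded
        return p + g + o;                            // no ZeRO
    }
    console.log(perGpuStateBytes(7e9, 8, 2) / 1e9); // ≈ 26.25 GB vs 112 GB without ZeRO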