Comments (2)
Try dataset streaming by setting streaming: true
from llama-factory.
@hiyouga I am still facing the issue with streaming: true and max_steps: 10000. I am fine-tuning LLaVA on 93,000 images, and the tokenizer reports a "No space left on device" error after tokenizing around 52,000 images. By that point my SageMaker cache has grown to 75 GB, which fills the disk. How can I work around this?
Full Command:
llamafactory-cli train \
--stage sft \
--do_train True \
--model_name_or_path llava-hf/llava-1.5-7b-hf \
--preprocessing_num_workers 16 \
--finetuning_type lora \
--template vicuna \
--flash_attn fa2 \
--visual_inputs True \
--dataset_dir data \
--dataset icentia11k \
--cutoff_len 1024 \
--learning_rate 5e-05 \
--num_train_epochs 10.0 \
--max_steps 10000 \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 8 \
--lr_scheduler_type cosine \
--max_grad_norm 1.0 \
--logging_steps 5 \
--save_steps 100 \
--warmup_steps 0 \
--optim adamw_torch \
--packing False \
--report_to none \
--output_dir saves/LLaVA1.5-7B-Chat/lora/train_2024-06-26-11-09-00 \
--fp16 True \
--plot_loss True \
--ddp_timeout 180000000 \
--include_num_input_tokens_seen True \
--lora_rank 8 \
--lora_alpha 32 \
--lora_dropout 0 \
--use_dora True \
--lora_target all \
--streaming True
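To help narrow down where the space is going, here is a minimal diagnostic sketch, not part of LLaMA-Factory. It assumes the cache grows roughly linearly with the number of tokenized images, and that the Hugging Face datasets cache lives at the path in the HF_DATASETS_CACHE environment variable (falling back to ~/.cache/huggingface/datasets). The helper names dir_size_bytes and projected_cache_gb are hypothetical. Under the linear assumption, 75 GB after 52,000 images extrapolates to roughly 134 GB for all 93,000.

```python
import os
import shutil


def dir_size_bytes(path):
    """Sum the sizes of regular files under `path` (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.islink(file_path):
                continue
            try:
                total += os.path.getsize(file_path)
            except OSError:
                # File vanished or is unreadable; ignore it.
                pass
    return total


def projected_cache_gb(cache_gb_so_far, items_done, items_total):
    """Linear extrapolation of the final cache size (an assumption, not a measurement)."""
    return cache_gb_so_far / items_done * items_total


if __name__ == "__main__":
    # Figures reported in the comment above: 75 GB after ~52,000 of 93,000 images.
    estimate = projected_cache_gb(75, 52_000, 93_000)
    print(f"Projected cache size: {estimate:.1f} GB")

    # Assumed cache location; adjust to wherever the cache actually lives.
    cache_dir = os.environ.get(
        "HF_DATASETS_CACHE",
        os.path.expanduser("~/.cache/huggingface/datasets"),
    )
    if os.path.isdir(cache_dir):
        used_gb = dir_size_bytes(cache_dir) / 1e9
        print(f"Cache at {cache_dir}: {used_gb:.1f} GB")
        usage = shutil.disk_usage(cache_dir)
    else:
        print(f"No cache directory found at {cache_dir}")
        usage = shutil.disk_usage(".")
    print(f"Free space on that volume: {usage.free / 1e9:.1f} GB")
```

If the projection exceeds the free space, one option to try is pointing HF_DATASETS_CACHE at a larger volume before launching the training command. Whether that resolves this particular error is untested here.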