Comments (1)
Deploying the quantized model as a server also fails with an error:
CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir 'output/baichuan2-7b/v11-20240511-210615/checkpoint-1200-merged'
run sh: python /root/swift/swift/cli/deploy.py --ckpt_dir output/baichuan2-7b/v11-20240511-210615/checkpoint-1200-merged
2024-05-15 18:36:47,690 - modelscope - INFO - PyTorch version 2.1.2 Found.
2024-05-15 18:36:47,691 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-05-15 18:36:47,715 - modelscope - INFO - Loading done! Current index file version is 1.14.0, with md5 20ca4569cd83063597978af789773db5 and a total number of 976 components indexed
[INFO:swift] Successfully registered /root/swift/swift/llm/data/dataset_info.json
[INFO:swift] Start time of running main: 2024-05-15 18:36:48.482545
[INFO:swift] ckpt_dir: /root/swift/examples/pytorch/llm/output/baichuan2-7b/v11-20240511-210615/checkpoint-1200-merged
[INFO:swift] Setting args.model_type: baichuan2-7b
[INFO:swift] Setting model_info['revision']: master
[INFO:swift] Setting self.eval_human: True
[INFO:swift] Setting overwrite_generation_config: True
[INFO:swift] args: DeployArguments(model_type='baichuan2-7b', model_id_or_path='baichuan-inc/Baichuan2-7B-Base', model_revision='master', sft_type='full', template_type='default-generation', infer_backend='pt', ckpt_dir='/root/swift/examples/pytorch/llm/output/baichuan2-7b/v11-20240511-210615/checkpoint-1200-merged', load_args_from_ckpt_dir=True, load_dataset_config=False, eval_human=True, device_map_config_path=None, seed=42, dtype='bf16', dataset=[], dataset_seed=42, dataset_test_ratio=1, show_dataset_sample=10, save_result=True, system=None, max_length=None, truncation_strategy='delete', check_dataset_strategy='none', model_name=[None, None], model_author=[None, None], quantization_bit=4, bnb_4bit_comp_dtype='bf16', bnb_4bit_quant_type='nf4', bnb_4bit_use_double_quant=True, bnb_4bit_quant_storage=None, max_new_tokens=2048, do_sample=True, temperature=0.3, top_k=20, top_p=0.7, repetition_penalty=1.0, num_beams=1, stop_words=None, use_flash_attn=None, ignore_args_error=False, stream=True, merge_lora=False, merge_device_map='cpu', save_safetensors=True, overwrite_generation_config=True, verbose=None, custom_register_path=None, custom_dataset_info=None, gpu_memory_utilization=0.9, tensor_parallel_size=1, max_model_len=None, vllm_enable_lora=False, vllm_max_lora_rank=16, lora_modules=[], self_cognition_sample=0, train_dataset_sample=-1, val_dataset_sample=None, safe_serialization=None, model_cache_dir=None, merge_lora_and_save=None, custom_train_dataset_path=[], custom_val_dataset_path=[], vllm_lora_modules=None, host='127.0.0.1', port=8000, ssl_keyfile=None, ssl_certfile=None)
[INFO:swift] Global seed set to 42
INFO: 2024-05-15 18:36:48,526 infer.py:131] device_count: 1
INFO: 2024-05-15 18:36:48,526 infer.py:148] quantization_config: {'quant_method': <QuantizationMethod.BITS_AND_BYTES: 'bitsandbytes'>, '_load_in_8bit': False, '_load_in_4bit': True, 'llm_int8_threshold': 6.0, 'llm_int8_skip_modules': None, 'llm_int8_enable_fp32_cpu_offload': False, 'llm_int8_has_fp16_weight': False, 'bnb_4bit_quant_type': 'nf4', 'bnb_4bit_use_double_quant': True, 'bnb_4bit_compute_dtype': torch.bfloat16, 'bnb_4bit_quant_storage': torch.uint8}
INFO: 2024-05-15 18:36:48,526 model.py:3941] Loading the model using model_dir: /root/swift/examples/pytorch/llm/output/baichuan2-7b/v11-20240511-210615/checkpoint-1200-merged
Traceback (most recent call last):
File "/root/swift/swift/cli/deploy.py", line 5, in <module>
deploy_main()
File "/root/swift/swift/utils/run_utils.py", line 27, in x_main
result = llm_x(args, **kwargs)
File "/root/swift/swift/llm/deploy.py", line 442, in llm_deploy
model, template = prepare_model_template(args)
File "/root/swift/swift/llm/infer.py", line 161, in prepare_model_template
model, tokenizer = get_model_tokenizer(
File "/root/swift/swift/llm/utils/model.py", line 4004, in get_model_tokenizer
model, tokenizer = get_function(model_dir, torch_dtype, model_kwargs, load_model, **kwargs)
File "/root/swift/swift/llm/utils/model.py", line 1106, in get_model_tokenizer_baichuan2
model, tokenizer = get_model_tokenizer_from_repo(
File "/root/swift/swift/llm/utils/model.py", line 815, in get_model_tokenizer_from_repo
model = automodel_class.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/modelscope/utils/hf_util.py", line 113, in from_pretrained
module_obj = module_class.from_pretrained(model_dir, *model_args,
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
return model_class.from_pretrained(
File "/root/.cache/huggingface/modules/transformers_modules/checkpoint-1200-merged/modeling_baichuan.py", line 609, in from_pretrained
state_dict = torch.load(model_file, map_location="cpu")
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 986, in load
with _open_file_like(f, 'rb') as opened_file:
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 435, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 416, in __init__
super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/root/swift/examples/pytorch/llm/output/baichuan2-7b/v11-20240511-210615/checkpoint-1200-merged/pytorch_model.bin'
How can this be resolved? The weights are saved in safetensors format.
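The traceback shows the checkpoint's own `modeling_baichuan.py` calling `torch.load` on `pytorch_model.bin`, so on this code path it does not appear to look for safetensors files at all. One possible workaround, not confirmed by the swift maintainers, is to check which weight files the merged checkpoint actually contains and, if only `*.safetensors` shards are present, write an equivalent `pytorch_model.bin` next to them. The sketch below assumes the `torch` and `safetensors` packages are installed; the helper names `list_weight_files` and `convert_safetensors_to_bin` are hypothetical and not part of swift.

```python
from pathlib import Path


def list_weight_files(ckpt_dir):
    """Group the weight files found in a checkpoint directory by format."""
    root = Path(ckpt_dir)
    return {
        "safetensors": sorted(p.name for p in root.glob("*.safetensors")),
        # Note: this also lists unrelated *.bin files if any are present.
        "bin": sorted(p.name for p in root.glob("*.bin")),
    }


def convert_safetensors_to_bin(ckpt_dir, out_name="pytorch_model.bin"):
    """Merge all *.safetensors shards into one torch-pickled state dict.

    Needs enough CPU RAM to hold the full set of weights at once.
    """
    # Imported lazily so the directory check above works without these packages.
    import torch
    from safetensors.torch import load_file

    root = Path(ckpt_dir)
    state_dict = {}
    for shard in sorted(root.glob("*.safetensors")):
        # load_file returns a dict mapping tensor names to torch tensors.
        state_dict.update(load_file(str(shard), device="cpu"))
    if not state_dict:
        raise FileNotFoundError(f"no *.safetensors files in {root}")

    out_path = root / out_name
    torch.save(state_dict, str(out_path))
    return out_path
```

Calling `list_weight_files(...)` on the `checkpoint-1200-merged` directory first shows whether a `pytorch_model.bin` is really missing. If it is, `convert_safetensors_to_bin(...)` on the same directory writes one, after which the `swift deploy` command can be retried. Whether the quantized load then succeeds is untested here.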