
rwkv5-infctxlm's Introduction

RWKV5-infctxLM

python train.py \
  --my_testing r3r4 \
  --load_model /home/asd/model/RWKV-5-World-1B5-v2-20231025-ctx4096.pth \
  --proj_dir /home/asd/model \
  --data_file ttt_text_document --data_type binidx --vocab_size 65536 \
  --epoch_steps 1 --epoch_count 100 --epoch_begin 0 --epoch_save 5 \
  --micro_bsz 1 --n_layer 24 --n_embd 2048 --pre_ffn 0 --head_qk 0 \
  --lr_init 1e-5 --lr_final 1e-5 --warmup_steps 0 \
  --beta1 0.9 --beta2 0.99 --adam_eps 1e-8 \
  --accelerator gpu --devices 1 --precision bf16 \
  --strategy deepspeed_stage_2 --grad_cp 1 \
  --real_len 100 --ctx_len 200

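ctx_len is the training length you want (for example, 4096). real_len is the length actually trained at one time, which is limited by GPU memory (for example, 1024). The example command above uses smaller values (--ctx_len 200, --real_len 100). ttt is a test data file.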
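The relationship between the two lengths can be illustrated with a minimal sketch. It assumes that a sample of ctx_len tokens is processed as consecutive windows of at most real_len tokens; the function name split_into_chunks is hypothetical, and the recurrent state that the real trainer would carry between windows is not shown.

```python
def split_into_chunks(tokens, real_len):
    """Split a token sequence into consecutive chunks of at most real_len tokens.

    Illustrative only: the actual trainer would also pass the RWKV state
    from one chunk to the next, which this sketch does not model.
    """
    if real_len <= 0:
        raise ValueError("real_len must be positive")
    return [tokens[i:i + real_len] for i in range(0, len(tokens), real_len)]


# Values taken from the example command: --ctx_len 200 --real_len 100
ctx_len, real_len = 200, 100
tokens = list(range(ctx_len))          # placeholder token ids
chunks = split_into_chunks(tokens, real_len)
print(len(chunks), [len(c) for c in chunks])  # prints: 2 [100, 100]
```

With ctx_len 4096 and real_len 1024, the same function would yield four chunks of 1024 tokens each.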

Project Description

This project implements infinite-length training for RWKV-5. However, some design aspects of Bo's v5 backward pass are not yet understood, so the fallback used when gradients are propagated back across breakpoints currently causes a drop in performance. The backward pass needs improvement.

The code is based on wanicca's version, https://github.com/xiaol/Train-infctx-RWKV.git, and Blealtan's https://github.com/RWKV/RWKV-infctx-trainer.git. I only modified the RWKV-5 operator part and ported the code. Many thanks to Blealtan for the patient guidance and discussion.

rwkv5-infctxlm's People

Contributors

jl-er


rwkv5-infctxlm's Issues

How can I fix it?

Training: | | 0/? [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/code/train.py", line 447, in <module>
    trainer.fit(model, data_loader)
  File "/usr/local/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/usr/local/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 102, in launch
    return function(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/usr/local/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 989, in _run
    results = self._run_stage()
  File "/usr/local/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1035, in _run_stage
    self.fit_loop.run()
  File "/usr/local/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 201, in run
    self.on_advance_start()
  File "/usr/local/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 341, in on_advance_start
    call._call_callback_hooks(trainer, "on_train_epoch_start")
  File "/usr/local/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 208, in _call_callback_hooks
    fn(trainer, trainer.lightning_module, *args, **kwargs)
  File "/code/src/trainer.py", line 144, in on_train_epoch_start
    dataset = trainer.train_dataloader.dataset.datasets
AttributeError: 'MyDataset' object has no attribute 'datasets'
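The traceback shows that `trainer.train_dataloader.dataset` is already a `MyDataset` instance, so reading `.datasets` from it fails. One possible workaround, offered only as a sketch, is to fall back to the object itself when the wrapper attribute is missing. This assumes the installed pytorch_lightning version hands back the dataset directly instead of a wrapper object; the helper name unwrap_dataset and the stand-in classes below are hypothetical and not part of the repository.

```python
class MyDataset:
    """Minimal stand-in for the repo's dataset class (hypothetical)."""


class WrappedDataset:
    """Stand-in for a wrapper that exposes the real dataset as .datasets (hypothetical)."""

    def __init__(self, datasets):
        self.datasets = datasets


def unwrap_dataset(dataset):
    """Return dataset.datasets if that attribute exists, otherwise dataset itself."""
    return getattr(dataset, "datasets", dataset)


plain = MyDataset()
print(unwrap_dataset(plain) is plain)                  # prints: True
print(unwrap_dataset(WrappedDataset(plain)) is plain)  # prints: True
```

Under that assumption, line 144 of src/trainer.py could read `dataset = unwrap_dataset(trainer.train_dataloader.dataset)`. Installing the pytorch_lightning version the original trainer was written against may be an alternative, but the source does not state which version that is.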
