Comments (12)

bizhen46766 commented on August 20, 2024

Hi, can you share the full error output in more detail?

from relphormer.

Mewral commented on August 20, 2024

Thanks for your reply. Here is the full traceback.

Traceback (most recent call last):
File "main.py", line 139, in <module>
main()
File "main.py", line 126, in main
trainer.fit(lit_model, datamodule=data)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 458, in fit
self._run(model)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 756, in _run
self.dispatch()
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 797, in dispatch
self.accelerator.start_training(self)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
self.training_type_plugin.start_training(trainer)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp_spawn.py", line 122, in start_training
mp.spawn(self.new_process, **self.mp_spawn_kwargs)
File "/home/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 179, in start_processes
process.start()
File "/home/miniconda3/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/home/miniconda3/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
return Popen(process_obj)
File "/home/miniconda3/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/home/miniconda3/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/home/miniconda3/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/home/miniconda3/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'get_linear_schedule_with_warmup.<locals>.lr_lambda'

bizhen46766 commented on August 20, 2024

This might be caused by the multiprocessing in the pytorch_lightning package. Try running the script on a single GPU, and check your pytorch_lightning version.
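
The pickling failure in the traceback above can be reproduced without Lightning. With the `spawn` start method, the object handed to each worker process is pickled, and pickle cannot serialize a function defined inside another function, such as the `lr_lambda` closure named in the error. The sketch below is only an illustration: `make_schedule_local` and `WarmupSchedule` are hypothetical names, and the warmup formula is simplified rather than copied from transformers.

```python
import pickle


def make_schedule_local(warmup_steps):
    # Nested function: pickle cannot find it by a module-level name.
    def lr_lambda(step):
        return min(1.0, step / max(1, warmup_steps))

    return lr_lambda


class WarmupSchedule:
    """Hypothetical module-level callable; instances pickle as class reference plus state."""

    def __init__(self, warmup_steps):
        self.warmup_steps = warmup_steps

    def __call__(self, step):
        return min(1.0, step / max(1, self.warmup_steps))


# The closure fails to pickle, which is what the spawn-based launcher runs into.
try:
    pickle.dumps(make_schedule_local(100))
except (AttributeError, pickle.PicklingError) as err:
    print(type(err).__name__, err)

# The module-level callable survives a pickle round trip.
restored = pickle.loads(pickle.dumps(WarmupSchedule(100)))
print(restored(50))  # 0.5
```

A callable like this could in principle be passed to a scheduler in place of a closure, but whether that suits this project is an assumption. Running on a single GPU, as suggested above, avoids spawning worker processes in the first place.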

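Mewral commented on August 20, 2024

Hi, the previous problem was solved by setting set_start_method, but during training I now get "list object has no attribute to". Debugging shows that after the data is converted into features, features.pos is a Python list rather than a tensor, so the call to to(device) fails. The environment is the same as specified, and the dataset is fb15k237. What could be causing this?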

Mewral commented on August 20, 2024

Here is the full traceback:
'list' object has no attribute 'to'
File "/home/miniconda3/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 738, in <dictcomp>
self.data = {k: v.to(device=device) for k, v in self.data.items()}
File "/home/miniconda3/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 738, in to
self.data = {k: v.to(device=device) for k, v in self.data.items()}
File "/home/miniconda3/lib/python3.8/site-packages/transformers/file_utils.py", line 1639, in wrapper
return func(*args, **kwargs)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py", line 158, in batch_to
return data.to(device, **kwargs)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py", line 84, in apply_to_collection
return function(data, *args, **kwargs)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py", line 161, in move_data_to_device
return apply_to_collection(batch, dtype=dtype, function=batch_to)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/core/hooks.py", line 704, in transfer_batch_to_device
return move_data_to_device(batch, device)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/core/lightning.py", line 216, in _apply_batch_transfer_handler
batch = self.transfer_batch_to_device(batch, device)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 177, in batch_to_device
return model._apply_batch_transfer_handler(batch, device)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 394, in to_device
return self.batch_to_device(batch, self.root_device)
File "/home/mazewei/miniconda3/lib/python3.8/site-packages/pytorch_lightning/accelerators/gpu.py", line 69, in to_device
batch = super().to_device(batch)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 221, in validation_step
batch = self.to_device(args[0])
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 174, in evaluation_step
output = self.trainer.accelerator.validation_step(args)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 962, in run_evaluation
output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1107, in run_sanity_check
self.run_evaluation()
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 842, in run_train
self.run_sanity_check(self.lightning_module)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 807, in run_stage
return self.run_train()
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
self._results = trainer.run_stage()
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
self.training_type_plugin.start_training(trainer)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 797, in dispatch
self.accelerator.start_training(self)
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 756, in _run
self.dispatch()
File "/home/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 458, in fit
self._run(model)
File "/home/github_projects/Relphormer/main.py", line 128, in main
trainer.fit(lit_model, datamodule=data)
File "/home/github_projects/Relphormer/main.py", line 142, in <module>
main()
File "/home/miniconda3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/miniconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main (Current frame)
return _run_code(code, main_globals, None,
AttributeError: 'list' object has no attribute 'to'
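
The error in this traceback can be reproduced in isolation: the dict comprehension shown in the first frame calls `.to(device=...)` on every value of the batch, so a single list-valued entry is enough to raise it. The sketch below assumes PyTorch is installed. The `to_tensors` helper and the sample batch are hypothetical, and this is not the project's actual fix, which this thread does not show.

```python
import torch


def to_tensors(features):
    """Convert non-tensor entries to tensors so .to(device) works on every value."""
    return {
        key: value if torch.is_tensor(value) else torch.as_tensor(value)
        for key, value in features.items()
    }


# Hypothetical batch: `pos` is a plain Python list, as described in the report.
features = {
    "input_ids": torch.tensor([[101, 2023, 102]]),
    "pos": [[0, 1, 2]],
}

# Same pattern as the failing comprehension in the traceback.
try:
    {k: v.to(device="cpu") for k, v in features.items()}
except AttributeError as err:
    print(err)  # 'list' object has no attribute 'to'

# After conversion, every value can be moved to a device.
fixed = to_tensors(features)
moved = {k: v.to(device="cpu") for k, v in fixed.items()}
```

Converting at the point where the features are built is usually cleaner than patching the transfer step, though where that happens in this codebase is not shown here.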

bizhen46766 commented on August 20, 2024

Hello, this may be caused by pytorch_lightning. We are trying to reproduce the problem. Are you training on multiple GPUs?

Mewral commented on August 20, 2024

No, I am not using multiple GPUs. My arguments are:
["--gpus", "2,", "--max_epochs", "16", "--num_workers", "32", "--model_name_or_path", "bert-base-uncased",
"--accumulate_grad_batches", "1", "--model_class", "BertKGC", "--batch_size", "128", "--checkpoint",
"/home/mazewei/github_projects/Relphormer/pretrain/output/FB15k-237/epoch=15-step=19299-Eval/hits10=0.96.ckpt",
"--pretrain", "0", "--bce", "0", "--check_val_every_n_epoch", "1", "--overwrite_cache", "--data_dir", "dataset/FB15k-237",
"--eval_batch_size", "256", "--max_seq_length", "128", "--lr", "3e-5", "--max_triplet", "64", "--add_attn_bias", "True", "--use_global_node", "True"]

bizhen46766 commented on August 20, 2024

OK. Which versions of transformers and pytorch_lightning are you currently using?

Mewral commented on August 20, 2024

transformers==4.7.0 pytorch_lightning==1.3.1

Mewral commented on August 20, 2024

@bizhen46766 Hi, is there any progress on this issue?

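bizhen46766 commented on August 20, 2024

Hi! We are working on it. Once we have reproduced and fixed the problem, the corrected code will be pushed to the repository within the next couple of days.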

zxlzr commented on August 20, 2024

Hello, this issue has been fixed. You can pull the code again.
