Comments (6)
world_size should be the number of GPUs you are using.
from bertsum.
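Here is a minimal sketch of this rule. gpu_flags is a hypothetical helper, not part of BertSum, and it assumes a single machine that uses its first n GPUs. It derives all three flags from one GPU count, so they cannot disagree. On a single-GPU Colab runtime it gives -world_size 1 instead of 3.

```python
def gpu_flags(n_gpus):
    """Hypothetical helper: build consistent BertSum GPU flags for n_gpus devices.

    Assumes one machine using GPUs 0..n_gpus-1: -visible_gpus and -gpu_ranks
    list the same indices, and -world_size is their count.
    """
    if n_gpus < 1:
        raise ValueError("need at least one GPU")
    ids = ",".join(str(i) for i in range(n_gpus))
    return ["-visible_gpus", ids, "-gpu_ranks", ids, "-world_size", str(n_gpus)]


# Assumed single-GPU Colab runtime: world_size must be 1, not 3.
print(" ".join(gpu_flags(1)))
```

The printed flags replace the hand-written -visible_gpus, -gpu_ranks and -world_size values in the train.py command.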
I am training it on Google Colab.
Directory structure:
BertSum
├── bert_data
└── (other BertSum repo files)
I added the src directory to the system path:
import sys
sys.path.append('/content/BertSum/src')
Running this on Colab:
!python train.py -mode train -encoder transformer -dropout 0.1 -bert_data_path ../bert_data/cnndm -model_path ../models/bert_transformer -lr 2e-3 -visible_gpus 0 -gpu_ranks 0 -world_size 3 -report_every 50 -save_checkpoint_steps 1000 -batch_size 3000 -decay_method noam -train_steps 50000 -accum_count 2 -log_file ../logs/bert_transformer -use_interval true -warmup_steps 10000 -ff_size 2048 -inter_layers 2 -heads 8
(I stopped this with a keyboard interrupt.)
[2019-07-17 12:03:24,228 INFO] Starting process pid: 759
[2019-07-17 12:03:24,233 INFO] Starting process pid: 760
[2019-07-17 12:03:24,241 INFO] Starting process pid: 761
[0]
Process SpawnProcess-2:
Traceback (most recent call last):
  File "/content/BertSum/src/train.py", line 68, in run
    gpu_rank = distributed.multi_init(device_id, args.world_size, args.gpu_ranks)
  File "/content/BertSum/src/distributed.py", line 27, in multi_init
    world_size=dist_world_size, rank=gpu_ranks[device_id])
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/content/BertSum/src/train.py", line 80, in run
    error_queue.put((args.gpu_ranks[device_id], traceback.format_exc()))
IndexError: list index out of range
[0]
[0]
Process SpawnProcess-3:
Traceback (most recent call last):
  File "/content/BertSum/src/train.py", line 68, in run
    gpu_rank = distributed.multi_init(device_id, args.world_size, args.gpu_ranks)
  File "/content/BertSum/src/distributed.py", line 27, in multi_init
    world_size=dist_world_size, rank=gpu_ranks[device_id])
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/content/BertSum/src/train.py", line 80, in run
    error_queue.put((args.gpu_ranks[device_id], traceback.format_exc()))
IndexError: list index out of range
Traceback (most recent call last):
  File "train.py", line 338, in <module>
    multi_main(args)
  File "train.py", line 58, in multi_main
Changing -visible_gpus 0 to -visible_gpus 0,1 does not help; the error remains the same.
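The following plain-Python sketch simulates the lookup shown in the traceback. It does not use torch, and failing_device_ids is a hypothetical name. It assumes that train.py parses -gpu_ranks as a comma-separated list and starts one worker per device_id in range(world_size). With -gpu_ranks 0 and -world_size 3, device_ids 1 and 2 index past the end of a one-element list. This is consistent with exactly two processes (SpawnProcess-2 and SpawnProcess-3) crashing. Changing -visible_gpus does not touch that list, so the error stays the same.

```python
def failing_device_ids(gpu_ranks, world_size):
    """Simulate the rank lookup each spawned worker performs.

    Assumption: -gpu_ranks is parsed as a comma-separated list of ints and
    one worker is started per device_id in range(world_size); each worker
    then reads gpu_ranks[device_id], as in the traceback.
    """
    ranks = [int(r) for r in gpu_ranks.split(",")]
    failing = []
    for device_id in range(world_size):
        try:
            _ = ranks[device_id]  # the lookup that raises IndexError
        except IndexError:
            failing.append(device_id)
    return failing


# The reported command: -gpu_ranks 0 with -world_size 3.
print(failing_device_ids("0", 3))  # [1, 2]
# Matching world_size to the rank list removes the failures.
print(failing_device_ids("0", 1))  # []
```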
Did you try it in test mode?
I still encounter this problem.
Did you try it in test mode?
Yep, it only happens in test mode.
Is it a data problem?
This should be solved now.
If not, please paste your commands.