
Comments (12)

lhoestq commented on August 20, 2024

Hi! Thanks for reporting this issue with wikicorpus. We implemented a fix in huggingface/datasets#2844.

from lightseq.

Taka152 commented on August 20, 2024

Maybe you can try changing this line?

--dataset_name conll2003 \

xingfeng01 commented on August 20, 2024

I already downloaded the dataset to disk using https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/LanguageModeling/BERT/data/create_datasets_from_start.sh. How can I use that data? Thanks!

xingfeng01 commented on August 20, 2024

I got the following error with the dataset "wikicorpus":

CMD: python3.7 -m torch.distributed.launch \
  --nproc_per_node=1 \
  $THIS_DIR/run_ner.py \
  --model_name_or_path bert-large-uncased \
  --dataset_name wikicorpus \
  --dataset_config_name raw_en \
  --output_dir ./test-ner-no-wikicorpus \
  --cache_dir ./cache-wikicorpus \
  --do_train \
  --do_eval \
  --num_train_epochs 1

Error message:
Downloading and preparing dataset wikicorpus/raw_en (download: 1.25 GiB, generated: 3.16 GiB, post-processed: Unknown size, total: 4.41 GiB) to ./cache-wikicorpus/wikicorpus/raw_en/0.0.0/8665d716c08f102e87fdbb711326cbdf12c7ce810962819f1c71ca294d722774...
Downloading: 100%|██████████| 1.35G/1.35G [15:12<00:00, 1.48MB/s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/datasets/builder.py", line 1103, in _prepare_split
    writer.write(example, key)
  File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_writer.py", line 342, in write
    self.check_duplicate_keys()
  File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_writer.py", line 353, in check_duplicate_keys
    raise DuplicatedKeysError(key)
datasets.keyhash.DuplicatedKeysError: FAILURE TO GENERATE DATASET !
Found duplicate Key: 519
Keys should be unique and deterministic in nature

During handling of the above exception, another exception occurred:

Taka152 commented on August 20, 2024

This looks like a dataset error rather than a LightSeq error. In any case, you can try deleting the cache.
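
A minimal sketch of clearing the cache from Python, assuming the cache lives in the ./cache-wikicorpus directory passed via --cache_dir in the command above. The helper name clear_cache is made up for illustration; it only uses the standard library and is not part of datasets or LightSeq.

```python
import shutil
from pathlib import Path


def clear_cache(cache_dir: str) -> bool:
    """Delete cache_dir and everything under it.

    Returns True if the directory existed and was removed,
    False if there was nothing to delete.
    """
    path = Path(cache_dir)
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True
```

Calling clear_cache("./cache-wikicorpus") before rerunning the script forces the dataset to be downloaded and prepared again. Note that this alone may not help if the duplicate keys come from the dataset script itself.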

xingfeng01 commented on August 20, 2024

I tried several times, but the error is the same. Could you have a look?

Taka152 commented on August 20, 2024

> I tried several times, but the error is the same. Could you have a look?

You can check huggingface/datasets#2552
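
For context, the DuplicatedKeysError above says that the dataset builder produced the same key (519) for two examples, while keys are required to be unique. The sketch below only illustrates that idea; it is not the datasets library's implementation, and all names in it are hypothetical. It also shows one common way such collisions can arise (keys restarting for each input file), which is an assumption for illustration and not a diagnosis of wikicorpus.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Example = Dict[str, Any]


class DuplicateKeyError(ValueError):
    """Hypothetical stand-in for the library's DuplicatedKeysError."""


def check_unique_keys(
    pairs: Iterable[Tuple[Any, Example]]
) -> Iterator[Tuple[Any, Example]]:
    """Pass (key, example) pairs through, raising on the first repeated key."""
    seen = set()
    for key, example in pairs:
        if key in seen:
            raise DuplicateKeyError(f"Found duplicate Key: {key}")
        seen.add(key)
        yield key, example


def examples_with_file_local_ids(files: List[List[str]]):
    # Keys restart at 0 for every file, so they collide across files.
    for docs in files:
        for idx, text in enumerate(docs):
            yield idx, {"text": text}


def examples_with_global_ids(files: List[List[str]]):
    # One running counter across all files keeps every key unique.
    counter = 0
    for docs in files:
        for text in docs:
            yield counter, {"text": text}
            counter += 1
```

With two input files, the first generator fails the check as soon as the second file starts at key 0 again, while the second generator passes.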

zkh2016 commented on August 20, 2024

Hi! I ran the example:

 sh examples/training/huggingface/run_ner.sh

but got this error:

RuntimeError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 15.78 GiB total capacity; 7.21 GiB already allocated; 45.75 MiB free; 7.53 GiB reserved in total by PyTorch)
  0%|

How can I solve it?

zkh2016 commented on August 20, 2024

When I set per_device_train_batch_size=1, I get this error:

 File "/usr/local/python3.7.0/lib/python3.7/site-packages/lightseq/training/ops/pytorch/transformer_encoder_layer.py", line 288, in forward
    assert bs == encoder_padding_mask.size(0) and sl == encoder_padding_mask.size(1)
AssertionError
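
The assertion in this traceback requires the padding mask to have shape (bs, sl), that is, one row per sequence in the batch and one column per token position. A minimal pure-Python sketch of that shape contract follows. The helper names are hypothetical, and the convention that 1 marks padding is an assumption, not something taken from the LightSeq source.

```python
from typing import List


def build_padding_mask(lengths: List[int], seq_len: int) -> List[List[int]]:
    """Return a [batch_size][seq_len] mask; 1 = padding (assumed convention)."""
    if any(n > seq_len for n in lengths):
        raise ValueError("a sequence is longer than seq_len")
    # Real tokens get 0, positions past the sequence length get 1.
    return [[0] * n + [1] * (seq_len - n) for n in lengths]


def shapes_match(batch_size: int, seq_len: int, mask: List[List[int]]) -> bool:
    """Same condition as the failing assert: mask must be (batch_size, seq_len)."""
    return len(mask) == batch_size and all(len(row) == seq_len for row in mask)
```

For example, a mask built for two sequences passes shapes_match(2, 4, mask) but fails shapes_match(1, 4, mask), which is the kind of mismatch the assert reports.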

Taka152 commented on August 20, 2024

This error is caused by a padding mask with the wrong shape. In any case, batch_size=1 is unusual.
If you run out of GPU memory, besides decreasing your batch_size, remember to set a smaller max_batch_tokens in the LightSeq layer config, since it also affects GPU memory usage.

zkh2016 commented on August 20, 2024

When I set max_batch_tokens=1024 in ls_hf_transformer_encoder_layer.py, I still get the following error:

   f"Batch token numbers {bs * sl} exceeds the limit {self.config.max_batch_tokens}."
ValueError: Batch token numbers 1344 exceeds the limit 1024.

May I change this place?
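
As the error message shows, the check compares bs * sl (batch size times padded sequence length) against max_batch_tokens. A minimal sketch of that arithmetic is below. The helper names are hypothetical, and the 8 x 168 shape is only one possible factorization of 1344, used here as an assumption.

```python
def batch_tokens(batch_size: int, seq_len: int) -> int:
    """Number of token slots in one padded batch."""
    return batch_size * seq_len


def fits(batch_size: int, seq_len: int, max_batch_tokens: int) -> bool:
    """True if the padded batch stays within the configured limit."""
    return batch_tokens(batch_size, seq_len) <= max_batch_tokens


def largest_batch_size(seq_len: int, max_batch_tokens: int) -> int:
    """Largest batch size that still fits for a given padded length."""
    return max_batch_tokens // seq_len


# Assumed shape: 8 sequences padded to 168 tokens gives 1344 tokens.
print(batch_tokens(8, 168))           # 1344
print(fits(8, 168, 1024))             # False, matches the ValueError
print(largest_batch_size(168, 1024))  # 6
```

Under these assumptions, either max_batch_tokens has to be at least 1344, or the batch size or maximum sequence length has to come down until bs * sl is within the limit.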

Taka152 commented on August 20, 2024
