lm-evaluation-harness's Issues

Potential error with accelerate + lm-eval + rwkv sharded models

The following error occurs when running lm-eval under accelerate with a sharded RWKV model:

# ------------------------------
# Running Task : anli
# ------------------------------
The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `7`
		More than one GPU was found, enabling multi-GPU training.
		If this was unintended please pass in `--num_processes=1`.
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Using RTX 4000 series which doesn't support faster communication speedups. Ensuring P2P and IB communications are disabled.
2024-02-25:19:16:47,809 INFO     [utils.py:145] Note: detected 160 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2024-02-25:19:16:47,809 INFO     [utils.py:148] Note: NumExpr detected 160 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-02-25:19:16:47,887 INFO     [utils.py:145] Note: detected 160 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2024-02-25:19:16:47,887 INFO     [utils.py:148] Note: NumExpr detected 160 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-02-25:19:16:47,897 INFO     [utils.py:145] Note: detected 160 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2024-02-25:19:16:47,897 INFO     [utils.py:148] Note: NumExpr detected 160 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-02-25:19:16:47,904 INFO     [utils.py:145] Note: detected 160 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2024-02-25:19:16:47,905 INFO     [utils.py:148] Note: NumExpr detected 160 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-02-25:19:16:47,905 INFO     [utils.py:145] Note: detected 160 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2024-02-25:19:16:47,905 INFO     [utils.py:148] Note: NumExpr detected 160 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-02-25:19:16:47,906 INFO     [utils.py:145] Note: detected 160 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2024-02-25:19:16:47,906 INFO     [utils.py:148] Note: NumExpr detected 160 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-02-25:19:16:47,937 INFO     [utils.py:145] Note: detected 160 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2024-02-25:19:16:47,937 INFO     [utils.py:148] Note: NumExpr detected 160 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-02-25:19:16:48,003 INFO     [config.py:58] PyTorch version 2.1.2 available.
2024-02-25:19:16:48,088 INFO     [config.py:58] PyTorch version 2.1.2 available.
2024-02-25:19:16:48,097 INFO     [config.py:58] PyTorch version 2.1.2 available.
2024-02-25:19:16:48,098 INFO     [config.py:58] PyTorch version 2.1.2 available.
2024-02-25:19:16:48,101 INFO     [config.py:58] PyTorch version 2.1.2 available.
2024-02-25:19:16:48,102 INFO     [config.py:58] PyTorch version 2.1.2 available.
2024-02-25:19:16:48,110 INFO     [config.py:58] PyTorch version 2.1.2 available.
2024-02-25:19:16:49,485 INFO     [__main__.py:162] Verbosity set to INFO
2024-02-25:19:16:49,485 INFO     [__init__.py:358] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2024-02-25:19:16:49,558 INFO     [__main__.py:162] Verbosity set to INFO
2024-02-25:19:16:49,559 INFO     [__init__.py:358] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2024-02-25:19:16:49,568 INFO     [__main__.py:162] Verbosity set to INFO
2024-02-25:19:16:49,568 INFO     [__init__.py:358] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2024-02-25:19:16:49,589 INFO     [__main__.py:162] Verbosity set to INFO
2024-02-25:19:16:49,589 INFO     [__init__.py:358] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2024-02-25:19:16:49,599 INFO     [__main__.py:162] Verbosity set to INFO
2024-02-25:19:16:49,599 INFO     [__init__.py:358] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2024-02-25:19:16:49,607 INFO     [__main__.py:162] Verbosity set to INFO
2024-02-25:19:16:49,607 INFO     [__init__.py:358] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2024-02-25:19:16:49,681 INFO     [__main__.py:162] Verbosity set to INFO
2024-02-25:19:16:49,681 INFO     [__init__.py:358] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2024-02-25:19:16:54,373 INFO     [__main__.py:238] Selected Tasks: ['anli']
2024-02-25:19:16:54,373 INFO     [__main__.py:239] Loading selected tasks...
2024-02-25:19:16:54,478 INFO     [__main__.py:238] Selected Tasks: ['anli']
2024-02-25:19:16:54,478 INFO     [__main__.py:239] Loading selected tasks...
2024-02-25:19:16:54,487 INFO     [__main__.py:238] Selected Tasks: ['anli']
2024-02-25:19:16:54,487 INFO     [__main__.py:239] Loading selected tasks...
2024-02-25:19:16:54,490 INFO     [__main__.py:238] Selected Tasks: ['anli']
2024-02-25:19:16:54,490 INFO     [__main__.py:239] Loading selected tasks...
2024-02-25:19:16:54,501 INFO     [__main__.py:238] Selected Tasks: ['anli']
2024-02-25:19:16:54,501 INFO     [__main__.py:239] Loading selected tasks...
2024-02-25:19:16:54,507 INFO     [__main__.py:238] Selected Tasks: ['anli']
2024-02-25:19:16:54,507 INFO     [__main__.py:239] Loading selected tasks...
  exitcode  : 1 (pid: 45813)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
  time      : 2024-02-25_19:16:59
  host      : 0c7aa106d18a
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 45814)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
  time      : 2024-02-25_19:16:59
  host      : 0c7aa106d18a
  rank      : 3 (local_rank: 3)
  exitcode  : 1 (pid: 45815)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[4]:
  time      : 2024-02-25_19:16:59
  host      : 0c7aa106d18a
  rank      : 4 (local_rank: 4)
  exitcode  : 1 (pid: 45816)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[5]:
  time      : 2024-02-25_19:16:59
  host      : 0c7aa106d18a
  rank      : 5 (local_rank: 5)
  exitcode  : 1 (pid: 45817)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[6]:
  time      : 2024-02-25_19:16:59
  host      : 0c7aa106d18a
  rank      : 6 (local_rank: 6)
  exitcode  : 1 (pid: 45818)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-02-25_19:16:59
  host      : 0c7aa106d18a
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 45812)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

This error does not seem to occur with smaller, unsharded models.
As a temporary workaround, we will avoid sharding the checkpoint for the HF implementation.
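
To tell whether a given local checkpoint is affected, one option is to check whether it is sharded at all. The sketch below is not part of lm-eval or the RWKV tooling. It assumes the checkpoint was saved in the standard Hugging Face layout, where sharded weights are accompanied by an index file named `model.safetensors.index.json` or `pytorch_model.bin.index.json`. The function name `is_sharded_checkpoint` and the example path are hypothetical.

```python
from pathlib import Path

# Index files that Hugging Face Transformers writes next to sharded weights.
# Assumption: the checkpoint uses the standard HF on-disk layout.
SHARD_INDEX_FILES = (
    "model.safetensors.index.json",
    "pytorch_model.bin.index.json",
)


def is_sharded_checkpoint(checkpoint_dir):
    """Return True if the directory contains a shard index file."""
    root = Path(checkpoint_dir)
    return any((root / name).is_file() for name in SHARD_INDEX_FILES)


if __name__ == "__main__":
    # Hypothetical local path; replace with the checkpoint under test.
    print(is_sharded_checkpoint("./rwkv-checkpoint"))
```

If this reports a sharded checkpoint, re-saving it with `save_pretrained` and a `max_shard_size` larger than the total model size should produce a single weights file, which is one way to apply the workaround above.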
