Comments (5)

safoinme commented on August 24, 2024

Thank you @nuwanq for reporting this. We have created a fix, which you can find here: #2751

safoinme commented on August 24, 2024

@nuwanq I have just taken another look and tested this with GCP and S3. The implementation was actually causing an error because Hugging Face cannot access the artifact store. The code has been updated to fix this issue.
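For anyone hitting the same problem, the general idea behind a fix like this is to copy the artifact out of the remote store into a local temporary directory before calling `from_pretrained`, since Hugging Face can only read local paths or Hub repos. Below is a minimal sketch of that idea, not the code from #2751: `resolve_local_dir` and the `copy_dir` callback are hypothetical names, and the rule for deciding what counts as remote (any URI scheme longer than one character) is an assumption.

```python
import tempfile
from typing import Callable
from urllib.parse import urlparse


def resolve_local_dir(uri: str, copy_dir: Callable[[str, str], None]) -> str:
    """Return a directory that a Hugging Face `from_pretrained` call can read.

    Hypothetical helper. Local paths are returned unchanged. Remote URIs
    (e.g. s3://..., gs://...) are copied into a fresh temporary directory
    with the caller-supplied `copy_dir(src, dst)` function, and the path of
    that local copy is returned instead.
    """
    scheme = urlparse(uri).scheme
    # No scheme means a plain local path; a one-letter "scheme" is assumed
    # to be a Windows drive letter such as "C:".
    if scheme == "" or len(scheme) == 1:
        return uri
    # Assumption: every other scheme points at a remote artifact store.
    local_dir = tempfile.mkdtemp(prefix="hf_artifact_")
    copy_dir(uri, local_dir)
    return local_dir
```

Inside a ZenML materializer, `copy_dir` would copy through the artifact store's filesystem (for example a small recursive helper built on `zenml.io.fileio`); how the actual fix wires this up is not shown here. The sketch also leaves the temporary directory in place, so the caller would need to remove it once the tokenizer has been loaded.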

nuwanq commented on August 24, 2024

@safoinme, thank you for fixing it.

nuwanq commented on August 24, 2024

@safoinme
I think it has now introduced another bug when using S3 as the ARTIFACT_STORE:


╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /mnt/ssd/projects/events/src/debug_train.py:101 in <module>                                      │
│                                                                                                  │
│    98                                                                                            │
│    99                                                                                            │
│   100 if __name__ == "__main__":                                                                 │
│ ❱ 101 │   training_pipeline()                                                                    │
│   102                                                                                            │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/new/pipelines/pipeline.py:1397 in       │
│ __call__                                                                                         │
│                                                                                                  │
│   1394 │   │   │   return self.entrypoint(*args, **kwargs)                                       │
│   1395 │   │                                                                                     │
│   1396 │   │   self.prepare(*args, **kwargs)                                                     │
│ ❱ 1397 │   │   return self._run(**self._run_args)                                                │
│   1398 │                                                                                         │
│   1399 │   def _call_entrypoint(self, *args: Any, **kwargs: Any) -> None:                        │
│   1400 │   │   """Calls the pipeline entrypoint function with the given arguments.               │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/new/pipelines/pipeline.py:758 in _run   │
│                                                                                                  │
│    755 │   │   │   │   │   │   "`zenml up`."                                                     │
│    756 │   │   │   │   │   )                                                                     │
│    757 │   │   │                                                                                 │
│ ❱  758 │   │   │   deploy_pipeline(                                                              │
│    759 │   │   │   │   deployment=deployment_model, stack=stack, placeholder_run=run             │
│    760 │   │   │   )                                                                             │
│    761 │   │   │   if run:                                                                       │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/new/pipelines/run_utils.py:148 in       │
│ deploy_pipeline                                                                                  │
│                                                                                                  │
│   145 │   │   │   # placeholder run to stay in the database                                      │
│   146 │   │   │   Client().delete_pipeline_run(placeholder_run.id)                               │
│   147 │   │                                                                                      │
│ ❱ 148 │   │   raise e                                                                            │
│   149 │   finally:                                                                               │
│   150 │   │   constants.SHOULD_PREVENT_PIPELINE_EXECUTION = previous_value                       │
│   151                                                                                            │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/new/pipelines/run_utils.py:136 in       │
│ deploy_pipeline                                                                                  │
│                                                                                                  │
│   133 │   previous_value = constants.SHOULD_PREVENT_PIPELINE_EXECUTION                           │
│   134 │   constants.SHOULD_PREVENT_PIPELINE_EXECUTION = True                                     │
│   135 │   try:                                                                                   │
│ ❱ 136 │   │   stack.deploy_pipeline(deployment=deployment)                                       │
│   137 │   except Exception as e:                                                                 │
│   138 │   │   if (                                                                               │
│   139 │   │   │   placeholder_run                                                                │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/stack/stack.py:853 in deploy_pipeline   │
│                                                                                                  │
│    850 │   │   Returns:                                                                          │
│    851 │   │   │   The return value of the call to `orchestrator.run_pipeline(...)`.             │
│    852 │   │   """                                                                               │
│ ❱  853 │   │   return self.orchestrator.run(deployment=deployment, stack=self)                   │
│    854 │                                                                                         │
│    855 │   def _get_active_components_for_step(                                                  │
│    856 │   │   self, step_config: "StepConfiguration"                                            │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/orchestrators/base_orchestrator.py:175  │
│ in run                                                                                           │
│                                                                                                  │
│   172 │   │   environment = get_config_environment_vars(deployment=deployment)                   │
│   173 │   │                                                                                      │
│   174 │   │   try:                                                                               │
│ ❱ 175 │   │   │   result = self.prepare_or_run_pipeline(                                         │
│   176 │   │   │   │   deployment=deployment, stack=stack, environment=environment                │
│   177 │   │   │   )                                                                              │
│   178 │   │   finally:                                                                           │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/orchestrators/local/local_orchestrator. │
│ py:78 in prepare_or_run_pipeline                                                                 │
│                                                                                                  │
│    75 │   │   │   │   │   step_name,                                                             │
│    76 │   │   │   │   )                                                                          │
│    77 │   │   │                                                                                  │
│ ❱  78 │   │   │   self.run_step(                                                                 │
│    79 │   │   │   │   step=step,                                                                 │
│    80 │   │   │   )                                                                              │
│    81                                                                                            │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/orchestrators/base_orchestrator.py:195  │
│ in run_step                                                                                      │
│                                                                                                  │
│   192 │   │   │   step=step,                                                                     │
│   193 │   │   │   orchestrator_run_id=self.get_orchestrator_run_id(),                            │
│   194 │   │   )                                                                                  │
│ ❱ 195 │   │   launcher.launch()                                                                  │
│   196 │                                                                                          │
│   197 │   @staticmethod                                                                          │
│   198 │   def requires_resources_in_orchestration_environment(                                   │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/orchestrators/step_launcher.py:250 in   │
│ launch                                                                                           │
│                                                                                                  │
│   247 │   │   │   │   │   while retries < max_retries:                                           │
│   248 │   │   │   │   │   │   last_retry = retries == max_retries - 1                            │
│   249 │   │   │   │   │   │   try:                                                               │
│ ❱ 250 │   │   │   │   │   │   │   self._run_step(                                                │
│   251 │   │   │   │   │   │   │   │   pipeline_run=pipeline_run,                                 │
│   252 │   │   │   │   │   │   │   │   step_run=step_run_response,                                │
│   253 │   │   │   │   │   │   │   │   last_retry=last_retry,                                     │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/orchestrators/step_launcher.py:451 in   │
│ _run_step                                                                                        │
│                                                                                                  │
│   448 │   │   │   │   │   last_retry=last_retry,                                                 │
│   449 │   │   │   │   )                                                                          │
│   450 │   │   │   else:                                                                          │
│ ❱ 451 │   │   │   │   self._run_step_without_step_operator(                                      │
│   452 │   │   │   │   │   pipeline_run=pipeline_run,                                             │
│   453 │   │   │   │   │   step_run=step_run,                                                     │
│   454 │   │   │   │   │   step_run_info=step_run_info,                                           │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/orchestrators/step_launcher.py:535 in   │
│ _run_step_without_step_operator                                                                  │
│                                                                                                  │
│   532 │   │   if last_retry:                                                                     │
│   533 │   │   │   os.environ[ENV_ZENML_IGNORE_FAILURE_HOOK] = "false"                            │
│   534 │   │   runner = StepRunner(step=self._step, stack=self._stack)                            │
│ ❱ 535 │   │   runner.run(                                                                        │
│   536 │   │   │   pipeline_run=pipeline_run,                                                     │
│   537 │   │   │   step_run=step_run,                                                             │
│   538 │   │   │   input_artifacts=input_artifacts,                                               │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/orchestrators/step_runner.py:189 in run │
│                                                                                                  │
│   186 │   │   │   │   self._prepare_model_context_for_step()                                     │
│   187 │   │   │   │                                                                              │
│   188 │   │   │   │   # Parse the inputs for the entrypoint function.                            │
│ ❱ 189 │   │   │   │   function_params = self._parse_inputs(                                      │
│   190 │   │   │   │   │   args=spec.args,                                                        │
│   191 │   │   │   │   │   annotations=spec.annotations,                                          │
│   192 │   │   │   │   │   input_artifacts=input_artifacts,                                       │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/orchestrators/step_runner.py:355 in     │
│ _parse_inputs                                                                                    │
│                                                                                                  │
│   352 │   │   │   │   )                                                                          │
│   353 │   │   │   │   function_params[arg] = get_step_context()                                  │
│   354 │   │   │   elif arg in input_artifacts:                                                   │
│ ❱ 355 │   │   │   │   function_params[arg] = self._load_input_artifact(                          │
│   356 │   │   │   │   │   input_artifacts[arg], arg_type                                         │
│   357 │   │   │   │   )                                                                          │
│   358 │   │   │   elif arg in self.configuration.parameters:                                     │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/orchestrators/step_runner.py:458 in     │
│ _load_input_artifact                                                                             │
│                                                                                                  │
│   455 │   │   )                                                                                  │
│   456 │   │   materializer: BaseMaterializer = materializer_class(artifact.uri)                  │
│   457 │   │   materializer.validate_type_compatibility(data_type)                                │
│ ❱ 458 │   │   return materializer.load(data_type=data_type)                                      │
│   459 │                                                                                          │
│   460 │   def _validate_outputs(                                                                 │
│   461 │   │   self,                                                                              │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/integrations/huggingface/materializers/ │
│ huggingface_tokenizer_materializer.py:58 in load                                                 │
│                                                                                                  │
│   55 │   │                                                                                       │
│   56 │   │   print(os.path.join(self.uri, DEFAULT_TOKENIZER_DIR))                                │
│   57 │   │                                                                                       │
│ ❱ 58 │   │   return AutoTokenizer.from_pretrained(                                               │
│   59 │   │   │   os.path.join(self.uri, DEFAULT_TOKENIZER_DIR),                                  │
│   60 │   │   )                                                                                   │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py │
│ :652 in from_pretrained                                                                          │
│                                                                                                  │
│   649 │   │   │   return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *input   │
│   650 │   │                                                                                      │
│   651 │   │   # Next, let's try to use the tokenizer_config file to get the tokenizer class.     │
│ ❱ 652 │   │   tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)   │
│   653 │   │   if "_commit_hash" in tokenizer_config:                                             │
│   654 │   │   │   kwargs["_commit_hash"] = tokenizer_config["_commit_hash"]                      │
│   655 │   │   config_tokenizer_class = tokenizer_config.get("tokenizer_class")                   │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py │
│ :496 in get_tokenizer_config                                                                     │
│                                                                                                  │
│   493 │   tokenizer_config = get_tokenizer_config("tokenizer-test")                              │
│   494 │   ```"""                                                                                 │
│   495 │   commit_hash = kwargs.get("_commit_hash", None)                                         │
│ ❱ 496 │   resolved_config_file = cached_file(                                                    │
│   497 │   │   pretrained_model_name_or_path,                                                     │
│   498 │   │   TOKENIZER_CONFIG_FILE,                                                             │
│   499 │   │   cache_dir=cache_dir,                                                               │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/transformers/utils/hub.py:417 in cached_file  │
│                                                                                                  │
│    414 │   user_agent = http_user_agent(user_agent)                                              │
│    415 │   try:                                                                                  │
│    416 │   │   # Load from URL or cache if already cached                                        │
│ ❱  417 │   │   resolved_file = hf_hub_download(                                                  │
│    418 │   │   │   path_or_repo_id,                                                              │
│    419 │   │   │   filename,                                                                     │
│    420 │   │   │   subfolder=None if len(subfolder) == 0 else subfolder,                         │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py:106 in   │
│ _inner_fn                                                                                        │
│                                                                                                  │
│   103 │   │   │   kwargs.items(),  # Kwargs values                                               │
│   104 │   │   ):                                                                                 │
│   105 │   │   │   if arg_name in ["repo_id", "from_id", "to_id"]:                                │
│ ❱ 106 │   │   │   │   validate_repo_id(arg_value)                                                │
│   107 │   │   │                                                                                  │
│   108 │   │   │   elif arg_name == "token" and arg_value is not None:                            │
│   109 │   │   │   │   has_token = True                                                           │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py:154 in   │
│ validate_repo_id                                                                                 │
│                                                                                                  │
│   151 │   │   raise HFValidationError(f"Repo id must be a string, not {type(repo_id)}: '{repo_   │
│   152 │                                                                                          │
│   153 │   if repo_id.count("/") > 1:                                                             │
│ ❱ 154 │   │   raise HFValidationError(                                                           │
│   155 │   │   │   "Repo id must be in the form 'repo_name' or 'namespace/repo_name':"            │
│   156 │   │   │   f" '{repo_id}'. Use `repo_type` argument if needed."                           │
│   157 │   │   )                                                                                  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 
's3://xxx-xxx-xxx/tokenizer_loader/tokenizer/xxac9ce6-xxxx-xxxx-xxxx-00xxxxx/xxxxxx/hf_tokenizer'. Use `repo_type` argument if needed.
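The message comes from the repo-id check visible in the last frame of the traceback. As far as I understand `transformers`, `from_pretrained` only reads a path directly when it is an existing local directory; anything else, including an `s3://...` URI, falls through to `hf_hub_download` and is validated as a Hub repo id. A minimal reproduction of just that one check, copied from the `validate_repo_id` frame above (not the full function, and raising a plain `ValueError` in place of `HFValidationError`); the S3 path used here is a made-up example:

```python
def check_repo_id(repo_id: str) -> None:
    """Mirror of the single check shown in the traceback's `validate_repo_id` frame."""
    if repo_id.count("/") > 1:
        raise ValueError(
            "Repo id must be in the form 'repo_name' or 'namespace/repo_name':"
            f" '{repo_id}'. Use `repo_type` argument if needed."
        )


# A Hub-style id contains at most one "/", so it passes.
check_repo_id("namespace/repo_name")

# An artifact-store URI (hypothetical example path) contains many "/" characters,
# so it is rejected before any download is attempted.
try:
    check_repo_id("s3://example-bucket/tokenizer_loader/tokenizer/run-id/hf_tokenizer")
except ValueError as err:
    print(err)
```

This is why the materializer has to hand `from_pretrained` a local directory rather than the raw artifact URI.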

nuwanq commented on August 24, 2024

@safoinme, thanks for the quick work.
