System Information ZENML_LOCAL_VERSION: 0.57.1 ZENML_SERVER_VE

[BUG]: mt5 tokenizer spiece.model saving issue about zenml HOT 5 CLOSED

nuwanq commented on August 24, 2024
[BUG]: mt5 tokenizer spiece.model saving issue
from zenml.
Comments (5)

safoinme commented on August 24, 2024
Thank you @nuwanq for reporting this. We have created a fix, which you can find here: #2751
safoinme commented on August 24, 2024
@nuwanq I have just taken another look and tested this with GCP and S3. The implementation was actually causing an error because Hugging Face cannot access the artifact store directly. The code has been updated to fix this issue.
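The underlying problem is that Hugging Face's `save_pretrained` and `from_pretrained` expect local filesystem paths (or Hub repo ids), not remote artifact-store URIs such as `s3://...` or `gs://...`. A common workaround is to stage the files in a local temporary directory and copy them to or from the artifact store. The sketch below illustrates that pattern in plain Python. It is not ZenML's actual materializer code: `copy_dir`, `saver` and `loader` are hypothetical callables that, in practice, might wrap a copy helper built on ZenML's `fileio` module, `tokenizer.save_pretrained`, and `AutoTokenizer.from_pretrained` respectively.

```python
import os
import tempfile
from typing import Callable, TypeVar
from urllib.parse import urlparse

T = TypeVar("T")


def is_remote_uri(uri: str) -> bool:
    """True for URIs with a multi-letter scheme such as s3:// or gs://."""
    scheme = urlparse(uri).scheme
    # Single-letter schemes are Windows drive letters (e.g. C:\...), not remotes.
    return len(scheme) > 1 and scheme != "file"


def save_via_local_copy(
    uri: str,
    saver: Callable[[str], None],
    copy_dir: Callable[[str, str], None],
) -> None:
    """Write into a local temp dir, then copy that dir to `uri`."""
    if not is_remote_uri(uri):
        saver(uri)  # local artifact store: write in place
        return
    with tempfile.TemporaryDirectory() as tmp:
        saver(tmp)  # e.g. tokenizer.save_pretrained(tmp)
        copy_dir(tmp, uri)  # hypothetical upload helper


def load_via_local_copy(
    uri: str,
    copy_dir: Callable[[str, str], None],
    loader: Callable[[str], T],
) -> T:
    """Copy `uri` into a local temp dir, then load from that local path."""
    if not is_remote_uri(uri):
        return loader(uri)
    with tempfile.TemporaryDirectory() as tmp:
        local_dir = os.path.join(tmp, "tokenizer")
        copy_dir(uri, local_dir)  # hypothetical download helper
        # The loader only ever sees a local path, never an s3:// URI.
        return loader(local_dir)
```

Because the temporary directory is deleted when the `with` block exits, this sketch assumes the loader reads everything it needs into memory before returning.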
nuwanq commented on August 24, 2024
@safoinme Thank you for fixing it.
nuwanq commented on August 24, 2024
@safoinme I think the change has now introduced another bug when using S3 as the artifact store:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /mnt/ssd/projects/events/src/debug_train.py:101 in <module>                                      │
│                                                                                                  │
│    98                                                                                            │
│    99                                                                                            │
│   100 if __name__ == "__main__":                                                                 │
│ ❱ 101 │   training_pipeline()                                                                    │
│   102                                                                                            │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/new/pipelines/pipeline.py:1397 in       │
│ __call__                                                                                         │
│                                                                                                  │
│   1394 │   │   │   return self.entrypoint(*args, **kwargs)                                       │
│   1395 │   │                                                                                     │
│   1396 │   │   self.prepare(*args, **kwargs)                                                     │
│ ❱ 1397 │   │   return self._run(**self._run_args)                                                │
│   1398 │                                                                                         │
│   1399 │   def _call_entrypoint(self, *args: Any, **kwargs: Any) -> None:                        │
│   1400 │   │   """Calls the pipeline entrypoint function with the given arguments.               │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/new/pipelines/pipeline.py:758 in _run   │
│                                                                                                  │
│    755 │   │   │   │   │   │   "`zenml up`."                                                     │
│    756 │   │   │   │   │   )                                                                     │
│    757 │   │   │                                                                                 │
│ ❱  758 │   │   │   deploy_pipeline(                                                              │
│    759 │   │   │   │   deployment=deployment_model, stack=stack, placeholder_run=run             │
│    760 │   │   │   )                                                                             │
│    761 │   │   │   if run:                                                                       │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/new/pipelines/run_utils.py:148 in       │
│ deploy_pipeline                                                                                  │
│                                                                                                  │
│   145 │   │   │   # placeholder run to stay in the database                                      │
│   146 │   │   │   Client().delete_pipeline_run(placeholder_run.id)                               │
│   147 │   │                                                                                      │
│ ❱ 148 │   │   raise e                                                                            │
│   149 │   finally:                                                                               │
│   150 │   │   constants.SHOULD_PREVENT_PIPELINE_EXECUTION = previous_value                       │
│   151                                                                                            │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/new/pipelines/run_utils.py:136 in       │
│ deploy_pipeline                                                                                  │
│                                                                                                  │
│   133 │   previous_value = constants.SHOULD_PREVENT_PIPELINE_EXECUTION                           │
│   134 │   constants.SHOULD_PREVENT_PIPELINE_EXECUTION = True                                     │
│   135 │   try:                                                                                   │
│ ❱ 136 │   │   stack.deploy_pipeline(deployment=deployment)                                       │
│   137 │   except Exception as e:                                                                 │
│   138 │   │   if (                                                                               │
│   139 │   │   │   placeholder_run                                                                │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/stack/stack.py:853 in deploy_pipeline   │
│                                                                                                  │
│    850 │   │   Returns:                                                                          │
│    851 │   │   │   The return value of the call to `orchestrator.run_pipeline(...)`.             │
│    852 │   │   """                                                                               │
│ ❱  853 │   │   return self.orchestrator.run(deployment=deployment, stack=self)                   │
│    854 │                                                                                         │
│    855 │   def _get_active_components_for_step(                                                  │
│    856 │   │   self, step_config: "StepConfiguration"                                            │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/orchestrators/base_orchestrator.py:175  │
│ in run                                                                                           │
│                                                                                                  │
│   172 │   │   environment = get_config_environment_vars(deployment=deployment)                   │
│   173 │   │                                                                                      │
│   174 │   │   try:                                                                               │
│ ❱ 175 │   │   │   result = self.prepare_or_run_pipeline(                                         │
│   176 │   │   │   │   deployment=deployment, stack=stack, environment=environment                │
│   177 │   │   │   )                                                                              │
│   178 │   │   finally:                                                                           │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/orchestrators/local/local_orchestrator. │
│ py:78 in prepare_or_run_pipeline                                                                 │
│                                                                                                  │
│    75 │   │   │   │   │   step_name,                                                             │
│    76 │   │   │   │   )                                                                          │
│    77 │   │   │                                                                                  │
│ ❱  78 │   │   │   self.run_step(                                                                 │
│    79 │   │   │   │   step=step,                                                                 │
│    80 │   │   │   )                                                                              │
│    81                                                                                            │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/orchestrators/base_orchestrator.py:195  │
│ in run_step                                                                                      │
│                                                                                                  │
│   192 │   │   │   step=step,                                                                     │
│   193 │   │   │   orchestrator_run_id=self.get_orchestrator_run_id(),                            │
│   194 │   │   )                                                                                  │
│ ❱ 195 │   │   launcher.launch()                                                                  │
│   196 │                                                                                          │
│   197 │   @staticmethod                                                                          │
│   198 │   def requires_resources_in_orchestration_environment(                                   │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/orchestrators/step_launcher.py:250 in   │
│ launch                                                                                           │
│                                                                                                  │
│   247 │   │   │   │   │   while retries < max_retries:                                           │
│   248 │   │   │   │   │   │   last_retry = retries == max_retries - 1                            │
│   249 │   │   │   │   │   │   try:                                                               │
│ ❱ 250 │   │   │   │   │   │   │   self._run_step(                                                │
│   251 │   │   │   │   │   │   │   │   pipeline_run=pipeline_run,                                 │
│   252 │   │   │   │   │   │   │   │   step_run=step_run_response,                                │
│   253 │   │   │   │   │   │   │   │   last_retry=last_retry,                                     │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/orchestrators/step_launcher.py:451 in   │
│ _run_step                                                                                        │
│                                                                                                  │
│   448 │   │   │   │   │   last_retry=last_retry,                                                 │
│   449 │   │   │   │   )                                                                          │
│   450 │   │   │   else:                                                                          │
│ ❱ 451 │   │   │   │   self._run_step_without_step_operator(                                      │
│   452 │   │   │   │   │   pipeline_run=pipeline_run,                                             │
│   453 │   │   │   │   │   step_run=step_run,                                                     │
│   454 │   │   │   │   │   step_run_info=step_run_info,                                           │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/orchestrators/step_launcher.py:535 in   │
│ _run_step_without_step_operator                                                                  │
│                                                                                                  │
│   532 │   │   if last_retry:                                                                     │
│   533 │   │   │   os.environ[ENV_ZENML_IGNORE_FAILURE_HOOK] = "false"                            │
│   534 │   │   runner = StepRunner(step=self._step, stack=self._stack)                            │
│ ❱ 535 │   │   runner.run(                                                                        │
│   536 │   │   │   pipeline_run=pipeline_run,                                                     │
│   537 │   │   │   step_run=step_run,                                                             │
│   538 │   │   │   input_artifacts=input_artifacts,                                               │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/orchestrators/step_runner.py:189 in run │
│                                                                                                  │
│   186 │   │   │   │   self._prepare_model_context_for_step()                                     │
│   187 │   │   │   │                                                                              │
│   188 │   │   │   │   # Parse the inputs for the entrypoint function.                            │
│ ❱ 189 │   │   │   │   function_params = self._parse_inputs(                                      │
│   190 │   │   │   │   │   args=spec.args,                                                        │
│   191 │   │   │   │   │   annotations=spec.annotations,                                          │
│   192 │   │   │   │   │   input_artifacts=input_artifacts,                                       │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/orchestrators/step_runner.py:355 in     │
│ _parse_inputs                                                                                    │
│                                                                                                  │
│   352 │   │   │   │   )                                                                          │
│   353 │   │   │   │   function_params[arg] = get_step_context()                                  │
│   354 │   │   │   elif arg in input_artifacts:                                                   │
│ ❱ 355 │   │   │   │   function_params[arg] = self._load_input_artifact(                          │
│   356 │   │   │   │   │   input_artifacts[arg], arg_type                                         │
│   357 │   │   │   │   )                                                                          │
│   358 │   │   │   elif arg in self.configuration.parameters:                                     │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/orchestrators/step_runner.py:458 in     │
│ _load_input_artifact                                                                             │
│                                                                                                  │
│   455 │   │   )                                                                                  │
│   456 │   │   materializer: BaseMaterializer = materializer_class(artifact.uri)                  │
│   457 │   │   materializer.validate_type_compatibility(data_type)                                │
│ ❱ 458 │   │   return materializer.load(data_type=data_type)                                      │
│   459 │                                                                                          │
│   460 │   def _validate_outputs(                                                                 │
│   461 │   │   self,                                                                              │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/zenml/integrations/huggingface/materializers/ │
│ huggingface_tokenizer_materializer.py:58 in load                                                 │
│                                                                                                  │
│   55 │   │                                                                                       │
│   56 │   │   print(os.path.join(self.uri, DEFAULT_TOKENIZER_DIR))                                │
│   57 │   │                                                                                       │
│ ❱ 58 │   │   return AutoTokenizer.from_pretrained(                                               │
│   59 │   │   │   os.path.join(self.uri, DEFAULT_TOKENIZER_DIR),                                  │
│   60 │   │   )                                                                                                          │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py │
│ :652 in from_pretrained                                                                          │
│                                                                                                  │
│   649 │   │   │   return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *input   │
│   650 │   │                                                                                      │
│   651 │   │   # Next, let's try to use the tokenizer_config file to get the tokenizer class.     │
│ ❱ 652 │   │   tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)   │
│   653 │   │   if "_commit_hash" in tokenizer_config:                                             │
│   654 │   │   │   kwargs["_commit_hash"] = tokenizer_config["_commit_hash"]                      │
│   655 │   │   config_tokenizer_class = tokenizer_config.get("tokenizer_class")                   │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py │
│ :496 in get_tokenizer_config                                                                     │
│                                                                                                  │
│   493 │   tokenizer_config = get_tokenizer_config("tokenizer-test")                              │
│   494 │   ```"""                                                                                 │
│   495 │   commit_hash = kwargs.get("_commit_hash", None)                                         │
│ ❱ 496 │   resolved_config_file = cached_file(                                                    │
│   497 │   │   pretrained_model_name_or_path,                                                     │
│   498 │   │   TOKENIZER_CONFIG_FILE,                                                             │
│   499 │   │   cache_dir=cache_dir,                                                               │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/transformers/utils/hub.py:417 in cached_file  │
│                                                                                                  │
│    414 │   user_agent = http_user_agent(user_agent)                                              │
│    415 │   try:                                                                                  │
│    416 │   │   # Load from URL or cache if already cached                                        │
│ ❱  417 │   │   resolved_file = hf_hub_download(                                                  │
│    418 │   │   │   path_or_repo_id,                                                              │
│    419 │   │   │   filename,                                                                     │
│    420 │   │   │   subfolder=None if len(subfolder) == 0 else subfolder,                         │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py:106 in   │
│ _inner_fn                                                                                        │
│                                                                                                  │
│   103 │   │   │   kwargs.items(),  # Kwargs values                                               │
│   104 │   │   ):                                                                                 │
│   105 │   │   │   if arg_name in ["repo_id", "from_id", "to_id"]:                                │
│ ❱ 106 │   │   │   │   validate_repo_id(arg_value)                                                │
│   107 │   │   │                                                                                  │
│   108 │   │   │   elif arg_name == "token" and arg_value is not None:                            │
│   109 │   │   │   │   has_token = True                                                           │
│                                                                                                  │
│ /mnt/ssd/mamba/envs/x/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py:154 in   │
│ validate_repo_id                                                                                 │
│                                                                                                  │
│   151 │   │   raise HFValidationError(f"Repo id must be a string, not {type(repo_id)}: '{repo_   │
│   152 │                                                                                          │
│   153 │   if repo_id.count("/") > 1:                                                             │
│ ❱ 154 │   │   raise HFValidationError(                                                           │
│   155 │   │   │   "Repo id must be in the form 'repo_name' or 'namespace/repo_name':"            │
│   156 │   │   │   f" '{repo_id}'. Use `repo_type` argument if needed."                           │
│   157 │   │   )                                                                                  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 
's3://xxx-xxx-xxx/tokenizer_loader/tokenizer/xxac9ce6-xxxx-xxxx-xxxx-00xxxxx/xxxxxx/hf_tokenizer'. Use `repo_type` argument if needed.
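The traceback shows roughly why this fails: `AutoTokenizer.from_pretrained` receives the raw `s3://...` URI, which is not an existing local directory, so `transformers` hands it to `hf_hub_download` as a Hub repo id, and `validate_repo_id` rejects any id containing more than one `/`. The sketch below is a simplified model of that decision path for illustration only. `resolve_pretrained_source` and `RepoIdError` are hypothetical names, and the real `huggingface_hub` validator performs additional checks (such as allowed characters) that are omitted here.

```python
import os


class RepoIdError(ValueError):
    """Hypothetical stand-in for huggingface_hub's HFValidationError."""


def resolve_pretrained_source(name_or_path: str) -> str:
    """Simplified model of how the failing call interprets its argument.

    Returns "local" for an existing directory; otherwise treats the string
    as a Hub repo id, which may contain at most one "/".
    """
    if os.path.isdir(name_or_path):
        return "local"  # files are read straight from disk
    # Mirrors the check shown in the traceback (validate_repo_id, line 153).
    if name_or_path.count("/") > 1:
        raise RepoIdError(
            "Repo id must be in the form 'repo_name' or "
            f"'namespace/repo_name': '{name_or_path}'"
        )
    return "hub"  # would be downloaded from the Hugging Face Hub
```

For example, `"google/mt5-small"` resolves to `"hub"`, while an `s3://bucket/.../hf_tokenizer` URI raises `RepoIdError` because the `s3://` prefix alone already contributes two slashes. Passing a local directory instead, for example after copying the artifact out of the store, avoids this code path.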
nuwanq commented on August 24, 2024
@safoinme Thanks for the quick work.