
fms-hf-tuning's Issues

bug: `eval` is still not safe even with checks for `__` and `"__builtins__": None`

rule_succeeded = eval(
control_action.rule, {"__builtins__": None}, self.metrics
)

Some examples of malicious strings that get past current checks are in the unit tests in the corresponding PR that fixes this issue:

https://github.com/foundation-model-stack/fms-hf-tuning/pull/147/files#diff-73791d95640c88e27cc8aef08ee77265bd2939f7f7a0691640aabb3f6394d932R66-R81
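
For context on why such checks are insufficient: an expression like 10**10**10 needs neither builtins nor double underscores, yet will exhaust CPU and memory. Below is a minimal sketch of one common mitigation, an AST whitelist that only permits simple comparisons over metric names; this is illustrative only and is not necessarily the approach taken in the linked PR.

import ast

_ALLOWED_NODES = (
    ast.Expression, ast.Compare, ast.BoolOp, ast.UnaryOp,
    ast.Name, ast.Load, ast.Constant,
    ast.And, ast.Or, ast.Not,
    ast.Lt, ast.LtE, ast.Gt, ast.GtE, ast.Eq, ast.NotEq,
)

def evaluate_rule(rule: str, metrics: dict) -> bool:
    # Parse the rule and reject any syntax outside the small whitelist above.
    tree = ast.parse(rule, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED_NODES):
            raise ValueError(f"Disallowed element in rule: {type(node).__name__}")
    # Only metric names are visible; no builtins are exposed.
    return bool(eval(compile(tree, "<rule>", "eval"), {"__builtins__": {}}, dict(metrics)))

# evaluate_rule("loss < 1.0", {"loss": 0.75}) -> True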

Arg Descriptions CLI

Most of the custom arguments added to the tuning CLI (e.g., for peft_config, defined here) don't have descriptions in the --help splash screen for the sft_trainer script. Given how many options may be used at tuning time, it would be ideal to add descriptions for these arguments to make the script easier to use.
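
A minimal sketch of one way to do this, assuming the arguments are declared as dataclasses parsed by transformers.HfArgumentParser, which surfaces each field's help metadata in the --help output; the field names below are illustrative, not the repo's exact definitions.

from dataclasses import dataclass, field
from typing import List

@dataclass
class LoraArguments:
    r: int = field(default=8, metadata={"help": "Rank of the LoRA update matrices."})
    lora_alpha: int = field(default=16, metadata={"help": "Scaling factor applied to the LoRA update."})
    target_modules: List[str] = field(
        default_factory=lambda: ["q_proj", "v_proj"],
        metadata={"help": "Module names (or 'all-linear') to apply LoRA to."},
    )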

Add unit tests for tuning.aim_loader

Is your feature request related to a problem? Please describe.

Add unit tests for the tuning/aim_loader.py module.

Describe the solution you'd like

If possible, each function should have test cases in the file tests/tuning/test_aim_loader.py,
and the tests should pass when run with pytest tests/tuning/test_aim_loader.py.
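
A minimal sketch of what such a test file could start from; it assumes the module exposes get_aimstack_callback() (referenced elsewhere in these issues), and real tests would assert on its actual behaviour.

from tuning import aim_loader

def test_exposes_get_aimstack_callback():
    # Smoke test: the public helper should exist and be callable.
    assert callable(getattr(aim_loader, "get_aimstack_callback", None))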

Describe alternatives you've considered

A clear and concise description of any alternative solutions or features you've considered.

Additional context

Add any other context about the feature request here.

Test SFTtrainer image

Is your feature request related to a problem? Please describe.

When the Dockerfile is updated, there is no guarantee that a tuning job will still launch with PyTorchJobs.
For example:
One might assume the packaging dependency is not needed and remove it from the Dockerfile. accelerate_launch.py may then fail with:

Traceback (most recent call last):
  File "/app/accelerate_launch.py", line 25, in <module>
    from accelerate.commands.launch import launch_command
  File "/usr/local/lib/python3.11/site-packages/accelerate/__init__.py", line 16, in <module>
    from .accelerator import Accelerator
  File "/usr/local/lib/python3.11/site-packages/accelerate/accelerator.py", line 35, in <module>
    from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
  File "/usr/local/lib/python3.11/site-packages/accelerate/checkpointing.py", line 24, in <module>
    from .utils import (
  File "/usr/local/lib/python3.11/site-packages/accelerate/utils/__init__.py", line 29, in <module>
    from .dataclasses import (
  File "/usr/local/lib/python3.11/site-packages/accelerate/utils/dataclasses.py", line 34, in <module>
    from .environment import str_to_bool
  File "/usr/local/lib/python3.11/site-packages/accelerate/utils/environment.py", line 27, in <module>
    from packaging.version import parse
ModuleNotFoundError: No module named 'packaging'

Describe the solution you'd like

Create a GH action to test SFT trainer image with every PR. For example: https://gist.github.com/tedhtchang/34335240230d64c6e2ecb5b78d8a8cb0

Describe alternatives you've considered

Additional context

Add any other context about the feature request here.

bug: build output and auto-generated file are not ignored

Describe the bug

The build output is produced in the build/lib/ directory.
There is also an auto-generated file, tuning/_version.py.

Platform

Please provide details about the environment you are using, including the following:

  • Interpreter version: Python 3.11.5 (main, Aug 24 2023, 15:09:45) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
  • Library version: v0.0.2rc1 main branch commit 8548a6df86e0a3ea00ece11a47c3aac2971a512e

Sample Code

  1. Use main branch commit 8548a6df86e0a3ea00ece11a47c3aac2971a512e
  2. Build the wheel tox -e build

Expected behavior

Build output should not be checked into version control.

Observed behavior

Build output shows up when you do git status; the status is not clean.
The auto-generated file tuning/_version.py also shows up:

# file generated by setuptools_scm
# don't change, don't track in version control
TYPE_CHECKING = False
if TYPE_CHECKING:
    from typing import Tuple, Union
    VERSION_TUPLE = Tuple[Union[int, str], ...]
else:
    VERSION_TUPLE = object

version: str
__version__: str
__version_tuple__: VERSION_TUPLE
version_tuple: VERSION_TUPLE

__version__ = version = '0.0.2rc2.dev1+gbf0c857'
__version_tuple__ = version_tuple = (0, 0, 2, 'dev1', 'gbf0c857')

Add load and inference functions

Description

Currently the repository only has train(). It would be good to add support for load and inference so that tuned models can be validated with the same repository.
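
A minimal sketch of what a load-and-inference helper could look like, assuming a LoRA-tuned checkpoint saved by this repository; the paths, prompt, and generation settings are illustrative.

import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

def load_and_generate(checkpoint_path: str, prompt: str) -> str:
    # Load the tokenizer and the tuned (adapter) model from the checkpoint directory.
    tokenizer = AutoTokenizer.from_pretrained(checkpoint_path)
    model = AutoPeftModelForCausalLM.from_pretrained(checkpoint_path, torch_dtype=torch.bfloat16)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=100)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)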

Acceptance criteria

  1. Add ability to load tuned models / generative models
  2. Add ability to run inference on those models
  3. Add example scripts with each

bug: logging_steps greater than one results in TypeError when evaluating trainer controller rule

Describe the bug

A logging_steps value greater than one results in a TypeError when evaluating the trainer controller rule. The following trainer controller configuration was used when the bug was detected:

controller-metrics:
  - name: loss
    class: Loss
controllers:
  - name: loss-controller
    triggers:
      - on_log
    rule: loss < 1.0
    operations:
      - hfcontrols.should_training_stop

Platform

  • Interpreter version: We ran the ./tuning/sft_trainer.py script with the below training configuration in CCC cluster with 1 GPU (Tesla V100-SXM2-32GB).
  • Library version: This environment was not used.

Sample Code

Below configuration for fms-hf-tuning causes the crash:

python ./tuning/sft_trainer.py  \
--model_name_or_path $MODEL_PATH  \
--tokenizer_name_or_path $MODEL_PATH \
--training_data_path $DATA_PATH  \
--output_dir $OUTPUT_PATH  \
--validation_data_path $VALIDATION_PATH \
--num_train_epochs 10  \
--per_device_train_batch_size 4  \
--per_device_eval_batch_size 4  \
--gradient_accumulation_steps 4  \
--evaluation_strategy "epoch"  \
--save_strategy "epoch"  \
--learning_rate 1e-5  \
--weight_decay 0.  \
--warmup_ratio 0.03  \
--lr_scheduler_type "cosine"  \
--logging_steps 5  \
--logging_strategy "steps" \
--include_tokens_per_second  \
--packing False  \
--metric_for_best_model "loss" \
--load_best_model_at_end True \
--use_flash_attn False \
--trainer_controller_config_file "examples/trainercontroller_configs/loss.yaml" \
--response_template "\n### Response:"  \
--dataset_text_field "output" 

Expected behavior

The rule should evaluate to False and should not throw any exception.

Observed behavior

  File "/seshapad/fms-hf-tuning/tuning/trainercontroller/callback.py", line 235, in _take_control_actions
    raise TypeError("Rule failed due to incorrect type usage") from et
TypeError: Rule failed due to incorrect type usage

Additional context

None.

Python package dependency management for building/development

There's a problem with the current requirements.txt method, as it pulls in too many requirements for CI/CD. We want to enable users to install a minimal set of dependencies needed for the repository to function and to perform tuning techniques. Other dependencies should be optional.

This issue is to explore which optional-dependency framework is best to use to ensure the install is lightweight.

Tasks include:

  1. Explore how to enable optional dependencies - what best practices do open source repositories use
  2. Add that framework to this repository with associated documentation

feat: Expose the trainer state as a trainer controller metric

Is your feature request related to a problem? Please describe.

This feature is an add-on to the trainer controller framework. We add a new metric to this framework to expose the trainer state of the trainer loop.

Describe the solution you'd like

This metric now enables rules like the below example, wherein we could check the epochs elapsed in order to stop training on a loss threshold.

controller-metrics:
  - name: state
    class: StateOfTrainer
  - name: loss
    class: Loss
controllers:
  - name: loss-controller
    triggers:
      - on_log
    rule: loss < 2.2 and state['epoch'] > 5
    operations:
      - hfcontrols.should_training_stop

Describe alternatives you've considered

We checked whether the trainer state is a call-by-reference object which could be obtained once and referred to many times, but this is not how the trainer loop exposes the trainer state. The only way we could find the trainer state to be exposed is through the arguments of the trainer callback events.

Additional context

NA

integrate black formatter

Description
To keep the code in good health and retain styling conventions, it is good practice to integrate tools such as the black formatter into the repo: https://black.readthedocs.io/en/stable/

Acceptance criteria

  1. integrate black formatter
  2. make sure it runs with every build as a check, and the build fails if the formatter fails
  3. Add it as a pre-commit hook, so it checks on every commit https://stackoverflow.com/questions/71180810/python-black-code-formatter-is-there-any-way-to-apply-automatic-black-formatti

feat: Pass trainer controller metrics to operations

Is your feature request related to a problem? Please describe.

Trainer control operations have no access to trainer control metrics.

Describe the solution you'd like

Trainer controller metrics should be passed to the operations to allow their usage in operational tasks such as logging.

Describe alternatives you've considered

NA

Additional context

NA

Add unit tests

Description

The repo is in need of unit tests so we can start enforcing tests to be contributed with every code contribution. In this issue we can add a simple unit test framework to the repository and some simple tests to get started. We also need to enable tests to run as part of every PR build.

Acceptance criteria

  1. Add a unit test framework to the repo - pytest preferred
  2. Organize tests in the repository - a new folder called tests where we organize tests going forward
  3. Add some simple unit tests as a good example (example given in comments)
  4. Enable tests to run as part of every build

Update input args for sftTrainer

Is your feature request related to a problem? Please describe.
To be more in line with HF and have clearer input arguments, there are a few flags that can be updated but will cause a breaking change for users. These are listed as TODOs in fms-hf-tuning:

  • model_max_length should be updated to max_seq_length
  • data_path should be updated to training_data_path since we also have validation_data_path

Fix CLI error for passing in target_modules

Request

Fix bug in CLI parsing for target_modules

Context

Running python tuning/sft_trainer.py --target_modules "c_attn" "c_proj" (target_modules accepts a string or a list, and this is the correct way to pass a list in Bash/Shell) produces a parsing error even though the run itself succeeds. The value is interpreted correctly: I printed the LoraConfig and it shows the correct value, but at the end I get the error ERROR: Could not consume arg: c_proj.

From a quick look, this appears to be related to the argument parsing in fire.Fire(main).

Example in detail:

# command run
$ python tuning/sft_trainer.py --model_name_or_path $MODEL_PATH --data_path $DATA_PATH --output_dir $OUTPUT_PATH --num_train_epochs 80 --per_device_train_batch_size 4 --per_device_eval_batch_size 4 --gradient_accumulation_steps 4 --save_strategy "epoch" --learning_rate 1e-4 --weight_decay 0. --warmup_ratio 0.03 --lr_scheduler_type "cosine" --logging_steps 1 --include_tokens_per_second --packing False --response_template " Label:" --dataset_text_field "output" --use_flash_attn True --tokenizer_name_or_path $MODEL_PATH --torch_dtype bfloat16 --peft_method "lora" --logging_strategy "epoch" --r 16 --lora_dropout 0.05 --lora_alpha 32 --target_modules "c_attn" "c_proj" 

# print statement added of LoraConfig from argument parsing
LoraConfig(r=16, lora_alpha=32, target_modules=['c_attn', 'c_proj'], lora_dropout=0.05)

# print statement added of LoraConfig interpreted by SFTTrainer
LoraConfig(peft_type=<PeftType.LORA: 'LORA'>, auto_mapping=None, base_model_name_or_path=None, revision=None, task_type='CAUSAL_LM', inference_mode=False, r=16, target_modules={'c_proj', 'c_attn'}, lora_alpha=32, lora_dropout=0.05, fan_in_fan_out=False, bias='none', modules_to_save=None, init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={}, megatron_config=None, megatron_core='megatron.core', loftq_config={})

# after tuning runs successfully, see error message
{'train_runtime': 228.1836, 'train_samples_per_second': 17.53, 'train_steps_per_second': 1.052, 'train_tokens_per_second': 760.09, 'train_loss': 0.3890522326032321, 'epoch': 73.85}
100%|██████████████████████████████████████████████████████████████████████████| 240/240 [03:47<00:00,  1.06it/s]
ERROR: Could not consume arg: c_proj

Acceptance Criteria

Able to pass in list via CLI and there is no error.

Reported zero loss with LoRA tuning

When running LoRA tuning, the reported loss is zero; the variable max_seq_length is set to the default value of 4096.

### Label: no complaint<|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|> This instance will be ignored in loss calculation. Note, if this happens often, consider increasing the `max_seq_length`. 
### Label: complaint This instance will be ignored in loss calculation. Note, if this happens often, consider increasing the `max_seq_length`.
  
{'loss': 0.0, 'learning_rate': 6.716876096514944e-06, 'epoch': 2.04}
...
{'loss': 0.0, 'learning_rate': 0.0, 'epoch': 5.0}  

model checkpoints are broken when doing full parameter tuning

Currently, when doing full parameter tuning (peft_config=None), the model can be fine-tuned and the ckpt can be saved; however, the ckpt cannot be loaded back.

Root cause has been discovered with offline discussions, and most specifically:

  1. PeftSavingCallback is not necessary for saving a full-parameter-tuning model. This callback is designed to separately save an adapter-only model; the rationale behind it was that Trainer by itself will save everything in the root folder, but for PEFT we want a clean adapter-only folder, thus some files are duplicated in a separate place.
  2. There is a corner-case bug in model saving with safetensors when calling save_pretrained directly, as it will drop some shared tensors (e.g. lm_head) during saving. The better way is save_pretrained(..., state_dict=state_dict), explicitly passing the full state dict, the same way Trainer natively saves it (see the sketch below).

So, due to 2, the ckpt is missing some tensors and thus cannot be loaded back, and due to 1, this callback-generated broken ckpt is either being used or overwrites the original ckpt.

The solution would be to revise the callback to skip doing anything for full parameter tuning.
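
A minimal sketch of the saving pattern described in point 2, illustrative only and not necessarily the repo's actual fix; model is any transformers PreTrainedModel.

def save_full_model(model, output_dir: str) -> None:
    # Pass the full state dict explicitly, mirroring how Trainer saves natively;
    # per this issue, that avoids shared tensors (e.g. lm_head) being dropped.
    model.save_pretrained(
        output_dir,
        state_dict=model.state_dict(),
        safe_serialization=True,
    )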

Upgrade to new transformers raises a flag in flash attention usage

The current code is using an older API for attention type. We need to move this to the newer version:

The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
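
A minimal sketch of the newer API referenced in the warning, assuming a transformers version that supports attn_implementation; the model path is illustrative.

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/model",                          # hypothetical checkpoint
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # replaces use_flash_attention_2=True
)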

Contribute ADR for Acceleration Framework Idea

Is your feature request related to a problem? Please describe.

The issue is to discuss and document the design for a framework to include custom acceleration tools into sft_trainer.py that improve training metrics such as GPU memory consumption, training speed, etc.

Describe the solution you'd like

  1. Framework for utilizing libraries of packages that accelerate fine-tuning / training of large models.
  2. Design for how to integrate the framework with fms-hf-tuning
  3. Solution for dependency management
  4. Design to be reviewed and accepted by maintainers

Add unit tests for tuning/sft_trainer.py

Is your feature request related to a problem? Please describe.

Add unit tests for the tuning/sft_trainer.py module.

Describe the solution you'd like

If possible, each function should have test cases in the file tests/test_sft_trainer.py,
and the tests should pass when run with pytest tests/test_sft_trainer.py.

Describe alternatives you've considered

A clear and concise description of any alternative solutions or features you've considered.

Additional context

Add any other context about the feature request here.

Add support for registering custom AIM metadata

Issue

As a user running a large finetuning campaign, I would like to annotate each individual run with metadata. This way I'll be able to query my AIM server for my finetuning experiments in the future.

Done when

  • add a new parameter aim_metadata to the train() method for optional metadata that fms-hf-tuning propagates to AIM
  • a command-line argument which can point to a JSON file with AIM metadata

feat: Expose the evaluation metrics for rules within trainer controller

Is your feature request related to a problem? Please describe.

Evaluation-metrics-based stopping criteria for training jobs are not available yet.

Describe the solution you'd like

The evaluation metrics can be customized in Hugging Face. Using these metrics within the stopping-criteria rules of the trainer controller framework will help stop training if it is not making progress.

Describe alternatives you've considered

Not applicable

Additional context

Not applicable

bug: Using more than 1 GPU causes random stalls and exceptions

Describe the bug

This issue started happening sometime in the last couple of weeks, and the observed behaviour is not deterministic.

When using 2 GPUs, sometimes both GPUs are at 100% utilization, and the script runs and terminates successfully in the expected time (e.g. for an artificial dataset, within about 4 minutes). Other times, one GPU is constantly at 100% utilization, the other at 0%, and the script either never terminates or encounters a Segmentation Fault which kills the Python interpreter. A third scenario has both GPUs at 100% utilization and the script never completes or encounters a Segmentation Fault.

Platform

Please provide details about the environment you are using, including the following:

  • Interpreter version: python 3.9
  • Library version: a93e3bc

Sample Code

export CUDA_VISIBLE_DEVICES=0,1
model="hf-tiny-model-private/tiny-random-BloomForCausalLM"
python tuning/sft_trainer.py --output_dir tuned --model_name_or_path ${model} --use_flash_attn=false --max_steps 100  \
        --per_device_train_batch_size 1 \
        --evaluation_strategy no --response_template "\n### Response" --dataset_text_field output --tokenizer_name_or_path ${model} \
        --torch_dtype float32 --data_path common_en_news_combined_512-preprocessed.jsonl

Expected behavior

The script terminates successfully with a reasonable execution time.

Observed behavior

Using more than 1 GPU causes random stalls; there are no exceptions printed.

There's only one warning that I can see:

/tmp/ray/session_2024-03-06_12-02-23_662564_1/runtime_resources/pip/a1249035b2784b570db2ef4c0bc247b20dd7f217/virtualenv/lib/python3.10/site-packages/torch/nn/parallel/_functions.py:68: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.

I've switched on debug logs and I can see the AIMStack observer printing debug logs to the terminal but there are no printouts from HF/Torch about processing steps. The job does not terminate even after running for 17 hours for an experiment which is expected to last about 4 minutes.

Occasionally I see this exception:

RuntimeError: NCCL Error 3: internal error - please report this issue to the NCCL developers

and when I do, the script terminates.

I also sometimes see segmentation faults, next time I run into one I'll make a note and add it to this issue.

Additional context

I'm launching the equivalent of the above script, but as a ray remote method with @ray.remote(num_gpus=2). I suspect that one or more dependency packages have changed sometime in the last couple of weeks and these changes are causing the problems I'm facing.

Add unit tests for tuning/config/configs.py

Is your feature request related to a problem? Please describe.

Add unit tests for the tuning/config/configs.py module.

Describe the solution you'd like

If possible, each function should have test cases in the file tests/config/test_configs.py,
and the tests should pass when run with pytest tests/config/test_configs.py.

Describe alternatives you've considered

A clear and concise description of any alternative solutions or features you've considered.

Additional context

Add any other context about the feature request here.

Document the linting process.

Is your feature request related to a problem? Please describe.

How to run, enable, and disable linting.

Describe the solution you'd like

Document in a section in the contributors.md doc why, when, and how to do linting

Describe alternatives you've considered

A clear and concise description of any alternative solutions or features you've considered.

Additional context

Add any other context about the feature request here.

Update launch training for multi GPU training

Is your feature request related to a problem? Please describe.

Currently, the launch training script used in the Dockerfile for integration with the rest of the stack only supports python jobs for single-GPU training. This issue is to update the script to use accelerate for launching multi-GPU tuning.

Describe the solution you'd like

  1. Once issue #87 is done, also update launch_training.py to use the same approach
  2. Test docker image with multi GPU training

feat: standardize the format for metrics, operations, controls in the yamls used by `TrainerControllerCallback`

Is your feature request related to a problem? Please describe.

Current format

controller-metrics:
    loss:
        Loss:
  - name: sparsity
    class: Sparsity
    args:
       key3: val3
       key4: val4

Describe the solution you'd like

New format

controller-metrics:
  - name: loss-one
    class: Loss
    arguments:
       key-num-1: val1
       key-num-2: val2
  - name: sparsity-one
    class: Sparsity
    arguments:
       key-num-3: val3
       key-num-4: val4
operations:
  - name: op1
    class: MyOperation1
controllers:
  - name: loss-controller
    triggers:
      - on_log
    rule: loss-one < 1.0
    operations:
      - hfcontrols.should_training_stop

Build the Python wheel for release, for pip installs

Need to build the python wheel so that pip install fms-hf-tuning will easily install and have the proper semver versioning.

Tasks include:

  1. building wheel for release
  2. Having a release process documented on how to build wheels for new releases. Having some automation around building wheels and publishing (refer to https://github.com/caikit/caikit-nlp release process)
  3. Making first release into pip with version 0.1.0
  4. Create a release changelog page - for the first release, please collect changelog from @alex-jw-brooks, @anhuong and @Ssukriti

feat: Return entire log-line in loss.py metric

Is your feature request related to a problem? Please describe.

The Loss metric does not have information on the step and epoch at which it was generated.

Describe the solution you'd like

Return entire log-line in loss.py metric

Describe alternatives you've considered

NA

Additional context

NA

Ensure Huggingface, Torch Packages Up-to-Date with Necessary Improvements/Fixes

@hickeyma @Ssukriti @tedhtchang

The platform is built upon the torch and huggingface ecosystem, where fixes are constantly pushed with every release.

  • Sometimes we also need to be on nightly pre-releases, as there is a biweekly-to-monthly lag between releases.
  • The fear with upgrading packages is breaking APIs, which will severely impact any CI/CD testing. On the other hand, fixes improve memory consumption and training speed, so it would be a good idea to regression-test the codebase against the latest nightlies.

Desiderata

  1. Have a single source of truth for package versions. A good choice would be pyproject.toml, which can also be used to address #56, #57
  2. Have a system whereby nightlies (or new releases) can be constantly regression-tested against new code changes (a good choice would be to have one such regression for each new PR).
    • It would be a good idea to also consider some means to disable regressions for PRs that do not update source code (e.g., documentation updates).
  3. Have a means to freeze package versions during CI/CD testing, so that if a test fails, we know exactly which versions it failed for. And vice versa, if a commit is known to work, we know exactly which versions it worked for.

NPM addresses these desiderata for node

See here:

  • npm install will occur from package.json, with semver versioning that tells what kind of upgrades (e.g., minor, patch) are allowed.
    • during npm install the packages will be resolved to install the latest packages that satisfy the constraints.
    • the final versions of packages installed will be output in package-lock.json.
    • the package-lock.json is typically checked in to keep a record of what exact package versions are working.
  • during CI/CD, all tests are run only after npm ci; note that this differs from npm install in that it installs from package-lock.json. Thus there is no package resolution during testing. This ensures that if the tests pass, we know exactly what package versions they pass for.

Current Solution

Use dependabot to automatically check for new versions, and then raise a PR to upgrade the package version.

Other Solutions

Two possibilities:

  1. Poetry has poetry.lock, which works together with pyproject.toml. The poetry flow as described here is completely analogous to the npm flow described above.
  2. Using pip-compile to generate requirements.txt from pyproject.toml and checking that in as the lock file.

add validation dataset to train

Description

Currently, train only accepts a JSON file for the training dataset. We also need to add support for a validation dataset to enable early stopping and possibly other features in the future.

Acceptance Criteria

  1. Accept validation dataset in train

bug: sft_trainer.py::train() raises exception due to change in transformers 4.38.0

Issue

The __init__() method of the transformers SFTTrainer class raises an exception when using the latest commit of fms-hf-tuning (f4e8eb4) and transformers>=4.38.0.

Context

  • Currently, the dependency to transformers in the requirements.txt file is transformers>=4.34.1
  • The commit that makes fms-hf-tuning incompatible with transformers is huggingface/transformers@5f06053
    • this was released in v4.38.0

Possible solutions

  1. Update fms-hf-tuning source code to be compatible with transformers>=4.38.0
  2. Update requirements.txt so that it uses transformers<4.38.0

Done when

  • sft_trainer.py works with the dependencies that get installed when installing fms-hf-tuning

Example:

commit: f4e8eb4

model="hf-tiny-model-private/tiny-random-BloomForCausalLM"
python tuning/sft_trainer.py --output_dir tuned --model_name_or_path ${model} --use_flash_attn=false --num_train_epochs 5 \
        --evaluation_strategy no --response_template "\n### Response" --dataset_text_field output --tokenizer_name_or_path ${model} \
        --torch_dtype float32 --data_path common_en_news_combined_512-preprocessed.jsonl \
         --target_modules all-linear --peft_method lora 

Exception:

Traceback (most recent call last):
  File "/projects/fms-hf-tuning/tuning/sft_trainer.py", line 308, in <module>
    fire.Fire(main)
  File "/projects/fms-hf-tuning/venv/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/projects/fms-hf-tuning/venv/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/projects/fms-hf-tuning/venv/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/projects/fms-hf-tuning/tuning/sft_trainer.py", line 304, in main
    train(model_args, data_args, training_args, tune_config)
  File "/projects/fms-hf-tuning/tuning/sft_trainer.py", line 252, in train
    trainer = SFTTrainer(
  File "/projects/fms-hf-tuning/venv/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 295, in __init__
    super().__init__(
  File "/projects/fms-hf-tuning/venv/lib/python3.10/site-packages/transformers/trainer.py", line 648, in __init__
    self.is_fsdp_xla_v2_enabled = args.fsdp_config["xla_fsdp_v2"]
KeyError: 'xla_fsdp_v2'

Prompt Tuning returns low-quality results

Describe the bug

Prompt Tuning model generates low-quality output

Platform

Please provide details about the environment you are using, including the following:

  • Interpreter version: python 3.11.0
  • Library version: transformers 4.37.2 peft 0.8.2

Sample Code

My launching script is

export MODEL_PATH=/dccstor/ai4code-ansible/shared/ckpt/granite-20b-code-all-yaml-2k/
export DATA_PATH=/dccstor/weiz/peft/train.json
export OUTPUT_PATH=pt_ckpts
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=.
python tuning/sft_trainer.py  \
--model_name_or_path $MODEL_PATH  \
--training_data_path $DATA_PATH  \
--output_dir $OUTPUT_PATH  \
--peft_method "pt" \
--tokenizer_name_or_path $MODEL_PATH  \
--num_train_epochs 5  \
--per_device_train_batch_size 2  \
--per_device_eval_batch_size 1  \
--gradient_accumulation_steps 1  \
--evaluation_strategy "no"  \
--save_strategy "epoch"  \
--learning_rate 3e-2  \
--weight_decay 0.  \
--warmup_ratio 0.0  \
--lr_scheduler_type "linear"  \
--logging_steps 1  \
--include_tokens_per_second  \
--packing False  \
--response_template "\n### Response:"  \
--dataset_text_field "output" \
--use_flash_attn True  \
--torch_dtype "bfloat16" \
       --num_virtual_tokens 100 \
       --max_seq_length 512 \
--prompt_tuning_init RANDOM

To reproduce, one needs to set MODEL_PATH to the yaml-2k path (which is shared via a COS bucket) and DATA_PATH to the training data.

Expected behavior

Using the first training data item as the input to the model should show something close to the ground truth, i.e.,

community.docker.docker_login:                                                                                                                         
    registry: "{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}"                                                            
    username: "{{ registry_username }}"                                                                                                                  
    password: "{{ registry_password }}"     

Observed behavior

When running through inference, I got

.,.,.,.,.,.,.,.,.,.,........,.,.,.,.,...,.,.....,..,...,......,:.: -.:.:.:.:.:.:.............,........,..........,.:..........,.........,::::::::::::::::
::::::......:..............,.............,..............,.,..:.......:.........2.2......................      

Additional context

Add support for collecting metrics programmatically

Issue

I would like to use fms-hf-tuning to collect system-level metrics while finetuning models; these include model load time as well as device-related metrics (like those that AIM collects).

One way would be to rely on just the measurements that AIM collects, by pointing fms-hf-tuning to an AIM server and then contacting AIM to retrieve the data. However, this is a bit restricting in that we cannot collect custom data, collect system metrics at a period other than the one AIM uses (30 seconds), or collect data at all if we don't spin up an AIM server.

A more convenient solution would be to allow providing an optional parameter to train() which can contain a list of callbacks:

def train(
    model_args: configs.ModelArguments,
    data_args: configs.DataArguments,
    train_args: configs.TrainingArguments,
    peft_config: Optional[Union[peft_config.LoraConfig, peft_config.PromptTuningConfig]] = None,
):

In the same spirit, I'd like to get access to the TrainingOutput object that sft_trainer.train() returns here (input_tokens_per_second, train_runtime, etc.):

trainer.train()

(just by returning the output of trainer.train() as the output of train()).
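
A minimal sketch of the kind of callback a caller could then pass in, assuming train() gained such an optional callbacks parameter; the class and file names are illustrative.

import json
import time

from transformers import TrainerCallback

class MetricsFileCallback(TrainerCallback):
    """Dump training metrics to a JSON file so they can be collected without an AIM server."""

    def __init__(self, output_path: str):
        self.output_path = output_path
        self.start_time = None

    def on_train_begin(self, args, state, control, **kwargs):
        self.start_time = time.time()

    def on_train_end(self, args, state, control, **kwargs):
        if state.is_world_process_zero:
            record = {
                "wall_clock_seconds": time.time() - self.start_time,
                "log_history": state.log_history,
            }
            with open(self.output_path, "w") as f:
                json.dump(record, f)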

Done when

  • support collecting custom metrics via custom callbacks
  • return the output of trainer.train() to the caller of train()

feat: support for robust benchmarking of fms-hf-tuning

Is your feature request related to a problem? Please describe.

When running benchmarks using fms-hf-tuning we need to:

  1. record exceptions such as GPU Out Of Memory, transient RuntimeError and NCCL errors, etc
  2. ensure robust capture of system metrics
  3. annotate AIM run information with additional information regarding the benchmarking in order to make the AIM data searchable via benchmarking labels e.g. benchmark experiment name, benchmark identifier

However, fms-hf-tuning does not support these features, hence requiring maintenance of external forks of fms-hf-tuning that provide them. Specifically:

  1. fms-hf-tuning does not surface the various exceptions in a way that is easily identifiable by an automated process, i.e. it currently requires grepping stderr.
  2. System metrics are sent to an AIM server. However, we have observed stability issues with storing and retrieving the data from AIM, leading to lost data. In general, for benchmarking we need a highly robust way to ensure the metrics are captured.
  3. There is no way to supply external metadata for insertion to AIM for an fms-hf-tuning run.

Describe the solution you'd like

Here is a simplified view of the current state of the sft_trainer.py script:

def train(
        model_args,
        data_args,
        training_args,
        tune_config,
        trainer_controller_args,
    ):
    callbacks = [SomeCallback()]

    if aim_available:
        callbacks.append(get_aimstack_callback())

    callbacks.extend(other_callbacks)

    sft_trainer.train(callbacks=callbacks)

def main():
   (
        model_args,
        data_args,
        training_args,
        tune_config,
        trainer_controller_args,
    ) = parser.parse_args_into_dataclasses(return_remaining_strings=True)

   train(
        model_args,
        data_args,
        training_args,
        tune_config,
        trainer_controller_args,
    )

We propose the following

class BenchmarkingFriendlyAIMCallback(AimCallback):
    def on_train_begin(self, args, state, control, model=None, **kwargs):
        super().on_train_begin(args, state, control, model, **kwargs)
        # annotate self.run with custom metadata so that we can search for this run on AIM
    
    def on_train_end(self, args, state, control, **kwargs):
        if state.is_world_process_zero:
            # Dump aim metrics to a file such that we can collect this data without actually spinning up an AIM server
            ...
        super().on_train_end(args=args, state=state, control=control, **kwargs)

def parse_args():
    return (
        model_args,
        data_args,
        training_args,
        tune_config,
        trainer_controller_args,
        benchmarking_args,
    )

def train(
        model_args,
        data_args,
        training_args,
        tune_config,
        trainer_controller_args,
        callbacks,
    ):

    callbacks.extend(other_callbacks)
    sft_trainer.train(callbacks=callbacks)
    

def main():    
    try:   
        (
            model_args,
            data_args,
            training_args,
            tune_config,
            trainer_controller_args,
            benchmarking_args,
        ) = parse_args()

        aim_metadata = read_metadata(benchmarking_args.aim_metadata_path)
     
        callbacks = [ ]

        if aim_is_available():
            callbacks.append(
                # dumps AIM data to disk plus annotates AIM run with custom metadata
                BenchmarkingFriendlyAIMCallback(
                custom_metadata=aim_metadata,
                output_file=benchmarking_args.aim_output_file,
                )
            )
        train(
            model_args,
            data_args,
            training_args,
            tune_config,
            trainer_controller_args, 
            callbacks=callbacks, # new
        )
    # Catch GPU OOM exceptions, NCCL exceptions etc, and store them in a file for easy processing
    except GPUOutOfMemoryError as exc:
        report_gpu_oom(benchmarking_args.aim_output_file, exc)
        raise
    except Exception as exc:
        report_unknown_error(benchmarking_args.aim_output_file, exc)
        raise

Pros:

  1. other people/services wishing to robustly detect GPU OOM errors and extract system metrics can use the above features

Cons:

  1. need to update the code in sft_trainer.py

As a plus, the proposed changes in train() enable us to experiment with new implementations of BenchmarkingFriendlyAIMCallback() without requiring updates to fms-hf-tuning (e.g. by using a wrapper script).

Describe alternatives you've considered

Make minimal changes to train() to assist developing third party wrapper scripts

We could slightly refactor the code such that the logic in the train() method for activating the current AIMCallback (via a call to get_aimstack_callback()) is in the main() method instead of the train() method.

This would enable us to use a wrapper script which implements our earlier proposed design of sft_trainer.py. The wrapper script would have a main() function similar to the above proposal and which would invoke tuning.train().

sft_trainer.py in fms-hf-tuning would look like this:

def parse_args():
    return (
        model_args,
        data_args,
        training_args,
        tune_config,
        trainer_controller_args,
    )

def train(
        model_args,
        data_args,
        training_args,
        tune_config,
        trainer_controller_args,
        callbacks,
    ):

    callbacks.extend(other_callbacks)
    sft_trainer.train(callbacks=callbacks)
    

def main():    
    (
        model_args,
        data_args,
        training_args,
        tune_config,
        trainer_controller_args,
        benchmarking_args,
    ) = parse_args()

    # Move the insertion of the AIM Callback from train() to main()
    callbacks = [ ]

    if aim_available:
        callbacks.append(get_aimstack_callback())
    
    train(
        model_args,
        data_args,
        training_args,
        tune_config,
        trainer_controller_args, 
        callbacks=callbacks, # new
    )

Pros:

  1. changes to fms-hf-tuning are a handful of lines of code

Cons:

  1. Other services/people wishing to robustly collect metrics and/or exceptions will have to implement these features themselves (e.g. using a wrapper script)

Switch to accelerate for Multi GPU

Is your feature request related to a problem? Please describe.

Accelerate https://huggingface.co/docs/transformers/en/accelerate is created by HF to help users easily train a Transformers model on any type of distributed setup, whether it is multiple GPUs on one machine or multiple GPUs across several machines. We should leverage the library for its ease of use.

Describe the solution you'd like

  1. update the README.md removing instructions for torch.run and replace with accelerate.launch
  2. replace the FSDP JSON with a config yaml like this.
  3. move fsdp_config.json out of tuning/config (which houses code) into somewhere which only houses config fixtures.

Describe alternatives you've considered

torchrun, which is used currently, can be less user friendly. See #80

(@fabianlim thanks for the suggestions)

Add unit tests to tuning/utils/config_utils.py

Is your feature request related to a problem? Please describe.

Add unit tests for the tuning/utils/config_utils.py module.

Describe the solution you'd like

If possible, each function should have test cases in the file tests/tuning/utils/test_config_utils.py,
and the tests should pass when run with pytest tests/tuning/utils/test_config_utils.py.

Describe alternatives you've considered

A clear and concise description of any alternative solutions or features you've considered.

Additional context

Add any other context about the feature request here.

bug: Boolean values represented as strings in the default fsdp config always translate to True

Describe the bug

Boolean values in fsdp config (https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/tuning/config/fsdp_config.json#L4-L6) are represented as string values. This does not translate to actual boolean values when loaded in hf/transformers library https://github.com/huggingface/transformers/blob/2a002d073a337051bdc3fbdc95ff1bc0399ae2bb/src/transformers/training_args.py#L1654
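
For illustration, the reason string booleans are a problem is that any non-empty string is truthy in Python, so a value of "false" still evaluates to True:

# Any non-empty string is truthy, so "false" does not behave like a boolean False.
assert bool("false") is True
assert bool("") is False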

Solution

Use JSON boolean values instead of the string representation. Meanwhile, I have also raised an issue on the transformers library to discuss adding a dataclass and argument parser, with a motivating example (huggingface/transformers#29476).

Add unit tests for tuning/utils/merge_model_utils.py

Is your feature request related to a problem? Please describe.

Add unit tests for the tuning/utils/merge_model_utils.py module.

Describe the solution you'd like

If possible, each function should have test cases in the file tests/utils/test_merge_model_utils.py,
and the tests should pass when run with pytest tests/utils/test_merge_model_utils.py.

Describe alternatives you've considered

A clear and concise description of any alternative solutions or features you've considered.

Additional context

Add any other context about the feature request here.

Add unit tests for tuning/data/tokenizer_data_utils.py

Is your feature request related to a problem? Please describe.

Add unit tests for the tuning/data/tokenizer_data_utils.py module.

Describe the solution you'd like

If possible, each function should have test cases in the file tests/data/test_tokenizer_data_utils.py,
and the tests should pass when run with pytest tests/data/test_tokenizer_data_utils.py.

Describe alternatives you've considered

A clear and concise description of any alternative solutions or features you've considered.

Additional context

Add any other context about the feature request here.

bug: AIM package being installed causes the trainer to expect the AIM server to be running.

Describe the bug

If the AIM package is installed then the AIM server is also expected to be running.

  File "/dccstor/rhassistant/seshapad/seshaenv/lib/python3.10/site-packages/aim/storage/migrations/utils.py", line 28, in upgrade_database
    raise subprocess.SubprocessError(f'Database upgrade failed with exit code {exit_code}')
subprocess.SubprocessError: Database upgrade failed with exit code 1

It should allow running tests and training in environments where the AIM package is installed but there is no AIM server running.

Platform

Please provide details about the environment you are using, including the following:

  • Interpreter version: Python 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0] on linux
  • Library version: main branch commit db99b280e95c9e5822ee83beee050ba5575ab3bf

Sample Code

Steps to reproduce:

  1. Create a conda or venv environment with the aim package and this library installed.
  2. Run https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/tuning/sft_trainer.py

Expected behavior

Training should complete without errors. An AIM server should only be expected to be running if the appropriate environment variables, such as AIMSTACK_DB, are set.

Observed behavior

Training fails with an AIM-related error:


  File "/dccstor/rhassistant/seshapad/seshaenv/lib/python3.10/site-packages/aim/ext/exception_resistant.py", line 68, in wrapper
    return func(*args, **kwargs)
  File "/dccstor/rhassistant/seshapad/seshaenv/lib/python3.10/site-packages/aim/sdk/run.py", line 859, in __init__
    super().__init__(run_hash, repo=repo, read_only=read_only, experiment=experiment, force_resume=force_resume)
  File "/dccstor/rhassistant/seshapad/seshaenv/lib/python3.10/site-packages/aim/sdk/run.py", line 272, in __init__
    super().__init__(run_hash, repo=repo, read_only=read_only, force_resume=force_resume)
  File "/dccstor/rhassistant/seshapad/seshaenv/lib/python3.10/site-packages/aim/sdk/base_run.py", line 34, in __init__
    self.repo = get_repo(repo)
  File "/dccstor/rhassistant/seshapad/seshaenv/lib/python3.10/site-packages/aim/sdk/repo_utils.py", line 24, in get_repo
    repo = Repo.from_path(repo, init=True)
  File "/dccstor/rhassistant/seshapad/seshaenv/lib/python3.10/site-packages/aim/sdk/repo.py", line 210, in from_path
    repo = Repo(path, read_only=read_only, init=init)
  File "/dccstor/rhassistant/seshapad/seshaenv/lib/python3.10/site-packages/aim/sdk/repo.py", line 153, in __init__
    self.structured_db.run_upgrades()
  File "/dccstor/rhassistant/seshapad/seshaenv/lib/python3.10/site-packages/aim/storage/structured/db.py", line 98, in run_upgrades
    upgrade_database(self.db_url)
  File "/dccstor/rhassistant/seshapad/seshaenv/lib/python3.10/site-packages/aim/storage/migrations/utils.py", line 28, in upgrade_database
    raise subprocess.SubprocessError(f'Database upgrade failed with exit code {exit_code}')
subprocess.SubprocessError: Database upgrade failed with exit code 1

Support str in target_modules for LoraConfig

Request

LoraConfig can accept a List or a str for the target_modules as seen in the description below. This would be useful in order to support passing "all-linear" as an option instead of the specific attention layers.

Context

target_modules (Optional[Union[List[str], str]]) — The names of the modules to apply the adapter to. If this is specified, only the modules with the specified names will be replaced. When passing a string, a regex match will be performed. When passing a list of strings, either an exact match will be performed or it is checked if the name of the module ends with any of the passed strings. If this is specified as 'all-linear', then all linear/Conv1D modules are chosen, excluding the output layer. If this is not specified, modules will be chosen according to the model architecture. If the architecture is not known, an error will be raised — in this case, you should specify the target modules manually.

fms-hf-tuning's LoraConfig currently accepts only a List: target_modules: List[str] = field(default_factory=lambda: ["q_proj", "v_proj"]). As a result, if one tries to pass all-linear, it is interpreted as a List.

Example

$ python tuning/sft_trainer.py --target_modules "all-linear"

# interpreted as
LoraConfig(r=8, lora_alpha=16, target_modules=['all-linear'], lora_dropout=0.05)

# subsequently gets used in SFTTrainer as
LoraConfig(peft_type=<PeftType.LORA: 'LORA'>, auto_mapping=None, base_model_name_or_path=None, revision=None, task_type='CAUSAL_LM', inference_mode=False, r=8, target_modules='all-linear', lora_alpha=16, lora_dropout=0.05, fan_in_fan_out=False, bias='none', use_rslora=False, modules_to_save=None, init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={}, megatron_config=None, megatron_core='megatron.core', loftq_config={}) 

# errors with
ValueError: Target modules {'all-linear'} not found in the base model. Please check the target modules and try again.

I tried setting target_modules: Union[List[str], str] = field(default_factory=lambda: ["q_proj", "v_proj"]); however, "all-linear" was still interpreted as a List instead of a string. This is likely due to the command-line parsing.

Note that all-linear is supported in peft 0.8.0, so the dependency must be upgraded accordingly. A possible workaround is sketched below.
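
A minimal sketch of one possible workaround, illustrative only and not the repo's actual fix: normalize a single-element ["all-linear"] list coming from the CLI into the string form that peft expects.

def normalize_target_modules(target_modules):
    # The CLI parser hands "all-linear" over as a one-element list; peft expects the bare string.
    if isinstance(target_modules, list) and target_modules == ["all-linear"]:
        return "all-linear"
    return target_modules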

Acceptance criteria

  1. we should be able to support passing target_modules = 'all-linear' to fms-hf-tuning from the command line as well for LoRA tuning
  2. we have tested that the change works with at least one model (llama-7b) and doesn't crash

validate parameters and edge cases

Description
In order for the code to exit cleanly, we need to validate that parameters are in the correct range and that required parameters are passed by the user beforehand. If not, we should return Type/Value errors and the code should not crash.

Example: currently, num_epochs set to 0 causes a divide-by-zero error. This exception should be handled with an error message indicating the valid range for num_epochs.
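
A minimal sketch of the kind of validation described above; the parameter names and ranges are illustrative.

def validate_training_args(train_args) -> None:
    # Fail fast with a clear message instead of crashing deep inside the training loop.
    if train_args.num_train_epochs <= 0:
        raise ValueError(
            f"num_train_epochs must be at least 1, got {train_args.num_train_epochs}"
        )
    if not 0.0 <= train_args.warmup_ratio <= 1.0:
        raise ValueError(
            f"warmup_ratio must be between 0 and 1, got {train_args.warmup_ratio}"
        )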

Acceptance criteria

  1. some sort of parameter validation has been added for the most frequently used parameters to ensure the code doesn't crash
