SwissArmyTransformer

Introduction

sat (SwissArmyTransformer) is a flexible and powerful library for developing your own Transformer variants.

sat is named after "Swiss Army knife", meaning that all the models (e.g. BERT, GPT, T5, GLM, CogView, ViT, ...) share the same backbone code and cater to versatile usages via light-weight mixins.

sat is powered by DeepSpeed ZeRO and model parallelism, aiming to provide the best practice for pretraining and finetuning large models (100M to 20B parameters).

Install

    pip install SwissArmyTransformer

Features

  • Add model-agnostic components, e.g. prefix-tuning, in just ONE line!

    • Prefix-tuning (or P-tuning) improves finetuning by adding trainable parameters to each attention layer. Applying it to a GLM classification model (or any other model) is easy with our library.
        class ClassificationModel(GLMModel): # can also be BertModel, RobertaModel, etc. 
            def __init__(self, args, transformer=None, **kwargs):
                super().__init__(args, transformer=transformer, **kwargs)
                self.add_mixin('classification_head', MLPHeadMixin(args.hidden_size, 2048, 1))
                # Arm an arbitrary model with Prefix-tuning with this line!
                self.add_mixin('prefix-tuning', PrefixTuningMixin(args.num_layers, args.hidden_size // args.num_attention_heads, args.num_attention_heads, args.prefix_len))
    • GPT and other auto-regressive models behave differently during training and inference. During inference, text is generated token by token, and previous states must be cached for efficiency. With our lib, you only need to implement the training-time (teacher-forcing) behavior, then turn the model into a cached auto-regressive one by adding a mixin:
        model, args = AutoModel.from_pretrained('glm-10b-chinese', args)
        model.add_mixin('auto-regressive', CachedAutoregressiveMixin())
        # Generate a sequence with beam search
        from sat.generation.autoregressive_sampling import filling_sequence
        from sat.generation.sampling_strategies import BeamSearchStrategy
        output, *mems = filling_sequence(model, input_seq,
                        batch_size=args.batch_size,
                        strategy=BeamSearchStrategy(args.batch_size))
  • Build your Transformer-based model with minimal code. We mentioned GLM, which differs from the standard Transformer (called BaseModel) only in its position embedding (and training losses), so we only need to focus on the related part when coding. A toy sketch of the underlying mixin/hook mechanism follows this feature list.

    Here is the whole definition:

    class BlockPositionEmbeddingMixin(BaseMixin):
        # Here define parameters for the mixin
        def __init__(self, max_sequence_length, hidden_size, init_method_std=0.02):
            super(BlockPositionEmbeddingMixin, self).__init__()
            self.max_sequence_length = max_sequence_length
            self.hidden_size = hidden_size
            self.block_position_embeddings = torch.nn.Embedding(max_sequence_length, hidden_size)
            torch.nn.init.normal_(self.block_position_embeddings.weight, mean=0.0, std=init_method_std)
        
        # Here define the method for the mixin
        def position_embedding_forward(self, position_ids, **kwargs):
            position_ids, block_position_ids = position_ids[:, 0], position_ids[:, 1]
            position_embeddings = self.transformer.position_embeddings(position_ids)
            block_position_embeddings = self.block_position_embeddings(block_position_ids)
            return position_embeddings + block_position_embeddings
    
    class GLMModel(BaseModel):
        def __init__(self, args, transformer=None):
            super().__init__(args, transformer=transformer)
            self.add_mixin('block_position_embedding', 
                BlockPositionEmbeddingMixin(args.max_sequence_length, args.hidden_size)
            ) # Add the mixin for GLM
  • Comprehensive support for training. sat aims to provide the best practice for pretraining and finetuning: you only need to implement forward_step and create_dataset_function, while command-line hyperparameters control the useful training configurations.

    • Extend the training to multiple GPUs or nodes by specifying --num_nodes, --num_gpus and a simple hostfile.
    • DeepSpeed and Model parallelism.
    • Better integration of ZeRO-2 and activation checkpointing.
    • Automatic extension, shuffling, and memmapping of training data.
    • Successfully support the training of CogView2 and CogVideo.
    • Currently the only open-source codebase supporting finetuning T5-10B on GPUs.
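
As referenced above, the mixin machinery boils down to a hook registry. Here is a toy, self-contained sketch of the dispatch idea (an illustrative simplification only, not sat's actual implementation; the real hook names and defaults live in transformer_defaults.py):

import torch

# Toy illustration: the model keeps a registry of named hooks, and add_mixin
# lets a mixin's methods override the built-in defaults of the same name.
HOOK_NAMES = ['position_embedding_forward', 'attention_fn']  # a subset, for illustration

class ToyBaseModel(torch.nn.Module):
    def __init__(self, max_sequence_length=512, hidden_size=64):
        super().__init__()
        self.mixins = torch.nn.ModuleDict()
        self.hooks = {}  # hook name -> bound method supplied by a mixin
        self.position_embeddings = torch.nn.Embedding(max_sequence_length, hidden_size)

    def add_mixin(self, name, mixin):
        self.mixins[name] = mixin
        for hook in HOOK_NAMES:
            if hasattr(mixin, hook):
                self.hooks[hook] = getattr(mixin, hook)  # override the default

    def position_embedding_forward(self, position_ids, **kwargs):
        fn = self.hooks.get('position_embedding_forward')
        if fn is not None:
            return fn(position_ids, **kwargs)           # mixin-provided behavior
        return self.position_embeddings(position_ids)   # default behavior

This is why BlockPositionEmbeddingMixin above only has to define position_embedding_forward: once added, the method transparently replaces the default position-embedding computation.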

Quick Tour

A typical Python file for using BERT in sat (for inference) is as follows:

# @File: inference_bert.py
from sat import get_args, get_tokenizer, AutoModel
# Parse args, initialize the environment. This is necessary.
args = get_args() 
# Automatically download and load model. Will also dump model-related hyperparameters to args.
model, args = AutoModel.from_pretrained('bert-base-uncased', args) 
# Get the BertTokenizer according to args.tokenizer_type (automatically set).
tokenizer = get_tokenizer(args) 
# Here to use bert as you want!
# ...
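# For example (a hedged sketch, not a guaranteed API; the exact tokenizer
# methods and the model's forward signature depend on the model type):
# ids = tokenizer.encode('Hello world!')                # hypothetical encode call
# logits, *mems = model(input_ids, position_ids, mask)  # tensors built from ids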

Then we can run the code via

    SAT_HOME=/path/to/download python inference_bert.py --mode inference

All officially supported model names are in urls.py.

Finetuning or pretraining a transformer is also extremely easy!

# @File: finetune_bert.py
import torch
from sat import get_args, get_tokenizer, AutoModel
from sat.model.mixins import MLPHeadMixin
from sat.training.deepspeed_training import training_main

def create_dataset_function(path, args):
    # Load the dataset here (a concrete sketch follows the launch command below)
    # ...
    assert isinstance(dataset, torch.utils.data.Dataset)
    return dataset

def forward_step(data_iterator, model, args, timers):
    inputs = next(data_iterator) # from the dataset of create_dataset_function.
    loss, *others = model(inputs)
    return loss
    
# Parse args, initialize the environment. This is necessary.
args = get_args() 
model, args = AutoModel.from_pretrained('bert-base-uncased', args) 
tokenizer = get_tokenizer(args) 
# Here to use bert as you want!
model.del_mixin('bert-final')
model.add_mixin('classification_head', MLPHeadMixin(args.hidden_size, 2048, 1))
# ONE LINE to train! 
# args already includes hyperparams such as lr, train-iters, zero-stage ...
training_main(args, 
    model_cls=model, 
    forward_step_function=forward_step, # user define
    create_dataset_function=create_dataset_function # user define
)

Then we can run the code via

deepspeed --include localhost:0,1 finetune_bert.py \
    --experiment-name ftbert \
    --mode finetune --train-iters 1000 --save /path/to/save \
    --train-data /path/to/train --valid-data /path/to/valid \
    --lr 0.00002 --batch-size 8 --zero-stage 1 --fp16

Here we use data parallelism on GPUs 0 and 1. We can also launch training on many inter-connected machines via --hostfile /path/to/hostfile. See the tutorial for more details.
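
For reference, the create_dataset_function stub above might be fleshed out along these lines (a sketch under stated assumptions: a text file with one sentence<TAB>label pair per line, args.sample_length available, and a tokenizer exposing an encode method; TSVClassificationDataset is a hypothetical name, not a sat class):

import torch
import numpy as np
from sat import get_tokenizer

class TSVClassificationDataset(torch.utils.data.Dataset):
    def __init__(self, path, sample_length):
        self.tokenizer = get_tokenizer()
        self.sample_length = sample_length
        with open(path) as f:
            self.rows = [line.rstrip('\n').split('\t') for line in f]

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        sentence, label = self.rows[idx]
        ids = self.tokenizer.encode(sentence)[:self.sample_length]  # truncate
        ids = ids + [0] * (self.sample_length - len(ids))           # pad to fixed length
        return {'sentence': np.array(ids, dtype=np.int64), 'label': int(label)}

def create_dataset_function(path, args):
    return TSVClassificationDataset(path, args.sample_length)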

To write your own model, you only need to consider the differences from the standard Transformer. For example, if you have an idea to improve the attention operation:

from sat.model import BaseMixin
class MyAttention(BaseMixin):
    def __init__(self, hidden_size):
        super(MyAttention, self).__init__()
        # MyAttention may need some new params, e.g. a learnable alpha.
        self.learnable_alpha = torch.nn.Parameter(torch.ones(hidden_size))
    
    # This is a hook function, the name `attention_fn` is special.
    def attention_fn(self, q, k, v, mask, dropout=None, **kwargs):
        # Code for my attention.
        # ...
        return attention_results

Here attention_fn is a hook function that replaces the default behavior with the new function. All available hooks are in transformer_defaults.py. Now we can use add_mixin to apply our change to any of the transformers, such as BERT, ViT, and CogView. See the tutorial for more details.
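
For instance (a sketch; args is assumed to be populated as in the Quick Tour above):

model, args = AutoModel.from_pretrained('bert-base-uncased', args)
model.add_mixin('my-attention', MyAttention(args.hidden_size))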

Tutorials

Citation

Currently we don't have a paper, so you don't need to formally cite us!~

If this project helps your research or engineering, use \footnote{https://github.com/THUDM/SwissArmyTransformer} to mention us and recommend SwissArmyTransformer to others.

The tutorial for contributing to sat is on the way!

The project is based on (a user of) DeepSpeed, Megatron-LM and Huggingface transformers. Thanks for their awesome work.

Issues

ore.exceptions.ResponseStreamingError

When executing the code

model, model_args = CogVLMModel.from_pretrained(
    "cogvlm-chat",
    args=argparse.Namespace(
        deepspeed=None,
        local_rank=0,
        rank=0,
        world_size=1,
        model_parallel_size=1,
        mode='inference',
        skip_init=True,
        fp16=False,
        bf16=True,
        use_gpu_initialization=True,
        device='cuda',
    ))

the download fails partway through with:

ore.exceptions.ResponseStreamingError: An error occurred while reading from response stream: ('Connection broken: IncompleteRead(8306688 bytes read, 81920 more expected)', IncompleteRead(8306688 bytes read, 81920 more expected))

What usually causes this, and how can it be resolved? Thanks!

Problem with default init_method in function _initialize_affine_weight

When using the default init_method=init.xavier_normal_ in VocabParallelEmbedding, ParallelEmbedding, ColumnParallelLinear, or RowParallelLinear, the call init_method(master_weight, module=module, name=name) in _initialize_affine_weight raises "TypeError: xavier_normal_() got an unexpected keyword argument 'module'".

GLM generation script

The GLM generation script keeps failing with the following error when run:

Traceback (most recent call last):
  File "/GLM/GLM_generation/inference_glm.py", line 198, in <module>
    args = get_args(args_list)
  File "/miniconda3/envs/MGLM/lib/python3.9/site-packages/SwissArmyTransformer/arguments.py", line 385, in get_args
    initialize_distributed(args)
  File "/miniconda3/envs/MGLM/lib/python3.9/site-packages/SwissArmyTransformer/arguments.py", line 420, in initialize_distributed
    torch.distributed.init_process_group(
  File "/miniconda3/envs/MGLM/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 520, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/miniconda3/envs/MGLM/lib/python3.9/site-packages/torch/distributed/rendezvous.py", line 142, in _tcp_rendezvous_handler
    store = TCPStore(result.hostname, result.port, world_size, start_daemon, timeout)
RuntimeError: Address already in use

Changing the port number multiple times does not fix the problem.

Settle the dataset usage pattern and interface

Currently the NLP side uses:

def create_dataset_function(path, args):
    tokenizer = get_tokenizer()
    def process_fn(row):
        sentence, label = tokenizer._encode(row[0]), int(row[1])
        sentence = [tokenizer.get_command('ENC').Id] + sentence + [tokenizer.get_command('eos').Id]
        if len(sentence) >= args.sample_length:
            sentence = sentence[:args.sample_length]
        else:
            sentence.extend([-1] * (args.sample_length-len(sentence)))
        return {'sentence': np.array(sentence, dtype=np.int64), 'label': label}
    return TSVDataset(path, process_fn, with_heads=True)

Should this be standardized further?
What about two-part inputs such as QA?
What about multiple splits of the same dataset?
How should arguments like train-data/valid-data avoid conflicting with the arguments of user-provided dataset processing?

Error when loading a model with AutoModel

Loading the model keeps failing:

  • CODE:model, args = AutoModel.from_pretrained(args, 'bert-large-uncased')
  • ErrorInfo:fused_layer_norm_cuda.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN2at5emptyEN3c108ArrayRefIlEENS0_13TensorOptionsENS0_8optionalINS0_12MemoryFormatEEE

I have tried many CUDA versions, none of which work. Could this be caused by an incorrect apex installation?

Current environment:
cuda 11.6
torch 1.12.1+cu116

Installation error inside a container image

Collecting SwissArmyTransformer
Using cached SwissArmyTransformer-0.3.7-py3-none-any.whl (2.4 MB)
Requirement already satisfied: torch in /usr/local/lib/python3.8/site-packages (from SwissArmyTransformer) (1.13.1)
Requirement already satisfied: transformers in /usr/local/lib/python3.8/site-packages (from SwissArmyTransformer) (4.27.1)
Requirement already satisfied: sentencepiece in /usr/local/lib/python3.8/site-packages (from SwissArmyTransformer) (0.1.99)
Collecting tensorboardX
Using cached tensorboardX-2.6-py2.py3-none-any.whl (114 kB)
Collecting einops
Using cached einops-0.6.1-py3-none-any.whl (42 kB)
Requirement already satisfied: cpm-kernels in /usr/local/lib/python3.8/site-packages (from SwissArmyTransformer) (1.0.11)
Collecting datasets
Using cached datasets-2.12.0-py3-none-any.whl (474 kB)
Collecting deepspeed
Using cached deepspeed-0.9.3.tar.gz (807 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [21 lines of output]
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-11.0'
Traceback (most recent call last):
  File "<string>", line 2, in <module>
  File "<pip-setuptools-caller>", line 34, in <module>
  File "/tmp/pip-install-486ll747/deepspeed_c3e607a7d6854bbab1896908f44b2431/setup.py", line 119, in <module>
    os.environ["TORCH_CUDA_ARCH_LIST"] = get_default_compute_capabilities()
  File "/tmp/pip-install-486ll747/deepspeed_c3e607a7d6854bbab1896908f44b2431/op_builder/builder.py", line 55, in get_default_compute_capabilities
    if torch.utils.cpp_extension.CUDA_HOME is not None and installed_cuda_version()[0] >= 11:
  File "/tmp/pip-install-486ll747/deepspeed_c3e607a7d6854bbab1896908f44b2431/op_builder/builder.py", line 43, in installed_cuda_version
    output = subprocess.check_output([cuda_home + "/bin/nvcc", "-V"], universal_newlines=True)
  File "/usr/local/lib/python3.8/subprocess.py", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/local/lib/python3.8/subprocess.py", line 493, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/local/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/local/lib/python3.8/subprocess.py", line 1704, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda-11.0/bin/nvcc'
Setting ds_accelerator to cuda (auto detect)
[WARNING] Torch did not find cuda available, if cross-compiling or running with cpu only you can ignore this message. Adding compute capability for Pascal, Volta, and Turing (compute capabilities 6.0, 6.1, 6.2)
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
WARNING: You are using pip version 22.0.4; however, version 23.1.2 is available.
You should consider upgrading via the '/usr/local/bin/python3.8 -m pip install --upgrade pip' command.

How can the missing '/usr/local/cuda/bin/nvcc' be resolved? Are there any special requirements for installing this library?

Streaming datasets are not supported

VisualGLM only provides FewshotData, and loading the data entirely into memory blows up. After switching to

large_dataset_streamed = load_dataset("json", data_files=path, split="train", streaming=True)
dataset = large_dataset_streamed.map(datapreprocess)

I found that streaming datasets are not supported either.

The Huggingface version of VisualGLM fails in the forward pass with "Exception: cuda rng state model-parallel-rng is not added"

Loading code:

model = AutoModel.from_pretrained(visualchatglm_model_path,trust_remote_code=True).to(torch.cuda.current_device())

The environment is as follows:

SwissArmyTransformer               0.3.7
transformers                       4.28.1
deepspeed                          0.9.1
torch                              1.11.0+cu113
torchaudio                         0.11.0+rocm4.5.2
torchvision                        0.12.0+cu113
cpm-kernels                        1.0.11
einops                             0.6.1

The error log is as follows:

Epoch_0:   0%|          | 0/16 [00:04<?, ?it/s]
Traceback (most recent call last):
  File "/export/App/training_platform/PinoModel/applications/VisualGLM/visual_chatglm_instructing_mergeclose_v1.py", line 229, in <module>
    train(args)
  File "/export/App/training_platform/PinoModel/applications/VisualGLM/visual_chatglm_instructing_mergeclose_v1.py", line 203, in train
    trainer.fit(logger=logger, log_interval=args.log_interval)
  File "/export/App/training_platform/PinoModel/applications/VisualGLM/coati/trainer/visual_sft_glm.py", line 134, in fit
    outputs = self.model(input_ids=prompt_ids, images=image, ...
  File "/usr/local/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/visualglm/modeling_chatglm.py", line 1462, in forward
    image_embeds = self.image_encoder(images)
  File "/usr/local/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/visualglm/visual.py", line 69, in forward
    enc = self.vit(image)[0]
  File "/usr/local/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/visualglm/visual.py", line 28, in forward
    return super().forward(input_ids=input_ids, position_ids=None, ...
  File "/usr/local/anaconda3/lib/python3.8/site-packages/sat/model/base_model.py", line 144, in forward
    return self.transformer(*args, **kwargs)
  File "/usr/local/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/anaconda3/lib/python3.8/site-packages/sat/model/transformer.py", line 569, in forward
    layer_ret = layer(*args, layer_id=torch.tensor(i), ...
  File "/usr/local/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/anaconda3/lib/python3.8/site-packages/sat/model/transformer.py", line 330, in forward
    return HOOKS_DEFAULT['layer_forward'](self, hidden_states, mask, ...
  File "/usr/local/anaconda3/lib/python3.8/site-packages/sat/transformer_defaults.py", line 127, in layer_forward_default
    attention_output = self.attention(attention_input, mask, **kw_args)
  File "/usr/local/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/anaconda3/lib/python3.8/site-packages/sat/model/transformer.py", line 103, in forward
    return HOOKS_DEFAULT['attention_forward'](self, hidden_states, ...
  File "/usr/local/anaconda3/lib/python3.8/site-packages/sat/transformer_defaults.py", line 63, in attention_forward_default
    context_layer = attention_fn(query_layer, key_layer, value_layer, ...
  File "/usr/local/anaconda3/lib/python3.8/site-packages/sat/transformer_defaults.py", line 38, in standard_attention
    with mpu.get_cuda_rng_tracker().fork():
  File "/usr/local/anaconda3/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/usr/local/anaconda3/lib/python3.8/site-packages/deepspeed/runtime/activation_checkpointing/checkpointing.py", line 174, in fork
    raise Exception('cuda rng state {} is not added'.format(name))
Exception: cuda rng state model-parallel-rng is not added

A question: how should the loss be computed when using mp_size=2?

logits, *mems = model(inputs_ids, position_ids, attention_mask)
# print(logits.shape)
loss_func = CrossEntropyLoss(ignore_index=-100)
loss = loss_func(logits.view(-1, logits.size(-1)).float(), labels.view(-1))

This is how I compute the loss, but it fails with: /opt/conda/conda-bld/pytorch_1670525539683/work/aten/src/ATen/native/cuda/Loss.cu:242: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [15,0,0] Assertion `t >= 0 && t < n_classes` failed.

AssertionError: data parallel group is not initialized

Hi,
I encounter an error as follows:

Traceback (most recent call last):
  File "/data33/private/xinpeng/codebase/CogView2/pretrain_coglm.py", line 244, in <module>
    training_main(args, model_cls=BaseModel, forward_step_function=forward_step, create_dataset_function=create_dataset_function)
  File "/home/xinpeng/miniconda3/envs/cogview/lib/python3.9/site-packages/SwissArmyTransformer/training/deepspeed_training.py", line 66, in training_main
    train_data, val_data, test_data = make_loaders(args, hooks['create_dataset_function'])
  File "/home/xinpeng/miniconda3/envs/cogview/lib/python3.9/site-packages/SwissArmyTransformer/data_utils/configure_data.py", line 166, in make_loaders
    group=mpu.get_data_parallel_group())
  File "/home/xinpeng/miniconda3/envs/cogview/lib/python3.9/site-packages/SwissArmyTransformer/mpu/initialize.py", line 97, in get_data_parallel_group
    assert _DATA_PARALLEL_GROUP is not None,
AssertionError: data parallel group is not initialized

Error when testing the qlora.py provided in the source

Running qlora.py from the source directly raises an error: [image]
After changing model.child = LoraLinear(100, 200, 10) to model.child = LoraLinear(100, 200, 10, 10, 2), another error occurs: [image]

The SAT tokenizer download URL is down

[2023-07-06 16:43:31,720] [INFO] [RANK 0] Try to load tokenizer from Huggingface transformers...
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
'HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443): Max retries exceeded with url: /repos/af/61/af61aa6351e76afb5cd67b257f67055118e5057e1f6d9cce1b4c1c566c85cfd9/5e974d9a69c242ce014c88c2b26089270f6198f3c0b700a887666cd3e816f17e?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27ice_text.model%3B+filename%3D%22ice_text.model%22%3B&Expires=1688890299&Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9jZG4tbGZzLmh1Z2dpbmdmYWNlLmNvL3JlcG9zL2FmLzYxL2FmNjFhYTYzNTFlNzZhZmI1Y2Q2N2IyNTdmNjcwNTUxMThlNTA1N2UxZjZkOWNjZTFiNGMxYzU2NmM4NWNmZDkvNWU5NzRkOWE2OWMyNDJjZTAxNGM4OGMyYjI2MDg5MjcwZjYxOThmM2MwYjcwMGE4ODc2NjZjZDNlODE2ZjE3ZT9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSoiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2ODg4OTAyOTl9fX1dfQ__&Signature=N9U6q-ipHqFHDD165Efe7x6zNHqWs4wltulReGh8gQDDx75Zsygpg6J486rPYFwQxEVg0ZwJdbXVjNdLpC0Z7frfWvHzOjl97tT76e5yy5mL9SuOTiHDA0ASdsBhf8oj8bbCMACFnOhUnZjSsu1ooKh4KJm9Au7tFgPMUZ5ProbVx9n1m2xTjwXbKR1YRaUSPkZLVMIscAeBz5tDi-DHX3qr8dafubrLlOgZ25kfgTGK-yD9cE7jR3R6vYuRGHGUACISGnCNI76CftJ5QRHz6ZrVdUZQpZt7zZTpdD8cppzcNbtwdvtV6DUurw7Va7JOLZwbpdoidyoN0Y8vehHkXA__&Key-Pair-Id=KVTP0A1DKRTAX (Caused by ProxyError('Cannot connect to proxy.', timeout('timed out')))' thrown while requesting GET https://cdn-lfs.huggingface.co/repos/af/61/af61aa6351e76afb5cd67b257f67055118e5057e1f6d9cce1b4c1c566c85cfd9/5e974d9a69c242ce014c88c2b26089270f6198f3c0b700a887666cd3e816f17e?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27ice_text.model%3B+filename%3D%22ice_text.model%22%3B&Expires=1688890299&Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9jZG4tbGZzLmh1Z2dpbmdmYWNlLmNvL3JlcG9zL2FmLzYxL2FmNjFhYTYzNTFlNzZhZmI1Y2Q2N2IyNTdmNjcwNTUxMThlNTA1N2UxZjZkOWNjZTFiNGMxYzU2NmM4NWNmZDkvNWU5NzRkOWE2OWMyNDJjZTAxNGM4OGMyYjI2MDg5MjcwZjYxOThmM2MwYjcwMGE4ODc2NjZjZDNlODE2ZjE3ZT9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSoiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2ODg4OTAyOTl9fX1dfQ__&Signature=N9U6q-ipHqFHDD165Efe7x6zNHqWs4wltulReGh8gQDDx75Zsygpg6J486rPYFwQxEVg0ZwJdbXVjNdLpC0Z7frfWvHzOjl97tT76e5yy5mL9SuOTiHDA0ASdsBhf8oj8bbCMACFnOhUnZjSsu1ooKh4KJm9Au7tFgPMUZ5ProbVx9n1m2xTjwXbKR1YRaUSPkZLVMIscAeBz5tDi-DHX3qr8dafubrLlOgZ25kfgTGK-yD9cE7jR3R6vYuRGHGUACISGnCNI76CftJ5QRHz6ZrVdUZQpZt7zZTpdD8cppzcNbtwdvtV6DUurw7Va7JOLZwbpdoidyoN0Y8vehHkXA__&Key-Pair-Id=KVTP0A1DKRTAX
[2023-07-06 16:43:44,341] [INFO] [RANK 0] Cannot find THUDM/chatglm-6b from Huggingface or sat. Creating a fake tokenizer...

Failure to load random states from saved checkpoints

I loaded a checkpoint to continue pretraining and got the following error.

[2023-09-25 15:53:27,994] [INFO] [RANK 0] Unable to load optimizer from checkpoint <ckpt_path>, exiting. Specify --no-load-rng or --finetune to prevent attempting to load the random state.

I checked and found that no 'rng_tracker_states' item is saved in save_checkpoint, yet it is loaded in load_checkpoint (sat/training/model_io.py).

OPT, BLOOM?

It's quite inspiring that you've built a library unifying the shared components of mainstream model architectures.

I was a little surprised to see that it didn't have support for OPT and BLOOM.

Any plans for these?

Error during installation

[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] One can disable async_io with DS_BUILD_AIO=0
[ERROR] Unable to pre-compile async_io
Traceback (most recent call last):
  File "D:\anaconda\lib\site-packages\setuptools\sandbox.py", line 156, in save_modules
    yield saved
  File "D:\anaconda\lib\site-packages\setuptools\sandbox.py", line 198, in setup_context
    yield
  File "D:\anaconda\lib\site-packages\setuptools\sandbox.py", line 259, in run_setup
    _execfile(setup_script, ns)
  File "D:\anaconda\lib\site-packages\setuptools\sandbox.py", line 46, in _execfile
    exec(code, globals, locals)
  File "C:\Users\stone\AppData\Local\Temp\easy_install-gs_qn_mw\deepspeed-0.9.2\setup.py", line 162, in <module>
  File "C:\Users\stone\AppData\Local\Temp\easy_install-gs_qn_mw\deepspeed-0.9.2\setup.py", line 51, in abort
AssertionError: Unable to pre-compile async_io

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\novelaileak\SwissArmyTransformer\setup.py", line 17, in <module>
    setup(
  File "D:\anaconda\lib\site-packages\setuptools\__init__.py", line 87, in setup
    return distutils.core.setup(**attrs)
  File "D:\anaconda\lib\site-packages\setuptools\_distutils\core.py", line 185, in setup
    return run_commands(dist)
  File "D:\anaconda\lib\site-packages\setuptools\_distutils\core.py", line 201, in run_commands
    dist.run_commands()
  File "D:\anaconda\lib\site-packages\setuptools\_distutils\dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "D:\anaconda\lib\site-packages\setuptools\dist.py", line 1208, in run_command
    super().run_command(command)
  File "D:\anaconda\lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
    cmd_obj.run()
  File "D:\anaconda\lib\site-packages\setuptools\command\install.py", line 74, in run
    self.do_egg_install()
  File "D:\anaconda\lib\site-packages\setuptools\command\install.py", line 131, in do_egg_install
    cmd.run(show_deprecation=False)
  File "D:\anaconda\lib\site-packages\setuptools\command\easy_install.py", line 420, in run
    self.easy_install(spec, not self.no_deps)
  File "D:\anaconda\lib\site-packages\setuptools\command\easy_install.py", line 662, in easy_install
    return self.install_item(None, spec, tmpdir, deps, True)
  File "D:\anaconda\lib\site-packages\setuptools\command\easy_install.py", line 709, in install_item
    self.process_distribution(spec, dist, deps)
  File "D:\anaconda\lib\site-packages\setuptools\command\easy_install.py", line 754, in process_distribution
    distros = WorkingSet([]).resolve(
  File "D:\anaconda\lib\site-packages\pkg_resources\__init__.py", line 789, in resolve
    dist = best[req.key] = env.best_match(
  File "D:\anaconda\lib\site-packages\pkg_resources\__init__.py", line 1075, in best_match
    return self.obtain(req, installer)
  File "D:\anaconda\lib\site-packages\pkg_resources\__init__.py", line 1087, in obtain
    return installer(requirement)
  File "D:\anaconda\lib\site-packages\setuptools\command\easy_install.py", line 681, in easy_install
    return self.install_item(spec, dist.location, tmpdir, deps)
  File "D:\anaconda\lib\site-packages\setuptools\command\easy_install.py", line 707, in install_item
    dists = self.install_eggs(spec, download, tmpdir)
  File "D:\anaconda\lib\site-packages\setuptools\command\easy_install.py", line 900, in install_eggs
    return self.build_and_install(setup_script, setup_base)
  File "D:\anaconda\lib\site-packages\setuptools\command\easy_install.py", line 1174, in build_and_install
    self.run_setup(setup_script, setup_base, args)
  File "D:\anaconda\lib\site-packages\setuptools\command\easy_install.py", line 1158, in run_setup
    run_setup(setup_script, args)
  File "D:\anaconda\lib\site-packages\setuptools\sandbox.py", line 249, in run_setup
    with setup_context(setup_dir):
  File "D:\anaconda\lib\contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "D:\anaconda\lib\site-packages\setuptools\sandbox.py", line 190, in setup_context
    with save_modules():
  File "D:\anaconda\lib\contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "D:\anaconda\lib\site-packages\setuptools\sandbox.py", line 169, in save_modules
    saved_exc.resume()
  File "D:\anaconda\lib\site-packages\setuptools\sandbox.py", line 143, in resume
    raise exc.with_traceback(self._tb)
  File "D:\anaconda\lib\site-packages\setuptools\sandbox.py", line 156, in save_modules
    yield saved
  File "D:\anaconda\lib\site-packages\setuptools\sandbox.py", line 198, in setup_context
    yield
  File "D:\anaconda\lib\site-packages\setuptools\sandbox.py", line 259, in run_setup
    _execfile(setup_script, ns)
  File "D:\anaconda\lib\site-packages\setuptools\sandbox.py", line 46, in _execfile
    exec(code, globals, locals)
  File "C:\Users\stone\AppData\Local\Temp\easy_install-gs_qn_mw\deepspeed-0.9.2\setup.py", line 162, in <module>
  File "C:\Users\stone\AppData\Local\Temp\easy_install-gs_qn_mw\deepspeed-0.9.2\setup.py", line 51, in abort
AssertionError: Unable to pre-compile async_io

How should this be resolved?

Has anyone tried model parallelism?

  1. Currently, with mp-size > 1 the loss stabilizes around 10; with mp == 1 it steadily drops to about 5.
  2. Running the finetune-sst2 script under glm hits the problem below. Was this tested at the time?
  3. Is it the case that the GLM model loaded here cannot directly use the checkpoint provided on the cloud drive in the GLM repo? [Already tried loading it; asserts show that the embedding weight is not initialized, among other problems.]

../aten/src/ATen/native/cuda/Indexing.cu:703: indexSelectLargeIndex: block: [40,0,0], thread: [75,0,0] Assertion srcIndex < srcSelectDimSize failed.

Windows installation error

The following error occurs when attempting to install:

PS D:\deeplearning\VisualGLM-6B> pip install SwissArmyTransformer>=0.4.4
Collecting SwissArmyTransformer>=0.4.4
  Obtaining dependency information for SwissArmyTransformer>=0.4.4 from https://files.pythonhosted.org/packages/91/b9/44a3e9cc0116a3ebf8b60f0ef67340ca22c195e37b9d8f951a37e114f300/SwissArmyTransformer-0.4.4-py3-none-any.whl.metadata
  Using cached SwissArmyTransformer-0.4.4-py3-none-any.whl.metadata (10 kB)
Requirement already satisfied: torch in c:\python311\lib\site-packages (from SwissArmyTransformer>=0.4.4) (2.0.1+cu118)
Collecting deepspeed (from SwissArmyTransformer>=0.4.4)
  Using cached deepspeed-0.10.0.tar.gz (836 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [24 lines of output]
      [WARNING] Unable to import torch, pre-compiling ops will be disabled. Please visit https://pytorch.org/ to see how to properly install torch on your system.
       [WARNING]  unable to import torch, please install it if you want to pre-compile any deepspeed ops.
      DS_BUILD_OPS=1
      Traceback (most recent call last):
        File "C:\Python311\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
          main()
        File "C:\Python311\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "C:\Python311\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
                 ^^^^^^^^^^^^^^^^^^^^^
        File "D:\TMP\pip-build-env-94s77i31\overlay\Lib\site-packages\setuptools\build_meta.py", line 341, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "D:\TMP\pip-build-env-94s77i31\overlay\Lib\site-packages\setuptools\build_meta.py", line 323, in _get_build_requires
          self.run_setup()
        File "D:\TMP\pip-build-env-94s77i31\overlay\Lib\site-packages\setuptools\build_meta.py", line 488, in run_setup
          self).run_setup(setup_script=setup_script)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "D:\TMP\pip-build-env-94s77i31\overlay\Lib\site-packages\setuptools\build_meta.py", line 338, in run_setup
          exec(code, locals())
        File "<string>", line 130, in <module>
      AssertionError: Unable to pre-compile ops without torch installed. Please install torch before attempting to pre-compile ops.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

But torch is in fact already installed:

Python 3.11.4 (tags/v3.11.4:d2340ef, Jun  7 2023, 05:45:37) [MSC v.1934 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'2.0.1+cu118'

don't drop_last for strict-eval

Dev-set evaluation often needs exact numbers; currently, if the dataset size leaves a remainder over eval_batch_size * world_size, the last incomplete batch is dropped (drop_last). This needs to be fixed.

When BERT distillation saves the model after training, an error occurs

    save_checkpoint(args.iteration, model, optimizer, lr_scheduler, args)
  File "/home/lvqingsong/new/SwissArmyTransformer/SwissArmyTransformer/training/model_io.py", line 82, in save_checkpoint
    model_specific_args = extract_model_specific_args_from_model(args, module)
  File "/home/lvqingsong/new/SwissArmyTransformer/SwissArmyTransformer/training/model_io.py", line 44, in extract_model_specific_args_from_model
    md.add_model_specific_args(parser)
  File "/home/lvqingsong/new/SwissArmyTransformer/examples/bert/bert_ft_model.py", line 18, in add_model_specific_args
    return super().add_model_specific_args(parser)
  File "/home/lvqingsong/new/SwissArmyTransformer/SwissArmyTransformer/model/official/bert_model.py", line 48, in add_model_specific_args
    group.add_argument('--num-types', type=int)
  File "/usr/lib/python3.8/argparse.py", line 1398, in add_argument
    return self._add_action(action)
  File "/usr/lib/python3.8/argparse.py", line 1602, in _add_action
    action = super(_ArgumentGroup, self)._add_action(action)
  File "/usr/lib/python3.8/argparse.py", line 1412, in _add_action
    self._check_conflict(action)
  File "/usr/lib/python3.8/argparse.py", line 1551, in _check_conflict
    conflict_handler(action, confl_optionals)
  File "/usr/lib/python3.8/argparse.py", line 1560, in _handle_conflict_error
    raise ArgumentError(action, message % conflict_string)
argparse.ArgumentError: argument --num-types: conflicting option string: --num-types

Cannot use torch.compile with SAT

I tried to use torch.compile with SAT but it failed. Is the reason that self.transformer.hooks.clear() in base_model.py also clears the hooks added by torch.compile?

How can DeepSpeed's offload feature be used to reduce GPU memory usage?

When I run LoRA finetuning of VisualGLM-6B, I get a CUDA Out of Memory error because my GPU only has 16 GB of memory. I added a DeepSpeed config file on the command line:

gpt_options=" \
       --experiment-name finetune-$MODEL_TYPE \
       --model-parallel-size ${MP_SIZE} \
       --mode finetune \
       --train-iters 300 \
       --resume-dataloader \
       $MODEL_ARGS \
       --train-data ${train_data} \
       --valid-data ${eval_data} \
       --distributed-backend nccl \
       --lr-decay-style cosine \
       --warmup .02 \
       --checkpoint-activations \
       --save-interval 300 \
       --eval-interval 10000 \
       --save "./work/ckpt" \
       --deepspeed \
       --deepspeed_config finetune/deepspeed.json \
       --split 1 \
       --eval-iters 10 \
       --eval-batch-size 1 \
       --lr 0.0001 \
       --batch-size 1 \
       --skip-init \
       --fp16 \
       --use_lora
"

              

run_cmd="${OPTIONS_NCCL} ${OPTIONS_SAT} deepspeed --master_port 16666 --num_gpus=1 --hostfile ${HOST_FILE_PATH} finetune_visualglm.py ${gpt_options}"
echo ${run_cmd}
eval ${run_cmd}

The contents of the config file are as follows:

{
    "train_micro_batch_size_per_gpu": 1,
    "zero_allow_untested_optimizer": true,
    "gradient_accumulation_steps": 1,
    "fp16": {
        "enabled": "auto",
        "loss_scale": 0,
        "initial_scale_power": 16,
        "loss_scale_window": 1000,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": "auto",
            "betas": "auto",
            "eps": "auto",
            "weight_decay": "auto"
        }
    },
    "scheduler": {
        "type": "WarmupLR",
        "params": {
            "warmup_min_lr": "auto",
            "warmup_max_lr": "auto",
            "warmup_num_steps": "auto"
        }
    },
    "zero_optimization": {
        "stage": 2,
        "offload_param": {
            "device": "cpu"
        },
        "offload_optimizer": {
            "device": "cpu"
        },
        "overlap_comm": true,
        "contiguous_gradients": true,
        "sub_group_size": 1e9,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "stage3_max_live_parameters": 1e9,
        "stage3_max_reuse_distance": 1e9,
        "stage3_gather_16bit_weights_on_model_save": false
    }
}

This configuration works under ChatGLM.
However, when I use it to finetune VisualGLM-6B, I still get CUDA Out of Memory. My machine has 128 GB of RAM, which should be enough for LoRA finetuning, so it seems ZeRO offload is not taking effect. Tracing the source, from_pretrained calls get_model, which fails at model.to(device), so DeepSpeed does not appear to be involved at that point. How can the model be loaded into CPU memory via DeepSpeed?

sat.arguments.get_args failed to handle the "-h" option

What I did

I pass the "-h" option to print the help message.
However, my script complains "TypeError: %o format: an integer is required, not dict" at /usr/lib/python3.8/argparse.py, line 633.
The following snippet can reproduce the phenomenon.

from sat.arguments import get_args
print(get_args(["-h"]))

Suggestion

The following patch works.

diff --git a/sat/arguments.py b/sat/arguments.py
index 5c53bd0..6bfee94 100755
--- a/sat/arguments.py
+++ b/sat/arguments.py
@@ -152,7 +152,7 @@ def add_training_args(parser):
     group.add_argument('--lr-decay-ratio', type=float, default=0.1)
     
     group.add_argument('--warmup', type=float, default=0.01,
-                       help='percentage of data to warmup on (.01 = 1% of all '
+                       help='percentage of data to warmup on (.01 = 1%% of all '
                             'training iters). Default 0.01')
     group.add_argument('--weight-decay', type=float, default=0.01,
                        help='weight decay coefficient for L2 regularization')

Version

The main branch, c5e09a8

Can the dataloader shuffle?

I see that torch.utils.data.DataLoader() in site-packages/sat/data_utils/configure_data.py does not set shuffle, so the default value of False is used. Does this mean the dataset is not reshuffled at the start of each epoch? Is there any way to get shuffling without modifying the library implementation?

roberta

  • Convert the official RoBERTa checkpoint to BaseModel weights
  • Verify correctness
  • Implement it in examples/roberta, without changing the library code itself
