
samba's Introduction

Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling

arXiv: https://arxiv.org/abs/2406.07522

Samba is a simple yet powerful hybrid model with an unlimited context length. Its architecture is frustratingly simple:

Samba = Mamba + MLP + Sliding Window Attention + MLP stacking at the layer level.

Our largest model, Samba-3.8B, is trained on 3.2 trillion tokens from the Phi3 dataset, outperforming Phi3-mini on major benchmarks (e.g. MMLU, GSM8K and HumanEval) by a large margin. Samba can also achieve perfect long-context retrieval ability with minimal instruction tuning, while still maintaining its linear complexity with respect to sequence length. This ability leads to the impressive performance of Samba-3.8B-instruct on downstream tasks such as long-context summarization.
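
To make the layer-level stacking concrete, below is a minimal, runnable sketch of one such layer group. It is not the released implementation: the Mamba layer is replaced by a GRU stand-in and the sliding window attention is a naive masked nn.MultiheadAttention, so only the Mamba + MLP + Sliding Window Attention + MLP ordering and the pre-norm residual wiring reflect the formula above.

import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, d_model, hidden_mult=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, hidden_mult * d_model),
            nn.SiLU(),
            nn.Linear(hidden_mult * d_model, d_model),
        )

    def forward(self, x):
        return self.net(x)

class SlidingWindowAttention(nn.Module):
    def __init__(self, d_model, n_heads=8, window=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.window = window

    def forward(self, x):
        t = x.size(1)
        pos = torch.arange(t, device=x.device)
        # Mask out future tokens and tokens farther back than the window.
        mask = (pos[None, :] > pos[:, None]) | (pos[:, None] - pos[None, :] >= self.window)
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return out

class RecurrentStandIn(nn.Module):
    # Stand-in for the Mamba layer: any linear-time recurrent sequence mixer goes here.
    def __init__(self, d_model):
        super().__init__()
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)

    def forward(self, x):
        out, _ = self.rnn(x)
        return out

class SambaLayerGroup(nn.Module):
    # One Mamba -> MLP -> sliding-window attention -> MLP group;
    # the full model stacks several of these groups.
    def __init__(self, d_model, window=2048):
        super().__init__()
        self.blocks = nn.ModuleList([
            RecurrentStandIn(d_model),
            MLP(d_model),
            SlidingWindowAttention(d_model, window=window),
            MLP(d_model),
        ])
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in self.blocks)

    def forward(self, x):
        for norm, block in zip(self.norms, self.blocks):
            x = x + block(norm(x))  # pre-norm residual around every sub-layer
        return x

x = torch.randn(2, 16, 256)            # (batch, sequence length, d_model)
print(SambaLayerGroup(256)(x).shape)   # torch.Size([2, 16, 256])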

Performance 🚀

Model                         | MMLU | GSM8K | HumanEval | GovReport | SQuALITY
Phi-3-mini-4K-instruct        | 68.8 | 82.5  | 58.5      | 14.4      | 21.6
Samba-3.8B-instruct (preview) | 71.9 | 87.6  | 62.8      | 18.9      | 21.2

We report 5-shot accuracy for MMLU, 8-shot CoT accuracy for GSM8K, 0-shot pass@1 for HumanEval and ROUGE-L for both GovReport and SQuALITY.

Updates

  • [June 11] Released the codebase for training Samba-421M and Samba-1.3B on SlimPajama.

Code Overview

Our infrastructure for training on SlimPajama is a modified version of TinyLlama and LitGPT. You can easily specify different architectural configurations by modifying model_name and the config file, which includes many of the baseline architectures mentioned in the paper. Our RetNet and GLA implementations are from the awesome Flash Linear Attention repository.

Pretraining Samba from scratch

Please follow the Dockerfile to set up the environment. Data preparation mainly follows TinyLlama, except that we use only the SlimPajama dataset.
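
For example (paths and the image tag below are placeholders; the --ipc and --ulimit flags follow the recommendation printed by NVIDIA's PyTorch containers):

docker build -t samba .
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -v /path/to/dataset:/path/to/dataset -it samba bash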

Data Preparation

Download the SlimPajama dataset to a directory of your choice.

cd /path/to/dataset
git lfs install
git clone https://huggingface.co/datasets/cerebras/SlimPajama-627B

The SlimPajama dataset takes 893 GB of disk space. Use the provided scripts to tokenize the dataset and divide it into chunks.

python scripts/prepare_slimpajama.py --source_path /path/to/SlimPajama --tokenizer_path data/llama  --destination_path data/slim --split validation --percentage 1.0
python scripts/prepare_slimpajama.py --source_path /path/to/SlimPajama --tokenizer_path data/llama  --destination_path data/slim --split train --percentage 1.0
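
For a quick smoke test before tokenizing the full dataset, you can process only a fraction of each split by lowering --percentage (the value below is just an example):

python scripts/prepare_slimpajama.py --source_path /path/to/SlimPajama --tokenizer_path data/llama  --destination_path data/slim --split validation --percentage 0.01
python scripts/prepare_slimpajama.py --source_path /path/to/SlimPajama --tokenizer_path data/llama  --destination_path data/slim --split train --percentage 0.01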

You are now ready to launch a job!

Training

The following script trains a default Samba-421M model on a single node of 8 GPUs with 20B tokens.

torchrun --nnodes=1 --nproc_per_node=8 --rdzv_id=samba-421M --rdzv_backend=c10d  --rdzv_endpoint=${MASTER_ADDR}:${MASTER_PORT} pretrain.py --train_data_dir data/slim --val_data_dir data/slim 

To train a Samba-1.3B model on 100B tokens, change model_name to "Samba_1.3B" and train_config to "tsz512x4k_100B". This configuration assumes 8 nodes with 8 GPUs each; you can adjust the number of nodes to train on fewer GPUs.
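
For multi-node training, launch the same entry point with torchrun on every node. A sketch (MASTER_ADDR and MASTER_PORT point to the rank-0 node, and --nnodes matches your cluster size):

torchrun --nnodes=8 --nproc_per_node=8 --rdzv_id=samba-1.3B --rdzv_backend=c10d --rdzv_endpoint=${MASTER_ADDR}:${MASTER_PORT} pretrain.py --train_data_dir data/slim --val_data_dir data/slim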

Citation

If you find our work useful, please consider citing:

@article{ren2024samba,
      title={Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling}, 
      author={Liliang Ren and Yang Liu and Yadong Lu and Yelong Shen and Chen Liang and Weizhu Chen},
      journal = {arXiv preprint},
      year={2024},
      url={https://arxiv.org/abs/2406.07522}
}

Contact

Liliang Ren ([email protected])

samba's Issues

Error when using Docker

Hello,

I have built a Docker image based on the Dockerfile provided, but I am seeing the following error when trying to prepare the data.

Docker command: docker run -d --name samba -v ./data:/app/data -v /data/hf-cache:/data/hf-cache --runtime nvidia samba python scripts/prepare_slimpajama.py --source_path /data/hf-cache/datasets--cerebras--SlimPajama-627B/snapshots/2d0accdd58c5d5511943ca1f5ff0e3eb5e293543/ --tokenizer_path data/llama --destination_path data/slim --split validation --percentage 1.0

Container logs:


=============
== PyTorch ==
=============

NVIDIA Release 23.07 (build 63867923)
PyTorch Version 2.1.0a0+b5021ba

Container image Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Copyright (c) 2014-2023 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies    (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU                      (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006      Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Copyright (c) 2015      Google Inc.
Copyright (c) 2015      Yangqing Jia
Copyright (c) 2013-2016 The Caffe contributors
All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for PyTorch.  NVIDIA recommends the use of the following flags:
   docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 ...

/usr/local/lib/python3.10/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/usr/local/lib/python3.10/dist-packages/torchvision/image.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
Traceback (most recent call last):
  File "/app/scripts/prepare_slimpajama.py", line 21, in <module>
    import lit_gpt.packed_dataset as packed_dataset
  File "/app/lit_gpt/__init__.py", line 7, in <module>
    from lit_gpt.model import GPT
  File "/app/lit_gpt/model.py", line 15, in <module>
    from xformers.ops import SwiGLU
  File "/usr/local/lib/python3.10/dist-packages/xformers/ops/__init__.py", line 8, in <module>
    from .fmha import (
  File "/usr/local/lib/python3.10/dist-packages/xformers/ops/fmha/__init__.py", line 10, in <module>
    from . import (
  File "/usr/local/lib/python3.10/dist-packages/xformers/ops/fmha/triton_splitk.py", line 548, in <module>
    _get_splitk_kernel(num_groups)
  File "/usr/local/lib/python3.10/dist-packages/xformers/ops/fmha/triton_splitk.py", line 503, in _get_splitk_kernel
    _fwd_kernel_splitK_unrolled = unroll_varargs(_fwd_kernel_splitK, N=num_groups)
  File "/usr/local/lib/python3.10/dist-packages/xformers/triton/vararg_kernel.py", line 166, in unroll_varargs
    jitted_fn = triton.jit(fn)
  File "/usr/local/lib/python3.10/dist-packages/triton/runtime/jit.py", line 864, in jit
    return decorator(fn)
  File "/usr/local/lib/python3.10/dist-packages/triton/runtime/jit.py", line 853, in decorator
    return JITFunction(
  File "/usr/local/lib/python3.10/dist-packages/triton/runtime/jit.py", line 699, in __init__
    self.src = self.src[re.search(r"^def\s+\w+\s*\(", self.src, re.MULTILINE).start():]
AttributeError: 'NoneType' object has no attribute 'start'

How to evaluate?

We tried to pretrain a 421M Samba, but after pretraining we found that the evaluation script has not been open-sourced.

Mamba2

First, thanks for this great contribution and the well written paper!

As Mamba2 was also released very recently, do you have any thoughts on the potential integration or impact of Mamba2 on the Samba architecture?

Would be much appreciated.

Typo on paper?

Hi, I was just wondering if there is a typo in the arXiv preprint https://arxiv.org/pdf/2406.07522?

The MMLU result for the Mamba-2.8B model seems a bit low (26, vs. 45.28 for the Mamba-1.8B model)?

Is it a typo, or is there some other reason why the Mamba-1.8B model outperforms its larger cousin on MMLU?
(Or am I misreading something?)

Please advise.
Thank you!

Support for Transformers library

Hi! Thank you for the great work on Samba! The hybrid model is very interesting.

Since Mamba-1 is now supported in the latest version of transformers, is there any plan for Samba to support the transformers library as well? This could benefit many researchers in the community.

Supporting transformers would allow for easier integration with existing workflows and tools, potentially increasing adoption and enabling more researchers to experiment with Samba.

inference code

I'm currently running training. Is there any code to run inference on the model after training? If not, when do you plan to upload the inference code? And is there any reference or URL to consult for inference?

Inference Code

Amazing work, team! Thank you sincerely for sharing.

I have trained a toy model but have completely failed at creating an inference script. Sharing one would be sincerely appreciated!
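
For reference, the rough shape of what I am after is something like the sketch below. The config name and checkpoint path are placeholders for my own run, and it assumes the lit_gpt GPT model used for training returns next-token logits from a plain forward pass:

import torch
from lit_gpt.config import Config
from lit_gpt.model import GPT

# Placeholder config name and checkpoint path; adjust to your own run.
config = Config.from_name("Samba_421M")
model = GPT(config)
state = torch.load("out/samba_421M/ckpt.pth", map_location="cpu")
model.load_state_dict(state.get("model", state), strict=False)
model.eval()

@torch.no_grad()
def greedy_decode(model, idx, max_new_tokens=64):
    # idx: LongTensor of shape (1, T) holding prompt token ids.
    for _ in range(max_new_tokens):
        logits = model(idx)                                # (1, T, vocab_size)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        idx = torch.cat([idx, next_id], dim=1)
    return idx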

weight access

Great job! Will you be open-sourcing the model weights?

Too late for a name change?

This is seriously confusing with Microsoft SMB/Samba.

Samba is the standard Windows interoperability suite of programs for Linux and Unix. Samba is Free Software licensed under the GNU General Public License and the Samba project is a member of the Software Freedom Conservancy. Since 1992, Samba has provided secure, stable and fast file and print services for all clients using the SMB/CIFS protocol, such as all versions of DOS and Windows, OS/2, Linux and many others. Samba is an important component to seamlessly integrate Linux/Unix Servers and Desktops into Active Directory environments. It can function both as a domain controller or as a regular domain member.

https://github.com/samba-team/samba

Models

Hi, thanks for releasing Samba! Are there any plans to release the pretrained models? Thanks!

Vocab size mismatch between config.py and the paper

Hi,
Thanks for this very interesting paper !
Comparing Table 9 in the paper with the config.py file, it looks like there is a mismatch in the vocabulary size.
Which one did you actually use?

Best Regards,

LitGPT

Congrats on this research milestone 🙌! And it's nice to see that our LitGPT library has been useful for this project. However, note that LitGPT is an open-source project, and the Apache 2.0 open-source license requires including the original license and copyright notice. Right now, the files in Samba's lit_gpt subfolder only say

# Copyright (c) Microsoft Corporation ...

which needs to be updated to also include

Copyright Lightning AI. Licensed under the Apache License 2.0,
see LICENSE file at https://github.com/Lightning-AI/litgpt/blob/main/LICENSE

Could you please correct this mistake as soon as possible and include the correct LitGPT license reference?
