
scaled-rope's Introduction

Multinode LoRA + Flash + DS

This project adds LoRA, a Flash Attention patch, and DeepSpeed (DS) support to Scaled RoPE, and can be run across multiple nodes.

Flash Attention 2 and LLaMA 2 ready 🚀

Setup and Installation

  1. Install the required Python packages with pip:

    pip install -r requirements.txt
    
  2. Update configs/config.yaml to match your requirements.

Usage

To run the application, use the following command:

python finetune.py --configs defaults <your_override_config>

Replace <your_override_config> with the name of a configuration defined in configs/config.yaml. Individual settings can also be overridden with command line arguments.
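The layering described above can be pictured with a minimal sketch. It assumes that config.yaml holds named sections which are merged left to right, with command line arguments applied last; the actual merging logic in finetune.py may differ. The `CONFIGS` dictionary, the `my-override` section name, the values, and the `merge_configs` helper are all hypothetical and only illustrate the precedence order.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for the parsed contents of configs/config.yaml.
# The keys are settings mentioned in this README; the values are made up.
CONFIGS: Dict[str, Dict[str, Any]] = {
    "defaults": {
        "pretokenized": False,
        "interpolation_factor": None,
        "output_dir": "saved_ckpts",
    },
    "my-override": {
        "interpolation_factor": 8,
        "output_dir": "saved_ckpts_32k",
    },
}


def merge_configs(names: List[str], cli_overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Merge named sections in order, then apply command line overrides.

    Assumed precedence (lowest to highest): first section, later sections, CLI.
    """
    merged: Dict[str, Any] = {}
    for name in names:
        merged.update(CONFIGS[name])  # later sections win over earlier ones
    merged.update(cli_overrides)      # command line arguments win over everything
    return merged


settings = merge_configs(["defaults", "my-override"], {"pretokenized": True})
print(settings)
```

Under these assumptions, `interpolation_factor` and `output_dir` come from the override section, while `pretokenized` comes from the command line.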

Please note: this uses the recent HF PR, so models are HF compatible. The linear scaling argument is 'interpolation_factor', i.e. how much you want to scale the model's context length. If it is set to None, the factor defaults to config.max_position_embeddings / 4096, since 4096 is the default context length for LLaMA 2.
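As an illustration of the default described in the note, the following sketch resolves the scaling factor and shows how linear scaling compresses position indices. The helper names `resolve_interpolation_factor` and `scaled_positions` are hypothetical and are not taken from the repository; the sketch assumes the standard form of linear position interpolation, in which each position index is divided by the factor before the rotary embedding is computed.

```python
from typing import List, Optional

LLAMA2_CONTEXT = 4096  # default context length for LLaMA 2, as stated in the note


def resolve_interpolation_factor(
    max_position_embeddings: int,
    interpolation_factor: Optional[float] = None,
) -> float:
    """Return the linear scaling factor (hypothetical helper, not the repo's API)."""
    if interpolation_factor is not None:
        return float(interpolation_factor)
    # Fallback from the note: target context length / LLaMA 2 default context length.
    return max_position_embeddings / LLAMA2_CONTEXT


def scaled_positions(seq_len: int, factor: float) -> List[float]:
    """Linear interpolation: divide each position index by the factor."""
    return [pos / factor for pos in range(seq_len)]


factor = resolve_interpolation_factor(max_position_embeddings=32768)
print(factor)                       # 8.0
print(scaled_positions(4, factor))  # [0.0, 0.125, 0.25, 0.375]
```

With a 32k target context the factor is 8, so eight consecutive tokens span the positional range that a single token step covered in the original 4096-token model.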

Data

  • Specify packed, untokenized datasets on the hub under dataset_names, e.g. Multi-Domain-Expert-Layers/the_pile_books3_packed_128k.
  • If pretokenized=True, specify a single pre-tokenized dataset on the hub under dataset_names, e.g. conceptofmind/rp-packed-32k-no-filter for OpenLLaMA.

Running on a Single Node

Use the following commands for running on a single node:

  1. Export the necessary paths:

    export PYTHONPATH="/mnt/data/jordiclive/scaled-rope:$PYTHONPATH"
    export TRANSFORMERS_CACHE="/mnt/data/jordiclive/transformers_cache"
    export HF_DATASETS_CACHE="/mnt/data/jordiclive/transformers_cache"
    export HF_HOME="/mnt/data/jordiclive/transformers_cache"
    export WANDB_API_KEY=""
    
  2. Run the script:

    deepspeed --include=localhost:0,1,2,3,4,5,6,7 --master_port 61500 finetune.py --output_dir saved_ckpts_32k --configs defaults lora-7b-llama2 --deepspeed
    

Running on Multiple Nodes

An example script using the Slurm launcher with DeepSpeed is provided in scripts/juwels_booster.sh.

scaled-rope's People

Contributors

bloc97, jordiclive, jquesnelle
