
lm-flow

lm-flow is an experimental library for evaluating and training ensembles of language models.

Using lm-flow

  • Create an ensemble definition in TypeScript (see, for example, src/samples/openai.ts).
  • Use lm-flow to create a command-line tool.
import {main} from 'lm-flow';

const ensemble = ...
main(ensemble, []);
  • Author test cases (see, for example, src/samples/openai.ts).
  • Run your tool to evaluate a subset of test cases.
% node.exe ./build/src/samples/openai.js eval -i data/cases2
  • COMING SOON: Generate model training data from a subset of test cases.
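The exact shape of an ensemble definition isn't spelled out here; a rough sketch can be inferred from the models section of the run log shown later in this README. The ModelDef interface and its field names below are assumptions for illustration, not lm-flow's actual API.

```typescript
// Hypothetical model-definition shape, inferred from the run log in this
// README. lm-flow's real types may differ; see src/samples/openai.ts.
interface ModelDef {
  type: 'mock' | 'openai' | 'azure';
  name: string;
  config: Record<string, unknown>;
}

const ensemble: ModelDef[] = [
  {
    type: 'openai',
    name: 'openai-3.5-turbo-16k',
    config: {model: 'gpt-3.5-turbo-16k', max_tokens: 3000},
  },
  {
    // Mock models return canned completions for known prompts, which is
    // useful for testing a flow without calling a paid API.
    type: 'mock',
    name: 'model1',
    config: {
      exactMatch: false,
      defaultResponse: "I don't understand",
      cache: [{prompt: 'hello, world', completion: '2'}],
    },
  },
];

console.log(ensemble.length); // 2
```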

Building lm-flow

Instructions for setting up your environment and building lm-flow can be found here.

Running the Examples

lm-flow comes with an example that uses an ensemble consisting of a single OpenAI GPT 3.5 model. Here's its help message:

% node build/src/samples/openai.js -h     
Usage: openai [options] [command]

Tool to train and evaluate multi-LLM systems.

Options:
  -h, --help        display help for command

Commands:
  eval [options]    Evaluate a multi-model system
  train [options]   Train a multi-model system
  format [options]  Format results
  clean [options]   remove all files from output folder
  help [command]    display help for command

The following environment variables can also be defined in .env:
  OPENAI_API_KEY - OpenAI API key
  OPENAI_ENDPOINT - OpenAI API endpoint (defaults to https://api.openai.com/v1/chat/completions)
  OPENAI_ORGANIZATION - OpenAI organization
  AZURE_OPENAI_API_KEY - Azure OpenAI api key
  AZURE_OPENAI_ENDPOINT - Azure OpenAI endpoint
  INPUT_FOLDER - Folder with test cases (defaults to ./data/cases)
  OUTPUT_FOLDER - Folder to write run logs (defaults to ./data/runs)

Before running this example, you must set the OPENAI_API_KEY environment variable or add it to the .env file.
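A minimal .env file for the example might look like the following. The values here are placeholders, not working credentials; the unlisted variables fall back to the defaults shown in the help message above.

```shell
# Example .env for the openai sample (placeholder values).
OPENAI_API_KEY=sk-your-key-here
INPUT_FOLDER=./data/cases
OUTPUT_FOLDER=./data/runs
```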

% node build/src/samples/openai.js eval -i data/cases2
lm-flow tool run "eval" command on Fri Nov 03 2023 11:00:36 GMT-0700 (Pacific Daylight Time).
Configuration from "./.env":
Configuration:
  INPUT_FOLDER: data/cases2
  OUTPUT_FOLDER: ./data/runs
  FILTER: (no filter)
  CONCURRANCY: 1

Processed 1 test case
Saving run log to "./data/runs/cab52e78-bf01-46bf-9c21-0a1ae8ffb985.yaml".
Completed evaluation run.

No warnings.
No errors.

The run log is in ./data/runs/cab52e78-bf01-46bf-9c21-0a1ae8ffb985.yaml:

testRunId: cab52e78-bf01-46bf-9c21-0a1ae8ffb985
cmd: >-
  node.exe
  ./build/src/samples/openai.js eval -i data/cases2
cwd: /git/lm-flow
timestamp: 2023-11-03T18:00:36.832Z
user: mike
models:
  - type: mock
    name: model1
    config:
      exactMatch: false
      defaultResponse: I don't understand
      cache:
        - prompt: hello, world
          completion: '2'
        - prompt: hello
          completion: '1'
  - type: mock
    name: model2
    config:
      exactMatch: false
      defaultResponse: I don't understand
      cache:
        - prompt: '0'
          completion: goodbye
        - prompt: '1'
          completion: hello
        - prompt: '2'
          completion: hello hello
  - type: azure
    name: azure-3.5
    config:
      max_tokens: 3000
  - type: openai
    name: openai-3.5
    config:
      model: gpt-3.5
      max_tokens: 3000
  - type: openai
    name: openai-3.5-turbo-16k
    config:
      model: gpt-3.5-turbo-16k
      max_tokens: 3000
  - type: openai
    name: openai-4
    config:
      model: gpt-4
      max_tokens: 3000
cases:
  - testCaseId: one
    sha: 81c17cd8a076416a2c767dd2462c23b3aee7637c29205955180fb0b40780d292
    context:
      user: user1
      date: 2023-11-01T23:12:40.452Z
    log:
      type: model
      model: openai-3.5-turbo-16k
      name: openai
      input: Hello, world
      prompt:
        - role: system
          content: >-
            You are an assistant that counts the number of words in the user
            text prompt.

            Return only the number.
        - role: user
          content: Hello, world
      completion: '2'
      output: 2
      judgment: true
      expected: 2
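Each case in the run log carries a boolean judgment comparing output against expected, so a pass rate can be tallied from the log's cases array. The RunCase type below is an assumption based on the log shape above, not a type exported by lm-flow.

```typescript
// Sketch: tally the pass rate over run-log cases. The RunCase shape is
// assumed from the YAML above; lm-flow's real log schema may differ.
interface RunCase {
  testCaseId: string;
  log: {judgment: boolean};
}

function passRate(cases: RunCase[]): number {
  if (cases.length === 0) return 0;
  const passed = cases.filter(c => c.log.judgment).length;
  return passed / cases.length;
}

// Two illustrative cases: one pass, one fail.
const cases: RunCase[] = [
  {testCaseId: 'one', log: {judgment: true}},
  {testCaseId: 'two', log: {judgment: false}},
];
console.log(passRate(cases)); // 0.5
```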

Contributors

  • mikehopcroft
