contextualize

contextualize is a package to quickly retrieve and format file contents for use with LLMs.

Installation

You can install the package using pip:

pip install contextualize

or, to use the CLI globally, with pipx:

pipx install contextualize

Usage (reference.py)

Define FileReference objects for specified file paths and optional ranges.

  • set range to a tuple of line numbers to include only a portion of the file, e.g., range=(1, 10)
  • set format to "md" (default) or "xml" to wrap file contents in Markdown code blocks or <file> tags
  • set label to "relative" (default), "name", or "ext" to choose the label attached to the enclosing Markdown block or XML tag
    • "relative" will use the relative path from the current working directory
    • "name" will use the file name only
    • "ext" will use the file extension only

Retrieve wrapped contents from the output attribute.
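To illustrate what the range, format, and label options do, here is a standalone approximation of the wrapping step. It is a hypothetical sketch, not the package's FileReference implementation: the names wrap_file and make_label, the line_range parameter (standing in for range), 1-based inclusive line numbers, the extension label without its leading dot, and the exact Markdown and <file path="..."> layouts are all assumptions.

```python
import os
from pathlib import Path
from typing import Optional, Tuple

# Triple backtick built at runtime so this snippet renders cleanly inside Markdown.
FENCE = "`" * 3


def make_label(path: str, label: str = "relative") -> str:
    """Return the label for a file: relative path, bare file name, or extension."""
    if label == "relative":
        # Path relative to the current working directory
        return os.path.relpath(path)
    if label == "name":
        return Path(path).name
    if label == "ext":
        # Assumption: the extension is used without its leading dot, e.g. "py"
        return Path(path).suffix.lstrip(".")
    raise ValueError(f"unknown label style: {label!r}")


def wrap_file(
    path: str,
    line_range: Optional[Tuple[int, int]] = None,
    format: str = "md",
    label: str = "relative",
) -> str:
    """Read a file, optionally keep only some lines, and wrap it for an LLM prompt."""
    text = Path(path).read_text(encoding="utf-8")
    if line_range is not None:
        start, end = line_range
        # Assumption: 1-based, inclusive line numbers
        text = "\n".join(text.splitlines()[start - 1 : end])
    body = text.rstrip("\n")
    tag = make_label(path, label)
    if format == "md":
        return f"{FENCE}{tag}\n{body}\n{FENCE}"
    if format == "xml":
        return f'<file path="{tag}">\n{body}\n</file>'
    raise ValueError(f"unknown format: {format!r}")
```

Under these assumptions, wrap_file("reference.py", line_range=(1, 10), format="xml", label="name") would return lines 1 to 10 of the file inside a <file path="reference.py"> tag.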

CLI

A CLI (cli.py) is provided to print file contents to the console from the command line.

  • cat: Prepare and concatenate file references
    • paths: Positional arguments for target file(s) or directories
    • --ignore: File(s) to ignore (optional)
    • --format: Output format (md or xml, default is md)
    • --label: Label style (relative for relative file path, name for file name only, ext for file extension only; default is relative)
    • --output: Output target: console (default) or clipboard
    • --output-file: Output file path (optional, compatible with --output clipboard)
  • ls: List token counts
    • paths: Positional arguments for target file(s) or directories
    • --encoding: Encoding to use for tokenization, e.g., cl100k_base (default), p50k_base, r50k_base
    • --model: Model used to select the tokenization encoding, e.g., gpt-3.5-turbo/gpt-4 (default), text-davinci-003, code-davinci-002. Ignored if --encoding is provided.
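The documented options can be summarized as an argparse parser. This is a hypothetical reconstruction for reference only; cli.py may be built differently, and details such as --ignore accepting several values and --encoding defaulting to None (so that --model decides the encoding) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse mirror of the documented `cat` and `ls` subcommands."""
    parser = argparse.ArgumentParser(prog="contextualize")
    sub = parser.add_subparsers(dest="command", required=True)

    cat = sub.add_parser("cat", help="Prepare and concatenate file references")
    cat.add_argument("paths", nargs="+", help="Target file(s) or directories")
    cat.add_argument("--ignore", nargs="*", default=[], help="File(s) to ignore")
    cat.add_argument("--format", choices=["md", "xml"], default="md")
    cat.add_argument("--label", choices=["relative", "name", "ext"], default="relative")
    cat.add_argument("--output", choices=["console", "clipboard"], default="console")
    cat.add_argument(
        "--output-file", default=None,
        help="Output file path (can be combined with --output clipboard)",
    )

    ls = sub.add_parser("ls", help="List token counts")
    ls.add_argument("paths", nargs="+", help="Target file(s) or directories")
    ls.add_argument("--encoding", default=None, help="e.g. cl100k_base, p50k_base, r50k_base")
    ls.add_argument("--model", default="gpt-4", help="Only used when --encoding is not given")
    return parser
```

Keeping --encoding unset by default is one way to honour the rule that --model is consulted only when no encoding is given; since the default model maps to cl100k_base, the effective default encoding is unchanged.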

Examples

  • cat:
    • contextualize cat README.md will print the wrapped contents of README.md to the console with default settings (Markdown format, relative path label).
    • contextualize cat README.md --format xml will print the wrapped contents of README.md to the console in XML format.
    • contextualize cat contextualize/ dev/ README.md --format xml will prepare file references for files in the contextualize/ and dev/ directories and README.md, and print each file's contents (wrapped in corresponding XML tags) to the console.
  • ls:
    • contextualize ls README.md will count and print the number of tokens in README.md using the default cl100k_base encoding.
    • contextualize ls contextualize/ --model text-davinci-003 will count and print the number of tokens in each file in the contextualize/ directory using the p50k_base encoding associated with the text-davinci-003 model, then print the total tokens for all processed files.
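The ls behaviour in the last example (an explicit encoding takes precedence, otherwise the model picks one; per-file counts followed by a total) can be sketched as follows. This assumes tiktoken, the library these encoding names come from; resolve_encoding, iter_files, tiktoken_counter, and count_tokens are hypothetical names, and the hard-coded table only covers the models listed above (tiktoken.encoding_for_model provides the full mapping).

```python
import os
from typing import Callable, Dict, List, Optional

# Model-to-encoding pairs for the models named in this README (as mapped by tiktoken).
MODEL_ENCODINGS = {
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "text-davinci-003": "p50k_base",
    "code-davinci-002": "p50k_base",
}


def resolve_encoding(encoding: Optional[str] = None, model: str = "gpt-4") -> str:
    """An explicit encoding wins; otherwise derive it from the model name."""
    if encoding:
        return encoding
    return MODEL_ENCODINGS[model]


def iter_files(paths: List[str]) -> List[str]:
    """Expand directories recursively into their files; keep plain file paths as-is."""
    files: List[str] = []
    for path in paths:
        if os.path.isdir(path):
            for root, dirs, names in os.walk(path):
                dirs.sort()  # deterministic traversal order
                files.extend(os.path.join(root, name) for name in sorted(names))
        else:
            files.append(path)
    return files


def tiktoken_counter(encoding_name: str) -> Callable[[str], int]:
    """Build a token counter backed by tiktoken (third-party, imported lazily)."""
    import tiktoken

    enc = tiktoken.get_encoding(encoding_name)
    return lambda text: len(enc.encode(text))


def count_tokens(
    paths: List[str],
    encoding: Optional[str] = None,
    model: str = "gpt-4",
    counter: Optional[Callable[[str], int]] = None,
) -> Dict[str, int]:
    """Return a token count per file; tiktoken is only needed if no counter is passed."""
    counter = counter or tiktoken_counter(resolve_encoding(encoding, model))
    counts: Dict[str, int] = {}
    for file_path in iter_files(paths):
        with open(file_path, encoding="utf-8") as fh:
            counts[file_path] = counter(fh.read())
    return counts
```

Printing each entry of count_tokens(["contextualize/"], model="text-davinci-003") followed by sum(counts.values()) would mirror the example's per-file and total output. Passing a custom counter lets the traversal and precedence logic be exercised without tiktoken installed.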


Contributors

jmpaz