
llm-evaluation-s-always-fatiguing / leaf-playground


A framework for building scenario simulation projects in which human and LLM-based agents can participate, with a user-friendly web UI for visualizing simulations and support for automatic evaluation at the agent-action level.

License: MIT License

Python 99.92% Dockerfile 0.08%
llm-evaluation agent-based-simulation automation evaluations agent agents chatgpt

leaf-playground's People

Contributors

pandazki, panqiwei


leaf-playground's Issues

Refinement Needed in the Location Definition of the action_exec_timeout Field

I have noticed that the action timeout is currently determined by action_exec_time in SceneAgent. However, an agent may have multiple actions with different timeout requirements, and a single agent-level value cannot express them. To address this, I propose moving action_exec_time into ActionDefinition, so that each action carries its own timeout.

Below is my pull request regarding this solution. If you find it suitable, please consider merging it.

#40
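
As a minimal sketch of the proposal above, and not the project's actual code: the `ActionDefinition` class below is a hypothetical stand-in that holds only a name and `action_exec_time`, and the unit (seconds) and the helper `run_action` are assumptions made for illustration. It shows how a per-action timeout could be enforced with `asyncio.wait_for`.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class ActionDefinition:
    # Hypothetical stand-in: the real class in the project has other fields.
    name: str
    action_exec_time: float  # per-action timeout, assumed to be in seconds


async def run_action(
    definition: ActionDefinition,
    action: Callable[[], Awaitable[str]],
) -> Optional[str]:
    """Run one action under its own timeout; return None if it times out."""
    try:
        return await asyncio.wait_for(action(), timeout=definition.action_exec_time)
    except asyncio.TimeoutError:
        return None


async def quick_action() -> str:
    # Finishes almost immediately.
    await asyncio.sleep(0)
    return "done"


async def slow_action() -> str:
    # Takes far longer than the short timeout used below.
    await asyncio.sleep(1)
    return "late"


if __name__ == "__main__":
    speak = ActionDefinition(name="speak", action_exec_time=0.5)
    think = ActionDefinition(name="think", action_exec_time=0.01)
    print(asyncio.run(run_action(speak, quick_action)))
    print(asyncio.run(run_action(think, slow_action)))
```

With the timeout stored on the definition, two actions of the same agent can use different limits without any change to the agent itself.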

[Roadmap] v1.0.0

Overview

This issue outlines the roadmap to v1.0.0.

The core objective of this project is to deliver a carefully designed and sufficiently flexible framework, together with a set of tools, that lets developers implement, with minimal code, simulation scenarios in which multiple LLM agents interact or compete to fulfill specific needs or tasks. The project also ships a diverse set of pre-built simulation scenarios, so that developers can directly test how their LLM agents perform in each context and compare them with other agents implemented by themselves or by the community.

By the time of the v1.0.0 release, this project will encompass the following features:

  • A highly abstract core framework with standardized protocols for creating scene projects.
  • A web service stable enough to run multiple scenario simulation tasks concurrently.
  • As many scene projects as possible.
  • Implementations of as many popular LLM reasoning methods as possible.
  • Support for as many popular LLM backends as possible.
  • Support for as many popular prompting frameworks as possible.

Table of Contents

  1. Core Framework Implementation
  2. Web Service Implementation
  3. Scene Projects Development
  4. LLM Reasoning Methods Implementations
  5. LLM Backends Supporting
  6. Prompting Frameworks Supporting

Core framework implementation

Implement a carefully designed, highly abstract core framework that defines all the elements necessary for creating a scene project and provides standardized protocols for identifying every component of a scene project according to its configuration.

(todo list here)

Web service implementation

Implement a stable, high-concurrency web service whose APIs allow seamless interaction with leaf-playground-webui. It will run in containers and execute multiple scenario simulation tasks concurrently.
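
A minimal sketch of the concurrent execution described above, under stated assumptions: `run_simulation` is a hypothetical placeholder for a real simulation task, and the limit `max_concurrent` is an illustrative parameter, not a project setting. It uses `asyncio.Semaphore` to cap how many tasks run at once and `asyncio.gather` to collect the results in submission order.

```python
import asyncio
from typing import List


async def run_simulation(task_id: int) -> str:
    # Hypothetical placeholder for one scenario simulation task.
    await asyncio.sleep(0.01)
    return f"task-{task_id}: finished"


async def run_all(task_ids: List[int], max_concurrent: int = 2) -> List[str]:
    """Run all tasks, with at most `max_concurrent` executing at any moment."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(task_id: int) -> str:
        # Wait for a free slot before starting the task.
        async with semaphore:
            return await run_simulation(task_id)

    # gather preserves the order of its arguments in the returned list.
    return await asyncio.gather(*(guarded(t) for t in task_ids))


if __name__ == "__main__":
    for line in asyncio.run(run_all([1, 2, 3])):
        print(line)
```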

(todo list here)

Scene projects development

Develop a multitude of scene projects that combine entertainment value and application value to meet various evaluation needs of community users. The results of simulation tasks generated by each scene project should effectively quantify the specific application skills and general abilities of LLM agents.

(todo list here)

LLM reasoning methods implementations

See #9 for more details.

LLM Backends supporting

Support a selection of mainstream LLM backends, and define communication protocols when necessary.

(todo list here)

Prompting frameworks supporting

See #9 for more details.

[Feature] Support popular prompting frameworks and initial implementation of popular reasoning strategies

A recently published paper is relevant here: "A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future".

In addition to supporting mainstream LLM backends, the project should also support popular prompting frameworks such as LangChain, Semantic Kernel, Prompt flow, textai, LlamaIndex, etc. The basic requirement is to offer simple wrappers for these frameworks so that developers can integrate them into our project with little effort. Commonly used functionality from these frameworks should also be combined and encapsulated, to reduce redundant work for developers.
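
One way such a wrapper could look, as a sketch only: the names `PromptBackend`, `CallableBackend`, and `ask` are hypothetical and do not come from the project or from any of the frameworks listed. The idea is that each framework is adapted to one small shared interface, so scene code never depends on a specific framework.

```python
from typing import Callable, Protocol


class PromptBackend(Protocol):
    """Hypothetical shared interface that every framework wrapper would expose."""

    def complete(self, prompt: str) -> str:
        ...


class CallableBackend:
    """Adapts any `str -> str` callable, such as a framework's own call entry point."""

    def __init__(self, fn: Callable[[str], str]) -> None:
        self._fn = fn

    def complete(self, prompt: str) -> str:
        return self._fn(prompt)


def ask(backend: PromptBackend, question: str) -> str:
    # Scene code talks only to the shared interface.
    return backend.complete(f"Question: {question}\nAnswer:")


# Stand-in backend for demonstration; a real one would call into a framework.
echo_backend = CallableBackend(lambda prompt: prompt.upper())

if __name__ == "__main__":
    print(ask(echo_backend, "hi"))
```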

At the same time, the project should implement a broad range of useful LLM reasoning strategies and encapsulate them as functional modules. Ideally, each strategy should be packaged as an individual tool, so that an LLM can autonomously select and apply the appropriate strategy as needed.
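
A minimal sketch of packaging strategies as selectable tools, assuming a simple name-to-function registry. The names `STRATEGIES`, `register`, and `build_prompt` are hypothetical, and the two strategies are deliberately trivial; in practice the `chosen` name would come from the LLM's own tool selection.

```python
from typing import Callable, Dict

Strategy = Callable[[str], str]

# Hypothetical registry mapping a tool name to a prompt-building strategy.
STRATEGIES: Dict[str, Strategy] = {}


def register(name: str) -> Callable[[Strategy], Strategy]:
    """Decorator that adds a strategy to the registry under `name`."""

    def decorator(fn: Strategy) -> Strategy:
        STRATEGIES[name] = fn
        return fn

    return decorator


@register("direct")
def direct(task: str) -> str:
    # Pass the task through unchanged.
    return task


@register("chain_of_thought")
def chain_of_thought(task: str) -> str:
    # Append a zero-shot chain-of-thought cue to the task.
    return f"{task}\nLet's think step by step."


def build_prompt(task: str, chosen: str) -> str:
    """Apply the strategy named `chosen`, falling back to 'direct' if unknown."""
    return STRATEGIES.get(chosen, STRATEGIES["direct"])(task)


if __name__ == "__main__":
    print(build_prompt("What is 2 + 2?", "chain_of_thought"))
```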

(todo list here)
