
Comments (6)

lgrammel commented on June 27, 2024

Here is a quick summary:

  1. Actions are registered in the ActionRegistry.
  2. Descriptions of the actions are included in the OpenAI prompt by calling actionRegistry.getAvailableActions(), e.g. through the AvailableActionsSectionPrompt.
  3. actionRegistry.getAvailableActionInstructions() generates an explanation and a detailed list of all actions, using their examples and the formatter being used. For the JsonActionFormat, that prompt section looks like this, for example:
## AVAILABLE ACTIONS
You can perform the following actions using JSON:

### tool.search-wikipedia
Search wikipedia using a search term. Returns a list of pages.
Syntax:
{
  "action": "tool.search-wikipedia",
  "query": "{search query}"
}

### tool.read-wikipedia-article
Read a wikipedia article and summarize it considering the query.
Syntax:
{
  "action": "tool.read-wikipedia-article",
  "url": "https://en.wikipedia.org/wiki/Artificial_intelligence",
  "topic": "{query that you are answering}"
}

### done
Indicate that you are done with the task.
Syntax:
{
  "action": "done"
}

## RESPONSE FORMAT (ALWAYS USE THIS FORMAT)

Explain and describe your reasoning step by step.
Then use the following format to specify the action you want to perform next:

{
  "action": "an action",
  "param1": "a parameter value",
  "param2": "another parameter value"
}

You must always use exactly one action with the correct syntax per response.
Each response must precisely follow the action syntax.
  4. The LLM response can include an action after the text. It is parsed using the ActionFormat parse method, e.g. in DynamicCompositeStep.generateNextStep().
  5. The action is then retrieved from the registry and an action step is created (also in DynamicCompositeStep).
  6. When the step is executed and it is a ToolStep (created by a ToolAction), its executor is invoked.
  7. The tool executor runs the actual code. (A minimal sketch of this whole flow follows this list.)
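
To make steps 1-7 concrete, here is a minimal sketch of the registry / parse / dispatch loop. It is illustrative only, not the actual js-agent source; the names Action, ActionRegistry, parseAction, runAgentLoop, and callLLM are simplified stand-ins:

// An action the LLM can choose; execute() runs the underlying code
// and returns text to feed into the next prompt.
interface Action {
  type: string; // e.g. "tool.search-wikipedia"
  description: string;
  execute(params: Record<string, string>): Promise<string>;
}

class ActionRegistry {
  private actions = new Map<string, Action>();

  register(action: Action): void {
    this.actions.set(action.type, action);
  }

  // Rendered into the "AVAILABLE ACTIONS" prompt section shown above.
  getAvailableActionInstructions(): string {
    return [...this.actions.values()]
      .map((a) => `### ${a.type}\n${a.description}`)
      .join("\n\n");
  }

  get(type: string): Action | undefined {
    return this.actions.get(type);
  }
}

// Naive parser: assumes the action JSON is a single flat braced block
// at the end of the LLM response, after the free-text reasoning.
function parseAction(response: string): Record<string, string> | null {
  const match = response.match(/\{[^{}]*\}\s*$/);
  return match ? (JSON.parse(match[0]) as Record<string, string>) : null;
}

async function runAgentLoop(
  registry: ActionRegistry,
  task: string,
  callLLM: (prompt: string) => Promise<string>
): Promise<string> {
  let observations = "";
  for (;;) {
    const prompt =
      `## TASK\n${task}\n\n## AVAILABLE ACTIONS\n` +
      registry.getAvailableActionInstructions() +
      observations;
    const response = await callLLM(prompt);
    const parsed = parseAction(response);
    if (parsed == null || parsed["action"] === "done") {
      return response; // no action or "done": the agent is finished
    }
    const action = registry.get(parsed["action"]);
    if (action === undefined) {
      throw new Error(`unknown action: ${parsed["action"]}`);
    }
    const { action: _ignored, ...params } = parsed;
    const result = await action.execute(params);
    // Feed the tool result back into the next iteration's prompt.
    observations += `\n\n## RESULT OF ${parsed["action"]}\n${result}`;
  }
}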

Here is more information on the different concepts:

Action
Actions are descriptions of operations that the LLM can decide to do. The LLM is informed about the available actions in the prompt, and if they are part of the response, they are parsed.

Step
The main operation of one iteration of the agent.

Tool
Tools run code on behalf of the agent. The LLM can decide to use tools by choosing a ToolAction in its response. ToolActions create ToolSteps, which run the ToolExecutor.
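
As a rough sketch of how these three concepts might fit together as types (simplified for illustration; the actual js-agent definitions differ in detail):

// A ToolAction is an Action that knows how to create the Step
// that performs it.
interface ToolAction<PARAMS, RESULT> {
  type: string;
  description: string;
  createStep(params: PARAMS): ToolStep<PARAMS, RESULT>;
}

// A ToolExecutor runs the actual code, e.g. an HTTP call to Wikipedia.
interface ToolExecutor<PARAMS, RESULT> {
  execute(params: PARAMS): Promise<RESULT>;
}

// A ToolStep is one iteration's unit of work: it invokes the executor
// and formats the result as text for the next prompt.
class ToolStep<PARAMS, RESULT> {
  constructor(
    private readonly params: PARAMS,
    private readonly executor: ToolExecutor<PARAMS, RESULT>,
    private readonly formatResult: (result: RESULT) => string
  ) {}

  async execute(): Promise<string> {
    const result = await this.executor.execute(this.params);
    return this.formatResult(result);
  }
}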

dcsan commented on June 27, 2024

wow, thanks for the detailed reply! So this is how the system on this side functions and handles calling the tools. This is code that I can probably follow.

But I'm still not clear on how the decision to even include a tool is made, in terms of how the OpenAI model decides whether to include the tool's boilerplate in its response.
E.g. when would it look up a wikipedia article, vs. just replying based on what it knows?

Is this based on sending a pre-prompt/system prompt containing the tool descriptions, and then just hoping the model chooses the tool on relevant occasions and populates the tool's boilerplate with exactly what it wants to look up?

If all of this tool-calling / result-processing code runs in your system, does that allow the OpenAI model to do anything with the tool's output? Is it fed back to that agent somehow? E.g.

user: How many plays did Shakespeare write?
bot: (makes up a reply? or looks it up in wikipedia?)

If it does use the tool to do a lookup, how does it know the result?
How can it form a sentence based on the tool's query results?
Is that tool result passed in as another 'turn' of the conversation? (your 'step' above?)

lgrammel commented on June 27, 2024

Yes, the LLM chooses the tool based on the prompt and provides the arguments as well. The results are then fed into the prompt for the next step (iteration).
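
To connect this to the Shakespeare example above: after a search step runs, the next iteration's prompt could contain a section like the following (the exact format here is illustrative, not the literal js-agent output):

## TASK
How many plays did Shakespeare write?

## RESULT OF tool.search-wikipedia
- William Shakespeare: https://en.wikipedia.org/wiki/William_Shakespeare
- Shakespeare's plays: https://en.wikipedia.org/wiki/Shakespeare%27s_plays

With that section in the prompt, the model can choose another action (e.g. tool.read-wikipedia-article on one of those URLs) or, once it has enough information, answer and emit the "done" action.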

dcsan commented on June 27, 2024

So the LLM is smart enough to take this template

{
  "action": "an action",
  "param1": "a parameter value",
  "param2": "another parameter value"
}

and replace "an action" with eg "tool.read-wikipedia-article", and the same for parameters.
this is implemented / called by the client

How is the result of the action (e.g. reading a wikipedia page) embedded into the bot's response?
Of course we don't want to send the whole wiki page to the user as a reply.
So is there some parameter on actions that says "send the result of this action back to the LLM, rather than routing the output to the user", so the LLM could integrate that knowledge into its next-step (iteration) response?
Or is that just the default, so that executing a tool takes two conversation turns vs. a single LLM ask:reply?

lgrammel commented on June 27, 2024

The agent runs a loop until it's done - it does not immediately respond to the user. The output that you are seeing in the console is just an example from observing the agent run. I plan to add examples of other environments as well.

You can find more details about loops here:

On each iteration of a loop with GenerateNextStepLoop, a prompt is sent to the LLM. Results from previous action steps are included using the resultFormatter associated with the action.

You can find the default result formatters in the actions (you can change them to your liking when you set up your agent), e.g.: https://github.com/lgrammel/gptagent.js/blob/main/packages/agent/src/tool/programmable-google-search-engine/ProgrammableGoogleSearchEngineTool.ts#L26
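
A result formatter is essentially a function from the tool's typed output to the text that gets spliced into the next prompt. A simplified sketch of what one for a search tool might look like (hypothetical types and names; see the linked source for the real signature):

// Turns structured search output into a prompt section for the next step.
type SearchResult = { title: string; link: string; snippet: string };

function formatSearchResults(query: string, results: SearchResult[]): string {
  return [
    `## RESULTS FOR SEARCH "${query}"`,
    ...results.map((r) => `- ${r.title} (${r.link}): ${r.snippet}`),
  ].join("\n");
}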

lgrammel commented on June 27, 2024

I've added some basic documentation: https://js-agent.ai/docs/intro
