
Comments (6)

lgrammel commented on June 27, 2024

Here is a quick summary:

  1. Actions are registered in the ActionRegistry.
  2. Descriptions of the actions are included in the OpenAI prompt by calling actionRegistry.getAvailableActions(), e.g. through the AvailableActionsSectionPrompt.
  3. actionRegistry.getAvailableActionInstructions() generates an explanation and a detailed list of all actions, using their examples and the formatter being used. For the JsonActionFormat, that prompt section looks like this, for example:
## AVAILABLE ACTIONS
You can perform the following actions using JSON:

### tool.search-wikipedia
Search wikipedia using a search term. Returns a list of pages.
Syntax:
{
  "action": "tool.search-wikipedia",
  "query": "{search query}"
}

### tool.read-wikipedia-article
Read a wikipedia article and summarize it considering the query.
Syntax:
{
  "action": "tool.read-wikipedia-article",
  "url": "https://en.wikipedia.org/wiki/Artificial_intelligence",
  "topic": "{query that you are answering}"
}

### done
Indicate that you are done with the task.
Syntax:
{
  "action": "done"
}

## RESPONSE FORMAT (ALWAYS USE THIS FORMAT)

Explain and describe your reasoning step by step.
Then use the following format to specify the action you want to perform next:

{
  "action": "an action",
  "param1": "a parameter value",
  "param2": "another parameter value"
}

You must always use exactly one action with the correct syntax per response.
Each response must precisely follow the action syntax.
  4. The LLM response can include an action after the text. It is parsed using the ActionFormat parse method, e.g. in DynamicCompositeStep.generateNextStep().
  5. The action is then retrieved from the registry and an action step is created (also in DynamicCompositeStep).
  6. When the step is executed and it is a ToolStep (created by a ToolAction), its executor is invoked.
  7. The tool executor runs the actual code. (A minimal sketch of this whole flow follows this list.)
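
To make steps 1-7 concrete, here is a minimal sketch of the registry / parse / dispatch loop. It is illustrative only, not the actual js-agent source; the names Action, ActionRegistry, parseAction, runAgentLoop, and callLLM are simplified stand-ins:

// An action the LLM can choose; execute() runs the underlying code
// and returns text to feed into the next prompt.
interface Action {
  type: string; // e.g. "tool.search-wikipedia"
  description: string;
  execute(params: Record<string, string>): Promise<string>;
}

class ActionRegistry {
  private actions = new Map<string, Action>();

  register(action: Action): void {
    this.actions.set(action.type, action);
  }

  // Rendered into the "AVAILABLE ACTIONS" prompt section shown above.
  getAvailableActionInstructions(): string {
    return [...this.actions.values()]
      .map((a) => `### ${a.type}\n${a.description}`)
      .join("\n\n");
  }

  get(type: string): Action | undefined {
    return this.actions.get(type);
  }
}

// Naive parser: assumes the action JSON is a single flat braced block
// at the end of the LLM response, after the free-text reasoning.
function parseAction(response: string): Record<string, string> | null {
  const match = response.match(/\{[^{}]*\}\s*$/);
  return match ? (JSON.parse(match[0]) as Record<string, string>) : null;
}

async function runAgentLoop(
  registry: ActionRegistry,
  task: string,
  callLLM: (prompt: string) => Promise<string>
): Promise<string> {
  let observations = "";
  for (;;) {
    const prompt =
      `## TASK\n${task}\n\n## AVAILABLE ACTIONS\n` +
      registry.getAvailableActionInstructions() +
      observations;
    const response = await callLLM(prompt);
    const parsed = parseAction(response);
    if (parsed == null || parsed["action"] === "done") {
      return response; // no action or "done": the agent is finished
    }
    const action = registry.get(parsed["action"]);
    if (action === undefined) {
      throw new Error(`unknown action: ${parsed["action"]}`);
    }
    const { action: _ignored, ...params } = parsed;
    const result = await action.execute(params);
    // Feed the tool result back into the next iteration's prompt.
    observations += `\n\n## RESULT OF ${parsed["action"]}\n${result}`;
  }
}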

Here is more information on the different concepts:

Action
Actions are descriptions of operations that the LLM can decide to do. The LLM is informed about the available actions in the prompt, and if they are part of the response, they are parsed.

Step
The main operation of one iteration of the agent.

Tool
Tools run code on behalf of the agent. The LLM can decide to use tools by choosing a ToolAction in its response. ToolActions create ToolSteps, which run the ToolExecutor.
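
As a rough sketch of how these three concepts might fit together as types (simplified for illustration; the actual js-agent definitions differ in detail):

// A ToolAction is an Action that knows how to create the Step
// that performs it.
interface ToolAction<PARAMS, RESULT> {
  type: string;
  description: string;
  createStep(params: PARAMS): ToolStep<PARAMS, RESULT>;
}

// A ToolExecutor runs the actual code, e.g. an HTTP call to Wikipedia.
interface ToolExecutor<PARAMS, RESULT> {
  execute(params: PARAMS): Promise<RESULT>;
}

// A ToolStep is one iteration's unit of work: it invokes the executor
// and formats the result as text for the next prompt.
class ToolStep<PARAMS, RESULT> {
  constructor(
    private readonly params: PARAMS,
    private readonly executor: ToolExecutor<PARAMS, RESULT>,
    private readonly formatResult: (result: RESULT) => string
  ) {}

  async execute(): Promise<string> {
    const result = await this.executor.execute(this.params);
    return this.formatResult(result);
  }
}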

dcsan commented on June 27, 2024

wow, thanks for the detailed reply! So this is how the system on this side functions and handles calling the tools. This is code that I can probably follow.

But I'm still not clear on how the decision to even include a tool is made, in terms of how the OpenAI model decides whether to include the tool's boilerplate in its response.
E.g. when would it look up a wikipedia article, vs. just replying based on what it knows?

Is this based on sending a pre-prompt/system prompt containing the tool descriptions, and then just hoping the model chooses the tool on relevant occasions and populates the tool's boilerplate with exactly what it wants to look up?

If all of this tool-calling / result-processing code runs in your system, does that allow the OpenAI model to do anything with the tool's output? Is it fed back to that agent somehow? E.g.

user: How many plays did Shakespeare write?
bot: (makes up a reply? or looks it up in wikipedia?)

If it does use the tool to do a lookup, how does it know the result?
How can it form a sentence based on the tool's query results?
Is that tool result passed in as another 'turn' of the conversation? (your 'step' above?)

lgrammel commented on June 27, 2024

Yes, the LLM chooses the tool based on the prompt and provides the arguments as well. The results are then fed into the prompt for the next step (iteration).
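
To connect this to the Shakespeare example above: after a search step runs, the next iteration's prompt could contain a section like the following (the exact format here is illustrative, not the literal js-agent output):

## TASK
How many plays did Shakespeare write?

## RESULT OF tool.search-wikipedia
- William Shakespeare: https://en.wikipedia.org/wiki/William_Shakespeare
- Shakespeare's plays: https://en.wikipedia.org/wiki/Shakespeare%27s_plays

With that section in the prompt, the model can choose another action (e.g. tool.read-wikipedia-article on one of those URLs) or, once it has enough information, answer and emit the "done" action.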

dcsan commented on June 27, 2024

So the LLM is smart enough to take this template

{
  "action": "an action",
  "param1": "a parameter value",
  "param2": "another parameter value"
}

and replace "an action" with eg "tool.read-wikipedia-article", and the same for parameters.
this is implemented / called by the client

How is the result of the action (e.g. reading a wikipedia page) embedded into the bot's response?
Of course we don't want to send the whole wiki page to the user as a reply.
So is there some parameter on actions that says "send the result of this action back to the LLM, rather than routing the output to the user", so the LLM could integrate that knowledge into its next-step (iteration) response?
Or is that just the default, so that executing a tool takes two conversation turns vs. a single LLM ask:reply?

lgrammel commented on June 27, 2024

The agent runs a loop until it's done - it does not immediately respond to the user. The output that you are seeing in the console is just an example from observing the agent run. I plan to add examples of other environments as well.

You can find more details about loops here:

On each iteration of a loop with GenerateNextStepLoop, a prompt is sent to the LLM. Results from previous action steps are included using the resultFormatter associated with the action.

You can find the default result formatters in the actions (you can change them to your liking when you set up your agent), e.g.: https://github.com/lgrammel/gptagent.js/blob/main/packages/agent/src/tool/programmable-google-search-engine/ProgrammableGoogleSearchEngineTool.ts#L26
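
A result formatter is essentially a function from the tool's typed output to the text that gets spliced into the next prompt. A simplified sketch of what one for a search tool might look like (hypothetical types and names; see the linked source for the real signature):

// Turns structured search output into a prompt section for the next step.
type SearchResult = { title: string; link: string; snippet: string };

function formatSearchResults(query: string, results: SearchResult[]): string {
  return [
    `## RESULTS FOR SEARCH "${query}"`,
    ...results.map((r) => `- ${r.title} (${r.link}): ${r.snippet}`),
  ].join("\n");
}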

lgrammel commented on June 27, 2024

I've added some basic documentation: https://js-agent.ai/docs/intro
