Comments (6)
Here is a quick summary:
- Actions are registered in the ActionRegistry.
- Descriptions of the actions are included in the OpenAI prompt by calling actionRegistry.getAvailableActions(), e.g. through the AvailableActionsSectionPrompt.
- actionRegistry.getAvailableActionInstructions() generates an explanation and a detailed list of all actions, using their examples and the formatter being used. For the JsonActionFormat, that prompt section looks e.g. like this:
## AVAILABLE ACTIONS
You can perform the following actions using JSON:
### tool.search-wikipedia
Search wikipedia using a search term. Returns a list of pages.
Syntax:
{
"action": "tool.search-wikipedia",
"query": "{search query}"
}
### tool.read-wikipedia-article
Read a wikipedia article and summarize it considering the query.
Syntax:
{
"action": "tool.read-wikipedia-article",
"url": "https://en.wikipedia.org/wiki/Artificial_intelligence",
"topic": "{query that you are answering}"
}
### done
Indicate that you are done with the task.
Syntax:
{
"action": "done"
}
## RESPONSE FORMAT (ALWAYS USE THIS FORMAT)
Explain and describe your reasoning step by step.
Then use the following format to specify the action you want to perform next:
{
"action": "an action",
"param1": "a parameter value",
"param2": "another parameter value"
}
You must always use exactly one action with the correct syntax per response.
Each response must precisely follow the action syntax.
- The LLM response can include an action after the text. It is parsed using the ActionFormat parse method, e.g. in DynamicCompositeStep.generateNextStep()
- The action is then retrieved from the registry and an action step is created (also in DynamicCompositeStep).
- When the step is executed and it is a ToolStep (which is created by a ToolAction), its executor is invoked.
- The tool executor runs the actual code.
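The flow above can be sketched roughly like this (a minimal sketch with illustrative names, not the actual js-agent API): the LLM's free-form reasoning is followed by a JSON action, which is parsed out and looked up in a registry.

```typescript
// Hypothetical sketch: extract the trailing JSON action from an LLM
// response and resolve it against a simple action registry.

type Action = {
  id: string;
  execute: (params: Record<string, string>) => string;
};

class ActionRegistry {
  private actions = new Map<string, Action>();
  register(action: Action) {
    this.actions.set(action.id, action);
  }
  get(id: string): Action {
    const action = this.actions.get(id);
    if (!action) throw new Error(`Unknown action: ${id}`);
    return action;
  }
}

// Split 'reasoning text ... { "action": ..., ... }' into text + action call.
function parseResponse(response: string): {
  text: string;
  call?: { action: string; params: Record<string, string> };
} {
  const jsonStart = response.indexOf("{");
  if (jsonStart < 0) return { text: response.trim() };
  const { action, ...params } = JSON.parse(response.slice(jsonStart));
  return { text: response.slice(0, jsonStart).trim(), call: { action, params } };
}

const registry = new ActionRegistry();
registry.register({
  id: "tool.search-wikipedia",
  execute: (params) => `searched for: ${params.query}`,
});

const parsed = parseResponse(
  'I should search first.\n{ "action": "tool.search-wikipedia", "query": "Shakespeare plays" }'
);
const result = registry.get(parsed.call!.action).execute(parsed.call!.params);
```

The real parser lives in the ActionFormat's parse method; this only illustrates the text-then-JSON convention described above.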
Here is more information on the different concepts:
Action
Actions are descriptions of operations that the LLM can decide to perform. The LLM is informed about the available actions in the prompt, and any action included in its response is parsed.
Step
The main operation of one iteration of the agent.
Tool
Tools run code on behalf of the agent. The LLM can decide to use tools by choosing a ToolAction in its response. ToolActions create ToolSteps, which run the ToolExecutor.
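The ToolAction → ToolStep → executor chain can be sketched like this (hypothetical class shapes, not the exact js-agent types):

```typescript
// Hypothetical sketch of how a ToolAction creates a ToolStep whose
// executor runs the actual code.

type ToolExecutor = (input: Record<string, string>) => string;

class ToolStep {
  constructor(
    private executor: ToolExecutor,
    private input: Record<string, string>
  ) {}
  execute(): string {
    // the executor is where the real work (HTTP call, file I/O, ...) happens
    return this.executor(this.input);
  }
}

class ToolAction {
  constructor(public id: string, private executor: ToolExecutor) {}
  createStep(input: Record<string, string>): ToolStep {
    return new ToolStep(this.executor, input);
  }
}

const searchWikipedia = new ToolAction(
  "tool.search-wikipedia",
  ({ query }) => `results for "${query}"`
);
const step = searchWikipedia.createStep({ query: "Hamlet" });
const output = step.execute();
```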
from js-agent.
wow thanks for the detailed reply! so this is how the system on this side functions and handles calling the tools. this is code that I can probably follow.
But I'm still not clear how the decision to even include a tool is made, in terms of how the OpenAI model decides whether to include the tool's boilerplate in its response?
e.g. when would it look up a Wikipedia article, vs. just replying based on what it knows?
Is this based on sending a pre-prompt/system prompt containing the tool description, and then just hoping the model chooses the tool on relevant occasions and populates the tool's boilerplate with exactly what it wants to look up?
if all this tool-calling code / result processing is done in your system, does that allow the OpenAI model to do anything with the tool's output? Is it fed back to that agent somehow? e.g.
user: How many plays did Shakespeare write?
bot: (makes up a reply? or looks up in wikipedia?)
if it does use the tool to do a lookup, how does it know the result?
how can it form a sentence based on the tool's query results?
is that tool result passed in as another 'turn' of the conversation? (your 'step' above?)
from js-agent.
Yes, the LLM chooses the tool based on the prompt and provides the arguments as well. The results are then fed into the prompt for the next step (iteration).
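That feedback loop can be sketched roughly like this (fakeLlm stands in for the real model, and all names are illustrative): each iteration builds a prompt that includes previous tool results, so the model can answer based on them.

```typescript
// Hypothetical sketch of the agent loop: tool results are appended to
// the prompt for the next iteration until the model says it is done.

type StepRecord = { action: string; result: string };

function buildPrompt(task: string, history: StepRecord[]): string {
  const recentSteps = history
    .map((s) => `Action ${s.action} returned: ${s.result}`)
    .join("\n");
  return `TASK: ${task}\n${recentSteps}`;
}

// Fake LLM: picks the search tool first, then "done" once it has
// seen a tool result in the prompt.
function fakeLlm(prompt: string): string {
  return prompt.includes("returned:")
    ? '{ "action": "done" }'
    : '{ "action": "tool.search-wikipedia", "query": "Shakespeare plays" }';
}

const history: StepRecord[] = [];
let answeredDone = false;
for (let i = 0; i < 5 && !answeredDone; i++) {
  const response = fakeLlm(
    buildPrompt("How many plays did Shakespeare write?", history)
  );
  const { action, ...params } = JSON.parse(response);
  if (action === "done") {
    answeredDone = true;
    break;
  }
  // run the tool and record its result for the next iteration's prompt
  history.push({ action, result: `search results for ${params.query}` });
}
```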
from js-agent.
so the LLM is smart enough to fill in this template
{
"action": "an action",
"param1": "a parameter value",
"param2": "another parameter value"
}
and replace "an action" with e.g. "tool.read-wikipedia-article",
and the same for the parameters.
this is implemented / called by the client
How is the result of the action (e.g. reading a wikipedia page) embedded into the bot's response?
of course we don't want to send the wiki page to the user as a reply.
So is there some parameter on actions that says "send the result of an action back to the LLM, rather than routing the output to the user" so the LLM could integrate that knowledge into the next step (iteration) response?
Or is that just the default, that executing a tool will take two conversation turns vs a single LLM ask:reply ?
from js-agent.
The agent runs a loop until it's done - it does not immediately respond to the user. The output that you are seeing in the console is just an example from observing the agent run. I plan to add examples of other environments as well.
You can find more details about loops here:
- https://github.com/lgrammel/gptagent.js/blob/main/packages/agent/src/step/Loop.ts
- https://github.com/lgrammel/gptagent.js/blob/main/packages/agent/src/step/GenerateNextStepLoop.ts (this is the type of loop used by the wikipedia QA example)
On each iteration of a loop with GenerateNextStepLoop, a prompt is sent to the LLM. Results from previous action steps are included using the resultFormatter associated with the action:
- https://github.com/lgrammel/gptagent.js/blob/main/packages/agent/src/prompt/RecentStepsPrompt.ts#L48
You can find the default result formatters in the actions (you can change them to your liking when you set up your agent), e.g.: https://github.com/lgrammel/gptagent.js/blob/main/packages/agent/src/tool/programmable-google-search-engine/ProgrammableGoogleSearchEngineTool.ts#L26
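The idea of a per-action result formatter can be sketched like this (type and variable names are assumed for illustration, not the exact js-agent API): it controls how a tool's raw output is rendered into the next prompt.

```typescript
// Hypothetical sketch: a result formatter turns structured tool output
// into the text that gets embedded in the next prompt.

type ResultFormatter<T> = (output: T) => string;

const searchResultFormatter: ResultFormatter<{ title: string; url: string }[]> =
  (results) => results.map((r) => `- ${r.title} (${r.url})`).join("\n");

const formatted = searchResultFormatter([
  {
    title: "Shakespeare's plays",
    url: "https://en.wikipedia.org/wiki/Shakespeare%27s_plays",
  },
]);
```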
from js-agent.
I've added some basic documentation: https://js-agent.ai/docs/intro
from js-agent.