sirji-ai / sirji Goto Github PK

View Code? Open in Web Editor NEW

69.0 9.0 19.0 1.02 MB

Sirji is a framework to build & run custom AI agents for your everyday dev tasks.

Home Page: https://sirji.ai

License: MIT License

Python 48.78% JavaScript 12.54% TypeScript 28.42% CSS 5.38% HTML 4.88%

ai ai-agents vscode-extension

sirji's Introduction

Sirji is a framework to build & run custom AI agents for your everyday dev tasks.

Built with ❤️ by True Sparrow

Sirji

Sirji is a framework designed to build and run custom AI agents for your everyday development tasks.

Sirji has two main product components: Sirji Studio and Sirji VS Code Extension.

Sirji Studio

We have built the framework that allows the developer community to build custom agents simply by writing pseudocode in plain English.

Custom agents help capture and convey the developer's code writing style and domain knowledge to Sirji. They performs a specific task based on a custom pseudocode. The community can create a custom agent either by modifying an existing agent or by writing an entirely new agent with a different pseudocode.

Here is a guide for writing agent pseudocode.

Sirji VS Code Extension

We have build and released the VS Code Extension on the Visual Studio Marketplace.

This extension has the following features implemented:

Interactive chat interface allows user to submit their problem statements and give feedback to Sirji.
Messaging protocol implements the allowed response templates for messages exchanged between various agents.
Orchestrator enables requirement gathering, recipe selection, recipe execution by invoking agents
Executor makes these functionalities accessible to the agents: file system access, search, find & replace, insert text in project files, install packages, execute commands, run code, run test cases.
Agent Sessions provide the ability to invoke an agent with a fresh LLM conversation or continue on an existing LLM conversation.
Logs and Token Usage Summary are displayed alongside the interactive chat interface.

Installation

You can start using Sirji by installing this extension from the Visual Studio Marketplace.

Make sure you have installed all of the following prerequisites on your machine:

Visual Studio Code (>= 1.80.2)
Node.js (>= 18) and npm (>= 8.19)
Python (>= 3.10) - Make sure python --version runs without error.
tee command - Make sure which tee runs without error.

Also, you will need an OpenAI API key to access the GPT-4o model.

Demo Video

Here's a three-minute demo. We tasked Sirji with creating a new API and its test cases in an existing Node.js repository. Sirji uses these custom agents we developed:

Code Planner: Generates an implementation guide with steps and code snippets based on domain knowledge.
Test Planner: Generates an implementation guide for test cases, following the existing framework and conventions.
Code Writer: Implements the code changes specified in the guides.

Watch on YouTube: https://www.youtube.com/watch?v=NA7uPIvcvmg

Note: Sirji is still rough on the edges. We are also working on better documentation (particularly for Sirji Studio). So we suggest if you want to try Sirji, please let us give you a walkthrough. You can book a call with us here.

Architecture

Let's go step by step in understanding the architecture.

Agent

An Agent in the Sirji framework is a modular AI component that performs a specific task based on a custom pseudo code.

An agent defines its skills in an agent.yml file. This file lists the skills of the agent and the pseudo code to follow for each skill. It also specifies the LLM provider and model to be used for inference. Additionally, it lists the sub-agents that can be invoked from within the agent to complete certain parts of the task.

Recipe (recipe.json)

A Recipe is a file that lists prescribed tasks and tips for solving a particular problem. It also indicates which agent should perform each task. The tips provide guidance for addressing issues that arise when tasks are performed out of the prescribed order. Each tip specifies the task and the agent responsible for it.

Orchestrator

The Orchestrator is the central component in the Sirji framework, responsible for following:

Gathering requirement from the user.
Showing list of available recipes to the user and asking for their choice.
Reading the selected recipe and managing the flow & execution of prescribed tasks from the selected recipe.

Agent Sessions

An agent can be invoked in either a fresh session or asked to continue an existing session. When invoked in a new session, it starts with a new system prompt and does not retain the context from the previous session. Sessions help keep the context focused on specific tasks.

Messaging Protocol

The messaging protocol defines how the response from an LLM inference for an agent should appear. It specifies the recipient-specific allowed Response Templates. These Response Templates also adhere to an interface that mandates the presence of keys: FROM, TO, BODY, SUMMARY, and ACTION. The BODY may contain an ACTION-specific information schema.

Project Folder

The Project Folder is the primary directory for accessing all user-specific project files, including code files, documentation, and other relevant resources. When initializing Sirji, the user selects this folder as the primary workspace for the project.

Agent Output Folder

The Agent Output Folder is designated for storing the results and data outputs generated by the agents. This folder is different from the project folder and this ensures that operational data is kept separate from project files.

Agent Output Index

The Agent Output Index is an index file for the Agent Output Folder that keeps track of all files written by agents in that folder along with the a brief description of the file contents.

PyPI Packages

We have published following 3 PyPI packages, implementing different responsibilities. These packages are invoked by Python Adapter Scripts, which are spawned by the extension.

sirji-agents

sirji-agents (located in the agents folder of this monorepo) is a PyPI package that implements the following components of the Sirji AI agentic framework:

Orchestrator: The Orchestrator is the central component in the Sirji framework, responsible for managing the flow and execution of tasks across different agents.
Generic Agent: Run time composable class providing the agent functionality as per the pseudo code provided in the agent.yml file.
Research Agent: Utilizes RAG (Retrieval-Augmented Generation) and gets trained on URLs and search terms.

sirji-messages

sirji-messages (located in the messages folder of this monorepo) is a PyPI package that implements the Sirji messaging protocol with the following highlights:

Message Factory: A factory that provides a Message class for a given action.
Message Parser: Parse structured message strings into Python dictionaries for easy access to the message components.
Allowed Response Templates: Provides the part of the system prompt describing allowed Response Templates for a given agent pair.
Custom Exceptions: A set of custom exceptions thrown by the message parser.
Enums for Agents and Actions: Provides easy auto-completion while writing code.

sirji-tools

sirji-tools (located in the tools folder of this monorepo) implements these tools:

Crawling (downloading web pages to markdown files)
Searching on Google
Custom Logging

Architecture Diagram

Roadmap

We are calling our next release the ‘Core’ Release (ONGOING).

Here is the link to the ‘Core’ release’s roadmap: https://github.com/orgs/sirji-ai/projects/5

This is a significant release focused on the following key areas:

User accounts: Users will be required to create an account with Sirji. They can either bring their own LLM key or subscribe to a free but rate-limited Developer plan.
Improve reliability: The first version of the VS Code extension improved usability, but after using it ourselves for a while, we identified several issues and limitations ranging from incomplete solutions to a lack of web debugging capabilities. We are addressing these issues to make Sirji more reliable in solving software problems.
Custom agents and recipes: We are developing the framework to enable users to create and use custom agents and recipes (instructions on how the agents interact). This involves enhancing the orchestration functionality and refactoring existing base agents.

Contributing

We welcome contributions to Sirji! If you're interested in helping improve this VS Code extension, please take a look at our Contributing Guidelines for more information on how to get started.

Thank you for considering contributing to Sirji. We look forward to your contributions!

Reporting Issues

If you run into any issues or have suggestions, please report them by following our issue reporting guidelines. Your reports help us make Sirji better for everyone.

Stay In Touch

Office Hours

License

Distributed under the MIT License. See LICENSE for more information.

sirji's People

Contributors

Stargazers

Watchers

Forkers

sunilkhedar kedarchandrayan docxology shinsudesu vital121 private-gpt v-r-dighe devparanjay gadharisachin dhruvtailor7 mohitcharkha aman-barbaria somsbm alpeshvmodi atharva101 ajinkyac03 ravikadam

sirji's Issues

Show the running total for tokens consumed and dollar cost

To get an idea of the dollar cost for each run, show the running total of tokens consumed and the running dollar cost.

Tasks

From sirji-agents, return the tokens consumed (prompt and completion) in the single message processing.
Persist the running total in Coder, Planner and Researcher conversations files.
Show total tokens consumed and dollar amount at their respective places on the chat window.

Refactor LLM Inference to Use Factory Pattern and Support Environment Variable Configurations

Description

We are looking to enhance the adaptability of our LLM inference mechanism in the LLMAgentBase within agents/sirji_agents/llm/base.py. This task involves refactoring the current implementation to incorporate the Factory Design Pattern, enabling dynamic selection of LLM providers (e.g., OpenAI) based on environment variables, SIRJI_MODEL_PROVIDER, SIRJI_MODEL, SIRJI_MODEL_PROVIDER_API_KEY.

Moreover the existing environment variable SIRJI_OPENAI_API_KEY will still be used in the Research Agent and should be retained.

This issue is a great chance for contributors interested in design patterns.

Requirements

Refactor the LLMAgentBase to use the Factory Design Pattern for LLM provider instantiation.
Use environment variables SIRJI_MODEL_PROVIDER (to specify the model provider), SIRJI_MODEL (to specify the model), and SIRJI_MODEL_PROVIDER_API_KEY (to specify the API key).
Ensure extensibility for incorporating future LLM providers.

What You Will Do

Create an LLM Provider Interface: Define an abstract base class (interface) for LLM providers. This should declare common methods all providers must implement, such as initializing with an API key (SIRJI_MODEL_PROVIDER_API_KEY) and retrieving model responses.
Implement Provider-Specific Classes: For each LLM provider (starting with OpenAI), implement a class that adheres to the LLM provider interface. It should encapsulate all logic specific to interacting with the given LLM's API.
Develop a Factory Class: Create a factory class responsible for instantiating LLM provider objects. This class should:
- Read the SIRJI_MODEL_PROVIDER and SIRJI_MODEL environment variables.
- Dynamically instantiate and return an instance of the correct LLM provider class based on these variables.
Integrate Factory into LLMAgentBase: Modify LLMAgentBase to use the factory for creating an LLM provider instance during its initialization, rather than directly creating an OpenAI client.
Documentation and Comments: Update or add comments in the code to reflect these changes. Document how one could add a new LLM provider following the implemented pattern in agents/README.md file.

Sirji's Extendable AI Agentic Framework Architecture

Sirji is an AI agentic framework designed as a VS Code extension to aid developers in solving a range of software development problems.

Let's go step by step in understanding the architecture.

Agent

An Agent in the Sirji framework is a modular AI component that performs a specific task based on a custom pseudo code.

Example agent.yml file

id: NODE_JS_API_CODER
name: Node JS API Coder

llm:
  provider: openai
  model: gpt-4o

sub_agents:
    - id: CODE_PLANNER
    - id: CODE_WRITER

skills:
  - skill: "Developing robust backend REST APIs using Node.js and Express, integrating Sequelize ORM with PostgreSQL databases, and implementing Redis for efficient caching solutions."
    pseudo_code: |
      1. INVOKE_AGENT CODE_PLANNER to analyze problem statements, identify relevant files, thoroughly review their contents, identify relevant dependencies, thoroughly review their contents too, and create a detailed implementation guide.
      2. INVOKE_AGENT CODE_WRITER to read the implementation guide, extract necessary file changes, apply these modifications to the project files.
      3. Respond back to the agent who invoked you, detailing your actions in the BODY.

Recipe (recipe.json)

Example Recipe File

{
  "prescribed_tasks": [
    {
      "task": "Write epics and user stories.",
      "agent": "PRODUCT_MANAGER"
    },
    {
      "task": "Write architecture components.",
      "agent": "ARCHITECT"
    },
    {
      "task": "Implement the epic & user stories using the architecture components.",
      "agent": "CODER"
    }
  ],
  "tips": [
    {
      "deviation": "Architecture components are not in sync with the epics and user stories.",
      "agent": "PRODUCT_MANAGER",
      "task": "Make the epics and user stories consistent with the architecture components."
    }
  ]
}

Recipe Index

The Recipe Index is a file that stores metadata about all the recipes in the Sirji framework. It contains details such as the description of each recipe and its file path.

[
  {
    "description": "Description of the recipe",
    "file": "Path to the recipe file"
  }
]

Agent Sessions

An Agent can be invoked in either a fresh session or asked to continue an existing session. When invoked in a new session, it starts with a new system prompt and does not retain the context from the previous session. Sessions help keep the context focused on specific tasks.

Orchestrator and User-Driven Configuration

The Orchestrator is the central component in the Sirji framework, responsible for managing the flow and execution of tasks across different agents. It coordinates the execution of tasks and ensures user-driven customization is integrated effectively.

Workflow

Problem Statement Analysis: The Orchestrator first asks the user for a problem statement. Depending on the requirements, such as a GitHub issue URL, it gathers necessary information by fetching issue content and cloning the repository. It then documents the detailed problem statement in the Shared Resources.
Recipe Selection and Customization: The Orchestrator selects the appropriate recipe for the session. If the user opts to reuse a previously loaded recipe and agents, it skips automatic recipe selection. Otherwise, it uses the Recipe Index and the problem statement to determine the most suitable recipe, storing the selected recipe.json file in the "active_recipe" folder. Additionally, it stores the agent.json files in the "active_recipe/agents" folder. This directory also contains a README file that explains various terminologies, provides descriptions of each file, and offers instructions on how to customize the recipes and agents, as well as guidance on publishing custom recipes and agents by making pull requests to the Sirji GitHub repository.
User Customization: Users can edit the files within the "active_recipe" folder to tailor the setup to their specific needs.
User Confirmation: Once the user finalizes their modifications in the "active_recipe" folder, they notify Sirji to proceed. Sirji then copies the whole "active_recipe" folder to the current session's folder to maintain a reference of the configurations used.
Configuration Preservation: The config.json file inside the "active_recipe" folder includes a retain_changes setting (defaults to false). If set to true, Sirji preserves these customizations and skips auto-selection in future sessions.
Task Execution and Management: The Orchestrator interprets the active recipe to identify and assign tasks to the appropriate agent. It also uses the "tips" in the recipe to handle deviations from the prescribed task order, ensuring smooth progression.

Project Folder

Agent Output Folder

Agent Output Index

The Agent Output Index is an index file for the Agent Output Folder that keeps track of all files written by agents in that folder along with the a brief description of the file contents.

Example Shared Resources index.json file

{
  "SIRJI/problem.txt": {
    "description": "Problem statement from the SIRJI_USER.",
    "created_by": "SIRJI"
  },
  "CODE_PLANNER/implementation_guide.md": {
    "description": "This file contains a detailed implementation guide for developing the ...",
    "created_by": "CODE_PLANNER"
  }
}

PyPI package for Crawler and search tools used by Sirji.

The problem to solve

To make the Sirji tools (crawler, search, and logger) reusable between different user interfaces like GUI, VS Code extension, this issue aims at publishing them as a package.

Tasks

Crawler tool to download as markdown
Search tool to fetch search results for a term
Logger tool to implement the custom logger
Test cases with threshold of 80% coverage
README with usages
Publish to PyPI
Search using the search page parse
Integrate with VS Code extension

Workspace Folder Initialization to Distinguish Between Empty and Non-empty Directories

Description

In the current version of Sirji, it is not aware of existing files in the workspace folder, unless specified in the problem statement. Due to this, it defaults to a greenfield development rather than building up on the existing code.

To better accommodate both scenarios (new projects and continuation of existing projects), Sirji should be enhanced to recognize whether a selected workspace folder is empty or not and respond appropriately:

Case 1: Empty Workspace Folder
- If the workspace folder is selected and found to be empty, treat it as the beginning of a new greenfield development project. This scenario is suitable for users wishing to start a fresh project.
Case 2: Non-empty Workspace Folder
- If the workspace folder is selected and found to have existing files and folders, prompt the user with a confirmation message. This message should inform them that they are about to work in a non-empty workspace. It is essential to alert the user to prevent unintentional modification of existing files.

Expected Behavior After Enhancement:

If the user confirms the dialogue about a non-empty workspace, Sirji should proceed with accessing the existing files, allowing the user to build upon them.
If the workspace is empty or the user confirms starting anew, the project will be treated as a greenfield development.

Tasks

Confirmation UI - TODO elaborate

Make Sirji compatible with macOS, Linux and Windows

Identify and remove OS hardcodings.
Sirji should identify it's OS and then work accordingly.

Simplified Package Installation with Pre-Check

Description

Presently Sirji installs packages at the machine level which can interfere with machine-level installed versions. We need to avoid this as much as possible.

Tasks

For programming language-specific packages that can be installed locally, Siji should proceed to install them directly without prompting for confirmation.
For other installations, that do not fit the above, Sirji should ask for confirmation.
Additionally, before initiating the installation process, the system will first verify if the package is already installed by checking its version.

Introduce Code Review Agent

Description

We've noticed several recurring issues with Sirji where it's not fully implementing the expected functionalities. Despite indicating that a feature has been fully implemented, we are observing significant gaps. The key problems include:

Frontend and Backend Integration: Often, Sirji skips the necessary integration between the frontend and backend parts of the projects. This leaves the features non-functional even though Sirji reports completion.
Frontend Implementation Issues: In some scenarios, the frontend components of a feature are not implemented at all.
Linking Between Files Is Missing: Sirji frequently neglects to link files correctly. This is crucial for the functionality as it requires files to be required in one another for the code to work as intended. For example, while developing a React App, it usually misses linking files from the main App.js file.
Assumed Changes: There have been instances where Sirji acts as though certain changes or additions have been made to the code without actually implementing them. This is misleading and results in incomplete features.

Proposed Solution

Introduce the Code Review Agent that will ensure that the code aligns with the expected outcomes.

The responsibilities of the Code Review Agent will include:

Reviewing the code against the user stories to ensure alignment with expected outcomes.
Find incomplete implementations, if any, in the generated code.
Providing feedback on the code review and writing it in shared resources.
Ensuring that adjustments are made based on the feedback before a task is marked as complete.
Publish its skills & messages and implement the agent interface.

Loading indicators next to sirji and the user

Show loading indicators next to sirji to show that sirji is working on something, and next to the user to show that the user input is needed

Tasks

Add loading indicator next to sirji chat message to indicate sirji is working on something
Add loading indicator next to user chat message in case of user input is needed

Server Process Management Issues

Description

We're encountering multiple issues with how Sirji handles server processes, particularly around starting, restarting, and logging.

Below are the specific problems:

Server Start-Up Failure: Occasionally, Sirji forgets to start the server as expected after the solution is complete.
Server Restart Issue: Code changes do not trigger a server restart, which is essential for real-time updates.
Log File Overwriting: The logs from running servers are not uniquely preserved; they risk being overwritten by subsequent processes.
Log File Handling: Currently, for the 'run server' command, Sirji sends only the log file path to LLM without the actual log content, hindering issue detection and resolution.

These issues significantly hinder our development workflow and require immediate attention for a fix.

Unclear Confirmation Requests from Sirji

Description

Users have reported difficulty in understanding whether Sirji requires confirmation from them.

Tasks

Sirji should pose explicit questions whenever clarification is needed. An ideal question format would be direct, such as "Should we continue?"
It is crucial that such questions are asked only when necessary.

Test cases for sirji-agents PyPI package

Description

Write test cases with minimum 80% coverage for sirji-agents PyPI package

Tasks

Mock API calls for OpenAI Completions API and OpenAI Assistants API
Test Cases

UI changes to the toolbar

Show default, active, and inactive states to tab links UI in the toolbar.

Tasks

Update tab links UI in the toolbar
Add default, active, and inactive states to tab links UI in the toolbar
Add default, active, and inactive states to icons in the toolbar
When log panel is empty show the message - This is where the logs from the researcher will be.

Files Created Outside Workspace

Description

There is an issue where Sirji occasionally attempts to create files and folders outside of its designated workspace. This behavior is concerning because it has the potential to interfere with the user's own files. A specific instance of this problem was observed when Sirji navigated to the home directory (~) and started creating files there.

Add validations in EXECUTOR for:
- When trying to write a file inside of the workspace and getting a path from LLM that is not inside the workspace folder tree, return error response.
- When trying to write a file inside of the shared resources and getting an absolute path from LLM, return error response.

Agents for Product Manager and Architect

Description

Product Manager Agent

Add a Product Manager agent that will convert the problem statement into a user story.
This user story with features will have to be confirmed by the user. Users can suggest changes. Revise the user story till it is finalized.

Architect Agent

The finalized user story will be used by the Architect user agent to prepare a list of architecture components including things like programming language (example: Python, Node.js), Framework (example: Express, Flask), Database (example: PostgreSQL, MySQL), Cache (example: Memcache, Redis), etc.
These architecture components will have to be confirmed by the user. Users can suggest changes. Revise the architecture components till they are finalized.
Pass the problem statement, finalized user story, and finalized architecture components to the Planning Agent to generate the steps.

Scroll to the bottom on switching Coder/Planner/Researcher tabs

Scroll to the bottom of the tab panel whenever we switch between tabs

Receiving Invalid Message Formats from LLM

Description

Currently, Sirji sometimes receives messages in invalid format from LLM. Even after retrying a couple of times, LLM has no other option as the possible actions are not sufficient for giving the required data points in the response.

These errors became more prominent after the removal of the INFORM action.

Integrate Architect Agent with Orchestration Agent

Description

In the current implementation, the Coding Agent internally managed tasks that will now be designated for the Architect Agent (in line with the Divide and Conquer strategy).

The Architect Agent will be responsible for:

Preparing a proposal for architectural components.
Sharing this proposal with the user to collect feedback.
Making necessary adjustments to the proposal based on the user's feedback.
Take confirmation from the user for any system-wide package installations.
Storing the finalized architectural components in shared resources for further reference.
Publish its skills & messages and implement the agent interface.
Add rules and definitions to the agent config.

Implementing Repository File Reading Capability

Objective

The aim is to introduce the capability for Sirji to read through the files of a GitHub repository efficiently. This functionality is crucial for two distinct use cases:

Case 1: Utilizing the GitHub repository as a library to leverage its functions for problem-solving.
Case 2: Making modifications directly to the repository itself.

Background and Rationale

Reading the repository files can be approached in multiple ways:

Utilizing the files as training data for the Retrieval-Augmented Generation (RAG) method.
Directly including the file contents in the context window.

While the RAG method allows for training with a vast expanse of knowledge, having the knowledge directly in the context often yields better results. Personally, I am inclined towards direct context inclusion due to its efficacy.

Proposed Implementation Strategy

Reflecting on how developers interact with repositories, significant insights are often drawn from the file names and their directory structure. This observation leads to a two-step approach for Sirji:

Initial Shortlisting: Sirji reads the relative paths of all files, leveraging the directory structure for an informed initial selection.
Incremental Deep Dive: Read the contents of the shortlisted files initially, expanding the scope to other files as required, in an incremental fashion.

Further Optimization

To optimize token consumption, a shift from a singular lengthy conversation thread to multiple, focused threads is proposed. Each thread could cater to specific tasks or inquiries, enhancing efficiency and relevance. For instance, when leveraging a repository primarily as a library, Sirji would engage with the thread related to library utilization exclusively, until the need is met.

Conclusion

This proposal aims to equip Sirji with a strategic, efficient mechanism for navigating and utilizing GitHub repositories, mirroring the intuitive process followed by developers. Feedback and further suggestions on this approach are highly welcome.

Test cases for VS Code Extension

Description

Write test cases for VS Code Extension

Tasks

Research on how test cases for VS code extension are written.
Write test cases

Update Settings Modal to Show Partial OpenAI API Key

Update settings modal in case the OpenAI API Key is already provided

Show already given OpenAI API Key then show it as *********1234
Update the save button language

Continuing Before Package Installations Complete

Description

Sirji does not pause its process for the completion of long-running package installations. This leads to subsequent steps erroneously assuming the necessary packages are installed, causing failures.

Split the screen into two columns to separate the chat terminal and the logs panel.

Chat terminal Panel
Steps and Logs Panel

Integrate PyPI packages with Dogfood GUI

The problem to solve

To test that everything which is needed for Sirji UI to work is available in the PyPI packages, we will integrate them with the Dogfood GUI version.

Tasks:

Integrate sirji-messages with GUI
Integrate sirji-agents with GUI
Persisting conversation history
Test end to end with different problem statements

Implementing Workspace File Reading Capability

Description

Introduce new action READ_DIR_STRUCTURE which will give file paths of all the files in a specified directory and all its sub-directories.
Introduce new action READ_FILES which will read content from an array of file paths.
Avoid reading content from certain file types (for example: media, bin) and large files.
Remove READ_FILE and use READ_FILES instead.

Inconsistent State Updates for Planner Steps

Description

The planner-generated steps within Sirji do not consistently update their status to 'running' or 'completed'. This inconsistency affects the overall task tracking.

Build VS Code Extension

The problem to solve

Make Sirji more developer friendly.

Proposed solution

Create a Visual Studio Code Extension that developers can use in their development environment, whether their local machine or a remote development environment.

Tasks:

Integrate Planning Agent with Orchestration Agent

In the current implementation of the Planning Agent, the necessary initial steps such as "Prerequisite Identification" and "Research" to know more about the problem statement, are ignored. For example, if tasked with fixing an issue from a GitHub repository based on just the URL, the first steps should involve cloning the repository and reviewing the issue description before even starting on creating epics & user stories, deciding the architecture, and generating the problem-solving steps.

To improve, the Planning Agent will now focus on:

Gathering requirements through user interaction to better understand the problem.
Identifying prerequisites needed for addressing the problem.
Conducting research in areas that aren't well understood.
Storing the finalized problem statement (improved using user feedback and research) in shared resources for further reference.
Publish its skills & messages and implement the agent interface.

Show selection of Model Provider and Model in Settings

Allow users to select the model provider (example: openai) and model (example: gpt-4-turbo) from the settings UI. There will be recommended values as defaults.

Introduce QA Agent

Description

Currently, Sirji cannot detect when a user encounters an error while opening a webpage or utilizing any website-related features it has implemented. This limitation hinders Sirji's autonomous debugging abilities, making it reliant on users to report errors by copying and pasting them as feedback manually. Enhancing Sirji with error detection for web page interactions would greatly improve its self-diagnostic and troubleshooting capabilities.

Proposed Solution

Upon completing all user stories, the QA Agent will undertake the following responsibilities:

Write or adjust test cases by consulting shared resources and the generated code.
Execute these test cases and write the test results in shared resources.
Publish its skills & messages and implement the agent interface.

Before sending SOULTION_COMPLETE message, Product Manager verifies the solution

TODO elaborate

Gpt4 turbo with vision support

Gpt4 turbo with vision is now generally available in API.
So the user should be able to upload a wireframe (even a handwritten one) and Sirji should be able to create a UI similar to it.

TODO: elaborate here.

PyPI package for Sirji messaging protocol implementation to create, validate and parse messages.

The problem to solve

To make the Sirji message protocol reusable between different user interfaces like GUI, VS Code extension, this issue aims at publishing the Sirji message protocol as a package.

Tasks

Support for new embedding & inference strategy

The problem to solve

To reduce the cost we should extend the embedding & inference strategy.

Proposed solution

Make the following changes in the researcher package:

Introduce a new embedding strategy and store them in a free-to-use vector database (ex: qdrant).
And for inference, add a new strategy using another LLM (ex: Anthropic Claude 3).

Additional context

Add the embedding support under EmbeddingsFactory.
Add the inference support under InfererFactory.

Continuous Running Process Management

Description

Special handling for continuously running processes (for example, Server process).

Tasks

Introduce new message action RUN_SERVER for commands which will start a continuously running process.
System prompt changes to enable messages with the new action.
vscode.Task is to be used for starting such processes.

Integrate Product Manager Agent with Orchestration Agent

Description

In the current implementation, the Coding Agent handles the responsibilities of the Product Manager Agent internally. We believe assigning these tasks to a specialized Product Manager Agent will improve reliability, in line with the "Divide and Conquer" strategy, as the prompt will now be focused on handling fewer responsibilities.

The Product Manager Agent will be responsible for:

Creating epics and user stories based on the finalized problem statement provided by the Planning Agent.
Sharing these epics and user stories with the user to gather feedback and make any necessary adjustments.
Storing the finalized epics and user stories in shared resources for further reference.
Publish its skills & messages and implement the agent interface.

Inability to Recognize Own Capability

Description

Currently, Sirji often displays the message "Outside of my capability" for tasks that are actually within its capabilities, leading to unnecessary interruptions in workflow. This issue is particularly notable in scenarios involving the installation of databases (e.g., MongoDB) and the subsequent verification of installation.

Proposed Solution

Add debugging capability to resolve these issues.
IMP - Error log for command execution on not available to LLM

Generate Detailed Multiple User Stories

Description

The Product Manager should develop detailed user stories that include screen layout details and specify which functionalities correlate to which files.
The interfaces should be designed to look modern and clean.

README design for the VS Code Release

The problem to solve

Write a README mentioning the architecture, necessary details for using the extension, etc.

Tasks

Containerize using Docker

The problem to solve

The current codebase requires the user to set up all required packages manually. When the package will grow, the number of dependencies will increase and the installation steps will become more time-consuming,

Proposed solution

We should add Docker support.

Step status is not getting updated

Step status should update every time Sirji starts on a step and completes it.

Handling Unexpected Python Errors in Sirji by Opening Chat Interface

Description

When Sirji encounters a Python error, it becomes unresponsive and locks users out of further interaction.

Tasks

Upon catching a Python error, the error message is to be shown to the user in the chat.
The chat message box should be enabled upon catching a Python error.

Capture command output while letting the user view and interact with it

The problem to solve

When Sirji receives a message to execute a command (like - Installing a library, cloning a GitHub repository, or executing a script), the message is forwarded to the Executor. The executor launches an instance of VS Code Terminal and forwards the command to execute.

An ideal solution: The executor should be able to read the stdin, stdout, and stderr (input & output) streams to debug and rectify any issues/errors while letting the user interact and view the stdout.

Proposed solution

Use tee in this release to solve this problem.

Example: <<command>> 2>&1 | tee <<output file>>

Alternative solutions

Use vscode.window.onDidWriteTerminalData: The API is not yet stable and is in the experimental state. For more details, please follow the discussion here.
Use only stderr stream: The idea is to write the standard error stream in a file and read it to fix errors, if any.

Example: <<command>> 2> <<output file>>

While testing this approach, we observed that the prompt of the input(prompt) python function is sent to the stderr stream. Here is an example for you to try:
- Create: user_input.py
```
num = input("Enter number: ")
print(number)
```
- Execute: user_input.py
```
> python user_input.py 2> error.txt
10
10
```
- Observe: The execution waits for the user input without showing the prompt message. Post user input (i.e. stdin), the number is printed on the terminal (i.e. stdout). But the "Enter number:" prompt is written in an error.txt file (i.e. stderr).

Request API Keys for Third-Party Integrations

Description:

For Sirji to effectively interact with third-party APIs, it's crucial that it asks users to input the necessary API keys. Users should be prompted to enter these keys into an environment variable file, enabling Sirji to securely access them from the generated code.

GUI Frontend for Sirji

Have been toying around building something similar.

Continuing the efforts, I have tweaked and built the frontend for Sirji - https://sirji-ai-fe-demo-duiq.vercel.app/

It is not fully functional yet, will wait for Sirji to be full developed before making it completely useable.

Do have a look, and let me know what do you think. Would love to contribute further.

Repo link: https://github.com/poornima-alugubelly/sirji-ai-fe-demo
Tech stack used:

Nextjs 14
Tailwind CSS
Zustand

Todo:
Integrate workspace:

logger
Browser
shell
planner

Prerequisites check on Sirji Activation

Make sure the user has installed all of the following prerequisites on the machine:

Python (>= 3.10) - Make sure python --version runs without error.
tee command - Make sure which tee runs without error.
OS-specific checks.

Employ Product Manager Agent and Architect Agent for Enhancement Requests

The Product Manager Agent and the Architect Agent should publish tips that the Orchestration Agent will use to invoke them upon receiving enhancement requests.

PyPI package for Research, Coding and Planning agents used by Sirji.

The problem to solve

To make the Sirji message handling via agents (planner, coder, and researcher) reusable between different user interfaces like GUI, VS Code extension, this issue aims at publishing them as a package.

Tasks

Researcher to work with the existing assistant if the assistant ID is passed to it
Researcher to be integrated with Crawler, Search and Logger tools from sirji-tools package
Researcher to implement the train and infer functionality
Researcher to handle different messages based on action
Coder to take in the history of messages as input and stop relying on history from memory
Base class for both Coder and Planner
Planner to take in the history of messages as input and stop relying on history from memory
README with usages
Publish to PyPI

Persist historical planning sessions

TODO elaborate

Showing logs from agents in sub-tabs

Description

Currently, the Sirji VS Code extension lacks the functionality to display logs from agents, such as the Coding Agent, Planning Agent, and Research Agent, which are crucial for understanding inter-agent interactions.

Tasks

Implement a feature in the Sirji VS Code extension to showcase logs generated by agents, including the Coding Agent, Planning Agent, and Research Agent.
Ensure the logs are presented in a structured manner, allowing users to easily comprehend the background interactions between agents.