
automata's People

Contributors

bhavitsharma, dependabot[bot], emrgnt-cmplxty, exarchias, gravitronics, maks-ivanov, nolantrem, sourcery-ai[bot], voynow


automata's Issues

Increase Test Coverage to 80%

Test coverage is a crucial aspect of maintaining a healthy, robust codebase. It allows us to ensure that our code is appropriately validated and can help in the early detection of bugs and issues. Currently, our test coverage stands at 64.52%, as per our latest Codecov report. Our aim is to raise this to a minimum of 80% to increase the reliability of our codebase and ensure that critical functionalities are adequately tested. Achieving this goal will require concerted effort and detailed understanding of our codebase and its various components.

Here are the steps you can follow to increase the test coverage:

  1. Identify Uncovered Areas: Utilize Codecov or a similar tool to find out the test coverage line by line and file by file. Identify the areas of code that are not currently covered by our tests. This will give us a clear idea of where we need to focus our efforts.

  2. Prioritize Critical Code: Prioritize the uncovered code sections based on their importance and potential impact. Critical functionalities that affect the overall performance of the system should be prioritized.

  3. Write Comprehensive Tests: Write unit tests that cover the previously uncovered areas. Ensure the tests cover a range of scenarios and edge cases to make them as thorough as possible (a short pytest sketch follows this list).

  4. Review and Refactor Existing Tests: Review the existing tests and look for potential improvements. There might be overlapping tests that can be refactored or removed to increase efficiency.

  5. Continuous Monitoring: Keep monitoring the test coverage reports after adding or updating the tests. Continuous monitoring will help us maintain or increase the coverage percentage.
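
For illustration, here is a minimal, self-contained pytest sketch of the kind of targeted test step 3 calls for. The parse_dotpath helper below is hypothetical and only stands in for whichever uncovered function you pick; in a real test you would import the function from the package instead of defining it inline.

import pytest


def parse_dotpath(dotpath: str) -> list:
    # Hypothetical helper standing in for an uncovered function; replace with a
    # real import from the codebase when writing an actual test.
    if not dotpath:
        raise ValueError("dotpath must be non-empty")
    return dotpath.split(".")


@pytest.mark.parametrize(
    "dotpath, expected",
    [
        ("config.config_types.AgentConfigName", ["config", "config_types", "AgentConfigName"]),
        ("main", ["main"]),
    ],
)
def test_parse_dotpath_happy_path(dotpath, expected):
    assert parse_dotpath(dotpath) == expected


def test_parse_dotpath_rejects_empty_string():
    # Error paths are often exactly the lines a coverage report flags as missed.
    with pytest.raises(ValueError):
        parse_dotpath("")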

Please note that while increasing test coverage is important, it's crucial to focus on the quality of tests rather than just the quantity. Higher test coverage with poorly written tests can give a false sense of security. The goal should be to write effective and efficient tests that truly validate the functionality of our code.

In addition, make sure to document your process and learnings, which will serve as a valuable resource for the team in the future. Don't hesitate to ask questions or seek help if needed. Your contributions to this project are greatly appreciated!

See the last test result here -

Codecov Report

โ— No coverage uploaded for pull request base (main@3a54c5c). Click here to learn what that means.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main      #51   +/-   ##
=======================================
  Coverage        ?   64.52%           
=======================================
  Files           ?       54           
  Lines           ?     3614           
  Branches        ?        0           
=======================================
  Hits            ?     2332           
  Misses          ?     1282           
  Partials        ?        0           
Flag Coverage Δ
unittests 64.52% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Re-factor `tool_utils` to better follow best practices

Just went through the tool_utils module, and I think we can make some improvements to make it cleaner and more flexible.

  • No More Hard-Coding: Our AgentToolFactory currently has hard-coded mappings between ToolkitType and Tool class, and between ToolkitType and its arguments. This won't scale as we add more tools or toolkit types. We should find a way to let tool classes register themselves with the factory (a rough sketch follows this list).
  • Smarter Dependency Management: The DependencyFactory works as expected, but I'm wondering if Dependency Injection could work better? It could help us write more modular, testable code.
  • Helpful Exceptions: Our custom exceptions could potentially do more work for us. Let's consider making ToolCreationError include the missing argument and the required type to help us debug faster.
  • Break It Down: tool_utils is doing a lot right now. Does applying the Single Responsibility Principle to split it into smaller, more focused modules make sense?
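
As a rough illustration of the registration idea from the first bullet, a small decorator-based registry could replace the hard-coded mappings. All names below are hypothetical and do not reflect the current AgentToolFactory API.

from typing import Callable, Dict, Type

# Hypothetical registry; the real AgentToolFactory and ToolkitType live in tool_utils.
_TOOLKIT_REGISTRY: Dict[str, Type] = {}


def register_toolkit(toolkit_name: str) -> Callable[[Type], Type]:
    """Class decorator that lets a tool class register itself with the factory."""

    def decorator(cls: Type) -> Type:
        _TOOLKIT_REGISTRY[toolkit_name] = cls
        return cls

    return decorator


@register_toolkit("py_reader")
class PyReaderToolkitBuilder:
    """Example tool class; registration happens at import time."""


def create_toolkit(toolkit_name: str, **kwargs):
    """Factory lookup that replaces the hard-coded ToolkitType -> class mapping."""
    try:
        return _TOOLKIT_REGISTRY[toolkit_name](**kwargs)
    except KeyError as exc:
        raise ValueError(f"No toolkit registered under {toolkit_name!r}") from exc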

TLDR - Let's refactor tool_utils to improve its quality.

As always, don't hesitate to ask if you have any questions or need further clarification. Your contributions to this project are highly valued!

Add support for Anthropic AI

Description

Currently, our project solely supports OpenAI as the language model and embedding provider. To enhance the functionality of our service, we propose the integration of Anthropic AI. However, this integration presents certain challenges due to our current infrastructure.

Previously, we utilized a custom action framework, which we've now replaced with OpenAI's function support. This decision was driven by the excellent fine-tuning capabilities of OpenAI models, providing function outputs with a high degree of reliability. To incorporate Anthropic AI, we need to encapsulate the function output logic within an abstract action handler. Following this, we must mirror agent provider and construction logic similar to that in place for OpenAI.

Proposed Solution

Our solution includes the creation of several abstract classes in Python to facilitate the integration of Anthropic AI. These classes are:

  • LLMCompletionResult: Defines different types of LLM completion results.
  • LLMChatMessage: Defines different types of LLM chat messages.
  • LLMConversation: An abstract base class for different types of LLM conversations.
  • LLMConversationDatabaseProvider: An abstract base class for different types of database providers.
  • LLMChatCompletionProvider: An abstract base class for different types of LLM chat completion providers.

You can find the exact definitions for these classes in the abstractions.py file in the repository.
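
To make the shape of these abstractions concrete, here is a rough, hypothetical sketch of how an Anthropic-backed chat completion provider might plug in. The method names and simplified message class below are assumptions for illustration only; the authoritative definitions are the ones in abstractions.py, and the actual Anthropic SDK call is deliberately left as a placeholder.

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class LLMChatMessage:
    """Simplified stand-in for the chat message abstraction."""

    role: str
    content: str


class LLMChatCompletionProvider(ABC):
    """Simplified stand-in for the abstract chat completion provider."""

    @abstractmethod
    def get_next_assistant_completion(self, messages: List[LLMChatMessage]) -> LLMChatMessage:
        ...


class AnthropicChatCompletionProvider(LLMChatCompletionProvider):
    """Hypothetical Anthropic-backed provider mirroring the OpenAI one."""

    def __init__(self, model: str = "claude-2") -> None:
        self.model = model

    def get_next_assistant_completion(self, messages: List[LLMChatMessage]) -> LLMChatMessage:
        # Only this class needs to know how to translate between our message
        # abstraction and Anthropic's wire format; the SDK call itself is omitted.
        prompt = "\n".join(f"{m.role}: {m.content}" for m in messages)
        raise NotImplementedError(f"Send to the Anthropic API:\n{prompt}")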

In addition to the aforementioned classes, we will also need to adjust the logic around agents. We have an abstract Agent class and several related classes and enums that must be accounted for in our integration. These include:

  • AgentToolkitNames: An enum for the different types of agent tools.
  • AgentToolkitBuilder: An abstract class for building tools for providers.
  • AgentInstance: An abstract class for implementing an agent instance.

The exact definitions for these classes are available in the agents.py file in the repository.

Tasks

To summarize, the tasks that need to be completed are as follows:

  1. Create an abstract action handler to encapsulate the logic around the function output of OpenAI models.
  2. Mirror agent provider and construction logic for Anthropic AI, similar to what we currently have for OpenAI.
  3. Modify the abstract classes mentioned above to accommodate the integration of Anthropic AI. This includes creating subclasses or adjusting methods where necessary.
  4. Test the integration to ensure functionality and compatibility.

Estimated Effort

This is a large-scale task that involves substantial changes to the codebase. We estimate this will require a non-trivial amount of work, depending on the developer's familiarity with the codebase and Anthropic AI.

Requested Reviewers

Given the complexity of the task, we request review and feedback from @anthropic-integration-lead and @Project-Lead before, during, and after implementation to ensure all changes align with our project goals.

Completion Criteria

The integration will be considered successful when:

  1. Anthropic AI can be used as an alternative to OpenAI as the language model and embedding provider.
  2. The abstract classes and methods can accommodate the functionality of both OpenAI and Anthropic AI.
  3. All tests pass, and the system works without errors.
  4. Code review and feedback from requested reviewers have been addressed and integrated.

Notes

Given that this is a major change to the codebase, we suggest implementing this on a separate branch and using feature flags to control the availability of this new integration. This way, we can seamlessly roll back the changes in case of any unforeseen issues.

Introduce custom error handling across codebase

Title: Introduce custom error handling across codebase

Issue:

Right now we use boilerplate ValueError and Exception throughout the codebase. While these standard exceptions can provide basic error handling functionality, they lack the specificity and flexibility needed for effective debugging and error resolution.

Problem:

The primary issue with this approach is that it can be hard to identify exactly where an error has occurred and what caused it, especially in a large codebase. This is because standard exceptions do not provide specific context about the error, its source, or the part of the codebase it affects.

Furthermore, there may be certain scenarios in our application where we'd like to handle exceptions differently based on the type of error that occurred. This is not easily achievable with the current, generic exception handling.

Proposed Solution:

In order to provide more specific and meaningful error messages, and to handle exceptions in a more flexible way, we propose introducing custom exception classes across the codebase.

These custom exceptions should inherit from the base Exception class or appropriate subclasses (such as ValueError or TypeError), and should be designed to encapsulate a particular type of error that could occur within our application.

Here's an example of what a custom exception class might look like:

class CustomDatabaseError(Exception):
    """Raised when a specific error occurs in the database module."""
    
    def __init__(self, message="A database error occurred"):
        self.message = message
        super().__init__(self.message)

In this example, the CustomDatabaseError would be raised whenever a specific database-related error occurs, making it easier to identify and handle this specific kind of error.
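
Building on that example, here is a short sketch of how calling code could then branch on the error type; the fetch_conversation function and session id are made up for illustration.

def fetch_conversation(session_id: str) -> dict:
    # Illustrative stand-in for a real database call.
    raise CustomDatabaseError(f"No conversation found for session {session_id}")


try:
    conversation = fetch_conversation("session-123")
except CustomDatabaseError as exc:
    # Database-specific handling: log it, retry, or surface a targeted message.
    print(f"Database problem: {exc.message}")
except Exception:
    # Anything unexpected still falls through to generic handling.
    raise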

Tasks:

  • Review the codebase to identify areas where custom exceptions would be beneficial.
  • Design custom exception classes that encapsulate specific types of errors in our application.
  • Replace generic exceptions with custom exceptions throughout the codebase.
  • Test the system to ensure custom exceptions are working as expected and providing more meaningful error information.
  • Update documentation to reflect the introduction of custom exceptions.

Through these steps, we can create a more robust and developer-friendly error handling system that will make our codebase easier to maintain and debug.

As always, don't hesitate to ask if you have any questions or need further clarification. Your contributions to this project are highly valued!

Productionize the `SymbolRank` Overview construction

An overview of the local repository can be created based on the output from SymbolRank. SymbolRank is necessary to avoid having to print every symbol in the codebase when providing an overview of the important bits of the codebase.

The resulting output looks something like this

An overview of the important python modules, and their internal functions and classes follows below.

cli.scripts.run_doc_embedding_l2.main
cli.scripts.run_doc_embedding_l3.main
cli.scripts.run_doc_post_process.main
config.agent_config_builder.AutomataAgentConfigBuilder
config.agent_config_builder.AutomataAgentConfigBuilder._validate_type
config.agent_config_builder.AutomataAgentConfigFactory
config.agent_config_builder.AutomataAgentConfigFactory.create_config
config.config_types.AgentConfigName
config.config_types.AutomataAgentConfig
config.config_types.AutomataAgentConfig.load
core.agent.action.Action
...

This output is produced by the following snippet:

    class TemplateFormatter:
        @staticmethod
        def create_default_formatter(
            config: "AutomataAgentConfig", max_default_overview_symbols: int = 100
        ) -> Dict[str, str]:
            """
            Create a default template formatter.

            Args:
                config (AutomataAgentConfig): The AutomataAgentConfig to use.

            Returns:
                Dict[str, str]: The default template formatter.

            Raises:
                NotImplementedError: If the config_name is not supported.

            TODO:
                - Consider how we might implement dependency injection across this call stack
                - Replace symbol_search with symbol_rank when it is implemented on DependencyFactory
            """
            formatter = {}
            if config.config_name == AgentConfigName.AUTOMATA_READER:
                from automata.core.agent.tools.tool_utils import DependencyFactory

                symbol_search = DependencyFactory().get("symbol_search")
                symbol_rank = symbol_search.symbol_rank
                ranks = symbol_rank.get_ranks()
                symbol_dotpaths = [
                    ".".join(symbol.dotpath.split(".")[1:])
                    for symbol, _ in ranks[:max_default_overview_symbols]
                ]
                formatter["symbol_rank_overview"] = "\n".join(sorted(symbol_dotpaths))

This is a rather inelegant approach to solving this problem. Some more thought should be put into where TemplateFormatter should live and how the logic to calculate the symbol_rank_overview should be implemented.

As always, don't hesitate to ask if you have any questions or need further clarification. Your contributions to this project are highly valued!

Extend and format docstrings throughout the codebase

Title: Enhancing Docstring Quality through LLM Integration and Formatting

Issue Description:

Our project's documentation is a unique hybrid of human contributions and content generated by our Language Model (LLM). The structure comprises three layers:

  • L1: Inline docstrings authored by humans
  • L2: Docstrings generated by our LLM, which utilizes both source code and L1 docstrings, enriched with SymbolRank
  • L3: Docstrings created by the LLM using source code, L1 & L2 docstrings, and SymbolRank

L3 documentation aids coding by being available alongside the source code. This layer presents a potential opportunity for improvement - we could utilize it to enrich our L1 docstrings, thereby enhancing the initial human-created documentation.

The purpose of this task is to create a mechanism to "bubble up" information from L3 back to L1. This means that we will take the enhanced documentation of L3, refine it, and integrate it into the L1 docstrings. By doing so, we can provide coders with more comprehensive and insightful documentation to follow.

Furthermore, this integration allows for recursive improvement over iterations. If the LLM pipeline is run a second time, it would start with the enriched docstrings from the first run, thus progressively amplifying the quality and richness of the documentation.

Let's take the Tensorflow library as an example to visualize the level of comprehensive docstrings we aim to achieve:

def run_with_all_saved_model_formats(
    test_or_class=None,
    exclude_formats=None):
  """Execute the decorated test with all Keras saved model formats).

  This decorator is intended to be applied either to individual test methods in
  a `keras_parameterized.TestCase` class, or directly to a test class that
  extends it. Doing so will cause the contents of the individual test
  method (or all test methods in the class) to be executed multiple times - once
  for each Keras saved model format.

  The Keras saved model formats include:
  1. HDF5: 'h5'
  2. SavedModel: 'tf'

  Note: if stacking this decorator with absl.testing's parameterized decorators,
  those should be at the bottom of the stack.

  Various methods in `testing_utils` to get file path for saved models will
  auto-generate a string of the two saved model formats. This allows unittests
  to confirm the equivalence between the two Keras saved model formats.

  For example, consider the following unittest:

  ```python
  class MyTests(testing_utils.KerasTestCase):

    @testing_utils.run_with_all_saved_model_formats
    def test_foo(self):
      save_format = testing_utils.get_save_format()
      saved_model_dir = '/tmp/saved_model/'
      model = keras.models.Sequential()
      model.add(keras.layers.Dense(2, input_shape=(3,)))
      model.add(keras.layers.Dense(3))
      model.compile(loss='mse', optimizer='sgd', metrics=['acc'])

      keras.models.save_model(model, saved_model_dir, save_format=save_format)
      model = keras.models.load_model(saved_model_dir)

  if __name__ == "__main__":
    tf.test.main()
  ```
  ...

This task, therefore, is not just about enhancing the quality of our documentation but also about creating a novel and innovative approach to docstring creation and maintenance. By using a recursive model of improvement, we can leverage the strengths of both human input and machine learning models to deliver highly effective and continuously improving documentation. This blend of human intuition and machine efficiency has the potential to revolutionize the way we think about and generate documentation in the coding process.

Feel free to post any questions or concerns you have about this implementation. Your contribution to this project is highly appreciated!

Refactor `SymbolSimilarity` Class to Leverage the New `VectorDatabaseProvider`

Following the implementation of the calculate_similarity method in the VectorDatabaseProvider, it's necessary to refactor the SymbolSimilarity class, so it utilizes the new functionality. Currently, SymbolSimilarity calculates the similarity scores locally, but with the changes to the VectorDatabaseProvider here, this computation can now be moved to the database level.

In particular, the methods get_query_similarity_dict and get_nearest_entries_for_query need to be re-written to call the VectorDatabaseProvider.calculate_similarity method instead of computing similarity scores locally.

Here's one potential high-level plan of the refactoring needed:

  • Modify the SymbolEmbeddingHandler class to expose calculate_similarity on its local reference embedding_db
  • Refactor get_query_similarity_dict to call embedding_handler.calculate_similarity, passing the query embedding, and then post-process the result into the desired format (a dictionary mapping symbols to similarity scores) - a rough sketch follows below
  • Similarly, refactor get_nearest_entries_for_query
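
As a starting point, a hypothetical version of the first refactored method might look like the following. The attribute and method names used here (embedding_provider, build_embedding, calculate_similarity) are assumptions based on the plan above, not the current signatures.

from typing import Dict


class SymbolSimilarity:
    """Sketch of the refactored class; only the relevant pieces are shown."""

    def __init__(self, embedding_handler) -> None:
        self.embedding_handler = embedding_handler
        self.embedding_provider = embedding_handler.embedding_provider

    def get_query_similarity_dict(self, query_text: str) -> Dict:
        # Embed the query once, then delegate the pairwise similarity computation
        # to the database layer instead of computing it locally.
        query_embedding = self.embedding_provider.build_embedding(query_text)
        similarity_by_symbol = self.embedding_handler.calculate_similarity(query_embedding)
        # Post-process into the expected mapping of symbols to similarity scores.
        return dict(similarity_by_symbol)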

Please update the unit tests for the SymbolSimilarity class accordingly to ensure the refactored class works as expected.

As always, don't hesitate to ask if you have any questions or need further clarification. Your contributions to this project are highly valued!

Setup Docker After Poetry Migration

Problem:
Following the migration to using Poetry for dependency management, we want to ensure our application runs consistently across different environments. For this, we need to containerize our application with Docker.

Proposed Solution:
Let's create a Dockerfile for our Python project. This Dockerfile will describe the steps to create a Docker image of our application.

Steps:

  1. Create a Dockerfile in the root directory of the project.

  2. In the Dockerfile, start from a base Python image. The version should match the Python version you're using. For example, if you're using Python 3.9:

    FROM python:3.9
  3. Set the working directory in the Docker image to be /app.

    WORKDIR /app
  4. Copy your pyproject.toml and poetry.lock files into the Docker image.

    COPY pyproject.toml poetry.lock ./
  5. Install Poetry in the Docker image. We don't need to install the project package itself at this stage, so we're using the --no-root option.

    RUN pip install poetry \
    && poetry config virtualenvs.create false \
    && poetry install --no-interaction --no-ansi --no-root
  6. Copy your code into the Docker image.

    COPY . ./
  7. Run the application. This could vary based on how you run your application. For example, if you run your application using a module:

    CMD ["python", "-m", "your_application"]

Impact on CI/CD and scripts:
Dockerizing our application will require changes to our CI/CD pipeline to build and push the Docker image to a Docker registry. We also need to ensure our application can still connect to any external resources it needs (like a database) when running inside a Docker container.

Testing:
We need to build the Docker image, run a container from that image, and test our application to ensure it works as expected. Any external resources need to be available for testing, and we should also ensure that the container works as expected when restarted.


Please ensure to adjust the Dockerfile and the issue according to the specific needs of your project. The CMD instruction in the Dockerfile, for example, will depend on how your application is usually run.

Add issue title here

File: automata/tools/builders/py_writer.py
FIXME - Should try / catch be here or upstream in the agent?

Enhancement: Improve Modularity, Flexibility, and Testing Support in `Agent`, `Eval`, and `Task` Modules

Based on the description of the agent, evaluation, and task modules, the following strengths and weaknesses are identified:

Strengths:

  1. Abstraction: All three modules have made good use of abstraction through interfaces and abstract base classes. This enables other developers to understand the key features of each class and how it should be used, without needing to know the details of how the features are implemented.

  2. Encapsulation: The classes are designed with good encapsulation. Each class is responsible for its own state and behavior, and the internal workings are hidden from other classes. This makes the modules easy to understand, maintain, and modify.

  3. Modularity and Single Responsibility: The module breakdown shows good modularity and each class follows the single responsibility principle. For example, the Task class is responsible for managing task details and status, TaskEnvironment manages the task execution environment, and AutomataTaskExecutor is responsible for task execution. This separation of responsibilities makes the code easier to understand, test, and maintain.

  4. Extensibility: By using abstract classes and interfaces, the modules provide a foundation that can be extended to handle new requirements. For example, new types of tasks, agents, or evaluations can be added by extending or implementing the existing classes and interfaces.

  5. Error Handling: The agent module has custom exceptions to handle different types of errors that may occur during the agent's execution. This allows for robust error handling and helps to make the system more reliable.

  6. Documentation: The interfaces are well-documented, providing clear and detailed explanations of the classes and their methods. This makes the code easier to understand and use.

Weaknesses:

  1. Cohesion: While the overall design shows high cohesion, the Agent class seems to have multiple responsibilities, including running the agent and setting up the database provider. This could make it more difficult to maintain.

  2. Dependencies: There appears to be some coupling between the modules. For instance, the IAutomataTaskExecution class in the task module creates an OpenAIAutomataAgent. If the agent module changes, it could require changes in the task module.

  3. Testing: The description doesn't mention any built-in testing features or support. Ideally, the design would include interfaces or abstract classes for mock objects, allowing unit tests to be written that can run independently of the actual implementations.

  4. Flexibility: While there is some flexibility built into the design through the use of abstract base classes and interfaces, certain concrete classes like OpenAIAutomataAgentInstance and OpenAIAutomataConversationDatabase suggest a strong dependency on specific implementations. This could limit flexibility.

In terms of adherence to programming principles, the design largely follows the SOLID principles, a popular set of five design principles intended to make software designs more understandable, flexible, and maintainable. However, as mentioned, there could be improvements, particularly with regards to the Dependency Inversion Principle (DIP), which suggests that higher-level modules should not depend on lower-level modules. Both should depend on abstractions. This would further decouple the system and improve its maintainability.

Rework `AutomataInstructionPayload` + `AutomataAgentConfig` construction

The way we're constructing AutomataInstructionPayload right now is a bit convoluted and involves some configuration factory magic that makes the process opaque. We could definitely streamline and clarify this to make our codebase easier to work with. Here's what I'm thinking:

  • More Transparent Config Factory: The AutomataAgentConfigFactory is a bit tricky right now, especially when it comes to deciding between main_config_name and main_config. Let's make this process more transparent and intuitive. Can we find a way to make these decisions clearer or reduce the need for this kind of magic?

  • Consolidate Payload Construction: It seems like the AutomataAgentConfig and AutomataAgentConfigFactory classes are both involved in creating the AutomataInstructionPayload. This can make it tough to follow how everything comes together. Let's explore ways to consolidate this process, like having a single point of creation for AutomataInstructionPayload.

  • Improve Overview Handling: Right now, we're calling _add_overview_to_instruction_payload after loading the config in AutomataAgentConfig.load. It'd be cleaner to integrate this into the AutomataInstructionPayload creation process so that we have all the necessary components right from the start.

  • Enhance Config Loading: AutomataAgentConfig.load and AutomataAgentConfig.load_automata_yaml_config have a lot of responsibility, from loading YAML files to building repository overviews. Could we simplify and make this more straightforward?

  • Clearer Exception Handling: The create_config method in AutomataAgentConfigFactory has several ValueError exceptions that could be more informative. Let's consider creating custom exceptions that provide more context about what's wrong.

I'm sure there are many ways we could tackle these issues, so I'm open to hearing your thoughts and ideas. Let's make our code cleaner and easier to understand.

Integrate non-symbol documentation

Our current SymbolDocEmbeddingHandler class is an effective method for embedding source code documentation into symbols. However, there are instances in which relevant and important documentation does not correspond directly to a specific symbol. For instance, the contents of a README.md file or any other similar documents. These non-symbol documents often contain valuable contextual information that could be of significant importance in certain coding instances.

Problem:

We currently lack a strategy to include these non-symbol documents in our embedding and retrieval process. Not incorporating such important pieces of information can result in a less context-aware and informative system, which can potentially affect the quality of our coding assistance.

Tentative Solution:

Please note that the following proposed solution is preliminary and subject to revision.

One possible solution is to introduce a new class, let's call it NonSymbolDocEmbeddingHandler. This class can be similar in structure to SymbolDocEmbeddingHandler but would be specifically designed to handle non-symbol documents.

We can treat each non-symbol document as an individual entity that has its own embeddings. We can store these embeddings in the database just like we do for symbols. The document's name (or path) can serve as its identifier.

Here's a rough blueprint for the NonSymbolDocEmbeddingHandler class:

import numpy as np

# VectorDatabaseProvider and EmbeddingProvider refer to the project's existing
# embedding abstractions.
class NonSymbolDocEmbeddingHandler:
    """Builds and retrieves embeddings for documents that do not map to a symbol."""

    def __init__(self, embedding_db: VectorDatabaseProvider, embedding_provider: EmbeddingProvider) -> None:
        self.embedding_db = embedding_db
        self.embedding_provider = embedding_provider

    def get_embedding(self, doc_name: str) -> np.ndarray:
        """Fetch the stored embedding for a document by name."""
        return self.embedding_db.get(doc_name)

    def update_embedding(self, doc_name: str, doc_content: str) -> None:
        """Re-embed a document, replacing any existing entry."""
        if self.embedding_db.contains(doc_name):
            self.embedding_db.discard(doc_name)

        doc_embedding = self.embedding_provider.build_embedding(doc_content)
        self.embedding_db.add(doc_name, doc_embedding)

By incorporating such a system, we would be able to create and retrieve embeddings for non-symbol documents, integrating them into our current workflow.

Points to Consider:

  • A mechanism is needed to detect and handle updates in the non-symbol documents, similar to how we handle symbol source code changes.

  • A strategy for identifying when to consider non-symbol documentation in the retrieval process. Perhaps certain queries or contexts can trigger the consideration of these documents.

  • Performance implications: The addition of non-symbol documents could significantly increase the amount of data stored in the database. This might require optimizations or changes to how we store and retrieve embeddings.

Tasks:

  • Design and implement NonSymbolDocEmbeddingHandler.
  • Modify database interface to handle non-symbol document embeddings.
  • Implement mechanism to update non-symbol document embeddings when their contents change.
  • Test the new system to ensure it works as expected and does not introduce performance issues.

Your feedback and suggestions are welcome to help refine this preliminary solution.

As always, don't hesitate to ask if you have any questions or need further clarification. Your contributions to this project are highly valued!

Consider use of Singleton patterns (e.g. `PyModuleLoader`, `DependencyFactory`, and `OpenAIAutomataAgentToolkitRegistry`).

Issue:

We have identified several key classes in our codebase (PyModuleLoader, DependencyFactory, OpenAIAutomataAgentToolkitRegistry) that have been designed as singletons. While the singleton design pattern ensures a single instance of a class and a global point of access to that instance, it's also widely considered an anti-pattern due to its global state, tight coupling, and hard-to-test nature.

A thorough investigation of the use of Singleton patterns across the codebase should be conducted to determine if they are indeed the best choice in each case and if not, alternatives should be proposed and implemented.

Implementation:

The first step is to locate all classes using Singleton patterns across the codebase, particularly focusing on PyModuleLoader, DependencyFactory, and OpenAIAutomataAgentToolkitRegistry. We need to analyze the specific usage of these classes, and evaluate if the Singleton pattern is the best fit in each case.

Potential considerations to take into account:

  • Single Responsibility Principle: Are these classes responsible for more than they should be, making them harder to manage?
  • Testability: Is the Singleton pattern making it hard to write unit tests for the functionality these classes provide?
  • Tight coupling: Are these classes too interdependent, making the system rigid and less maintainable?
  • Concurrent Usage: How is the Singleton pattern affecting the system's behavior in multi-threaded environments?

For classes where Singleton may not be the best fit, propose alternative designs. These could include refactoring towards the factory pattern, service objects, or dependency injection, among others.
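
For example, a minimal sketch of what moving from a singleton to constructor injection could look like; PyModuleLoaderLike and ContextRetriever below are illustrative names, not existing classes.

from typing import Protocol


class PyModuleLoaderLike(Protocol):
    """Illustrative interface; it lets tests pass in a lightweight fake."""

    def fetch_module(self, dotpath: str): ...


class ContextRetriever:
    """Instead of reaching for a global PyModuleLoader instance, the
    collaborator is passed in explicitly and can be swapped out in tests."""

    def __init__(self, module_loader: PyModuleLoaderLike) -> None:
        self.module_loader = module_loader

    def retrieve_source(self, dotpath: str):
        return self.module_loader.fetch_module(dotpath)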

Points to consider:

  1. Refactoring Impact: Ensure to thoroughly analyze the impact of refactoring the singletons, considering both immediate and future needs of the system.

  2. Backward Compatibility: Refactoring should not break the current functionality of the system. Adequate measures should be taken to maintain backward compatibility.

  3. Performance: The refactoring should not negatively affect the system's performance. It is advisable to benchmark performance before and after refactoring.

  4. Testing: Thorough testing should be performed to ensure the system's performance and reliability. New unit tests should be written for the new implementations and integration tests should be updated to reflect the changes.

Tasks:

  • Analyze each class's specific usage and evaluate if Singleton pattern is the best choice.
  • For classes where Singleton is not the best fit, propose alternative designs.
  • Implement the proposed changes.
  • Ensure the refactoring does not break the current functionality.
  • Benchmark performance before and after refactoring.
  • Write unit tests for new implementations and update existing integration tests.

As always, don't hesitate to ask if you have any questions or need further clarification. Your contributions to this project are highly valued!

Re-implement `Task` Workflow

The Task workflow was structured around a hierarchy of classes, starting with a base Task class, then a GitHubTask class inheriting from Task, and finally an AutomataTask class inheriting from GitHubTask.

The Task class was a generic task object, primarily characterized by attributes like task_id, priority, max_retries, status, and retry_count.

The GitHubTask class extended Task by including elements specifically designed for tasks that need to interact with a GitHub repository, like github_manager (now removed) and task_dir, and methods such as setup_task_env() and commit_task().

The AutomataTask class further extended GitHubTask to represent tasks to be executed by the AutomataAgent via the TaskExecutor. It introduced additional attributes like rel_py_path, agent, result, and error.

Finally, a TaskExecutor class was implemented for managing and executing tasks using different behaviors specified by the IExecuteBehavior interface.
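
For reference, a stripped-down sketch of that previous hierarchy, using only the attributes and methods named above (status values, defaults, and method signatures are placeholders):

from enum import Enum


class TaskStatus(Enum):
    CREATED = "created"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


class Task:
    """Generic task object."""

    def __init__(self, task_id: str, priority: int = 0, max_retries: int = 3) -> None:
        self.task_id = task_id
        self.priority = priority
        self.max_retries = max_retries
        self.status = TaskStatus.CREATED
        self.retry_count = 0


class GitHubTask(Task):
    """Adds repository-specific concerns on top of the generic task."""

    def __init__(self, task_id: str, task_dir: str, **kwargs) -> None:
        super().__init__(task_id, **kwargs)
        self.task_dir = task_dir

    def setup_task_env(self) -> None:
        raise NotImplementedError

    def commit_task(self, commit_message: str) -> None:
        raise NotImplementedError


class AutomataTask(GitHubTask):
    """Task executed by the AutomataAgent via the TaskExecutor."""

    def __init__(self, task_id: str, task_dir: str, rel_py_path: str, **kwargs) -> None:
        super().__init__(task_id, task_dir, **kwargs)
        self.rel_py_path = rel_py_path
        self.agent = None
        self.result = None
        self.error = None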

Considerations for Re-implementation

When we re-implement the Task workflow, there are some crucial points we need to take from the previous implementation:

  • Class Hierarchy: The class hierarchy of Task, GitHubTask, and AutomataTask was a well-structured way of breaking down task complexities. We should retain this structure but ensure that each class and method is thoroughly documented and serves a clear purpose.
  • TaskExecutor: The TaskExecutor is a crucial component. It separates the execution logic from the task definition, providing flexibility in executing different types of tasks. When re-implementing, we need to consider whether the current design still serves our needs or if it requires adjustments.
  • Error Handling and Retries: The error handling and retrying mechanisms should be robust and clear. We need to ensure that the new implementation adequately handles task failures, with an appropriate retry policy and clear visibility into what caused the failure.
  • GitHub Interaction: The integration with GitHub was a unique feature. However, we need to ensure that the tasks are properly encapsulated, and interaction with GitHub should not leak into every aspect of the tasks. We might want to consider how best to isolate this functionality.

The logic for this can be seen here.

Requested Action

The goal of this issue is to reintroduce the Task workflow into our codebase, taking the positives from the previous implementation, addressing its weaknesses, and ensuring it suits our current and future needs. We encourage contributions to this effort and welcome discussion on the best ways to go about it.

As always, don't hesitate to ask if you have any questions or need further clarification. Your contributions to this project are highly valued!

Implement Experimental `ConversationMemoryEmbedding`

Issue:

In order to provide a more interactive and context-aware agent experience, we should integrate "MemoryEmbedding" functionality. The main idea behind this feature is to vectorize each conversation using our pre-existing embedding providers and store it into a relevant database. This would effectively create a "memory" for the system where past conversations are represented as vectors (embeddings), and can be utilized for future interactions.

If used correctly, these memories could help the system draw more context from previous interactions and bring the system closer to self-improvement.

Implementation:

The first step might be to create a new class MemoryEmbeddingHandler that inherits from SymbolEmbeddingHandler. MemoryEmbeddingHandler will use an instance of AutomataAgentDatabase and an EmbeddingProvider to vectorize a conversation and store it in the database. I say "might be" because it is a bit strange for the MemoryEmbeddingHandler to inherit from SymbolEmbeddingHandler when these memories do not correspond to symbols. This seems to indicate that we want to re-think the inheritance structure in embedding_types.

The MemoryEmbeddingHandler class should at minimum contain the following methods:

  1. get_embedding(self, session_id: str) -> Any: This method fetches a conversation from the database using the provided session_id, converts it into a string, and generates an embedding using the EmbeddingProvider. The method then returns this embedding.

  2. update_embedding(self, session_id: str, new_message: OpenAIChatMessage) -> None: This method adds a new message to a conversation in the database and updates the corresponding conversation embedding.

We can use our existing embedding providers such as OpenAIEmbedding for this task. In addition to this, we may need to update our AutomataAgentDatabase class to support the storing and retrieval of conversation embeddings.
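
Here is a minimal sketch of the handler described above, shown without the SymbolEmbeddingHandler inheritance given the concern raised earlier. The conversation-database methods (get_messages, put_message) and build_embedding are assumptions about the surrounding APIs, not their actual signatures.

class MemoryEmbeddingHandler:
    """Vectorizes conversations and stores the embeddings keyed by session_id."""

    def __init__(self, conversation_db, embedding_db, embedding_provider) -> None:
        self.conversation_db = conversation_db    # e.g. AutomataAgentDatabase
        self.embedding_db = embedding_db          # a VectorDatabaseProvider
        self.embedding_provider = embedding_provider

    def get_embedding(self, session_id: str):
        # Flatten the stored conversation to text and embed it on demand.
        messages = self.conversation_db.get_messages(session_id)
        conversation_text = "\n".join(str(message) for message in messages)
        return self.embedding_provider.build_embedding(conversation_text)

    def update_embedding(self, session_id: str, new_message) -> None:
        # Persist the new message, then refresh the stored conversation embedding.
        self.conversation_db.put_message(session_id, new_message)
        if self.embedding_db.contains(session_id):
            self.embedding_db.discard(session_id)
        self.embedding_db.add(session_id, self.get_embedding(session_id))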

Points to consider:

  1. Error Handling: Ensure to add appropriate error handling, especially when interacting with the database and when generating embeddings.

  2. Performance: We should consider how the system will perform as the number of conversations grows. It might be useful to think about how to periodically clean up old conversations from the database if they are no longer needed, or consider using a more scalable storage solution if necessary.

  3. Testing: Thorough testing should be performed to ensure the system's performance and reliability, especially since this is an experimental feature. We need to test both the MemoryEmbeddingHandler functionality and the changes made to the AutomataAgentDatabase.

Tasks:

  • Create a new class MemoryEmbeddingHandler
  • Implement get_embedding and update_embedding methods in MemoryEmbeddingHandler.
  • Update AutomataAgentDatabase to support conversation embeddings.
  • Implement unit tests for MemoryEmbeddingHandler.
  • Implement integration tests for MemoryEmbeddingHandler and AutomataAgentDatabase.

For further inspiration, we can investigate how auto-gpt is using memories here.

As always, don't hesitate to ask if you have any questions or need further clarification. Your contributions to this project are highly valued!

Add issue title here

File: tests/unit/context_retriever/test_context_handler.py
FIXME - This can't be a good pattern, let's cleanup later.

Compatibility with Other Codebases/Languages

Hi there,

The idea behind this tool is great, and it seems very promising. I'm wondering, though, if it's possible to use it with different codebases, written in different languages.

I generated an SCIP index from Supabase using scip-typescript. After moving index.scip to automata/config/symbol, I got some errors while trying to generate the code embeddings:

INFO:automata.cli.commands:Calling run_code_embedding
  0%|                                                                                                                                                                                    | 0/2166 [00:00<?, ?it/s]ERROR:automata.cli.scripts.run_code_embedding:Failed to update embedding for @types.environment.d.ts.global.NodeJS.ProcessEnv: Module descriptor @types not found
ERROR:automata.cli.scripts.run_code_embedding:Failed to update embedding for App.tsx.App: Module descriptor App.tsx not found
ERROR:automata.cli.scripts.run_code_embedding:Failed to update embedding for Providers.tsx.ThemeProviderProps: Module descriptor Providers.tsx not found
ERROR:automata.cli.scripts.run_code_embedding:Failed to update embedding for Providers.tsx.UseThemeProps: Module descriptor Providers.tsx not found
ERROR:automata.cli.scripts.run_code_embedding:Failed to update embedding for app.layout.tsx.RootLayout: Module descriptor app not found
ERROR:automata.cli.scripts.run_code_embedding:Failed to update embedding for app.page.tsx.PostList: Module descriptor app not found
ERROR:automata.cli.scripts.run_code_embedding:Failed to update embedding for components.Account.tsx.Account: Module descriptor components not found
ERROR:automata.cli.scripts.run_code_embedding:Failed to update embedding for components.Account.tsx.Account: Module descriptor components not found
ERROR:automata.cli.scripts.run_code_embedding:Failed to update embedding for components.Account.tsx.Profiles: Module descriptor components not found
ERROR:automata.cli.scripts.run_code_embedding:Failed to update embedding for components.AdminAccess.tabs.tsx.Props: Module descriptor components not found
ERROR:automata.cli.scripts.run_code_embedding:Failed to update embedding for components.AdminAccess.tabs.tsx.Tabs: Module descriptor components not found
ERROR:automata.cli.scripts.run_code_embedding:Failed to update embedding for components.Admonition.tsx.Props: Module descriptor components not found
ERROR:automata.cli.scripts.run_code_embedding:Failed to update embedding for components.Auth.tsx.Auth: Module descriptor components not found
...

Is this currently supported or planned to be supported in the future?

Generalize index and embedding pipeline to major open source repositories

Issue:

To advance our understanding of various open-source projects and increase our system's intelligence, we need to generate embeddings for key open source repositories like LangChain, LLamaIndex, etc. By representing these repositories as embeddings, we can leverage them for better code analysis, similarity checks, and code comprehension.

In our existing system, we create embeddings for our own codebase using an OpenAIEmbeddingProvider and a SymbolCodeEmbeddingHandler. We want to test extending this functionality to include the mentioned open-source repositories.

Implementation:

Primarily, the steps involve:

  1. Set up the codebase of each open-source repository in a similar manner to our own.
  2. Use the OpenAIEmbeddingProvider to generate embeddings for the symbols in the repositories.
  3. The SymbolCodeEmbeddingHandler will process these embeddings and save them for future use.

We need to adapt our existing run_code_embedding script to handle these open source repositories, which would involve:

  • Initializing the necessary components like the py_module_loader.
  • Setting up the correct paths for the index-file and code-embedding-file for each repository.
  • Creating instances of OpenAIEmbeddingProvider and SymbolCodeEmbeddingHandler.
  • Processing each symbol in the repository and generating the respective embeddings.

Points to consider:

  1. Repository Setup: Different repositories might have different setup requirements. Keep this in mind while setting up the repositories.
  2. Scalability: As more repositories are added, the system should be able to handle the increase in data efficiently.
  3. Automation: Consider automating the process of setting up new repositories and generating embeddings.
  4. Error Handling & Testing: Proper error handling mechanisms should be in place to handle potential issues during the setup or embedding generation process. Additionally, perform thorough testing to ensure the system's performance and reliability.

Tasks:

  • Set up open source repositories.
  • Adapt the run_code_embedding script to handle the open-source repositories.
  • Create instances of OpenAIEmbeddingProvider and SymbolCodeEmbeddingHandler for each repository.
  • Generate embeddings for each repository.
  • Automate the process for easy addition of more repositories in the future.
  • Implement error handling and thorough testing.

As always, don't hesitate to ask if you have any questions or need further clarification. Your contributions to this project are highly valued!

Migrate Python Project Dependency Management to Poetry

Title: Migrate Python Project Dependency Management to Poetry

Problem:
Our project is currently using requirements.txt for dependency management. However, this method requires separate steps for packaging, and could potentially lead to dependency conflicts or issues.

Proposed Solution:
We should migrate our dependency management to Poetry. This tool integrates dependency management and packaging into a single framework, which will streamline our development process and help prevent any future dependency issues.

Steps:

  1. Install Poetry globally in the environment:

    curl -sSL https://install.python-poetry.org | python -
  2. In the project directory, initialize a new Poetry project:

    cd your_project_dir
    poetry init
  3. Add dependencies from the existing requirements.txt file to Poetry:

    poetry add $(cat requirements.txt | tr "\n" " ")
  4. After verifying all dependencies are correctly added to Poetry, remove the requirements.txt file:

    rm requirements.txt
  5. Install the dependencies in a new virtual environment:

    poetry install

Impact on CI/CD and scripts:
Before removing the requirements.txt file, we need to ensure that this won't affect our current CI pipeline or any scripts that use this file. We need to replace any usage of pip install -r requirements.txt with poetry install, which will create a virtual environment and install the dependencies there. We should check all our CI/CD scripts and make necessary changes.

Testing:
After migrating to Poetry, we need to thoroughly test our application and CI/CD pipeline to ensure everything is working correctly.


This issue aims to provide a step-by-step guide for the migration process while keeping potential issues and their solutions in mind. Feel free to modify it to suit your specific project requirements.

Re-implement `Task` WebApp

As we are in the process of re-implementing our Task management system, we should also focus on bringing back the Task WebApp that provided a friendly interface for users to interact with our task server. The WebApp was previously designed to work seamlessly with the Task server, providing an easy way to create, manage, and view tasks.

Key features of the previous implementation were:

  • APITask Class: A class to encapsulate the details of a task, including the task ID, model, main configuration, instruction configuration, helper agent names, toolkits, instructions, status, and error. This class also had methods to create an APITask object from the server's response data and to strip the task ID from the object.

  • Submit New Task Function: This function used to create a new APITask object from the agent store's state and send a POST request to the /task/initialize endpoint to initialize a new task.

  • Get Task by ID Function: This function used to send a GET request to the /task/<task_id> endpoint to fetch the details of a task by its ID.

  • Get Task Logs by ID Function: This function used to send a GET request to the /task/logs/<task_id> endpoint to fetch the logs of a task by its ID.

  • Get Task Summaries Function: This function used to send a GET request to the /task_summaries endpoint to fetch the summaries of all tasks.

We need to re-implement the WebApp with these functionalities, taking into account any changes made to the server's API endpoints. In addition to the re-implementation, we should focus on enhancing the WebApp with a better user experience, improved error handling, and additional features.

Here are some points to consider for the re-implementation and enhancement:

  • User Experience: The WebApp should provide a friendly and intuitive user interface, guiding the users through the process of creating, managing, and viewing tasks.

  • Error Handling: The WebApp should handle any errors that may occur during the interaction with the server and provide clear and helpful error messages to the users.

  • Code Quality: The code should be well-structured, easy to understand, and maintainable. It should follow best practices and coding standards for JavaScript and React (if we're using React for the front-end).

  • Additional Features: Consider adding more features to the WebApp, such as a way to edit or delete tasks, filter or sort the list of tasks, view detailed task logs, etc.

Requested Action

The goal of this issue is to reintroduce the Task workflow into our codebase, taking the positives from the previous implementation, addressing its weaknesses, and ensuring it suits our current and future needs. We encourage contributions to this effort and welcome discussion on the best ways to go about it.

The image below is a screenshot showing a limited view of the original demo.
web_app_demo

As always, don't hesitate to ask if you have any questions or need further clarification. I can provide the old front end code as a starting point. Your contributions to this project are highly valued!

Enhancement of `Database` architecture across the application



Summary

This issue is about enhancing our current system architecture and database design. We need to consider a few factors for improvement including the decoupling of databases, transaction handling, database schema changes, task & evaluation workflow integration, and using dependency injection for databases.

Description

We currently use three databases:

  1. AutomataAgentTaskDatabase - tracks all tasks.
  2. OpenAIAutomataConversationDatabase - stores all interactions for a session.
  3. EvalResultWriter - records evaluation results.

These databases are connected using a common session_id.

Tasks

  • Decouple Databases: Investigate if the databases need to be decoupled into separate services to avoid potential bottlenecks or single points of failure and to allow each service to scale independently.

  • Transactions and Consistency: Implement mechanisms for handling transactions across databases and consider the level of consistency required between them.

  • Database Schema Changes: Plan for potential database schema changes that could include adding new fields, changing data types, or restructuring databases.

  • Task & Evaluation Workflow Integration: Ensure smooth integration of the task-oriented workflow with the evaluation harness. Consider using event-driven architecture or a job queue.

  • Dependency Injection for Databases: Make code more flexible and testable by using dependency injection for databases, as noted in the OpenAIAutomataConversationDatabase class.

  • Unit Testing and Mocking: Continue focusing on unit tests for individual components and use mocking to isolate dependencies. Ensure coverage of edge cases and expected workflows.

  • Data Backup and Recovery: Define a strategy for backing up data and recovering from data loss or corruption. Consider replication if the data is critical.

Acceptance Criteria

  • The system is enhanced based on the tasks defined above.
  • All changes have comprehensive unit test coverage.
  • The application functions as expected and all existing functionality is unaffected by these changes.


Refine the available `AutomataAgents`

Right now the production agent pipeline offers the three following agents

  • automata-main - The main agent which carries out high-level planning and delegates tasks to sub-agents
  • automata-writer - A helper agent responsible for writing code into output files. Future responsibilities could be testing and iterating on said code.
  • automata-reader - A helper agent which parses natural language queries to retrieve relevant code context from the local repository.

When provided with the right tools, we predict that these three agents can reach proficiency in basic programming tasks. The most important supporting tools for this objective are the ContextOracle and the PyWriter.

The goal of this issue is to refine our approach to get us closer to this objective. These agents need to be tested and refreshed with the latest tooling. Thus, the task is to identify some key tasks which can be run to assess agent performance, and to iterate until we believe we are closer to optimal performance. Over time a task such as this should become more quantitative, but for now we will adopt an ad-hoc approach in order to bolster our demo tape.

As always, don't hesitate to ask if you have any questions or need further clarification. Your contributions to this project are highly valued!

Evals for Symbol Retrieval

Based on automata/tests/eval/test_eval_code_writing.py
As part of the evaluation framework, there's a need for a new feature for evaluating code retrieval given a specific query. The goal is to assess the ability of the system to fetch the most relevant symbol in response to a given query.

Requirements:

  • Input Query: A string that represents the user's question or instruction, similar to the existing instructions input.
  • Expected Symbol: The expected code that should ideally be retrieved in response to the input query. This serves a similar purpose to expected_actions.
  • Quantification of Closeness: A mechanism to quantify how close the retrieved code is to the expected symbol. The exact methodology for this quantification is still to be determined. Potential strategies could include:
      • expected symbol == top retrieved symbol
      • expected symbol in top 5 retrieved symbols
  • Data Source: The code should be retrieved from an existing .scip file (or similar sources if deemed necessary).

Proposed Steps:

  1. Define the new input structure that will include the query and the expected symbol (a rough sketch follows below).
  2. Develop an algorithm or utilize existing libraries to compare the retrieved code with the expected symbol.
  3. Integrate this evaluation into the existing framework, ensuring compatibility with current components.
  4. Test the new feature with representative examples to ensure its functionality and accuracy.
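
As a concrete starting point for the input structure and closeness check, here is a hypothetical sketch; the field and function names are illustrative and not part of the existing eval framework.

from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SymbolRetrievalEval:
    """One evaluation case: a query plus the symbol we expect to be retrieved."""

    query: str
    expected_symbol: str  # e.g. a fully qualified dotpath from the .scip index


def score_retrieval(
    eval_case: SymbolRetrievalEval, retrieved_symbols: List[str], k: int = 5
) -> Dict[str, bool]:
    """Quantify closeness using the two strategies listed above."""
    return {
        "top_1": bool(retrieved_symbols) and retrieved_symbols[0] == eval_case.expected_symbol,
        "top_k": eval_case.expected_symbol in retrieved_symbols[:k],
    }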

Potential Challenges:

  • Defining a robust method for quantifying the quality of retrieval
  • Handling ambiguities or multiple valid answers to a single query

Next Steps:

  • Make evals more flexible to handle extensions similar to the one mentioned above
  • Add an eval for quantifying vanilla code retrieval (retrieval-augmented generation, RAG) for SymbolRank benchmarking

This issue serves as a baseline for discussions and iterations. Feedback, clarifications, and suggestions are highly encouraged to refine the requirements and implementation details.

Add issue title here

File: automata/symbol_embedding/builders.py
FIXME - This class is still in an experimental state

Consider removing "overview" from the Retriever agent

We currently use py_utils.build_repository_overview to generate an overview of the available modules, classes, and methods in the local codebase.

This consumes a large number of tokens and becomes increasingly intractable as our codebase grows.

We should move to retrieving and parsing the top 50 or 100 symbols with SymbolRank, instead of returning a complete overview.

This will give the important high-level details of the repository without consuming unnecessary context.
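
Something along these lines could work, reusing the get_ranks() call shown in the SymbolRank overview issue above; the surrounding wiring is assumed.

def build_symbol_rank_overview(symbol_rank, max_symbols: int = 100) -> str:
    """Return a newline-separated overview of the top-ranked symbols
    instead of a full repository dump."""
    ranks = symbol_rank.get_ranks()
    top_dotpaths = [
        ".".join(symbol.dotpath.split(".")[1:])
        for symbol, _ in ranks[:max_symbols]
    ]
    return "\n".join(sorted(top_dotpaths))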

Add support for `Chroma`

We currently have an abstract base class, VectorDatabaseProvider, which defines the interfaces and behaviors expected of any vector database provider. This includes functions such as save, load, add, update_database, clear, get_ordered_embeddings, contains, discard, get, and entry_to_key.

We also have implemented a concrete class JSONVectorDatabase that provides the actual implementation of these interfaces for JSON file-based storage.

Now, we want to add support for a new provider called Chroma.

To do this, we need a new concrete class that extends the VectorDatabaseProvider abstract base class and provides the implementation specific to the Chroma vector database.

Below is an example of how you can implement this. The implementation is simplified and might need adjustments to fit our use case.

Please note that I have assumed T to be a tuple of a document and its metadata and K to be the id of the document.

from typing import Any, Dict, List, Tuple

import chromadb

class ChromaVectorDatabase(VectorDatabaseProvider[Tuple[str, Dict[str, Any]], str]):
    """Concrete class to provide a vector database that uses Chroma."""

    def __init__(self, collection_name: str):
        self.client = chromadb.Client()
        self.collection = self.client.get_or_create_collection(collection_name)

    def save(self):
        # Chroma handles persistence automatically.
        pass

    def load(self):
        # In Chroma, the data loading happens automatically when creating or getting the collection.
        pass

    def add(self, entry: Tuple[str, Dict[str, Any]]):
        document, metadata = entry
        self.collection.add(
            documents=[document], 
            metadatas=[metadata], 
            ids=[self.entry_to_key(entry)]
        )

    def update_database(self, entry: Tuple[str, Dict[str, Any]]):
        # Chroma might not support updating entries. You may need to delete and re-add.
        self.discard(self.entry_to_key(entry))
        self.add(entry)

    def discard(self, key: str):
        # Assuming Chroma has a delete method
        self.collection.delete(ids=[key])

    def contains(self, key: str) -> bool:
        # Assuming Chroma has a get method which raises an error if the id does not exist
        try:
            self.collection.get(ids=[key])
            return True
        except Exception:
            return False

    def get(self, key: str) -> Tuple[str, Dict[str, Any]]:
        # Assuming Chroma has a get method which returns the document and its metadata
        return self.collection.get(ids=[key])

    def clear(self):
        # Assuming Chroma has a method to delete all entries in a collection
        self.collection.delete_all()

    def get_ordered_embeddings(self) -> List[Tuple[str, Dict[str, Any]]]:
        # Chroma might not support retrieving all entries ordered by their similarity to a given vector. 
        # You will need to adjust this method based on your specific use case and Chroma's capabilities.
        pass

    def entry_to_key(self, entry: Tuple[str, Dict[str, Any]]) -> str:
        # The key is assumed to be the first sentence of the document
        return entry[0].split(".")[0]

I'd recommend reading the documentation or source code of Chroma to find out how to implement the exact behaviors required by your abstract base class. For instance, the get_ordered_embeddings method might not be possible to implement with Chroma's current API. Similarly, it is unclear from the provided information whether Chroma supports updating entries or only adding and deleting them. To fully implement Chroma we will need to work through all of these details and more.

Add issue title here

File: automata/symbol_embedding/vector_databases.py
FIXME - Consider how to properly handle typing here.

Untracked setup.sh file can lead to inadvertent commits

Describe the bug
The current setup instructions involve creating and modifying a setup.sh file based on an example file. However, the setup.sh file is not included in the .gitignore, which means it appears as an untracked file in the Git status. This could potentially lead to the inadvertent addition and commit of this file, which could be problematic as it contains sensitive information such as API keys.

I had posted a short message about this in the project's Discord channel, but it appears to have gone unnoticed, likely due to the relatively low traffic in the channel currently. I opted not to send a direct message regarding the issue, as I prefer to address it through the proper channels.

To Reproduce
Steps to reproduce the behavior:

  1. Follow the current setup instructions which include creating a setup.sh file.
  2. Run git status.

Expected behavior
The setup.sh file should not appear in the list of untracked files.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • Windows 11
  • Github Desktop / Git on CMD

Additional context
Actual behavior:
The setup.sh file appears in the list of untracked files.

Proposed Solution:
I would be more than willing to tackle this issue myself. I propose to add setup.sh to the .gitignore file to prevent it from being tracked by Git. This would also give me the chance to practice contributing to the project.

Containerize Project with Docker

Description:

This issue outlines the proposed steps for containerizing the Automata Project using Docker. This containerization would allow for increased portability and a simplified deployment process.

Requirements:

  • Docker
  • Docker-compose (if we plan to use multi-container setup)

Steps:

  1. Create a Dockerfile: This file is the blueprint for building the Docker image for the Automata project. It should contain all the necessary steps to set up the environment and the project itself.

Sample Dockerfile:

# Use an official Python runtime as a parent image
FROM python:3.9-slim

# Set the working directory in the container to /app
WORKDIR /app

# Add the current directory contents into the container at /app
ADD . /app

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Make port 80 available to the world outside this container
EXPOSE 80

# Define environment variable
ENV NAME Automata

# Run app.py when the container launches
CMD ["python", "app.py"]

  2. Build the Docker image: The Docker image can be built using the Dockerfile created in the previous step.

Sample command:

docker build -t automata .

  3. Create a docker-compose file: If the Automata project has dependencies on other services such as databases, we might need a docker-compose file to manage multiple containers.

Sample docker-compose.yml:

version: '3'
services:
  web:
    build: .
    ports:
     - "5000:5000"
  redis:
    image: "redis:alpine"

  4. Push the Docker image to Docker Hub or any other Docker registry: This step will make the Docker image accessible to anyone who wants to run the Automata project.

Sample command:

docker push automata

  5. Update the README.md file: The README.md file should be updated to include instructions on how to run the Docker image.

  6. Implement CI/CD pipeline (Optional): If we want to automate the build and deployment process, we can use GitHub Actions, Jenkins, or any other CI/CD tool to build the Docker image and push it to Docker Hub whenever changes are pushed to the main branch.

  • Install Docker and Docker-compose on the development machine. Completed by @Exarchias in #286
  • Write a Dockerfile for the Automata project. Completed by @Exarchias in #286
  • Test the Dockerfile. Completed by @Exarchias in #286
  • If necessary, write a Docker-compose file.
  • Update the README.md file with instructions on how to use the Docker image.
  • Set up a CI/CD pipeline to automate the Docker build and push process when changes are made to the project. Completed in #329

Modify embedding scripts to do batch writes

Writing to the Chroma db one entry at a time appears quite slow. We need to do further testing to confirm, but in general it is good practice to write to databases in batches. We should extend run_doc_embedding and run_code_embedding to default to this behavior, as sketched below.
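
A minimal sketch of what batched writes could look like, assuming a chromadb collection and a hypothetical record shape of (id, document, metadata, embedding) produced by the embedding pipeline:

from typing import Any, Dict, Iterable, List, Tuple

import chromadb

Record = Tuple[str, str, Dict[str, Any], List[float]]  # (id, document, metadata, embedding)


def batched_write(collection, records: Iterable[Record], batch_size: int = 100) -> None:
    """Accumulate records and flush them to Chroma in batches rather than one at a time."""
    buffer: List[Record] = []

    def flush() -> None:
        if not buffer:
            return
        ids, documents, metadatas, embeddings = map(list, zip(*buffer))
        # One add() call per batch instead of one call per record.
        collection.add(ids=ids, documents=documents, metadatas=metadatas, embeddings=embeddings)
        buffer.clear()

    for record in records:
        buffer.append(record)
        if len(buffer) >= batch_size:
            flush()
    flush()  # write any remainder


# Example usage (collection name is a placeholder):
# client = chromadb.Client()
# collection = client.get_or_create_collection("automata-embeddings")
# batched_write(collection, record_stream)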

Migration to OpenAI Function API from Current Markdown Parsing Method

The existing system processes Actions using a markdown-based method as managed by the AutomataActionExtractor. We initially chose markdown because the OpenAI Language Model (LLM) has a good command over it. However, OpenAI has recently launched a feature that enables API users to describe functions to the models, and the models can generate structured JSON responses based on those function descriptions. This Function API could potentially simplify and streamline our workflow, and I am proposing a migration to this new model.

Context:

The new Function API by OpenAI allows models to comprehend function descriptions and return JSON formatted structured data. This improvement can boost our application's reliability in retrieving structured data and converting natural language to API calls.

The general steps for using the Function API are:

  1. Define a set of functions and call the model with the user query.
  2. Parse the JSON response from the model and call the corresponding function in our code with the provided arguments.
  3. Call the model again with the function response, allowing the model to summarize the results back to the user.

You can read more about this in OpenAI's documentation on function calling here.
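
As a rough illustration of that flow using the openai Python client's ChatCompletion interface; the retrieve_code function and its schema below are hypothetical, not something that exists in our codebase:

import json

import openai

functions = [
    {
        "name": "retrieve_code",
        "description": "Retrieve the most relevant code for a natural language query.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query."}
            },
            "required": ["query"],
        },
    }
]

messages = [{"role": "user", "content": "Show me where symbol ranking is computed."}]

response = openai.ChatCompletion.create(
    model="gpt-4-0613",
    messages=messages,
    functions=functions,
    function_call="auto",
)

message = response["choices"][0]["message"]
if message.get("function_call"):
    # The model returns structured JSON arguments instead of markdown that we must parse.
    name = message["function_call"]["name"]
    arguments = json.loads(message["function_call"]["arguments"])
    # ...dispatch to the matching tool, then send its result back to the model.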

Proposed Changes:

To leverage the Function API, we need to revise our AutomataActionExtractor. Instead of parsing markdown, we will process JSON responses representing function calls. The actions will align with the function description and responses from the new Function API.

This migration would potentially lead to a more maintainable codebase and improve our API call performance and reliability.

Potential Downsides:

One key downside is that this transition would couple our system more closely with OpenAI's API, making our code more sensitive to potential future changes in the API. However, considering the benefits and the robustness of OpenAI's platform, this risk might be worth taking.

Next Steps:

Let's discuss this proposal further and begin planning the transition if we agree on this path. We will need to identify any necessary changes in our codebase to accommodate this new approach.

Request for Feedback:

Please share your thoughts, ideas, and feedback on this proposed migration. This is a significant shift, but the potential benefits might outweigh the risks. Your input will be invaluable in this decision-making process.

As always, don't hesitate to ask if you have any questions or need further clarification. Your contributions to this project are highly valued!

Implement Experimental `RepositoryMemoryEmbedding`

Issue:

To enhance the agent's ability to learn from past interactions, we are considering the addition of a "RepositoryMemoryEmbedding" feature. The central premise is to vectorize each GitHub commit or event, enabling the system to treat the GitHub repository as a "memory". This would allow for indexing and searching of past changes, comments, issues, and pull requests, increasing the context awareness of our system.

With proper implementation, this repository memory could provide valuable insights and make our system more context-aware. By learning from past interactions with the code, our agent could offer improved assistance and recommendations.

Implementation:

A new class RepositoryMemoryEmbeddingHandler could be created that inherits from the MemoryEmbeddingHandler. This handler will need an instance of the GithubAPIHandler and an EmbeddingProvider to vectorize GitHub events and store these vectors into a suitable database.

The RepositoryMemoryEmbeddingHandler class should at minimum contain the following methods:

get_embedding(self, repo_id: str) -> Any: This method retrieves a GitHub event using the repo_id, turns it into a string, and generates an embedding using the EmbeddingProvider. This embedding is then returned.

update_embedding(self, repo_id: str, new_event: GithubEvent) -> None: This method adds a new GitHub event and updates the corresponding event embedding.

Error Handling: Please include comprehensive error handling, especially when interacting with the GitHub API and while creating embeddings.

Performance: Consider the performance of the system as the number of GitHub events grows. We might need to contemplate how to periodically prune old GitHub events if they are no longer necessary, or consider employing a more scalable storage solution.

Testing: Comprehensive testing should be conducted to ensure the system's robustness, particularly since this is an innovative feature. Both the functionality of RepositoryMemoryEmbeddingHandler and any changes made to the GithubAPIHandler need to be thoroughly tested.
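
A rough, hypothetical skeleton is sketched below; MemoryEmbeddingHandler, GithubAPIHandler, EmbeddingProvider, and GithubEvent are the assumed interfaces from the description above, and the helper method names on them are placeholders to be aligned with the real APIs:

from typing import Any


class RepositoryMemoryEmbeddingHandler(MemoryEmbeddingHandler):  # assumed base class
    """Treats the GitHub repository history as a searchable 'memory'."""

    def __init__(self, github_handler, embedding_provider, vector_db) -> None:
        self.github_handler = github_handler          # assumed GithubAPIHandler instance
        self.embedding_provider = embedding_provider  # assumed EmbeddingProvider instance
        self.vector_db = vector_db                    # assumed vector database backend

    def get_embedding(self, repo_id: str) -> Any:
        """Fetch a GitHub event, stringify it, and return its embedding."""
        try:
            event = self.github_handler.get_event(repo_id)  # placeholder method name
        except Exception as exc:
            raise RuntimeError(f"Failed to fetch GitHub event for {repo_id}") from exc
        return self.embedding_provider.build_embedding(str(event))  # placeholder method name

    def update_embedding(self, repo_id: str, new_event: "GithubEvent") -> None:
        """Embed a newly observed GitHub event and persist it for the repository."""
        try:
            embedding = self.embedding_provider.build_embedding(str(new_event))
        except Exception as exc:
            raise RuntimeError(f"Failed to embed GitHub event for {repo_id}") from exc
        # Persisting the vector is left to whichever database backend is chosen.
        self.vector_db.add((repo_id, embedding))  # placeholder persistence call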

Tasks:

  1. Create a new class RepositoryMemoryEmbeddingHandler.
  2. Implement get_embedding and update_embedding methods in RepositoryMemoryEmbeddingHandler.
  3. Update GithubAPIHandler to support event embeddings.
  4. Implement unit tests for RepositoryMemoryEmbeddingHandler.
  5. Implement integration tests for RepositoryMemoryEmbeddingHandler and GithubAPIHandler.

For inspiration, look at how auto-gpt utilizes memories. Please feel free to ask questions or seek clarification. Your input to this project is greatly appreciated!

Add issue title here

File: automata/singletons/py_module_loader.py
FIXME - This fails if the path is not rooted in the base directory

Remove `Redbaron` dependency

It is probably not wise to use Redbaron as a dependency, given that it appears to no longer be maintained. This needs to be excised from the code base in favor of AST.

Doing so will require us to implement our own version of BoundingBox in order to facilitate calculations like fst_object.absolute_bounding_box. This is a non-trivial exercise and so I have delayed undertaking it.
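
As a starting point, here is a minimal sketch of an AST-based bounding box; it relies on the end_lineno/end_col_offset fields that Python 3.8+ attaches to statement and expression nodes, and the class and function names are placeholders:

import ast
from dataclasses import dataclass


@dataclass
class BoundingBox:
    """Line/column extent of a node, mirroring what absolute_bounding_box provided in Redbaron."""

    top_left_line: int
    top_left_column: int
    bottom_right_line: int
    bottom_right_column: int


def bounding_box_from_node(node: ast.AST) -> BoundingBox:
    # Only nodes with location info (statements, expressions) carry these fields;
    # ast line numbers are 1-indexed and columns are 0-indexed.
    return BoundingBox(
        top_left_line=node.lineno,
        top_left_column=node.col_offset,
        bottom_right_line=node.end_lineno or node.lineno,
        bottom_right_column=node.end_col_offset or node.col_offset,
    )


# Example: bounding box of the first function definition in a module.
tree = ast.parse("def foo():\n    return 1\n")
func = next(n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef))
print(bounding_box_from_node(func))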

Someone should investigate this undertaking.

As always, don't hesitate to ask if you have any questions or need further clarification. Your contributions to this project are highly valued!

Evaluate and Enhance Our CI Pipeline

Is our CI pipeline complete?

While our current Continuous Integration (CI) pipeline is robust and performing well, I think there's always room for us to step back and evaluate if there's more we could do. Currently, the pre-commit workflow follows this sketch:

repos:
   ...
    hooks:
      - id: mypy
    ...
    hooks:
      - id: yapf
    ...
    hooks:
      - id: black
    ...
    hooks:
      - id: flake8
    ...
    hooks:
      - id: isort
        language_version: python3.9 # Replace with your Python version
    ...
    hooks:
      - id: pytest
    ...
      - id: config_validation
        name: Config Validation
        entry: python -m automata.cli.scripts.run_config_validation
        language: system
        types: [python]

And we have corresponding GitHub Actions to ensure these pre-commits are satisfied.

While we're covering many bases, it's worth considering if there are other tools or checks we could benefit from. Here are some possibilities I've considered:

  • Code Coverage Checks: Consider adding a tool like coverage or codecov to measure the extent to which our codebase is covered by our tests. This would help us ensure we're thoroughly testing our code and identify any areas that might need more testing.
  • Security Checks: A tool like bandit or safety could help us catch any potential security vulnerabilities in our codebase.
  • Docstring Checks: We could use a tool like pydocstyle to check the quality and consistency of our docstrings.
  • Dependency Checks: We could add a tool like pipdeptree or safety to check for any dependency conflicts or security vulnerabilities in our dependencies.

Of course, these are just ideas, and the benefits should outweigh the added complexity for each tool we consider adding. I'm open to hearing everyone's thoughts on these suggestions or if there are other tools you think would be beneficial.

Enhance Regression Testing and Implement `Eval` Class for Robust Agent Evaluation

I wanted to bring up the subject of our regression testing and the potential for a new, more comprehensive approach to evaluating our agents' performance. Currently, our regression testing is limited mainly to search functionality, which leaves a lot of our codebase under-tested. A robust and comprehensive testing framework is critical to ensuring code quality, catching bugs early, and facilitating smooth code integration. It's also a key contributor to long-term code maintainability.

To this end, I propose two main initiatives:

Expand our Regression Testing: By enhancing our suite of regression tests, we can ensure that changes in one part of our code don't break something somewhere else. This will help us maintain system integrity and minimize the risks associated with ongoing development.
Introduce an Evaluation Suite with Eval Class: In addition to expanded regression testing, we should consider implementing a comprehensive evaluation suite featuring an Eval class. This class will allow us to evaluate how well our agents are performing by comparing their actions with expected outcomes.
Here's a rough skeleton of what the Eval class could look like:


class Eval(abc.ABC):
    """
    Evaluation classes generally should override two methods:
    `generate_eval_result`: Takes an instruction and a list of expected actions and evaluates the correctness of the agent's actions.
    `_extract_actions`: Extracts the actions from a passed list of messages.
    """

    def __init__(self, *args, **kwargs):
        if "main_config" not in kwargs:
            raise ValueError("main_config must be provided to Eval")
        self.config = AutomataAgentConfigFactory.create_config(args, kwargs)

    def generate_eval_result(self, instructions: str, expected_actions: List[EvalAction]):
        """
        Evaluates a single sample.
        """
        logger.debug("Evaluating Instructions: %s", instructions)
        agent = AutomataAgentFactory.create_agent(instructions=instructions, config=self.config)
        agent.run()
        messages = Eval._extract_non_instruction_messages(agent)
        extracted_actions = Eval._extract_actions(messages)
        return calc_eval_result(extracted_actions, expected_actions)

    @staticmethod
    def _extract_actions(messages: List[OpenAIChatMessage]) -> List[Action]:
        """Extracts actions from a list of messages."""
        extracted_actions: List[Action] = []
        for message in messages:
            actions = AutomataActionExtractor.extract_actions(message.content)
            extracted_actions.extend(actions)
        return extracted_actions

Logic for the old Eval implementation can be seen here.

The Eval class's primary purpose is to take a set of instructions and corresponding expected actions and evaluate whether the instruction execution generates the anticipated actions. It's important to note that a good suite of evaluations won't necessarily pass with 100% success, but rather it will provide us a performance baseline and a clear target to strive for.

As always, don't hesitate to ask if you have any questions or need further clarification. Your contributions to this project are highly valued!

Update embedding scripts to remove 'missing' symbols

Embeddings can be refreshed via the code below

# Build/refresh the code embeddings
automata run-code-embedding

# "L1" docs are the docstrings written into the code
# "L2" docs are generated from the L1 docs + symbol context
# Build/refresh and embed the L2 docs
automata run-doc-embedding-l2

# "L3" docs are generated from the L2 docs + symbol context
# Build/refresh and embed the L3 docs
automata run-doc-embedding-l3

Unfortunately, the way these scripts are structured, they do not remove symbols from the embeddings that no longer exist in the codebase. Over time this causes a creeping growth in observed errors like these:

...
ERROR:automata.core.symbol.graph:Error processing scip-python python automata 9db05b7e7ebd49f93703df45accd7e5f9d5cedb0 `automata.core.coding.py_coding.navigation`/find_syntax_tree_node().: Symbol(scip-python python automata 9db05b7e7ebd49f93703df45accd7e5f9d5cedb0 `automata.core.coding.py_coding.navigation`/find_syntax_tree_node()., scip-python, Package(python automata 9db05b7e7ebd49f93703df45accd7e5f9d5cedb0), (Descriptor(automata.core.coding.py_coding.navigation, 1), Descriptor(find_syntax_tree_node, 4)))
ERROR:automata.core.symbol.graph:Error processing scip-python python automata 9db05b7e7ebd49f93703df45accd7e5f9d5cedb0 `automata.tests.unit.test_python_writer_tool`/test_extend_module_with_new_function().: Symbol(scip-python python automata 9db05b7e7ebd49f93703df45accd7e5f9d5cedb0 `automata.tests.unit.test_python_writer_tool`/test_extend_module_with_new_function()., scip-python, Package(python automata 9db05b7e7ebd49f93703df45accd7e5f9d5cedb0), (Descriptor(automata.tests.unit.test_python_writer_tool, 1), Descriptor(test_extend_module_with_new_function, 4)))
ERROR:automata.core.symbol.graph:Error processing scip-python python automata 9db05b7e7ebd49f93703df45accd7e5f9d5cedb0 `automata.tests.unit.test_python_writer_tool`/test_extend_module_with_documented_new_function().: Symbol(scip-python python automata 9db05b7e7ebd49f93703df45accd7e5f9d5cedb0 `automata.tests.unit.test_python_writer_tool`/test_extend_module_with_documented_new_function()., scip-python, Package(python automata 9db05b7e7ebd49f93703df45accd7e5f9d5cedb0), (Descriptor(automata.tests.unit.test_python_writer_tool, 1), Descriptor(test_extend_module_with_documented_new_function, 4)))
ERROR:automata.core.symbol.graph:Error processing scip-python python automata 9db05b7e7ebd49f93703df45accd7e5f9d5cedb0 `automata.core.coding.py_coding.module_tree`/LazyModuleTreeMap#put_module().: Symbol(scip-python python automata 9db05b7e7ebd49f93703df45accd7e5f9d5cedb0 `automata.core.coding.py_coding.module_tree`/LazyModuleTreeMap#put_module()., scip-python, Package(python automata 9db05b7e7ebd49f93703df45accd7e5f9d5cedb0), (Descriptor(automata.core.coding.py_coding.module_tree, 1), Descriptor(LazyModuleTreeMap, 2), Descriptor(put_module, 4)))
ERROR:automata.core.symbol.graph:Error processing scip-python python automata 9db05b7e7ebd49f93703df45accd7e5f9d5cedb0 `automata.core.coding.py_coding.module_tree`/DotPathMap#contains_dotpath().: Symbol(scip-python python automata 9db05b7e7ebd49f93703df45accd7e5f9d5cedb0 `automata.core.coding.py_coding.module_tree`/DotPathMap#contains_dotpath()., scip-python, Package(python automata 9db05b7e7ebd49f93703df45accd7e5f9d5cedb0), (Descriptor(automata.core.coding.py_coding.module_tree, 1), Descriptor(DotPathMap, 2), Descriptor(contains_dotpath, 4)))

We should modify the refresh scripts to remove these 'missing' symbols; a rough sketch of one approach follows.
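
The sketch below assumes the embedding database follows the VectorDatabaseProvider interface (get_ordered_embeddings, entry_to_key, discard, save) and that the freshly built index yields the set of currently supported symbol keys; both are assumptions to verify against the actual code:

from typing import Set


def prune_missing_symbols(embedding_db, supported_keys: Set[str]) -> int:
    """Discard embeddings whose symbols no longer exist in the codebase; returns the count removed."""
    stale_keys = [
        embedding_db.entry_to_key(entry)
        for entry in embedding_db.get_ordered_embeddings()
        if embedding_db.entry_to_key(entry) not in supported_keys
    ]
    for key in stale_keys:
        embedding_db.discard(key)
    embedding_db.save()
    return len(stale_keys)


# Example usage inside an embedding refresh script (the symbol-to-key mapping is an assumption):
# supported_keys = {str(symbol) for symbol in symbols_from_index}
# removed = prune_missing_symbols(embedding_db, supported_keys)
# logger.info("Pruned %d stale embeddings", removed)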

Feel free to post any questions or concerns you have about this implementation. Your contribution to this project is highly appreciated!

Specify list of supported symbols for document generation, add override flag to `EmbeddingHandler`

The goal is to modify the run_doc_embedding script to accept an arbitrary list of new symbols to regenerate [preferably in some human readable form that is parsed]. These symbols should then overwrite existing results in the database with new documentation if necessary. To implement the second step a flag will need to be added to the EmbeddingHandler.

  1. Understand the Project: Review the Automata project to get a clear understanding of its goals and how it operates. This includes understanding how the document generation pipeline works, the role of indices and embeddings, and how they interplay to allow for efficient code and documentation generation.

  2. Setup the Environment: Clone the Automata project into the "../repo_store/automata" directory relative to your local working directory.

    Code: git clone git@github.com:emrgnt-cmplxty/Automata.git ../repo_store/automata

  3. Generate New Indices: Once you have cloned the Automata project, navigate to the "scripts" directory and run the generate_indices.sh script to generate new indices.

    Code:

    cd scripts
    ./generate_indices.sh

    You can verify the creation of the indices by navigating to the automata-embedding-data directory and running git status.

    Code:

    cd automata-embedding-data
    git status
  4. Refresh Code Embeddings: From your main directory, run the run-code-embedding command. This will refresh the embeddings in the database, rolling the commit hash forward where the symbol source code hasn't changed and recalculating the index where necessary.

    Code: poetry run automata run-code-embedding

  5. Run Document Embedding: Next, run the run-doc-embedding command locally to see it in action.

    Code: poetry run automata run-doc-embedding

    This command generates new docs for newly added symbols and moves forward with the symbols that are in the index and map onto the database. This is a crucial part of the pipeline which generates the Automata docs.

  6. Understand the Codebase: As you carry out these steps, ensure to take mental notes to understand the workings of the codebase. Insert print statements and other debug aids to get a sense of what's happening in the pipeline.

  7. Modify run_doc_embedding: Once you are confident with your understanding of the pipeline, your goal is to modify run_doc_embedding to accept a list of new symbols to regenerate, and overwrite existing results in the database with new documentation if necessary (a rough sketch follows this list).

  8. Seek Help: If you encounter any issues or have questions, don't hesitate to ask for help. Once you have successfully carried out these tasks, check in for the next steps.
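
As a rough illustration of step 7, the entry point might grow a symbol filter and an overwrite flag along these lines; the function signature, the dotpath attribute, and the helper names here are hypothetical, not the current CLI:

def run_doc_embedding(symbols_to_refresh: str = "", overwrite: bool = False) -> None:
    """Regenerate docs for an explicit, human-readable list of symbols.

    symbols_to_refresh is a comma-separated list of dotpaths, e.g. "pkg.module.ClassA,pkg.module.ClassB".
    When overwrite is True, the embedding handler replaces existing database entries.
    """
    requested = {s.strip() for s in symbols_to_refresh.split(",") if s.strip()}
    for symbol in all_supported_symbols():  # assumed helper yielding symbols from the index
        if requested and symbol.dotpath not in requested:  # dotpath attribute is an assumption
            continue
        doc_embedding_handler.process_embedding(symbol, overwrite=overwrite)  # assumed flag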

Remember, understanding the pipeline and how everything fits together is the key to this task.

Automata Documentation Files Generation Alters Git State

Describe the bug
Upon running the generate_faq.py and sphinx-build -a . docs scripts, additional files get created within the ./docs/docs directory. These files are currently not ignored by the Git configuration, causing uncommitted changes to be present in the working directory.

To Reproduce
Steps to reproduce the behavior:

  1. Navigate to the docs directory in the Automata root folder.
  2. Run python generate_faq.py.
  3. Run sphinx-build -a . docs.
  4. Observe the creation of new files within the ./docs/docs directory.

Expected behavior
The ./docs/docs directory should be added to the .gitignore file to avoid tracking unnecessary files.

Screenshots
Not applicable.

Desktop:

  • OS: Windows
  • Git Bash / Git Desktop

Smartphone:
Not applicable.

Additional context
These auto-generated files are not needed in the repository, and their presence may confuse contributors. Including the ./docs/docs directory in .gitignore would solve this issue.

Notably, some files outside the ./docs/docs directory, which already existed, are being modified during the documentation generation process. I acknowledge these modifications as intended and beneficial. Therefore, I plan to commit these files to the repository after the documentation generation. This strategy ensures that all essential updates are incorporated while excluding unnecessary, auto-generated files.

Implement `calculate_similarity` Method in `VectorDatabaseProvider`

The VectorDatabaseProvider abstract base class currently has a placeholder for the calculate_similarity method. For a functioning implementation, this method needs to calculate the similarity between a given vector (i.e., symbol embedding) and all vectors in the database. This is crucial for any use case involving similarity or distance-based retrieval of symbols in the database.

The calculate_similarity method should return a list of dictionaries, each containing a symbol and its corresponding similarity score. The implementation will largely depend on the data structure used for the database and the specific similarity measure employed (for instance, cosine similarity for vector data).

Here's a high-level overview of what needs to be done:

  1. Update the calculate_similarity method in VectorDatabaseProvider to correctly compute similarity scores instead of raising a NotImplementedError.
  2. The similarity scores should be computed for all vectors in the database against the provided input vector.
  3. The output should be a list of dictionaries, each containing a Symbol instance and the corresponding similarity score as a float value.
  4. Test the method to ensure it correctly computes and returns similarity scores.
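
A minimal sketch of what this could look like for an in-memory or JSON-backed provider using cosine similarity; the entry attributes (vector, symbol) and the exact return shape are assumptions to be aligned with the real interface:

import numpy as np


def calculate_similarity(self, embedding_vector: np.ndarray) -> list:
    """Return a list of {"symbol": ..., "similarity": ...} dicts, one per stored embedding."""
    results = []
    query_norm = np.linalg.norm(embedding_vector)
    for entry in self.get_ordered_embeddings():
        vector = np.asarray(entry.vector)  # assumes entries expose their raw vector
        denom = query_norm * np.linalg.norm(vector)
        similarity = float(np.dot(embedding_vector, vector) / denom) if denom else 0.0
        results.append({"symbol": entry.symbol, "similarity": similarity})
    return results
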
Feel free to post any questions or concerns you have about this implementation. Your contribution to this project is highly appreciated!

Discrepancy in Naming of Sphinx Dependencies

Describe the bug
In the current configuration of our requirements.txt, the dependencies for Sphinx are listed as sphinx-panels and sphinx-rtd-dark-mode. However, on a Windows system, these dependencies are installed and imported with underscores rather than hyphens, specifically as sphinx_panels and sphinx_rtd_dark_mode.

To Reproduce
This discrepancy can be observed when attempting to install the dependencies on a Windows system using pip, as per the instructions in our documentation.

Expected behavior
The dependencies should be correctly listed in requirements.txt such that the installation commands run smoothly across different platforms, including Windows.

Desktop (please complete the following information):

  • OS: Windows 11

Additional context
This issue might lead to confusion or errors when contributors try to set up the project on a Windows system. We need to address this discrepancy and update the requirements.txt to reflect the correct names of the dependencies. Furthermore, it would be beneficial to verify the cross-compatibility of these dependency names across other platforms.

Potential Issue with `update_existing_embedding` functionality in EmbeddingHandlers

There is a potential problem with the update_existing_embedding function within the SymbolCodeEmbeddingHandler class. The problem appears to be exclusive to updating existing embeddings, while the creation of new embeddings is functioning as expected.

The update_existing_embedding function presently compares the symbol's source code with that of the existing embedding. If there's a discrepancy, the method discards the old embedding and generates a new one. However, this may not account for all potential changes to the symbol that would necessitate a new or updated embedding.

Additionally, the JSONVectorDatabase.discard() and JSONVectorDatabase.add() operations might not be persisting the changes to disk. If these methods do not automatically save changes, or if a required commit operation isn't being executed, the updates might not persist across sessions.

Lastly, it might be beneficial to review the build_embedding function. If it isn't designed to generate updated embeddings from old ones, it may not be suitable for its use within update_existing_embedding.

While it is challenging to determine the exact issue without more context or error messages, these potential problem areas could be a good starting point for investigation.

As always, don't hesitate to ask if you have any questions or need further clarification. Your contributions to this project are highly valued!

Re-implement `Task` Flask Server

Our codebase previously included a server that was integrated with our Task workflow and provided a user-friendly interface for managing ongoing tasks. Unfortunately, we recently removed this Task workflow, including the server, because its initial implementation was somewhat messy and incomplete. The last version of the server code can be seen here.

However, it is crucial to re-implement the Task server to efficiently manage the task processing. Below is the documentation for the original Task Server for your reference.

This Task Server was built using Flask and integrated with various components of the Task workflow including GitHub manager, Task Registry, and Task Executor. It exposed several endpoints to create, execute, and commit tasks as well as retrieve task details, task logs, and conversation details.

Key aspects of the server included:

  • before_request() method to set up a GitHub manager, Task Registry, and Task Executor before each request.

  • run_with_logs() method to ensure proper logging configuration in the subprocess while running the task.

  • Various GET endpoints like /tasks, /task/<task_id>, /task/logs/<task_id>, /conversation/<session_id>, and /full_conversation/<session_id> to fetch all tasks, a specific task, logs of a task, conversation without a prompt, and full conversation respectively.

  • /task/initialize endpoint (POST method) to initialize a task with provided parameters and run it in a subprocess.

  • /task/<task_id>/execute endpoint (POST method) to execute a specific task using the Task Executor.

  • /task/<task_id>/commit endpoint (POST method) to commit a specific task to a GitHub repository.
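
A bare-bones sketch of the shape the re-implemented server might take is shown below; the routes mirror a subset of the endpoints listed above, but the handler bodies are placeholders rather than the removed implementation:

from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/tasks", methods=["GET"])
def list_tasks():
    # Placeholder: would query the re-implemented Task Registry.
    return jsonify({"tasks": []})


@app.route("/task/<task_id>", methods=["GET"])
def get_task(task_id: str):
    # Placeholder: would look up a single task and return its details.
    return jsonify({"task_id": task_id, "status": "unknown"})


@app.route("/task/initialize", methods=["POST"])
def initialize_task():
    params = request.get_json(force=True) or {}
    # Placeholder: would create the task and launch it in a subprocess with proper logging.
    return jsonify({"received": params}), 202


if __name__ == "__main__":
    app.run(port=5000)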

While re-implementing the server, it is critical to maintain the ease of use and efficiency of the previous server. However, we should take the opportunity to improve upon areas where the previous server was lacking. Here are some points to consider:

  • Logging Configuration: The previous server had a somewhat hacky way of configuring logging, especially for subprocesses. This can be improved to make the logging more streamlined and efficient.

  • Error Handling: There was not enough comprehensive error handling in the previous server implementation. It should be considered to implement more thorough error handling and provide clearer error messages.

  • Code Optimization: Look for opportunities to refactor the code, reduce redundancy and improve readability and maintainability.

  • Extensibility: Design the server in a way that it can be easily extended to add more endpoints in the future as per our requirements.

Let's work together to make the Task Server a robust, reliable, and efficient component of our codebase. Please feel free to add any ideas or suggestions to enhance the Task server.

As always, don't hesitate to ask if you have any questions or need further clarification. Your contributions to this project are highly valued!

Improve "contributing.md" to include more detailed instructions for contributors

The current contributing.md provides a basic framework for contributions, but there are opportunities to enhance it and make it more helpful to contributors.

Here are the proposed enhancements:

  1. Setting up the project locally: Include detailed instructions about how to clone the repo, install the dependencies, and run the project.

  2. Creating an Issue: Provide guidelines on how to correctly create an issue. The instructions should emphasize providing a clear title and a detailed description of the problem or feature request, and selecting the appropriate labels.

  3. Creating a Pull Request: Outline the process of creating a PR. Encourage contributors to create a new branch for each PR. Provide instructions on how to fetch and pull the latest changes from the main branch before pushing their changes.

  4. Commit Message Guidelines: Introduce the concept of conventional commits and explain why they are important. Include a link to a detailed guide for contributors who want to learn more about it.

By incorporating these enhancements, we can make the process of contributing to Automata more streamlined and efficient, thereby attracting more valuable contributions from the open-source community.
