patterns-ai-core / langchainrb
Build LLM-powered applications in Ruby
Home Page: https://rubydoc.info/gems/langchainrb
License: MIT License
Add additional examples to /examples. We don't have all of the current functionality/use-cases documented. I think it's helpful to showcase what this library can and cannot do!
Build a new Tool, Langchain::Tool::WolframAlpha, that can be connected to Agents.
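A hypothetical sketch of what such a tool could look like, modeled loosely on the gem's existing Tool classes; the Wolfram|Alpha Short Answers API endpoint, the app_id: parameter, and the execute interface are assumptions rather than settled design:

require "net/http"
require "uri"

module Langchain::Tool
  class WolframAlpha < Base
    def initialize(app_id:)
      @app_id = app_id
    end

    # Evaluate a natural-language query via Wolfram|Alpha and return the answer text
    def execute(input:)
      uri = URI("https://api.wolframalpha.com/v1/result")
      uri.query = URI.encode_www_form(appid: @app_id, i: input)
      Net::HTTP.get(uri)
    end
  end
end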
We have outgrown the single-file README. It is time to migrate to a more scalable solution.
Auto-generated RDoc-style documentation may not be ideal, as it reads like a dry technical manual.
Currently, if you'd like to only use Qdrant and OpenAI, this gem will install a bunch of other dependencies you don't need. There's no reason to do this.
We could build a system similar to omniauth, where different strategies are installed separately (omniauth-github, omniauth-facebook, etc.).
We'd ask users to install the gems on their own, in their applications, and then we'd require them in this gem (pseudo-codeish):
class Qdrant
  def initialize
    require "qdrant"
    # ...
  end
end
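One common way to make the failure friendly (a sketch, not existing gem code) is to rescue LoadError and tell the user which gem to add to their application:

def depends_on(gem_name)
  require gem_name
rescue LoadError
  raise LoadError, "Could not load #{gem_name}. Please add `gem \"#{gem_name}\"` to your Gemfile."
end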
Thank you for raising this issue @alchaplinsky!
Have you checked https://github.com/alexrudall/ruby-openai ?
It's common when prompt engineering to require the LLM to return results in JSON format and to include an example JSON response within the prompt. The current Prompt variable parser can't handle this as it interprets the example JSON as an input variable and then complains that it is missing:
require "langchain"
simple_template = "Tell me a {adjective} joke. Return in JSON in the format {{joke: 'The joke'}}"
Prompt::Base.extract_variables_from_template(simple_template)
=> ["adjective", "joke: 'The joke'"]
The current Python f-string parser handles this by allowing you to escape a curly brace with a double curly brace, as shown in the example above (see the Python f-string spec).
Looking at the regex, it's clearly trying to do something with double curly brackets, so before I try to fix it, can you shed some light on the original implementation?
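For reference, a minimal sketch (not the gem's actual parser) of how variable extraction could treat {{...}} as an escaped literal and only capture single-brace variables:

template = "Tell me a {adjective} joke. Return in JSON in the format {{joke: 'The joke'}}"

variables = template
  .gsub(/\{\{.*?\}\}/, "")   # drop escaped {{...}} sections first
  .scan(/\{([^{}]+)\}/)      # then capture the single-brace variables
  .flatten

p variables #=> ["adjective"]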
The prompt templates are hard to read and understand since they don't support multi-line strings.
Rename the files to .yaml, update the strings, and update the code.
A popular use-case is constructing a prompt for the LLM that includes the database schema and the user's question at hand, in order to construct a SQL query that gets executed on the LLM's behalf. The SQL query's result then gets fed back to the LLM to synthesize an answer to the user's question.
Create an entity (should it be a Tool? an Agent? or another abstraction?) that:
A popular use-case is constructing a prompt for the LLM that includes the API specs and the user's question at hand, in order to construct an API call that gets executed on the LLM's behalf. The API call's result then gets fed back to the LLM to synthesize an answer to the user's question.
Create an entity (should it be a Tool? an Agent? or another abstraction?) that:
As a prequel to #129, use Sequel 'reflection' methods to replace the current messy schema dump with clean table definitions and foreign keys (to which sample data can be added in subsequent issues, and the output can be limited to certain tables); a rough sketch using these methods follows the example format below:
https://sequel.jeremyevans.net/rdoc/files/doc/reflection_rdoc.html
Like this (as suggested by @bborn):
https://github.com/jerryjliu/llama_index/blob/b4618a2a24cd11b5c5949ab97389d62ac34ea336/llama_index/indices/struct_store/container_builder.py#L76
https://github.com/jerryjliu/llama_index/blob/c2f24363b8c6cd74f17647548187821ce9ea4ddf/llama_index/langchain_helpers/sql_wrapper.py#L44
Format (suggested by @rickychilcott from this paper)
CREATE TABLE Highschooler(
ID int primary key,
name text,
grade int)
/*
3 example rows:
SELECT * FROM Highschooler LIMIT 3;
ID name grade
1510 Jordan 9
1689 Gabriel 9
1381 Tiffany 9
*/
CREATE TABLE Friend(
student_id int,
friend_id int,
primary key (student_id,friend_id),
foreign key(student_id) references Highschooler(ID),
foreign key (friend_id) references Highschooler(ID)
)
/*
3 example rows:
SELECT * FROM Friend LIMIT 3;
student_id friend_id
1510 1381
1510 1689
1689 1709
*/
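A rough sketch of how Sequel's reflection methods (linked above) could produce something close to the target format; it assumes a Sequel::Database handle and an adapter that implements #schema and #foreign_key_list, and it leaves out the sample rows:

require "sequel"

db = Sequel.connect(ENV["DATABASE_URL"])

dump = db.tables.map do |table|
  columns = db.schema(table).map do |name, info|
    "  #{name} #{info[:db_type]}#{" primary key" if info[:primary_key]}"
  end
  foreign_keys = db.foreign_key_list(table).map do |fk|
    "  foreign key (#{fk[:columns].join(", ")}) references #{fk[:table]}"
  end
  "CREATE TABLE #{table}(\n#{(columns + foreign_keys).join(",\n")}\n)"
end

puts dump.join("\n\n")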
A small improvement to the docs for the Weaviate example is to include index_name
client = Langchain::Vectorsearch::Weaviate.new(
url: ENV["WEAVIATE_URL"],
api_key: ENV["WEAVIATE_API_KEY"],
index_name: "Document", # add this
llm: :openai, # or :cohere
llm_api_key: ENV["OPENAI_API_KEY"]
)
Since this is required by the API, it makes sense to include it in the example.
Reference: https://weaviate.io/developers/weaviate/quickstart/custom-vectors#schema
I think we need the ability to add and run some 'integration' tests that exercise interactions between high-level components and use actual APIs and keys. They would be run only on request and could be run before each release.
Start with a simple question to ChainOfThoughtAgent with OpenAI, like in the README, with the expectation that the result should be similar but not exactly equal to the result given in the README, since I assume the AI can respond slightly differently each time the test is called.
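A sketch of what an opt-in integration spec could look like; the :integration tag, the file path, and the loose assertion are suggestions rather than existing conventions in the repo (run on request with `bundle exec rspec --tag integration`):

# spec/integration/chain_of_thought_agent_spec.rb
require "langchain"

RSpec.describe "ChainOfThoughtAgent against live APIs", :integration do
  it "answers a simple question, allowing for non-deterministic output" do
    agent = Langchain::Agent::ChainOfThoughtAgent.new(
      llm: :openai,
      llm_api_key: ENV.fetch("OPENAI_API_KEY"),
      tools: ["calculator"]
    )

    answer = agent.run(question: "What is 2 + 2?")

    # The model may phrase the answer differently each run, so assert loosely.
    expect(answer).to include("4")
  end
end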
Stack trace:
-> langchain [main*]: gem install langchainrb
Successfully installed langchainrb-0.3.11
Parsing documentation for langchainrb-0.3.11
Done installing documentation for langchainrb after 0 seconds
1 gem installed
-> langchain [main*]: irb
irb(main):001:0> require "langchain"
/Users/andrei/.asdf/installs/ruby/3.0.0/lib/ruby/gems/3.0.0/gems/langchainrb-0.3.11/lib/langchain.rb:17:in `<module:Langchain>': uninitialized constant Langchain::Pathname (NameError)
from /Users/andrei/.asdf/installs/ruby/3.0.0/lib/ruby/gems/3.0.0/gems/langchainrb-0.3.11/lib/langchain.rb:7:in `<top (required)>'
from <internal:/Users/andrei/.asdf/installs/ruby/3.0.0/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:160:in `require'
from <internal:/Users/andrei/.asdf/installs/ruby/3.0.0/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:160:in `rescue in require'
from <internal:/Users/andrei/.asdf/installs/ruby/3.0.0/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:149:in `require'
from (irb):1:in `<main>'
from /Users/andrei/.asdf/installs/ruby/3.0.0/lib/ruby/gems/3.0.0/gems/irb-1.6.4/exe/irb:9:in `<top (required)>'
from /Users/andrei/.asdf/installs/ruby/3.0.0/bin/irb:23:in `load'
from /Users/andrei/.asdf/installs/ruby/3.0.0/bin/irb:23:in `<main>'
<internal:/Users/andrei/.asdf/installs/ruby/3.0.0/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require': cannot load such file -- langchain (LoadError)
from <internal:/Users/andrei/.asdf/installs/ruby/3.0.0/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
from (irb):1:in `<main>'
from /Users/andrei/.asdf/installs/ruby/3.0.0/lib/ruby/gems/3.0.0/gems/irb-1.6.4/exe/irb:9:in `<top (required)>'
from /Users/andrei/.asdf/installs/ruby/3.0.0/bin/irb:23:in `load'
from /Users/andrei/.asdf/installs/ruby/3.0.0/bin/irb:23:in `<main>'
We currently have custom errors/exceptions spread out and defined all throughout the project within different namespaces.
Gather up and move all of the custom error class definitions to a dedicated errors.rb file. Here's an example of good organization: https://github.com/jnunemaker/httparty/blob/master/lib/httparty/exceptions.rb. Please annotate (write documentation for) each error class: when it's raised, etc.
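A hypothetical layout for such a file, mirroring the HTTParty example above; the error class names here are illustrative, not the gem's current classes:

# lib/langchain/errors.rb
module Langchain
  # Base class so callers can rescue every Langchain error at once.
  class Error < StandardError; end

  # Raised when a required API key is missing or rejected by the provider.
  class ApiKeyError < Error; end

  # Raised when a prompt template references an input variable that was
  # not supplied at format time.
  class MissingVariableError < Error; end
end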
Agents need to be able to store data in memory or a vector search DB for later retrieval.
There's a limitation with the current SQLQueryAgent: it will hit the context window limit when a large database table schema is passed in to the LLM. As pointed out by @bborn, LlamaIndex stores the DB schema in a vector search database to avoid stuffing the whole DB schema into a single prompt.
We should add a memory: option (or should we call it persistence: instead?) to Agents to solve problems like that ^^
Right now, all the Ruby files live in the lib directory. This can be a problem because the "require namespace" is flat, which can cause problems when there's overlap. It would be better to have them in lib/langchain.
Similarly, all the constants are in the global namespace, which is shared. It'd be better to have them under Langchain.
So I propose moving all Ruby files into lib/langchain (except lib/langchain.rb) and nesting all constants under the Langchain module.
I can make the change, but it is very prone to getting out of date and causing merge conflicts, so I wanted to get feedback before attempting it.
Pgvector's similarity_search_by_vector() should calculate the cosine distance by default: https://github.com/andreibondarev/langchainrb/blob/main/lib/vectorsearch/pgvector.rb#L75-L77
More info about building a cosine distance query here: https://github.com/pgvector/pgvector/blob/dee2c4feb1bc5b17b9fe6a0a1ce8dbf0963c1b05/README.md#vector-operators
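For illustration, a query using pgvector's cosine distance operator <=> from the README linked above; the connection handling (a PG::Connection in `client`), table and column names, and the `embedding` array are assumptions, not the gem's current code:

query = <<~SQL
  SELECT content FROM items
  ORDER BY vector <=> $1
  LIMIT $2
SQL
client.exec_params(query, ["[#{embedding.join(",")}]", k])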
1. Rename ChainOfThoughtAgent to ReActAgent, because that's what it actually is. This is what "chain of thought" is: https://learnprompting.org/docs/intermediate/chain_of_thought; and this is what "ReAct" actually is: https://arxiv.org/abs/2210.03629
2. Try asking the LLM for a JSON output format -- may or may not be more accurate. Source:
I.e. instead of RegEx-ing strings, we may be able to read off JSON keys instead.
(Separate PRs please.)
It would be super useful to accept an IO stream or a string directly.
It's particularly useful when you're working with files on cloud storage like Google Drive or S3.
Example:
drive = Google::Apis::DriveV3::DriveService.new
raw_content = drive.get_file(my_file_id, download_dest: StringIO.new).string
text = Langchain::Loader.load(raw_content)
Without this, I have to do something like:
require "tempfile"
require "fileutils"

drive = Google::Apis::DriveV3::DriveService.new
temp_file = Tempfile.new(my_file_id)
raw_content = drive.get_file(my_file_id, download_dest: temp_file.path)
text = Langchain::Loader.load(temp_file.path)
FileUtils.rm(temp_file.path)
Create a Loaders::Doc class (similar to https://github.com/andreibondarev/langchainrb/tree/main/lib/loaders) that processes .doc and .docx files.
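A hypothetical sketch of the .docx half, built on the docx gem and mirroring the structure of the existing processors; handling the legacy binary .doc format would likely need a different library:

require "docx"

module Langchain
  module Processors
    class Docx < Base
      EXTENSIONS = [".docx"]
      CONTENT_TYPES = ["application/vnd.openxmlformats-officedocument.wordprocessingml.document"]

      # Parse the document and return its text
      # @param [File] data
      # @return [String]
      def parse(data)
        ::Docx::Document.open(data.path).paragraphs.map(&:text).join("\n")
      end
    end
  end
end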
Currently, when an Agent is initialized to use Tools, we pass them as strings that are then matched to existing classes. See the following current usage:
agent = Langchain::Agent::ChainOfThoughtAgent.new(llm: :openai, llm_api_key: ENV["OPENAI_API_KEY"], tools: ['search', 'calculator'])
agent.tools
# => ["search", "calculator"]
agent.tools = ["wikipedia"]
This approach is not flexible because:
Let's change the Agent interface to accept Tool instances like so:
calculator_tool = Langchain::Tool::Calculator.new()
sql_db_tool = Langchain::Tool::Database.new(db_connection_string: "postgres://user:password@localhost:5432/db_name") # Coming in this PR: https://github.com/andreibondarev/langchainrb/pull/91/files#diff-9a2d0c4b8a1176be3d78866742f2ba4c2da2452cb05959f433c1204bb8211ebd
agent = Langchain::Agent::ChainOfThoughtAgent.new(
llm: :openai,
llm_api_key: ENV["OPENAI_API_KEY"],
tools: [calculator_tool, sql_db_tool]
)
Note: Modify the SerpApi tool to accept the api_key: in the initialize method instead of seeking out the ENV var here:
module Langchain::Tool
  class SerpApi < Base
    attr_reader :api_key

    def initialize(api_key:)
      @api_key = api_key
    end

    def execute_search(input:)
      GoogleSearch.new(
        q: input,
        serp_api_key: api_key
      ).get_hash
    end
  end
end
When I set a breakpoint with 'binding.pry' most debugging commands work except 'step' and 'continue.'
[2] pry(#<Langchain::Tool::Database>)> whereami
Inside #<Langchain::Tool::Database>.
[3] pry(#<Langchain::Tool::Database>)>
[4] pry(#<Langchain::Tool::Database>)> step
NameError: undefined local variable or method `step' for #<Langchain::Tool::Database:0x00000001040e0e20 @DB=#<Sequel::SQLite::Database: {:adapter=>:sqlite}>>
from (pry):1:in `__pry__'
When the :debug logger level is set (Langchain.logger.level = :debug), I think we should print a ton of data, similar to the output the ChainOfThoughtAgent currently generates. Examples:
[Langchain.rb] [Weaviate]: Saving data to database is successful
[Langchain.rb] [OpenAI]: Generating embeddings
[Langchain.rb] [Weaviate]: Searching the "products" index
etc.
The challenge is figuring out which log messages are most helpful; we don't want to just flood the developer / the logs with useless text.
There's another way to do this:
[Langchain.rb] { service: "Vectorsearch::Weviate", action: "similarity_search", parameter: "..." }
[Langchain.rb] { service: "LLM::OpenAI", action: "embed", parameter: "..." }
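A hypothetical helper along those lines (not existing code), assuming Langchain.logger is a standard Ruby Logger; it emits structured lines like the second format above:

module Langchain
  module Loggable
    # Log a structured debug line identifying the emitting service and action.
    def log_debug(action, **params)
      Langchain.logger.debug(
        "[Langchain.rb] { service: \"#{self.class.name}\", action: \"#{action}\", parameters: #{params.inspect} }"
      )
    end
  end
end

# Usage inside e.g. a Vectorsearch class:
#   log_debug("similarity_search", query: query, k: k)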
Under module Vectorsearch::Base, there is a method add_data accepting path: nil, paths: nil params.
From my point of view, we could remove path and only accept paths, or use the splat operator.
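A sketch of the splat-operator variant; the signature is the suggestion here, add_texts exists on the Vectorsearch classes, and the loading helper is hypothetical:

def add_data(*paths)
  raise ArgumentError, "Please pass at least one path" if paths.empty?

  texts = paths.flatten.map { |path| load_text(path) } # load_text is a hypothetical helper
  add_texts(texts: texts)
end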
Currently, the conversations with LLMs that offer the .chat() endpoint are not persisted, hence the LLM has no context of the previous chat messages that may have taken place.
The following 2 LLMs offer chat capabilities and accept a messages: array:
Modify the .chat() methods to persist and keep track of the previous chat exchanges that have taken place.
openai = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
openai.chat_persistence = true
openai.ask(question: ...)
#=> LLM answer...
openai.ask(question: ...)
#=> LLM answer...
openai.ask(question: ...)
#=> LLM answer...
openai.clear_chat_persistence!
openai.chat_persistence = false
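A rough sketch of what the persistence could do internally; the method names mirror the proposed interface above, and the messages: array on .chat() is the addition discussed earlier, so all of this is illustrative rather than existing code:

def ask(question:)
  # Keep history only while chat_persistence is enabled.
  history = chat_persistence ? (@messages ||= []) : []
  history << { role: "user", content: question }

  answer = chat(messages: history)

  history << { role: "assistant", content: answer }
  answer
end

def clear_chat_persistence!
  @messages = []
end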
We would like to collect any and all feedback people might have regarding Langchain.rb: GOOD and BAD! Have you already tried Langchain.rb in your project? Do you have specific requirements or use-cases that you don't think Langchain.rb could help you with? Please provide us with your feedback!
We'd like to ensure that this project is rooted in real needs and use-cases, and solves actual pain-points when developing LLM-driven applications.
Optional questions to think through and answer:
Thank you! ❤️
I was just trying out this simple chain of thought agent:
agent = Langchain::Agent::ChainOfThoughtAgent.new(llm: :openai, llm_api_key: ENV["OPENAI_API_KEY"], tools: ['calculator'])
puts agent.run(question: "What is the square root of 99?")
It fails with:
server returns error for url: https://serpapi.com/search?q=%E2%88%9A99&engine=google&output=json&source=ruby
/Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/google_search_results-2.2.0/lib/search/serp_api_search.rb:143:in `rescue in get_results': Invalid API key. Your API key should be here: https://serpapi.com/manage-api-key (RuntimeError)
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/google_search_results-2.2.0/lib/search/serp_api_search.rb:136:in `get_results'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/google_search_results-2.2.0/lib/search/serp_api_search.rb:50:in `get_json'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/google_search_results-2.2.0/lib/search/serp_api_search.rb:64:in `get_hash'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/langchainrb-0.4.1/lib/langchain/tool/serp_api.rb:50:in `execute_search'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/langchainrb-0.4.1/lib/langchain/tool/serp_api.rb:27:in `execute_search'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/langchainrb-0.4.1/lib/langchain/tool/calculator.rb:27:in `rescue in execute'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/langchainrb-0.4.1/lib/langchain/tool/calculator.rb:20:in `execute'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/langchainrb-0.4.1/lib/langchain/tool/base.rb:26:in `execute'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/langchainrb-0.4.1/lib/langchain/agent/chain_of_thought_agent/chain_of_thought_agent.rb:66:in `block in run'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/langchainrb-0.4.1/lib/langchain/agent/chain_of_thought_agent/chain_of_thought_agent.rb:44:in `loop'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/langchainrb-0.4.1/lib/langchain/agent/chain_of_thought_agent/chain_of_thought_agent.rb:44:in `run'
from main.rb:7:in `<main>'
/Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/3.2.0/open-uri.rb:369:in `open_http': 401 Unauthorized (OpenURI::HTTPError)
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/3.2.0/open-uri.rb:760:in `buffer_open'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/3.2.0/open-uri.rb:214:in `block in open_loop'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/3.2.0/open-uri.rb:212:in `catch'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/3.2.0/open-uri.rb:212:in `open_loop'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/3.2.0/open-uri.rb:153:in `open_uri'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/3.2.0/open-uri.rb:740:in `open'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/google_search_results-2.2.0/lib/search/serp_api_search.rb:139:in `get_results'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/google_search_results-2.2.0/lib/search/serp_api_search.rb:50:in `get_json'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/google_search_results-2.2.0/lib/search/serp_api_search.rb:64:in `get_hash'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/langchainrb-0.4.1/lib/langchain/tool/serp_api.rb:50:in `execute_search'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/langchainrb-0.4.1/lib/langchain/tool/serp_api.rb:27:in `execute_search'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/langchainrb-0.4.1/lib/langchain/tool/calculator.rb:27:in `rescue in execute'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/langchainrb-0.4.1/lib/langchain/tool/calculator.rb:20:in `execute'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/langchainrb-0.4.1/lib/langchain/tool/base.rb:26:in `execute'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/langchainrb-0.4.1/lib/langchain/agent/chain_of_thought_agent/chain_of_thought_agent.rb:66:in `block in run'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/langchainrb-0.4.1/lib/langchain/agent/chain_of_thought_agent/chain_of_thought_agent.rb:44:in `loop'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/langchainrb-0.4.1/lib/langchain/agent/chain_of_thought_agent/chain_of_thought_agent.rb:44:in `run'
from main.rb:7:in `<main>'
/Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/eqn-1.6.5/lib/eqn/parser.rb:12:in `parse': Parse error at offset: 0 -- Expected one of "\s", "\t", 'if', 'IF', 'round', 'ROUND', 'roundup', 'ROUNDUP', 'rounddown', 'ROUNDDOWN', '(', '+', '-', [0-9], '.', [a-zA-Z] at line 1, column 1 (byte 1) (Eqn::ParseError)
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/eqn-1.6.5/lib/eqn/calculator.rb:66:in `calculate'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/langchainrb-0.4.1/lib/langchain/tool/calculator.rb:23:in `execute'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/langchainrb-0.4.1/lib/langchain/tool/base.rb:26:in `execute'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/langchainrb-0.4.1/lib/langchain/agent/chain_of_thought_agent/chain_of_thought_agent.rb:66:in `block in run'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/langchainrb-0.4.1/lib/langchain/agent/chain_of_thought_agent/chain_of_thought_agent.rb:44:in `loop'
from /Users/josh.nichols/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/langchainrb-0.4.1/lib/langchain/agent/chain_of_thought_agent/chain_of_thought_agent.rb:44:in `run'
from main.rb:7:in `<main>'
I was kinda surprised that the calculator uses the search tool. I was also not expecting to need to provide an API key.
I am imagining two pieces to this:
One feature I would love to have in Langchain.rb, and that may be super useful, is summarization:
https://python.langchain.com/en/latest/modules/chains/index_examples/summarize.html
I don't think it's super hard to implement (at least a base version of it).
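A sketch of what a base version could look like; the method placement and prompt wording are assumptions, and complete(prompt:) mirrors the existing LLM interface:

def summarize(llm:, text:)
  prompt = "Write a concise summary of the following text:\n\n#{text}\n\nCONCISE SUMMARY:"
  llm.complete(prompt: prompt)
end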
Prompt + completion apparently sometimes exceeds the model's limit with the hardcoded max_tokens values in the agents.
I think it can be calculated as: model_limit - prompt_token_size.
Use the Utils class TokenLengthValidator and/or Tiktoken (https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb) for counting tokens.
Reference: https://platform.openai.com/docs/api-reference/completions/create#completions/create-max_tokens
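A sketch of the suggested calculation; the tiktoken_ruby gem and the limit table here are assumptions, not existing gem code:

require "tiktoken_ruby"

MODEL_TOKEN_LIMITS = {
  "gpt-3.5-turbo" => 4_096,
  "text-davinci-003" => 4_097
}

def max_tokens_for(model:, prompt:)
  encoder = Tiktoken.encoding_for_model(model)
  prompt_token_size = encoder.encode(prompt).length
  MODEL_TOKEN_LIMITS.fetch(model) - prompt_token_size
end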
Wrap all of the instances of @index_name inside of queries in a quote_ident(@index_name). More information about quote_ident: https://www.rubydoc.info/gems/pg/PG/Connection#quote_ident-instance_method
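A before/after illustration, assuming a PG::Connection in `client`; the table name and query are made up for the example:

# before: the identifier is interpolated raw into the SQL
client.exec("SELECT * FROM #{@index_name} LIMIT 1")

# after: quote_ident protects against unusual or malicious table names
client.exec("SELECT * FROM #{client.quote_ident(@index_name)} LIMIT 1")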
Ruby 2.7 has reached the End of Life (https://www.ruby-lang.org/en/downloads/branches/). It seems like a number of dependencies will start dropping support for Ruby 2.7 and only requiring Ruby >= 3.0. For example pgvector-ruby, a dependency of ours, already dropped Ruby 2.7 support.
I was looking over these, and noticed:
These files don't exist. I'm not sure if the intent is to run them from where they're located or not. If it's not, then we should at least include some comments to the effect of changing the file paths. I think it would be reasonable for them to live in the repository, though.
We're missing the specs for the Qdrant vectorsearch.
Tasks:
Create spec/vectorsearch/qdrant_spec.rb with specs.
AI21 Studio has a ton of interesting LLM task-specific endpoints and use-cases.
Implement LLM::AI21 that utilizes their APIs.
When I run the following code I get this error
require "langchain"
require 'google_search_results'
GoogleSearch.api_key = ENV[KEY_SERAPI]
agent = Agent::ChainOfThoughtAgent.new(llm: :openai, llm_api_key: ENV[OPEN_IA], tools: ['search', 'calculator'])
agent.run(question: "How many full soccer fields would be needed to cover the distance between NYC and DC in a straight line?")
chain_of_thought_agent.rb:53:in `+': no implicit conversion of nil into String (TypeError)
prompt += response
And when I don't get the error, I don't get the response message either.
TODO: Fill out the issue
When asking a question, depending on how the data has been stored, it's possible to have a context exceeding the max tokens of the LLM.
https://github.com/andreibondarev/langchainrb/blob/ccd0fd53a9737fb61c82058e86da1c9b855ccd7f/lib/langchain/vectorsearch/pinecone.rb#L113-L123
In this #ask method, we could:
WDYT?
When the ask() method is called on a Vectorsearch instance (e.g.: https://github.com/andreibondarev/langchainrb/blob/main/lib/vectorsearch/qdrant.rb#L89-L100), we call the completions() method on the OpenAI LLM. I think better answers would be served by the chat/completions endpoint instead.
Inspiration: https://python.langchain.com/en/latest/modules/agents/tools/examples/chatgpt_plugins.html
Starting from #55, copying here for reference:
pgvector specifies in the docs that the index must be defined AFTER the table has some data inside. So even if I defined it during schema creation, it wouldn't have any effect on performance. You must create (or reindex) it AFTER you've put data in the table.
Solutions:
a) Add a method like update_indexes that the user must call manually after adding data
b) Update the index implicitly in the add_texts method, maybe with an option (update_index: true)
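A sketch of option (a), assuming a PG::Connection in `client` and pgvector's ivfflat index type; the method name, index name, and vector column name are all illustrative assumptions:

def update_index!
  client.exec(
    "CREATE INDEX IF NOT EXISTS langchain_vector_idx " \
    "ON #{client.quote_ident(@index_name)} USING ivfflat (vector vector_cosine_ops)"
  )
end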
It should be pretty straightforward using the xsv gem.
Something like (untested):
# frozen_string_literal: true

require "xsv"

module Langchain
  module Processors
    class Xlsx < Base
      EXTENSIONS = [".xlsx"]
      CONTENT_TYPES = ["application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"]

      # Parse the document and return the text
      # @param [File] data
      # @return [Array<Array<String>>]
      def parse(data)
        xlsx_file = Xsv.open(data.read)
        xlsx_file.sheets.flat_map do |sheet|
          sheet.map do |row|
            row.map { |cell| cell.to_s.strip }
          end
        end
      end
    end
  end
end