langchain-ai / langchain-benchmarks

🦜💯 Flex those feathers!

Home Page: https://langchain-ai.github.io/langchain-benchmarks/

License: MIT License

Python 99.20% Makefile 0.80%

benchmark-framework benchmarking langchain langchain-python llm llms

langchain-benchmarks's Issues

[Feature req] Agents: comparing fine-tuning techniques

Common question: I'm fine-tuning for an agent. What split of data should I prioritize collecting, and in what mixture?

Fewer long trajectories?
More short trajectories / single-step function calls?
If there is conversation in the mix, how much should I include? And do I then include the full trajectory flattened out, or remove it for later calls?
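
To make the last question concrete, here is a minimal sketch of the two options it describes: keeping the full flattened history before each tool call, or dropping the conversational turns before later calls. The message schema (a `kind` field with the values `task`, `chat`, `tool_call`, `tool_result`) and the helper `per_step_examples` are hypothetical, made up for illustration; they are not part of langchain-benchmarks.

```python
from typing import Dict, List

Message = Dict[str, str]

# Hypothetical trajectory: a task, a short conversation, then two tool calls.
trajectory: List[Message] = [
    {"kind": "task", "content": "Book a table for two"},
    {"kind": "chat", "content": "Sure, what time?"},
    {"kind": "chat", "content": "7pm"},
    {"kind": "tool_call", "content": "search_restaurants(time='19:00')"},
    {"kind": "tool_result", "content": "['Luigi']"},
    {"kind": "tool_call", "content": "book(name='Luigi')"},
]


def per_step_examples(
    steps: List[Message], keep_conversation: bool = True
) -> List[Dict[str, object]]:
    """Build one training example per tool call.

    The prompt is everything that happened before the call. With
    keep_conversation=False, conversational turns are removed from it.
    """
    examples: List[Dict[str, object]] = []
    for i, step in enumerate(steps):
        if step["kind"] != "tool_call":
            continue
        history = steps[:i]
        if not keep_conversation:
            history = [m for m in history if m["kind"] != "chat"]
        examples.append({"prompt": history, "completion": step})
    return examples


full = per_step_examples(trajectory)
stripped = per_step_examples(trajectory, keep_conversation=False)
print(len(full[1]["prompt"]), len(stripped[1]["prompt"]))
```

Which variant fine-tunes better is exactly the open question of this request; the sketch only shows how the two datasets would differ.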

ConnectionError

Hi, I'm trying to run custom_agent.py on my computer. When it reaches this call:
chain_results = run_on_dataset(
    client,
    dataset_name="Titanic CSV Data",
    llm_or_chain_factory=get_chain,
    evaluation=eval_config,
)
it raises this error:
ConnectionError: HTTPConnectionPool(host='localhost', port=1984): Max retries exceeded with url: /sessions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000001D88859A3A0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

I'm running under Windows with Python 3.9.7.
Has anyone seen this error before? Thanks!
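
The error shows the client trying to reach localhost on port 1984 and being refused, which suggests that nothing is listening there. A quick way to check which endpoint is being used and whether it accepts connections is sketched below. It assumes the client reads its endpoint from the `LANGCHAIN_ENDPOINT` environment variable and falls back to `http://localhost:1984` (the fallback is inferred from the error message, not confirmed); `parse_endpoint` and `can_connect` are hypothetical helper names.

```python
import os
import socket
from typing import Tuple
from urllib.parse import urlparse

# Assumption: fallback endpoint inferred from the error message above.
DEFAULT_ENDPOINT = "http://localhost:1984"


def parse_endpoint(url: str) -> Tuple[str, int]:
    """Split an endpoint URL into (host, port), defaulting the port by scheme."""
    parsed = urlparse(url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    return parsed.hostname or "", port


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    endpoint = os.environ.get("LANGCHAIN_ENDPOINT", DEFAULT_ENDPOINT)
    host, port = parse_endpoint(endpoint)
    status = "reachable" if can_connect(host, port) else "refusing or unreachable"
    print(f"{endpoint} -> {host}:{port} is {status}")
```

If the script reports the localhost endpoint as unreachable, pointing `LANGCHAIN_ENDPOINT` at a server that is actually running would be the first thing to try.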

Does LangChain-Benchmarks only support models from the model registry?

I'd like to run some benchmarks against models from Hugging Face. The tutorials seem tailored for models from the registry or OpenAI.

Before I go down the rabbit hole and try to use it myself, I thought I'd see if it was possible or if anyone has done this before and has examples I can look at.

Thanks

Error publishing feedback to LangSmith from Streamlit cloud

I've set LANGCHAIN_PROJECT and LANGCHAIN_API_KEY.

Feedback works locally.

App -
https://github.com/langchain-ai/langchain-benchmarks/tree/main/extraction

On streamlit cloud, I see this error -

Traceback (most recent call last):
  File "/home/adminuser/venv/lib/python3.9/site-packages/langsmith/utils.py", line 55, in raise_for_status_with_text
    response.raise_for_status()
  File "/home/adminuser/venv/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://api.smith.langchain.com/feedback

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/adminuser/venv/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 548, in _run_script
    self._session_state.on_script_will_rerun(rerun_data.widget_states)
  File "/home/adminuser/venv/lib/python3.9/site-packages/streamlit/runtime/state/safe_session_state.py", line 68, in on_script_will_rerun
    self._state.on_script_will_rerun(latest_widget_states)
  File "/home/adminuser/venv/lib/python3.9/site-packages/streamlit/runtime/state/session_state.py", line 484, in on_script_will_rerun
    self._call_callbacks()
  File "/home/adminuser/venv/lib/python3.9/site-packages/streamlit/runtime/state/session_state.py", line 497, in _call_callbacks
    self._new_widget_state.call_callback(wid)
  File "/home/adminuser/venv/lib/python3.9/site-packages/streamlit/runtime/state/session_state.py", line 249, in call_callback
    callback(*args, **kwargs)
  File "/mount/src/langchain-benchmarks/extraction/streamlit_app.py", line 9, in send_feedback
    client.create_feedback(run_id, "user_score", score=score)
  File "/home/adminuser/venv/lib/python3.9/site-packages/langsmith/client.py", line 1588, in create_feedback
    raise_for_status_with_text(response)
  File "/home/adminuser/venv/lib/python3.9/site-packages/langsmith/utils.py", line 57, in raise_for_status_with_text
    raise ValueError(response.text) from e
ValueError: {"detail":"Resource not found"}
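
The traceback shows the exception escaping from the `send_feedback` callback and taking down the Streamlit rerun. The sketch below does not address why the endpoint returns 404; it only shows one way to keep a failed feedback call from crashing the app, so the error can be logged or shown to the user instead. `safe_feedback` is a hypothetical helper, and the callable passed in is assumed to have the same shape as the `client.create_feedback(run_id, "user_score", score=score)` call in the traceback.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("feedback")


def safe_feedback(
    create_feedback: Callable[..., Any],
    run_id: str,
    score: int,
    key: str = "user_score",
) -> Optional[str]:
    """Call create_feedback and return None on success or an error message."""
    try:
        create_feedback(run_id, key, score=score)
    except Exception as exc:  # broad on purpose: a callback should not crash the UI
        logger.warning("Feedback for run %s failed: %s", run_id, exc)
        return f"Could not record feedback: {exc}"
    return None
```

In the app, the callback could call `safe_feedback(client.create_feedback, run_id, score)` and display the returned message, if any, rather than raising.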

RAG over code benchmark

Test some other models

Whenever we're ready with tool calling

  # ("fireworks-firefunction-v1", ChatFireworks(model="accounts/fireworks/models/firefunction-v1", temperature=0)),
  # ("cohere-command-light", ChatCohere(temperature=0, model="command-light")),
  # ("cohere-command", ChatCohere(temperature=0, model="command")),
  # ("cohere-command-r", ChatCohere(temperature=0, model="command-r")),
  # ("cohere-command-r-plus", ChatCohere(temperature=0, model="command-r-plus")),
  # ("mistral-large-2402", ChatMistralAI(model="mistral-large-2402", temperature=0)),

Code does not give plot related response and throws error at frontend

My code:

import matplotlib.pyplot as plt
import pandas as pd
import streamlit as st
from langchain.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain.agents.agent_types import AgentType
from langchain.chat_models import ChatOpenAI
from langsmith import Client

df = pd.read_csv('/Users/siddheshphapale/Desktop/project/sqlcsv.csv')

llm = ChatOpenAI(openai_api_key="s5", temperature=0, max_tokens=500, verbose=False)
agent = create_pandas_dataframe_agent(llm, df, agent_type=AgentType.OPENAI_FUNCTIONS)

client = Client()

def send_feedback(run_id, score):
    # Record the thumbs up / thumbs down score against the agent run.
    client.create_feedback(run_id, "user_score", score=score)

st.set_page_config(page_title='🦜🔗')
st.title('📊🔗')
st.info("")

query_text = st.text_input('Enter your question:', placeholder='region wise total net amt')

# Form input and query
result = None
with st.form('myform', clear_on_submit=True):
    submitted = st.form_submit_button('Submit')
    if submitted:
        with st.spinner('Calculating...'):
            response = agent({"input": query_text}, include_run_info=True)
            result = response["output"]
            run_id = response["__run"].run_id

# Show the answer and the feedback buttons once a result is available.
if result is not None:
    st.info(result)
    col_blank, col_text, col1, col2 = st.columns([10, 2, 1, 1])
    with col_text:
        st.text("Feedback:")
    with col1:
        st.button("👍", on_click=send_feedback, args=(run_id, 1))
    with col2:
        st.button("👎", on_click=send_feedback, args=(run_id, 0))

Is it possible to change the model evaluator?

I see here that the code uses the GPT-4 model for evaluation. Since it is the most expensive model out there to run, is it possible to swap the evaluator model for another one?
