auto-evaluator's Issues

Improve speed of SVM retriever index creation

Currently the SVM retriever has a create_index function that embeds a list of chunks one at a time:

def create_index(contexts: List[str], embeddings: Embeddings) -> np.ndarray:
    return np.array([embeddings.embed_query(split) for split in contexts])

See code:
https://github.com/hwchase17/langchain/blob/f1d15b4a75238eb56e386608b7f631a853a803e0/langchain/retrievers/svm.py#L17

This is quite slow (> 10 min on ~2k splits).

Tried multiprocessing, but it appears to hang with:

Process SpawnPoolWorker-74:
Traceback (most recent call last):
  File "/Users/31treehaus/opt/anaconda3/envs/ml/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/31treehaus/opt/anaconda3/envs/ml/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/31treehaus/opt/anaconda3/envs/ml/lib/python3.9/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/Users/31treehaus/opt/anaconda3/envs/ml/lib/python3.9/multiprocessing/queues.py", line 367, in get
    return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'embed_query' on <module '__main__' (built-in)>
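
(For reference, the AttributeError is the spawn start method failing to re-import the worker callable from __main__, which doesn't work from a Streamlit script or notebook, and would explain the apparent hang once the worker dies. Since each embed_query call is an I/O-bound API request, a thread pool sidesteps pickling entirely. A rough sketch; create_index_threaded and max_workers are my own names, not anything in the repo:

from concurrent.futures import ThreadPoolExecutor
from typing import List

import numpy as np
from langchain.embeddings.base import Embeddings


def create_index_threaded(contexts: List[str], embeddings: Embeddings,
                          max_workers: int = 8) -> np.ndarray:
    # Threads are enough here: each embed_query is an HTTP round trip, so the
    # GIL is released while waiting on the API. max_workers is a guess and
    # should be tuned against the provider's rate limits.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        vectors = list(pool.map(embeddings.embed_query, contexts))
    return np.array(vectors)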

Tried a DataFrame apply to parallelize as well, but saw no latency improvement.

Need to carve out more time to look into this.
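
Another option worth trying, assuming the embedding class batches requests (OpenAIEmbeddings does for embed_documents), is to embed the whole list in one call instead of looping embed_query per split:

from typing import List

import numpy as np
from langchain.embeddings.base import Embeddings


def create_index(contexts: List[str], embeddings: Embeddings) -> np.ndarray:
    # embed_documents lets the provider batch the chunks into a handful of
    # API requests instead of one request per split.
    return np.array(embeddings.embed_documents(contexts))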

Is there a limit on the size of the PDF? A PDF of only about 200 KB produced the following error:

Traceback (most recent call last):
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
  File "/Users/helen/code/xiaomi/auto-evaluator/auto-evaluator.py", line 430, in <module>
    graded_answers, graded_retrieval, latency, predictions = run_evaluation(qa_chain, retriever, eval_set, grade_prompt,
  File "/Users/helen/code/xiaomi/auto-evaluator/auto-evaluator.py", line 342, in run_evaluation
    retrieval_grade = grade_model_retrieval(gt_dataset, retrieved_docs, grade_prompt)
  File "/Users/helen/code/xiaomi/auto-evaluator/auto-evaluator.py", line 277, in grade_model_retrieval
    graded_outputs = eval_chain.evaluate(
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/langchain/evaluation/qa/eval_chain.py", line 60, in evaluate
    return self.apply(inputs)
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/langchain/chains/llm.py", line 118, in apply
    response = self.generate(input_list)
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/langchain/chains/llm.py", line 62, in generate
    return self.llm.generate_prompt(prompts, stop)
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/langchain/chat_models/base.py", line 82, in generate_prompt
    raise e
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/langchain/chat_models/base.py", line 79, in generate_prompt
    output = self.generate(prompt_messages, stop=stop)
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/langchain/chat_models/base.py", line 54, in generate
    results = [self._generate(m, stop=stop) for m in messages]
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/langchain/chat_models/base.py", line 54, in <listcomp>
    results = [self._generate(m, stop=stop) for m in messages]
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/langchain/chat_models/openai.py", line 266, in _generate
    response = self.completion_with_retry(messages=message_dicts, **params)
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/langchain/chat_models/openai.py", line 228, in completion_with_retry
    return _completion_with_retry(**kwargs)
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/tenacity/__init__.py", line 314, in iter
    return fut.result()
  File "/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/langchain/chat_models/openai.py", line 226, in _completion_with_retry
    return self.client.create(**kwargs)
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/openai/api_resources/chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/openai/api_requestor.py", line 226, in request
    resp, got_stream = self._interpret_response(result, stream)
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/openai/api_requestor.py", line 619, in _interpret_response
    self._interpret_response_line(
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/openai/api_requestor.py", line 679, in _interpret_response_line
    raise self.handle_error_response(
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 4280 tokens. Please reduce the length of the messages.
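
The limit isn't the PDF's size on disk: the grading prompt (question, answer, and retrieved chunks) exceeds the 4,097-token context window of the chat model doing the grading. One possible workaround, not something the app does today, is to clip each retrieved chunk before it goes into the grading prompt; truncate_to_tokens below is a hypothetical helper:

import tiktoken


def truncate_to_tokens(text: str, max_tokens: int = 1500,
                       model: str = "gpt-3.5-turbo") -> str:
    # Hypothetical helper: clip text so it fits within max_tokens for the
    # given model, so the full grading request stays under the context window.
    enc = tiktoken.encoding_for_model(model)
    tokens = enc.encode(text)
    return enc.decode(tokens[:max_tokens])

# e.g. clip each retrieved chunk before it is formatted into grade_prompt:
# retrieved_docs = [truncate_to_tokens(d) for d in retrieved_docs]

A model with a larger context window, or fewer / smaller retrieved chunks, would also keep the request under the limit.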

I've passed in the openai_api_key, but I'm still getting this validation error:

ValidationError: 1 validation error for ChatOpenAI
__root__
  Did not find openai_api_key, please add an environment variable `OPENAI_API_KEY` which contains it, or pass `openai_api_key` as a named parameter. (type=value_error)

Traceback:
  File "/Users/yujiedou/Desktop/auto-evaluator-main/venv/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
  File "/Users/yujiedou/Desktop/auto-evaluator-main/auto-evaluator.py", line 302, in <module>
    eval_set = generate_eval(text, num_eval_questions, 3000)
  File "/Users/yujiedou/Desktop/auto-evaluator-main/venv/lib/python3.9/site-packages/streamlit/runtime/caching/cache_utils.py", line 194, in wrapper
    return cached_func(*args, **kwargs)
  File "/Users/yujiedou/Desktop/auto-evaluator-main/venv/lib/python3.9/site-packages/streamlit/runtime/caching/cache_utils.py", line 223, in __call__
    return self._get_or_create_cached_value(args, kwargs)
  File "/Users/yujiedou/Desktop/auto-evaluator-main/venv/lib/python3.9/site-packages/streamlit/runtime/caching/cache_utils.py", line 248, in _get_or_create_cached_value
    return self._handle_cache_miss(cache, value_key, func_args, func_kwargs)
  File "/Users/yujiedou/Desktop/auto-evaluator-main/venv/lib/python3.9/site-packages/streamlit/runtime/caching/cache_utils.py", line 302, in _handle_cache_miss
    computed_value = self._info.func(*func_args, **func_kwargs)
  File "/Users/yujiedou/Desktop/auto-evaluator-main/auto-evaluator.py", line 81, in generate_eval
    chain = QAGenerationChain.from_llm(ChatOpenAI(temperature=0, openai_api_key=OPENAI_API_KEY))
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
