
auto-evaluator's People

Contributors

eltociear, mziru, pgouy, prem2012, rlancemartin


auto-evaluator's Issues

Improve speed of SVM retriever index creation

Currently, the SVM retriever has a create_index function that embeds a list of chunks:

def create_index(contexts: List[str], embeddings: Embeddings) -> np.ndarray:
    return np.array([embeddings.embed_query(split) for split in contexts])

See code:
https://github.com/hwchase17/langchain/blob/f1d15b4a75238eb56e386608b7f631a853a803e0/langchain/retrievers/svm.py#L17

This is quite slow (> 10 min on ~2k splits).

Tried multiprocessing, but it appears to hang with:

Process SpawnPoolWorker-74:
Traceback (most recent call last):
  File "/Users/31treehaus/opt/anaconda3/envs/ml/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/31treehaus/opt/anaconda3/envs/ml/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/31treehaus/opt/anaconda3/envs/ml/lib/python3.9/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/Users/31treehaus/opt/anaconda3/envs/ml/lib/python3.9/multiprocessing/queues.py", line 367, in get
    return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'embed_query' on <module '__main__' (built-in)>

Tried a DataFrame apply to parallelize, but saw no latency improvement.

Need to carve out more time to look into this. A couple of possible directions are sketched below.
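Two candidates worth trying (a minimal sketch, not the library's implementation; it assumes embeddings follows LangChain's Embeddings interface): batch the chunks through embed_documents instead of calling embed_query once per chunk, or fan out with a thread pool, which sidesteps the pickling failure that spawn-based multiprocessing hits, since the embedding calls are network-bound rather than CPU-bound.

from concurrent.futures import ThreadPoolExecutor
from typing import List

import numpy as np
from langchain.embeddings.base import Embeddings

def create_index_batched(contexts: List[str], embeddings: Embeddings) -> np.ndarray:
    # embed_documents sends the chunks in batched API calls rather than one
    # request per chunk, which is usually much faster against hosted models.
    return np.array(embeddings.embed_documents(contexts))

def create_index_threaded(contexts: List[str], embeddings: Embeddings, max_workers: int = 8) -> np.ndarray:
    # Threads avoid the pickling error above (the embedding client cannot be
    # serialized to spawned worker processes) and are enough here because the
    # work is network-bound, not CPU-bound.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return np.array(list(pool.map(embeddings.embed_query, contexts)))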

Is there a limit on the size of the PDF? A 200 KB PDF reported an error

Traceback (most recent call last):
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
  File "/Users/helen/code/xiaomi/auto-evaluator/auto-evaluator.py", line 430, in <module>
    graded_answers, graded_retrieval, latency, predictions = run_evaluation(qa_chain, retriever, eval_set, grade_prompt,
  File "/Users/helen/code/xiaomi/auto-evaluator/auto-evaluator.py", line 342, in run_evaluation
    retrieval_grade = grade_model_retrieval(gt_dataset, retrieved_docs, grade_prompt)
  File "/Users/helen/code/xiaomi/auto-evaluator/auto-evaluator.py", line 277, in grade_model_retrieval
    graded_outputs = eval_chain.evaluate(
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/langchain/evaluation/qa/eval_chain.py", line 60, in evaluate
    return self.apply(inputs)
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/langchain/chains/llm.py", line 118, in apply
    response = self.generate(input_list)
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/langchain/chains/llm.py", line 62, in generate
    return self.llm.generate_prompt(prompts, stop)
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/langchain/chat_models/base.py", line 82, in generate_prompt
    raise e
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/langchain/chat_models/base.py", line 79, in generate_prompt
    output = self.generate(prompt_messages, stop=stop)
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/langchain/chat_models/base.py", line 54, in generate
    results = [self._generate(m, stop=stop) for m in messages]
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/langchain/chat_models/base.py", line 54, in <listcomp>
    results = [self._generate(m, stop=stop) for m in messages]
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/langchain/chat_models/openai.py", line 266, in _generate
    response = self.completion_with_retry(messages=message_dicts, **params)
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/langchain/chat_models/openai.py", line 228, in completion_with_retry
    return _completion_with_retry(**kwargs)
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/tenacity/__init__.py", line 314, in iter
    return fut.result()
  File "/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/langchain/chat_models/openai.py", line 226, in _completion_with_retry
    return self.client.create(**kwargs)
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/openai/api_resources/chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/openai/api_requestor.py", line 226, in request
    resp, got_stream = self._interpret_response(result, stream)
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/openai/api_requestor.py", line 619, in _interpret_response
    self._interpret_response_line(
  File "/Users/helen/code/xiaomi/auto-evaluator/venv/lib/python3.9/site-packages/openai/api_requestor.py", line 679, in _interpret_response_line
    raise self.handle_error_response(
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 4280 tokens. Please reduce the length of the messages.
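The error is not about the PDF's file size: the grading prompt (retrieved context plus the question/answer pair) exceeds the model's 4097-token context window, so large chunks or many retrieved chunks can trigger it even for a small PDF. Reducing the chunk size or the number of retrieved chunks in the UI avoids it; alternatively, the retrieved context could be trimmed before grading. A minimal sketch of such a trim, assuming tiktoken is available (the helper name and token budget are illustrative, not part of auto-evaluator):

import tiktoken

def truncate_to_token_limit(text: str, max_tokens: int = 3000, model: str = "gpt-3.5-turbo") -> str:
    # Illustrative helper: drop tokens from the end of the text so the grading
    # prompt stays under the model's context limit.
    enc = tiktoken.encoding_for_model(model)
    tokens = enc.encode(text)
    return text if len(tokens) <= max_tokens else enc.decode(tokens[:max_tokens])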

I've passed in the openai_api_key, but I'm still getting this validation error

ValidationError: 1 validation error for ChatOpenAI __root__ Did not find openai_api_key, please add an environment variable OPENAI_API_KEY which contains it, or pass openai_api_key as a named parameter. (type=value_error)

Traceback:
  File "/Users/yujiedou/Desktop/auto-evaluator-main/venv/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
  File "/Users/yujiedou/Desktop/auto-evaluator-main/auto-evaluator.py", line 302, in <module>
    eval_set = generate_eval(text, num_eval_questions, 3000)
  File "/Users/yujiedou/Desktop/auto-evaluator-main/venv/lib/python3.9/site-packages/streamlit/runtime/caching/cache_utils.py", line 194, in wrapper
    return cached_func(*args, **kwargs)
  File "/Users/yujiedou/Desktop/auto-evaluator-main/venv/lib/python3.9/site-packages/streamlit/runtime/caching/cache_utils.py", line 223, in __call__
    return self._get_or_create_cached_value(args, kwargs)
  File "/Users/yujiedou/Desktop/auto-evaluator-main/venv/lib/python3.9/site-packages/streamlit/runtime/caching/cache_utils.py", line 248, in _get_or_create_cached_value
    return self._handle_cache_miss(cache, value_key, func_args, func_kwargs)
  File "/Users/yujiedou/Desktop/auto-evaluator-main/venv/lib/python3.9/site-packages/streamlit/runtime/caching/cache_utils.py", line 302, in _handle_cache_miss
    computed_value = self._info.func(*func_args, **func_kwargs)
  File "/Users/yujiedou/Desktop/auto-evaluator-main/auto-evaluator.py", line 81, in generate_eval
    chain = QAGenerationChain.from_llm(ChatOpenAI(temperature=0, openai_api_key=OPENAI_API_KEY))
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
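The pydantic validation runs when ChatOpenAI is constructed, so the key has to be a non-empty string at that point; an empty or unset sidebar field produces exactly this error even though a key was "passed in". A minimal sketch of supplying the key, with illustrative variable values, using only the APIs shown in the traceback:

import os
from langchain.chat_models import ChatOpenAI
from langchain.chains import QAGenerationChain

OPENAI_API_KEY = "sk-..."  # illustrative: whatever was entered in the sidebar; must be non-empty
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY  # also covers any call that reads the environment variable
chain = QAGenerationChain.from_llm(ChatOpenAI(temperature=0, openai_api_key=OPENAI_API_KEY))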
