Topic: evaluation-framework
Repositories tagged with evaluation-framework on GitHub.
evaluation-framework,The Core Reinforcement Learning (CoRL) library is intended to enable scalable deep reinforcement learning experimentation in a manner extensible to new simulations and new ways for the learning agents to interact with them. The hope is that this makes RL research easier by removing lock-in to particular simulations. The work is released under the following APRS approval: Initial release of CoRL - Part #1 - Approved on 2022-05-2024 12:08:51 - PA Approval # [AFRL-2022-2455]. Documentation: https://act3-ace.github.io/CoRL/
Organization: act3-ace
Home Page: https://www.act3-ace.com/
evaluation-framework,Entity linking evaluation and analysis tool
Organization: ad-freiburg
Home Page: https://elevant.cs.uni-freiburg.de/
evaluation-framework,Evaluation suite for large-scale language models.
Organization: ai21labs
evaluation-framework,This repository contains the implementation of evaluation metrics for recommendation systems. We have compared similarity, candidate generation, rating, ranking metrics performance on 5 different datasets - MovieLens 100k, MovieLens 1m, MovieLens 10m, Amazon Electronics Dataset and Amazon Movies and TV Dataset.
User: aryan-jadon
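As a generic illustration of the kind of ranking metrics such a repository compares (the function names here are hypothetical, not the repository's API), precision@k and binary-relevance NDCG can be sketched in plain Python:

```python
import math

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that appear in the relevant set."""
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in relevant) / k

def ndcg_at_k(recommended, relevant, k):
    """Normalized discounted cumulative gain with binary relevance labels."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(recommended[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0
```

NDCG rewards placing relevant items near the top of the list, while precision@k ignores position within the cut-off; datasets like MovieLens are typically evaluated with both.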
evaluation-framework,OD-test: A Less Biased Evaluation of Out-of-Distribution (Outlier) Detectors (PyTorch)
User: ashafaei
evaluation-framework,Python SDK for running evaluations on LLM generated responses
Organization: athina-ai
Home Page: https://docs.athina.ai
evaluation-framework,Expressive is a cross-platform expression parsing and evaluation framework. The cross-platform nature is achieved through compiling for .NET Standard so it will run on practically any platform.
User: bijington
evaluation-framework,This repository allows you to evaluate a trained computer vision model and get general information and evaluation metrics with little configuration.
Organization: bmw-innovationlab
evaluation-framework,BIRL: Benchmark on Image Registration methods with Landmark validations
User: borda
Home Page: http://borda.github.io/BIRL
evaluation-framework,Industrial-level evaluation benchmarks for coding LLMs across the full life cycle of AI-native software development. An enterprise-grade evaluation suite for code LLMs, with more benchmarks being opened up continuously.
Organization: codefuse-ai
evaluation-framework,The LLM Evaluation Framework
Organization: confident-ai
Home Page: https://docs.confident-ai.com/
evaluation-framework,Framework for Interactive Evaluation of Recommender Systems
User: cowjen01
evaluation-framework,🌾 Universal, customizable and deployable fine-grained evaluation for text generation.
User: davidheineman
Home Page: https://thresh.tools
evaluation-framework,A research library for automating experiments on Deep Graph Networks
User: diningphil
Home Page: https://pydgn.readthedocs.io
evaluation-framework,A suite of experiments for evaluating open-source binary taint trackers.
Organization: dynamic-rabbits
evaluation-framework,A framework for few-shot evaluation of language models.
Organization: eleutherai
Home Page: https://www.eleuther.ai
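The core idea behind few-shot evaluation is to prepend k labeled examples to the query before scoring the model's continuation. A minimal sketch of that prompt construction (generic, not the harness's actual API) might look like:

```python
def build_few_shot_prompt(train_examples, query, k=3):
    """Build a k-shot prompt: k labeled Q/A examples followed by the unanswered query."""
    shots = train_examples[:k]
    lines = [f"Q: {q}\nA: {a}" for q, a in shots]
    lines.append(f"Q: {query}\nA:")  # model is asked to complete this answer
    return "\n\n".join(lines)
```

A real harness additionally handles shot sampling, answer normalization, and per-task scoring, but the prompt shape above is the common core of few-shot evaluation.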
evaluation-framework,Test and evaluate LLMs, prompts and other configuration, across all the scenarios that matter for your application
Organization: empirical-run
Home Page: https://docs.empirical.run
evaluation-framework,Scalable Meta-Evaluation of LLMs as Evaluators
Organization: gair-nlp
evaluation-framework,LiDAR SLAM comparison and evaluation framework
User: haeyeoni
evaluation-framework,Official repository of RankEval: An Evaluation and Analysis Framework for Learning-to-Rank Solutions.
Organization: hpclab
Home Page: http://rankeval.isti.cnr.it/
evaluation-framework,LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally, alongside the recently released LLM data processing library datatrove and LLM training library nanotron.
Organization: huggingface
evaluation-framework,Evaluation framework for oncology foundation models (FMs)
Organization: kaiko-ai
Home Page: https://kaiko-ai.github.io/eva/
evaluation-framework,Python client for Kolena's machine learning testing platform
Organization: kolenaio
Home Page: https://docs.kolena.io
evaluation-framework,An Evaluation Framework for Temporal Information Extraction Systems
Organization: liaad
evaluation-framework,An easy-to-use tool for evaluating tracking algorithms on many different benchmarks like OTB and Temple-Color
User: lukaswals
evaluation-framework,Evaluate your biometric verification models literally in seconds.
User: ma7555
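Biometric verification models are typically scored by sweeping a threshold over genuine and impostor similarity scores. As a generic sketch (not this repository's API), the false-accept and false-reject rates at one threshold can be computed like this:

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """False-accept and false-reject rates of a verifier at a given score threshold.

    genuine_scores: similarities for same-identity pairs (should be accepted).
    impostor_scores: similarities for different-identity pairs (should be rejected).
    """
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr
```

Sweeping the threshold and finding where FAR equals FRR gives the equal error rate (EER), a standard single-number summary for verification systems.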
evaluation-framework,This is the repository of our article published in RecSys 2019 "Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches" and of several follow-up studies.
User: mauriziofd
evaluation-framework,Moonshot - A simple and modular tool to evaluate and red-team any LLM application.
User: moonshot-admin
evaluation-framework,ETUDE (Evaluation Tool for Unstructured Data and Extractions) is a Python-based tool that provides consistent evaluation options across a range of annotation schemata and corpus formats
Organization: musc-tbic
evaluation-framework,Multilingual Large Language Models Evaluation Benchmark
User: nlp-uoregon
evaluation-framework,A high-level Python framework to evaluate the skill of geospatial datasets by comparing candidate maps to benchmark maps, producing agreement maps and metrics.
Organization: noaa-owp
Home Page: https://noaa-owp.github.io/gval/
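The essence of candidate-versus-benchmark comparison is a cell-by-cell cross-tabulation of the two maps. A minimal two-class sketch (generic, not gval's actual API) on flattened 0/1 grids:

```python
def two_class_agreement(candidate, benchmark):
    """Confusion counts and accuracy for flat 0/1 candidate vs. benchmark maps."""
    pairs = list(zip(candidate, benchmark))
    tp = sum(1 for c, b in pairs if c == 1 and b == 1)  # both maps flag the cell
    tn = sum(1 for c, b in pairs if c == 0 and b == 0)  # both maps clear the cell
    fp = sum(1 for c, b in pairs if c == 1 and b == 0)  # candidate over-predicts
    fn = sum(1 for c, b in pairs if c == 0 and b == 1)  # candidate misses
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": (tp + tn) / len(pairs)}
```

Metrics such as critical success index or Cohen's kappa are then derived from these same four counts.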
evaluation-framework,The implementation of the paper "Evaluating Coherence in Dialogue Systems using Entailment"
User: nouhadziri
Home Page: https://arxiv.org/abs/1904.03371
evaluation-framework,The official implementation of the paper "To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now". This work introduces one fast and effective attack method to evaluate the harmful-content generation ability of safety-driven unlearned diffusion models.
Organization: optml-group
evaluation-framework,Vectory provides a collection of tools to track and compare embedding versions.
Organization: pentoai
evaluation-framework,Power Flows DMN - Powerful decisions and rules engine
Organization: powerflows
evaluation-framework,Test your prompts, models, and RAGs. Catch regressions and improve prompt quality. LLM evals for OpenAI, Azure, Anthropic, Gemini, Mistral, Llama, Bedrock, Ollama, and other local & private models with CI/CD integration.
Organization: promptfoo
Home Page: https://www.promptfoo.dev/
evaluation-framework,Open-Source Evaluation for GenAI Application Pipelines
Organization: relari-ai
Home Page: https://docs.relari.ai/
evaluation-framework,Implementation of common image evaluation metrics by Sayed Nadim (sayednadim.github.io). The repo is built on full-reference image quality metrics such as L1, L2, PSNR, SSIM, and LPIPS, and feature-level quality metrics such as FID and IS. It can be used for evaluating image denoising, colorization, inpainting, deraining, dehazing, etc., where we have access to ground truth.
User: sayednadim
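PSNR, the simplest of these full-reference metrics, is just a log-scaled inverse of mean squared error against the ground truth. A self-contained sketch on flat pixel lists (for illustration only; the repository operates on real image arrays):

```python
import math

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio between two equally sized images (flat pixel lists)."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return float("inf")  # identical images: infinite PSNR by convention
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Higher is better; unlike SSIM or LPIPS, PSNR compares raw pixel values and ignores perceptual structure, which is why the repository reports several metrics side by side.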
evaluation-framework,Simulator for training and evaluation of Recommender Systems
Organization: sb-ai-lab
Home Page: https://sb-ai-lab.github.io/Sim4Rec/
evaluation-framework,Python-based tools for pre-, post-processing, validating, and curating spike sorting datasets.
Organization: spikeinterface
Home Page: https://spikeinterface.readthedocs.io/en/latest/
evaluation-framework,Evaluation Framework for Dependency Analysis (EFDA)
Organization: srcclr
evaluation-framework,DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.
Organization: symflower
evaluation-framework,Train, evaluate, and optimize implicit feedback-based recommender systems.
User: tohtsky
Home Page: https://irspack.readthedocs.io/
evaluation-framework,Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.
Organization: tonicai
Home Page: https://docs.tonic.ai/validate/
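One family of RAG metrics checks whether the generated answer is grounded in the retrieved context. As a crude token-overlap proxy for that idea (a generic illustration, not this library's implementation), one can measure the share of answer tokens never seen in the context:

```python
def unsupported_token_ratio(answer, context):
    """Rough faithfulness proxy: share of answer tokens not found in the retrieved context."""
    context_tokens = set(context.lower().split())
    answer_tokens = answer.lower().split()
    return sum(1 for t in answer_tokens if t not in context_tokens) / len(answer_tokens)
```

Production RAG evaluators usually replace this lexical overlap with an LLM judge or entailment model, since paraphrased but faithful answers would score poorly under pure token matching.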
evaluation-framework,Optical Flow Dataset and Benchmark for Visual Crowd Analysis
User: tsenst
evaluation-framework,quica is a tool to run inter-coder agreement pipelines in an easy and effective way. Multiple measures are run and results are collected in a single table that can be easily exported to LaTeX.
User: vinid
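Cohen's kappa is the classic inter-coder agreement measure such tools compute: observed agreement corrected for the agreement two coders would reach by chance. A plain-Python sketch (generic, not quica's API):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two coders labeling the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # expected agreement if both coders labeled independently at their marginal rates
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: a single category used throughout
    return (observed - expected) / (1.0 - expected)
```

Kappa of 1.0 means perfect agreement, 0.0 means chance-level agreement; pipelines like quica report it alongside measures such as Krippendorff's alpha in one table.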
evaluation-framework,This is a machine learning framework that enables developers to iterate fast over different ML architecture designs.
User: vishal-keshav
evaluation-framework,Framework to evaluate Trajectory Classification Algorithms
Organization: yupidevs
evaluation-framework,AI Data Management & Evaluation Platform
Organization: zeno-ml
Home Page: https://zenoml.com