Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
AI observability platform for production LLM and agent systems.
The open source post-building layer for agents. Our environment data and evals power agent post-training (RL, SFT) and monitoring.
A powerful AI observability framework that provides comprehensive insights into agent interactions across platforms, enabling developers to monitor, analyze, and optimize AI-driven applications with minimal integration effort.
A comprehensive solution for monitoring your AI models in production
Open-source observability for your LLM application.
AgentTrace is an open-source, local-first step debugger for AI agents. It provides a Python SDK for tracing your agent runs and a web UI to inspect spans, tool calls, prompts, and responses as an interactive tree.
A Python package for tracking and analyzing LLM usage across different models and applications. It is primarily designed as a library for integration into the development process of LLM-based agentic workflow tooling, providing robust tracking capabilities.
A lightweight observability tool for visualizing and comparing RAG retrieval strategies. Features real-time embedding visualization and side-by-side performance metrics.
Privacy-safe observability for AI agents at runtime
OKI TRACE: Local LLM observability. See, step by step and layer by layer, what your AI thinks. Logit lens and attention visualization for Hugging Face models.
Comprehensive LLMOps reference index: observability platforms, inference cost intelligence, failure mode taxonomy, stack compatibility matrices, and regulatory compliance mapping for LLMs in production.
RAG Eval Observability is a production-ready, open-source platform for building, evaluating, and monitoring Retrieval-Augmented Generation (RAG) systems. It pairs a ChatGPT-style UI with a robust backend for document ingestion, multiple retrieval strategies, offline evaluation, and real-time observability, along with backend CI/CD deployed on Azure.
Reproducibility code for “Evaluating the Performance of Large Language Models in Taxonomic Classification of Questions in Verbal Protocols of Design” (AI EDAM submission; under review). [WIP]
AWS Bedrock Claude with Datadog LLM Observability agentless mode
Decision-level observability for LLM pipelines, making system behavior explainable even when no outputs exist.
Open-source AI agent evaluation workbench for benchmarking, tracing, optimization, and safety with human approval.
Python SDK for agent AI observability, monitoring, and evaluation. Features include AI agent, LLM, and tool tracing; debugging of multi-agent systems; self-hosted dashboards; and advanced analytics with timeline and execution-graph views.
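The tracing SDKs listed above generally share one core pattern: each agent run, LLM call, and tool invocation is recorded as a span with a parent pointer, so a dashboard can rebuild the run as a tree or execution graph. The sketch below illustrates that pattern in plain Python; the `Tracer` class and its `span` method are hypothetical names for illustration, not the actual API of any project on this page.

```python
import time
import uuid
from contextlib import contextmanager

class Tracer:
    """Minimal span tracer: records nested spans with parent links."""

    def __init__(self):
        self.spans = []   # flat list of span records, in start order
        self._stack = []  # currently open spans (innermost last)

    @contextmanager
    def span(self, name, **attrs):
        record = {
            "id": uuid.uuid4().hex,
            "name": name,
            "parent": self._stack[-1]["id"] if self._stack else None,
            "attrs": attrs,
            "start": time.time(),
        }
        self.spans.append(record)
        self._stack.append(record)
        try:
            yield record
        finally:
            record["end"] = time.time()
            self._stack.pop()

tracer = Tracer()

# A toy agent run: one root span with an LLM call and a tool call inside it.
with tracer.span("agent_run", task="answer question"):
    with tracer.span("llm_call", model="example-model"):
        pass  # the model request would happen here
    with tracer.span("tool_call", tool="search"):
        pass  # the tool invocation would happen here

# A UI reconstructs the tree by grouping spans on their parent id.
roots = [s for s in tracer.spans if s["parent"] is None]
print(len(tracer.spans), roots[0]["name"])  # 3 agent_run
```

Real SDKs add exporters (sending spans to a backend or dashboard) and capture prompts, responses, and token counts as span attributes, but the parent-linked span record is the common foundation.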