Projects with this topic
Robot Framework test harness for LLM evaluation: deterministic grading, containerized execution, multi-model comparison, safety testing, a Dash control panel for multi-session test runs, SQLite-backed test history, and native CI/CD integration.