Tag: evals | Mickel Tech

Mickel.tech · Independent practice

#evals

2 entries on this thread.

01 · [2026.05.15]
Evaluating Agents When the Demo Always Passes
Tags: Harness Engineering, Agent Engineering, evals, AI Agents
Checking an agent's final answer tells you almost nothing. You have to score the whole trajectory: tool choice, arguments, step count, cost, policy. How to build evals that catch real failures.

02 · [2025.12.08]
Announcing gmickel-bench: Real-World Evals
Tags: Agent Engineering, AI Agents, ai, benchmarks, evals, claude, gemini, codex
Most AI benchmarks tell you if a model can solve a LeetCode puzzle. They don't tell you if it can ship a product.

Independent practice of Gordon Mickel. Operating Principal, AI & Technology, at Growth Factors. A small number of select mandates each year.

Binningen, Switzerland · DE / EN

Recognitions

ITDR-listed Technical Expert (Switzerland)
OpenAI Red Team Network alumnus
SECA 2026 invited speaker
openEHR.ch Symposium speaker
Active across 10+ DACH BU portfolio companies
Founder, DocIQ (Swiss legal AI since 2017)
Founder, MergeFoundry, Inc.
Author, FlowNext (open source)

© 2026 Gordon Mickel · All rights reserved.