System status: online● ONLINESYSTEM_ID: MICKEL_TECH_V2.0
    XLSX
    Eval Details

    XLSX backend + agent tools

    Full-stack agent integration: Python service, TypeScript tools, prompt engineering, formula recalc.

    Methodology

    This evaluation tests cross-stack agent integration. The model must implement Python FastAPI routes for Excel manipulation, wire Convex TypeScript tools, and write agent prompts. Formula recalculation via LibreOffice is specified but often stubbed.

    Spec: Python FastAPI routes + Convex tool wiring + agent prompts + tests/evals for Excel manipulation.

    RESULTS BY MODEL

    GPT-5.2-codex medium

    Codex CLI

    83

    GPT-5.2 xhigh

    Codex CLI

    90

    GPT-5.2 medium

    Codex CLI

    76

    Opus 4.5 thinking

    Claude Code

    70

    GPT-5.1-codex-max medium

    Codex CLI

    70

    Gemini 3 Pro

    Gemini CLI

    56

    KEY TAKEAWAYS

    • GPT-5.2 xhigh dominates (90); medium (76), Claude/5.1 tied (70); Gemini struggles (56).
    • Formula recalculation common gap—models stub it rather than integrating LibreOffice.
    • Gemini regressed existing tools (dropped refetchParagraphs) showing change hygiene risks.
    Note

    Cross-stack complexity

    This eval touches Python (FastAPI, openpyxl, pandas), TypeScript (Convex actions, agent tools), and prompt engineering. Even with detailed instructions on Excel manipulation, none of the models fully integrated formula recalculation via LibreOffice as specified. The pattern: models implement the happy path but skip the hard system integration.