Bug Report: Benchmark tab renders empty in eval viewer
Summary
The Benchmark tab in the skill-creator eval viewer renders empty when `benchmark.json` uses the `configurations[]` array format (the format Claude agents commonly produce). The viewer (`viewer.html`) expects a different nested format with `run_summary` and `runs[]`.
Root Cause
`generate_review.py` embeds `benchmark.json` as-is without normalizing the schema. `viewer.html` (line ~1122+) expects:
- `data.run_summary` — a dict keyed by configuration name, each value containing `{mean, stddev}` stat objects
- `data.runs[]` — an array of objects with a `configuration` string and a nested `result.pass_rate`
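For reference, a minimal sketch of the shape `viewer.html` expects, inferred from the field descriptions above (the configuration name and values are hypothetical):

```json
{
  "run_summary": {
    "with_skill (v2)": {
      "pass_rate": {"mean": 1.0, "stddev": 0.0}
    }
  },
  "runs": [
    {
      "configuration": "with_skill (v2)",
      "result": {"pass_rate": 1.0}
    }
  ]
}
```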
But Claude agents producing `benchmark.json` commonly output:
```json
{
  "configurations": [
    {
      "name": "with_skill (v2)",
      "pass_rate": {"mean": 1.0, "stddev": 0.0},
      "tokens": {"mean": 27973, "stddev": 4157},
      "time_seconds": {"mean": 56.5, "stddev": 12.2}
    }
  ],
  "delta_v2_vs_baseline": {
    "pass_rate": "+11.1% (100% vs 89%)",
    "tokens": "+68%"
  },
  "analyst_observations": ["v2 skill maintains 100% pass rate..."]
}
```
Fix
Add a `normalize_benchmark()` function in `generate_review.py` that converts between formats:
- `configurations[]` array → `run_summary` dict (keyed by `name`)
- Auto-generate `runs[]` from `configurations` if missing
- `config` → `configuration` field rename in existing runs
- Flat stat values → `{mean, stddev}` objects
- `analyst_observations` accepted as alias for `notes`
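A minimal sketch of what `normalize_benchmark()` could look like, covering the conversions listed above. The helper name `_as_stat` and the choice of `0.0` as the default `stddev` for flat values are assumptions, not part of the existing code:

```python
def _as_stat(value):
    """Wrap a flat numeric value in a {mean, stddev} object.

    A stddev of 0.0 for flat values is an assumption of this sketch.
    """
    if isinstance(value, dict):
        return value
    return {"mean": value, "stddev": 0.0}


def normalize_benchmark(data: dict) -> dict:
    """Convert the flat configurations[] format into the nested
    run_summary/runs[] shape viewer.html expects. Input already in
    the expected shape passes through unchanged."""
    out = dict(data)
    configs = out.get("configurations", [])

    # configurations[] array -> run_summary dict keyed by name
    if configs and "run_summary" not in out:
        out["run_summary"] = {
            c["name"]: {k: _as_stat(v) for k, v in c.items() if k != "name"}
            for c in configs
        }

    # Auto-generate runs[] from configurations if missing
    if configs and "runs" not in out:
        out["runs"] = [
            {"configuration": c["name"],
             "result": {"pass_rate": _as_stat(c.get("pass_rate"))}}
            for c in configs
        ]

    # config -> configuration field rename in existing runs
    for run in out.get("runs", []):
        if "config" in run and "configuration" not in run:
            run["configuration"] = run.pop("config")

    # analyst_observations accepted as alias for notes
    if "analyst_observations" in out and "notes" not in out:
        out["notes"] = out["analyst_observations"]

    return out
```

Calling this once in `generate_review.py` before embedding the data would let the viewer accept either schema without changes to `viewer.html`.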
Reproduction
- Run the skill-creator eval workflow
- Aggregate produces `benchmark.json` with the `configurations[]` format
- Run `generate_review.py --benchmark benchmark.json`
- Benchmark tab is empty — no data rendered
Related
This is the companion issue to #604 (Formal Grades display broken). Both are schema mismatch bugs between `generate_review.py` output and `viewer.html` expectations.
Environment
- skill-creator plugin (commit `bd041495bd2a`)
- Python 3.12, macOS