3 Commits

Author SHA1 Message Date
vincent 8531a3b595 feat: dashboard UX optimization + real-time backend stats + health probe fix + pool shuffle
- dashboard.html: major UX overhaul (+657/-308 lines)
- server.py: /api/admin/backends now returns real-time RPM and model_count
- pool_manager.py: random.shuffle backends for load distribution
- config.py: health probe endpoint /v1/models → /models
- docker-compose.yml: add SIDECAR_PRIMARY_WAIT_MAX_RETRIES=6

BIZ-52 post-review optimizations
2026-07-03 16:32:42 +08:00
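The pool_manager.py change can be illustrated with a minimal sketch. It assumes each pool holds a plain list of backend names. `pick_order` and the backend names are hypothetical, and the real code may shuffle in place or only once at pool load time. Shuffling a copy for each request spreads first-choice traffic evenly, so the first configured backend no longer receives every request.

```python
import random
from collections import Counter


def pick_order(backends, rng=random):
    """Return a shuffled copy so each request tries backends in a random order.

    Hypothetical helper: the real pool_manager.py may shuffle differently.
    """
    order = list(backends)  # copy, so the configured order is left untouched
    rng.shuffle(order)
    return order


if __name__ == "__main__":
    backends = ["backend-a", "backend-b", "backend-c"]  # hypothetical names
    rng = random.Random(42)  # seeded only to make the demo repeatable
    first_choice = Counter(pick_order(backends, rng)[0] for _ in range(3000))
    # Each backend should be picked first roughly 1000 times out of 3000.
    print(first_choice)
```

Compared with strict round-robin, a per-request shuffle needs no shared cursor between workers. The trade-off is that the spread is only even on average.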
vincent 18dfb2901b fix: add Primary-Wait Prometheus counters + conservative defaults — BIZ-60 review
P0 changes per 4-reviewer consensus (严维序/陆怀瑾/沈路明/梁思筑):

1. Prometheus metrics counters (proxy.py + server.py):
   - sidecar_primary_wait_enter_total: requests entering Primary-Wait
   - sidecar_primary_wait_recovery_total: successful primary recoveries
   - sidecar_primary_wait_exhausted_total: wait exhausted → emergency

2. Conservative default (config.py):
   - primary_wait_max_retries: 6 → 3 (15s total wait, safe start)
   - Observe recovery rate before increasing to 6
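The Primary-Wait loop and its three counters might look roughly like this. The sketch assumes a fixed 5 s interval between retries, which is one way to get 15 s from 3 retries. `primary_wait`, `is_primary_available`, and the plain dict standing in for the Prometheus counters are hypothetical. The sleep function is injectable so the logic can be exercised without real waiting.

```python
import time
from typing import Callable, Dict

# Plain-dict stand-in for the three Prometheus counters named in the commit.
COUNTERS: Dict[str, int] = {
    "sidecar_primary_wait_enter_total": 0,
    "sidecar_primary_wait_recovery_total": 0,
    "sidecar_primary_wait_exhausted_total": 0,
}


def primary_wait(
    is_primary_available: Callable[[], bool],
    max_retries: int = 3,       # conservative default from this commit
    interval_s: float = 5.0,    # assumed: 3 x 5 s = 15 s total wait
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Wait for the primary pool to recover; return "primary" or "emergency"."""
    COUNTERS["sidecar_primary_wait_enter_total"] += 1
    for _ in range(max_retries):
        sleep(interval_s)
        if is_primary_available():
            COUNTERS["sidecar_primary_wait_recovery_total"] += 1
            return "primary"
    # All retries used up: hand the request to the emergency passthrough.
    COUNTERS["sidecar_primary_wait_exhausted_total"] += 1
    return "emergency"
```

Under the same 5 s assumption, raising `max_retries` to 6 doubles the worst-case wait to 30 s. That is why recovery rate should be checked before making the change.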

Counters form a complete funnel: enter - recovery = exhausted (once no
requests are still waiting), enabling Grafana monitoring and ROI
validation for COO/PM/Ops.
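The funnel arithmetic can be checked with a small sketch. It assumes the three counter values have already been read. `PrimaryWaitSnapshot` and its methods are hypothetical. In Grafana, the same ratio would typically be built from PromQL `increase()` over a time window. The identity enter - recovery = exhausted holds only after every wait has finished. Any remainder counts requests still inside the wait loop.

```python
from dataclasses import dataclass


@dataclass
class PrimaryWaitSnapshot:
    """Hypothetical snapshot of the three Primary-Wait counter values."""

    enter: int
    recovery: int
    exhausted: int

    def in_flight(self) -> int:
        # Entered, but neither recovered nor exhausted yet.
        return self.enter - self.recovery - self.exhausted

    def recovery_rate(self) -> float:
        # Share of finished waits that ended with the primary recovering.
        finished = self.recovery + self.exhausted
        return self.recovery / finished if finished else 0.0
```

A low recovery rate means most waits end in the emergency path anyway. In that case, a longer wait (6 retries) would add latency without paying off.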
2026-06-25 22:48:09 +08:00
vincent 2d95ae50a5 feat: Sidecar V2 — multi-pool provider proxy with 429 cooldown
- proxy.py: Fix route path duplication (v1/v1 → v1) when upstream
  base URL already includes /v1 prefix
- proxy.py: Fix _emergency_count global variable for metrics tracking
- server.py: Add logging.basicConfig(level=logging.INFO) so that
  structlog INFO-level logs are visible
- Full multi-pool routing: primary → fallback → emergency passthrough
- Per-backend rate limiting with RPM-based token bucket
- 429 cooldown mechanism with automatic recovery
- Dashboard with SSE real-time monitoring
- Admin API for backend/pool/config management
- SQLite-backed persistence with encrypted API key storage
- Docker compose deployment
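The route-path fix (v1/v1 → v1) can be sketched as a join that drops a leading `/v1` from the route when the upstream base URL already ends in `/v1`. `join_upstream` is a hypothetical helper. It handles only the `/v1` prefix mentioned in the commit, and the actual proxy.py fix may be structured differently.

```python
def join_upstream(base_url: str, route: str) -> str:
    """Join an upstream base URL and a request route without doubling /v1.

    Hypothetical helper covering only the /v1 case from the commit message.
    """
    base = base_url.rstrip("/")
    route = "/" + route.lstrip("/")
    # Base already ends in /v1: strip a leading /v1 segment from the route.
    if base.endswith("/v1") and (route == "/v1" or route.startswith("/v1/")):
        route = route[len("/v1"):]
    return base + route


# Base with /v1 and route with /v1 no longer produce .../v1/v1/...
print(join_upstream("https://api.example.com/v1", "/v1/chat/completions"))
```

The segment check (`/v1/` or exactly `/v1`) avoids stripping from routes that merely start with the same characters, such as `/v10`.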

Deployed by opengineer 严维序 as BIZ-50 Step 4
2026-06-25 21:20:32 +08:00
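The routing, rate-limiting, and cooldown bullets of the Sidecar V2 commit can be sketched together in Python. This is a minimal illustration under stated assumptions, not the actual proxy.py. `Backend`, `route`, the 60 s default cooldown, and the refill rule (rpm tokens per 60 s, with capacity capped at one minute's worth) are all hypothetical. The clock is injectable so the behavior can be tested without sleeping.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Backend:
    """Hypothetical per-backend state: an RPM token bucket plus a 429 cooldown."""

    name: str
    rpm: int
    cooldown_s: float = 60.0  # assumed cooldown length after a 429
    clock: Callable[[], float] = time.monotonic
    tokens: float = field(init=False)
    last_refill: float = field(init=False)
    cooldown_until: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.tokens = float(self.rpm)  # start with a full bucket
        self.last_refill = self.clock()

    def _refill(self) -> None:
        now = self.clock()
        # Refill at rpm tokens per 60 s, capped at one minute's worth.
        elapsed = now - self.last_refill
        self.tokens = min(float(self.rpm), self.tokens + elapsed * self.rpm / 60.0)
        self.last_refill = now

    def try_acquire(self) -> bool:
        if self.clock() < self.cooldown_until:
            return False  # still cooling down after a 429
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def mark_429(self) -> None:
        # Recovery is automatic: the backend is eligible again after the window.
        self.cooldown_until = self.clock() + self.cooldown_s


def route(pools: List[List[Backend]]) -> Optional[Backend]:
    """Try pools in order (e.g. primary, then fallback).

    None stands for the emergency passthrough step.
    """
    for pool in pools:
        for backend in pool:
            if backend.try_acquire():
                return backend
    return None
```

A `None` result marks the point where the emergency passthrough would take over. The commit does not say how the real proxy orders backends inside a pool.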