sidecar-v2

3 Commits 1 Branch 0 Tags

Author	SHA1	Message	Date
vincent	376ce97d91	feat: Primary-Wait backoff queuing — BIZ-60 When all primary backends are in cooldown, wait and retry the primary pool before falling through to fallback/emergency. This reduces unnecessary spend on paid fallback providers during temporary 429 storms. Config: - primary_wait_ms (default 5000, env SIDECAR_PRIMARY_WAIT_MS) - primary_wait_max_retries (default 6, env SIDECAR_PRIMARY_WAIT_MAX_RETRIES) Implementation: - config.py: 2 new config fields + env var loading - router.py: pick_primary_backend() — primary-pool-only selection - proxy.py: primary-wait loop between standard retries and emergency Expected win: 17% error rate during high concurrency drops, emergency passthrough count falls as requests wait for NVIDIA pool recovery instead of immediately routing to SiliconFlow fallback.	2026-06-25 22:22:02 +08:00
vincent	4bdf6ddf32	docs: add deployment guide, production config, and changelog	2026-06-25 21:22:48 +08:00
vincent	2d95ae50a5	feat: Sidecar V2 — multi-pool provider proxy with 429 cooldown - proxy.py: Fix route path duplication (v1/v1 → v1) when upstream base URL already includes /v1 prefix - proxy.py: Fix _emergency_count global variable for metrics tracking - server.py: Add logging.basicConfig(level=logging.INFO) for structlog INFO-level log visibility - Full multi-pool routing: primary → fallback → emergency passthrough - Per-backend rate limiting with RPM-based token bucket - 429 cooldown mechanism with automatic recovery - Dashboard with SSE real-time monitoring - Admin API for backend/pool/config management - SQLite-backed persistence with encrypted API key storage - Docker compose deployment Deployed by opengineer 严维序 as BIZ-50 Step 4	2026-06-25 21:20:32 +08:00

Author

SHA1

Message

Date

vincent

376ce97d91

feat: Primary-Wait backoff queuing — BIZ-60

When all primary backends are in cooldown, wait and retry the primary pool
before falling through to fallback/emergency. This reduces unnecessary
spend on paid fallback providers during temporary 429 storms.

Config:
- primary_wait_ms (default 5000, env SIDECAR_PRIMARY_WAIT_MS)
- primary_wait_max_retries (default 6, env SIDECAR_PRIMARY_WAIT_MAX_RETRIES)

Implementation:
- config.py: 2 new config fields + env var loading
- router.py: pick_primary_backend() — primary-pool-only selection
- proxy.py: primary-wait loop between standard retries and emergency

Expected win: 17% error rate during high concurrency drops, emergency
passthrough count falls as requests wait for NVIDIA pool recovery
instead of immediately routing to SiliconFlow fallback.

2026-06-25 22:22:02 +08:00

vincent

4bdf6ddf32

docs: add deployment guide, production config, and changelog

2026-06-25 21:22:48 +08:00

vincent

2d95ae50a5

feat: Sidecar V2 — multi-pool provider proxy with 429 cooldown

- proxy.py: Fix route path duplication (v1/v1 → v1) when upstream
  base URL already includes /v1 prefix
- proxy.py: Fix _emergency_count global variable for metrics tracking
- server.py: Add logging.basicConfig(level=logging.INFO) for structlog
  INFO-level log visibility
- Full multi-pool routing: primary → fallback → emergency passthrough
- Per-backend rate limiting with RPM-based token bucket
- 429 cooldown mechanism with automatic recovery
- Dashboard with SSE real-time monitoring
- Admin API for backend/pool/config management
- SQLite-backed persistence with encrypted API key storage
- Docker compose deployment

Deployed by opengineer 严维序 as BIZ-50 Step 4

2026-06-25 21:20:32 +08:00