feat: Primary-Wait backoff queuing — BIZ-60
When all primary backends are in cooldown, wait and retry the primary pool before falling through to fallback/emergency. This reduces unnecessary spend on paid fallback providers during temporary 429 storms.

Config:
- primary_wait_ms (default 5000, env SIDECAR_PRIMARY_WAIT_MS)
- primary_wait_max_retries (default 6, env SIDECAR_PRIMARY_WAIT_MAX_RETRIES)

Implementation:
- config.py: 2 new config fields + env var loading
- router.py: pick_primary_backend(), primary-pool-only selection
- proxy.py: primary-wait loop between standard retries and emergency

Expected win: the 17% error rate under high concurrency is expected to drop, and the emergency passthrough count should fall, because requests wait for the NVIDIA pool to recover instead of immediately routing to the SiliconFlow fallback.
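The primary-wait loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in proxy.py: it assumes an async proxy, assumes that pick_primary_backend() takes no arguments and returns None while every primary backend is cooling, and uses a hypothetical wait_for_primary helper with an injectable sleep so the loop can be exercised without real delays.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def wait_for_primary(
    pick_primary_backend: Callable[[], Optional[str]],
    primary_wait_ms: int = 5000,
    primary_wait_max_retries: int = 6,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> Optional[str]:
    """Hypothetical helper: poll the primary pool before giving up.

    Returns a backend name as soon as one leaves cooldown, or None once
    all retries are used (the caller would then go to fallback/emergency).
    """
    for _ in range(primary_wait_max_retries):
        # Wait one interval, then look at the primary pool only.
        await sleep(primary_wait_ms / 1000.0)
        backend = pick_primary_backend()
        if backend is not None:
            return backend
    return None
```

Under these assumptions the defaults add at most 6 x 5000 ms = 30 s of waiting per request, which stays below the 120 s default_request_timeout_seconds.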
@@ -73,6 +73,10 @@ class Config:
     # Stats
     stats_refresh_interval_seconds: float = 30.0

+    # Primary-Wait: when all primary backends are cooling, wait before fallback
+    primary_wait_ms: int = 5000
+    primary_wait_max_retries: int = 6
+
     # Request timeout
     default_request_timeout_seconds: int = 120

@@ -114,6 +118,16 @@ class Config:
         # Logging
         c.log_level = os.getenv("LOG_LEVEL", c.log_level).upper()

+        # Primary-Wait
+        c.primary_wait_ms = int(
+            os.getenv("SIDECAR_PRIMARY_WAIT_MS", str(c.primary_wait_ms))
+        )
+        c.primary_wait_max_retries = int(
+            os.getenv(
+                "SIDECAR_PRIMARY_WAIT_MAX_RETRIES", str(c.primary_wait_max_retries)
+            )
+        )
+
         # Database
         c.db_path = os.getenv(
             "SIDECAR_DB_PATH",
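
The loader above passes the environment value straight to int(), which raises ValueError if SIDECAR_PRIMARY_WAIT_MS or SIDECAR_PRIMARY_WAIT_MAX_RETRIES is set to a non-numeric string. One possible hardening is sketched below. The env_int helper is hypothetical and not part of this commit, and falling back to the default and clamping negatives to a minimum are assumed policies, not the project's.

```python
import os


def env_int(name: str, default: int, minimum: int = 0) -> int:
    """Hypothetical helper: read an integer env var with a safe fallback."""
    raw = os.getenv(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        value = int(raw)
    except ValueError:
        # Assumed policy: ignore malformed values and keep the default.
        return default
    # Assumed policy: never return less than `minimum`.
    return max(value, minimum)
```

With such a helper, the two assignments could read `c.primary_wait_ms = env_int("SIDECAR_PRIMARY_WAIT_MS", c.primary_wait_ms)` and likewise for the retry count.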