BIZ-46 Phase3: 7 follow-up items implemented

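1. Architecture decoupling: SidecarContext + FastAPI Depends injection
   - Add context.py: the SidecarContext dataclass consolidates all global state
   - server.py: remove module-level globals; lifespan creates ctx → app.state.sidecar
   - webui.py: remove the reverse import of server; use Depends(get_context) instead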

2. Prometheus label cardinality control: model_id → provider
   - upstream_latency_seconds / upstream_errors_total labels reduced to provider
   - Model-level detail is kept in the structlog JSON logs

3. Shared SSE snapshot cache
   - Shared snapshot cache with a 1s TTL + double-checked locking
   - Multiple clients no longer rebuild the snapshot independently

4. Deployment support
   - Dockerfile (python:3.12-slim, non-root user, HEALTHCHECK)
   - systemd service (security hardening, resource limits)
   - .env.example (complete list of environment variables)

5. Readiness HTTP client reuse
   - check_upstream() receives the main http_client instead of creating a new client on every call

6. Retreat concurrency regression tests
   - All 5 test cases pass (deadlock detection + state transitions + concurrency safety)

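7. Dashboard UX improvements
   - Queue bar chart: 300ms smooth animation
   - Semi-transparent overlay on SSE disconnect (5s)
   - Queue chart title shows the total number of queued requests
   - Configuration is synced on page load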

Verification: mypy strict passes (0 errors), pytest 5/5 passed, server imports cleanly (13 routes)

Co-authored-by: multica-agent <github@multica.ai>
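
Item 3 of the message describes a shared snapshot cache with a 1s TTL and double-checked locking. The following is a minimal sketch of that pattern using only the standard library; it is not the code from this commit. The `SnapshotCache` class, its `get` method, and the `build_snapshot` callback are hypothetical names chosen for illustration.

```python
from __future__ import annotations

import asyncio
import time
from typing import Any, Awaitable, Callable

SNAPSHOT_CACHE_TTL = 1.0  # seconds, mirroring the 1s TTL named in the commit message


class SnapshotCache:
    """Hypothetical helper: share one snapshot among all SSE clients for a short TTL."""

    def __init__(self, ttl: float = SNAPSHOT_CACHE_TTL) -> None:
        self._ttl = ttl
        self._entry: tuple[dict[str, Any], float] | None = None  # (data, timestamp)
        self._lock = asyncio.Lock()

    def _fresh(self) -> dict[str, Any] | None:
        """Return the cached data if it is still within the TTL, else None."""
        entry = self._entry
        if entry is not None and time.monotonic() - entry[1] < self._ttl:
            return entry[0]
        return None

    async def get(self, build: Callable[[], Awaitable[dict[str, Any]]]) -> dict[str, Any]:
        # First check, without the lock: the common fast path.
        cached = self._fresh()
        if cached is not None:
            return cached
        async with self._lock:
            # Second check, with the lock held: another client may have
            # rebuilt the snapshot while this one was waiting.
            cached = self._fresh()
            if cached is not None:
                return cached
            data = await build()
            self._entry = (data, time.monotonic())
            return data


async def demo() -> tuple[int, int]:
    builds = 0

    async def build_snapshot() -> dict[str, Any]:
        nonlocal builds
        builds += 1
        await asyncio.sleep(0.01)  # simulate the cost of assembling a snapshot
        return {"queued": 3}

    cache = SnapshotCache()
    # Ten concurrent "clients" ask for a snapshot at the same time.
    results = await asyncio.gather(*(cache.get(build_snapshot) for _ in range(10)))
    return builds, len(results)
```

Running `asyncio.run(demo())` serves ten concurrent callers from a single build, which is the "multiple clients no longer rebuild the snapshot" behaviour the message describes. The sketch uses `time.monotonic()` for the TTL so that wall-clock adjustments cannot extend or shorten the cache lifetime; the real implementation may differ.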
2026-06-24 22:26:35 +08:00
parent 8a12ff9693
commit b18d243ef2
12 changed files with 928 additions and 312 deletions
@@ -0,0 +1,75 @@
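"""
NVIDIA Sidecar: SidecarContext dependency-injection container (§BIZ-46 Phase3)
Consolidates all module-level global state into a single dataclass that is injected via FastAPI app.state,
removing the reverse import webui.py → server and improving testability and multi-instance scaling.
Design doc: docs/architecture/BIZ-46_Phase3_Architecture_Design.md §1
"""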
from __future__ import annotations
import asyncio
import time
from dataclasses import dataclass, field
from typing import TYPE_CHECKING, Any
import httpx
if TYPE_CHECKING:
    from nvidia_sidecar.config import SidecarConfig
    from nvidia_sidecar.rate_limiter import AdaptiveTokenBucket
    from nvidia_sidecar.priority_queue import PriorityRequestQueue
    from nvidia_sidecar.metrics import PrometheusMetrics
    from nvidia_sidecar.health import HealthService
@dataclass
class SidecarContext:
    """Global Sidecar runtime context: the single container for all core components.

    Injected into FastAPI via ``app.state.sidecar``; routes obtain it through
    ``Depends(get_context)``.
    """

    # ---- Core components ----
    config: SidecarConfig
    http_client: httpx.AsyncClient
    token_bucket: AdaptiveTokenBucket
    priority_queue: PriorityRequestQueue
    prometheus: PrometheusMetrics
    health: HealthService

    # ---- Runtime state ----
    pending_requests: dict[str, tuple["asyncio.Future[Any]", float]] = field(default_factory=dict)
    """Maps request_id → (response future, enqueued_at)."""
    stats: dict[str, int] = field(default_factory=lambda: {
        "total_requests": 0,
        "nvidia_requests": 0,
        "passthrough_requests": 0,
        "ratelimited_requests": 0,
        "queue_full_rejects": 0,
        "upstream_errors": 0,
        "start_time": 0,
    })
    stats_lock: asyncio.Lock = field(default_factory=asyncio.Lock)

    # ---- Caches ----
    snapshot_cache: tuple["dict[str, Any]", float] | None = None
    """Shared SSE snapshot cache: (data, timestamp)."""
    snapshot_cache_lock: asyncio.Lock = field(default_factory=asyncio.Lock)
    SNAPSHOT_CACHE_TTL: float = 1.0

    # ---- Convenience methods ----
    async def increment_stat(self, key: str, delta: int = 1) -> None:
        """Increment a stats counter under ``stats_lock`` (safe across coroutines, not OS threads)."""
        async with self.stats_lock:
            self.stats[key] = self.stats.get(key, 0) + delta

    @property
    def uptime_seconds(self) -> int:
        """Service uptime in seconds."""
        st = self.stats.get("start_time", 0)
        return int(time.time() - st) if st else 0
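
The class docstring says routes obtain the context through ``Depends(get_context)``, but the body of `get_context` is not part of this file. A minimal sketch of what such a dependency might look like follows, assuming it simply reads `request.app.state.sidecar`. The type hints are loosened to `Any` and the FastAPI objects are replaced with `SimpleNamespace` stand-ins so the sketch runs without third-party packages; the route shown in the comment is hypothetical.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any


def get_context(request: Any) -> Any:
    """Assumed dependency: return the context stored by the lifespan handler.

    In the real app `request` would be a fastapi.Request and the return type
    SidecarContext; `Any` keeps this sketch free of third-party imports.
    """
    return request.app.state.sidecar


# In a route this would be wired up roughly as (hypothetical, not from the diff):
#     @router.get("/stats")
#     async def stats(ctx: SidecarContext = Depends(get_context)): ...

# Stand-ins for FastAPI's app and request objects, as a unit test might build them.
fake_ctx = SimpleNamespace(stats={"total_requests": 0})
fake_app = SimpleNamespace(state=SimpleNamespace(sidecar=fake_ctx))
fake_request = SimpleNamespace(app=fake_app)

resolved = get_context(fake_request)
```

Because the dependency only touches `request.app.state`, a test can swap in any context object without importing `server`, which is the decoupling the module docstring aims for.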