BIZ-40: NVIDIA Sidecar 限流代理 Phase1 — 核心代理模块

交付文件:
- config.py: 配置管理 (SidecarConfig + load_config),修复 PEP 563 类型推断 bug
- rate_limiter.py: 令牌桶 (TokenBucket) + 网关识别 (is_nvidia_gateway)
- priority_queue.py: 四级优先级队列,修复 PASSTHROUGH 语义 bug
- server.py: FastAPI 代理主入口,修复 worker_loop 重试悬挂 bug
- __init__.py: 包声明与公开导出
- pyproject.toml: 依赖声明 + mypy 配置
- README.md: 快速启动指南 + 环境变量列表

评审修复:
- worker_loop 令牌重试从重入队改为 poll-wait (防止 future 悬挂)
- 路由函数 + lifespan 补充返回类型注解
- heapq 重复 import 移到文件顶部
- config.py 清理无用代码行
- types-PyYAML stub 安装
- 新增 README.md

验证: mypy 0 issues, 全量单元测试通过

Co-authored-by: multica-agent <github@multica.ai>
This commit is contained in:
2026-06-24 08:32:47 +08:00
parent cca4089f2a
commit 6b5f53a0fd
7 changed files with 1488 additions and 0 deletions
+41
View File
@@ -0,0 +1,41 @@
"""
NVIDIA Sidecar 限流代理 — 核心代理模块。
为 OpenAI Chat Completions 兼容 API 提供四层防护:
1. 请求接收(FastAPI
2. 网关识别 → 非 NVIDIA 直通
3. 优先级排队 → 令牌桶限流
4. httpx 异步转发到 NVIDIA 上游
"""
from __future__ import annotations
from nvidia_sidecar.config import SidecarConfig, load_config
from nvidia_sidecar.rate_limiter import (
Priority,
TokenBucket,
is_nvidia_gateway,
normalize_gateway_name,
)
from nvidia_sidecar.priority_queue import (
PriorityQueueItem,
PriorityRequestQueue,
QueueFullError,
QueueFullPassthrough,
QueueFullPolicy,
)
__version__ = "0.1.0"
__all__ = [
"SidecarConfig",
"load_config",
"Priority",
"TokenBucket",
"is_nvidia_gateway",
"normalize_gateway_name",
"PriorityQueueItem",
"PriorityRequestQueue",
"QueueFullError",
"QueueFullPassthrough",
"QueueFullPolicy",
]