BIZ-40: NVIDIA Sidecar rate-limiting proxy, Phase 1: core proxy module
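Deliverables:
- config.py: configuration management (SidecarConfig + load_config); fixes a PEP 563 type-inference bug
- rate_limiter.py: token bucket (TokenBucket) + gateway detection (is_nvidia_gateway)
- priority_queue.py: four-level priority queue; fixes a PASSTHROUGH semantics bug
- server.py: FastAPI proxy entry point; fixes a worker_loop retry bug that left futures dangling
- __init__.py: package declaration and public exports
- pyproject.toml: dependency declarations + mypy configuration
- README.md: quick-start guide + list of environment variables

Review fixes:
- worker_loop token retry changed from re-enqueueing to poll-wait (prevents dangling futures)
- Added return type annotations to the route functions and lifespan
- Moved the duplicate heapq import to the top of the file
- Removed unused code lines from config.py
- Installed the types-PyYAML stubs
- Added README.md

Verification: mypy reports 0 issues; the full unit test suite passes.

Co-authored-by: multica-agent <github@multica.ai>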
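The poll-wait fix listed under the review fixes can be illustrated with a minimal sketch. It is not the code from server.py, which this view does not show. `SketchTokenBucket`, `try_acquire`, the `(payload, future)` queue item shape, and `poll_interval` are all assumptions made for illustration. The idea is that the worker keeps the dequeued item in hand and waits for a token, so the caller's future is always resolved by the same loop iteration that dequeued it.

```python
import asyncio
import time


class SketchTokenBucket:
    """Hypothetical stand-in for TokenBucket; the real API is not shown here."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self._tokens = capacity
        self._last = time.monotonic()

    def try_acquire(self) -> bool:
        """Refill based on elapsed time, then take one token if available."""
        now = time.monotonic()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


async def worker_loop(queue, bucket, poll_interval=0.01):
    """Assumed worker shape: each queue item is a (payload, future) pair."""
    while True:
        payload, future = await queue.get()
        # Poll-wait: hold on to the item until a token is free, instead of
        # pushing it back onto the queue (where its future could be orphaned).
        while not bucket.try_acquire():
            await asyncio.sleep(poll_interval)
        if not future.done():
            # Placeholder for the real upstream call.
            future.set_result(f"forwarded:{payload}")
        queue.task_done()


async def demo():
    queue = asyncio.Queue()
    bucket = SketchTokenBucket(rate=200.0, capacity=1.0)
    worker = asyncio.create_task(worker_loop(queue, bucket))
    loop = asyncio.get_running_loop()
    futures = []
    for i in range(3):
        fut = loop.create_future()
        await queue.put((i, fut))
        futures.append(fut)
    # Every future resolves even though only one token is available at a time.
    results = await asyncio.wait_for(asyncio.gather(*futures), timeout=5.0)
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The trade-off in this sketch is that a waiting worker blocks the items behind it until a token arrives. That is acceptable here because those items would be rate-limited by the same bucket anyway.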
@@ -0,0 +1,41 @@
"""
NVIDIA Sidecar rate-limiting proxy: core proxy module.

Provides four layers of protection for OpenAI Chat Completions-compatible APIs:
1. Request intake (FastAPI)
2. Gateway detection → non-NVIDIA traffic passes straight through
3. Priority queueing → token-bucket rate limiting
4. Asynchronous forwarding to the NVIDIA upstream via httpx
"""

from __future__ import annotations

from nvidia_sidecar.config import SidecarConfig, load_config
from nvidia_sidecar.rate_limiter import (
    Priority,
    TokenBucket,
    is_nvidia_gateway,
    normalize_gateway_name,
)
from nvidia_sidecar.priority_queue import (
    PriorityQueueItem,
    PriorityRequestQueue,
    QueueFullError,
    QueueFullPassthrough,
    QueueFullPolicy,
)

__version__ = "0.1.0"
__all__ = [
    "SidecarConfig",
    "load_config",
    "Priority",
    "TokenBucket",
    "is_nvidia_gateway",
    "normalize_gateway_name",
    "PriorityQueueItem",
    "PriorityRequestQueue",
    "QueueFullError",
    "QueueFullPassthrough",
    "QueueFullPolicy",
]
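The priority queueing layer named in the module docstring could look roughly like the sketch below. The commit mentions heapq, but priority_queue.py is not shown in this view, so every name here is hypothetical: `SketchPriority` and its four level names, `SketchItem`, `SketchPriorityQueue`, and `SketchQueueFull`. None of them should be read as the package's actual `Priority`, `PriorityQueueItem`, `PriorityRequestQueue`, or `QueueFullError`.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Any


class SketchPriority(IntEnum):
    """Hypothetical four levels; a lower value is served first."""
    CRITICAL = 0
    HIGH = 1
    NORMAL = 2
    LOW = 3


@dataclass(order=True)
class SketchItem:
    priority: int
    seq: int                              # tie-breaker: FIFO within a level
    payload: Any = field(compare=False)   # never compared by the heap


class SketchQueueFull(Exception):
    """Raised when the queue is at capacity (assumed 'reject' policy)."""


class SketchPriorityQueue:
    def __init__(self, max_size: int) -> None:
        self._heap = []
        self._counter = itertools.count()
        self._max_size = max_size

    def push(self, priority: SketchPriority, payload: Any) -> None:
        if len(self._heap) >= self._max_size:
            raise SketchQueueFull(f"queue is full ({self._max_size} items)")
        item = SketchItem(int(priority), next(self._counter), payload)
        heapq.heappush(self._heap, item)

    def pop(self) -> Any:
        """Return the payload with the highest priority, oldest first."""
        return heapq.heappop(self._heap).payload

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    q = SketchPriorityQueue(max_size=4)
    q.push(SketchPriority.LOW, "a")
    q.push(SketchPriority.CRITICAL, "b")
    q.push(SketchPriority.NORMAL, "c")
    q.push(SketchPriority.CRITICAL, "d")
    print([q.pop() for _ in range(len(q))])
```

The monotonically increasing `seq` field keeps ordering stable within one priority level and stops the heap from ever comparing payloads, which may not be orderable.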