Files
EnterpriseArchitect/services/nvidia_sidecar/README.md
T

63 lines
1.9 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# NVIDIA Sidecar 限流代理
为 NVIDIA API 提供**优先级排队 + 令牌桶限流**的透明代理层。
## 快速启动
```bash
pip install .
nvidia-sidecar
```
监听 `127.0.0.1:9190`,代理到 NVIDIA API。
## 环境变量
| 变量 | 默认值 | 说明 |
|------|--------|------|
| `SIDECAR_HOST` | `127.0.0.1` | 监听地址 |
| `SIDECAR_PORT` | `9190` | 监听端口 |
| `SIDECAR_METRICS_PORT` | `9191` | Metrics 端口 |
| `SIDECAR_UPSTREAM` | `https://integrate.api.nvidia.com/v1` | 上游 API 地址 |
| `SIDECAR_API_KEY` | — | NVIDIA API Key(必填) |
| `SIDECAR_RATE_RPM` | `40` | 每分钟请求数限制 |
| `SIDECAR_BUCKET_CAPACITY` | `40` | 令牌桶容量 |
| `SIDECAR_TIMEOUT` | `6000` | 上游请求超时(秒) |
| `SIDECAR_QUEUE_MAX` | `500` | 队列最大长度 |
| `SIDECAR_LOW_TIMEOUT` | `2.0` | 低优先级令牌等待超时(秒) |
| `SIDECAR_FALLBACK_PASSTHROUGH` | `true` | 队列满时是否直通上游 |
| `SIDECAR_LOG_LEVEL` | `INFO` | 日志级别 |
## YAML 配置
```yaml
listen_port: 9292
rate_rpm: 60
upstream_api_key: "nvapi-xxx"
```
```bash
nvidia-sidecar --config /etc/nvidia-sidecar.yaml
```
## API 端点
| 路径 | 方法 | 说明 |
|------|------|------|
| `/v1/chat/completions` | POST | OpenAI Chat Completions 代理 |
| `/v1/completions` | POST | OpenAI Completions 代理(legacy |
| `/v1/embeddings` | POST | OpenAI Embeddings 代理 |
| `/v1/models` | GET | 模型列表代理 |
| `/health` | GET | 健康检查 |
| `/metrics` | GET | 指标查询 |
## 架构
```
请求 → 网关识别 → [NVIDIA: 优先级排队 → 令牌桶限流] → httpx → NVIDIA API
→ [非 NVIDIA: 直通] → httpx → 上游
```
- **四级优先级**: URGENT > HIGH > NORMAL > LOW(通过 `X-Priority` header 指定)
- **队列满策略**: PASSTHROUGH(直通)/ REJECT503/ DROP_LOWEST(丢弃最低优先级)
- **令牌桶**: 40 RPM,线程安全,支持阻塞/非阻塞消费