Compare commits

2 Commits

Author SHA1 Message Date
vincent 0894a86af8 ADR-006 v2.1: final revision, NVIDIA provider keys, reply to 徐聪
v2.1 changes from 2nd-round review:
+ Emergency channel RPM: max(1, max_rpm * 0.1)
+ Queue 503: add Retry-After: 30 header
+ sidecar_backup_success Prometheus metric
+ Startup crypto.py key validation on boot
+ SQLite size limits: 100MB practical, 500MB WAL
+ RPM flow: per-request counting, not token-based
+ SSE streaming: TTFT for avg_latency_ms
+ Merge proxy/retry.py into core/cooldown.py

Added sidecar-v2-nvidia-providers.yaml (11 keys)

Co-authored-by: multica-agent <github@multica.ai>
2026-06-25 15:19:21 +08:00
vincent 82edded30c ADR-006 v2.0: Sidecar V2 architecture revision based on review feedback
Incorporated feedback from 4 reviewers:
- 徐聪: AES key management, emergency channel, concurrency control, DDL indexes
- 陆怀瑾: P0 phase, schedule buffer, deployment topology, V1 compat checklist
- 严维序: SQLite backup, monitoring, cooldown persistence, port plan, rollback
- 沈路明: queue design, health check, per-model RPM decision, key validation, dashboard panels

Key additions:
+ Queue flow control design (FIFO + priority, capacity 500, REJECT overflow)
+ Provider health check (active probe + passive stats hybrid)
+ Per-model RPM decision (Provider-level V2, Model-level V3)
+ Key validation on add (test call with error feedback)
+ AES key management (SIDECAR_ENCRYPTION_KEY env var, backup SOP)
+ Emergency channel (10% RPM during full cooldown)
+ SQLite backup strategy (cron .backup, 7-day retention)
+ SQLite monitoring Prometheus metrics (db_size, wal_size, integrity)
+ Full DDL with indexes (ON CONFLICT, BEGIN IMMEDIATE patterns)
+ Dashboard panel list (5 panels: status, trends, history)
+ V1 compatibility checklist (13 items)
+ V1->V2 migration SOP with rollback plan
+ Deployment topology (systemd + Docker, port plan, firewall)
+ Log aggregation policy (logrotate: 10MB/30days)
+ Schedule revised: 71h/12days (added P0 + buffer)

Co-authored-by: multica-agent <github@multica.ai>
2026-06-25 14:52:39 +08:00
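
Reviewer sketch: the v2.0 commit describes the emergency channel as "10% RPM during full cooldown", and v2.1 tightens it to `max(1, max_rpm * 0.1)`. A minimal Python illustration of that formula follows. The function name `emergency_rpm` and the choice to round down to a whole number of requests are assumptions; the commits do not state the rounding rule.

```python
def emergency_rpm(max_rpm: int) -> int:
    """Emergency-channel budget: max(1, max_rpm * 0.1).

    Rounding down to an integer is an assumption for illustration;
    the lower bound of 1 keeps a small provider from being cut to zero.
    """
    return max(1, int(max_rpm * 0.1))


if __name__ == "__main__":
    # With the 40 RPM providers listed in the YAML file, the budget is 4.
    print(emergency_rpm(40))
```

The lower bound matters only for providers with `max_rpm` under 10, where 10% would otherwise round down to zero and the emergency channel could never send a request.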
8 changed files with 1579 additions and 0 deletions
File diff suppressed because it is too large
+58
@@ -0,0 +1,58 @@
sequenceDiagram
participant OC as OpenClaw
participant GW as API Gateway
participant LB as Load Balancer
participant QM as Queue Manager
participant RL as Rate Limiter
participant P as Provider
participant CD as Cooldown Detector
participant ST as Statistics Engine
OC->>GW: POST /v1/chat/completions
GW->>LB: Route to target pool
Note over LB: Weighted RR, refreshed every 5-10s<br/>weight=(max_rpm-current_rpm)/max_rpm
LB->>RL: BEGIN IMMEDIATE transaction, check RPM + reserve
alt Insufficient RPM
RL->>QM: Enqueue and wait, timeout 30s
QM-->>RL: Token available
end
RL-->>LB: Forwarding allowed
LB->>P: Forward request
P-->>LB: Response
alt 200 OK
LB->>ST: INSERT ON CONFLICT, record usage_logs
LB-->>GW: Normal response
else 429 Too Many Requests
LB->>CD: Report 429
CD->>P: Move to cooldown pool, cooldown_until=now+30s×2^n
LB->>LB: Reselect Provider B
alt Provider B healthy
LB->>P: Forward to Provider B
P-->>LB: 200 OK
end
alt Entire main pool in cooldown
Note over LB: Degrade to Fallback pool<br/>Check Providers about to recover<br/>wait if under 10s remain
alt Fallback available
LB->>P: Forward to Fallback Provider
P-->>LB: 200 OK + degraded flag
else Fallback also fully in cooldown
LB->>P: Emergency channel, 1 Provider at 10% RPM
alt Emergency channel succeeds
P-->>LB: 200 OK
else
LB-->>OC: 503 Service Unavailable
OC->>OC: OpenClaw's own fallback
end
end
end
end
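
Reviewer sketch: the sequence diagram gives two formulas, the load-balancer weight `(max_rpm-current_rpm)/max_rpm` and the cooldown deadline `now+30s×2^n`. The Python below restates both so they can be checked by hand. It assumes `n` counts prior consecutive 429 responses starting at 0, and that the weight is clamped at 0 when a provider is over its limit; neither detail is stated in the diagram, and all function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

BASE_COOLDOWN_SECONDS = 30  # the "30s" base from the diagram


def cooldown_seconds(n: int) -> int:
    """Exponential backoff: 30s * 2^n (n assumed to start at 0)."""
    return BASE_COOLDOWN_SECONDS * 2 ** n


def cooldown_until(now: datetime, n: int) -> datetime:
    """Deadline after which the provider may leave the cooldown pool."""
    return now + timedelta(seconds=cooldown_seconds(n))


def weight(max_rpm: int, current_rpm: int) -> float:
    """Weighted round-robin weight: remaining share of the RPM budget.

    Clamping to 0.0 for an over-limit or zero-limit provider is an
    assumption, added so the weight never goes negative.
    """
    if max_rpm <= 0:
        return 0.0
    return max(0.0, (max_rpm - current_rpm) / max_rpm)


if __name__ == "__main__":
    now = datetime(2026, 6, 25, tzinfo=timezone.utc)
    print(cooldown_until(now, 2))  # 30s * 4 = 120s after `now`
    print(weight(40, 10))          # 30 of 40 requests still available
```

Under these assumptions a provider at 10 of 40 RPM gets weight 0.75, and a third consecutive 429 (`n = 2`) yields a 120-second cooldown.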
Binary file not shown (152 KiB).

+71
@@ -0,0 +1,71 @@
erDiagram
providers ||--o{ provider_usage_logs : has
providers ||--o{ cooldown_events : triggers
providers ||--o| provider_health : monitors
providers {
string id PK
string name
string api_key
string endpoint_url
string model_prefix
string pool
string status
string source
int rpm_limit
int tpm_limit
float weight
float cost_per_1k
string cooldown_until
string metadata
}
provider_usage_logs {
string id PK
string provider_id FK
string model
int prompt_tokens
int completion_tokens
int total_tokens
float cost
int request_count
int error_count
int avg_latency_ms
string hour_bucket
}
cooldown_events {
string id PK
string provider_id FK
int consecutive_count
int cooldown_seconds
string response_summary
string started_at
string ended_at
}
provider_health {
string provider_id PK
string state
int last_latency_ms
int last_status_code
float success_rate_5m
int consecutive_failures
}
daily_stats {
string id PK
string date
string pool
int total_requests
int total_errors
int total_tokens
float total_cost
int unique_providers
}
system_config {
string key PK
string value
string description
}
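
Reviewer sketch: the commit message mentions "ON CONFLICT, BEGIN IMMEDIATE patterns" and the ER diagram shows `provider_usage_logs` aggregated by `hour_bucket`. The Python below shows one way those two patterns can combine using the standard `sqlite3` module. It covers only a subset of the diagram's columns, and the `UNIQUE (provider_id, model, hour_bucket)` constraint is an assumption; the actual DDL is in the suppressed diff and is not visible here.

```python
import sqlite3
import uuid

# Subset of the ER diagram's columns; the UNIQUE constraint is assumed.
DDL = """
CREATE TABLE IF NOT EXISTS provider_usage_logs (
    id            TEXT PRIMARY KEY,
    provider_id   TEXT NOT NULL,
    model         TEXT NOT NULL,
    total_tokens  INTEGER NOT NULL DEFAULT 0,
    request_count INTEGER NOT NULL DEFAULT 0,
    hour_bucket   TEXT NOT NULL,
    UNIQUE (provider_id, model, hour_bucket)
);
"""

UPSERT = """
INSERT INTO provider_usage_logs
    (id, provider_id, model, total_tokens, request_count, hour_bucket)
VALUES (?, ?, ?, ?, 1, ?)
ON CONFLICT (provider_id, model, hour_bucket) DO UPDATE SET
    total_tokens  = total_tokens + excluded.total_tokens,
    request_count = request_count + 1;
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    # isolation_level=None puts the connection in autocommit mode so
    # that BEGIN IMMEDIATE / COMMIT can be issued explicitly.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.executescript(DDL)
    return conn


def record_usage(conn, provider_id, model, total_tokens, hour_bucket):
    """Add one request to the hourly bucket inside a write transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        conn.execute(
            UPSERT,
            (str(uuid.uuid4()), provider_id, model, total_tokens, hour_bucket),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

`BEGIN IMMEDIATE` acquires the write lock at the start of the transaction, so concurrent writers contend on `BEGIN` instead of in the middle of a transaction.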
Binary file not shown (137 KiB).

@@ -0,0 +1,121 @@
# NVIDIA Provider Keys Reference for Sidecar V2
# =============================================
# ⚠️ SECURITY: This file contains sensitive API key material.
# In Sidecar V2 production deployment, API keys are stored as
# AES-256-GCM ciphertext in SQLite (providers.api_key column).
# The plaintext keys below are for V2 initial provisioning only.
#
# Usage: Import into Sidecar V2 via WebUI Admin or POST /api/v2/providers
# After import, this file should be stored in a secure location
# (Bitwarden / password manager) and NOT kept in plaintext on disk.
#
# Created: 2026-06-25 | By: 梁思筑 (architect)
# Total providers: 11 | Pool: main | RPM each: 40 | Total RPM capacity: 440
providers:
- account: bizwings
email: vincent@bizwingsinc.com
api_key: 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
endpoint_url: https://integrate.api.nvidia.com/v1
model_prefix: "nvidia/"
pool: main
rpm_limit: 40
notes: "Primary account"
- account: "98053"
email: 98053@qq.com
api_key: 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
endpoint_url: https://integrate.api.nvidia.com/v1
model_prefix: "nvidia/"
pool: main
rpm_limit: 40
notes: ""
- account: liuweicheng84
email: liuweicheng84@gmail.com
api_key: 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
endpoint_url: https://integrate.api.nvidia.com/v1
model_prefix: "nvidia/"
pool: main
rpm_limit: 40
notes: ""
- account: vx18088980513
email: vx18088980513@qq.com
api_key: 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
endpoint_url: https://integrate.api.nvidia.com/v1
model_prefix: "nvidia/"
pool: main
rpm_limit: 40
notes: ""
- account: "64391942"
email: 64391942@qq.com
api_key: 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
endpoint_url: https://integrate.api.nvidia.com/v1
model_prefix: "nvidia/"
pool: main
rpm_limit: 40
notes: ""
- account: cgtest1
email: cgtest1@bizwingsinc.com
api_key: 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
endpoint_url: https://integrate.api.nvidia.com/v1
model_prefix: "nvidia/"
pool: main
rpm_limit: 40
notes: "Test account 1"
- account: cgtest2
email: cgtest2@bizwingsinc.com
api_key: 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
endpoint_url: https://integrate.api.nvidia.com/v1
model_prefix: "nvidia/"
pool: main
rpm_limit: 40
notes: "Test account 2"
- account: "15876517651"
email: 1248106918@qq.com
api_key: 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
endpoint_url: https://integrate.api.nvidia.com/v1
model_prefix: "nvidia/"
pool: main
rpm_limit: 40
notes: ""
- account: "19584586741"
email: 414133763@qq.com
api_key: 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
endpoint_url: https://integrate.api.nvidia.com/v1
model_prefix: "nvidia/"
pool: main
rpm_limit: 40
notes: ""
- account: "18874954146"
email: 350894172@qq.com
api_key: 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
endpoint_url: https://integrate.api.nvidia.com/v1
model_prefix: "nvidia/"
pool: main
rpm_limit: 40
notes: ""
- account: "2405483110"
email: 2405483110@qq.com
api_key: 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
endpoint_url: https://integrate.api.nvidia.com/v1
model_prefix: "nvidia/"
pool: main
rpm_limit: 40
notes: ""
# Aggregated stats
summary:
  total_providers: 11
  total_rpm_capacity: 440
  pools:
    main: 11
    fallback: 0
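
Reviewer sketch: the header of this file says keys are stored as AES-256-GCM ciphertext, and the commits mention a `SIDECAR_ENCRYPTION_KEY` environment variable plus a startup key validation in `crypto.py`. The Python below illustrates what such a boot-time check might look like. It assumes the variable holds a base64-encoded 32-byte key; the encoding is not stated in the source, and the function name `load_encryption_key` is hypothetical.

```python
import base64
import binascii
import os

ENV_VAR = "SIDECAR_ENCRYPTION_KEY"  # name taken from the commit message
AES_256_KEY_BYTES = 32              # AES-256 uses a 256-bit key


def load_encryption_key(environ=None) -> bytes:
    """Fail fast at boot if the key is missing or malformed.

    Assumes base64 encoding of the raw key (not confirmed by the source).
    """
    env = os.environ if environ is None else environ
    raw = env.get(ENV_VAR)
    if not raw:
        raise RuntimeError(f"{ENV_VAR} is not set")
    try:
        key = base64.b64decode(raw, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise RuntimeError(f"{ENV_VAR} is not valid base64") from exc
    if len(key) != AES_256_KEY_BYTES:
        raise RuntimeError(
            f"{ENV_VAR} must decode to {AES_256_KEY_BYTES} bytes, "
            f"got {len(key)}"
        )
    return key
```

Validating at boot means a misconfigured key surfaces as a startup failure, not as a decryption error on the first provider request.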
+58
@@ -0,0 +1,58 @@
flowchart TB
subgraph OC["OpenClaw Gateway"]
OC_SCHED["OpenClaw Scheduler"]
OC_FB["OpenClaw Fallback<br/>Legacy config path"]
end
subgraph SIDECAR["Sidecar V2 systemd/Docker"]
direction TB
subgraph ENTRY["Entry Layer"]
GW["API Gateway :9190<br/>FastAPI + route matching"]
end
subgraph CORE["Core Scheduling Layer"]
LB["Load Balancer<br/>Weighted RR, refreshed every 5-10s"]
QM["Queue Manager<br/>FIFO + priority<br/>capacity 500 + overflow policy"]
end
subgraph POOLS["Provider Pool Layer"]
MP["Main Pool"]
FP["Fallback Pool"]
CP["Cooldown Pool"]
end
subgraph FLOW["Flow Control Layer"]
RL["Rate Limiter<br/>Per-Provider Token Bucket"]
CD["Cooldown Detector<br/>429 detection + exponential backoff<br/>+ emergency channel 10% RPM"]
end
subgraph STATS["Storage and Statistics Layer"]
MT["Metrics :9191<br/>Prometheus"]
ST["Statistics Engine<br/>Tokens/cost/call volume"]
DB[("SQLite WAL<br/>sidecar_v2.db<br/>+ cron backup")]
end
subgraph WEBUI["WebUI Layer :9190"]
UI["Dashboard<br/>SSE real-time push"]
AP["Admin API<br/>Provider CRUD<br/>Bearer Token auth"]
end
end
OC_SCHED --> GW
GW --> LB
LB --> QM
QM --> RL
RL --> MP
RL --> FP
MP -.->|"429 triggers cooldown"| CP
MP -->|"all in cooldown"| FP
FP -->|"all in cooldown"| OC_FB
CP -.->|"recovered after cooldown"| MP
RL --> CD
CD -.->|"emergency channel 10% RPM"| MP
LB --> MT
MT --> ST
ST --> DB
DB --> UI
AP --> DB
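
Reviewer sketch: the Queue Manager node is described as "FIFO + priority, capacity 500" with a REJECT overflow policy, and v2.1 adds a `Retry-After: 30` header to the queue 503. The Python below shows one way to get that behavior with `heapq`. The class names, the convention that a lower number means higher priority, and attaching the HTTP details to the exception are all assumptions for illustration.

```python
import heapq
import itertools


class QueueFull(Exception):
    """Raised on overflow; maps to the 503 + Retry-After from v2.1."""
    status_code = 503
    headers = {"Retry-After": "30"}


class BoundedPriorityQueue:
    """Priority queue that stays FIFO within a priority level.

    Lower `priority` values are served first (an assumed convention).
    """

    def __init__(self, capacity: int = 500):
        self.capacity = capacity
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves FIFO

    def put(self, item, priority: int = 0) -> None:
        if len(self._heap) >= self.capacity:
            # REJECT overflow policy: refuse instead of evicting.
            raise QueueFull("queue is at capacity")
        heapq.heappush(self._heap, (priority, next(self._seq), item))

    def get(self):
        """Pop the highest-priority, oldest item."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

The monotonically increasing sequence number is what keeps equal-priority requests in arrival order, since `heapq` alone makes no ordering guarantee between equal keys.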
Binary file not shown (143 KiB).