ADR-006 v2.0: Sidecar V2 architecture revision based on review feedback

Incorporated feedback from 4 reviewers:
- 徐聪: AES key management, emergency channel, concurrency control, DDL indexes
- 陆怀瑾: P0 phase, schedule buffer, deployment topology, V1 compat checklist
- 严维序: SQLite backup, monitoring, cooldown persistence, port plan, rollback
- 沈路明: queue design, health check, per-model RPM decision, key validation, dashboard panels

Key additions:
+ Queue flow control design (FIFO + priority, capacity 500, REJECT overflow)
+ Provider health check (active probe + passive stats hybrid)
+ Per-model RPM decision (provider-level limits in V2, model-level limits deferred to V3)
+ Key validation on add (test call with error feedback)
+ AES key management (SIDECAR_ENCRYPTION_KEY env var, backup SOP)
+ Emergency channel (10% RPM during full cooldown)
+ SQLite backup strategy (cron .backup, 7-day retention)
+ SQLite monitoring Prometheus metrics (db_size, wal_size, integrity)
+ Full DDL with indexes (ON CONFLICT, BEGIN IMMEDIATE patterns)
+ Dashboard panel list (5 panels: status, trends, history)
+ V1 compatibility checklist (13 items)
+ V1->V2 migration SOP with rollback plan
+ Deployment topology (systemd + Docker, port plan, firewall)
+ Log aggregation policy (logrotate: 10MB/30days)
+ Schedule revised: 71h/12days (added P0 + buffer)
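
The queue flow control item above (FIFO + priority, capacity 500, REJECT overflow) could look roughly like the sketch below. This is an illustration under assumptions, not the ADR's implementation: the names `BoundedPriorityQueue` and `QueueFull` are hypothetical, and "lower number = higher priority" is an assumed convention the commit does not state.

```python
import heapq
import itertools


class QueueFull(Exception):
    """Raised when the queue is at capacity and the new request is rejected."""


class BoundedPriorityQueue:
    """Priority queue that is FIFO within a priority level and rejects overflow."""

    def __init__(self, capacity=500):
        self.capacity = capacity          # 500 is the capacity named in the commit
        self._heap = []
        self._seq = itertools.count()     # tie-breaker that preserves arrival order

    def put(self, item, priority=0):
        # REJECT overflow: refuse the new item instead of evicting an old one.
        if len(self._heap) >= self.capacity:
            raise QueueFull(f"queue is full ({self.capacity} items)")
        # Assumption: a lower number means a higher priority.
        heapq.heappush(self._heap, (priority, next(self._seq), item))

    def get(self):
        # Pops the highest-priority item; the oldest one wins among equals.
        _priority, _seq, item = heapq.heappop(self._heap)
        return item

    def __len__(self):
        return len(self._heap)
```

The sequence counter keeps the heap from ever comparing the queued items themselves, so any payload type can be stored.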

Co-authored-by: multica-agent <github@multica.ai>
2026-06-25 14:52:39 +08:00
parent 4fd89b038d
commit 82edded30c
7 changed files with 1429 additions and 0 deletions
@@ -0,0 +1,58 @@
sequenceDiagram
participant OC as OpenClaw
participant GW as API Gateway
participant LB as Load Balancer
participant QM as Queue Manager
participant RL as Rate Limiter
participant P as Provider
participant CD as Cooldown Detector
participant ST as Statistics Engine
OC->>GW: POST /v1/chat/completions
GW->>LB: Route to target pool
Note over LB: Weighted RR, refreshed every 5-10s<br/>weight=(max_rpm-current_rpm)/max_rpm
LB->>RL: BEGIN IMMEDIATE transaction, check RPM + reserve
alt RPM insufficient
RL->>QM: Enqueue and wait (30s timeout)
QM-->>RL: Token available
end
RL-->>LB: Forwarding allowed
LB->>P: Forward request
P-->>LB: Response
alt 200 OK
LB->>ST: INSERT ON CONFLICT into usage_logs
LB-->>GW: Normal response
else 429 Too Many Requests
LB->>CD: Report 429
CD->>P: Move to cooldown pool, cooldown_until=now+30s×2^n
LB->>LB: Reselect Provider B
alt Provider B healthy
LB->>P: Forward to Provider B
P-->>LB: 200 OK
end
alt Entire main pool in cooldown
Note over LB: Degrade to Fallback pool<br/>Check Providers about to recover<br/>Wait if <10s remaining
alt Fallback available
LB->>P: Forward to Fallback Provider
P-->>LB: 200 OK + degraded flag
else Fallback also fully in cooldown
LB->>P: Emergency channel, 1 Provider at 10% RPM
alt Emergency channel succeeds
P-->>LB: 200 OK
else
LB-->>OC: 503 Service Unavailable
OC->>OC: OpenClaw's own fallback
end
end
end
end
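
The two formulas in the sequence diagram, `weight=(max_rpm-current_rpm)/max_rpm` and `cooldown_until=now+30s×2^n`, can be worked through with a small sketch. The function names are hypothetical, and the diagram does not say whether `n` starts at 0 or 1 or whether the backoff is capped, so the code takes `n` as given and applies no cap.

```python
from datetime import datetime, timedelta, timezone

BASE_COOLDOWN_SECONDS = 30  # the "30s" base from the diagram


def provider_weight(max_rpm, current_rpm):
    """Weighted-RR weight: the fraction of the RPM budget that is still unused."""
    if max_rpm <= 0:
        return 0.0
    # Clamp at 0 so an over-budget provider is never given a negative weight.
    return max(0.0, (max_rpm - current_rpm) / max_rpm)


def cooldown_seconds(n):
    """Exponential backoff: 30s x 2^n (indexing of n is an assumption)."""
    return BASE_COOLDOWN_SECONDS * 2 ** n


def cooldown_until(now, n):
    """Timestamp at which the provider may leave the cooldown pool."""
    return now + timedelta(seconds=cooldown_seconds(n))


if __name__ == "__main__":
    now = datetime(2026, 6, 25, 12, 0, 0, tzinfo=timezone.utc)
    print(provider_weight(60, 15))          # 0.75
    print(cooldown_until(now, 2) - now)     # 0:02:00
```

A provider at 15 of 60 RPM keeps three quarters of its weight, while one at or over its limit drops to zero and is effectively skipped until the next refresh.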
Binary file not shown (image, 152 KiB).

@@ -0,0 +1,71 @@
erDiagram
providers ||--o{ provider_usage_logs : has
providers ||--o{ cooldown_events : triggers
providers ||--o| provider_health : monitors
providers {
string id PK
string name
string api_key
string endpoint_url
string model_prefix
string pool
string status
string source
int rpm_limit
int tpm_limit
float weight
float cost_per_1k
string cooldown_until
string metadata
}
provider_usage_logs {
string id PK
string provider_id FK
string model
int prompt_tokens
int completion_tokens
int total_tokens
float cost
int request_count
int error_count
int avg_latency_ms
string hour_bucket
}
cooldown_events {
string id PK
string provider_id FK
int consecutive_count
int cooldown_seconds
string response_summary
string started_at
string ended_at
}
provider_health {
string provider_id PK
string state
int last_latency_ms
int last_status_code
float success_rate_5m
int consecutive_failures
}
daily_stats {
string id PK
string date
string pool
int total_requests
int total_errors
int total_tokens
float total_cost
int unique_providers
}
system_config {
string key PK
string value
string description
}
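
The commit message mentions `ON CONFLICT` and `BEGIN IMMEDIATE` patterns for the DDL. As a rough sketch of how those two could combine for the hourly `provider_usage_logs` rollup: the table below is a trimmed subset of the entity above, and the unique key on `(provider_id, model, hour_bucket)` is an assumption, since the full DDL is not visible in this diff. The helper names `open_db` and `record_usage` are hypothetical.

```python
import sqlite3


def open_db(path=":memory:"):
    # isolation_level=None puts the connection in autocommit mode,
    # so the code below can issue BEGIN IMMEDIATE / COMMIT explicitly.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS provider_usage_logs (
            provider_id   TEXT    NOT NULL,
            model         TEXT    NOT NULL,
            hour_bucket   TEXT    NOT NULL,
            total_tokens  INTEGER NOT NULL DEFAULT 0,
            request_count INTEGER NOT NULL DEFAULT 0,
            UNIQUE (provider_id, model, hour_bucket)  -- assumed upsert key
        )
        """
    )
    return conn


def record_usage(conn, provider_id, model, hour_bucket, tokens):
    # BEGIN IMMEDIATE takes the write lock up front, so concurrent writers
    # wait or fail at BEGIN rather than midway through the transaction.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(
            """
            INSERT INTO provider_usage_logs
                (provider_id, model, hour_bucket, total_tokens, request_count)
            VALUES (?, ?, ?, ?, 1)
            ON CONFLICT (provider_id, model, hour_bucket) DO UPDATE SET
                total_tokens  = total_tokens + excluded.total_tokens,
                request_count = request_count + 1
            """,
            (provider_id, model, hour_bucket, tokens),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

The upsert syntax needs SQLite 3.24 or newer, which the library bundled with current Python releases normally satisfies.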
Binary file not shown (image, 137 KiB).

@@ -0,0 +1,58 @@
flowchart TB
subgraph OC["OpenClaw Gateway"]
OC_SCHED["OpenClaw Scheduler"]
OC_FB["OpenClaw Fallback<br/>legacy config path"]
end
subgraph SIDECAR["Sidecar V2 systemd/Docker"]
direction TB
subgraph ENTRY["Entry Layer"]
GW["API Gateway :9190<br/>FastAPI + route matching"]
end
subgraph CORE["Core Scheduling Layer"]
LB["Load Balancer<br/>Weighted RR, refreshed every 5-10s"]
QM["Queue Manager<br/>FIFO + priority<br/>capacity 500 + overflow policy"]
end
subgraph POOLS["Provider Pool Layer"]
MP["Main Pool"]
FP["Fallback Pool"]
CP["Cooldown Pool"]
end
subgraph FLOW["Flow Control Layer"]
RL["Rate Limiter<br/>Per-Provider Token Bucket"]
CD["Cooldown Detector<br/>429 detection + exponential backoff<br/>+ emergency channel 10% RPM"]
end
subgraph STATS["Storage and Statistics Layer"]
MT["Metrics :9191<br/>Prometheus"]
ST["Statistics Engine<br/>tokens / cost / call volume"]
DB[("SQLite WAL<br/>sidecar_v2.db<br/>+ cron backup")]
end
subgraph WEBUI["WebUI Layer :9190"]
UI["Dashboard<br/>SSE real-time push"]
AP["Admin API<br/>Provider CRUD<br/>Bearer Token auth"]
end
end
OC_SCHED --> GW
GW --> LB
LB --> QM
QM --> RL
RL --> MP
RL --> FP
MP -.->|"429 triggers cooldown"| CP
MP -->|"all in cooldown"| FP
FP -->|"all in cooldown"| OC_FB
CP -.->|"recovers after cooldown"| MP
RL --> CD
CD -.->|"emergency channel 10% RPM"| MP
LB --> MT
MT --> ST
ST --> DB
DB --> UI
AP --> DB
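
The flow control layer names a per-Provider token bucket as the rate limiter. A minimal sketch of that technique follows, assuming the bucket holds `rpm_limit` tokens and refills continuously at `rpm_limit / 60` tokens per second. The class name, the injectable `now` parameter, and the refill policy are illustrative assumptions, not details taken from the ADR.

```python
import time


class TokenBucket:
    """Per-provider token bucket: one token is spent per forwarded request."""

    def __init__(self, rpm_limit, now=None):
        self.capacity = float(rpm_limit)        # burst size equals the RPM limit
        self.refill_per_sec = rpm_limit / 60.0  # continuous refill rate
        self.tokens = self.capacity             # start with a full bucket
        self.updated = time.monotonic() if now is None else now

    def try_acquire(self, now=None):
        """Return True and spend a token if one is available, else False."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A caller would keep one bucket per provider, for example in a dict keyed by provider id, and send the request to the queue manager when `try_acquire()` returns False. Passing `now` explicitly makes the limiter deterministic in tests.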
Binary file not shown (image, 143 KiB).