ADR-006 v2.0: Sidecar V2 architecture revision based on review feedback
Incorporated feedback from 4 reviewers:
- 徐聪: AES key management, emergency channel, concurrency control, DDL indexes
- 陆怀瑾: P0 phase, schedule buffer, deployment topology, V1 compat checklist
- 严维序: SQLite backup, monitoring, cooldown persistence, port plan, rollback
- 沈路明: queue design, health check, per-model RPM decision, key validation, dashboard panels

Key additions:
+ Queue flow control design (FIFO + priority, capacity 500, REJECT overflow)
+ Provider health check (active probe + passive stats hybrid)
+ Per-model RPM decision (Provider-level V2, Model-level V3)
+ Key validation on add (test call with error feedback)
+ AES key management (SIDECAR_ENCRYPTION_KEY env var, backup SOP)
+ Emergency channel (10% RPM during full cooldown)
+ SQLite backup strategy (cron .backup, 7-day retention)
+ SQLite monitoring Prometheus metrics (db_size, wal_size, integrity)
+ Full DDL with indexes (ON CONFLICT, BEGIN IMMEDIATE patterns)
+ Dashboard panel list (5 panels: status, trends, history)
+ V1 compatibility checklist (13 items)
+ V1->V2 migration SOP with rollback plan
+ Deployment topology (systemd + Docker, port plan, firewall)
+ Log aggregation policy (logrotate: 10MB / 30 days)
+ Schedule revised: 71h / 12 days (added P0 + buffer)

Co-authored-by: multica-agent <github@multica.ai>
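The queue item above (FIFO + priority, capacity 500, REJECT on overflow) can be sketched as a bounded heap. This is a minimal synchronous sketch, assuming that a lower priority number is served first and that REJECT means failing the new request immediately; `BoundedPriorityQueue` and `QueueFullError` are hypothetical names, and the 30s enqueue timeout shown in the sequence diagram is left to the caller.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any, List


class QueueFullError(Exception):
    """Raised when the queue is at capacity (REJECT overflow policy)."""


@dataclass(order=True)
class _Entry:
    priority: int                     # assumption: lower value is served first
    seq: int                          # arrival order; keeps FIFO within a priority
    item: Any = field(compare=False)  # the queued request itself is never compared


class BoundedPriorityQueue:
    """Priority queue that is FIFO within each priority and rejects when full."""

    def __init__(self, capacity: int = 500) -> None:
        self.capacity = capacity
        self._heap: List[_Entry] = []
        self._seq = itertools.count()

    def put(self, item: Any, priority: int = 0) -> None:
        if len(self._heap) >= self.capacity:
            # REJECT: refuse the new request instead of evicting a waiting one
            raise QueueFullError(f"queue full (capacity={self.capacity})")
        heapq.heappush(self._heap, _Entry(priority, next(self._seq), item))

    def get(self) -> Any:
        """Pop the highest-priority, oldest entry; raises IndexError if empty."""
        return heapq.heappop(self._heap).item

    def __len__(self) -> int:
        return len(self._heap)
```

The sequence number breaks ties, so requests with equal priority leave in arrival order. Rejecting at the door, rather than evicting an older entry, keeps requests that are already waiting from being silently dropped.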
@@ -0,0 +1,58 @@
sequenceDiagram
    participant OC as OpenClaw
    participant GW as API Gateway
    participant LB as Load Balancer
    participant QM as Queue Manager
    participant RL as Rate Limiter
    participant P as Provider
    participant CD as Cooldown Detector
    participant ST as Stats Engine

    OC->>GW: POST /v1/chat/completions
    GW->>LB: Route to target pool

    Note over LB: Weighted RR, refreshed every 5-10s<br/>weight=(max_rpm-current_rpm)/max_rpm

    LB->>RL: BEGIN IMMEDIATE transaction, check RPM + reserve

    alt RPM exhausted
        RL->>QM: Enqueue and wait (30s timeout)
        QM-->>RL: Token available
    end

    RL-->>LB: Forwarding allowed

    LB->>P: Forward request
    P-->>LB: Response

    alt 200 OK
        LB->>ST: INSERT ON CONFLICT, record usage_logs
        LB-->>GW: Normal response
    else 429 Too Many Requests
        LB->>CD: Report 429
        CD->>P: Move to cooldown pool, cooldown_until=now+30s×2^n

        LB->>LB: Reselect Provider B

        alt Provider B healthy
            LB->>P: Forward to Provider B
            P-->>LB: 200 OK
        end

        alt Entire main pool in cooldown
            Note over LB: Degrade to Fallback pool<br/>Check for Providers about to recover<br/>Wait if <10s remaining

            alt Fallback available
                LB->>P: Forward to Fallback Provider
                P-->>LB: 200 OK + degraded flag
            else Fallback pool also fully in cooldown
                LB->>P: Emergency channel, 1 Provider at 10% RPM
                alt Emergency channel succeeds
                    P-->>LB: 200 OK
                else
                    LB-->>OC: 503 Service Unavailable
                    OC->>OC: OpenClaw's own fallback
                end
            end
        end
    end
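The load balancer note computes weight=(max_rpm-current_rpm)/max_rpm and refreshes it every 5-10s. One way to turn those weights into a pick order is smooth weighted round-robin (the nginx-style variant). The ADR only says "Weighted RR", so the choice of smooth WRR, the clamping of negative headroom to zero, and the names `Provider`, `refresh_weights` and `pick` are assumptions in this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Provider:
    name: str
    max_rpm: int
    current_rpm: int = 0
    weight: float = 0.0
    current: float = 0.0   # running score used by smooth WRR


def refresh_weights(providers: List[Provider]) -> None:
    """weight = (max_rpm - current_rpm) / max_rpm, clamped at 0 (run every 5-10s)."""
    for p in providers:
        headroom = max(p.max_rpm - p.current_rpm, 0)
        p.weight = headroom / p.max_rpm if p.max_rpm > 0 else 0.0


def pick(providers: List[Provider]) -> Optional[Provider]:
    """Smooth weighted round-robin over providers with positive weight."""
    candidates = [p for p in providers if p.weight > 0]
    if not candidates:
        return None   # caller degrades to the Fallback pool
    total = sum(p.weight for p in candidates)
    for p in candidates:
        p.current += p.weight          # every candidate gains its weight
    best = max(candidates, key=lambda p: p.current)
    best.current -= total              # the winner pays back the total
    return best
```

With headroom weights 1.0 and 0.5 this interleaves picks as A, B, A instead of sending bursts to A. A provider at or over its limit gets weight 0 and is skipped, and a `None` result is the signal to fall through to the Fallback pool.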
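The 429 branch sets cooldown_until=now+30s×2^n and, when the whole main pool is cooling down, waits for a provider with less than 10s left. A minimal sketch follows, assuming n is the number of earlier consecutive 429s (so the first 429 costs 30s), that a success resets n, and that durations are capped; the cap `MAX_COOLDOWN_S` and all function names are hypothetical, not taken from the ADR.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

BASE_COOLDOWN_S = 30            # "30s" from cooldown_until=now+30s×2^n
MAX_COOLDOWN_S = 30 * 60        # hypothetical cap; the ADR does not state one
WAIT_IF_REMAINING_S = 10        # "wait if <10s remaining"


@dataclass
class CooldownState:
    consecutive_429: int = 0
    cooldown_until: float = 0.0   # epoch seconds


def on_429(state: CooldownState, now: Optional[float] = None) -> int:
    """Put the provider into cooldown for 30s * 2^n and return the duration."""
    now = time.time() if now is None else now
    n = state.consecutive_429     # assumption: n = earlier consecutive 429s
    state.consecutive_429 += 1
    duration = min(BASE_COOLDOWN_S * 2 ** n, MAX_COOLDOWN_S)
    state.cooldown_until = now + duration
    return duration


def on_success(state: CooldownState) -> None:
    """Assumption: any successful response resets the backoff exponent."""
    state.consecutive_429 = 0


def soonest_recovering(states: Dict[str, CooldownState],
                       now: float) -> Optional[Tuple[str, float]]:
    """Return (provider, seconds_left) if one recovers in under 10s, else None."""
    if not states:
        return None
    name, state = min(states.items(), key=lambda kv: kv[1].cooldown_until)
    left = max(state.cooldown_until - now, 0.0)
    return (name, left) if left < WAIT_IF_REMAINING_S else None
```

Persisting `consecutive_429` and `cooldown_until` (the `cooldown_events` and `providers.cooldown_until` columns in the ER diagram) is what would let the backoff survive a sidecar restart.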
@@ -0,0 +1,71 @@
erDiagram
    providers ||--o{ provider_usage_logs : has
    providers ||--o{ cooldown_events : triggers
    providers ||--o| provider_health : monitors

    providers {
        string id PK
        string name
        string api_key
        string endpoint_url
        string model_prefix
        string pool
        string status
        string source
        int rpm_limit
        int tpm_limit
        float weight
        float cost_per_1k
        string cooldown_until
        string metadata
    }

    provider_usage_logs {
        string id PK
        string provider_id FK
        string model
        int prompt_tokens
        int completion_tokens
        int total_tokens
        float cost
        int request_count
        int error_count
        int avg_latency_ms
        string hour_bucket
    }

    cooldown_events {
        string id PK
        string provider_id FK
        int consecutive_count
        int cooldown_seconds
        string response_summary
        string started_at
        string ended_at
    }

    provider_health {
        string provider_id PK
        string state
        int last_latency_ms
        int last_status_code
        float success_rate_5m
        int consecutive_failures
    }

    daily_stats {
        string id PK
        string date
        string pool
        int total_requests
        int total_errors
        int total_tokens
        float total_cost
        int unique_providers
    }

    system_config {
        string key PK
        string value
        string description
    }
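The commit message mentions full DDL with indexes and the ON CONFLICT pattern, but the diagrams do not show the DDL itself. A trimmed sketch of how `provider_usage_logs` could be aggregated per (provider_id, model, hour_bucket) with a SQLite upsert follows. The UNIQUE key, column types, the index name, the hour_bucket string format and `record_usage` are assumptions, and only a subset of the `providers` columns is declared.

```python
import sqlite3
import uuid

# Trimmed DDL: column names follow the ER diagram; types, defaults,
# the UNIQUE key and the index name are assumptions.
DDL = """
CREATE TABLE IF NOT EXISTS providers (
    id             TEXT PRIMARY KEY,
    name           TEXT NOT NULL,
    pool           TEXT NOT NULL DEFAULT 'main',
    status         TEXT NOT NULL DEFAULT 'active',
    rpm_limit      INTEGER NOT NULL,
    cooldown_until TEXT
);
CREATE TABLE IF NOT EXISTS provider_usage_logs (
    id                TEXT PRIMARY KEY,
    provider_id       TEXT NOT NULL REFERENCES providers(id),
    model             TEXT NOT NULL,
    prompt_tokens     INTEGER NOT NULL DEFAULT 0,
    completion_tokens INTEGER NOT NULL DEFAULT 0,
    total_tokens      INTEGER NOT NULL DEFAULT 0,
    cost              REAL NOT NULL DEFAULT 0,
    request_count     INTEGER NOT NULL DEFAULT 0,
    error_count       INTEGER NOT NULL DEFAULT 0,
    avg_latency_ms    INTEGER NOT NULL DEFAULT 0,
    hour_bucket       TEXT NOT NULL,
    UNIQUE (provider_id, model, hour_bucket)
);
CREATE INDEX IF NOT EXISTS idx_usage_hour ON provider_usage_logs (hour_bucket);
"""

# One row per (provider, model, hour); repeated calls fold into it.
UPSERT = """
INSERT INTO provider_usage_logs
    (id, provider_id, model, prompt_tokens, completion_tokens, total_tokens,
     cost, request_count, error_count, avg_latency_ms, hour_bucket)
VALUES (?, ?, ?, ?, ?, ?, ?, 1, ?, ?, ?)
ON CONFLICT (provider_id, model, hour_bucket) DO UPDATE SET
    prompt_tokens     = prompt_tokens + excluded.prompt_tokens,
    completion_tokens = completion_tokens + excluded.completion_tokens,
    total_tokens      = total_tokens + excluded.total_tokens,
    cost              = cost + excluded.cost,
    error_count       = error_count + excluded.error_count,
    -- running mean; right-hand sides read the row as it was before the update
    avg_latency_ms    = (avg_latency_ms * request_count + excluded.avg_latency_ms)
                        / (request_count + 1),
    request_count     = request_count + 1
"""


def record_usage(conn: sqlite3.Connection, provider_id: str, model: str,
                 prompt_tokens: int, completion_tokens: int, cost: float,
                 latency_ms: int, hour_bucket: str, is_error: bool = False) -> None:
    with conn:  # commits on success, rolls back on exception
        conn.execute(UPSERT, (
            uuid.uuid4().hex, provider_id, model, prompt_tokens, completion_tokens,
            prompt_tokens + completion_tokens, cost, int(is_error), latency_ms,
            hour_bucket,
        ))
```

In the DO UPDATE clause, unqualified columns refer to the existing row and `excluded.` to the row that failed to insert, so the running mean is computed from the old `request_count`. The upsert syntax needs SQLite 3.24 or newer.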
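The sequence diagram has the rate limiter check RPM and reserve a slot inside a BEGIN IMMEDIATE transaction. The sketch below shows why that matters with Python's `sqlite3`: the write lock is taken before the read, so two workers cannot both see the last free slot and both claim it. The `rpm_window` table, its per-minute bucket, and `try_reserve` are hypothetical; only `providers.rpm_limit` comes from the ER diagram.

```python
import sqlite3
import time
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS providers (
    id        TEXT PRIMARY KEY,
    rpm_limit INTEGER NOT NULL
);
-- hypothetical per-minute reservation counter (not in the ER diagram)
CREATE TABLE IF NOT EXISTS rpm_window (
    provider_id   TEXT NOT NULL,
    minute_bucket INTEGER NOT NULL,
    used          INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (provider_id, minute_bucket)
);
"""


def try_reserve(conn: sqlite3.Connection, provider_id: str,
                now: Optional[float] = None) -> bool:
    """Check RPM headroom and reserve one slot in a single write transaction."""
    minute = int((time.time() if now is None else now) // 60)
    conn.execute("BEGIN IMMEDIATE")   # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT p.rpm_limit, COALESCE(w.used, 0) FROM providers AS p "
            "LEFT JOIN rpm_window AS w "
            "ON w.provider_id = p.id AND w.minute_bucket = ? "
            "WHERE p.id = ?",
            (minute, provider_id),
        ).fetchone()
        if row is None or row[1] >= row[0]:
            conn.execute("ROLLBACK")
            return False              # caller enqueues (30s timeout) or re-routes
        conn.execute(
            "INSERT INTO rpm_window (provider_id, minute_bucket, used) VALUES (?, ?, 1) "
            "ON CONFLICT (provider_id, minute_bucket) DO UPDATE SET used = used + 1",
            (provider_id, minute),
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
```

The connection would be opened with `isolation_level=None` so that the module does not issue its own BEGIN, for example `sqlite3.connect("sidecar_v2.db", timeout=5.0, isolation_level=None)`; the `timeout` lets competing writers wait for the lock instead of failing at once. With a deferred BEGIN, two readers could both see `used = rpm_limit - 1` and then race to upgrade to a write lock.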
@@ -0,0 +1,58 @@
flowchart TB
    subgraph OC["OpenClaw Gateway"]
        OC_SCHED["OpenClaw Scheduler"]
        OC_FB["OpenClaw Fallback<br/>legacy config chain"]
    end

    subgraph SIDECAR["Sidecar V2 systemd/Docker"]
        direction TB

        subgraph ENTRY["Entry Layer"]
            GW["API Gateway :9190<br/>FastAPI + route matching"]
        end

        subgraph CORE["Core Scheduling Layer"]
            LB["Load Balancer<br/>Weighted RR, 5-10s refresh"]
            QM["Queue Manager<br/>FIFO + priority<br/>capacity 500 + overflow policy"]
        end

        subgraph POOLS["Provider Pool Layer"]
            MP["Main Pool"]
            FP["Fallback Pool"]
            CP["Cooldown Pool"]
        end

        subgraph FLOW["Flow Control Layer"]
            RL["Rate Limiter<br/>Per-Provider Token Bucket"]
            CD["Cooldown Detector<br/>429 detection + exponential backoff<br/>+ emergency channel 10% RPM"]
        end

        subgraph STATS["Storage and Statistics Layer"]
            MT["Metrics :9191<br/>Prometheus"]
            ST["Stats Engine<br/>tokens / cost / call volume"]
            DB[("SQLite WAL<br/>sidecar_v2.db<br/>+ cron backup")]
        end

        subgraph WEBUI["WebUI Layer :9190"]
            UI["Dashboard<br/>SSE real-time push"]
            AP["Admin API<br/>Provider CRUD<br/>Bearer Token auth"]
        end
    end

    OC_SCHED --> GW
    GW --> LB
    LB --> QM
    QM --> RL
    RL --> MP
    RL --> FP
    MP -.->|"429 triggers cooldown"| CP
    MP -->|"all in cooldown"| FP
    FP -->|"all in cooldown"| OC_FB
    CP -.->|"restored after cooldown"| MP
    RL --> CD
    CD -.->|"emergency channel 10% RPM"| MP
    LB --> MT
    MT --> ST
    ST --> DB
    DB --> UI
    AP --> DB
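The flowchart labels the rate limiter a per-provider token bucket and gives the cooldown detector an emergency channel at 10% RPM. A minimal sketch of both follows, assuming a refill rate of rpm/60 tokens per second, a caller-chosen burst capacity, and an emergency bucket that is simply a second bucket at 10% of the provider's `rpm_limit`; `TokenBucket` and `emergency_bucket` are hypothetical names.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket refilled continuously at rpm/60 tokens per second."""

    def __init__(self, rpm: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rpm / 60.0        # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.tokens = capacity        # start full
        self.clock = clock
        self.last = clock()

    def try_acquire(self, n: float = 1.0) -> bool:
        """Take n tokens if available; never blocks."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


EMERGENCY_FRACTION = 0.10  # "emergency channel 10% RPM"


def emergency_bucket(rpm_limit: int,
                     clock: Callable[[], float] = time.monotonic) -> TokenBucket:
    """Bucket for the single provider used while every pool is cooling down."""
    rate = rpm_limit * EMERGENCY_FRACTION
    # assumption: burst limited to one second of the reduced rate, at least 1 token
    return TokenBucket(rpm=rate, capacity=max(1.0, rate / 60.0), clock=clock)
```

Injecting the clock keeps the bucket deterministic in tests; in production `time.monotonic` avoids jumps when the wall clock is adjusted.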
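The storage node is annotated SQLite WAL plus cron backup, and the commit message lists `.backup` with 7-day retention and db_size / wal_size / integrity metrics. The sketch below uses Python's `sqlite3.Connection.backup` (the same online backup API that the sqlite3 shell's `.backup` command is built on) and renders the three values as Prometheus text-format gauges without a client library. The backup file naming, the metric names, and the use of `PRAGMA quick_check` instead of the slower `integrity_check` are assumptions.

```python
import os
import sqlite3
import time
from pathlib import Path
from typing import List, Optional

RETENTION_DAYS = 7
PREFIX = "sidecar_v2-"  # hypothetical backup file naming


def prune_backups(out_dir: Path, now: Optional[float] = None) -> List[Path]:
    """Delete backups whose mtime is older than RETENTION_DAYS."""
    now = time.time() if now is None else now
    cutoff = now - RETENTION_DAYS * 86400
    removed = []
    for f in sorted(out_dir.glob(PREFIX + "*.db")):
        if f.stat().st_mtime < cutoff:
            f.unlink()
            removed.append(f)
    return removed


def backup_db(db_path: str, backup_dir: str) -> Path:
    """Consistent online copy via the SQLite backup API, then prune old copies."""
    out_dir = Path(backup_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / (PREFIX + time.strftime("%Y%m%d-%H%M%S", time.gmtime()) + ".db")
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(str(target))
    try:
        src.backup(dst)  # copies every page in one step as a consistent snapshot
    finally:
        dst.close()
        src.close()
    prune_backups(out_dir)
    return target


def db_metrics(db_path: str) -> str:
    """Render db_size, wal_size and integrity as Prometheus text-format gauges."""
    wal_path = db_path + "-wal"
    db_size = os.path.getsize(db_path)
    wal_size = os.path.getsize(wal_path) if os.path.exists(wal_path) else 0
    conn = sqlite3.connect(db_path)
    try:
        # quick_check is cheaper than integrity_check; both return "ok" when healthy
        ok = conn.execute("PRAGMA quick_check").fetchone()[0] == "ok"
    finally:
        conn.close()
    gauges = {
        "sidecar_db_size_bytes": db_size,
        "sidecar_db_wal_size_bytes": wal_size,
        "sidecar_db_integrity_ok": int(ok),
    }
    lines = []
    for name, value in gauges.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```

A daily cron job calling `backup_db` would keep roughly seven snapshots; pruning by modification time means a backup that is copied or touched gets a fresh 7-day window.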