# Go-Trustlog Persistence Module

**Database persistence module** that gives go-trustlog complete database storage and asynchronous eventual-consistency support.

---

## 📋 Table of Contents

- [Overview](#overview)
- [Core Features](#core-features)
- [Quick Start](#quick-start)
- [Architecture](#architecture)
- [Usage Guide](#usage-guide)
- [Configuration](#configuration)
- [Monitoring and Operations](#monitoring-and-operations)
- [FAQ](#faq)

---

## Overview

The Persistence module implements a **two-tier Cursor + Retry architecture** that gives operation records:

- ✅ **Three persistence strategies**: database only, database plus trustlog, trustlog only
- ✅ **Asynchronous eventual consistency**: a Cursor worker discovers new records quickly, and a Retry worker guarantees retries
- ✅ **Multi-database support**: PostgreSQL, MySQL, SQLite
- ✅ **Reliable retry mechanism**: exponential backoff plus a dead-letter queue
- ✅ **Nullable IP fields**: ClientIP and ServerIP accept NULL

### Architecture Highlights

```
Application call
    ↓
Write to the database only (returns immediately)
    ↓
CursorWorker (first line of defense)
    ├── Incrementally scans the operation table
    ├── Makes a quick trustlog submission attempt
    ├── Success → update the status
    └── Failure → add to the retry table
    ↓
RetryWorker (second line of defense)
    ├── Scans the retry table
    ├── Retries with exponential backoff
    ├── Success → delete the retry record
    └── Failure → mark as dead letter
```

**Design principle**: use the cursor table as an active task-discovery queue rather than a passive position record.

---

## Core Features

### 🎯 Three Persistence Strategies

| Strategy | Description | Use case |
|----------|-------------|----------|
| **StrategyDBOnly** | Database only, no trustlog submission | Historical data archiving, audit logs |
| **StrategyDBAndTrustlog** | Database plus trustlog submission (asynchronous) | Recommended for production |
| **StrategyTrustlogOnly** | Trustlog submission only, no database | Lightweight scenarios |

### 🔄 Two-Tier Cursor + Retry Mode

#### Cursor Worker (task discovery)
- **Responsibility**: quickly discover new records that still need trustlog submission
- **Scan interval**: 10 seconds by default
- **Processing logic**: incremental scan → attempt submission → update on success / hand over to Retry on failure

#### Retry Worker (failure handling)
- **Responsibility**: process records that failed in the Cursor stage
- **Scan interval**: 30 seconds by default
- **Retry policy**: exponential backoff (1m → 2m → 4m → 8m → 16m)
- **Dead-letter queue**: records that exceed the maximum retry count are marked automatically

### 📊 Database Table Design

#### 1. operation table (required)
Stores all operation records:
- `op_id` - operation ID (primary key)
- `trustlog_status` - trustlog status (NOT_TRUSTLOGGED / TRUSTLOGGED)
- `client_ip`, `server_ip` - IP addresses (nullable, stored in the database only)
- Indexes: `idx_op_status`, `idx_op_timestamp`

#### 2. trustlog_cursor table (core)
Task-discovery queue (key-value model):
- `cursor_key` - cursor key (primary key, e.g. "operation_scan")
- `cursor_value` - cursor value (timestamp in RFC3339Nano format)
- Index: `idx_cursor_updated_at`

**Advantages**:
- ✅ Supports multiple cursors (one per scan task)
- ✅ Timestamps are naturally ordered
- ✅ Flexible and extensible

#### 3. trustlog_retry table (required)
Retry queue:
- `op_id` - operation ID (primary key)
- `retry_count` - number of retries so far
- `retry_status` - retry status (PENDING / RETRYING / DEAD_LETTER)
- `next_retry_at` - time of the next retry (supports exponential backoff)
- Indexes: `idx_retry_next_retry_at`, `idx_retry_status`

---

## Quick Start

### Installation

```bash
go get go.yandata.net/iod/iod/go-trustlog
```

### Basic Example

```go
package main

import (
	"context"
	"time"

	"go.yandata.net/iod/iod/go-trustlog/api/adapter"
	"go.yandata.net/iod/iod/go-trustlog/api/logger"
	"go.yandata.net/iod/iod/go-trustlog/api/model"
	"go.yandata.net/iod/iod/go-trustlog/api/persistence"
)

func main() {
	ctx := context.Background()

	// 1. Create the Pulsar publisher
	publisher, err := adapter.NewPublisher(adapter.PublisherConfig{
		URL: "pulsar://localhost:6650",
	}, logger.GetGlobalLogger())
	if err != nil {
		panic(err)
	}

	// 2. Configure the persistence client
	client, err := persistence.NewPersistenceClient(ctx, persistence.PersistenceClientConfig{
		Publisher: publisher,
		Logger:    logger.GetGlobalLogger(),
		EnvelopeConfig: model.EnvelopeConfig{
			Signer: signer, // your SM2 signer; create it before this call (not shown here)
		},
		DBConfig: persistence.DBConfig{
			DriverName: "postgres",
			DSN:        "postgres://user:pass@localhost:5432/trustlog?sslmode=disable",
		},
		PersistenceConfig: persistence.PersistenceConfig{
			Strategy: persistence.StrategyDBAndTrustlog, // database plus trustlog
		},
		// Enable the Cursor worker (recommended)
		EnableCursorWorker: true,
		CursorWorkerConfig: &persistence.CursorWorkerConfig{
			ScanInterval:    10 * time.Second, // scan every 10 seconds
			BatchSize:       100,              // process 100 records per batch
			MaxRetryAttempt: 1,                // fail fast in the Cursor stage
		},
		// Enable the Retry worker (required)
		EnableRetryWorker: true,
		RetryWorkerConfig: &persistence.RetryWorkerConfig{
			RetryInterval:  30 * time.Second, // scan for retries every 30 seconds
			MaxRetryCount:  5,                // retry at most 5 times
			InitialBackoff: 1 * time.Minute,  // initial backoff of 1 minute
		},
	})
	if err != nil {
		panic(err)
	}
	defer client.Close()

	// 3. Publish an operation (returns immediately; trustlog submission is asynchronous)
	clientIP := "192.168.1.100"
	serverIP := "10.0.0.1"

	op := &model.Operation{
		OpID:         "op-001",
		OpType:       model.OpTypeCreate,
		Doid:         "10.1000/repo/obj",
		ProducerID:   "producer-001",
		OpSource:     model.OpSourceDOIP,
		DoPrefix:     "10.1000",
		DoRepository: "repo",
		ClientIP:     &clientIP, // nullable
		ServerIP:     &serverIP, // nullable
	}

	if err := client.OperationPublish(ctx, op); err != nil {
		panic(err)
	}

	// The database write succeeded; the CursorWorker submits to the trustlog asynchronously
	println("✅ Operation saved, trustlog submission in progress...")
}
```

---

## Architecture

### Data Flow

```
┌─────────────────────────────────────────────┐
│ Application calls OperationPublish()        │
└─────────────────────────────────────────────┘
                       ↓
     ┌───────────────────────────────────┐
     │ Save to the operation table       │
     │ Status: NOT_TRUSTLOGGED           │
     └───────────────────────────────────┘
                       ↓
     ┌───────────────────────────────────┐
     │ Return success immediately        │
     │ (database write complete)         │
     └───────────────────────────────────┘

         [Asynchronous processing starts]

     ╔═══════════════════════════════════╗
     ║ CursorWorker (every 10 seconds)   ║
     ╚═══════════════════════════════════╝
                       ↓
     ┌───────────────────────────────────┐
     │ Incrementally scan operation table│
     │ WHERE status = NOT_TRUSTLOGGED    │
     │ AND created_at > cursor           │
     └───────────────────────────────────┘
                       ↓
     ┌───────────────────────────────────┐
     │ Try sending to the trustlog system│
     └───────────────────────────────────┘
            ↓                    ↓
         Success              Failure
            ↓                    ↓
     ┌─────────────┐  ┌────────────────────┐
     │ Update to   │  │ Add to retry table │
     │ TRUSTLOGGED │  │ (continues below)  │
     └─────────────┘  └────────────────────┘
                                 ↓
     ╔═══════════════════════════════════╗
     ║ RetryWorker (every 30 seconds)    ║
     ╚═══════════════════════════════════╝
                       ↓
     ┌───────────────────────────────────┐
     │ Scan the retry table              │
     │ WHERE next_retry_at <= NOW()      │
     └───────────────────────────────────┘
                       ↓
     ┌───────────────────────────────────┐
     │ Retry with exponential backoff    │
     │ 1m → 2m → 4m → 8m → 16m           │
     └───────────────────────────────────┘
            ↓                    ↓
         Success        Max retries exceeded
            ↓                    ↓
     ┌─────────────┐  ┌────────────────────┐
     │ Delete the  │  │ Mark as dead letter│
     │ retry record│  │ DEAD_LETTER        │
     └─────────────┘  └────────────────────┘
```
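
The RetryWorker branch of the flow above can be sketched as a small decision function. This is an illustration under stated assumptions, not the module's implementation: the `retryDecision` type and `decide` function are hypothetical, and `retryCount` is assumed to be the number of retries already made before the current attempt.

```go
package main

import "fmt"

// Status values taken from the retry_status column described in this document.
const (
	statusPending    = "PENDING"
	statusDeadLetter = "DEAD_LETTER"
)

// retryDecision is a hypothetical summary of what happens to a retry row.
type retryDecision struct {
	DeleteRow bool   // true when submission succeeded and the row can be removed
	Status    string // new retry_status when the row is kept
	Count     int    // new retry_count when the row is kept
}

// decide applies the diagram's RetryWorker branch to one attempt.
// retryCount is the number of retries made before this attempt (assumption).
func decide(success bool, retryCount, maxRetryCount int) retryDecision {
	if success {
		return retryDecision{DeleteRow: true}
	}
	next := retryCount + 1
	if next >= maxRetryCount {
		return retryDecision{Status: statusDeadLetter, Count: next}
	}
	return retryDecision{Status: statusPending, Count: next}
}

func main() {
	fmt.Printf("%+v\n", decide(true, 2, 5))  // success: delete the row
	fmt.Printf("%+v\n", decide(false, 0, 5)) // first failure: stay PENDING
	fmt.Printf("%+v\n", decide(false, 4, 5)) // fifth failure: DEAD_LETTER
}
```

Keeping the decision separate from the database update makes the boundary case (the last allowed retry) easy to unit test.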

### Performance Characteristics

| Operation | Response time | Notes |
|-----------|---------------|-------|
| Database write | ~10ms | Returns synchronously |
| Cursor scan | ~10ms | 100 records per batch |
| Retry scan | ~5ms | Indexed query |
| Eventual consistency | < 5 minutes | Typical case; a record that needs all five default retries accumulates about 31 minutes of backoff (1+2+4+8+16) |

---

## Usage Guide

### 1. Initialize the Database

#### Option 1: Use the SQL scripts
```bash
# PostgreSQL
psql -U user -d trustlog < api/persistence/sql/postgresql.sql

# MySQL
mysql -u user -p trustlog < api/persistence/sql/mysql.sql

# SQLite
sqlite3 trustlog.db < api/persistence/sql/sqlite.sql
```

#### Option 2: Automatic initialization
```go
// The table schema is created automatically when the client starts
client, err := persistence.NewPersistenceClient(ctx, config)
if err != nil {
	panic(err)
}
```

### 2. Choose a Persistence Strategy

#### Strategy A: database only (StrategyDBOnly)
```go
config := persistence.PersistenceConfig{
	Strategy: persistence.StrategyDBOnly,
}
// CursorWorker and RetryWorker do not need to be started
```

#### Strategy B: database plus trustlog (StrategyDBAndTrustlog) ⭐ Recommended
```go
config := persistence.PersistenceClientConfig{
	PersistenceConfig: persistence.PersistenceConfig{
		Strategy: persistence.StrategyDBAndTrustlog,
	},
	// CursorWorker and RetryWorker must both be enabled
	EnableCursorWorker: true,
	EnableRetryWorker:  true,
}
```

#### Strategy C: trustlog only (StrategyTrustlogOnly)
```go
config := persistence.PersistenceConfig{
	Strategy: persistence.StrategyTrustlogOnly,
}
// No database is involved
```

### 3. Handle Nullable IP Fields

```go
// Set the IPs (use pointers)
clientIP := "192.168.1.100"
serverIP := "10.0.0.1"

opWithIP := &model.Operation{
	// ... other fields ...
	ClientIP: &clientIP, // has a value
	ServerIP: &serverIP, // has a value
}

// Leave the IPs unset (stored as NULL)
opWithoutIP := &model.Operation{
	// ... other fields ...
	ClientIP: nil, // NULL
	ServerIP: nil, // NULL
}
```

### 4. Monitoring and Queries

#### Count records not yet trustlogged
```go
var count int
err := db.QueryRow(`
	SELECT COUNT(*)
	FROM operation
	WHERE trustlog_status = 'NOT_TRUSTLOGGED'
`).Scan(&count)
if err != nil {
	panic(err)
}
```

#### Check the retry queue length
```go
var count int
err := db.QueryRow(`
	SELECT COUNT(*)
	FROM trustlog_retry
	WHERE retry_status IN ('PENDING', 'RETRYING')
`).Scan(&count)
if err != nil {
	panic(err)
}
```

#### List dead-letter records
```go
rows, err := db.Query(`
	SELECT op_id, retry_count, error_message
	FROM trustlog_retry
	WHERE retry_status = 'DEAD_LETTER'
`)
if err != nil {
	panic(err)
}
defer rows.Close()
```

---

## Configuration

### DBConfig - Database Configuration

```go
type DBConfig struct {
	DriverName      string        // database driver: postgres, mysql, sqlite3
	DSN             string        // data source name
	MaxOpenConns    int           // maximum open connections (default: 25)
	MaxIdleConns    int           // maximum idle connections (default: 5)
	ConnMaxLifetime time.Duration // maximum connection lifetime (default: 5 minutes)
}
```

### CursorWorkerConfig - Cursor Worker Configuration

```go
type CursorWorkerConfig struct {
	ScanInterval    time.Duration // scan interval (default: 10 seconds)
	BatchSize       int           // batch size (default: 100)
	CursorKey       string        // cursor key (default: "operation_scan")
	MaxRetryAttempt int           // maximum retries in the Cursor stage (default: 1, fail fast)
	Enabled         bool          // whether the worker is enabled (default: true)
}
```

**Recommended settings**:
- **Development**: ScanInterval=5s, BatchSize=10
- **Production**: ScanInterval=10s, BatchSize=100
- **High load**: ScanInterval=5s, BatchSize=500

### RetryWorkerConfig - Retry Worker Configuration

```go
type RetryWorkerConfig struct {
	RetryInterval     time.Duration // scan interval (default: 30 seconds)
	BatchSize         int           // batch size (default: 100)
	MaxRetryCount     int           // maximum retry count (default: 5)
	InitialBackoff    time.Duration // initial backoff (default: 1 minute)
	BackoffMultiplier float64       // backoff multiplier (default: 2.0)
}
```

**Exponential backoff example** (InitialBackoff=1m, Multiplier=2.0):
```
Retry 1: after 1 minute
Retry 2: after 2 minutes
Retry 3: after 4 minutes
Retry 4: after 8 minutes
Retry 5: after 16 minutes
More than 5 retries: marked as dead letter
```

---

## Monitoring and Operations

### Key Metrics

#### 1. System Health

| Metric | SQL query | Alert threshold |
|--------|-----------|-----------------|
| Records not yet trustlogged | `SELECT COUNT(*) FROM operation WHERE trustlog_status = 'NOT_TRUSTLOGGED'` | > 1000 |
| Cursor lag (age of the oldest pending record) | `SELECT NOW() - MIN(created_at) FROM operation WHERE trustlog_status = 'NOT_TRUSTLOGGED'` | > 5 minutes |
| Retry queue length | `SELECT COUNT(*) FROM trustlog_retry WHERE retry_status IN ('PENDING', 'RETRYING')` | > 500 |
| Dead-letter count | `SELECT COUNT(*) FROM trustlog_retry WHERE retry_status = 'DEAD_LETTER'` | > 10 |

#### 2. Performance Metrics

```sql
-- Average retry count
SELECT AVG(retry_count)
FROM trustlog_retry
WHERE retry_status != 'DEAD_LETTER';

-- Success rate over the last hour (PostgreSQL interval syntax);
-- NULLIF avoids a division-by-zero error when there are no rows
SELECT
    COUNT(CASE WHEN trustlog_status = 'TRUSTLOGGED' THEN 1 END) * 100.0 / NULLIF(COUNT(*), 0) AS success_rate
FROM operation
WHERE created_at >= NOW() - INTERVAL '1 hour';
```

### Troubleshooting

#### Scenario 1: The Cursor worker has stopped

**Symptom**: the number of records not yet trustlogged keeps growing

**Resolution**:
```bash
# 1. Check the logs
tail -f /var/log/trustlog/cursor_worker.log

# 2. Restart the service
systemctl restart trustlog-cursor-worker

# 3. Verify recovery
# The number of records not yet trustlogged should gradually fall
```

#### Scenario 2: The trustlog system is unavailable

**Symptom**: the retry queue grows quickly

**Resolution**:
```bash
# 1. Repair the trustlog system
# 2. Wait for automatic recovery (the RetryWorker keeps retrying)
# 3. If dead letters appear, reset them manually:
```

```sql
-- Reset dead-letter records
UPDATE trustlog_retry
SET retry_status = 'PENDING',
    retry_count = 0,
    next_retry_at = NOW()
WHERE retry_status = 'DEAD_LETTER';
```

#### Scenario 3: Database performance problems

**Symptom**: scans slow down

**Optimization**:
```sql
-- Check index usage (PostgreSQL)
EXPLAIN ANALYZE
SELECT * FROM operation
WHERE trustlog_status = 'NOT_TRUSTLOGGED'
  AND created_at > '2024-01-01'
ORDER BY created_at ASC
LIMIT 100;

-- Rebuild the index (use the index name from your schema)
REINDEX INDEX idx_op_status_time;

-- Refresh table statistics
ANALYZE operation;
```

---

## FAQ

### Q1: Why use the two-tier Cursor + Retry mode?

**A**:
- **Cursor** quickly discovers new records (the normal path)
- **Retry** focuses on failed records (the failure path)
- Separating the responsibilities improves performance and makes monitoring clearer

### Q2: Will the Cursor and Retry tables grow without bound?

**A**:
- **Cursor table**: holds only a few rows (one per scan task)
- **Retry table**: holds only failed records, which are deleted automatically on success
- Dead-letter records must be cleaned up after manual handling

### Q3: Why are ClientIP and ServerIP nullable?

**A**:
- Some scenarios cannot obtain an IP (for example, internal calls)
- It avoids placeholders such as "0.0.0.0"
- It follows database best practice

### Q4: How can processing throughput be increased?

**A**:
```go
// Option 1: increase BatchSize
largerBatches := &persistence.CursorWorkerConfig{
	BatchSize: 500, // raised from 100
}

// Option 2: shorten the scan interval
fasterScans := &persistence.CursorWorkerConfig{
	ScanInterval: 5 * time.Second, // reduced from 10 seconds
}

// Option 3: run multiple instances (each needs a different CursorKey)
```

### Q5: How should dead-letter records be handled?

**A**:
```sql
-- 1. Inspect the dead-letter details
SELECT op_id, retry_count, error_message, created_at
FROM trustlog_retry
WHERE retry_status = 'DEAD_LETTER'
ORDER BY created_at DESC;

-- 2. Look at the corresponding operation row
SELECT * FROM operation WHERE op_id = 'xxx';

-- 3. If the record can be retried, reset its status
UPDATE trustlog_retry
SET retry_status = 'PENDING',
    retry_count = 0,
    next_retry_at = NOW()
WHERE op_id = 'xxx';

-- 4. If the record cannot be processed, delete it
DELETE FROM trustlog_retry WHERE op_id = 'xxx';
```

### Q6: How can I verify that the system works correctly?

**A**:
```go
// 1. Insert test data
if err := client.OperationPublish(ctx, testOp); err != nil {
	panic(err)
}

// 2. Query the status (about 10 seconds later)
// Note: "?" suits MySQL and SQLite; PostgreSQL drivers expect "$1"
var status string
err := db.QueryRow("SELECT trustlog_status FROM operation WHERE op_id = ?", testOp.OpID).Scan(&status)
if err != nil {
	panic(err)
}

// 3. Verify: status should be "TRUSTLOGGED"
```

---

## Related Documents

- 📘 [Quick Start Guide](../../PERSISTENCE_QUICKSTART.md) - 5-minute getting-started tutorial
- 🏗️ [Architecture Design](./ARCHITECTURE_V2.md) - detailed architecture description
- 📊 [Implementation Summary](../../CURSOR_RETRY_ARCHITECTURE_SUMMARY.md) - implementation details
- 💾 [SQL Script Notes](./sql/README.md) - database script documentation
- ✅ [Fix Log](../../FIXES_COMPLETED.md) - history of fixed issues

---

## Support

### Test Status
- ✅ **49/49** unit tests passing
- ✅ Code coverage: **28.5%**
- ✅ Supported databases: PostgreSQL, MySQL, SQLite

### Version Information
- **Current version**: v2.1.0
- **Go version required**: 1.21+
- **Last updated**: 2025-12-23

### Contributing
Issues and pull requests are welcome!

---

**© 2024-2025 IOD Project. All rights reserved.**