feat: update plugin author info and localize the Deep Reading plugin into English
@@ -16,6 +16,8 @@
- ✅ **Flexible retention policy**: Freely configure how many messages to keep at the head and tail of the conversation, ensuring key information and contextual continuity are preserved.
- ✅ **Smart injection**: Intelligently injects the generated history summary into the new context.

For a detailed explanation of how it works, see the [Workflow Guide](WORKFLOW_GUIDE_CN.md).

---

## Installation & Configuration
@@ -49,16 +51,51 @@

You can adjust the following parameters in the filter's settings:

### Core Parameters

| Parameter | Default | Description |
| :--- | :--- | :--- |
| `priority` | `10` | Filter execution order; lower values run earlier. |
| `compression_threshold_tokens` | `64000` | **(Important)** When the total context token count exceeds this value, summary generation is triggered in the background. A value of 50%–70% of the model's maximum context window is recommended. |
| `max_context_tokens` | `128000` | **(Important)** Hard upper limit for the context. If exceeded, the earliest messages are forcibly removed (protected messages excepted), preventing token overflow. |
| `keep_first` | `1` | Always keep the first N messages of the conversation. The first message usually contains an important system prompt or environment variables; keeping at least 1 is recommended. |
| `keep_last` | `6` | Always keep the last N messages of the conversation to maintain the continuity of recent dialogue. |
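The core parameters above map directly onto the filter's valve fields. A plain-dataclass sketch of that mapping (the real plugin declares these with pydantic `Field`s, so names and defaults here are only an illustration of the table):

```python
from dataclasses import dataclass


@dataclass
class Valves:
    # Mirrors the "Core Parameters" table above.
    priority: int = 10
    compression_threshold_tokens: int = 64000  # background summary trigger
    max_context_tokens: int = 128000           # hard pruning limit
    keep_first: int = 1                        # protected head messages
    keep_last: int = 6                         # protected tail messages
```

Any field can be overridden per instance, e.g. `Valves(keep_last=8)` for conversations where more recent context should survive.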
### Summary Generation

| Parameter | Default | Description |
| :--- | :--- | :--- |
| `summary_model` | `None` | Model ID used to generate summaries. **Strongly recommended**: configure a fast, economical model with a large context window (e.g. `gemini-2.5-flash`, `deepseek-v3`). If left empty, the current conversation's model is used, which may fail with incompatible models (e.g. Pipe models). |
| `max_summary_tokens` | `16384` | Maximum number of tokens allowed when generating a summary. |
| `summary_temperature` | `0.1` | Controls the randomness of summary generation; lower values give more stable results. |
### Advanced Configuration

#### `model_thresholds` (model-specific thresholds)

A dictionary that lets you override the global `compression_threshold_tokens` and `max_context_tokens` for specific model IDs. This is especially useful when mixing models with different context window sizes.

**The default configuration includes recommended thresholds for mainstream models (e.g. GPT-4, Claude 3.5, Gemini 1.5/2.0, Qwen 2.5/3, DeepSeek V3).**

**Example configuration:**

```json
{
  "gpt-4": {
    "compression_threshold_tokens": 8000,
    "max_context_tokens": 32000
  },
  "gemini-2.5-flash": {
    "compression_threshold_tokens": 734000,
    "max_context_tokens": 1048576
  }
}
```

#### `debug_mode`

- **Default**: `true`
- **Description**: Whether to print detailed debug information (token counts, compression progress, database operations, etc.) to the Open WebUI console log. Setting it to `false` in production is recommended.

---
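The override rule is a simple two-level lookup: a per-model entry wins, and the global valves are the fallback. A sketch (the helper name and signature are illustrative, not the plugin's actual API):

```python
def resolve_thresholds(model_id, model_thresholds, default_compress, default_max):
    # Per-model overrides win; global defaults are the fallback.
    override = model_thresholds.get(model_id, {})
    return (
        override.get("compression_threshold_tokens", default_compress),
        override.get("max_context_tokens", default_max),
    )
```

With the example configuration above, `"gpt-4"` resolves to `(8000, 32000)`, while any model without an entry keeps the global `(64000, 128000)` defaults.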
@@ -68,10 +105,10 @@
  - **Fix**: Confirm that the `DATABASE_URL` environment variable is set correctly and that the database service is running.

- **Issue: summary is not generated**
  - **Fix**: Check whether `compression_threshold_tokens` has been reached, and confirm that `summary_model` is configured correctly. Check the logs for detailed errors.

- **Issue: the initial system prompt is lost**
  - **Fix**: Make sure `keep_first` is greater than 0 so that the initial messages containing important information are retained.

- **Issue: compression has little effect**
  - **Fix**: Try raising `compression_threshold_tokens`, or lowering `keep_first` / `keep_last`.
@@ -373,109 +373,7 @@ class Filter:
        default=128000, ge=0, description="Hard upper limit for the context. If exceeded, the earliest messages are forcibly removed (global default)"
    )
    model_thresholds: dict = Field(
        default={
            # Groq
            "groq-openai/gpt-oss-20b": {"max_context_tokens": 8000, "compression_threshold_tokens": 5600},
            "groq-openai/gpt-oss-120b": {"max_context_tokens": 8000, "compression_threshold_tokens": 5600},

            # Qwen (ModelScope / CF)
            "modelscope-Qwen/Qwen3-Coder-480B-A35B-Instruct": {"max_context_tokens": 256000, "compression_threshold_tokens": 179200},
            "cfchatqwen-qwen3-max-search": {"max_context_tokens": 262144, "compression_threshold_tokens": 183500},
            "modelscope-Qwen/Qwen3-235B-A22B-Thinking-2507": {"max_context_tokens": 262144, "compression_threshold_tokens": 183500},
            "cfchatqwen-qwen3-max": {"max_context_tokens": 262144, "compression_threshold_tokens": 183500},
            "cfchatqwen-qwen3-vl-plus-thinking": {"max_context_tokens": 262144, "compression_threshold_tokens": 183500},
            "cfchatqwen-qwen3-coder-plus-search": {"max_context_tokens": 1048576, "compression_threshold_tokens": 734000},
            "cfchatqwen-qwen3-vl-plus": {"max_context_tokens": 262144, "compression_threshold_tokens": 183500},
            "cfchatqwen-qwen3-coder-plus": {"max_context_tokens": 1048576, "compression_threshold_tokens": 734000},
            "cfchatqwen-qwen3-omni-flash-thinking": {"max_context_tokens": 65536, "compression_threshold_tokens": 45875},
            "cfchatqwen-qwen3-omni-flash": {"max_context_tokens": 65536, "compression_threshold_tokens": 45875},
            "cfchatqwen-qwen3-next-80b-a3b-thinking": {"max_context_tokens": 262144, "compression_threshold_tokens": 183500},
            "modelscope-Qwen/Qwen3-VL-235B-A22B-Instruct": {"max_context_tokens": 262144, "compression_threshold_tokens": 183500},
            "cfchatqwen-qwen3-next-80b-a3b-thinking-search": {"max_context_tokens": 262144, "compression_threshold_tokens": 183500},
            "cfchatqwen-qwen3-next-80b-a3b": {"max_context_tokens": 262144, "compression_threshold_tokens": 183500},
            "cfchatqwen-qwen3-235b-a22b-thinking-search": {"max_context_tokens": 131072, "compression_threshold_tokens": 91750},
            "cfchatqwen-qwen3-235b-a22b": {"max_context_tokens": 131072, "compression_threshold_tokens": 91750},
            "cfchatqwen-qwen3-235b-a22b-thinking": {"max_context_tokens": 131072, "compression_threshold_tokens": 91750},
            "cfchatqwen-qwen3-coder-flash-search": {"max_context_tokens": 262144, "compression_threshold_tokens": 183500},
            "cfchatqwen-qwen3-coder-flash": {"max_context_tokens": 262144, "compression_threshold_tokens": 183500},
            "cfchatqwen-qwen3-max-2025-10-30": {"max_context_tokens": 262144, "compression_threshold_tokens": 183500},
            "cfchatqwen-qwen3-max-2025-10-30-thinking": {"max_context_tokens": 262144, "compression_threshold_tokens": 183500},
            "cfchatqwen-qwen3-max-2025-10-30-thinking-search": {"max_context_tokens": 262144, "compression_threshold_tokens": 183500},
            "modelscope-Qwen/Qwen3-235B-A22B-Instruct-2507": {"max_context_tokens": 262144, "compression_threshold_tokens": 183500},
            "cfchatqwen-qwen3-vl-30b-a3b": {"max_context_tokens": 131072, "compression_threshold_tokens": 91750},
            "cfchatqwen-qwen3-vl-30b-a3b-thinking": {"max_context_tokens": 131072, "compression_threshold_tokens": 91750},

            # Gemini
            "gemini-2.5-pro-search": {"max_context_tokens": 1048576, "compression_threshold_tokens": 734000},
            "gemini-2.5-flash-search": {"max_context_tokens": 1048576, "compression_threshold_tokens": 734000},
            "gemini-2.5-flash": {"max_context_tokens": 1048576, "compression_threshold_tokens": 734000},
            "gemini-2.5-flash-lite": {"max_context_tokens": 1048576, "compression_threshold_tokens": 734000},
            "gemini-2.5-flash-lite-search": {"max_context_tokens": 1048576, "compression_threshold_tokens": 734000},
            "gemini-2.5-pro": {"max_context_tokens": 1048576, "compression_threshold_tokens": 734000},
            "gemini-2.0-flash-search": {"max_context_tokens": 1048576, "compression_threshold_tokens": 734000},
            "gemini-2.0-flash": {"max_context_tokens": 1048576, "compression_threshold_tokens": 734000},
            "gemini-2.0-flash-exp": {"max_context_tokens": 1048576, "compression_threshold_tokens": 734000},
            "gemini-2.0-flash-lite": {"max_context_tokens": 1048576, "compression_threshold_tokens": 734000},
            "copilot-gemini-2.5-pro": {"max_context_tokens": 128000, "compression_threshold_tokens": 89600},
            "gemini-pro-latest": {"max_context_tokens": 1048576, "compression_threshold_tokens": 734000},
            "gemini-3-pro-preview": {"max_context_tokens": 128000, "compression_threshold_tokens": 89600},
            "gemini-pro-latest-search": {"max_context_tokens": 1048576, "compression_threshold_tokens": 734000},
            "gemini-flash-latest": {"max_context_tokens": 1048576, "compression_threshold_tokens": 734000},
            "gemini-flash-latest-search": {"max_context_tokens": 1048576, "compression_threshold_tokens": 734000},
            "gemini-flash-lite-latest-search": {"max_context_tokens": 1048576, "compression_threshold_tokens": 734000},
            "gemini-flash-lite-latest": {"max_context_tokens": 1048576, "compression_threshold_tokens": 734000},
            "gemini-robotics-er-1.5-preview": {"max_context_tokens": 1048576, "compression_threshold_tokens": 734000},

            # DeepSeek
            "modelscope-deepseek-ai/DeepSeek-V3.1": {"max_context_tokens": 128000, "compression_threshold_tokens": 89600},
            "cfdeepseek-deepseek-search": {"max_context_tokens": 128000, "compression_threshold_tokens": 89600},
            "openrouter-deepseek/deepseek-r1-0528:free": {"max_context_tokens": 163840, "compression_threshold_tokens": 114688},
            "modelscope-deepseek-ai/DeepSeek-V3.2-Exp": {"max_context_tokens": 128000, "compression_threshold_tokens": 89600},
            "cfdeepseek-deepseek-r1-search": {"max_context_tokens": 128000, "compression_threshold_tokens": 89600},
            "cfdeepseek-deepseek-r1": {"max_context_tokens": 128000, "compression_threshold_tokens": 89600},
            "openrouter-deepseek/deepseek-chat-v3.1:free": {"max_context_tokens": 163800, "compression_threshold_tokens": 114660},
            "modelscope-deepseek-ai/DeepSeek-R1-0528": {"max_context_tokens": 128000, "compression_threshold_tokens": 89600},
            "cfdeepseek-deepseek": {"max_context_tokens": 128000, "compression_threshold_tokens": 89600},

            # Kimi (Moonshot)
            "cfkimi-kimi-k2-search": {"max_context_tokens": 128000, "compression_threshold_tokens": 89600},
            "cfkimi-kimi-k1.5-search": {"max_context_tokens": 128000, "compression_threshold_tokens": 89600},
            "cfkimi-kimi-k1.5-thinking-search": {"max_context_tokens": 128000, "compression_threshold_tokens": 89600},
            "cfkimi-kimi-research": {"max_context_tokens": 128000, "compression_threshold_tokens": 89600},
            "openrouter-moonshotai/kimi-k2:free": {"max_context_tokens": 32768, "compression_threshold_tokens": 22937},
            "cfkimi-kimi-k2": {"max_context_tokens": 128000, "compression_threshold_tokens": 89600},
            "cfkimi-kimi-k1.5": {"max_context_tokens": 128000, "compression_threshold_tokens": 89600},

            # GPT / OpenAI
            "gpt-4.1": {"max_context_tokens": 128000, "compression_threshold_tokens": 89600},
            "gpt-4o": {"max_context_tokens": 64000, "compression_threshold_tokens": 44800},
            "gpt-5": {"max_context_tokens": 128000, "compression_threshold_tokens": 89600},
            "github-gpt-4.1": {"max_context_tokens": 7500, "compression_threshold_tokens": 5250},
            "gpt-5-mini": {"max_context_tokens": 128000, "compression_threshold_tokens": 89600},
            "gpt-5.1": {"max_context_tokens": 128000, "compression_threshold_tokens": 89600},
            "gpt-5.1-codex": {"max_context_tokens": 128000, "compression_threshold_tokens": 89600},
            "gpt-5.1-codex-mini": {"max_context_tokens": 128000, "compression_threshold_tokens": 89600},
            "gpt-5-codex": {"max_context_tokens": 200000, "compression_threshold_tokens": 140000},
            "github-gpt-4.1-mini": {"max_context_tokens": 7500, "compression_threshold_tokens": 5250},
            "openrouter-openai/gpt-oss-20b:free": {"max_context_tokens": 128000, "compression_threshold_tokens": 89600},

            # Claude / Anthropic
            "claude-sonnet-4.5": {"max_context_tokens": 128000, "compression_threshold_tokens": 89600},
            "claude-haiku-4.5": {"max_context_tokens": 128000, "compression_threshold_tokens": 89600},
            "copilot-claude-opus-41": {"max_context_tokens": 80000, "compression_threshold_tokens": 56000},
            "copilot-claude-sonnet-4": {"max_context_tokens": 80000, "compression_threshold_tokens": 56000},

            # Other / OpenRouter / OSWE
            "oswe-vscode-insiders": {"max_context_tokens": 256000, "compression_threshold_tokens": 179200},
            "modelscope-MiniMax/MiniMax-M2": {"max_context_tokens": 204800, "compression_threshold_tokens": 143360},
            "oswe-vscode-prime": {"max_context_tokens": 200000, "compression_threshold_tokens": 140000},
            "grok-code-fast-1": {"max_context_tokens": 128000, "compression_threshold_tokens": 89600},
            "copilot-auto": {"max_context_tokens": 128000, "compression_threshold_tokens": 89600},
            "modelscope-ZhipuAI/GLM-4.6": {"max_context_tokens": 32000, "compression_threshold_tokens": 22400},
            "openrouter-x-ai/grok-4.1-fast:free": {"max_context_tokens": 2000000, "compression_threshold_tokens": 1400000},
            "openrouter-qwen/qwen3-coder:free": {"max_context_tokens": 262000, "compression_threshold_tokens": 183400},
            "openrouter-qwen/qwen3-235b-a22b:free": {"max_context_tokens": 40960, "compression_threshold_tokens": 28672},
        },
        description="Per-model threshold overrides. Only include models that need special configuration."
    )
@@ -1,45 +0,0 @@

Requirements document: Async Context Compression plugin optimization

1. Core goal: Upgrade the existing message-count-based compression logic to token-count-based compression, and introduce a recursive summarization mechanism, in order to control the context window more precisely, improve summary quality, and prevent loss of historical information.

2. Functional requirements

Token counting and threshold control

Introduce tiktoken: use the tiktoken library for accurate token counting. If the environment does not support it, fall back to a character-based estimate (1 token ≈ 4 chars).

New config parameters (Valves):

compression_threshold_tokens (default: 64000): trigger compression (summary generation) when the total context token count exceeds this value.

max_context_tokens (default: 128000): hard upper limit for the context. If exceeded, forcibly remove the earliest messages (protected messages excepted).

model_thresholds (dict): supports different thresholds per model ID, e.g. {'gpt-4': {'compression_threshold_tokens': 8000, ...}}.

Deprecate old parameter: compression_threshold (message-count based) is marked deprecated; token-based thresholds take precedence.
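The counting strategy described above (tiktoken when available, a character-count estimate otherwise) can be sketched like this; the function name is illustrative of the requirement, not the plugin's exact code:

```python
def count_tokens(text: str) -> int:
    try:
        import tiktoken
        # Precise count using a standard encoding.
        enc = tiktoken.get_encoding("cl100k_base")
        return len(enc.encode(text))
    except Exception:
        # Fallback when tiktoken is unavailable: 1 token ≈ 4 characters.
        return max(1, len(text) // 4)
```

Catching a broad `Exception` keeps the filter functional even when the optional dependency is missing or the encoding download fails.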
Recursive summarization

Mechanism: when generating a new summary, the previous summary must be read and included.

Logic: new summary = LLM(previous summary + newly produced conversation messages).

Purpose: prevent the earliest summary information from being discarded as the conversation progresses, ensuring the continuity of long-term memory.
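The recursion above can be sketched as a single function; `llm` here is any callable taking a prompt string and returning a string, and the prompt wording is illustrative, not the plugin's actual prompt:

```python
def recursive_summary(llm, previous_summary: str, new_messages: list) -> str:
    # new summary = LLM(previous summary + new messages), per the logic above.
    prompt = (
        "Merge the previous summary with the new messages into one updated summary.\n"
        f"PREVIOUS SUMMARY:\n{previous_summary}\n\n"
        "NEW MESSAGES:\n" + "\n".join(new_messages)
    )
    return llm(prompt)
```

Because each call folds the prior summary back in, early information survives indefinitely instead of rolling out of the window.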
Message protection and pruning strategy

Protection: messages covered by keep_first (first N) and keep_last (last N) never participate in compression and are never removed.

Pruning: when the max_context_tokens limit is triggered, remove the earliest messages located after keep_first and before keep_last first.
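A minimal sketch of that pruning rule, assuming `count_tokens(msg) -> int` is supplied by the caller (the helper is illustrative only):

```python
def prune_messages(messages, keep_first, keep_last, max_tokens, count_tokens):
    # Drop the earliest unprotected messages until the total fits under the
    # hard limit; the protected head and tail are never touched.
    messages = list(messages)
    while (sum(count_tokens(m) for m in messages) > max_tokens
           and len(messages) > keep_first + keep_last):
        del messages[keep_first]  # earliest message after the protected head
    return messages
```

Deleting at index `keep_first` always targets the oldest message that is neither in the protected head nor in the protected tail.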
Optimized prompt (prompt engineering)

Goal: remove useless information (pleasantries, repetition) and keep key signals (facts, code, decisions).

Instructions:

Distill and purify: explicitly require the removal of noise.

Key retention: emphasize that code snippets must be preserved verbatim.

Merge and update: explicitly instruct merging new information into the old summary.

Language consistency: the output language must match the conversation language.
3. Implementation details

File: async_context_compression.py

Class: Filter

Key methods:

_count_tokens(text): implements token counting.

_calculate_messages_tokens(messages): computes the total token count of a message list.

_generate_summary_async(...): modified to load the previous summary and pass it to the LLM.

_call_summary_llm(...): updated prompt; accepts previous_summary and new_messages.

inlet(...): uses compression_threshold_tokens to decide whether to inject the summary, and implements the forced pruning logic against max_context_tokens.

outlet(...): uses compression_threshold_tokens to decide whether to trigger the background summary task.