From 59ee25754dd98d89fe29bf3251ef365a955c744a Mon Sep 17 00:00:00 2001 From: Jeff fu Date: Wed, 31 Dec 2025 13:21:33 +0800 Subject: [PATCH] =?UTF-8?q?=E6=9B=B4=E6=96=B0=E6=96=87=E6=A1=A3=E5=92=8C?= =?UTF-8?q?=E5=8F=82=E6=95=B0=EF=BC=9A=E5=B0=86=E7=89=88=E6=9C=AC=E5=8F=B7?= =?UTF-8?q?=E6=8F=90=E5=8D=87=E8=87=B31.2.0=EF=BC=8C=E5=A2=9E=E5=8A=A0?= =?UTF-8?q?=E6=96=B0=E7=89=B9=E6=80=A7=E8=AF=B4=E6=98=8E=EF=BC=8C=E8=B0=83?= =?UTF-8?q?=E6=95=B4=E6=9C=80=E5=A4=A7=E6=91=98=E8=A6=81token=E6=95=B0?= =?UTF-8?q?=E8=87=B316384?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- .../async-context-compression/README.md | 79 +++++++++--------- .../async-context-compression/README_CN.md | 81 +++++++++---------- .../async_context_compression.py | 2 +- .../异步上下文压缩.py | 2 +- 4 files changed, 78 insertions(+), 86 deletions(-) diff --git a/plugins/filters/async-context-compression/README.md b/plugins/filters/async-context-compression/README.md index c6305bd..e3e3491 100644 --- a/plugins/filters/async-context-compression/README.md +++ b/plugins/filters/async-context-compression/README.md @@ -1,70 +1,65 @@ # Async Context Compression Filter -**Author:** [Fu-Jie](https://github.com/Fu-Jie) | **Version:** 1.1.0 | **License:** MIT +**Author:** [Fu-Jie](https://github.com/Fu-Jie) | **Version:** 1.2.0 | **License:** MIT -> **Important Note**: To ensure the maintainability and usability of all filters, each filter should be accompanied by clear and complete documentation to fully explain its functionality, configuration, and usage. +This filter reduces token consumption in long conversations through intelligent summarization and message compression while keeping conversations coherent. -This filter significantly reduces token consumption in long conversations by using intelligent summarization and message compression, while maintaining conversational coherence. 
+## What's new in 1.1.0 + +- Reuses Open WebUI's shared database connection by default (no custom engine or env vars required). +- Token-based thresholds (`compression_threshold_tokens`, `max_context_tokens`) for safer long-context handling. +- Per-model overrides via `model_thresholds` for mixed-model workflows. +- Documentation now mirrors the latest async workflow and retention-first injection. --- ## Core Features -- ✅ **Automatic Compression**: Triggers context compression automatically based on a message count threshold. -- ✅ **Asynchronous Summarization**: Generates summaries in the background without blocking the current chat response. -- ✅ **Persistent Storage**: Uses Open WebUI's shared database connection - automatically supports any database backend (PostgreSQL, SQLite, etc.). -- ✅ **Flexible Retention Policy**: Freely configure the number of initial and final messages to keep, ensuring critical information and context continuity. -- ✅ **Smart Injection**: Intelligently injects the generated historical summary into the new context. +- ✅ Automatic compression triggered by token thresholds. +- ✅ Asynchronous summarization that does not block chat responses. +- ✅ Persistent storage via Open WebUI's shared database connection (PostgreSQL, SQLite, etc.). +- ✅ Flexible retention policy to keep the first and last N messages. +- ✅ Smart injection of historical summaries back into the context. --- ## Installation & Configuration -### 1. Database (Automatic) +### 1) Database (automatic) -This plugin automatically uses Open WebUI's shared database connection. **No additional database configuration is required.** +- Uses Open WebUI's shared database connection; no extra configuration needed. +- The `chat_summary` table is created on first run. -The `chat_summary` table will be created automatically on first run. +### 2) Filter order -### 2. 
Filter Order +It is recommended to keep this filter early in the chain so it runs before filters that mutate messages: -It is recommended to set the priority of this filter relatively high (a smaller number) to ensure it runs before other filters that might modify message content. A typical order might be: - -1. **Pre-Filters (priority < 10)** - - e.g., A filter that injects a system-level prompt. -2. **This Compression Filter (priority = 10)** -3. **Post-Filters (priority > 10)** - - e.g., A filter that formats the final output. +1. Pre-filters (priority < 10) — e.g., system prompt injectors. +2. This compression filter (priority = 10). +3. Post-filters (priority > 10) — e.g., output formatting. --- ## Configuration Parameters -You can adjust the following parameters in the filter's settings: - -| Parameter | Default | Description | -| :--- | :--- | :--- | -| `priority` | `10` | The execution order of the filter. Lower numbers run first. | -| `compression_threshold` | `15` | When the total message count reaches this value, a background summary generation will be triggered. | -| `keep_first` | `1` | Always keep the first N messages. The first message often contains important system prompts. | -| `keep_last` | `6` | Always keep the last N messages to ensure contextual coherence. | -| `summary_model` | `None` | The model used for generating summaries. **Strongly recommended** to set a fast, economical, and compatible model (e.g., `gemini-2.5-flash`). If left empty, it will try to use the current chat's model, which may fail if it's an incompatible model type (like a Pipe model). | -| `max_summary_tokens` | `4000` | The maximum number of tokens allowed for the generated summary. | -| `summary_temperature` | `0.3` | Controls the randomness of the summary. Lower values are more deterministic. | -| `debug_mode` | `true` | Whether to print detailed debug information to the log. Recommended to set to `false` in production. 
|
+| Parameter                      | Default  | Description |
+| :----------------------------- | :------- | :--- |
+| `priority`                     | `10`     | Execution order; lower runs earlier. |
+| `compression_threshold_tokens` | `64000`  | Trigger asynchronous summary when total tokens exceed this value. Set to 50%-70% of your model's context window. |
+| `max_context_tokens`           | `128000` | Hard cap for context; older messages (except protected ones) are dropped if exceeded. |
+| `keep_first`                   | `1`      | Always keep the first N messages (protects system prompts). |
+| `keep_last`                    | `6`      | Always keep the last N messages to preserve recent context. |
+| `summary_model`                | `None`   | Model for summaries. Strongly recommended to set a fast, economical model (e.g., `gemini-2.5-flash`, `deepseek-v3`). Falls back to the current chat model when empty. |
+| `max_summary_tokens`           | `16384`  | Maximum tokens for the generated summary. |
+| `summary_temperature`          | `0.3`    | Randomness for summary generation. Lower is more deterministic. |
+| `model_thresholds`             | `{}`     | Per-model overrides for `compression_threshold_tokens` and `max_context_tokens` (useful for mixed models). |
+| `debug_mode`                   | `true`   | Log verbose debug info. Set to `false` in production. |
 
 ---
 
 ## Troubleshooting
 
-- **Problem: Database table not created.**
-  - **Solution**: Ensure Open WebUI is properly configured with a database and check Open WebUI's logs for detailed error messages.
-
-- **Problem: Summary not generated.**
-  - **Solution**: Check if the `compression_threshold` has been met and verify that `summary_model` is configured correctly. Check the logs for detailed errors.
-
-- **Problem: Initial system prompt is lost.**
-  - **Solution**: Ensure `keep_first` is set to a value greater than 0 to preserve the initial messages containing important information.
- -- **Problem: Compression effect is not significant.** - - **Solution**: Try increasing the `compression_threshold` or decreasing the `keep_first` / `keep_last` values. +- **Database table not created**: Ensure Open WebUI is configured with a database and check Open WebUI logs for errors. +- **Summary not generated**: Confirm `compression_threshold_tokens` was hit and `summary_model` is compatible. Review logs for details. +- **Initial system prompt is lost**: Keep `keep_first` greater than 0 to protect the initial message. +- **Compression effect is weak**: Raise `compression_threshold_tokens` or lower `keep_first` / `keep_last` to allow more aggressive compression. diff --git a/plugins/filters/async-context-compression/README_CN.md b/plugins/filters/async-context-compression/README_CN.md index 85f636c..ab26ecc 100644 --- a/plugins/filters/async-context-compression/README_CN.md +++ b/plugins/filters/async-context-compression/README_CN.md @@ -1,20 +1,27 @@ # 异步上下文压缩过滤器 -**作者:** [Fu-Jie](https://github.com/Fu-Jie) | **版本:** 1.1.0 | **许可证:** MIT +**作者:** [Fu-Jie](https://github.com/Fu-Jie) | **版本:** 1.2.0 | **许可证:** MIT > **重要提示**:为了确保所有过滤器的可维护性和易用性,每个过滤器都应附带清晰、完整的文档,以确保其功能、配置和使用方法得到充分说明。 -本过滤器通过智能摘要和消息压缩技术,在保持对话连贯性的同时,显著降低长对话的Token消耗。 +本过滤器通过智能摘要和消息压缩技术,在保持对话连贯性的同时,显著降低长对话的 Token 消耗。 + +## 1.1.0 版本更新 + +- 默认复用 OpenWebUI 内置数据库连接,无需自建引擎、无需配置 `DATABASE_URL`。 +- 基于 Token 的阈值控制(`compression_threshold_tokens`、`max_context_tokens`),长上下文更安全。 +- 支持 `model_thresholds` 为不同模型设置专属阈值,适合混用多模型场景。 +- 文档同步最新异步工作流与“先保留再注入”策略。 --- ## 核心特性 -- ✅ **自动压缩**: 基于消息数量阈值自动触发上下文压缩。 -- ✅ **异步摘要**: 在后台生成摘要,不阻塞当前对话的响应。 -- ✅ **持久化存储**: 使用 Open WebUI 的共享数据库连接 - 自动支持任何数据库后端(PostgreSQL、SQLite 等)。 -- ✅ **灵活保留策略**: 可自由配置保留对话头部和尾部的消息数量,确保关键信息和上下文的连贯性。 -- ✅ **智能注入**: 将生成的历史摘要智能地注入到新的上下文中。 +- ✅ **自动压缩**: 基于 Token 阈值自动触发上下文压缩。 +- ✅ **异步摘要**: 后台生成摘要,不阻塞当前对话响应。 +- ✅ **持久化存储**: 复用 Open WebUI 共享数据库连接,自动支持 PostgreSQL/SQLite 等。 +- ✅ **灵活保留策略**: 可配置保留对话头部和尾部消息,确保关键信息连贯。 +- ✅ **智能注入**: 将历史摘要智能注入到新上下文中。 
详细的工作原理和流程请参考 [工作流程指南](WORKFLOW_GUIDE_CN.md)。 @@ -24,19 +31,16 @@ ### 1. 数据库(自动) -本插件自动使用 Open WebUI 的共享数据库连接。**无需额外的数据库配置。** - -`chat_summary` 表将在首次运行时自动创建。 +- 自动使用 Open WebUI 的共享数据库连接,**无需额外配置**。 +- 首次运行自动创建 `chat_summary` 表。 ### 2. 过滤器顺序 建议将此过滤器的优先级设置得相对较高(数值较小),以确保它在其他可能修改消息内容的过滤器之前运行。一个典型的顺序可能是: -1. **前置过滤器 (priority < 10)** - - 例如:注入系统级提示的过滤器。 -2. **本压缩过滤器 (priority = 10)** -3. **后置过滤器 (priority > 10)** - - 例如:对最终输出进行格式化的过滤器。 +1. 前置过滤器 (priority < 10) —— 例如系统提示注入。 +2. 本压缩过滤器 (priority = 10)。 +3. 后置过滤器 (priority > 10) —— 例如最终输出格式化。 --- @@ -46,29 +50,29 @@ ### 核心参数 -| 参数 | 默认值 | 描述 | -| :--- | :--- | :--- | -| `priority` | `10` | 过滤器执行顺序,数值越小越先执行。 | -| `compression_threshold_tokens` | `64000` | **(重要)** 当上下文总 Token 数超过此值时,将在后台触发摘要生成。建议设置为模型最大上下文窗口的 50%-70%。 | -| `max_context_tokens` | `128000` | **(重要)** 上下文的硬性上限。如果超过此值,将强制移除最早的消息(保留受保护消息除外)。防止 Token 溢出。 | -| `keep_first` | `1` | 始终保留对话开始的 N 条消息。第一条消息通常包含重要的系统提示或环境变量,建议至少保留 1 条。 | -| `keep_last` | `6` | 始终保留对话末尾的 N 条消息,以确保最近对话的连贯性。 | +| 参数 | 默认值 | 描述 | +| :----------------------------- | :------- | :------------------------------------------------------------------------------------ | +| `priority` | `10` | 过滤器执行顺序,数值越小越先执行。 | +| `compression_threshold_tokens` | `64000` | **重要**: 当上下文总 Token 超过此值时后台生成摘要,建议设为模型上下文窗口的 50%-70%。 | +| `max_context_tokens` | `128000` | **重要**: 上下文硬上限,超过即移除最早消息(保留受保护消息)。 | +| `keep_first` | `1` | 始终保留对话开始的 N 条消息,保护系统提示或环境变量。 | +| `keep_last` | `6` | 始终保留对话末尾的 N 条消息,确保最近上下文连贯。 | ### 摘要生成配置 -| 参数 | 默认值 | 描述 | -| :--- | :--- | :--- | -| `summary_model` | `None` | 用于生成摘要的模型 ID。**强烈建议**配置一个快速、经济且上下文窗口较大的模型(如 `gemini-2.5-flash`)。如果留空,将尝试使用当前对话的模型。 | -| `max_summary_tokens` | `16384` | 生成摘要时允许的最大 Token 数。 | -| `summary_temperature` | `0.1` | 控制摘要生成的随机性,较低的值结果更稳定。 | +| 参数 | 默认值 | 描述 | +| :-------------------- | :------ | :------------------------------------------------------------------------------------------------------------------------------------------ | +| `summary_model` | `None` | 
用于生成摘要的模型 ID。**强烈建议**配置快速、经济、上下文窗口大的模型(如 `gemini-2.5-flash`、`deepseek-v3`)。留空则尝试复用当前对话模型。 | +| `max_summary_tokens` | `16384` | 生成摘要时允许的最大 Token 数。 | +| `summary_temperature` | `0.1` | 控制摘要生成的随机性,较低的值结果更稳定。 | ### 高级配置 #### `model_thresholds` (模型特定阈值) -这是一个字典配置,允许您为特定的模型 ID 覆盖全局的 `compression_threshold_tokens` 和 `max_context_tokens`。这对于混合使用不同上下文窗口大小的模型非常有用。 +这是一个字典配置,可为特定模型 ID 覆盖全局 `compression_threshold_tokens` 与 `max_context_tokens`,适用于混合不同上下文窗口的模型。 -**默认配置包含了主流模型(如 GPT-4, Claude 3.5, Gemini 1.5/2.0, Qwen 2.5/3, DeepSeek V3 等)的推荐阈值。** +**默认包含 GPT-4、Claude 3.5、Gemini 1.5/2.0、Qwen 2.5/3、DeepSeek V3 等推荐阈值。** **配置示例:** @@ -87,21 +91,14 @@ #### `debug_mode` -- **默认值**: `true` -- **描述**: 是否在 Open WebUI 的控制台日志中打印详细的调试信息(如 Token 计数、压缩进度、数据库操作等)。生产环境建议设为 `false`。 +- **默认值**: `true` +- **描述**: 是否在 Open WebUI 的控制台日志中打印详细的调试信息(如 Token 计数、压缩进度、数据库操作等)。生产环境建议设为 `false`。 --- ## 故障排除 -- **问题:数据库表未创建** - - **解决**:确保 Open WebUI 已正确配置数据库,并查看 Open WebUI 的日志以获取详细的错误信息。 - -- **问题:摘要未生成** - - **解决**:检查 `compression_threshold_tokens` 是否已达到,并确认 `summary_model` 配置正确。查看日志以获取详细错误。 - -- **问题:初始的系统提示丢失** - - **解决**:确保 `keep_first` 的值大于 0,以保留包含重要信息的初始消息。 - -- **问题:压缩效果不明显** - - **解决**:尝试适当提高 `compression_threshold_tokens`,或减少 `keep_first` / `keep_last` 的值。 +- **数据库表未创建**:确保 Open WebUI 已配置数据库,并查看日志获取错误信息。 +- **摘要未生成**:检查是否达到 `compression_threshold_tokens`,确认 `summary_model` 可用,并查看日志。 +- **初始系统提示丢失**:将 `keep_first` 设置为大于 0。 +- **压缩效果不明显**:提高 `compression_threshold_tokens`,或降低 `keep_first` / `keep_last` 以增强压缩力度。 diff --git a/plugins/filters/async-context-compression/async_context_compression.py b/plugins/filters/async-context-compression/async_context_compression.py index ac4f47d..ef365fc 100644 --- a/plugins/filters/async-context-compression/async_context_compression.py +++ b/plugins/filters/async-context-compression/async_context_compression.py @@ -127,7 +127,7 @@ summary_model - If the current conversation uses a pipeline (Pipe) model or a model that does not support standard generation APIs, 
leaving this field empty may cause summary generation to fail. In this case, you must specify a valid model. max_summary_tokens - Default: 4000 + Default: 16384 Description: The maximum number of tokens allowed for the generated summary. summary_temperature diff --git a/plugins/filters/async-context-compression/异步上下文压缩.py b/plugins/filters/async-context-compression/异步上下文压缩.py index e103d6d..ecf2864 100644 --- a/plugins/filters/async-context-compression/异步上下文压缩.py +++ b/plugins/filters/async-context-compression/异步上下文压缩.py @@ -126,7 +126,7 @@ summary_model (摘要模型) - 如果当前对话使用的是流水线(Pipe)模型或不直接支持标准生成API的模型,留空此项可能会导致摘要生成失败。在这种情况下,必须指定一个有效的模型。 max_summary_tokens (摘要长度) - 默认: 4000 + 默认: 16384 说明: 生成摘要时允许的最大 token 数。 summary_temperature (摘要温度)
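Note for reviewers: both READMEs describe `model_thresholds` as per-model overrides for the global token valves. A minimal sketch of that resolution behavior, assuming a plain dict keyed by model ID (the function name and exact schema here are illustrative, not the plugin's actual code):

```python
# Hypothetical sketch (not the plugin's actual code) of the per-model override
# behavior documented for `model_thresholds`: a model-specific entry overrides
# the global `compression_threshold_tokens` / `max_context_tokens` valves,
# and unknown models fall back to the global defaults.

def resolve_thresholds(model_id, model_thresholds,
                       compression_threshold_tokens=64000,
                       max_context_tokens=128000):
    """Return the (compression_threshold, max_context) effective for `model_id`."""
    override = model_thresholds.get(model_id, {})
    return (
        override.get("compression_threshold_tokens", compression_threshold_tokens),
        override.get("max_context_tokens", max_context_tokens),
    )

# A long-context model can be allowed to grow well past the global defaults.
thresholds = {
    "gemini-2.5-flash": {
        "compression_threshold_tokens": 500_000,
        "max_context_tokens": 1_000_000,
    }
}
print(resolve_thresholds("gemini-2.5-flash", thresholds))  # (500000, 1000000)
print(resolve_thresholds("unknown-model", thresholds))     # (64000, 128000)
```

This matches the documented fallback order: model-specific value first, then the global valve default.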
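Similarly, the `keep_first` / `keep_last` retention policy both READMEs document can be sketched as a simple partition: protected head, compressible middle, protected tail (again an illustration under assumed names, not the plugin's implementation):

```python
# Hypothetical sketch (not the plugin's actual code) of the retention policy
# described by the `keep_first` / `keep_last` valves: the first and last N
# messages are always protected, and only the middle span is eligible for
# summarization or forced removal.

def split_messages(messages, keep_first=1, keep_last=6):
    """Return (head, middle, tail); only `middle` may be compressed."""
    if len(messages) <= keep_first + keep_last:
        # Too short to compress anything: every message is protected.
        return messages, [], []
    return (
        messages[:keep_first],
        messages[keep_first:len(messages) - keep_last],
        messages[-keep_last:],
    )

msgs = [f"m{i}" for i in range(10)]
head, middle, tail = split_messages(msgs)
print(head)    # ['m0']
print(middle)  # ['m1', 'm2', 'm3']
print(tail)    # ['m4', 'm5', 'm6', 'm7', 'm8', 'm9']
```

With the defaults, a 10-message chat leaves only 3 messages eligible for compression, which is why the troubleshooting sections suggest lowering `keep_first` / `keep_last` when compression seems weak.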