Update docs and parameters: bump version to 1.2.0, add new feature notes, raise the maximum summary token count to 16384

Jeff fu
2025-12-31 13:21:33 +08:00
parent 2cf64c085a
commit 59ee25754d
4 changed files with 78 additions and 86 deletions


@@ -1,70 +1,65 @@
# Async Context Compression Filter
**Author:** [Fu-Jie](https://github.com/Fu-Jie) | **Version:** 1.2.0 | **License:** MIT
> **Important Note**: To ensure the maintainability and usability of all filters, each filter should be accompanied by clear and complete documentation to fully explain its functionality, configuration, and usage.
This filter significantly reduces token consumption in long conversations by using intelligent summarization and message compression, while maintaining conversational coherence.
## What's new in 1.2.0
- Reuses Open WebUI's shared database connection by default (no custom engine or env vars required).
- Token-based thresholds (`compression_threshold_tokens`, `max_context_tokens`) for safer long-context handling.
- Per-model overrides via `model_thresholds` for mixed-model workflows.
- Documentation now mirrors the latest async workflow and retention-first injection.
---
## Core Features
- Automatic compression triggered by token thresholds.
- Asynchronous summarization that does not block chat responses.
- Persistent storage via Open WebUI's shared database connection (PostgreSQL, SQLite, etc.).
- Flexible retention policy to keep the first and last N messages.
- Smart injection of historical summaries back into the context.
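The retention-first flow behind these features (keep the head, keep the tail, summarize the middle) can be sketched as follows; the function name and message shapes are illustrative, not the filter's actual internals.

```python
# Hypothetical sketch of the retention-first policy: keep the first
# `keep_first` and last `keep_last` messages, and replace the middle
# span with a single summary message.
def compress(messages, keep_first=1, keep_last=6, summary_text="[summary]"):
    """Return messages with the middle span replaced by a summary message."""
    if len(messages) <= keep_first + keep_last:
        return list(messages)  # nothing to compress
    head = messages[:keep_first]
    tail = messages[-keep_last:]
    summary = {"role": "system", "content": summary_text}
    return head + [summary] + tail
```

In practice the summary message is generated asynchronously by `summary_model` and injected on the next request, so the current response is never blocked.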
---
## Installation & Configuration
### 1) Database (automatic)
- Uses Open WebUI's shared database connection; no extra configuration needed.
- The `chat_summary` table is created on first run.
### 2) Filter order
It is recommended to keep this filter early in the chain so it runs before filters that mutate messages:
1. Pre-filters (priority < 10) — e.g., system prompt injectors.
2. This compression filter (priority = 10).
3. Post-filters (priority > 10) — e.g., output formatting.
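As a rough illustration of this ordering (the names here are hypothetical, not Open WebUI's actual filter API), a chain sorted by `priority` runs lower numbers first, so this filter sees messages before post-filters touch them:

```python
# Minimal sketch of priority-ordered filter execution.
# Each filter is a dict with a `priority` and an `inlet` callable
# that receives and returns the request body.
def run_chain(filters, body):
    for f in sorted(filters, key=lambda f: f["priority"]):
        body = f["inlet"](body)
    return body
```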
---
## Configuration Parameters
You can adjust the following parameters in the filter's settings:
| Parameter | Default | Description |
| :----------------------------- | :------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `priority` | `10` | Execution order; lower runs earlier. |
| `compression_threshold_tokens` | `64000` | Trigger asynchronous summary when total tokens exceed this value. Set to 50%-70% of your model's context window. |
| `max_context_tokens` | `128000` | Hard cap for context; older messages (except protected ones) are dropped if exceeded. |
| `keep_first` | `1` | Always keep the first N messages (protects system prompts). |
| `keep_last` | `6` | Always keep the last N messages to preserve recent context. |
| `summary_model` | `None` | Model for summaries. Strongly recommended to set a fast, economical model (e.g., `gemini-2.5-flash`, `deepseek-v3`). Falls back to the current chat model when empty. |
| `max_summary_tokens`           | `16384`  | Maximum tokens for the generated summary. |
| `summary_temperature` | `0.3` | Randomness for summary generation. Lower is more deterministic. |
| `model_thresholds` | `{}` | Per-model overrides for `compression_threshold_tokens` and `max_context_tokens` (useful for mixed models). |
| `debug_mode` | `true` | Log verbose debug info. Set to `false` in production. |
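How a `model_thresholds` entry might resolve against the global valves can be sketched as follows; `resolve_thresholds` is a hypothetical helper for illustration, not part of the filter's public API.

```python
# Hedged sketch: per-model overrides win over the global defaults,
# and unknown models fall back to the global valves.
def resolve_thresholds(model_id, model_thresholds,
                       default_trigger=64000, default_cap=128000):
    """Return (compression_threshold_tokens, max_context_tokens) for a model."""
    override = model_thresholds.get(model_id, {})
    return (
        override.get("compression_threshold_tokens", default_trigger),
        override.get("max_context_tokens", default_cap),
    )
```

For example, a large-context model could carry a much higher trigger than the global `64000` default, while every other model keeps the global behavior.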
---
## Troubleshooting
- **Database table not created**: Ensure Open WebUI is configured with a database and check Open WebUI logs for errors.
- **Summary not generated**: Confirm `compression_threshold_tokens` was hit and `summary_model` is compatible. Review logs for details.
- **Initial system prompt is lost**: Keep `keep_first` greater than 0 to protect the initial message.
- **Compression effect is weak**: Raise `compression_threshold_tokens` or lower `keep_first` / `keep_last` to allow more aggressive compression.


@@ -1,20 +1,27 @@
# Async Context Compression Filter
**Author:** [Fu-Jie](https://github.com/Fu-Jie) | **Version:** 1.2.0 | **License:** MIT
> **Important note**: To keep every filter maintainable and easy to use, each filter should ship with clear, complete documentation that fully explains its functionality, configuration, and usage.
This filter significantly reduces token consumption in long conversations through intelligent summarization and message compression, while maintaining conversational coherence.
## What's new in 1.2.0
- Reuses Open WebUI's built-in database connection by default; no custom engine and no `DATABASE_URL` configuration needed.
- Token-based thresholds (`compression_threshold_tokens` and `max_context_tokens`) for safer long-context handling.
- Per-model thresholds via `model_thresholds`, suited to mixed-model workflows.
- Documentation updated to match the latest async workflow and the retention-first injection strategy.
---
## Core Features
- **Automatic compression**: triggered automatically by token thresholds.
- **Asynchronous summarization**: summaries are generated in the background without blocking the current chat response.
- **Persistent storage**: uses Open WebUI's shared database connection; PostgreSQL, SQLite, etc. are supported automatically.
- **Flexible retention policy**: configurable numbers of head and tail messages to keep, preserving key information and coherence.
- **Smart injection**: injects the generated historical summary into the new context.
For a detailed explanation of how it works, see the [Workflow Guide](WORKFLOW_GUIDE_CN.md).
@@ -24,19 +31,16 @@
### 1. Database (automatic)
- Automatically uses Open WebUI's shared database connection; **no extra configuration needed**.
- The `chat_summary` table is created automatically on first run.
### 2. Filter Order
It is recommended to give this filter a relatively high priority (a smaller number) so it runs before other filters that might modify message content. A typical order:
1. Pre-filters (priority < 10), e.g., system prompt injection.
2. This compression filter (priority = 10).
3. Post-filters (priority > 10), e.g., final output formatting.
---
@@ -46,29 +50,29 @@
### Core Parameters
| Parameter | Default | Description |
| :--- | :--- | :--- |
| `priority` | `10` | Execution order; lower values run first. |
| `compression_threshold_tokens` | `64000` | **Important**: when total context tokens exceed this value, a summary is generated in the background. Recommended: 50%-70% of the model's context window. |
| `max_context_tokens` | `128000` | **Important**: hard cap on context; once exceeded, the oldest messages are removed (protected messages are kept). |
| `keep_first` | `1` | Always keep the first N messages, protecting system prompts or environment variables. |
| `keep_last` | `6` | Always keep the last N messages to keep recent context coherent. |
### Summary Generation Settings
| Parameter | Default | Description |
| :--- | :--- | :--- |
| `summary_model` | `None` | Model ID used to generate summaries. **Strongly recommended**: a fast, economical model with a large context window (e.g., `gemini-2.5-flash`, `deepseek-v3`). If empty, the current chat model is tried. |
| `max_summary_tokens` | `16384` | Maximum number of tokens allowed for the generated summary. |
| `summary_temperature` | `0.1` | Randomness of summary generation; lower values give more stable results. |
### Advanced Settings
#### `model_thresholds` (model-specific thresholds)
A dictionary that overrides the global `compression_threshold_tokens` and `max_context_tokens` for specific model IDs, useful when mixing models with different context window sizes.
**The defaults include recommended thresholds for mainstream models such as GPT-4, Claude 3.5, Gemini 1.5/2.0, Qwen 2.5/3, and DeepSeek V3.**
**Example configuration:**
@@ -87,21 +91,14 @@
#### `debug_mode`
- **Default**: `true`
- **Description**: Whether to print verbose debug information (token counts, compression progress, database operations, etc.) to the Open WebUI console log. Recommended: `false` in production.
---
## Troubleshooting
- **Database table not created**: Ensure Open WebUI is configured with a database, and check the logs for error details.
- **Summary not generated**: Check whether `compression_threshold_tokens` has been reached, confirm `summary_model` is usable, and review the logs.
- **Initial system prompt lost**: Set `keep_first` to a value greater than 0.
- **Compression effect is weak**: Raise `compression_threshold_tokens`, or lower `keep_first` / `keep_last` for stronger compression.


@@ -127,7 +127,7 @@ summary_model
- If the current conversation uses a pipeline (Pipe) model or a model that does not support standard generation APIs, leaving this field empty may cause summary generation to fail. In this case, you must specify a valid model.
max_summary_tokens
Default: 16384
Description: The maximum number of tokens allowed for the generated summary.
summary_temperature


@@ -126,7 +126,7 @@ summary_model (summary model)
- If the current conversation uses a pipeline (Pipe) model or a model that does not directly support standard generation APIs, leaving this field empty may cause summary generation to fail. In that case, a valid model must be specified.
max_summary_tokens (summary length)
Default: 16384
Description: The maximum number of tokens allowed for the generated summary.
summary_temperature (summary temperature)