feat(async-context-compression): v1.6.0 absolute system message protection

- Redefine keep_first to count only non-system messages, protecting
  the first N user/assistant exchanges plus all interleaved system messages
- System messages in the compression gap are now extracted and preserved
  as original messages instead of being summarized
- System messages dropped during forced trimming are re-inserted into
  the final output
- Change keep_first default from 1 to 0
- Update docstring, README, README_CN, WORKFLOW_GUIDE_CN, and docs mirrors

Fixes #62
Author: fujie
Date: 2026-03-24 02:45:12 +08:00
Parent: f30d3ed12c
Commit: ff44d324eb
8 changed files with 236 additions and 94 deletions


@@ -1,6 +1,6 @@
# Async Context Compression Filter
-| By [Fu-Jie](https://github.com/Fu-Jie) · v1.5.0 | [⭐ Star this repo](https://github.com/Fu-Jie/openwebui-extensions) |
+| By [Fu-Jie](https://github.com/Fu-Jie) · v1.6.0 | [⭐ Star this repo](https://github.com/Fu-Jie/openwebui-extensions) |
| :--- | ---: |
| ![followers](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_followers.json&label=%F0%9F%91%A5&style=flat) | ![points](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_points.json&label=%E2%AD%90&style=flat) | ![top](https://img.shields.io/badge/%F0%9F%8F%86-Top%20%3C1%25-10b981?style=flat) | ![contributions](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_contributions.json&label=%F0%9F%93%A6&style=flat) | ![downloads](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_downloads.json&label=%E2%AC%87%EF%B8%8F&style=flat) | ![saves](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_saves.json&label=%F0%9F%92%BE&style=flat) | ![views](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_views.json&label=%F0%9F%91%81%EF%B8%8F&style=flat) |
@@ -8,6 +8,25 @@
This filter reduces token consumption in long conversations through intelligent summarization and message compression while keeping conversations coherent.
## Install with Batch Install Plugins
If you already use [Batch Install Plugins from GitHub](https://github.com/Fu-Jie/openwebui-extensions/tree/main/plugins/tools/batch-install-plugins), you can install or update this plugin with:
```text
Install plugin from Fu-Jie/openwebui-extensions
```
When the selection dialog opens, search for this plugin, check it, and continue.
> [!IMPORTANT]
> If the official OpenWebUI Community version is already installed, remove it first. After that, Batch Install Plugins can keep this plugin updated in future runs.
## What's new in 1.6.0
- **Fixed `keep_first` Logic**: Re-defined `keep_first` to protect the first N **non-system** messages plus all interleaved system messages. This ensures initial context (e.g., identity, task instructions) is preserved correctly.
- **Absolute System Message Protection**: System messages are now strictly excluded from compression. Any system message encountered in the history (even late-injected ones) is preserved as an original message in the final context.
- **Improved Context Assembly**: Summaries now only target User and Assistant dialogue, ensuring that system instructions injected by other plugins are never "eaten" by the summarizer.
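The redefined `keep_first` behavior can be sketched as follows (a minimal Python sketch; the function name and message shape are illustrative, not the plugin's actual code):

```python
def split_protected_head(messages, keep_first):
    """Return (head, rest): the first `keep_first` non-system messages,
    plus every system message interleaved among them, are protected."""
    head = []
    non_system_kept = 0
    i = 0
    while i < len(messages):
        msg = messages[i]
        if msg["role"] == "system":
            head.append(msg)  # system messages are always protected
        elif non_system_kept < keep_first:
            head.append(msg)  # still under the non-system quota
            non_system_kept += 1
        else:
            break  # quota of non-system messages reached
        i += 1
    return head, messages[i:]
```

With `keep_first=1`, a history starting `[system, user, assistant, ...]` keeps both the system prompt and the first user message, instead of only the system prompt as before.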
## What's new in 1.5.0
- **External Chat Reference Summaries**: Added support for referenced chat context blocks that can reuse cached summaries, inject small referenced chats directly, or generate summaries for larger referenced chats before injection.
@@ -41,6 +60,10 @@ This filter reduces token consumption in long conversations through intelligent
## What This Fixes
- **Problem: System Messages being summarized/lost.**
Previously, the filter could include system messages (especially those injected late by other plugins) in its summarization zone, causing important instructions to be lost. Now, all system messages are strictly preserved in their original role and never summarized.
- **Problem: Incorrect `keep_first` behavior.**
  Previously, `keep_first` simply took the first N messages. If those were only system messages, the initial user/assistant messages (which often carry important context) would be summarized. Now, `keep_first` guarantees that the first N non-system messages are protected.
- **Problem 1: A referenced chat could break the current request.**
Before, if the filter needed to summarize a referenced chat and that LLM call failed, the current chat could fail with it. Now it degrades gracefully and injects direct context instead.
- **Problem 2: Some referenced chats were being cut too aggressively.**
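The "system messages never summarized" guarantee amounts to partitioning the compression gap before it reaches the summarizer. A minimal sketch (names and message shape are illustrative):

```python
def extract_gap_system_messages(gap):
    """Partition the to-be-summarized gap: system messages are kept
    verbatim as original messages; only user/assistant turns are
    handed to the summarizer."""
    preserved = [m for m in gap if m["role"] == "system"]
    summarizable = [m for m in gap if m["role"] != "system"]
    return preserved, summarizable
```

This is why late-injected system instructions (e.g. from other plugins) survive compression: they are routed around the summary rather than through it.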
@@ -128,7 +151,7 @@ flowchart TD
| `priority` | `10` | Execution order; lower runs earlier. |
| `compression_threshold_tokens` | `64000` | Trigger asynchronous summary when total tokens exceed this value. Set to 50%-70% of your model's context window. |
| `max_context_tokens` | `128000` | Hard cap for context; older messages (except protected ones) are dropped if exceeded. |
-| `keep_first` | `1` | Always keep the first N messages (protects system prompts). |
+| `keep_first` | `1` | Number of initial **non-system** messages to always keep (plus all preceding system prompts). |
| `keep_last` | `6` | Always keep the last N messages to preserve recent context. |
| `summary_model` | `None` | Model for summaries. Strongly recommended to set a fast, economical model (e.g., `gemini-2.5-flash`, `deepseek-v3`). Falls back to the current chat model when empty. |
| `summary_model_max_context` | `0` | Input context window used to fit summary requests. If `0`, falls back to `model_thresholds` or global `max_context_tokens`. |
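How the two token thresholds in the table interact can be sketched as a simple decision (a simplification using the table's defaults; the plugin's real logic also accounts for protected messages and per-model thresholds):

```python
def plan_compression(total_tokens,
                     compression_threshold_tokens=64_000,
                     max_context_tokens=128_000):
    """Illustrative decision for one incoming request."""
    if total_tokens > max_context_tokens:
        return "trim"        # hard cap: drop oldest unprotected messages now
    if total_tokens > compression_threshold_tokens:
        return "summarize"   # schedule an asynchronous background summary
    return "pass"            # context is small enough; do nothing
```

So `compression_threshold_tokens` triggers the cheap asynchronous path well before `max_context_tokens` forces synchronous trimming.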


@@ -1,6 +1,6 @@
# 异步上下文压缩过滤器
-| 作者:[Fu-Jie](https://github.com/Fu-Jie) · v1.5.0 | [⭐ 点个 Star 支持项目](https://github.com/Fu-Jie/openwebui-extensions) |
+| 作者:[Fu-Jie](https://github.com/Fu-Jie) · v1.6.0 | [⭐ 点个 Star 支持项目](https://github.com/Fu-Jie/openwebui-extensions) |
| :--- | ---: |
| ![followers](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_followers.json&label=%F0%9F%91%A5&style=flat) | ![points](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_points.json&label=%E2%AD%90&style=flat) | ![top](https://img.shields.io/badge/%F0%9F%8F%86-Top%20%3C1%25-10b981?style=flat) | ![contributions](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_contributions.json&label=%F0%9F%93%A6&style=flat) | ![downloads](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_downloads.json&label=%E2%AC%87%EF%B8%8F&style=flat) | ![saves](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_saves.json&label=%F0%9F%92%BE&style=flat) | ![views](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_views.json&label=%F0%9F%91%81%EF%B8%8F&style=flat) |
@@ -10,6 +10,25 @@
本过滤器通过智能摘要和消息压缩技术,在保持对话连贯性的同时,显著降低长对话的 Token 消耗。
## 使用 Batch Install Plugins 安装
如果你已经安装了 [Batch Install Plugins from GitHub](https://github.com/Fu-Jie/openwebui-extensions/tree/main/plugins/tools/batch-install-plugins),可以用下面这句来安装或更新当前插件:
```text
从 Fu-Jie/openwebui-extensions 安装插件
```
当选择弹窗打开后,搜索当前插件,勾选后继续安装即可。
> [!IMPORTANT]
> 如果你已经安装了 OpenWebUI 官方社区里的同名版本,请先删除旧版本,否则重新安装时可能报错。删除后,Batch Install Plugins 后续就可以继续负责更新这个插件。
## 1.6.0 版本更新
- **修正 `keep_first` 逻辑**:重新定义了 `keep_first` 的功能,现在它负责保护前 N 条**非系统消息**(以及它们之前的所有系统提示词)。这确保了初始对话背景(如身份设定、任务说明)能被正确保留。
- **系统消息绝对保护**:系统消息现在被严格排除在压缩范围之外。历史记录中遇到的任何系统消息(甚至是后期注入的消息)都会作为原始消息保留在最终上下文中。
- **改进的上下文组装**:摘要现在仅针对用户和助手的对话,确保其他插件注入的系统指令永远不会被摘要器“吃掉”。
## 1.5.0 版本更新
- **外部聊天引用摘要**: 新增对引用聊天上下文的摘要支持。现在可以复用缓存摘要、直接注入较小引用聊天,或先为较大的引用聊天生成摘要再注入。
@@ -39,12 +58,14 @@
- **智能模型匹配**: 自定义模型自动继承基础模型的阈值配置。
- **多模态支持**: 图片内容会被保留,但其 Token **不参与计算**。请相应调整阈值。
详细的工作原理和更长说明仍可参考 [工作流程指南](https://github.com/Fu-Jie/openwebui-extensions/blob/main/plugins/filters/async-context-compression/WORKFLOW_GUIDE_CN.md)。
---
## 这次解决了什么问题(通俗版)
- **问题:系统消息被摘要或丢失。**
以前,过滤器可能会将被引用或后期注入的系统消息包含在摘要区域内,导致重要的指令丢失。现在,所有系统消息都严格按原样保留,永不被摘要。
- **问题:`keep_first` 逻辑不符合预期。**
以前 `keep_first` 只是简单提取前 N 条消息。如果前几条全是系统消息,初始的问答(通常对上下文很重要)就会被压缩掉。现在 `keep_first` 确保保护 N 条非系统消息。
- **问题 1引用别的聊天时摘要失败可能把当前对话一起弄挂。**
以前如果过滤器需要先帮被引用聊天做摘要,而这一步的 LLM 调用失败了,当前请求也可能直接失败。现在改成了“能摘要就摘要,失败就退回直接塞上下文”,当前对话不会被一起拖死。
- **问题 2有些被引用聊天被截得太早信息丢得太多。**
@@ -72,11 +93,11 @@ flowchart TD
F -- 是 --> G[直接复用缓存摘要]
F -- 否 --> H{能直接放进当前预算?}
H -- 是 --> I[直接注入完整引用聊天文本]
-H -- 否 --> J[准备引用聊天的摘要输入]
+H -- No --> J[准备引用聊天的摘要输入]
J --> K{引用聊天摘要调用成功?}
K -- 是 --> L[注入生成后的引用摘要]
-K -- 否 --> M[回退为直接注入上下文]
+K -- No --> M[回退为直接注入上下文]
G --> D
I --> D
@@ -136,7 +157,7 @@ flowchart TD
| `priority` | `10` | 过滤器执行顺序,数值越小越先执行。 |
| `compression_threshold_tokens` | `64000` | **重要**: 当上下文总 Token 超过此值时后台生成摘要,建议设为模型上下文窗口的 50%-70%。 |
| `max_context_tokens` | `128000` | **重要**: 上下文硬上限,超过即移除最早消息(保留受保护消息)。 |
-| `keep_first` | `1` | 始终保留对话开始的 N 条消息,保护系统提示或环境变量。 |
+| `keep_first` | `1` | 始终保留对话开始的 N 条**非系统消息**(以及它们之前的所有系统提示词)。 |
| `keep_last` | `6` | 始终保留对话末尾的 N 条消息,确保最近上下文连贯。 |
### 摘要生成配置


@@ -22,7 +22,7 @@ Filters act as middleware in the message pipeline:
Reduces token consumption in long conversations with safer summary fallbacks and clearer failure visibility.
-**Version:** 1.5.0
+**Version:** 1.6.0
[:octicons-arrow-right-24: Documentation](async-context-compression.md)


@@ -22,7 +22,7 @@ Filter 充当消息管线中的中间件:
通过更稳健的摘要回退和更清晰的失败提示,降低长对话的 token 消耗并保持连贯性。
-**版本:** 1.5.0
+**版本:** 1.6.0
[:octicons-arrow-right-24: 查看文档](async-context-compression.zh.md)