feat(async-context-compression): release v1.4.0 with structure-aware grouping and session locking
- Introduced Atomic Message Grouping to prevent tool-calling corruption (Issue #56)
- Implemented Tail Boundary Alignment for deterministic context truncation
- Added per-chat asynchronous session locking to prevent duplicate background tasks
- Enhanced summarization traceability with message IDs and names
- Synchronized version and changelog across all documentation files
- Optimized release-prep skill to remove redundant H1 titles

Closes #56
124
docs/development/fix-role-tool-error.md
Normal file
@@ -0,0 +1,124 @@
# Fix: OpenAI API Error "messages with role 'tool' must be a response to a preceding message with 'tool_calls'"
## Problem Description

In the `async-context-compression` filter, chat history can be trimmed or summarized when the conversation grows. If the retained tail starts in the middle of a native tool-calling sequence, the next request may begin with a `tool` message whose triggering `assistant` message is no longer present.

That produces the OpenAI API error:

`"messages with role 'tool' must be a response to a preceding message with 'tool_calls'"`
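A minimal sketch of the failure mode (the message contents here are hypothetical):

```python
# A native tool-calling chain in the middle of a hypothetical history.
history = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant", "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_weather", "arguments": "{}"}}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "18 C, sunny"},
    {"role": "assistant", "content": "It's 18 C and sunny in Paris."},
]

# A naive trim that keeps only the last two messages orphans the tool reply:
broken_tail = history[2:]
print(broken_tail[0]["role"])  # → tool  (its triggering tool_calls message is gone)
```

Sending `broken_tail` as the request body is exactly the shape OpenAI rejects with the 400 error above.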
## Root Cause

History compression boundaries were not fully aware of atomic tool-call chains. A valid chain may include:

1. An `assistant` message with `tool_calls`
2. One or more `tool` messages
3. An optional `assistant` follow-up that consumes the tool results

If truncation happens inside that chain, the request sent to the model becomes invalid.
## Solution: Atomic Boundary Alignment

The fix groups tool-call sequences into atomic units and aligns trim boundaries to those groups.

### 1. `_get_atomic_groups()`

This helper groups message indices into units that must be kept or dropped together. It explicitly recognizes native tool-calling patterns such as:

- `assistant(tool_calls)`
- `tool`
- assistant follow-up response

Conceptually, it treats the whole sequence as one atomic block instead of independent messages.
```python
def _get_atomic_groups(self, messages: List[Dict]) -> List[List[int]]:
    groups = []
    current_group = []

    for i, msg in enumerate(messages):
        role = msg.get("role")
        has_tool_calls = bool(msg.get("tool_calls"))

        if role == "assistant" and has_tool_calls:
            # An assistant message carrying tool_calls opens a new atomic group.
            if current_group:
                groups.append(current_group)
            current_group = [i]
        elif role == "tool":
            if not current_group:
                # Orphaned tool message: keep it as its own group.
                groups.append([i])
            else:
                current_group.append(i)
        elif (
            role == "assistant"
            and current_group
            and messages[current_group[-1]].get("role") == "tool"
        ):
            # A plain assistant follow-up after tool results closes the group.
            current_group.append(i)
            groups.append(current_group)
            current_group = []
        else:
            # Any other message stands alone; flush a pending chain first.
            if current_group:
                groups.append(current_group)
                current_group = []
            groups.append([i])

    if current_group:
        groups.append(current_group)

    return groups
```
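For illustration, the grouping logic can be lifted out of the class and run standalone on a toy conversation (a sketch; `msgs` and its contents are hypothetical, and the function body mirrors the method above without `self`):

```python
from typing import Dict, List

def get_atomic_groups(messages: List[Dict]) -> List[List[int]]:
    """Standalone adaptation of _get_atomic_groups() for demonstration."""
    groups, current_group = [], []
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role == "assistant" and msg.get("tool_calls"):
            if current_group:
                groups.append(current_group)
            current_group = [i]                 # tool_calls opens a group
        elif role == "tool":
            if not current_group:
                groups.append([i])              # orphan tool reply: own group
            else:
                current_group.append(i)
        elif role == "assistant" and current_group and \
                messages[current_group[-1]].get("role") == "tool":
            current_group.append(i)             # follow-up closes the chain
            groups.append(current_group)
            current_group = []
        else:
            if current_group:
                groups.append(current_group)
                current_group = []
            groups.append([i])
    if current_group:
        groups.append(current_group)
    return groups

msgs = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "tool_calls": [{"id": "c1"}]},
    {"role": "tool", "tool_call_id": "c1", "content": "result"},
    {"role": "assistant", "content": "done"},
    {"role": "user", "content": "thanks"},
]
print(get_atomic_groups(msgs))  # → [[0], [1, 2, 3], [4]]
```

The three-message chain at indices 1–3 comes back as a single group, so a trim boundary can never split it.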
### 2. `_align_tail_start_to_atomic_boundary()`

This helper checks whether a proposed trim point falls inside one of those atomic groups. If it does, the start index is moved backward to the beginning of that group.
```python
def _align_tail_start_to_atomic_boundary(
    self, messages: List[Dict], raw_start_index: int, protected_prefix: int
) -> int:
    aligned_start = max(raw_start_index, protected_prefix)

    # Nothing to align if the start sits at the protected prefix or past the end.
    if aligned_start <= protected_prefix or aligned_start >= len(messages):
        return aligned_start

    trimmable = messages[protected_prefix:]
    local_start = aligned_start - protected_prefix

    for group in self._get_atomic_groups(trimmable):
        group_start = group[0]
        group_end = group[-1] + 1

        # Already on a group boundary: keep it.
        if local_start == group_start:
            return aligned_start

        # Inside a group: pull the start back to the group's beginning.
        if group_start < local_start < group_end:
            return protected_prefix + group_start

    return aligned_start
```
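The backward move can be exercised standalone. The sketch below re-states both helpers as plain functions (hypothetical names, no `self`; the grouping rules are a condensed restatement of the method above) and shows a raw start index inside a chain being pulled back to the chain's start:

```python
from typing import Dict, List

def get_atomic_groups(messages: List[Dict]) -> List[List[int]]:
    # Condensed restatement: assistant(tool_calls) opens a group, tool replies
    # extend it, a plain assistant after a tool reply closes it; all else is
    # a singleton group.
    groups, cur = [], []
    for i, m in enumerate(messages):
        role = m.get("role")
        if role == "assistant" and m.get("tool_calls"):
            if cur:
                groups.append(cur)
            cur = [i]
        elif role == "tool":
            cur.append(i) if cur else groups.append([i])
        elif role == "assistant" and cur and messages[cur[-1]].get("role") == "tool":
            cur.append(i)
            groups.append(cur)
            cur = []
        else:
            if cur:
                groups.append(cur)
                cur = []
            groups.append([i])
    if cur:
        groups.append(cur)
    return groups

def align_tail_start(messages, raw_start_index, protected_prefix=0):
    aligned = max(raw_start_index, protected_prefix)
    if aligned <= protected_prefix or aligned >= len(messages):
        return aligned
    local = aligned - protected_prefix
    for group in get_atomic_groups(messages[protected_prefix:]):
        if local == group[0]:
            return aligned                       # already on a boundary
        if group[0] < local < group[-1] + 1:
            return protected_prefix + group[0]   # pull back to the group start
    return aligned

msgs = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "tool_calls": [{"id": "c1"}]},
    {"role": "tool", "tool_call_id": "c1", "content": "result"},
    {"role": "assistant", "content": "done"},
]
# Raw start index 2 would begin the tail at the orphaned tool reply;
# alignment moves it back to index 1, the start of the atomic group.
print(align_tail_start(msgs, 2))  # → 1
```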
### 3. Applied to Tail Retention and Summary Progress

The aligned boundary is now used when rebuilding the retained tail and when calculating how much history can be summarized safely.

Example from the current implementation:
```python
raw_start_index = max(compressed_count, effective_keep_first)
start_index = self._align_tail_start_to_atomic_boundary(
    messages, raw_start_index, effective_keep_first
)
tail_messages = messages[start_index:]
```

And during summary progress calculation:
```python
raw_target_compressed_count = max(0, len(messages) - self.valves.keep_last)
target_compressed_count = self._align_tail_start_to_atomic_boundary(
    messages, raw_target_compressed_count, effective_keep_first
)
```
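To see what these call sites buy, a standalone validity check (a hypothetical helper, not part of the filter) can compare a raw slice against an aligned one; the aligned index below is hardcoded to the value the alignment helper would return for this history:

```python
def tail_is_valid(messages, start_index):
    """A retained tail must not open with an orphaned `tool` reply."""
    tail = messages[start_index:]
    return not (tail and tail[0].get("role") == "tool")

msgs = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "tool_calls": [{"id": "c1"}]},
    {"role": "tool", "tool_call_id": "c1", "content": "result"},
    {"role": "assistant", "content": "done"},
]

raw_start = 2      # a naive slice lands on the tool reply
aligned_start = 1  # what boundary alignment would return for this history

print(tail_is_valid(msgs, raw_start))      # → False
print(tail_is_valid(msgs, aligned_start))  # → True
```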
## Verification Results

- **First compression boundary**: When history first crosses the compression threshold, the retained tail no longer starts inside a tool-call block.
- **Complex sessions**: Real-world testing with 30+ messages, multiple tool calls, and failed calls remained stable during background summarization.
- **Regression behavior**: The filter now prefers a valid boundary even if that means retaining slightly more context than a naive raw slice would allow.
## Conclusion

The fix prevents orphaned `tool` messages by making history trimming and summary progress aware of atomic tool-call groups. This eliminates the 400 error during long conversations and background compression.
126
docs/development/fix-role-tool-error.zh.md
Normal file
@@ -0,0 +1,126 @@
# Fix: OpenAI API Error "messages with role 'tool' must be a response to a preceding message with 'tool_calls'"
## Problem Description

In the `async-context-compression` filter, messages are trimmed or summarized as the conversation history grows. If the retained tail happens to start in the middle of a native tool-calling sequence, the next request may begin with a `tool` message whose triggering `assistant` message has already been trimmed away.

This triggers the OpenAI API error:

`"messages with role 'tool' must be a response to a preceding message with 'tool_calls'"`
## Root Cause

The real defect is that history compression boundaries did not fully recognize the atomicity of tool-call chains. A valid chain typically includes:

1. An `assistant` message carrying `tool_calls`
2. One or more `tool` messages
3. An optional `assistant` follow-up reply that consumes the tool results

If the trim point falls inside that chain, the message sequence sent to the model becomes invalid.
## Solution: Atomic Boundary Alignment

The fix groups tool-call sequences into atomic units and aligns trim boundaries to those units.

### 1. `_get_atomic_groups()`

This helper groups message indices into atomic units that must be kept or dropped together. It explicitly recognizes the following native tool-calling pattern:

- `assistant(tool_calls)`
- `tool`
- assistant follow-up reply

In other words, instead of treating these as independent single messages, it treats the whole sequence as one atomic block.
```python
def _get_atomic_groups(self, messages: List[Dict]) -> List[List[int]]:
    groups = []
    current_group = []

    for i, msg in enumerate(messages):
        role = msg.get("role")
        has_tool_calls = bool(msg.get("tool_calls"))

        if role == "assistant" and has_tool_calls:
            # An assistant message carrying tool_calls opens a new atomic group.
            if current_group:
                groups.append(current_group)
            current_group = [i]
        elif role == "tool":
            if not current_group:
                # Orphaned tool message: keep it as its own group.
                groups.append([i])
            else:
                current_group.append(i)
        elif (
            role == "assistant"
            and current_group
            and messages[current_group[-1]].get("role") == "tool"
        ):
            # A plain assistant follow-up after tool results closes the group.
            current_group.append(i)
            groups.append(current_group)
            current_group = []
        else:
            # Any other message stands alone; flush a pending chain first.
            if current_group:
                groups.append(current_group)
                current_group = []
            groups.append([i])

    if current_group:
        groups.append(current_group)

    return groups
```
### 2. `_align_tail_start_to_atomic_boundary()`

This helper checks whether a proposed trim start falls inside one of those atomic blocks. If it does, the start index is moved backward to the beginning of that block.
```python
def _align_tail_start_to_atomic_boundary(
    self, messages: List[Dict], raw_start_index: int, protected_prefix: int
) -> int:
    aligned_start = max(raw_start_index, protected_prefix)

    # Nothing to align if the start sits at the protected prefix or past the end.
    if aligned_start <= protected_prefix or aligned_start >= len(messages):
        return aligned_start

    trimmable = messages[protected_prefix:]
    local_start = aligned_start - protected_prefix

    for group in self._get_atomic_groups(trimmable):
        group_start = group[0]
        group_end = group[-1] + 1

        # Already on a group boundary: keep it.
        if local_start == group_start:
            return aligned_start

        # Inside a group: pull the start back to the group's beginning.
        if group_start < local_start < group_end:
            return protected_prefix + group_start

    return aligned_start
```
### 3. Applied to Tail Retention and Summary Progress

The aligned boundary is now used when rebuilding the retained tail messages and when calculating how much history can be summarized safely.

Example from the current implementation:
```python
raw_start_index = max(compressed_count, effective_keep_first)
start_index = self._align_tail_start_to_atomic_boundary(
    messages, raw_start_index, effective_keep_first
)
tail_messages = messages[start_index:]
```

The same applies during summary progress calculation:
```python
raw_target_compressed_count = max(0, len(messages) - self.valves.keep_last)
target_compressed_count = self._align_tail_start_to_atomic_boundary(
    messages, raw_target_compressed_count, effective_keep_first
)
```
## Verification Results

- **First compression boundary**: When history first crosses the compression threshold, the retained tail no longer starts in the middle of a tool-call block.
- **Complex sessions**: In real-world scenarios with 30+ messages, multiple tool calls, and failed calls, background summarization remained stable.
- **Safer regression behavior**: The filter now prefers a valid boundary, even if that means retaining slightly more context than a naive raw slice would.
## Conclusion

By making history trimming and summary progress calculation aware of atomic tool-call groups, the fix prevents orphaned `tool` messages and eliminates the 400 error during long conversations and background compression.
@@ -1,16 +1,15 @@
# Async Context Compression Filter

**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.3.0 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT
**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.4.0 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT

This filter reduces token consumption in long conversations through intelligent summarization and message compression while keeping conversations coherent.

## What's new in 1.3.0
## What's new in 1.4.0

- **Internationalization (i18n)**: Complete localization of user-facing messages across 9 languages (including English, Chinese, Japanese, Korean, French, German, Spanish, and Italian).
- **Smart Status Display**: Added a `token_usage_status_threshold` valve (default 80%) to intelligently control when token usage status is shown.
- **Improved Performance**: Frontend language detection and logging are optimized to be completely non-blocking, maintaining lightning-fast TTFB.
- **Copilot SDK Integration**: Automatically detects and skips compression for `copilot_sdk`-based models to prevent conflicts.
- **Configuration**: `debug_mode` now defaults to `false` for a quieter production experience.
- **Atomic Message Grouping**: Introduced structure-aware grouping for `assistant-tool-tool-assistant` chains to prevent "No tool call found" errors.
- **Tail Boundary Alignment**: Implemented automatic correction of truncation points to ensure they don't fall inside a tool-calling sequence.
- **Chat Session Locking**: Added a session-based lock to prevent multiple concurrent summary tasks for the same chat ID.
- **Enhanced Traceability**: Improved summary formatting to include message IDs, names, and metadata for better context tracking.
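The session-locking item can be sketched as a per-`chat_id` `asyncio.Lock` that skips work when a summary task is already in flight (function and variable names here are hypothetical; the filter's actual implementation may differ):

```python
import asyncio
from typing import Dict

# One lock per chat_id, so only one background summary task runs per chat.
_chat_locks: Dict[str, asyncio.Lock] = {}

def _get_chat_lock(chat_id: str) -> asyncio.Lock:
    return _chat_locks.setdefault(chat_id, asyncio.Lock())

async def summarize_in_background(chat_id: str) -> str:
    lock = _get_chat_lock(chat_id)
    if lock.locked():
        return "skipped"           # a summary task is already running for this chat
    async with lock:
        await asyncio.sleep(0)     # placeholder for the real summarization work
        return "summarized"

async def main():
    # Two concurrent triggers for the same chat: only one should do the work.
    results = await asyncio.gather(
        summarize_in_background("chat-1"),
        summarize_in_background("chat-1"),
    )
    print(sorted(results))  # → ['skipped', 'summarized']

asyncio.run(main())
```

Skipping (rather than queueing behind the lock) matches the changelog's goal of preventing duplicate background tasks instead of serializing them.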
---
@@ -1,18 +1,17 @@
# Async Context Compression Filter

**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.3.0 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT
**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.4.0 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT

> **Important note**: To keep every filter maintainable and easy to use, each filter should ship with clear, complete documentation that fully covers its features, configuration, and usage.

Through intelligent summarization and message compression, this filter significantly reduces token consumption in long conversations while keeping them coherent.

## What's new in 1.3.0
## What's new in 1.4.0

- **Internationalization (i18n)**: Completed localization of all user-facing messages, now natively supporting 9 languages (including Chinese, English, Japanese, Korean, and major European languages).
- **Smart Status Display**: Added a `token_usage_status_threshold` valve (default 80%) to intelligently control when token usage status is shown, reducing unnecessary noise.
- **Major Performance Optimization**: Frontend language detection and log handling were refactored to be fully non-blocking, leaving time-to-first-byte (TTFB) untouched for millisecond-level streaming.
- **Copilot SDK Compatibility**: Automatically detects and skips context compression for `copilot_sdk`-based models to avoid conflicts.
- **Configuration Change**: `debug_mode` now defaults to `false` for a quieter production experience.
- **Atomic Message Grouping**: Introduced structure-aware message grouping so tool-call chains are kept or removed as a whole, fully resolving "No tool call found" errors.
- **Automatic Tail Boundary Alignment**: Implemented automatic correction of truncation points so history truncation never lands in the middle of a tool-calling sequence.
- **Session-Level Async Lock**: Added a `chat_id`-based background task lock to prevent concurrent summary tasks within the same conversation.
- **Enhanced Metadata Traceability**: Improved the summarization input format to preserve message IDs, participant names, and key metadata, improving context traceability.

---
@@ -22,7 +22,7 @@ Filters act as middleware in the message pipeline:

Reduces token consumption in long conversations through intelligent summarization while maintaining coherence.

**Version:** 1.3.0
**Version:** 1.4.0

[:octicons-arrow-right-24: Documentation](async-context-compression.md)
@@ -22,7 +22,7 @@ Filters act as middleware in the message pipeline:

Reduces token consumption in long conversations through intelligent summarization while maintaining coherence.

**Version:** 1.3.0
**Version:** 1.4.0

[:octicons-arrow-right-24: View documentation](async-context-compression.md)