feat(async-context-compression): release v1.4.0 with structure-aware grouping and session locking
- Introduced Atomic Message Grouping to prevent tool-calling corruption (Issue #56)
- Implemented Tail Boundary Alignment for deterministic context truncation
- Added per-chat asynchronous session locking to prevent duplicate background tasks
- Enhanced summarization traceability with message IDs and names
- Synchronized version and changelog across all documentation files
- Optimized release-prep skill to remove redundant H1 titles

Closes #56
124
docs/development/fix-role-tool-error.md
Normal file
@@ -0,0 +1,124 @@
# Fix: OpenAI API Error "messages with role 'tool' must be a response to a preceding message with 'tool_calls'"
## Problem Description

In the `async-context-compression` filter, chat history can be trimmed or summarized when the conversation grows. If the retained tail starts in the middle of a native tool-calling sequence, the next request may begin with a `tool` message whose triggering `assistant` message is no longer present.

That produces the OpenAI API error:

`"messages with role 'tool' must be a response to a preceding message with 'tool_calls'"`
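A minimal sketch of the failure mode (the message contents here are hypothetical):

```python
# A native tool-calling chain in the middle of a hypothetical history.
history = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant", "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_weather", "arguments": "{}"}}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "18 C, sunny"},
    {"role": "assistant", "content": "It's 18 C and sunny in Paris."},
]

# A naive trim that keeps only the last two messages orphans the tool reply:
broken_tail = history[2:]
print(broken_tail[0]["role"])  # → tool  (its triggering tool_calls message is gone)
```

Sending `broken_tail` as the request body is exactly the shape OpenAI rejects with the 400 error above.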
## Root Cause

History compression boundaries were not fully aware of atomic tool-call chains. A valid chain may include:

1. An `assistant` message with `tool_calls`
2. One or more `tool` messages
3. An optional `assistant` follow-up that consumes the tool results

If truncation happens inside that chain, the request sent to the model becomes invalid.
## Solution: Atomic Boundary Alignment

The fix groups tool-call sequences into atomic units and aligns trim boundaries to those groups.

### 1. `_get_atomic_groups()`

This helper groups message indices into units that must be kept or dropped together. It explicitly recognizes native tool-calling patterns such as:

- `assistant(tool_calls)`
- `tool`
- assistant follow-up response

Conceptually, it treats the whole sequence as one atomic block instead of independent messages.
```python
def _get_atomic_groups(self, messages: List[Dict]) -> List[List[int]]:
    groups = []
    current_group = []

    for i, msg in enumerate(messages):
        role = msg.get("role")
        has_tool_calls = bool(msg.get("tool_calls"))

        if role == "assistant" and has_tool_calls:
            # An assistant message carrying tool_calls opens a new atomic group.
            if current_group:
                groups.append(current_group)
            current_group = [i]
        elif role == "tool":
            if not current_group:
                # Orphaned tool message: keep it as its own group.
                groups.append([i])
            else:
                current_group.append(i)
        elif (
            role == "assistant"
            and current_group
            and messages[current_group[-1]].get("role") == "tool"
        ):
            # A plain assistant follow-up after tool results closes the group.
            current_group.append(i)
            groups.append(current_group)
            current_group = []
        else:
            # Any other message stands alone; flush a pending chain first.
            if current_group:
                groups.append(current_group)
                current_group = []
            groups.append([i])

    if current_group:
        groups.append(current_group)

    return groups
```
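For illustration, the grouping logic can be lifted out of the class and run standalone on a toy conversation (a sketch; `msgs` and its contents are hypothetical, and the function body mirrors the method above without `self`):

```python
from typing import Dict, List

def get_atomic_groups(messages: List[Dict]) -> List[List[int]]:
    """Standalone adaptation of _get_atomic_groups() for demonstration."""
    groups, current_group = [], []
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role == "assistant" and msg.get("tool_calls"):
            if current_group:
                groups.append(current_group)
            current_group = [i]                 # tool_calls opens a group
        elif role == "tool":
            if not current_group:
                groups.append([i])              # orphan tool reply: own group
            else:
                current_group.append(i)
        elif role == "assistant" and current_group and \
                messages[current_group[-1]].get("role") == "tool":
            current_group.append(i)             # follow-up closes the chain
            groups.append(current_group)
            current_group = []
        else:
            if current_group:
                groups.append(current_group)
                current_group = []
            groups.append([i])
    if current_group:
        groups.append(current_group)
    return groups

msgs = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "tool_calls": [{"id": "c1"}]},
    {"role": "tool", "tool_call_id": "c1", "content": "result"},
    {"role": "assistant", "content": "done"},
    {"role": "user", "content": "thanks"},
]
print(get_atomic_groups(msgs))  # → [[0], [1, 2, 3], [4]]
```

The three-message chain at indices 1–3 comes back as a single group, so a trim boundary can never split it.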
### 2. `_align_tail_start_to_atomic_boundary()`

This helper checks whether a proposed trim point falls inside one of those atomic groups. If it does, the start index is moved backward to the beginning of that group.
```python
def _align_tail_start_to_atomic_boundary(
    self, messages: List[Dict], raw_start_index: int, protected_prefix: int
) -> int:
    aligned_start = max(raw_start_index, protected_prefix)

    # Nothing to align if the start sits at the protected prefix or past the end.
    if aligned_start <= protected_prefix or aligned_start >= len(messages):
        return aligned_start

    trimmable = messages[protected_prefix:]
    local_start = aligned_start - protected_prefix

    for group in self._get_atomic_groups(trimmable):
        group_start = group[0]
        group_end = group[-1] + 1

        # Already on a group boundary: keep it.
        if local_start == group_start:
            return aligned_start

        # Inside a group: pull the start back to the group's beginning.
        if group_start < local_start < group_end:
            return protected_prefix + group_start

    return aligned_start
```
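The backward move can be exercised standalone. The sketch below re-states both helpers as plain functions (hypothetical names, no `self`; the grouping rules are a condensed restatement of the method above) and shows a raw start index inside a chain being pulled back to the chain's start:

```python
from typing import Dict, List

def get_atomic_groups(messages: List[Dict]) -> List[List[int]]:
    # Condensed restatement: assistant(tool_calls) opens a group, tool replies
    # extend it, a plain assistant after a tool reply closes it; all else is
    # a singleton group.
    groups, cur = [], []
    for i, m in enumerate(messages):
        role = m.get("role")
        if role == "assistant" and m.get("tool_calls"):
            if cur:
                groups.append(cur)
            cur = [i]
        elif role == "tool":
            cur.append(i) if cur else groups.append([i])
        elif role == "assistant" and cur and messages[cur[-1]].get("role") == "tool":
            cur.append(i)
            groups.append(cur)
            cur = []
        else:
            if cur:
                groups.append(cur)
                cur = []
            groups.append([i])
    if cur:
        groups.append(cur)
    return groups

def align_tail_start(messages, raw_start_index, protected_prefix=0):
    aligned = max(raw_start_index, protected_prefix)
    if aligned <= protected_prefix or aligned >= len(messages):
        return aligned
    local = aligned - protected_prefix
    for group in get_atomic_groups(messages[protected_prefix:]):
        if local == group[0]:
            return aligned                       # already on a boundary
        if group[0] < local < group[-1] + 1:
            return protected_prefix + group[0]   # pull back to the group start
    return aligned

msgs = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "tool_calls": [{"id": "c1"}]},
    {"role": "tool", "tool_call_id": "c1", "content": "result"},
    {"role": "assistant", "content": "done"},
]
# Raw start index 2 would begin the tail at the orphaned tool reply;
# alignment moves it back to index 1, the start of the atomic group.
print(align_tail_start(msgs, 2))  # → 1
```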
### 3. Applied to Tail Retention and Summary Progress

The aligned boundary is now used when rebuilding the retained tail and when calculating how much history can be summarized safely.

Example from the current implementation:
```python
raw_start_index = max(compressed_count, effective_keep_first)
start_index = self._align_tail_start_to_atomic_boundary(
    messages, raw_start_index, effective_keep_first
)
tail_messages = messages[start_index:]
```

And during summary progress calculation:
```python
raw_target_compressed_count = max(0, len(messages) - self.valves.keep_last)
target_compressed_count = self._align_tail_start_to_atomic_boundary(
    messages, raw_target_compressed_count, effective_keep_first
)
```
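To see what these call sites buy, a standalone validity check (a hypothetical helper, not part of the filter) can compare a raw slice against an aligned one; the aligned index below is hardcoded to the value the alignment helper would return for this history:

```python
def tail_is_valid(messages, start_index):
    """A retained tail must not open with an orphaned `tool` reply."""
    tail = messages[start_index:]
    return not (tail and tail[0].get("role") == "tool")

msgs = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "tool_calls": [{"id": "c1"}]},
    {"role": "tool", "tool_call_id": "c1", "content": "result"},
    {"role": "assistant", "content": "done"},
]

raw_start = 2      # a naive slice lands on the tool reply
aligned_start = 1  # what boundary alignment would return for this history

print(tail_is_valid(msgs, raw_start))      # → False
print(tail_is_valid(msgs, aligned_start))  # → True
```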
## Verification Results

- **First compression boundary**: When history first crosses the compression threshold, the retained tail no longer starts inside a tool-call block.
- **Complex sessions**: Real-world testing with 30+ messages, multiple tool calls, and failed calls remained stable during background summarization.
- **Regression behavior**: The filter now prefers a valid boundary even if that means retaining slightly more context than a naive raw slice would allow.
## Conclusion

The fix prevents orphaned `tool` messages by making history trimming and summary progress aware of atomic tool-call groups. This eliminates the 400 error during long conversations and background compression.
126
docs/development/fix-role-tool-error.zh.md
Normal file
@@ -0,0 +1,126 @@
# Fix: OpenAI API Error "messages with role 'tool' must be a response to a preceding message with 'tool_calls'"
## Problem Description

In the `async-context-compression` filter, messages are trimmed or summarized as the conversation history grows. If the retained tail happens to start in the middle of a native tool-calling sequence, the next request may begin with a `tool` message whose triggering `assistant` message has already been trimmed away.

This triggers the OpenAI API error:

`"messages with role 'tool' must be a response to a preceding message with 'tool_calls'"`
## Root Cause

The real defect is that history compression boundaries did not fully recognize the atomicity of tool-call chains. A valid chain typically includes:

1. An `assistant` message carrying `tool_calls`
2. One or more `tool` messages
3. An optional `assistant` follow-up reply that consumes the tool results

If the trim point falls inside that chain, the message sequence sent to the model becomes invalid.
## Solution: Atomic Boundary Alignment

The fix groups tool-call sequences into atomic units and aligns trim boundaries to those units.

### 1. `_get_atomic_groups()`

This helper groups message indices into atomic units that must be kept or dropped together. It explicitly recognizes the following native tool-calling pattern:

- `assistant(tool_calls)`
- `tool`
- assistant follow-up reply

In other words, instead of treating these as independent single messages, it treats the whole sequence as one atomic block.
```python
def _get_atomic_groups(self, messages: List[Dict]) -> List[List[int]]:
    groups = []
    current_group = []

    for i, msg in enumerate(messages):
        role = msg.get("role")
        has_tool_calls = bool(msg.get("tool_calls"))

        if role == "assistant" and has_tool_calls:
            # An assistant message carrying tool_calls opens a new atomic group.
            if current_group:
                groups.append(current_group)
            current_group = [i]
        elif role == "tool":
            if not current_group:
                # Orphaned tool message: keep it as its own group.
                groups.append([i])
            else:
                current_group.append(i)
        elif (
            role == "assistant"
            and current_group
            and messages[current_group[-1]].get("role") == "tool"
        ):
            # A plain assistant follow-up after tool results closes the group.
            current_group.append(i)
            groups.append(current_group)
            current_group = []
        else:
            # Any other message stands alone; flush a pending chain first.
            if current_group:
                groups.append(current_group)
                current_group = []
            groups.append([i])

    if current_group:
        groups.append(current_group)

    return groups
```
### 2. `_align_tail_start_to_atomic_boundary()`

This helper checks whether a proposed trim start falls inside one of those atomic blocks. If it does, the start index is moved backward to the beginning of that block.
```python
def _align_tail_start_to_atomic_boundary(
    self, messages: List[Dict], raw_start_index: int, protected_prefix: int
) -> int:
    aligned_start = max(raw_start_index, protected_prefix)

    # Nothing to align if the start sits at the protected prefix or past the end.
    if aligned_start <= protected_prefix or aligned_start >= len(messages):
        return aligned_start

    trimmable = messages[protected_prefix:]
    local_start = aligned_start - protected_prefix

    for group in self._get_atomic_groups(trimmable):
        group_start = group[0]
        group_end = group[-1] + 1

        # Already on a group boundary: keep it.
        if local_start == group_start:
            return aligned_start

        # Inside a group: pull the start back to the group's beginning.
        if group_start < local_start < group_end:
            return protected_prefix + group_start

    return aligned_start
```
### 3. Applied to Tail Retention and Summary Progress

The aligned boundary is now used when rebuilding the retained tail messages and when calculating how much history can be summarized safely.

Example from the current implementation:
```python
raw_start_index = max(compressed_count, effective_keep_first)
start_index = self._align_tail_start_to_atomic_boundary(
    messages, raw_start_index, effective_keep_first
)
tail_messages = messages[start_index:]
```

The same applies during summary progress calculation:
```python
raw_target_compressed_count = max(0, len(messages) - self.valves.keep_last)
target_compressed_count = self._align_tail_start_to_atomic_boundary(
    messages, raw_target_compressed_count, effective_keep_first
)
```
## Verification Results

- **First compression boundary**: When history first crosses the compression threshold, the retained tail no longer starts in the middle of a tool-call block.
- **Complex sessions**: In real-world scenarios with 30+ messages, multiple tool calls, and failed calls, background summarization remained stable.
- **Safer regression behavior**: The filter now prefers a valid boundary, even if that means retaining slightly more context than a naive raw slice would.
## Conclusion

By making history trimming and summary progress calculation aware of atomic tool-call groups, the fix prevents orphaned `tool` messages and eliminates the 400 error during long conversations and background compression.
@@ -1,16 +1,15 @@
# Async Context Compression Filter

**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.3.0 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT
**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.4.0 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT

This filter reduces token consumption in long conversations through intelligent summarization and message compression while keeping conversations coherent.

## What's new in 1.3.0
## What's new in 1.4.0

- **Internationalization (i18n)**: Complete localization of user-facing messages across 9 languages (including English, Chinese, Japanese, Korean, French, German, Spanish, and Italian).
- **Smart Status Display**: Added a `token_usage_status_threshold` valve (default 80%) to intelligently control when token usage status is shown.
- **Improved Performance**: Frontend language detection and logging are optimized to be completely non-blocking, maintaining lightning-fast TTFB.
- **Copilot SDK Integration**: Automatically detects and skips compression for `copilot_sdk`-based models to prevent conflicts.
- **Configuration**: `debug_mode` now defaults to `false` for a quieter production experience.
- **Atomic Message Grouping**: Introduced structure-aware grouping for `assistant-tool-tool-assistant` chains to prevent "No tool call found" errors.
- **Tail Boundary Alignment**: Implemented automatic correction of truncation points to ensure they don't fall inside a tool-calling sequence.
- **Chat Session Locking**: Added a session-based lock to prevent multiple concurrent summary tasks for the same chat ID.
- **Enhanced Traceability**: Improved summary formatting to include message IDs, names, and metadata for better context tracking.
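The session-locking item can be sketched as a per-`chat_id` `asyncio.Lock` that skips work when a summary task is already in flight (function and variable names here are hypothetical; the filter's actual implementation may differ):

```python
import asyncio
from typing import Dict

# One lock per chat_id, so only one background summary task runs per chat.
_chat_locks: Dict[str, asyncio.Lock] = {}

def _get_chat_lock(chat_id: str) -> asyncio.Lock:
    return _chat_locks.setdefault(chat_id, asyncio.Lock())

async def summarize_in_background(chat_id: str) -> str:
    lock = _get_chat_lock(chat_id)
    if lock.locked():
        return "skipped"           # a summary task is already running for this chat
    async with lock:
        await asyncio.sleep(0)     # placeholder for the real summarization work
        return "summarized"

async def main():
    # Two concurrent triggers for the same chat: only one should do the work.
    results = await asyncio.gather(
        summarize_in_background("chat-1"),
        summarize_in_background("chat-1"),
    )
    print(sorted(results))  # → ['skipped', 'summarized']

asyncio.run(main())
```

Skipping (rather than queueing behind the lock) matches the changelog's goal of preventing duplicate background tasks instead of serializing them.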
---
@@ -1,18 +1,17 @@
# Async Context Compression Filter

**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.3.0 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT
**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.4.0 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT

> **Important note**: To keep every filter maintainable and easy to use, each filter should ship with clear, complete documentation that fully covers its features, configuration, and usage.

Through intelligent summarization and message compression, this filter significantly reduces token consumption in long conversations while keeping them coherent.

## What's new in 1.3.0
## What's new in 1.4.0

- **Internationalization (i18n)**: Completed localization of all user-facing messages, now natively supporting 9 languages (including Chinese, English, Japanese, Korean, and major European languages).
- **Smart Status Display**: Added a `token_usage_status_threshold` valve (default 80%) to intelligently control when token usage status is shown, reducing unnecessary noise.
- **Major Performance Optimization**: Frontend language detection and log handling were refactored to be fully non-blocking, leaving time-to-first-byte (TTFB) untouched for millisecond-level streaming.
- **Copilot SDK Compatibility**: Automatically detects and skips context compression for `copilot_sdk`-based models to avoid conflicts.
- **Configuration Change**: `debug_mode` now defaults to `false` for a quieter production experience.
- **Atomic Message Grouping**: Introduced structure-aware message grouping so tool-call chains are kept or removed as a whole, fully resolving "No tool call found" errors.
- **Automatic Tail Boundary Alignment**: Implemented automatic correction of truncation points so history truncation never lands in the middle of a tool-calling sequence.
- **Session-Level Async Lock**: Added a `chat_id`-based background task lock to prevent concurrent summary tasks within the same conversation.
- **Enhanced Metadata Traceability**: Improved the summarization input format to preserve message IDs, participant names, and key metadata, improving context traceability.

---
@@ -22,7 +22,7 @@ Filters act as middleware in the message pipeline:

Reduces token consumption in long conversations through intelligent summarization while maintaining coherence.

**Version:** 1.3.0
**Version:** 1.4.0

[:octicons-arrow-right-24: Documentation](async-context-compression.md)
@@ -22,7 +22,7 @@ Filters act as middleware in the message pipeline:

Reduces token consumption in long conversations through intelligent summarization while maintaining coherence.

**Version:** 1.3.0
**Version:** 1.4.0

[:octicons-arrow-right-24: View documentation](async-context-compression.md)