fix(async-context-compression): reverse-unfolding to prevent progress drift
- Reconstruct native tool-calling sequences using reverse-unfolding mechanism - Strictly use atomic grouping for safe native tool output trimming - Add comprehensive test coverage for unfolding logic and issue drafts - READMEs and docs synced (v1.4.1)
This commit is contained in:
@@ -1,15 +1,13 @@
|
||||
# Async Context Compression Filter
|
||||
|
||||
**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.4.0 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT
|
||||
**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.4.1 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT
|
||||
|
||||
This filter reduces token consumption in long conversations through intelligent summarization and message compression while keeping conversations coherent.
|
||||
|
||||
## What's new in 1.4.0
|
||||
## What's new in 1.4.1
|
||||
|
||||
- **Atomic Message Grouping**: Introduced structure-aware grouping for `assistant-tool-tool-assistant` chains to prevent "No tool call found" errors.
|
||||
- **Tail Boundary Alignment**: Implemented automatic correction for truncation points to ensure they don't fall inside a tool-calling sequence.
|
||||
- **Chat Session Locking**: Added a session-based lock to prevent multiple concurrent summary tasks for the same chat ID.
|
||||
- **Enhanced Traceability**: Improved summary formatting to include message IDs, names, and metadata for better context tracking.
|
||||
- **Reverse-Unfolding Mechanism**: Accurately reconstructs the expanded native tool-calling sequence during the outlet phase to permanently fix coordinate drift and missing summaries for long tool-based conversations.
|
||||
- **Safer Tool Trimming**: Refactored `enable_tool_output_trimming` to strictly use atomic block groups for safe trimming, completely preventing JSON payload corruption.
|
||||
|
||||
---
|
||||
|
||||
|
||||
@@ -1,17 +1,15 @@
|
||||
# 异步上下文压缩过滤器
|
||||
|
||||
**作者:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **版本:** 1.4.0 | **项目:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **许可证:** MIT
|
||||
**作者:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **版本:** 1.4.1 | **项目:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **许可证:** MIT
|
||||
|
||||
> **重要提示**:为了确保所有过滤器的可维护性和易用性,每个过滤器都应附带清晰、完整的文档,以确保其功能、配置和使用方法得到充分说明。
|
||||
|
||||
本过滤器通过智能摘要和消息压缩技术,在保持对话连贯性的同时,显著降低长对话的 Token 消耗。
|
||||
|
||||
## 1.4.0 版本更新
|
||||
## 1.4.1 版本更新
|
||||
|
||||
- **原子消息组 (Atomic Grouping)**: 引入结构感知的消息分组逻辑,确保工具调用链被整体保留或移除,彻底解决 "No tool call found" 错误。
|
||||
- **尾部边界自动对齐**: 实现了截断点的自动修正逻辑,确保历史上下文截断不会落在工具调用序列中间。
|
||||
- **会话级异步锁**: 增加了基于 `chat_id` 的后台任务锁,防止同一会话并发触发多个总结任务。
|
||||
- **元数据溯源增强**: 优化了总结输入格式,在总结中保留了消息 ID、参与者名称及关键元数据,提升上下文可追踪性。
|
||||
- **逆向展开机制**: 引入 `_unfold_messages` 机制以在 `outlet` 阶段精确对齐坐标系,彻底解决了由于前端视图折叠导致长轮次工具调用对话出现进度漂移或跳过生成摘要的问题。
|
||||
- **更安全的工具内容裁剪**: 重构了 `enable_tool_output_trimming`,现在严格使用原子级分组进行安全的原生工具内容裁剪,替代了激进的正则表达式匹配,防止 JSON 载荷损坏。
|
||||
|
||||
---
|
||||
|
||||
|
||||
File diff suppressed because it is too large
Load Diff
@@ -0,0 +1,461 @@
|
||||
import asyncio
|
||||
import importlib.util
|
||||
import os
|
||||
import sys
|
||||
import types
|
||||
import unittest
|
||||
|
||||
|
||||
PLUGIN_PATH = os.path.join(os.path.dirname(__file__), "async_context_compression.py")
|
||||
MODULE_NAME = "async_context_compression_under_test"
|
||||
|
||||
|
||||
def _ensure_module(name: str) -> types.ModuleType:
|
||||
module = sys.modules.get(name)
|
||||
if module is None:
|
||||
module = types.ModuleType(name)
|
||||
sys.modules[name] = module
|
||||
return module
|
||||
|
||||
|
||||
def _install_openwebui_stubs() -> None:
|
||||
_ensure_module("open_webui")
|
||||
_ensure_module("open_webui.utils")
|
||||
chat_module = _ensure_module("open_webui.utils.chat")
|
||||
_ensure_module("open_webui.models")
|
||||
users_module = _ensure_module("open_webui.models.users")
|
||||
models_module = _ensure_module("open_webui.models.models")
|
||||
chats_module = _ensure_module("open_webui.models.chats")
|
||||
main_module = _ensure_module("open_webui.main")
|
||||
_ensure_module("fastapi")
|
||||
fastapi_requests = _ensure_module("fastapi.requests")
|
||||
|
||||
async def generate_chat_completion(*args, **kwargs):
|
||||
return {}
|
||||
|
||||
class DummyUsers:
|
||||
pass
|
||||
|
||||
class DummyModels:
|
||||
@staticmethod
|
||||
def get_model_by_id(model_id):
|
||||
return None
|
||||
|
||||
class DummyChats:
|
||||
@staticmethod
|
||||
def get_chat_by_id(chat_id):
|
||||
return None
|
||||
|
||||
class DummyRequest:
|
||||
pass
|
||||
|
||||
chat_module.generate_chat_completion = generate_chat_completion
|
||||
users_module.Users = DummyUsers
|
||||
models_module.Models = DummyModels
|
||||
chats_module.Chats = DummyChats
|
||||
main_module.app = object()
|
||||
fastapi_requests.Request = DummyRequest
|
||||
|
||||
|
||||
_install_openwebui_stubs()
|
||||
spec = importlib.util.spec_from_file_location(MODULE_NAME, PLUGIN_PATH)
|
||||
module = importlib.util.module_from_spec(spec)
|
||||
sys.modules[MODULE_NAME] = module
|
||||
assert spec.loader is not None
|
||||
spec.loader.exec_module(module)
|
||||
module.Filter._init_database = lambda self: None
|
||||
|
||||
|
||||
class TestAsyncContextCompression(unittest.TestCase):
|
||||
def setUp(self):
|
||||
self.filter = module.Filter()
|
||||
|
||||
def test_inlet_logs_tool_trimming_outcome_when_no_oversized_outputs(self):
|
||||
self.filter.valves.show_debug_log = True
|
||||
self.filter.valves.enable_tool_output_trimming = True
|
||||
|
||||
logged_messages = []
|
||||
|
||||
async def fake_log(message, log_type="info", event_call=None):
|
||||
logged_messages.append(message)
|
||||
|
||||
async def fake_user_context(__user__, __event_call__):
|
||||
return {"user_language": "en-US"}
|
||||
|
||||
async def fake_event_call(_payload):
|
||||
return True
|
||||
|
||||
self.filter._log = fake_log
|
||||
self.filter._get_user_context = fake_user_context
|
||||
self.filter._get_chat_context = lambda body, metadata=None: {
|
||||
"chat_id": "",
|
||||
"message_id": "",
|
||||
}
|
||||
self.filter._get_latest_summary = lambda chat_id: None
|
||||
|
||||
body = {
|
||||
"params": {"function_calling": "native"},
|
||||
"messages": [
|
||||
{
|
||||
"role": "assistant",
|
||||
"tool_calls": [{"id": "call_1", "type": "function"}],
|
||||
"content": "",
|
||||
},
|
||||
{"role": "tool", "content": "short result"},
|
||||
{"role": "assistant", "content": "Final answer"},
|
||||
],
|
||||
}
|
||||
|
||||
asyncio.run(self.filter.inlet(body, __event_call__=fake_event_call))
|
||||
|
||||
self.assertTrue(
|
||||
any("Tool trimming check:" in message for message in logged_messages)
|
||||
)
|
||||
self.assertTrue(
|
||||
any(
|
||||
"no oversized native tool outputs were found" in message
|
||||
for message in logged_messages
|
||||
)
|
||||
)
|
||||
|
||||
def test_inlet_logs_tool_trimming_skip_reason_when_disabled(self):
|
||||
self.filter.valves.show_debug_log = True
|
||||
self.filter.valves.enable_tool_output_trimming = False
|
||||
|
||||
logged_messages = []
|
||||
|
||||
async def fake_log(message, log_type="info", event_call=None):
|
||||
logged_messages.append(message)
|
||||
|
||||
async def fake_user_context(__user__, __event_call__):
|
||||
return {"user_language": "en-US"}
|
||||
|
||||
async def fake_event_call(_payload):
|
||||
return True
|
||||
|
||||
self.filter._log = fake_log
|
||||
self.filter._get_user_context = fake_user_context
|
||||
self.filter._get_chat_context = lambda body, metadata=None: {
|
||||
"chat_id": "",
|
||||
"message_id": "",
|
||||
}
|
||||
self.filter._get_latest_summary = lambda chat_id: None
|
||||
|
||||
body = {"messages": [], "params": {"function_calling": "native"}}
|
||||
|
||||
asyncio.run(self.filter.inlet(body, __event_call__=fake_event_call))
|
||||
|
||||
self.assertTrue(
|
||||
any("Tool trimming skipped: tool trimming disabled" in message for message in logged_messages)
|
||||
)
|
||||
|
||||
def test_normalize_native_tool_call_ids_keeps_links_aligned(self):
|
||||
long_tool_call_id = "call_abcdefghijklmnopqrstuvwxyz_1234567890abcd"
|
||||
messages = [
|
||||
{
|
||||
"role": "assistant",
|
||||
"tool_calls": [
|
||||
{
|
||||
"id": long_tool_call_id,
|
||||
"type": "function",
|
||||
"function": {"name": "search", "arguments": "{}"},
|
||||
}
|
||||
],
|
||||
"content": "",
|
||||
},
|
||||
{
|
||||
"role": "tool",
|
||||
"tool_call_id": long_tool_call_id,
|
||||
"content": "tool result",
|
||||
},
|
||||
]
|
||||
|
||||
normalized_count = self.filter._normalize_native_tool_call_ids(messages)
|
||||
|
||||
normalized_id = messages[0]["tool_calls"][0]["id"]
|
||||
self.assertEqual(normalized_count, 1)
|
||||
self.assertLessEqual(len(normalized_id), 40)
|
||||
self.assertNotEqual(normalized_id, long_tool_call_id)
|
||||
self.assertEqual(messages[1]["tool_call_id"], normalized_id)
|
||||
|
||||
def test_trim_native_tool_outputs_restores_real_behavior(self):
|
||||
messages = [
|
||||
{
|
||||
"role": "assistant",
|
||||
"tool_calls": [{"id": "call_1", "type": "function"}],
|
||||
"content": "",
|
||||
},
|
||||
{"role": "tool", "content": "x" * 1600},
|
||||
{"role": "assistant", "content": "Final answer"},
|
||||
]
|
||||
|
||||
trimmed_count = self.filter._trim_native_tool_outputs(messages, "en-US")
|
||||
|
||||
self.assertEqual(trimmed_count, 1)
|
||||
self.assertEqual(messages[1]["content"], "... [Content collapsed] ...")
|
||||
self.assertTrue(messages[1]["metadata"]["is_trimmed"])
|
||||
self.assertTrue(messages[2]["metadata"]["tool_outputs_trimmed"])
|
||||
self.assertIn("Final answer", messages[2]["content"])
|
||||
self.assertIn("Tool outputs trimmed", messages[2]["content"])
|
||||
|
||||
def test_trim_native_tool_outputs_supports_embedded_tool_call_cards(self):
|
||||
messages = [
|
||||
{
|
||||
"role": "assistant",
|
||||
"content": (
|
||||
'<details type="tool_calls" done="true" id="call-1" '
|
||||
'name="execute_code" arguments=""{}"" '
|
||||
f'result=""{"x" * 1600}"">\n'
|
||||
"<summary>Tool Executed</summary>\n"
|
||||
"</details>\n"
|
||||
"Final answer"
|
||||
),
|
||||
}
|
||||
]
|
||||
|
||||
trimmed_count = self.filter._trim_native_tool_outputs(messages, "en-US")
|
||||
|
||||
self.assertEqual(trimmed_count, 1)
|
||||
self.assertIn(
|
||||
'result=""... [Content collapsed] ...""',
|
||||
messages[0]["content"],
|
||||
)
|
||||
self.assertNotIn("x" * 200, messages[0]["content"])
|
||||
self.assertTrue(messages[0]["metadata"]["tool_outputs_trimmed"])
|
||||
|
||||
def test_function_calling_mode_reads_params_fallback(self):
|
||||
self.assertEqual(
|
||||
self.filter._get_function_calling_mode(
|
||||
{"params": {"function_calling": "native"}}
|
||||
),
|
||||
"native",
|
||||
)
|
||||
|
||||
def test_function_calling_mode_infers_native_from_message_shape(self):
|
||||
self.assertEqual(
|
||||
self.filter._get_function_calling_mode(
|
||||
{
|
||||
"messages": [
|
||||
{
|
||||
"role": "assistant",
|
||||
"tool_calls": [{"id": "call_1", "type": "function"}],
|
||||
"content": "",
|
||||
},
|
||||
{"role": "tool", "content": "tool result"},
|
||||
]
|
||||
}
|
||||
),
|
||||
"native",
|
||||
)
|
||||
|
||||
def test_trim_native_tool_outputs_handles_pending_tool_chain(self):
|
||||
messages = [
|
||||
{
|
||||
"role": "assistant",
|
||||
"tool_calls": [{"id": "call_1", "type": "function"}],
|
||||
"content": "",
|
||||
},
|
||||
{"role": "tool", "content": "x" * 1600},
|
||||
]
|
||||
|
||||
trimmed_count = self.filter._trim_native_tool_outputs(messages, "en-US")
|
||||
|
||||
self.assertEqual(trimmed_count, 1)
|
||||
self.assertEqual(messages[1]["content"], "... [Content collapsed] ...")
|
||||
self.assertTrue(messages[1]["metadata"]["is_trimmed"])
|
||||
|
||||
def test_target_progress_uses_original_history_coordinates(self):
|
||||
self.filter.valves.keep_last = 2
|
||||
summary_message = self.filter._build_summary_message(
|
||||
"older summary", "en-US", 6
|
||||
)
|
||||
messages = [
|
||||
{"role": "system", "content": "System prompt"},
|
||||
summary_message,
|
||||
{"role": "user", "content": "Question 1"},
|
||||
{"role": "assistant", "content": "Answer 1"},
|
||||
{"role": "user", "content": "Question 2"},
|
||||
{"role": "assistant", "content": "Answer 2"},
|
||||
]
|
||||
|
||||
self.assertEqual(self.filter._get_original_history_count(messages), 10)
|
||||
self.assertEqual(self.filter._calculate_target_compressed_count(messages), 8)
|
||||
|
||||
def test_load_full_chat_messages_rebuilds_active_history_branch(self):
|
||||
class FakeChats:
|
||||
@staticmethod
|
||||
def get_chat_by_id(chat_id):
|
||||
return types.SimpleNamespace(
|
||||
chat={
|
||||
"history": {
|
||||
"currentId": "m3",
|
||||
"messages": {
|
||||
"m1": {
|
||||
"id": "m1",
|
||||
"role": "user",
|
||||
"content": "Question",
|
||||
},
|
||||
"m2": {
|
||||
"id": "m2",
|
||||
"role": "assistant",
|
||||
"content": "Tool call",
|
||||
"tool_calls": [{"id": "call_1"}],
|
||||
"parentId": "m1",
|
||||
},
|
||||
"m3": {
|
||||
"id": "m3",
|
||||
"role": "tool",
|
||||
"content": "Tool result",
|
||||
"tool_call_id": "call_1",
|
||||
"parentId": "m2",
|
||||
},
|
||||
},
|
||||
}
|
||||
}
|
||||
)
|
||||
|
||||
original_chats = module.Chats
|
||||
module.Chats = FakeChats
|
||||
try:
|
||||
messages = self.filter._load_full_chat_messages("chat-1")
|
||||
finally:
|
||||
module.Chats = original_chats
|
||||
|
||||
self.assertEqual([message["id"] for message in messages], ["m1", "m2", "m3"])
|
||||
self.assertEqual(messages[2]["role"], "tool")
|
||||
|
||||
def test_outlet_unfolds_compact_tool_details_view(self):
|
||||
compact_messages = [
|
||||
{"role": "user", "content": "U1"},
|
||||
{
|
||||
"role": "assistant",
|
||||
"content": (
|
||||
'<details type="tool_calls" done="true" id="call-1" '
|
||||
'name="search_notes" arguments=""{}"" '
|
||||
f'result=""{"x" * 3000}"">\n'
|
||||
"<summary>Tool Executed</summary>\n"
|
||||
"</details>\n"
|
||||
"Answer 1"
|
||||
),
|
||||
},
|
||||
{"role": "user", "content": "U2"},
|
||||
{
|
||||
"role": "assistant",
|
||||
"content": (
|
||||
'<details type="tool_calls" done="true" id="call-2" '
|
||||
'name="merge_notes" arguments=""{}"" '
|
||||
f'result=""{"y" * 4000}"">\n'
|
||||
"<summary>Tool Executed</summary>\n"
|
||||
"</details>\n"
|
||||
"Answer 2"
|
||||
),
|
||||
},
|
||||
]
|
||||
|
||||
async def fake_user_context(__user__, __event_call__):
|
||||
return {"user_language": "en-US"}
|
||||
|
||||
async def noop_log(*args, **kwargs):
|
||||
return None
|
||||
|
||||
create_task_called = False
|
||||
|
||||
def fake_create_task(coro):
|
||||
nonlocal create_task_called
|
||||
create_task_called = True
|
||||
coro.close()
|
||||
return None
|
||||
|
||||
self.filter._get_user_context = fake_user_context
|
||||
self.filter._get_chat_context = lambda body, metadata=None: {
|
||||
"chat_id": "chat-1",
|
||||
"message_id": "msg-1",
|
||||
}
|
||||
self.filter._should_skip_compression = lambda body, model: False
|
||||
self.filter._log = noop_log
|
||||
|
||||
# Set a low threshold so the task is guaranteed to trigger
|
||||
self.filter.valves.compression_threshold_tokens = 100
|
||||
|
||||
original_create_task = asyncio.create_task
|
||||
asyncio.create_task = fake_create_task
|
||||
try:
|
||||
asyncio.run(
|
||||
self.filter.outlet(
|
||||
{"model": "test-model", "messages": compact_messages},
|
||||
__event_call__=None,
|
||||
)
|
||||
)
|
||||
finally:
|
||||
asyncio.create_task = original_create_task
|
||||
|
||||
self.assertTrue(create_task_called)
|
||||
|
||||
def test_summary_save_progress_matches_truncated_input(self):
|
||||
self.filter.valves.keep_first = 1
|
||||
self.filter.valves.keep_last = 1
|
||||
self.filter.valves.summary_model = "fake-summary-model"
|
||||
self.filter.valves.summary_model_max_context = 0
|
||||
|
||||
captured = {}
|
||||
events = []
|
||||
|
||||
async def mock_emitter(event):
|
||||
events.append(event)
|
||||
|
||||
async def mock_summary_llm(
|
||||
previous_summary,
|
||||
new_conversation_text,
|
||||
body,
|
||||
user_data,
|
||||
__event_call__,
|
||||
):
|
||||
return "new summary"
|
||||
|
||||
def mock_save_summary(chat_id, summary, compressed_count):
|
||||
captured["chat_id"] = chat_id
|
||||
captured["summary"] = summary
|
||||
captured["compressed_count"] = compressed_count
|
||||
|
||||
async def noop_log(*args, **kwargs):
|
||||
return None
|
||||
|
||||
self.filter._log = noop_log
|
||||
self.filter._call_summary_llm = mock_summary_llm
|
||||
self.filter._save_summary = mock_save_summary
|
||||
self.filter._get_model_thresholds = lambda model_id: {
|
||||
"max_context_tokens": 3500
|
||||
}
|
||||
self.filter._calculate_messages_tokens = lambda messages: len(messages) * 1000
|
||||
self.filter._count_tokens = lambda text: 1000
|
||||
|
||||
messages = [
|
||||
{"role": "system", "content": "System prompt"},
|
||||
{"role": "user", "content": "Question 1"},
|
||||
{"role": "assistant", "content": "Answer 1"},
|
||||
{"role": "user", "content": "Question 2"},
|
||||
{"role": "assistant", "content": "Answer 2"},
|
||||
{"role": "user", "content": "Question 3"},
|
||||
]
|
||||
|
||||
asyncio.run(
|
||||
self.filter._generate_summary_async(
|
||||
messages=messages,
|
||||
chat_id="chat-1",
|
||||
body={"model": "fake-summary-model"},
|
||||
user_data={"id": "user-1"},
|
||||
target_compressed_count=5,
|
||||
lang="en-US",
|
||||
__event_emitter__=mock_emitter,
|
||||
__event_call__=None,
|
||||
)
|
||||
)
|
||||
|
||||
self.assertEqual(captured["chat_id"], "chat-1")
|
||||
self.assertEqual(captured["summary"], "new summary")
|
||||
self.assertEqual(captured["compressed_count"], 2)
|
||||
self.assertTrue(any(event["type"] == "status" for event in events))
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
unittest.main()
|
||||
17
plugins/filters/async-context-compression/v1.4.1.md
Normal file
17
plugins/filters/async-context-compression/v1.4.1.md
Normal file
@@ -0,0 +1,17 @@
|
||||
[](https://openwebui.com/f/fujie/async_context_compression)
|
||||
|
||||
## Overview
|
||||
|
||||
This release addresses the critical progress coordinate drift issue in OpenWebUI's `outlet` phase, ensuring robust summarization for long tool-calling conversations.
|
||||
|
||||
[View on GitHub](https://github.com/Fu-Jie/openwebui-extensions/blob/main/plugins/filters/async-context-compression/README.md)
|
||||
|
||||
- **New Features**
|
||||
- **Reverse-Unfolding Mechanism**: Accurately reconstructs the expanded native tool-calling sequence during the outlet phase to permanently fix coordinate drift and missing summaries for long tool-based conversations.
|
||||
- **Safer Tool Trimming**: Refactored `enable_tool_output_trimming` to strictly use atomic block groups for safe trimming, completely preventing JSON payload corruption.
|
||||
|
||||
- **Bug Fixes**
|
||||
- Fixed coordinate drift where `compressed_message_count` could lose track due to OpenWebUI's frontend view truncating tool calls.
|
||||
|
||||
- **Related Issues**
|
||||
- Closes #56
|
||||
Reference in New Issue
Block a user