fix(async-context-compression): resolve race condition, update role to assistant, bump to v1.1.3

fujie
2026-01-12 01:45:58 +08:00
parent d5c099dd15
commit 34b2c3d6cf
8 changed files with 74 additions and 124 deletions
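The race condition named in the commit title comes from handing state from `inlet` to `outlet` through a shared `self.temp_state` dict keyed by chat ID, as the diffs below show. A minimal sketch of why that pattern breaks under overlapping requests, and of the v1.1.3 fix of computing the value where it is needed and passing it as an argument (class and function names here are illustrative, not the plugin's):

```python
class DictHandoff:
    """Old pattern: inlet stashes per-chat state; outlet pops it later."""

    def __init__(self):
        self.temp_state = {}

    def inlet(self, chat_id: str, count: int):
        self.temp_state[chat_id] = count  # overwrites any unconsumed state

    def outlet(self, chat_id: str):
        return self.temp_state.pop(chat_id, None)  # None if already consumed


# Two overlapping requests for the same chat interleave:
h = DictHandoff()
h.inlet("chat-1", 5)        # request A records its progress
h.inlet("chat-1", 7)        # request B overwrites it before A's outlet runs
first = h.outlet("chat-1")  # A consumes B's value
second = h.outlet("chat-1") # B finds nothing -> the "state not found" warning


def outlet_v113(messages: list, keep_last: int) -> int:
    """New pattern: outlet computes the progress itself and passes it along,
    so there is no shared mutable state to race on."""
    return max(0, len(messages) - keep_last)
```

With the value computed in `outlet` and handed to the background task as a parameter, each request carries its own state, which is why the v1.1.3 diff removes `self.temp_state` entirely.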

View File

@@ -1,7 +1,7 @@
 # Async Context Compression
 <span class="category-badge filter">Filter</span>
-<span class="version-badge">v1.1.2</span>
+<span class="version-badge">v1.1.3</span>
 Reduces token consumption in long conversations through intelligent summarization while maintaining conversational coherence.
@@ -32,6 +32,8 @@ This is especially useful for:
 - :material-console: **Frontend Debugging**: Debug logs in browser console
 - :material-alert-circle-check: **Enhanced Error Reporting**: Clear error status notifications
 - :material-check-all: **Open WebUI v0.7.x Compatibility**: Dynamic DB session handling
+- :material-account-convert: **Improved Compatibility**: Summary role changed to `assistant`
+- :material-shield-check: **Enhanced Stability**: Resolved race conditions in state management
 ---

View File

@@ -1,7 +1,7 @@
 # Async Context Compression
 <span class="category-badge filter">Filter</span>
-<span class="version-badge">v1.1.2</span>
+<span class="version-badge">v1.1.3</span>
 Reduces token consumption in long conversations through intelligent summarization while keeping the conversation coherent.
@@ -32,6 +32,8 @@ The Async Context Compression filter helps manage token usage in long conversations by:
 - :material-console: **Frontend Debugging**: Browser console log support
 - :material-alert-circle-check: **Enhanced Error Reporting**: Clear error status notifications
 - :material-check-all: **Open WebUI v0.7.x Compatibility**: Dynamic database session handling
+- :material-account-convert: **Improved Compatibility**: Summary role changed to `assistant`
+- :material-shield-check: **Enhanced Stability**: Resolved race conditions in state management
 ---

View File

@@ -22,7 +22,7 @@ Filters act as middleware in the message pipeline:
 Reduces token consumption in long conversations through intelligent summarization while maintaining coherence.
-**Version:** 1.1.2
+**Version:** 1.1.3
 [:octicons-arrow-right-24: Documentation](async-context-compression.md)

View File

@@ -22,7 +22,7 @@ Filters act as middleware in the message pipeline:
 Reduces token consumption in long conversations through intelligent summarization while maintaining coherence.
-**Version:** 1.1.0
+**Version:** 1.1.3
 [:octicons-arrow-right-24: Documentation](async-context-compression.md)

View File

@@ -1,9 +1,14 @@
 # Async Context Compression Filter
-**Author:** [Fu-Jie](https://github.com/Fu-Jie) | **Version:** 1.1.2 | **License:** MIT
+**Author:** [Fu-Jie](https://github.com/Fu-Jie) | **Version:** 1.1.3 | **License:** MIT
 This filter reduces token consumption in long conversations through intelligent summarization and message compression while keeping conversations coherent.
+## What's new in 1.1.3
+- **Improved Compatibility**: Changed summary injection role from `user` to `assistant` for better compatibility across different LLMs.
+- **Enhanced Stability**: Fixed a race condition in state management that could cause "inlet state not found" warnings in high-concurrency scenarios.
+- **Bug Fixes**: Corrected default model handling to prevent misleading logs when no model is specified.
 ## What's new in 1.1.2
 - **Open WebUI v0.7.x Compatibility**: Resolved a critical database session binding error affecting Open WebUI v0.7.x users. The plugin now dynamically discovers the database engine and session context, ensuring compatibility across versions.
@@ -15,12 +20,7 @@ This filter reduces token consumption in long conversations through intelligent
 - **Frontend Debugging**: Added `show_debug_log` option to print debug info to the browser console (F12).
 - **Optimized Compression**: Improved token calculation logic to prevent aggressive truncation of history, ensuring more context is retained.
-## What's new in 1.1.0
-- Reuses Open WebUI's shared database connection by default (no custom engine or env vars required).
-- Token-based thresholds (`compression_threshold_tokens`, `max_context_tokens`) for safer long-context handling.
-- Per-model overrides via `model_thresholds` for mixed-model workflows.
-- Documentation now mirrors the latest async workflow and retention-first injection.
 ---
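The "Improved Compatibility" bullet corresponds to the `summary_msg = {"role": "assistant", ...}` change in the plugin diff further down. A compact sketch of the compression shape it feeds into, using a hypothetical helper name and simplified logic (the real filter also tracks compression points and handles multimodal content):

```python
def compress_history(messages: list, summary: str, keep_last: int = 4) -> list:
    """Replace everything before the last `keep_last` messages with a single
    assistant-role summary message (simplified; hypothetical helper)."""
    cut = max(0, len(messages) - keep_last)
    if cut == 0:
        return messages  # nothing old enough to compress
    summary_msg = {
        "role": "assistant",  # v1.1.3: was "user" in earlier releases
        "content": (
            f"【Historical Conversation Summary】\n{summary}\n"
            f"---\n"
            f"Below is the recent conversation:"
        ),
    }
    # Keep only the recent tail, fronted by the summary turn
    return [summary_msg] + messages[cut:]


msgs = [{"role": "user", "content": f"turn {i}"} for i in range(6)]
compressed = compress_history(msgs, "Earlier turns discussed setup.", keep_last=4)
```

Here six messages collapse to a summary plus the last four, with the summary presented as an earlier assistant turn rather than a fake user message.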

View File

@@ -1,11 +1,16 @@
 # Async Context Compression Filter
-**Author:** [Fu-Jie](https://github.com/Fu-Jie) | **Version:** 1.1.2 | **License:** MIT
+**Author:** [Fu-Jie](https://github.com/Fu-Jie) | **Version:** 1.1.3 | **License:** MIT
 > **Note**: To keep every filter maintainable and easy to use, each filter should ship with clear, complete documentation covering its features, configuration, and usage.
 This filter reduces token consumption in long conversations through intelligent summarization and message compression while maintaining conversational coherence.
+## What's new in 1.1.3
+- **Improved Compatibility**: Changed the summary injection role from `user` to `assistant` for better compatibility across different LLMs.
+- **Enhanced Stability**: Fixed a race condition in state management that could cause "unable to get inlet state" warnings in high-concurrency scenarios.
+- **Bug Fixes**: Corrected default model handling logic to prevent misleading logs when no model is specified.
 ## What's new in 1.1.2
 - **Open WebUI v0.7.x Compatibility**: Fixed a critical database session binding error affecting Open WebUI v0.7.x users. The plugin now dynamically discovers the database engine and session context, ensuring cross-version compatibility.
@@ -17,12 +22,7 @@
 - **Frontend Debugging**: Added `show_debug_log` option to print debug info to the browser console (F12).
 - **Compression Optimization**: Improved token calculation logic to prevent over-truncation of history, retaining more context.
-## What's new in 1.1.0
-- Reuses Open WebUI's built-in database connection by default; no custom engine or `DATABASE_URL` configuration required.
-- Token-based thresholds (`compression_threshold_tokens`, `max_context_tokens`) for safer long-context handling.
-- Per-model thresholds via `model_thresholds` for mixed multi-model workflows.
-- Documentation updated to match the latest async workflow and the "retain first, then inject" strategy.
 ---

View File

@@ -5,7 +5,7 @@ author: Fu-Jie
 author_url: https://github.com/Fu-Jie
 funding_url: https://github.com/Fu-Jie/awesome-openwebui
 description: Reduces token consumption in long conversations while maintaining coherence through intelligent summarization and message compression.
-version: 1.1.2
+version: 1.1.3
 openwebui_id: b1655bc8-6de9-4cad-8cb5-a6f7829a02ce
 license: MIT
@@ -370,7 +370,10 @@ class Filter:
         self.valves = self.Valves()
         self._owui_db = owui_db
         self._db_engine = owui_engine
-        self.temp_state = {}  # Used to pass temporary data between inlet and outlet
+        self._db_engine = owui_engine
+        self._fallback_session_factory = (
+            sessionmaker(bind=self._db_engine) if self._db_engine else None
+        )
         self._fallback_session_factory = (
             sessionmaker(bind=self._db_engine) if self._db_engine else None
         )
@@ -638,42 +641,6 @@ class Filter:
         return ""

-    def _inject_summary_to_first_message(self, message: dict, summary: str) -> dict:
-        """Injects the summary into the first message (prepended to content)."""
-        content = message.get("content", "")
-        summary_block = f"【Historical Conversation Summary】\n{summary}\n\n---\nBelow is the recent conversation:\n\n"
-
-        # Handle different content types
-        if isinstance(content, list):  # Multimodal content
-            # Find the first text part and insert the summary before it
-            new_content = []
-            summary_inserted = False
-            for part in content:
-                if (
-                    isinstance(part, dict)
-                    and part.get("type") == "text"
-                    and not summary_inserted
-                ):
-                    # Prepend summary to the first text part
-                    new_content.append(
-                        {"type": "text", "text": summary_block + part.get("text", "")}
-                    )
-                    summary_inserted = True
-                else:
-                    new_content.append(part)
-
-            # If no text part, insert at the beginning
-            if not summary_inserted:
-                new_content.insert(0, {"type": "text", "text": summary_block})
-            message["content"] = new_content
-        elif isinstance(content, str):  # Plain text
-            message["content"] = summary_block + content
-
-        return message

     async def _emit_debug_log(
         self,
         __event_call__,
@@ -803,15 +770,9 @@ class Filter:
-        # Target is to compress up to the (total - keep_last) message
-        target_compressed_count = max(0, len(messages) - self.valves.keep_last)
-
-        # [Optimization] Simple state cleanup check
-        if chat_id in self.temp_state:
-            await self._log(
-                f"[Inlet] ⚠️ Overwriting unconsumed old state (Chat ID: {chat_id})",
-                type="warning",
-                event_call=__event_call__,
-            )
-        self.temp_state[chat_id] = target_compressed_count
+        # Record the target compression progress for the original messages, for use in outlet
+        # Target is to compress up to the (total - keep_last) message
+        target_compressed_count = max(0, len(messages) - self.valves.keep_last)

         await self._log(
             f"[Inlet] Recorded target compression progress: {target_compressed_count}",
@@ -844,7 +805,7 @@ class Filter:
             f"---\n"
             f"Below is the recent conversation:"
         )
-        summary_msg = {"role": "user", "content": summary_content}
+        summary_msg = {"role": "assistant", "content": summary_content}

         # 3. Tail messages (Tail) - All messages starting from the last compression point
         # Note: Must ensure head messages are not duplicated
@@ -914,18 +875,29 @@ class Filter:
                 event_call=__event_call__,
             )
             return body
-        model = body.get("model", "gpt-3.5-turbo")
+        model = body.get("model") or ""
+
+        # Calculate target compression progress directly
+        # Assuming body['messages'] in outlet contains the full history (including new response)
+        messages = body.get("messages", [])
+        target_compressed_count = max(0, len(messages) - self.valves.keep_last)

         if self.valves.debug_mode or self.valves.show_debug_log:
             await self._log(
-                f"\n{'='*60}\n[Outlet] Chat ID: {chat_id}\n[Outlet] Response complete",
+                f"\n{'='*60}\n[Outlet] Chat ID: {chat_id}\n[Outlet] Response complete\n[Outlet] Calculated target compression progress: {target_compressed_count} (Messages: {len(messages)})",
                 event_call=__event_call__,
             )

         # Process Token calculation and summary generation asynchronously in the background (do not wait for completion, do not affect output)
         asyncio.create_task(
             self._check_and_generate_summary_async(
-                chat_id, model, body, __user__, __event_emitter__, __event_call__
+                chat_id,
+                model,
+                body,
+                __user__,
+                target_compressed_count,
+                __event_emitter__,
+                __event_call__,
             )
         )
@@ -942,6 +914,7 @@ class Filter:
         model: str,
         body: dict,
         user_data: Optional[dict],
+        target_compressed_count: Optional[int],
         __event_emitter__: Callable[[Any], Awaitable[None]] = None,
         __event_call__: Callable[[Any], Awaitable[None]] = None,
     ):
@@ -986,6 +959,7 @@ class Filter:
             chat_id,
             body,
             user_data,
+            target_compressed_count,
             __event_emitter__,
             __event_call__,
         )
@@ -1015,6 +989,7 @@ class Filter:
         chat_id: str,
         body: dict,
         user_data: Optional[dict],
+        target_compressed_count: Optional[int],
         __event_emitter__: Callable[[Any], Awaitable[None]] = None,
         __event_call__: Callable[[Any], Awaitable[None]] = None,
     ):
@@ -1031,12 +1006,11 @@ class Filter:
         )

         # 1. Get target compression progress
-        # Prioritize getting from temp_state (calculated by inlet). If unavailable (e.g., after restart), assume current is full history.
-        target_compressed_count = self.temp_state.pop(chat_id, None)
+        # If target_compressed_count is not passed (should not happen with new logic), estimate it
         if target_compressed_count is None:
             target_compressed_count = max(0, len(messages) - self.valves.keep_last)
             await self._log(
-                f"[🤖 Async Summary Task] ⚠️ Could not get inlet state, estimating progress using current message count: {target_compressed_count}",
+                f"[🤖 Async Summary Task] ⚠️ target_compressed_count is None, estimating: {target_compressed_count}",
                 type="warning",
                 event_call=__event_call__,
             )
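The `model = body.get("model") or ""` change above is subtle: `dict.get` only applies its default when the key is absent, so under the old code an explicit `None` sailed straight past the fallback, while a genuinely missing key was silently reported as `gpt-3.5-turbo` even though no model was ever requested. A quick comparison of the two behaviors:

```python
def model_old(body: dict) -> str:
    # pre-1.1.3: the default only kicks in when "model" is missing entirely,
    # and a hardcoded model name can show up in logs when none was specified
    return body.get("model", "gpt-3.5-turbo")


def model_new(body: dict) -> str:
    # 1.1.3: coalesce missing, None, and "" to an empty string
    return body.get("model") or ""
```

The `or ""` form treats every falsy value uniformly, so downstream logging never reports a phantom default model.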

View File

@@ -5,7 +5,7 @@ author: Fu-Jie
 author_url: https://github.com/Fu-Jie
 funding_url: https://github.com/Fu-Jie/awesome-openwebui
 description: Reduces token consumption in long conversations through intelligent summarization and message compression while maintaining conversational coherence.
-version: 1.1.2
+version: 1.1.3
 openwebui_id: 5c0617cb-a9e4-4bd6-a440-d276534ebd18
 license: MIT
@@ -290,7 +290,8 @@ class Filter:
         self.valves = self.Valves()
         self._db_engine = owui_engine
         self._SessionLocal = owui_Session
-        self.temp_state = {}  # Used to pass temporary data between inlet and outlet
+        self._SessionLocal = owui_Session
+        self._init_database()
         self._init_database()

     def _init_database(self):
@@ -471,42 +472,6 @@ class Filter:
             "max_context_tokens": self.valves.max_context_tokens,
         }

-    def _inject_summary_to_first_message(self, message: dict, summary: str) -> dict:
-        """Injects the summary into the first message (prepended to the content)."""
-        content = message.get("content", "")
-        summary_block = f"【Historical Conversation Summary】\n{summary}\n\n---\nBelow is the recent conversation:\n\n"
-
-        # Handle different content types
-        if isinstance(content, list):  # Multimodal content
-            # Find the first text part and insert the summary before it
-            new_content = []
-            summary_inserted = False
-            for part in content:
-                if (
-                    isinstance(part, dict)
-                    and part.get("type") == "text"
-                    and not summary_inserted
-                ):
-                    # Insert the summary before the first text part
-                    new_content.append(
-                        {"type": "text", "text": summary_block + part.get("text", "")}
-                    )
-                    summary_inserted = True
-                else:
-                    new_content.append(part)
-
-            # If there is no text part, insert at the beginning
-            if not summary_inserted:
-                new_content.insert(0, {"type": "text", "text": summary_block})
-            message["content"] = new_content
-        elif isinstance(content, str):  # Plain text
-            message["content"] = summary_block + content
-
-        return message

     async def _emit_debug_log(
         self,
         __event_call__,
@@ -628,15 +593,9 @@ class Filter:
-        # Target is to compress up to the (total - keep_last) message
-        target_compressed_count = max(0, len(messages) - self.valves.keep_last)
-
-        # [Optimization] Simple state cleanup check
-        if chat_id in self.temp_state:
-            await self._log(
-                f"[Inlet] ⚠️ Overwriting unconsumed old state (Chat ID: {chat_id})",
-                type="warning",
-                event_call=__event_call__,
-            )
-        self.temp_state[chat_id] = target_compressed_count
+        # Record the target compression progress for the original messages, for use in outlet
+        # Target is to compress up to the (total - keep_last) message
+        target_compressed_count = max(0, len(messages) - self.valves.keep_last)

         await self._log(
             f"[Inlet] Recorded target compression progress: {target_compressed_count}",
@@ -669,7 +628,7 @@ class Filter:
             f"---\n"
             f"Below is the recent conversation:"
         )
-        summary_msg = {"role": "user", "content": summary_content}
+        summary_msg = {"role": "assistant", "content": summary_content}

         # 3. Tail messages (Tail) - All messages starting from the last compression point
         # Note: Must ensure head messages are not duplicated
@@ -732,18 +691,29 @@ class Filter:
         Compute token counts and trigger summary generation in the background (does not block the current response or affect content output)
         """
         chat_id = __metadata__["chat_id"]
-        model = body.get("model", "gpt-3.5-turbo")
+        model = body.get("model") or ""
+
+        # Calculate target compression progress directly
+        # Assume body['messages'] in outlet contains the full history (including the new response)
+        messages = body.get("messages", [])
+        target_compressed_count = max(0, len(messages) - self.valves.keep_last)

         if self.valves.debug_mode or self.valves.show_debug_log:
             await self._log(
-                f"\n{'='*60}\n[Outlet] Chat ID: {chat_id}\n[Outlet] Response complete",
+                f"\n{'='*60}\n[Outlet] Chat ID: {chat_id}\n[Outlet] Response complete\n[Outlet] Calculated target compression progress: {target_compressed_count} (message count: {len(messages)})",
                 event_call=__event_call__,
             )

         # Handle token calculation and summary generation asynchronously in the background (do not wait for completion, do not affect output)
         asyncio.create_task(
             self._check_and_generate_summary_async(
-                chat_id, model, body, __user__, __event_emitter__, __event_call__
+                chat_id,
+                model,
+                body,
+                __user__,
+                target_compressed_count,
+                __event_emitter__,
+                __event_call__,
             )
         )
@@ -760,6 +730,7 @@ class Filter:
         model: str,
         body: dict,
         user_data: Optional[dict],
+        target_compressed_count: Optional[int],
         __event_emitter__: Callable[[Any], Awaitable[None]] = None,
         __event_call__: Callable[[Any], Awaitable[None]] = None,
     ):
@@ -804,6 +775,7 @@ class Filter:
             chat_id,
             body,
             user_data,
+            target_compressed_count,
             __event_emitter__,
             __event_call__,
         )
@@ -833,6 +805,7 @@ class Filter:
         chat_id: str,
         body: dict,
         user_data: Optional[dict],
+        target_compressed_count: Optional[int],
         __event_emitter__: Callable[[Any], Awaitable[None]] = None,
         __event_call__: Callable[[Any], Awaitable[None]] = None,
     ):
@@ -847,12 +820,11 @@ class Filter:
         await self._log(f"\n[🤖 Async Summary Task] Starting...", event_call=__event_call__)

         # 1. Get target compression progress
-        # Prefer reading from temp_state (computed by inlet); if unavailable (e.g., after a restart), assume the current messages are the full history
-        target_compressed_count = self.temp_state.pop(chat_id, None)
+        # If target_compressed_count was not passed (should not happen under the new logic), estimate it
         if target_compressed_count is None:
             target_compressed_count = max(0, len(messages) - self.valves.keep_last)
             await self._log(
-                f"[🤖 Async Summary Task] ⚠️ Could not get inlet state, estimating progress from current message count: {target_compressed_count}",
+                f"[🤖 Async Summary Task] ⚠️ target_compressed_count is None, estimating: {target_compressed_count}",
                 type="warning",
                 event_call=__event_call__,
             )
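Both plugin variants now thread `target_compressed_count` into the background task as an argument instead of reading shared instance state inside it. A minimal asyncio sketch of that fire-and-forget pattern (names here are illustrative, not the plugin's):

```python
import asyncio

results = []


async def summarize_async(chat_id: str, target_count: int):
    # The background worker receives its inputs as arguments;
    # it reads nothing from shared mutable state.
    await asyncio.sleep(0)  # stand-in for real I/O (LLM call, DB write)
    results.append((chat_id, target_count))


async def outlet(chat_id: str, n_messages: int, keep_last: int = 4):
    # Compute per-request state locally, then hand it to the task.
    target = max(0, n_messages - keep_last)
    # Fire-and-forget: not awaited, so the response is not delayed.
    # (A reference is kept so the task is not garbage-collected.)
    return asyncio.create_task(summarize_async(chat_id, target))


async def main():
    tasks = [await outlet("a", 10), await outlet("b", 3)]
    await asyncio.gather(*tasks)


asyncio.run(main())
```

Because each task closes over its own arguments, concurrent outlets cannot clobber one another the way writes to a shared dict could.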