diff --git a/.agent/learnings/async-context-compression-progress-mapping.md b/.agent/learnings/async-context-compression-progress-mapping.md new file mode 100644 index 0000000..9b835e1 --- /dev/null +++ b/.agent/learnings/async-context-compression-progress-mapping.md @@ -0,0 +1,27 @@ +# Async Context Compression Progress Mapping + +> Discovered: 2026-03-10 + +## Context +Applies to `plugins/filters/async-context-compression/async_context_compression.py` once the inlet has already replaced early history with a synthetic summary message. + +## Finding +`compressed_message_count` cannot be recalculated from the visible message list length after compression. Once a summary marker is present, the visible list mixes: +- preserved head messages that are still before the saved boundary +- one synthetic summary message +- tail messages that map to original history starting at the saved boundary + +## Solution / Pattern +Store the original-history boundary on the injected summary message metadata, then recover future progress using: +- `original_count = covered_until + len(messages_after_summary_marker)` +- `target_progress = max(covered_until, original_count - keep_last)` + +When the summary-model window is too small, trim newest atomic groups from the summary input so the saved boundary still matches what the summary actually covers. + +## Gotchas +- If you trim from the head of the summary input, the saved progress can overstate coverage and hide messages that were never summarized. +- Status previews for the next context must convert the saved original-history boundary back into the current visible view before rebuilding head/summary/tail. 
+- `inlet(body["messages"])` and `outlet(body["messages"])` can both represent the full conversation while using different serializations: + - inlet may receive expanded native tool-call chains (`assistant(tool_calls) -> tool -> assistant`) + - outlet may receive a compact top-level transcript where tool calls are folded into assistant `
<details>
` blocks +- These two views do not share a safe `compressed_message_count` coordinate system. If outlet is in the compact assistant/details view, do not persist summary progress derived from its top-level message count. diff --git a/.agent/learnings/openwebui-tool-call-context-inflation.md b/.agent/learnings/openwebui-tool-call-context-inflation.md new file mode 100644 index 0000000..f5951f8 --- /dev/null +++ b/.agent/learnings/openwebui-tool-call-context-inflation.md @@ -0,0 +1,26 @@ +# OpenWebUI Tool Call Context Inflation + +> Discovered: 2026-03-11 + +## Context +When analyzing why the `async_context_compression` plugin sees different array lengths of `messages` between the `inlet` (e.g. 27 items) and `outlet` (e.g. 8 items) phases, especially when native tool calling (Function Calling) is involved in OpenWebUI. + +## Finding +There is a fundamental disparity in how OpenWebUI serializes conversational history at different stages of the request lifecycle: + +1. **Outlet (UI Rendering View)**: + After the LLM completes generation and tools have been executed, OpenWebUI's `middleware.py` (and streaming builders) bundles intermediate tool calls and their raw results. It hides them inside an HTML `
<details>...</details>
` block within a single `role: assistant` message's `content`. + Concurrently, the actual native API tool-calling data is saved in a hidden `output` dict field attached to that message. At this stage, the `messages` array looks short (e.g., 8 items) because tool interactions are visually folded. + +2. **Inlet (LLM Native View)**: + When the user sends the *next* message, the request enters `main.py` -> `process_chat_payload` -> `middleware.py:process_messages_with_output()`. + Here, OpenWebUI scans historical `assistant` messages for that hidden `output` field. If found, it completely **inflates (unfolds)** the raw data back into an exact sequence of OpenAI-compliant `tool_call` and `tool_result` messages (using `utils/misc.py:convert_output_to_messages`). + The HTML `
<details>
` string is entirely discarded before being sent to the LLM. + +**Conclusion on Token Consumption**: +In the next turn, tool context is **NOT** compressed at all. It is fully re-expanded to its original verbose state (e.g., back to 27 items) and consumes the maximum amount of tokens required by the raw JSON arguments and results. + +## Gotchas +- Any logic operating in the `outlet` phase (like background tasks) that relies on the `messages` array index will be completely misaligned with the array seen in the `inlet` phase. +- Attempting to slice or trim history based on `outlet` array lengths will cause index out-of-bounds errors or destructive cropping of recent messages. +- The only safe way to bridge these two views is either to translate the folded view back into the expanded view using `convert_output_to_messages`, or to rely on unique `id` fields (if available) rather than array indices. \ No newline at end of file diff --git a/README.md b/README.md index f33561a..2608226 100644 --- a/README.md +++ b/README.md @@ -9,6 +9,7 @@ A collection of enhancements, plugins, and prompts for [open-webui](https://gith ## 📊 Community Stats +> > ![updated](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_updated.json&style=flat) | 👤 Author | 👥 Followers | ⭐ Points | 🏆 Contributions | @@ -19,18 +20,19 @@ A collection of enhancements, plugins, and prompts for [open-webui](https://gith | :---: | :---: | :---: | :---: | :---: | | ![posts](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_posts.json&style=flat) | ![downloads](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_downloads.json&style=flat) | 
![views](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_views.json&style=flat) | ![upvotes](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_upvotes.json&style=flat) | ![saves](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_saves.json&style=flat) | - ### 🔥 Top 6 Popular Plugins + | Rank | Plugin | Version | Downloads | Views | 📅 Updated | | :---: | :--- | :---: | :---: | :---: | :---: | | 🥇 | [Smart Mind Map](https://openwebui.com/posts/turn_any_text_into_beautiful_mind_maps_3094c59a) | ![v](https://img.shields.io/badge/v-1.0.0-blue?style=flat) | ![p1_dl](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p1_dl.json&style=flat) | ![p1_vw](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p1_vw.json&style=flat) | ![updated](https://img.shields.io/badge/2026--03--08-gray?style=flat) | | 🥈 | [Smart Infographic](https://openwebui.com/posts/smart_infographic_ad6f0c7f) | ![v](https://img.shields.io/badge/v-1.5.0-blue?style=flat) | ![p2_dl](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p2_dl.json&style=flat) | ![p2_vw](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p2_vw.json&style=flat) | ![updated](https://img.shields.io/badge/2026--03--08-gray?style=flat) | | 🥉 | [Markdown Normalizer](https://openwebui.com/posts/markdown_normalizer_baaa8732) | ![v](https://img.shields.io/badge/v-1.2.7-blue?style=flat) | 
![p3_dl](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p3_dl.json&style=flat) | ![p3_vw](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p3_vw.json&style=flat) | ![updated](https://img.shields.io/badge/2026--03--08-gray?style=flat) | | 4️⃣ | [Export to Word Enhanced](https://openwebui.com/posts/export_to_word_enhanced_formatting_fca6a315) | ![v](https://img.shields.io/badge/v-0.4.4-blue?style=flat) | ![p4_dl](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p4_dl.json&style=flat) | ![p4_vw](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p4_vw.json&style=flat) | ![updated](https://img.shields.io/badge/2026--03--08-gray?style=flat) | -| 5️⃣ | [Async Context Compression](https://openwebui.com/posts/async_context_compression_b1655bc8) | ![v](https://img.shields.io/badge/v-1.4.0-blue?style=flat) | ![p5_dl](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p5_dl.json&style=flat) | ![p5_vw](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p5_vw.json&style=flat) | ![updated](https://img.shields.io/badge/2026--03--09-gray?style=flat) | +| 5️⃣ | [Async Context Compression](https://openwebui.com/posts/async_context_compression_b1655bc8) | ![v](https://img.shields.io/badge/v-1.4.1-blue?style=flat) | ![p5_dl](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p5_dl.json&style=flat) | 
![p5_vw](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p5_vw.json&style=flat) | ![updated](https://img.shields.io/badge/2026--03--11-gray?style=flat) | | 6️⃣ | [AI Task Instruction Generator](https://openwebui.com/posts/ai_task_instruction_generator_9bab8b37) | ![v](https://img.shields.io/badge/v-N/A-gray?style=flat) | ![p6_dl](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p6_dl.json&style=flat) | ![p6_vw](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p6_vw.json&style=flat) | ![updated](https://img.shields.io/badge/2026--03--08-gray?style=flat) | ### 📈 Total Downloads Trend + ![Activity](https://gist.githubusercontent.com/Fu-Jie/db3d95687075a880af6f1fba76d679c6/raw/chart.svg) *See full stats and charts in [Community Stats Report](./docs/community-stats.md)* diff --git a/README_CN.md b/README_CN.md index 6fb8f43..9879fec 100644 --- a/README_CN.md +++ b/README_CN.md @@ -6,6 +6,7 @@ OpenWebUI 增强功能集合。包含个人开发与收集的插件、提示词 ## 📊 社区统计 +> > ![updated_zh](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_updated_zh.json&style=flat) | 👤 作者 | 👥 粉丝 | ⭐ 积分 | 🏆 贡献 | @@ -16,18 +17,19 @@ OpenWebUI 增强功能集合。包含个人开发与收集的插件、提示词 | :---: | :---: | :---: | :---: | :---: | | ![posts](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_posts.json&style=flat) | ![downloads](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_downloads.json&style=flat) | 
![views](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_views.json&style=flat) | ![upvotes](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_upvotes.json&style=flat) | ![saves](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_saves.json&style=flat) | - ### 🔥 热门插件 Top 6 + | 排名 | 插件 | 版本 | 下载 | 浏览 | 📅 更新 | | :---: | :--- | :---: | :---: | :---: | :---: | | 🥇 | [Smart Mind Map](https://openwebui.com/posts/turn_any_text_into_beautiful_mind_maps_3094c59a) | ![v](https://img.shields.io/badge/v-1.0.0-blue?style=flat) | ![p1_dl](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p1_dl.json&style=flat) | ![p1_vw](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p1_vw.json&style=flat) | ![updated](https://img.shields.io/badge/2026--03--08-gray?style=flat) | | 🥈 | [Smart Infographic](https://openwebui.com/posts/smart_infographic_ad6f0c7f) | ![v](https://img.shields.io/badge/v-1.5.0-blue?style=flat) | ![p2_dl](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p2_dl.json&style=flat) | ![p2_vw](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p2_vw.json&style=flat) | ![updated](https://img.shields.io/badge/2026--03--08-gray?style=flat) | | 🥉 | [Markdown Normalizer](https://openwebui.com/posts/markdown_normalizer_baaa8732) | ![v](https://img.shields.io/badge/v-1.2.7-blue?style=flat) | 
![p3_dl](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p3_dl.json&style=flat) | ![p3_vw](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p3_vw.json&style=flat) | ![updated](https://img.shields.io/badge/2026--03--08-gray?style=flat) | | 4️⃣ | [Export to Word Enhanced](https://openwebui.com/posts/export_to_word_enhanced_formatting_fca6a315) | ![v](https://img.shields.io/badge/v-0.4.4-blue?style=flat) | ![p4_dl](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p4_dl.json&style=flat) | ![p4_vw](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p4_vw.json&style=flat) | ![updated](https://img.shields.io/badge/2026--03--08-gray?style=flat) | -| 5️⃣ | [Async Context Compression](https://openwebui.com/posts/async_context_compression_b1655bc8) | ![v](https://img.shields.io/badge/v-1.4.0-blue?style=flat) | ![p5_dl](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p5_dl.json&style=flat) | ![p5_vw](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p5_vw.json&style=flat) | ![updated](https://img.shields.io/badge/2026--03--09-gray?style=flat) | +| 5️⃣ | [Async Context Compression](https://openwebui.com/posts/async_context_compression_b1655bc8) | ![v](https://img.shields.io/badge/v-1.4.1-blue?style=flat) | ![p5_dl](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p5_dl.json&style=flat) | 
![p5_vw](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p5_vw.json&style=flat) | ![updated](https://img.shields.io/badge/2026--03--11-gray?style=flat) | | 6️⃣ | [AI Task Instruction Generator](https://openwebui.com/posts/ai_task_instruction_generator_9bab8b37) | ![v](https://img.shields.io/badge/v-N/A-gray?style=flat) | ![p6_dl](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p6_dl.json&style=flat) | ![p6_vw](https://img.shields.io/endpoint?url=https%3A%2F%2Fgist.githubusercontent.com%2FFu-Jie%2Fdb3d95687075a880af6f1fba76d679c6%2Fraw%2Fbadge_p6_vw.json&style=flat) | ![updated](https://img.shields.io/badge/2026--03--08-gray?style=flat) | ### 📈 总下载量累计趋势 + ![Activity](https://gist.githubusercontent.com/Fu-Jie/db3d95687075a880af6f1fba76d679c6/raw/chart.svg) *完整统计与趋势图请查看 [社区统计报告](./docs/community-stats.zh.md)* diff --git a/docs/plugins/filters/async-context-compression.md b/docs/plugins/filters/async-context-compression.md index 361ec14..b3abf02 100644 --- a/docs/plugins/filters/async-context-compression.md +++ b/docs/plugins/filters/async-context-compression.md @@ -1,15 +1,13 @@ # Async Context Compression Filter -**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.4.0 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT +**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.4.1 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT This filter reduces token consumption in long conversations through intelligent summarization and message compression while keeping conversations coherent. 
-## What's new in 1.4.0 +## What's new in 1.4.1 -- **Atomic Message Grouping**: Introduced structure-aware grouping for `assistant-tool-tool-assistant` chains to prevent "No tool call found" errors. -- **Tail Boundary Alignment**: Implemented automatic correction for truncation points to ensure they don't fall inside a tool-calling sequence. -- **Chat Session Locking**: Added a session-based lock to prevent multiple concurrent summary tasks for the same chat ID. -- **Enhanced Traceability**: Improved summary formatting to include message IDs, names, and metadata for better context tracking. +- **Reverse-Unfolding Mechanism**: Accurately reconstructs the expanded native tool-calling sequence during the outlet phase to permanently fix coordinate drift and missing summaries for long tool-based conversations. +- **Safer Tool Trimming**: Refactored `enable_tool_output_trimming` to strictly use atomic block groups for safe trimming, completely preventing JSON payload corruption. --- diff --git a/docs/plugins/filters/async-context-compression.zh.md b/docs/plugins/filters/async-context-compression.zh.md index 9a1ca68..98794c8 100644 --- a/docs/plugins/filters/async-context-compression.zh.md +++ b/docs/plugins/filters/async-context-compression.zh.md @@ -1,17 +1,15 @@ # 异步上下文压缩过滤器 -**作者:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **版本:** 1.4.0 | **项目:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **许可证:** MIT +**作者:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **版本:** 1.4.1 | **项目:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **许可证:** MIT > **重要提示**:为了确保所有过滤器的可维护性和易用性,每个过滤器都应附带清晰、完整的文档,以确保其功能、配置和使用方法得到充分说明。 本过滤器通过智能摘要和消息压缩技术,在保持对话连贯性的同时,显著降低长对话的 Token 消耗。 -## 1.4.0 版本更新 +## 1.4.1 版本更新 -- **原子消息组 (Atomic Grouping)**: 引入结构感知的消息分组逻辑,确保工具调用链被整体保留或移除,彻底解决 "No tool call found" 错误。 -- **尾部边界自动对齐**: 实现了截断点的自动修正逻辑,确保历史上下文截断不会落在工具调用序列中间。 -- **会话级异步锁**: 增加了基于 `chat_id` 的后台任务锁,防止同一会话并发触发多个总结任务。 
-- **元数据溯源增强**: 优化了总结输入格式,在总结中保留了消息 ID、参与者名称及关键元数据,提升上下文可追踪性。 +- **逆向展开机制**: 引入 `_unfold_messages` 机制以在 `outlet` 阶段精确对齐坐标系,彻底解决了由于前端视图折叠导致长轮次工具调用对话出现进度漂移或跳过生成摘要的问题。 +- **更安全的工具内容裁剪**: 重构了 `enable_tool_output_trimming`,现在严格使用原子级分组进行安全的原生工具内容裁剪,替代了激进的正则表达式匹配,防止 JSON 载荷损坏。 --- diff --git a/docs/plugins/filters/index.md b/docs/plugins/filters/index.md index 135be0d..fcf0bb5 100644 --- a/docs/plugins/filters/index.md +++ b/docs/plugins/filters/index.md @@ -22,7 +22,7 @@ Filters act as middleware in the message pipeline: Reduces token consumption in long conversations through intelligent summarization while maintaining coherence. - **Version:** 1.4.0 + **Version:** 1.4.1 [:octicons-arrow-right-24: Documentation](async-context-compression.md) diff --git a/docs/plugins/filters/index.zh.md b/docs/plugins/filters/index.zh.md index 084eab1..377b346 100644 --- a/docs/plugins/filters/index.zh.md +++ b/docs/plugins/filters/index.zh.md @@ -22,7 +22,7 @@ Filter 充当消息管线中的中间件: 通过智能总结减少长对话的 token 消耗,同时保持连贯性。 - **版本:** 1.4.0 + **版本:** 1.4.1 [:octicons-arrow-right-24: 查看文档](async-context-compression.md) diff --git a/plugins/debug/async_context_compression/ISSUE_EXPLANATION.md b/plugins/debug/async_context_compression/ISSUE_EXPLANATION.md new file mode 100644 index 0000000..9f2c99e --- /dev/null +++ b/plugins/debug/async_context_compression/ISSUE_EXPLANATION.md @@ -0,0 +1,62 @@ +# 异步上下文压缩插件:当前问题与处理状态总结 + +这份文档详细梳理了我们在处理 `async_context_compression`(异步上下文压缩插件)时,遭遇的“幽灵截断”问题的根本原因,以及我们目前的解决进度。 + +## 1. 根本原因:两种截然不同的“世界观”(数据序列化差异) + +在我们之前的排查中,我曾错误地认为:`outlet`(后置处理阶段)拿到的 `body["messages"]` 是由于截断导致的残缺数据。 +但根据您提供的本地运行日志,**您是对的,`body['messages']` 确实包含了完整的对话历史**。 + +那么为什么长度会产生 `inlet 看到 27 条`,而 `outlet 只看到 8 条` 这种巨大的差异? + +原因在于,OpenWebUI 的管道在进入大模型前和从大模型返回后,使用了**两种完全不同的消息格式**: + +### 视图 A:Inlet 阶段(原生 API 展开视图) +- **特点**:严格遵循 OpenAI 函数调用规范。 +- **状态**:每一次工具调用、工具返回,都被视为一条独立的 message。 +- **例子**:一个包含了复杂搜索的对话。 + - User: 帮我查一下天气(1条) + - Assistant: 发起 tool_call(1条) + - Tool: 返回 JSON 结果(1条) + - ...多次往复... 
+ - **最终总计:27 条。**我们的压缩算法(trim)是基于这个 27 条的坐标系来计算保留多少条的。 + +### 视图 B:Outlet 阶段(UI HTML 折叠视图) +- **特点**:专为前端渲染优化的紧凑视图。 +- **状态**:OpenWebUI 在调用完模型后,为了让前端显示出那个好看的、可折叠的工具调用卡片,强行把中间所有的 Tool 交互过程,用 `
<details>...</details>
` 的 HTML 代码包裹起来,塞进了一个 `role: assistant` 的 `content` 字符串里! +- **例子**:同样的对话。 + - User: 帮我查一下天气(1条) + - Assistant: `
<details>包含了好多次工具调用和结果的代码</details>
今天天气很好...`(1条) + - **最终总计:8 条。** + +**💥 灾难发生点:** +原本的插件逻辑假定 `inlet` 和 `outlet` 共享同一个坐标系。 +1. 在 `inlet` 时,系统计算出:“我需要把前 10 条消息生成摘要,保留后 17 条”。 +2. 系统把“生成前10条摘要”的任务转入后台异步执行。 +3. 后台任务在 `outlet` 阶段被触发,此时它拿到的消息数组变成了**视图 B(总共只有 8 条)。** +4. 算法试图在只有 8 条消息的数组里,把“前 10 条消息”砍掉并替换为 1 条摘要。 +5. **结果就是:数组索引越界/坐标彻底错乱,触发报错,并且可能将最新的有效消息当成旧消息删掉(过度压缩)。** + +--- + +## 2. 目前已解决的问题 (✅ Done) + +为了立刻制止这种因为“坐标系错位”导致的数据破坏,我们已经落实了热修复(Local v1.4.0): + +**✅ 添加了“折叠视图”的探针防御:** +- 我写了一个函数 `_is_compact_tool_details_view`。 +- 现在,当后台触发生成摘要时,系统会自动扫描 `outlet` 传来的 `messages`。只要发现里面包含 `
<details>
` 这种带有 HTML 折叠标签的痕迹,就会**立刻终止并跳过**当前的摘要生成任务。 +- **收益**:彻底杜绝了因数组错位而引发的任务报错和强制裁切。UI 崩溃与历史丢失问题得到遏制。 + +--- + +## 3. 当前已解决的遗留问题 (✅ Done: 逆向展开修复) + +之前因为跳过生成而引入的新限制:**包含工具调用的长轮次对话,无法自动生成“历史摘要”** 的问题,现已彻底解决。 + +### 最终实施的技术方案: +我们通过源码分析发现,OpenWebUI 在进入 `inlet` 时会执行 `convert_output_to_messages` 还原工具调用链。因此,我们在插件的 `outlet` 阶段引入了相同的 **逆向展开 (Deflation/Unfolding)** 机制 `_unfold_messages`。 + +现在,当后台任务拿到 `outlet` 传来的折叠视图时,不会再选择“跳过”。而是自动提取出潜藏在消息对象体内部的原生 `output` 字段,并**将其重新展开为展开视图**(比如将 8 条假象重新还原为真实的 27 条底层数据),使得它的坐标系与 `inlet` 完全对齐。 + +至此,带有复杂工具调用的长轮次对话也能安全地进行背景自动压缩,不再有任何截断和强制删减的风险! \ No newline at end of file diff --git a/plugins/debug/async_context_compression/REPLY_TO_DHAERN_CN.md b/plugins/debug/async_context_compression/REPLY_TO_DHAERN_CN.md new file mode 100644 index 0000000..131a522 --- /dev/null +++ b/plugins/debug/async_context_compression/REPLY_TO_DHAERN_CN.md @@ -0,0 +1,60 @@ +# 回复 dhaern — 针对最新审查的跟进 + +感谢您重新审查了最新版本并提出了持续精准的分析意见。以下针对您剩余的两个关切点逐一回应。 + +--- + +### 1. `enable_tool_output_trimming` — 不是功能退化,而是行为变化是有意为之 + +裁剪逻辑依然存在且可正常运行。以下是当前版本与之前版本的行为对比。 + +**当前行为(`_trim_native_tool_outputs`,第 835–945 行):** +- 通过 `_get_atomic_groups` 遍历原子分组。 +- 识别有效的工具调用链:`assistant(tool_calls)` → `tool` → [可选的 assistant 跟进消息]。 +- 如果一条链内所有 `tool` 角色消息的字符数总和超过 **1,200 个字符**,则将 *tool 消息本身的内容* 折叠为一个本地化的 `[Content collapsed]` 占位符,并注入 `metadata.is_trimmed` 标志。 +- 同时遍历包含 `
<details>
` HTML 块的 assistant 消息,对其中尺寸过大的 `result` 属性进行相同的折叠处理。 +- 当 `enable_tool_output_trimming=True` 且 `function_calling=native` 时,该函数在 inlet 阶段被调用。 + +**与旧版本的区别:** +旧版的做法是改写 *assistant 跟进消息*,仅保留"最终答案"。新版的做法是折叠 *tool 响应内容本身*。两者都会缩减上下文体积,但新方法能够保留 tool 调用链的结构完整性(这是本次发布中原子分组工作的前提条件)。 + +插件头部的 docstring 里还有一段过时的描述("提取最终答案"),与实际行为相悖。最新提交中已将其更正为"将尺寸过大的原生工具输出折叠为简短占位符"。 + +如果您在寻找旧版本中"仅保留最终答案"的特定行为,该路径已被有意移除,因为它与本次发布引入的原子分组完整性保证相冲突。当前的折叠方案是安全的替代实现。 + +--- + +### 2. `compressed_message_count` — 修复是真实有效的;以下是坐标系追踪 + +您对"从已修改视图重新计算"的担忧,考虑到此前的架构背景,是完全可以理解的。以下精确说明为何当前代码不存在这一问题。 + +**`outlet` 中的关键变更:** +```python +db_messages = self._load_full_chat_messages(chat_id) +messages_to_unfold = db_messages if (db_messages and len(db_messages) >= len(messages)) else messages +summary_messages = self._unfold_messages(messages_to_unfold) +target_compressed_count = self._calculate_target_compressed_count(summary_messages) +``` + +`_load_full_chat_messages` 从 OpenWebUI 数据库中获取原始的持久化历史记录。由于在 inlet 渲染期间注入的合成 summary 消息**从未被回写到数据库**,从 DB 路径获取的 `summary_messages` 始终是干净的、未经修改的原始历史记录——没有 summary 标记,没有坐标膨胀。 + +在此干净列表上调用 `_calculate_target_compressed_count` 的计算逻辑如下(仍在原始历史坐标系内): +``` +original_count = len(db_messages) +raw_target = original_count - keep_last +target = atomic_align(raw_target) +``` + +这个 `target_compressed_count` 值原封不动地传递进 `_generate_summary_async`。在异步任务内部,同一批 `db_messages` 被切片为 `messages[start:target]` 来构建 `middle_messages`。生成完成后(可能从末尾进行原子截断),保存的值为: +```python +saved_compressed_count = start_index + len(middle_messages) +``` +这是原始 DB 消息列表中新摘要实际涵盖到的确切位置——不是目标值,也不是来自不同视图的估算值。 + +**回退路径(DB 不可用时)** 使用 inlet 渲染后的 body 消息。此时 `_get_summary_view_state` 会读取注入的 summary 标记的 `covered_until` 字段(该字段在写入时已记录为原子对齐后的 `start_index`),因此 `base_progress` 已经处于原始历史坐标系内,计算可以自然延续,不会混用两种视图。 + +简而言之:该字段在整个调用链中现在具有唯一、一致的语义——即原始持久化消息列表中,当前摘要文本实际覆盖到的索引位置。 + +--- + +再次感谢您严格的审查。您在上次发布后标记的这两个问题已得到处理,文档中的过时描述也已更正。如果发现其他问题,欢迎继续反馈。 diff --git a/plugins/debug/async_context_compression/REPLY_TO_DHAERN_EN.md 
b/plugins/debug/async_context_compression/REPLY_TO_DHAERN_EN.md new file mode 100644 index 0000000..5db9391 --- /dev/null +++ b/plugins/debug/async_context_compression/REPLY_TO_DHAERN_EN.md @@ -0,0 +1,60 @@ +# Reply to dhaern - Follow-up on the Latest Review + +Thank you for re-checking the latest version and for the continued precise analysis. Let me address your two remaining concerns directly. + +--- + +### 1. `enable_tool_output_trimming` — Not a regression; behavior change is intentional + +The trimming logic is present and functional. Here is what it does now versus before. + +**Current behavior (`_trim_native_tool_outputs`, lines 835–945):** +- Iterates over atomic groups via `_get_atomic_groups`. +- Identifies valid chains: `assistant(tool_calls)` → `tool` → [optional assistant follow-up]. +- If the combined character count of the `tool` role messages in a chain exceeds **1,200 characters**, it collapses *the tool messages themselves* to a localized `[Content collapsed]` placeholder and injects a `metadata.is_trimmed` flag. +- Separately walks assistant messages containing `
<details>
` HTML blocks and collapses oversized `result` attributes in the same way. +- The function is called at inlet when `enable_tool_output_trimming=True` and `function_calling=native`. + +**What is different from the previous version:** +The old approach rewrote the *assistant follow-up* message to keep only the "final answer". The new approach collapses the *tool response content* itself. Both reduce context size, but the new approach preserves the structural integrity of the tool-calling chain (which the atomic grouping work in this release depends on). + +The docstring in the plugin header also contained a stale description ("extract only the final answer") that contradicted the actual behavior. That has been corrected in the latest commit to accurately say "collapses oversized native tool outputs to a short placeholder." + +If you are looking for the specific "keep only the final answer" behavior from the old version, that path was intentionally removed because it conflicted with the atomic-group integrity guarantees introduced in this release. The current collapse approach is a safe replacement. + +--- + +### 2. `compressed_message_count` — The fix is real; here is the coordinate trace + +The concern about "recalculating from the already-modified view" is understandable given the previous architecture. Here is exactly why the current code does not have that problem. + +**Key change in `outlet`:** +```python +db_messages = self._load_full_chat_messages(chat_id) +messages_to_unfold = db_messages if (db_messages and len(db_messages) >= len(messages)) else messages +summary_messages = self._unfold_messages(messages_to_unfold) +target_compressed_count = self._calculate_target_compressed_count(summary_messages) +``` + +`_load_full_chat_messages` fetches the raw persisted history from the OpenWebUI database. 
Because the synthetic summary message (injected during inlet rendering) is **never written back to the database**, `summary_messages` from the DB path is always the clean, unmodified original history — no summary marker, no coordinate inflation. + +`_calculate_target_compressed_count` called on this clean list simply computes: +``` +original_count = len(db_messages) +raw_target = original_count - keep_last +target = atomic_align(raw_target) # still in original-history coordinates +``` + +This `target_compressed_count` value is then passed into `_generate_summary_async` unchanged. Inside the async task, the same `db_messages` list is sliced to `messages[start:target]` to build `middle_messages`. After generation (with potential atomic truncation from the end), the saved value is: +```python +saved_compressed_count = start_index + len(middle_messages) +``` +This is the exact position in the original DB message list that the new summary actually covers — not a target, not an estimate from a different view. + +**The fallback path (DB unavailable)** uses the inlet-rendered body messages. In that case `_get_summary_view_state` reads `covered_until` from the injected summary marker (which was written as the atomically-aligned `start_index`), so `base_progress` is already in original-history coordinates. The calculation naturally continues from there without mixing views. + +In short: the field now has a single, consistent meaning throughout the entire call chain — the index (in the original, persisted message list) that the current summary text actually covers. + +--- + +Thank you again for the rigorous review. The two points you flagged after the last release are now addressed, and the stale description in the documentation has been corrected. Please do let us know if you spot anything else. 
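P.S. For anyone tracing the coordinate math above, here is a toy sketch of the recovery step. The helper name and signature are hypothetical; the real plugin reads `covered_until` from the injected summary marker's metadata rather than taking it as a parameter.

```python
def recover_progress(covered_until: int, visible_tail_len: int, keep_last: int) -> tuple[int, int]:
    """Map a compressed visible view back into original-history coordinates.

    covered_until    -- index in the ORIGINAL history the summary covers up to
    visible_tail_len -- number of messages after the summary marker in the visible view
    keep_last        -- how many recent messages must stay uncompressed
    """
    # The original history length is the covered prefix plus the visible tail.
    original_count = covered_until + visible_tail_len
    # Never move the boundary backwards, and never eat into the protected tail.
    target_progress = max(covered_until, original_count - keep_last)
    return original_count, target_progress

# 10 messages already summarized, 17 visible after the marker, keep the last 8:
print(recover_progress(10, 17, 8))  # (27, 19)
```

The `max()` guard is what prevents a short visible tail from ever shrinking the saved boundary.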
diff --git a/plugins/filters/async-context-compression/README.md b/plugins/filters/async-context-compression/README.md index 361ec14..b3abf02 100644 --- a/plugins/filters/async-context-compression/README.md +++ b/plugins/filters/async-context-compression/README.md @@ -1,15 +1,13 @@ # Async Context Compression Filter -**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.4.0 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT +**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.4.1 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT This filter reduces token consumption in long conversations through intelligent summarization and message compression while keeping conversations coherent. -## What's new in 1.4.0 +## What's new in 1.4.1 -- **Atomic Message Grouping**: Introduced structure-aware grouping for `assistant-tool-tool-assistant` chains to prevent "No tool call found" errors. -- **Tail Boundary Alignment**: Implemented automatic correction for truncation points to ensure they don't fall inside a tool-calling sequence. -- **Chat Session Locking**: Added a session-based lock to prevent multiple concurrent summary tasks for the same chat ID. -- **Enhanced Traceability**: Improved summary formatting to include message IDs, names, and metadata for better context tracking. +- **Reverse-Unfolding Mechanism**: Accurately reconstructs the expanded native tool-calling sequence during the outlet phase to permanently fix coordinate drift and missing summaries for long tool-based conversations. +- **Safer Tool Trimming**: Refactored `enable_tool_output_trimming` to strictly use atomic block groups for safe trimming, completely preventing JSON payload corruption. 
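As a rough illustration of the reverse-unfolding idea, the sketch below assumes a simplified hidden `output` schema (a list of already-expanded native messages attached to the folded assistant message). The actual plugin relies on OpenWebUI's `convert_output_to_messages` and richer message fields.

```python
def unfold_messages(messages: list[dict]) -> list[dict]:
    """Expand a compact outlet view back into native tool-call coordinates.

    Assumption: each folded assistant message may carry a hidden `output`
    list holding the raw native sequence (assistant tool_calls -> tool -> assistant).
    """
    unfolded = []
    for msg in messages:
        hidden = msg.get("output")
        if msg.get("role") == "assistant" and hidden:
            # Replace the folded <details> transcript with the raw sequence.
            unfolded.extend(hidden)
        else:
            unfolded.append(msg)
    return unfolded

compact = [
    {"role": "user", "content": "What's the weather?"},
    {
        "role": "assistant",
        "content": "<details>...</details>Sunny today.",
        "output": [
            {"role": "assistant", "tool_calls": [{"id": "c1"}]},
            {"role": "tool", "tool_call_id": "c1", "content": '{"temp": 21}'},
            {"role": "assistant", "content": "Sunny today."},
        ],
    },
]
print(len(unfold_messages(compact)))  # 4
```

Running the summarizer on the unfolded list keeps its boundary arithmetic in the same coordinate system the inlet sees.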
--- diff --git a/plugins/filters/async-context-compression/README_CN.md b/plugins/filters/async-context-compression/README_CN.md index 9a1ca68..98794c8 100644 --- a/plugins/filters/async-context-compression/README_CN.md +++ b/plugins/filters/async-context-compression/README_CN.md @@ -1,17 +1,15 @@ # 异步上下文压缩过滤器 -**作者:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **版本:** 1.4.0 | **项目:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **许可证:** MIT +**作者:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **版本:** 1.4.1 | **项目:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **许可证:** MIT > **重要提示**:为了确保所有过滤器的可维护性和易用性,每个过滤器都应附带清晰、完整的文档,以确保其功能、配置和使用方法得到充分说明。 本过滤器通过智能摘要和消息压缩技术,在保持对话连贯性的同时,显著降低长对话的 Token 消耗。 -## 1.4.0 版本更新 +## 1.4.1 版本更新 -- **原子消息组 (Atomic Grouping)**: 引入结构感知的消息分组逻辑,确保工具调用链被整体保留或移除,彻底解决 "No tool call found" 错误。 -- **尾部边界自动对齐**: 实现了截断点的自动修正逻辑,确保历史上下文截断不会落在工具调用序列中间。 -- **会话级异步锁**: 增加了基于 `chat_id` 的后台任务锁,防止同一会话并发触发多个总结任务。 -- **元数据溯源增强**: 优化了总结输入格式,在总结中保留了消息 ID、参与者名称及关键元数据,提升上下文可追踪性。 +- **逆向展开机制**: 引入 `_unfold_messages` 机制以在 `outlet` 阶段精确对齐坐标系,彻底解决了由于前端视图折叠导致长轮次工具调用对话出现进度漂移或跳过生成摘要的问题。 +- **更安全的工具内容裁剪**: 重构了 `enable_tool_output_trimming`,现在严格使用原子级分组进行安全的原生工具内容裁剪,替代了激进的正则表达式匹配,防止 JSON 载荷损坏。 --- diff --git a/plugins/filters/async-context-compression/async_context_compression.py b/plugins/filters/async-context-compression/async_context_compression.py index b66ccc9..51bf7a2 100644 --- a/plugins/filters/async-context-compression/async_context_compression.py +++ b/plugins/filters/async-context-compression/async_context_compression.py @@ -5,17 +5,16 @@ author: Fu-Jie author_url: https://github.com/Fu-Jie/openwebui-extensions funding_url: https://github.com/open-webui description: Reduces token consumption in long conversations while maintaining coherence through intelligent summarization and message compression. 
-version: 1.4.0 +version: 1.4.1 openwebui_id: b1655bc8-6de9-4cad-8cb5-a6f7829a02ce license: MIT ═══════════════════════════════════════════════════════════════════════════════ -📌 What's new in 1.3.0 +📌 What's new in 1.4.1 ═══════════════════════════════════════════════════════════════════════════════ - ✅ Smart Status Display: Added `token_usage_status_threshold` valve (default 80%) to control when token usage status is shown, reducing unnecessary notifications. - ✅ Copilot SDK Integration: Automatically detects and skips compression for copilot_sdk based models to prevent conflicts. - ✅ Improved User Experience: Status messages now only appear when token usage exceeds the configured threshold, keeping the interface cleaner. + ✅ Reverse-Unfolding Mechanism: Accurately reconstructs the expanded native tool-calling sequence during the outlet phase to permanently fix coordinate drift and missing summaries for long tool-based conversations. + ✅ Safer Tool Trimming: Refactored `enable_tool_output_trimming` to strictly use atomic block groups for safe trimming, completely preventing JSON payload corruption. ═══════════════════════════════════════════════════════════════════════════════ 📌 Overview @@ -122,7 +121,7 @@ model_thresholds enable_tool_output_trimming Default: false - Description: When enabled and `function_calling: "native"` is active, trims verbose tool outputs to extract only the final answer. + Description: When enabled and `function_calling: "native"` is active, collapses oversized native tool outputs (role="tool" messages exceeding ~1200 chars) to a short placeholder, reducing context size while preserving tool-call chain structure. 
keep_first Default: 1 @@ -268,11 +267,14 @@ import hashlib import time import contextlib import logging +from copy import deepcopy from functools import lru_cache # Setup logger logger = logging.getLogger(__name__) +SUMMARY_METADATA_SOURCE = "async_context_compression" + # Open WebUI built-in imports from open_webui.utils.chat import generate_chat_completion from open_webui.models.users import Users @@ -280,6 +282,16 @@ from open_webui.models.models import Models from fastapi.requests import Request from open_webui.main import app as webui_app +try: + from open_webui.models.chats import Chats +except ModuleNotFoundError: # pragma: no cover - filter runs inside OpenWebUI + Chats = None + +try: + from open_webui.models.chat_messages import ChatMessages +except ModuleNotFoundError: # pragma: no cover - filter runs inside OpenWebUI + ChatMessages = None + # Open WebUI internal database (re-use shared connection) try: from open_webui.internal import db as owui_db @@ -612,6 +624,325 @@ class Filter: self._chat_locks[chat_id] = asyncio.Lock() return self._chat_locks[chat_id] + def _is_summary_message(self, message: Dict[str, Any]) -> bool: + """Return True when the message is this filter's injected summary marker.""" + metadata = message.get("metadata", {}) + if not isinstance(metadata, dict): + return False + return bool( + metadata.get("is_summary") + and metadata.get("source") == SUMMARY_METADATA_SOURCE + ) + + def _build_summary_message( + self, summary_text: str, lang: str, covered_until: int + ) -> Dict[str, Any]: + """Create a summary marker message with original-history progress metadata.""" + summary_content = ( + self._get_translation(lang, "summary_prompt_prefix") + + f"{summary_text}" + + self._get_translation(lang, "summary_prompt_suffix") + ) + return { + "role": "assistant", + "content": summary_content, + "metadata": { + "is_summary": True, + "source": SUMMARY_METADATA_SOURCE, + "covered_until": max(0, int(covered_until)), + }, + } + + def 
_get_summary_view_state(self, messages: List[Dict]) -> Dict[str, Optional[int]]: + """Inspect the current message view and recover summary marker metadata.""" + for index, message in enumerate(messages): + if not self._is_summary_message(message): + continue + + metadata = message.get("metadata", {}) + covered_until = metadata.get("covered_until", 0) + if not isinstance(covered_until, int) or covered_until < 0: + covered_until = 0 + + return { + "summary_index": index, + "base_progress": covered_until, + } + + return {"summary_index": None, "base_progress": 0} + + def _get_original_history_count(self, messages: List[Dict]) -> int: + """Map the current visible message list back to original-history size.""" + summary_state = self._get_summary_view_state(messages) + summary_index = summary_state["summary_index"] + base_progress = summary_state["base_progress"] or 0 + + if summary_index is None: + return len(messages) + + return base_progress + max(0, len(messages) - summary_index - 1) + + def _calculate_target_compressed_count(self, messages: List[Dict]) -> int: + """Calculate the next summary boundary in original-history coordinates.""" + summary_state = self._get_summary_view_state(messages) + summary_index = summary_state["summary_index"] + base_progress = summary_state["base_progress"] or 0 + + original_count = self._get_original_history_count(messages) + raw_target = max(base_progress, original_count - self.valves.keep_last) + + if summary_index is None: + protected_prefix = self._get_effective_keep_first(messages) + return self._align_tail_start_to_atomic_boundary( + messages, raw_target, protected_prefix + ) + + if raw_target <= base_progress: + return base_progress + + tail_messages = messages[summary_index + 1 :] + local_target = raw_target - base_progress + aligned_local_target = self._align_tail_start_to_atomic_boundary( + tail_messages, local_target, 0 + ) + return base_progress + aligned_local_target + + def _reconstruct_active_history_branch( + self, 
history_messages: Any, current_id: Optional[str] + ) -> List[Dict[str, Any]]: + """Rebuild the active chat branch from OpenWebUI `history.messages` data.""" + if not isinstance(history_messages, dict) or not history_messages: + return [] + + if isinstance(current_id, str) and current_id in history_messages: + ordered_messages: List[Dict[str, Any]] = [] + visited = set() + cursor = current_id + + while isinstance(cursor, str) and cursor and cursor not in visited: + visited.add(cursor) + node = history_messages.get(cursor) + if not isinstance(node, dict): + break + + ordered_messages.append(deepcopy(node)) + cursor = node.get("parentId") or node.get("parent_id") + + if ordered_messages: + ordered_messages.reverse() + return ordered_messages + + sortable_messages = [] + for index, node in enumerate(history_messages.values()): + if not isinstance(node, dict): + continue + + timestamp = node.get("timestamp") + if not isinstance(timestamp, (int, float)): + timestamp = node.get("created_at") + if not isinstance(timestamp, (int, float)): + timestamp = index + + sortable_messages.append((float(timestamp), index, deepcopy(node))) + + sortable_messages.sort(key=lambda item: (item[0], item[1])) + return [message for _, _, message in sortable_messages] + + def _load_full_chat_messages(self, chat_id: str) -> List[Dict[str, Any]]: + """Load the full persisted chat history for summary decisions when available.""" + if not chat_id or Chats is None: + return [] + + try: + chat_record = Chats.get_chat_by_id(chat_id) + except Exception as exc: + logger.warning(f"[Chat Load] Failed to fetch chat {chat_id}: {exc}") + return [] + + chat_payload = getattr(chat_record, "chat", None) + if not isinstance(chat_payload, dict): + return [] + + direct_messages = chat_payload.get("messages") + if isinstance(direct_messages, list) and direct_messages: + return deepcopy(direct_messages) + + history = chat_payload.get("history") + if not isinstance(history, dict): + return [] + + history_messages = 
history.get("messages") + if not isinstance(history_messages, dict) or not history_messages: + return [] + + current_id = history.get("currentId") or history.get("current_id") + return self._reconstruct_active_history_branch(history_messages, current_id) + + def _shorten_tool_call_id(self, tool_call_id: str, max_length: int = 40) -> str: + """Keep tool call IDs within provider limits while staying deterministic.""" + if not isinstance(tool_call_id, str): + return tool_call_id + + cleaned_id = tool_call_id.strip() + if len(cleaned_id) <= max_length: + return cleaned_id + + hash_suffix = hashlib.sha1(cleaned_id.encode("utf-8")).hexdigest()[:8] + prefix_length = max(0, max_length - len(hash_suffix) - 1) + return f"{cleaned_id[:prefix_length]}_{hash_suffix}" + + def _normalize_native_tool_call_ids(self, messages: List[Dict]) -> int: + """Normalize overlong native tool-call IDs and keep assistant/tool links aligned.""" + rewritten_ids: Dict[str, str] = {} + + for message in messages: + tool_calls = message.get("tool_calls") + if not isinstance(tool_calls, list): + continue + + for tool_call in tool_calls: + if not isinstance(tool_call, dict): + continue + + original_id = tool_call.get("id") + if not isinstance(original_id, str) or not original_id.strip(): + continue + + normalized_id = rewritten_ids.get(original_id) + if normalized_id is None: + normalized_id = self._shorten_tool_call_id(original_id) + rewritten_ids[original_id] = normalized_id + + tool_call["id"] = normalized_id + + if not rewritten_ids: + return 0 + + normalized_count = 0 + for message in messages: + tool_call_id = message.get("tool_call_id") + if not isinstance(tool_call_id, str): + continue + + normalized_id = rewritten_ids.get(tool_call_id) + if normalized_id and normalized_id != tool_call_id: + message["tool_call_id"] = normalized_id + normalized_count += 1 + + return sum(1 for old_id, new_id in rewritten_ids.items() if old_id != new_id) + + def _trim_native_tool_outputs(self, messages: 
List[Dict], lang: str) -> int: + """Collapse verbose native tool outputs while preserving tool-call structure.""" + trimmed_count = 0 + tool_trim_threshold_chars = 1200 + collapsed_text = self._get_translation(lang, "content_collapsed").strip() + + for group in self._get_atomic_groups(messages): + if len(group) < 2: + continue + + grouped_messages = [messages[index] for index in group] + first_message = grouped_messages[0] + trailing_messages = grouped_messages[1:] + + if not ( + first_message.get("role") == "assistant" + and first_message.get("tool_calls") + and trailing_messages + ): + continue + + last_message = grouped_messages[-1] + assistant_followup = None + tool_messages = trailing_messages + + if ( + len(grouped_messages) >= 3 + and last_message.get("role") == "assistant" + and all(msg.get("role") == "tool" for msg in grouped_messages[1:-1]) + ): + assistant_followup = last_message + tool_messages = grouped_messages[1:-1] + elif not all(msg.get("role") == "tool" for msg in trailing_messages): + continue + + tool_chars = sum(len(str(msg.get("content", ""))) for msg in tool_messages) + if tool_chars < tool_trim_threshold_chars: + continue + + for tool_message in tool_messages: + metadata = tool_message.get("metadata", {}) + if not isinstance(metadata, dict): + metadata = {} + metadata["is_trimmed"] = True + metadata["trimmed_by"] = SUMMARY_METADATA_SOURCE + tool_message["metadata"] = metadata + tool_message["content"] = collapsed_text + trimmed_count += 1 + + if assistant_followup is not None: + final_content = assistant_followup.get("content", "") + if isinstance(final_content, str) and final_content.strip(): + assistant_metadata = assistant_followup.get("metadata", {}) + if not isinstance(assistant_metadata, dict): + assistant_metadata = {} + if not assistant_metadata.get("tool_outputs_trimmed"): + assistant_followup["content"] = self._get_translation( + lang, "tool_trimmed", content=final_content + ) + assistant_metadata["tool_outputs_trimmed"] = True + 
assistant_metadata["trimmed_by"] = SUMMARY_METADATA_SOURCE + assistant_followup["metadata"] = assistant_metadata + + for message in messages: + content = message.get("content", "") + if ( + not isinstance(content, str) + or '
str: + nonlocal trimmed_blocks + block = match.group(0) + result_match = re.search(r'result="([^"]*)"', block) + + if not result_match: + return block + + if len(result_match.group(1)) < tool_trim_threshold_chars: + return block + + trimmed_blocks += 1 + return re.sub( + r'result="([^"]*)"', + f'result=""{collapsed_text}""', + block, + count=1, + ) + + new_content = re.sub( + r'
', + _replace_tool_block, + content, + ) + + if trimmed_blocks <= 0: + continue + + metadata = message.get("metadata", {}) + if not isinstance(metadata, dict): + metadata = {} + metadata["tool_outputs_trimmed"] = True + metadata["trimmed_by"] = SUMMARY_METADATA_SOURCE + message["metadata"] = metadata + message["content"] = new_content + trimmed_count += trimmed_blocks + + return trimmed_count + def _get_atomic_groups(self, messages: List[Dict]) -> List[List[int]]: """ Groups message indices into atomic units that must be kept or dropped together. @@ -724,17 +1055,21 @@ class Filter: if __event_call__: try: js_code = """ - return ( - document.documentElement.lang || - localStorage.getItem('locale') || - localStorage.getItem('language') || - navigator.language || - 'en-US' - ); + try { + return ( + document.documentElement.lang || + localStorage.getItem('locale') || + localStorage.getItem('language') || + navigator.language || + 'en-US' + ); + } catch (e) { + return 'en-US'; + } """ frontend_lang = await asyncio.wait_for( __event_call__({"type": "execute", "data": {"code": js_code}}), - timeout=1.0, + timeout=2.0, ) if frontend_lang and isinstance(frontend_lang, str): user_language = frontend_lang @@ -1133,6 +1468,256 @@ class Filter: "message_id": str(message_id).strip(), } + def _infer_native_function_calling_from_messages(self, messages: Any) -> bool: + """Infer native function-calling mode from tool-shaped messages.""" + if not isinstance(messages, list): + return False + + for message in messages: + if not isinstance(message, dict): + continue + + tool_calls = message.get("tool_calls") + if isinstance(tool_calls, list) and tool_calls: + return True + + if message.get("role") == "tool": + return True + + content = message.get("content", "") + if isinstance(content, str) and '
List[Dict[str, Any]]: + """Build a compact structural summary of recent messages for debugging.""" + if not isinstance(messages, list): + return [] + + summary = [] + for index, message in enumerate(messages[:limit]): + if not isinstance(message, dict): + summary.append( + { + "index": index, + "type": type(message).__name__, + } + ) + continue + + content = message.get("content", "") + tool_calls = message.get("tool_calls") + metadata = message.get("metadata", {}) + + entry = { + "index": index, + "role": message.get("role", "unknown"), + "has_tool_calls": bool(isinstance(tool_calls, list) and tool_calls), + "tool_call_count": len(tool_calls) + if isinstance(tool_calls, list) + else 0, + "tool_call_id_lengths": [ + len(str(tc.get("id", ""))) + for tc in tool_calls[:3] + if isinstance(tc, dict) + ] + if isinstance(tool_calls, list) + else [], + "has_tool_call_id": isinstance(message.get("tool_call_id"), str), + "tool_call_id_length": len(str(message.get("tool_call_id", ""))) + if isinstance(message.get("tool_call_id"), str) + else 0, + "content_type": type(content).__name__, + "content_length": len(content) if isinstance(content, str) else 0, + "has_tool_details_block": isinstance(content, str) + and '
Dict[str, Any]: + """Collect a structural snapshot of the request for tool-calling diagnosis.""" + if not isinstance(body, dict): + return {"body_type": type(body).__name__} + + messages = body.get("messages", []) + metadata = body.get("metadata", {}) + params = body.get("params", {}) + + role_counts: Dict[str, int] = {} + tool_detail_blocks = 0 + tool_role_indices = [] + assistant_tool_call_indices = [] + + if isinstance(messages, list): + for index, message in enumerate(messages): + if not isinstance(message, dict): + continue + + role = str(message.get("role", "unknown")) + role_counts[role] = role_counts.get(role, 0) + 1 + + if role == "tool": + tool_role_indices.append(index) + + tool_calls = message.get("tool_calls") + if isinstance(tool_calls, list) and tool_calls: + assistant_tool_call_indices.append(index) + + content = message.get("content", "") + if isinstance(content, str) and '
Dict[str, Any]: + """Collect compact summary-boundary diagnostics for a message list.""" + if not isinstance(messages, list): + return {"messages_type": type(messages).__name__} + + summary_state = self._get_summary_view_state(messages) + sample = [] + for index, message in enumerate(messages[:4]): + if not isinstance(message, dict): + sample.append({"index": index, "type": type(message).__name__}) + continue + + content = message.get("content", "") + sample.append( + { + "index": index, + "role": message.get("role", "unknown"), + "id": message.get("id", ""), + "parentId": message.get("parentId") or message.get("parent_id"), + "tool_call_id": message.get("tool_call_id", ""), + "tool_call_count": len(message.get("tool_calls", [])) + if isinstance(message.get("tool_calls"), list) + else 0, + "is_summary": self._is_summary_message(message), + "content_length": len(content) if isinstance(content, str) else 0, + } + ) + + tail_sample = [] + start_index = max(0, len(messages) - 3) + for index, message in enumerate(messages[start_index:], start=start_index): + if not isinstance(message, dict): + tail_sample.append({"index": index, "type": type(message).__name__}) + continue + + content = message.get("content", "") + tail_sample.append( + { + "index": index, + "role": message.get("role", "unknown"), + "id": message.get("id", ""), + "parentId": message.get("parentId") or message.get("parent_id"), + "tool_call_id": message.get("tool_call_id", ""), + "tool_call_count": len(message.get("tool_calls", [])) + if isinstance(message.get("tool_calls"), list) + else 0, + "is_summary": self._is_summary_message(message), + "content_length": len(content) if isinstance(content, str) else 0, + } + ) + + return { + "message_count": len(messages), + "summary_state": summary_state, + "original_history_count": self._get_original_history_count(messages), + "target_compressed_count": self._calculate_target_compressed_count(messages), + "effective_keep_first": 
self._get_effective_keep_first(messages), + "head_sample": sample, + "tail_sample": tail_sample, + } + + def _unfold_messages(self, messages: Any) -> List[Dict[str, Any]]: + """ + Reverse-expand compact UI messages back into their native tool-calling sequence + by parsing the hidden 'output' dictionary, identical to what OpenWebUI does + in the inlet phase (middleware.py:process_messages_with_output). + """ + if not isinstance(messages, list): + return messages + + unfolded = [] + for msg in messages: + if not isinstance(msg, dict): + unfolded.append(msg) + continue + + # If it's an assistant message with the hidden 'output' field, unfold it + if msg.get("role") == "assistant" and isinstance(msg.get("output"), list) and msg.get("output"): + try: + from open_webui.utils.misc import convert_output_to_messages + expanded = convert_output_to_messages(msg["output"], raw=True) + if expanded: + unfolded.extend(expanded) + continue + except ImportError: + pass # Fallback if for some reason the internal import fails + + # Clean message (strip 'output' field just like inlet does) + clean_msg = {k: v for k, v in msg.items() if k != "output"} + unfolded.append(clean_msg) + + return unfolded + + def _get_function_calling_mode(self, body: dict) -> str: + """Read function-calling mode from all known OpenWebUI payload locations.""" + metadata = body.get("metadata", {}) if isinstance(body, dict) else {} + params = body.get("params", {}) if isinstance(body, dict) else {} + messages = body.get("messages", []) if isinstance(body, dict) else [] + + if isinstance(metadata, dict): + mode = metadata.get("function_calling") + if isinstance(mode, str) and mode.strip(): + return mode.strip() + + if isinstance(params, dict): + mode = params.get("function_calling") + if isinstance(mode, str) and mode.strip(): + return mode.strip() + + if self._infer_native_function_calling_from_messages(messages): + return "native" + + return "" + async def _emit_debug_log( self, __event_call__, @@ -1166,26 
+1751,33 @@ class Filter: # Construct JS code js_code = f""" (async function() {{ - console.group("🗜️ Async Context Compression Debug"); - console.log("Chat ID:", {json.dumps(chat_id)}); - console.log("Messages:", {original_count} + " -> " + {compressed_count}); - console.log("Compression Ratio:", {json.dumps(log_data['ratio'])}); - console.log("Summary Length:", {summary_length} + " chars"); - console.log("Configuration:", {{ - "Keep First": {kept_first}, - "Keep Last": {kept_last} - }}); - console.groupEnd(); + try {{ + console.group("🗜️ Async Context Compression Debug"); + console.log("Chat ID:", {json.dumps(chat_id)}); + console.log("Messages:", {original_count} + " -> " + {compressed_count}); + console.log("Compression Ratio:", {json.dumps(log_data['ratio'])}); + console.log("Summary Length:", {summary_length} + " chars"); + console.log("Configuration:", {{ + "Keep First": {kept_first}, + "Keep Last": {kept_last} + }}); + console.groupEnd(); + return true; + }} catch (e) {{ + console.error("[Compression] Failed to emit summary debug log", e); + return false; + }} }})(); """ - asyncio.create_task( + await asyncio.wait_for( __event_call__( { "type": "execute", "data": {"code": js_code}, } - ) + ), + timeout=2.0, ) except Exception as e: logger.error(f"Error emitting debug log: {e}") @@ -1225,11 +1817,24 @@ class Filter: safe_message = clean_message.replace('"', '\\"').replace("\n", "\\n") js_code = f""" - console.log("%c[Compression] {safe_message}", "{css}"); + try {{ + console.log("%c[Compression] {safe_message}", "{css}"); + return true; + }} catch (e) {{ + console.error("[Compression] Failed to emit console log", e); + return false; + }} """ - asyncio.create_task( - event_call({"type": "execute", "data": {"code": js_code}}) + await asyncio.wait_for( + event_call({"type": "execute", "data": {"code": js_code}}), + timeout=2.0, ) + except ValueError as ve: + if "broadcast" in str(ve).lower(): + logger.debug("Cannot broadcast to frontend without explicit room; 
suppressing further frontend logs in this session.") + self.valves.show_debug_log = False + else: + logger.error(f"Failed to process log to frontend: ValueError: {ve}") except Exception as e: logger.error( f"Failed to process log to frontend: {type(e).__name__}: {e}" @@ -1292,37 +1897,72 @@ class Filter: Compression Strategy: Only responsible for injecting existing summaries, no Token calculation. """ - # Check if compression should be skipped (e.g., for copilot_sdk) if self._should_skip_compression(body, __model__): if self.valves.debug_mode: logger.info( "[Inlet] Skipping compression: copilot_sdk detected in base model" ) - if self.valves.show_debug_log and __event_call__: - await self._log( - "[Inlet] ⏭️ Skipping compression: copilot_sdk detected", - event_call=__event_call__, - ) return body messages = body.get("messages", []) + user_ctx = await self._get_user_context(__user__, __event_call__) + lang = user_ctx["user_language"] + + if self.valves.show_debug_log and __event_call__: + debug_snapshot = self._build_native_tool_debug_snapshot(body) + await self._log( + "[Inlet] 🧩 Request structure snapshot: " + + json.dumps(debug_snapshot, ensure_ascii=False), + event_call=__event_call__, + ) + + normalized_tool_call_count = self._normalize_native_tool_call_ids(messages) + if ( + normalized_tool_call_count > 0 + and self.valves.show_debug_log + and __event_call__ + ): + await self._log( + f"[Inlet] 🪪 Normalized {normalized_tool_call_count} overlong tool call ID(s).", + event_call=__event_call__, + ) # --- Native Tool Output Trimming (Opt-in, only for native function calling) --- - metadata = body.get("metadata", {}) - is_native_func_calling = metadata.get("function_calling") == "native" + function_calling_mode = self._get_function_calling_mode(body) + is_native_func_calling = function_calling_mode == "native" + + if self.valves.show_debug_log and __event_call__: + trimming_state = ( + "enabled" if self.valves.enable_tool_output_trimming else "disabled" + ) + await 
self._log( + "[Inlet] ✂️ Tool trimming check: " + f"state={trimming_state}, function_calling={function_calling_mode or 'unset'}, " + f"message_count={len(messages)}", + event_call=__event_call__, + ) if self.valves.enable_tool_output_trimming and is_native_func_calling: - trimmed_count = 0 - - for msg in messages: - content = msg.get("content", "") - if not isinstance(content, str): - continue - if trimmed_count > 0 and self.valves.show_debug_log and __event_call__: + trimmed_count = self._trim_native_tool_outputs(messages, lang) + if self.valves.show_debug_log and __event_call__: await self._log( - f"[Inlet] ✂️ Trimmed {trimmed_count} tool output message(s).", + ( + f"[Inlet] ✂️ Trimmed {trimmed_count} tool output message(s)." + if trimmed_count > 0 + else "[Inlet] ✂️ Tool trimming checked, but no oversized native tool outputs were found." + ), event_call=__event_call__, ) + elif self.valves.show_debug_log and __event_call__: + skip_reason = ( + "tool trimming disabled" + if not self.valves.enable_tool_output_trimming + else f"function_calling={function_calling_mode or 'unset'}" + ) + await self._log( + f"[Inlet] ✂️ Tool trimming skipped: {skip_reason}.", + event_call=__event_call__, + ) chat_ctx = self._get_chat_context(body, __metadata__) chat_id = chat_ctx["chat_id"] @@ -1502,13 +2142,9 @@ class Filter: event_call=__event_call__, ) - # Record the target compression progress for the original messages, for use in outlet - # Target is to compress up to the (total - keep_last) message - target_compressed_count = max(0, len(messages) - self.valves.keep_last) - - # Get user context for i18n - user_ctx = await self._get_user_context(__user__, __event_call__) - lang = user_ctx["user_language"] + # Log the aligned compression boundary using the same original-history + # coordinate mapping as outlet/async summary generation. 
+ target_compressed_count = self._calculate_target_compressed_count(messages) await self._log( f"[Inlet] Recorded target compression progress: {target_compressed_count}", @@ -1537,21 +2173,21 @@ class Filter: if effective_keep_first > 0: head_messages = messages[:effective_keep_first] - # 2. Summary message (Inserted as Assistant message) - summary_content = ( - self._get_translation(lang, "summary_prompt_prefix") - + f"{summary_record.summary}" - + self._get_translation(lang, "summary_prompt_suffix") - ) - summary_msg = {"role": "assistant", "content": summary_content} - - # 3. Tail messages (Tail) - All messages starting from the last compression point. + # 2. Tail messages (Tail) - All messages starting from the last compression point. # Align legacy/raw progress to an atomic boundary so old summary rows do not # reintroduce orphaned tool messages into the retained tail. raw_start_index = max(compressed_count, effective_keep_first) start_index = self._align_tail_start_to_atomic_boundary( messages, raw_start_index, effective_keep_first ) + + # 3. 
Summary message (Inserted as Assistant message) + summary_msg = self._build_summary_message( + summary_record.summary, + lang, + start_index, + ) + tail_messages = messages[start_index:] if self.valves.show_debug_log and __event_call__: @@ -1657,6 +2293,7 @@ class Filter: final_messages = candidate_messages # Calculate detailed token stats for logging + summary_content = summary_msg.get("content", "") if total_tokens == estimated_tokens: system_tokens = ( len(system_prompt_msg.get("content", "")) // 4 @@ -1890,14 +2527,50 @@ class Filter: model = body.get("model") or "" messages = body.get("messages", []) + if self.valves.show_debug_log and __event_call__: + outlet_snapshot = self._build_native_tool_debug_snapshot(body) + outlet_progress = self._build_summary_progress_snapshot(messages) + await self._log( + "[Outlet] 🧩 Body structure snapshot: " + + json.dumps(outlet_snapshot, ensure_ascii=False), + event_call=__event_call__, + ) + await self._log( + "[Outlet] 📐 Body summary-progress snapshot: " + + json.dumps(outlet_progress, ensure_ascii=False), + event_call=__event_call__, + ) + + # Unfold compact tool messages to align with inlet's exact coordinate system + # In the outlet phase, the frontend payload often lacks the hidden 'output' field. + # We try to load the full, raw history from the database first. 
+ db_messages = self._load_full_chat_messages(chat_id) + messages_to_unfold = db_messages if (db_messages and len(db_messages) >= len(messages)) else messages + + summary_messages = self._unfold_messages(messages_to_unfold) + message_source = "outlet-db-unfolded" if db_messages and len(summary_messages) != len(messages) else "outlet-body-unfolded" if len(summary_messages) != len(messages) else "outlet-body" + + if self.valves.show_debug_log and __event_call__: + source_progress = self._build_summary_progress_snapshot(summary_messages) + await self._log( + f"[Outlet] 📚 Summary source messages: {message_source} ({len(summary_messages)} msgs, body carried {len(messages)})", + event_call=__event_call__, + ) + await self._log( + "[Outlet] 📐 Summary source progress snapshot: " + + json.dumps(source_progress, ensure_ascii=False), + event_call=__event_call__, + ) + # Calculate target compression progress directly, then align it to an atomic # boundary so the saved summary never cuts through a tool-calling block. - effective_keep_first = self._get_effective_keep_first(messages) - raw_target_compressed_count = max(0, len(messages) - self.valves.keep_last) - target_compressed_count = self._align_tail_start_to_atomic_boundary( - messages, raw_target_compressed_count, effective_keep_first + target_compressed_count = self._calculate_target_compressed_count( + summary_messages ) + summary_body = dict(body) + summary_body["messages"] = summary_messages + # Process Token calculation and summary generation asynchronously in the background # Use a lock to prevent multiple concurrent summary tasks for the same chat chat_lock = self._get_chat_lock(chat_id) @@ -1914,7 +2587,7 @@ class Filter: chat_lock, chat_id, model, - body, + summary_body, __user__, target_compressed_count, lang, @@ -2098,38 +2771,43 @@ class Filter: """ Generates summary asynchronously (runs in background, does not block response). Logic: - 1. Extract middle messages (remove keep_first and keep_last). - 2. 
Check Token limit, if exceeding max_context_tokens, remove from the head of middle messages. - 3. Generate summary for the remaining middle messages. + 1. Extract the visible message slice that maps to the next original-history boundary. + 2. If the summary model window is smaller than that slice, keep the oldest slice and trim the newest atomic groups. + 3. Generate summary for the remaining messages and save the exact covered boundary. """ try: await self._log( f"\n[🤖 Async Summary Task] Starting...", event_call=__event_call__ ) - # 1. Get target compression progress - # If target_compressed_count is not passed (should not happen with new logic), estimate it + # 1. Get target compression progress in original-history coordinates. if target_compressed_count is None: - target_compressed_count = max(0, len(messages) - self.valves.keep_last) + target_compressed_count = self._calculate_target_compressed_count( + messages + ) await self._log( f"[🤖 Async Summary Task] ⚠️ target_compressed_count is None, estimating: {target_compressed_count}", log_type="warning", event_call=__event_call__, ) - # 2. Determine the range of messages to compress (Middle). - # Use the same aligned boundary used for summary persistence so the tail - # always starts at an atomic-group boundary. - start_index = self._get_effective_keep_first(messages) - if target_compressed_count is None: - raw_end_index = max(0, len(messages) - self.valves.keep_last) - end_index = self._align_tail_start_to_atomic_boundary( - messages, raw_end_index, start_index - ) + # 2. Determine the visible message range that maps to the target original + # compression progress. 
+ summary_state = self._get_summary_view_state(messages) + summary_index = summary_state["summary_index"] + base_progress = summary_state["base_progress"] or 0 + + if summary_index is None: + start_index = self._get_effective_keep_first(messages) + end_index = min(len(messages), target_compressed_count) + protected_prefix = 0 else: - end_index = self._align_tail_start_to_atomic_boundary( - messages, target_compressed_count, start_index + start_index = summary_index + end_index = min( + len(messages), + summary_index + 1 + max(0, target_compressed_count - base_progress), ) + protected_prefix = 1 # Ensure indices are valid if start_index >= end_index: @@ -2208,22 +2886,25 @@ class Filter: event_call=__event_call__, ) - # Remove from the head of middle_messages using atomic groups - # to avoid creating orphaned tool-call/tool-result pairs. + # Trim newest messages first so saved progress still reflects the exact + # original-history boundary actually covered by the summary. removed_tokens = 0 removed_count = 0 - summary_atomic_groups = self._get_atomic_groups(middle_messages) + trimmable_middle = middle_messages[protected_prefix:] + summary_atomic_groups = self._get_atomic_groups(trimmable_middle) while removed_tokens < excess_tokens and len(summary_atomic_groups) > 1: - group_indices = summary_atomic_groups.pop(0) + group_indices = summary_atomic_groups.pop() for _ in range(len(group_indices)): - msg_to_remove = middle_messages.pop(0) + msg_to_remove = trimmable_middle.pop() msg_tokens = self._count_tokens( str(msg_to_remove.get("content", "")) ) removed_tokens += msg_tokens removed_count += 1 + middle_messages = middle_messages[:protected_prefix] + trimmable_middle + await self._log( f"[🤖 Async Summary Task] Removed {removed_count} messages (atomic), totaling {removed_tokens} Tokens", event_call=__event_call__, @@ -2272,6 +2953,13 @@ class Filter: ) return + if summary_index is None: + saved_compressed_count = start_index + len(middle_messages) + else: + 
saved_compressed_count = base_progress + max( + 0, len(middle_messages) - protected_prefix + ) + # 6. Save new summary await self._log( "[Optimization] Saving summary in a background thread to avoid blocking the event loop.", @@ -2279,7 +2967,7 @@ class Filter: ) await asyncio.to_thread( - self._save_summary, chat_id, new_summary, target_compressed_count + self._save_summary, chat_id, new_summary, saved_compressed_count ) # Send completion status notification @@ -2304,7 +2992,7 @@ class Filter: event_call=__event_call__, ) await self._log( - f"[🤖 Async Summary Task] Progress update: Compressed up to original message {target_compressed_count}", + f"[🤖 Async Summary Task] Progress update: Compressed up to original message {saved_compressed_count}", event_call=__event_call__, ) @@ -2334,41 +3022,32 @@ class Filter: except Exception: pass # Ignore DB errors here, best effort - # 2. Calculate Effective Keep First - last_system_index = -1 - for i, msg in enumerate(messages): - if msg.get("role") == "system": - last_system_index = i - effective_keep_first = max( - self.valves.keep_first, last_system_index + 1 + # 2. Construct Next Context using the saved original-history boundary. + next_summary_msg = self._build_summary_message( + new_summary, lang, saved_compressed_count ) + if summary_index is None: + effective_keep_first = self._get_effective_keep_first(messages) + head_msgs = ( + messages[:effective_keep_first] + if effective_keep_first > 0 + else [] + ) + visible_tail_start = max( + saved_compressed_count, effective_keep_first + ) + else: + head_msgs = messages[:summary_index] + visible_tail_start = ( + summary_index + + 1 + + max(0, saved_compressed_count - base_progress) + ) - # 3. 
Construct Next Context - # Head - head_msgs = ( - messages[:effective_keep_first] - if effective_keep_first > 0 - else [] - ) - - # Summary - summary_content = ( - self._get_translation(lang, "summary_prompt_prefix") - + f"{new_summary}" - + self._get_translation(lang, "summary_prompt_suffix") - ) - summary_msg = {"role": "assistant", "content": summary_content} - - # Tail (using target_compressed_count which is what we just compressed up to) - # Note: target_compressed_count is the index *after* the last compressed message? - # In _generate_summary_async, target_compressed_count is passed in. - # It represents the number of messages to be covered by summary (excluding keep_last). - # So tail starts at max(target_compressed_count, effective_keep_first). - start_index = max(target_compressed_count, effective_keep_first) - tail_msgs = messages[start_index:] + tail_msgs = messages[visible_tail_start:] # Assemble - next_context = head_msgs + [summary_msg] + tail_msgs + next_context = head_msgs + [next_summary_msg] + tail_msgs # Inject system prompt if needed if system_prompt_msg: @@ -2392,7 +3071,7 @@ class Filter: if self._should_show_status(usage_ratio): status_msg = self._get_translation( lang, - "status_context_summary_updated", + "status_context_usage", tokens=token_count, max_tokens=max_context_tokens, ratio=f"{usage_ratio*100:.1f}", diff --git a/plugins/filters/async-context-compression/test_async_context_compression.py b/plugins/filters/async-context-compression/test_async_context_compression.py new file mode 100644 index 0000000..efa1263 --- /dev/null +++ b/plugins/filters/async-context-compression/test_async_context_compression.py @@ -0,0 +1,461 @@ +import asyncio +import importlib.util +import os +import sys +import types +import unittest + + +PLUGIN_PATH = os.path.join(os.path.dirname(__file__), "async_context_compression.py") +MODULE_NAME = "async_context_compression_under_test" + + +def _ensure_module(name: str) -> types.ModuleType: + module = 
sys.modules.get(name) + if module is None: + module = types.ModuleType(name) + sys.modules[name] = module + return module + + +def _install_openwebui_stubs() -> None: + _ensure_module("open_webui") + _ensure_module("open_webui.utils") + chat_module = _ensure_module("open_webui.utils.chat") + _ensure_module("open_webui.models") + users_module = _ensure_module("open_webui.models.users") + models_module = _ensure_module("open_webui.models.models") + chats_module = _ensure_module("open_webui.models.chats") + main_module = _ensure_module("open_webui.main") + _ensure_module("fastapi") + fastapi_requests = _ensure_module("fastapi.requests") + + async def generate_chat_completion(*args, **kwargs): + return {} + + class DummyUsers: + pass + + class DummyModels: + @staticmethod + def get_model_by_id(model_id): + return None + + class DummyChats: + @staticmethod + def get_chat_by_id(chat_id): + return None + + class DummyRequest: + pass + + chat_module.generate_chat_completion = generate_chat_completion + users_module.Users = DummyUsers + models_module.Models = DummyModels + chats_module.Chats = DummyChats + main_module.app = object() + fastapi_requests.Request = DummyRequest + + +_install_openwebui_stubs() +spec = importlib.util.spec_from_file_location(MODULE_NAME, PLUGIN_PATH) +module = importlib.util.module_from_spec(spec) +sys.modules[MODULE_NAME] = module +assert spec.loader is not None +spec.loader.exec_module(module) +module.Filter._init_database = lambda self: None + + +class TestAsyncContextCompression(unittest.TestCase): + def setUp(self): + self.filter = module.Filter() + + def test_inlet_logs_tool_trimming_outcome_when_no_oversized_outputs(self): + self.filter.valves.show_debug_log = True + self.filter.valves.enable_tool_output_trimming = True + + logged_messages = [] + + async def fake_log(message, log_type="info", event_call=None): + logged_messages.append(message) + + async def fake_user_context(__user__, __event_call__): + return {"user_language": "en-US"} + + 
async def fake_event_call(_payload): + return True + + self.filter._log = fake_log + self.filter._get_user_context = fake_user_context + self.filter._get_chat_context = lambda body, metadata=None: { + "chat_id": "", + "message_id": "", + } + self.filter._get_latest_summary = lambda chat_id: None + + body = { + "params": {"function_calling": "native"}, + "messages": [ + { + "role": "assistant", + "tool_calls": [{"id": "call_1", "type": "function"}], + "content": "", + }, + {"role": "tool", "content": "short result"}, + {"role": "assistant", "content": "Final answer"}, + ], + } + + asyncio.run(self.filter.inlet(body, __event_call__=fake_event_call)) + + self.assertTrue( + any("Tool trimming check:" in message for message in logged_messages) + ) + self.assertTrue( + any( + "no oversized native tool outputs were found" in message + for message in logged_messages + ) + ) + + def test_inlet_logs_tool_trimming_skip_reason_when_disabled(self): + self.filter.valves.show_debug_log = True + self.filter.valves.enable_tool_output_trimming = False + + logged_messages = [] + + async def fake_log(message, log_type="info", event_call=None): + logged_messages.append(message) + + async def fake_user_context(__user__, __event_call__): + return {"user_language": "en-US"} + + async def fake_event_call(_payload): + return True + + self.filter._log = fake_log + self.filter._get_user_context = fake_user_context + self.filter._get_chat_context = lambda body, metadata=None: { + "chat_id": "", + "message_id": "", + } + self.filter._get_latest_summary = lambda chat_id: None + + body = {"messages": [], "params": {"function_calling": "native"}} + + asyncio.run(self.filter.inlet(body, __event_call__=fake_event_call)) + + self.assertTrue( + any("Tool trimming skipped: tool trimming disabled" in message for message in logged_messages) + ) + + def test_normalize_native_tool_call_ids_keeps_links_aligned(self): + long_tool_call_id = "call_abcdefghijklmnopqrstuvwxyz_1234567890abcd" + messages = [ + { + 
"role": "assistant", + "tool_calls": [ + { + "id": long_tool_call_id, + "type": "function", + "function": {"name": "search", "arguments": "{}"}, + } + ], + "content": "", + }, + { + "role": "tool", + "tool_call_id": long_tool_call_id, + "content": "tool result", + }, + ] + + normalized_count = self.filter._normalize_native_tool_call_ids(messages) + + normalized_id = messages[0]["tool_calls"][0]["id"] + self.assertEqual(normalized_count, 1) + self.assertLessEqual(len(normalized_id), 40) + self.assertNotEqual(normalized_id, long_tool_call_id) + self.assertEqual(messages[1]["tool_call_id"], normalized_id) + + def test_trim_native_tool_outputs_restores_real_behavior(self): + messages = [ + { + "role": "assistant", + "tool_calls": [{"id": "call_1", "type": "function"}], + "content": "", + }, + {"role": "tool", "content": "x" * 1600}, + {"role": "assistant", "content": "Final answer"}, + ] + + trimmed_count = self.filter._trim_native_tool_outputs(messages, "en-US") + + self.assertEqual(trimmed_count, 1) + self.assertEqual(messages[1]["content"], "... [Content collapsed] ...") + self.assertTrue(messages[1]["metadata"]["is_trimmed"]) + self.assertTrue(messages[2]["metadata"]["tool_outputs_trimmed"]) + self.assertIn("Final answer", messages[2]["content"]) + self.assertIn("Tool outputs trimmed", messages[2]["content"]) + + def test_trim_native_tool_outputs_supports_embedded_tool_call_cards(self): + messages = [ + { + "role": "assistant", + "content": ( + '
\n' + "Tool Executed\n" + "
\n" + "Final answer" + ), + } + ] + + trimmed_count = self.filter._trim_native_tool_outputs(messages, "en-US") + + self.assertEqual(trimmed_count, 1) + self.assertIn( + 'result=""... [Content collapsed] ...""', + messages[0]["content"], + ) + self.assertNotIn("x" * 200, messages[0]["content"]) + self.assertTrue(messages[0]["metadata"]["tool_outputs_trimmed"]) + + def test_function_calling_mode_reads_params_fallback(self): + self.assertEqual( + self.filter._get_function_calling_mode( + {"params": {"function_calling": "native"}} + ), + "native", + ) + + def test_function_calling_mode_infers_native_from_message_shape(self): + self.assertEqual( + self.filter._get_function_calling_mode( + { + "messages": [ + { + "role": "assistant", + "tool_calls": [{"id": "call_1", "type": "function"}], + "content": "", + }, + {"role": "tool", "content": "tool result"}, + ] + } + ), + "native", + ) + + def test_trim_native_tool_outputs_handles_pending_tool_chain(self): + messages = [ + { + "role": "assistant", + "tool_calls": [{"id": "call_1", "type": "function"}], + "content": "", + }, + {"role": "tool", "content": "x" * 1600}, + ] + + trimmed_count = self.filter._trim_native_tool_outputs(messages, "en-US") + + self.assertEqual(trimmed_count, 1) + self.assertEqual(messages[1]["content"], "... 
[Content collapsed] ...") + self.assertTrue(messages[1]["metadata"]["is_trimmed"]) + + def test_target_progress_uses_original_history_coordinates(self): + self.filter.valves.keep_last = 2 + summary_message = self.filter._build_summary_message( + "older summary", "en-US", 6 + ) + messages = [ + {"role": "system", "content": "System prompt"}, + summary_message, + {"role": "user", "content": "Question 1"}, + {"role": "assistant", "content": "Answer 1"}, + {"role": "user", "content": "Question 2"}, + {"role": "assistant", "content": "Answer 2"}, + ] + + self.assertEqual(self.filter._get_original_history_count(messages), 10) + self.assertEqual(self.filter._calculate_target_compressed_count(messages), 8) + + def test_load_full_chat_messages_rebuilds_active_history_branch(self): + class FakeChats: + @staticmethod + def get_chat_by_id(chat_id): + return types.SimpleNamespace( + chat={ + "history": { + "currentId": "m3", + "messages": { + "m1": { + "id": "m1", + "role": "user", + "content": "Question", + }, + "m2": { + "id": "m2", + "role": "assistant", + "content": "Tool call", + "tool_calls": [{"id": "call_1"}], + "parentId": "m1", + }, + "m3": { + "id": "m3", + "role": "tool", + "content": "Tool result", + "tool_call_id": "call_1", + "parentId": "m2", + }, + }, + } + } + ) + + original_chats = module.Chats + module.Chats = FakeChats + try: + messages = self.filter._load_full_chat_messages("chat-1") + finally: + module.Chats = original_chats + + self.assertEqual([message["id"] for message in messages], ["m1", "m2", "m3"]) + self.assertEqual(messages[2]["role"], "tool") + + def test_outlet_unfolds_compact_tool_details_view(self): + compact_messages = [ + {"role": "user", "content": "U1"}, + { + "role": "assistant", + "content": ( + '
\n' + "Tool Executed\n" + "
\n" + "Answer 1" + ), + }, + {"role": "user", "content": "U2"}, + { + "role": "assistant", + "content": ( + '
\n' + "Tool Executed\n" + "
\n" + "Answer 2" + ), + }, + ] + + async def fake_user_context(__user__, __event_call__): + return {"user_language": "en-US"} + + async def noop_log(*args, **kwargs): + return None + + create_task_called = False + + def fake_create_task(coro): + nonlocal create_task_called + create_task_called = True + coro.close() + return None + + self.filter._get_user_context = fake_user_context + self.filter._get_chat_context = lambda body, metadata=None: { + "chat_id": "chat-1", + "message_id": "msg-1", + } + self.filter._should_skip_compression = lambda body, model: False + self.filter._log = noop_log + + # Set a low threshold so the task is guaranteed to trigger + self.filter.valves.compression_threshold_tokens = 100 + + original_create_task = asyncio.create_task + asyncio.create_task = fake_create_task + try: + asyncio.run( + self.filter.outlet( + {"model": "test-model", "messages": compact_messages}, + __event_call__=None, + ) + ) + finally: + asyncio.create_task = original_create_task + + self.assertTrue(create_task_called) + + def test_summary_save_progress_matches_truncated_input(self): + self.filter.valves.keep_first = 1 + self.filter.valves.keep_last = 1 + self.filter.valves.summary_model = "fake-summary-model" + self.filter.valves.summary_model_max_context = 0 + + captured = {} + events = [] + + async def mock_emitter(event): + events.append(event) + + async def mock_summary_llm( + previous_summary, + new_conversation_text, + body, + user_data, + __event_call__, + ): + return "new summary" + + def mock_save_summary(chat_id, summary, compressed_count): + captured["chat_id"] = chat_id + captured["summary"] = summary + captured["compressed_count"] = compressed_count + + async def noop_log(*args, **kwargs): + return None + + self.filter._log = noop_log + self.filter._call_summary_llm = mock_summary_llm + self.filter._save_summary = mock_save_summary + self.filter._get_model_thresholds = lambda model_id: { + "max_context_tokens": 3500 + } + 
self.filter._calculate_messages_tokens = lambda messages: len(messages) * 1000 + self.filter._count_tokens = lambda text: 1000 + + messages = [ + {"role": "system", "content": "System prompt"}, + {"role": "user", "content": "Question 1"}, + {"role": "assistant", "content": "Answer 1"}, + {"role": "user", "content": "Question 2"}, + {"role": "assistant", "content": "Answer 2"}, + {"role": "user", "content": "Question 3"}, + ] + + asyncio.run( + self.filter._generate_summary_async( + messages=messages, + chat_id="chat-1", + body={"model": "fake-summary-model"}, + user_data={"id": "user-1"}, + target_compressed_count=5, + lang="en-US", + __event_emitter__=mock_emitter, + __event_call__=None, + ) + ) + + self.assertEqual(captured["chat_id"], "chat-1") + self.assertEqual(captured["summary"], "new summary") + self.assertEqual(captured["compressed_count"], 2) + self.assertTrue(any(event["type"] == "status" for event in events)) + + +if __name__ == "__main__": + unittest.main() diff --git a/plugins/filters/async-context-compression/v1.4.1.md b/plugins/filters/async-context-compression/v1.4.1.md new file mode 100644 index 0000000..184f8e0 --- /dev/null +++ b/plugins/filters/async-context-compression/v1.4.1.md @@ -0,0 +1,17 @@ +[![](https://img.shields.io/badge/OpenWebUI%20Community-Get%20Plugin-blue?style=for-the-badge)](https://openwebui.com/f/fujie/async_context_compression) + +## Overview + +This release addresses the critical progress coordinate drift issue in OpenWebUI's `outlet` phase, ensuring robust summarization for long tool-calling conversations. + +[View on GitHub](https://github.com/Fu-Jie/openwebui-extensions/blob/main/plugins/filters/async-context-compression/README.md) + +- **New Features** + - **Reverse-Unfolding Mechanism**: Accurately reconstructs the expanded native tool-calling sequence during the outlet phase to permanently fix coordinate drift and missing summaries for long tool-based conversations. 
+  - **Safer Tool Trimming**: Refactored `enable_tool_output_trimming` to operate strictly on atomic block groups, so trimming can no longer corrupt JSON payloads by splitting a tool-call/tool-result pair.
+
+- **Bug Fixes**
+  - Fixed coordinate drift where `compressed_message_count` could lose track of compression progress because OpenWebUI's frontend view truncates tool calls.
+
+- **Related Issues**
+  - Closes #56
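The original-history boundary math that this release depends on (a `covered_until` boundary saved on the injected summary message, recovered later from the visible tail) can be sketched as a standalone helper. The function name and signature here are illustrative only, not the plugin's actual API; the arithmetic mirrors the mapping described in the learning note and exercised by `test_target_progress_uses_original_history_coordinates`:

```python
def target_compressed_count(covered_until: int,
                            messages_after_summary: int,
                            keep_last: int) -> int:
    """Map the visible (compressed) view back to original-history coordinates.

    covered_until: original-history boundary saved on the summary marker.
    messages_after_summary: visible tail messages after the summary marker.
    keep_last: most-recent messages that must never be compressed.
    """
    # The visible list cannot be trusted directly: recover the original
    # conversation length from the saved boundary plus the visible tail.
    original_count = covered_until + messages_after_summary
    # Protect the last keep_last messages, and never move the boundary
    # backwards past what the previous summary already covered.
    return max(covered_until, original_count - keep_last)


# Mirrors the test fixture: summary covers the first 6 original messages,
# 4 visible tail messages remain, keep_last = 2 -> original_count = 10.
print(target_compressed_count(6, 4, 2))  # prints 8
```

Note the `max(...)` clamp: if `keep_last` is larger than the remaining tail, the boundary simply stays at `covered_until`, so saved progress never regresses and never overstates what a summary actually covered.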