fix(async-context-compression): reverse-unfolding to prevent progress drift
- Reconstruct native tool-calling sequences using a reverse-unfolding mechanism
- Strictly use atomic grouping for safe native tool output trimming
- Add comprehensive test coverage for unfolding logic and issue drafts
- Sync READMEs and docs (v1.4.1)
@@ -0,0 +1,27 @@
# Async Context Compression Progress Mapping

> Discovered: 2026-03-10

## Context

Applies to `plugins/filters/async-context-compression/async_context_compression.py` once the inlet has already replaced early history with a synthetic summary message.

## Finding

`compressed_message_count` cannot be recalculated from the visible message list length after compression. Once a summary marker is present, the visible list mixes:

- preserved head messages that are still before the saved boundary
- one synthetic summary message
- tail messages that map to original history starting at the saved boundary

## Solution / Pattern

Store the original-history boundary on the injected summary message metadata, then recover future progress using:

- `original_count = covered_until + len(messages_after_summary_marker)`
- `target_progress = max(covered_until, original_count - keep_last)`
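The recovery above is plain arithmetic; a minimal sketch (the function name and argument shapes are illustrative, not the plugin's API):

```python
def recover_progress(covered_until, messages_after_summary_marker, keep_last):
    """Recover compression progress from the boundary stored on the summary
    marker plus the tail that follows it in the visible view."""
    # Total original-history length = covered portion + visible tail.
    original_count = covered_until + len(messages_after_summary_marker)
    # Never move the boundary backwards, even if keep_last is large.
    target_progress = max(covered_until, original_count - keep_last)
    return original_count, target_progress
```

For example, with `covered_until=10`, a 17-message tail, and `keep_last=5`, this yields an original count of 27 and a next target of 22, both in original-history coordinates.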
When the summary-model window is too small, trim the newest atomic groups from the summary input so the saved boundary still matches what the summary actually covers.

## Gotchas

- If you trim from the head of the summary input, the saved progress can overstate coverage and hide messages that were never summarized.
- Status previews for the next context must convert the saved original-history boundary back into the current visible view before rebuilding head/summary/tail.
- `inlet(body["messages"])` and `outlet(body["messages"])` can both represent the full conversation while using different serializations:
  - inlet may receive expanded native tool-call chains (`assistant(tool_calls) -> tool -> assistant`)
  - outlet may receive a compact top-level transcript where tool calls are folded into assistant `<details type="tool_calls">` blocks
- These two views do not share a safe `compressed_message_count` coordinate system. If outlet is in the compact assistant/details view, do not persist summary progress derived from its top-level message count.
.agent/learnings/openwebui-tool-call-context-inflation.md (new file, 26 lines)
@@ -0,0 +1,26 @@
# OpenWebUI Tool Call Context Inflation

> Discovered: 2026-03-11

## Context

When analyzing why the `async_context_compression` plugin sees different `messages` array lengths between the `inlet` (e.g. 27 items) and `outlet` (e.g. 8 items) phases, especially when native tool calling (Function Calling) is involved in OpenWebUI.

## Finding

There is a fundamental disparity in how OpenWebUI serializes conversational history at different stages of the request lifecycle:

1. **Outlet (UI Rendering View)**:
   After the LLM completes generation and tools have been executed, OpenWebUI's `middleware.py` (and streaming builders) bundles intermediate tool calls and their raw results. It hides them inside an HTML `<details type="tool_calls">...</details>` block within a single `role: assistant` message's `content`.
   Concurrently, the actual native API tool-calling data is saved in a hidden `output` dict field attached to that message. At this stage, the `messages` array looks short (e.g., 8 items) because tool interactions are visually folded.

2. **Inlet (LLM Native View)**:
   When the user sends the *next* message, the request enters `main.py` -> `process_chat_payload` -> `middleware.py:process_messages_with_output()`.
   Here, OpenWebUI scans historical `assistant` messages for that hidden `output` field. If found, it completely **inflates (unfolds)** the raw data back into an exact sequence of OpenAI-compliant `tool_call` and `tool_result` messages (using `utils/misc.py:convert_output_to_messages`).
   The HTML `<details>` string is entirely discarded before being sent to the LLM.

**Conclusion on Token Consumption**:
In the next turn, tool context is **NOT** compressed at all. It is fully re-expanded to its original verbose state (e.g., back to 27 items) and consumes the maximum number of tokens required by the raw JSON arguments and results.
## Gotchas

- Any logic operating in the `outlet` phase (like background tasks) that relies on `messages` array indices will be completely misaligned with the array seen in the `inlet` phase.
- Attempting to slice or trim history based on `outlet` array lengths will cause index out-of-bounds errors or destructive cropping of recent messages.
- The only safe way to bridge these two views is either to translate the folded view back into the expanded view using `convert_output_to_messages`, or to rely on unique `id` fields (if available) rather than array indices.
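A cheap guard for the outlet phase is to detect the folded view before trusting its indices. The following is a minimal sketch; the name mirrors the `_is_compact_tool_details_view` idea mentioned elsewhere in this commit, but the exact signature is an assumption:

```python
import re

# Marker OpenWebUI emits when folding tool interactions into one assistant message.
TOOL_DETAILS_RE = re.compile(r'<details\s+type="tool_calls"')

def is_compact_tool_details_view(messages):
    """Return True if any assistant message carries a folded tool-call block,
    i.e. the list is in the outlet (UI rendering) serialization."""
    return any(
        msg.get("role") == "assistant"
        and TOOL_DETAILS_RE.search(msg.get("content") or "")
        for msg in messages
    )
```

Index arithmetic should only run when this returns False (or after unfolding), since folded and expanded views disagree on message counts.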
@@ -9,6 +9,7 @@ A collection of enhancements, plugins, and prompts for [open-webui](https://gith

<!-- STATS_START -->
## 📊 Community Stats

>
> 

| 👤 Author | 👥 Followers | ⭐ Points | 🏆 Contributions |

@@ -19,18 +20,19 @@ A collection of enhancements, plugins, and prompts for [open-webui](https://gith

| :---: | :---: | :---: | :---: | :---: |
|  |  |  |  |  |

### 🔥 Top 6 Popular Plugins

| Rank | Plugin | Version | Downloads | Views | 📅 Updated |
| :---: | :--- | :---: | :---: | :---: | :---: |
| 🥇 | [Smart Mind Map](https://openwebui.com/posts/turn_any_text_into_beautiful_mind_maps_3094c59a) |  |  |  |  |
| 🥈 | [Smart Infographic](https://openwebui.com/posts/smart_infographic_ad6f0c7f) |  |  |  |  |
| 🥉 | [Markdown Normalizer](https://openwebui.com/posts/markdown_normalizer_baaa8732) |  |  |  |  |
| 4️⃣ | [Export to Word Enhanced](https://openwebui.com/posts/export_to_word_enhanced_formatting_fca6a315) |  |  |  |  |
| 5️⃣ | [Async Context Compression](https://openwebui.com/posts/async_context_compression_b1655bc8) |  |  |  |  |
| 6️⃣ | [AI Task Instruction Generator](https://openwebui.com/posts/ai_task_instruction_generator_9bab8b37) |  |  |  |  |

### 📈 Total Downloads Trend



*See full stats and charts in [Community Stats Report](./docs/community-stats.md)*
@@ -6,6 +6,7 @@ A collection of OpenWebUI enhancements, including self-developed and collected plugins and prompts

<!-- STATS_START -->
## 📊 Community Stats

>
> 

| 👤 Author | 👥 Followers | ⭐ Points | 🏆 Contributions |

@@ -16,18 +17,19 @@ A collection of OpenWebUI enhancements, including self-developed and collected plugins and prompts

| :---: | :---: | :---: | :---: | :---: |
|  |  |  |  |  |

### 🔥 Top 6 Popular Plugins

| Rank | Plugin | Version | Downloads | Views | 📅 Updated |
| :---: | :--- | :---: | :---: | :---: | :---: |
| 🥇 | [Smart Mind Map](https://openwebui.com/posts/turn_any_text_into_beautiful_mind_maps_3094c59a) |  |  |  |  |
| 🥈 | [Smart Infographic](https://openwebui.com/posts/smart_infographic_ad6f0c7f) |  |  |  |  |
| 🥉 | [Markdown Normalizer](https://openwebui.com/posts/markdown_normalizer_baaa8732) |  |  |  |  |
| 4️⃣ | [Export to Word Enhanced](https://openwebui.com/posts/export_to_word_enhanced_formatting_fca6a315) |  |  |  |  |
| 5️⃣ | [Async Context Compression](https://openwebui.com/posts/async_context_compression_b1655bc8) |  |  |  |  |
| 6️⃣ | [AI Task Instruction Generator](https://openwebui.com/posts/ai_task_instruction_generator_9bab8b37) |  |  |  |  |

### 📈 Total Downloads Trend



*See full stats and charts in the [Community Stats Report](./docs/community-stats.zh.md)*
@@ -1,15 +1,13 @@
# Async Context Compression Filter

-**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.4.0 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT
+**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.4.1 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT

This filter reduces token consumption in long conversations through intelligent summarization and message compression while keeping conversations coherent.

-## What's new in 1.4.0
+## What's new in 1.4.1

- **Atomic Message Grouping**: Introduced structure-aware grouping for `assistant-tool-tool-assistant` chains to prevent "No tool call found" errors.
- **Tail Boundary Alignment**: Implemented automatic correction for truncation points to ensure they don't fall inside a tool-calling sequence.
- **Chat Session Locking**: Added a session-based lock to prevent multiple concurrent summary tasks for the same chat ID.
- **Enhanced Traceability**: Improved summary formatting to include message IDs, names, and metadata for better context tracking.
- **Reverse-Unfolding Mechanism**: Accurately reconstructs the expanded native tool-calling sequence during the outlet phase to permanently fix coordinate drift and missing summaries for long tool-based conversations.
- **Safer Tool Trimming**: Refactored `enable_tool_output_trimming` to strictly use atomic block groups for safe trimming, completely preventing JSON payload corruption.
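The atomic grouping described above can be pictured as a single partitioning pass over the message list. This is an illustrative sketch only; the plugin's actual `_get_atomic_groups` may differ in signature and edge-case handling:

```python
def get_atomic_groups(messages):
    """Partition messages so an assistant tool_calls message, its tool
    replies, and the immediate assistant follow-up travel as one group."""
    groups, i = [], 0
    while i < len(messages):
        msg = messages[i]
        if msg.get("role") == "assistant" and msg.get("tool_calls"):
            # Consume the contiguous run of tool replies.
            j = i + 1
            while j < len(messages) and messages[j].get("role") == "tool":
                j += 1
            # Consume one plain assistant follow-up, if present.
            if (
                j < len(messages)
                and messages[j].get("role") == "assistant"
                and not messages[j].get("tool_calls")
            ):
                j += 1
            groups.append(messages[i:j])
            i = j
        else:
            groups.append([msg])
            i += 1
    return groups
```

Truncation points are then snapped to group boundaries, so a cut can never land between a `tool_calls` message and its replies.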
---
@@ -1,17 +1,15 @@
# Async Context Compression Filter

-**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.4.0 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT
+**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.4.1 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT

> **Note**: To keep every filter maintainable and easy to use, each filter should ship with clear, complete documentation covering its features, configuration, and usage.

This filter significantly reduces token consumption in long conversations through intelligent summarization and message compression while keeping conversations coherent.

-## What's new in 1.4.0
+## What's new in 1.4.1

- **Atomic Message Grouping**: Introduced structure-aware grouping so tool-calling chains are kept or removed as a whole, eliminating "No tool call found" errors.
- **Tail Boundary Auto-Alignment**: Added automatic correction of truncation points so history truncation never lands inside a tool-calling sequence.
- **Session-Level Async Lock**: Added a `chat_id`-based background-task lock to prevent the same chat from triggering multiple concurrent summary tasks.
- **Metadata Traceability**: Improved the summary input format to preserve message IDs, participant names, and key metadata for better context tracking.
- **Reverse-Unfolding Mechanism**: Introduced `_unfold_messages` to precisely align coordinate systems in the `outlet` phase, fixing the progress drift and skipped summaries in long tool-calling conversations caused by the frontend's folded view.
- **Safer Tool Output Trimming**: Refactored `enable_tool_output_trimming` to strictly use atomic grouping for safe native tool output trimming, replacing aggressive regex matching and preventing JSON payload corruption.

---
@@ -22,7 +22,7 @@ Filters act as middleware in the message pipeline:

Reduces token consumption in long conversations through intelligent summarization while maintaining coherence.

-**Version:** 1.4.0
+**Version:** 1.4.1

[:octicons-arrow-right-24: Documentation](async-context-compression.md)
@@ -22,7 +22,7 @@ Filters act as middleware in the message pipeline:

Reduces token consumption in long conversations through intelligent summarization while maintaining coherence.

-**Version:** 1.4.0
+**Version:** 1.4.1

[:octicons-arrow-right-24: Documentation](async-context-compression.md)
plugins/debug/async_context_compression/ISSUE_EXPLANATION.md (new file, 62 lines)
@@ -0,0 +1,62 @@
# Async Context Compression Plugin: Current Issues and Status

This document walks through the root cause of the "ghost truncation" problem we hit while working on `async_context_compression`, and where the fix currently stands.

## 1. Root Cause: Two Very Different "Worldviews" (Serialization Disparity)

In earlier debugging I wrongly assumed that the `body["messages"]` seen by `outlet` (the post-processing phase) was incomplete data caused by truncation.
Based on the local run logs you provided, **you were right: `body['messages']` really does contain the full conversation history**.

So why the huge length difference, with `inlet` seeing 27 items while `outlet` sees only 8?

The reason is that the OpenWebUI pipeline uses **two completely different message formats** before entering the LLM and after returning from it:

### View A: Inlet phase (native API expanded view)
- **Characteristics**: strictly follows the OpenAI function-calling spec.
- **Shape**: every tool call and every tool result is a separate message.
- **Example**: a conversation involving a complex search.
  - User: check the weather for me (1 message)
  - Assistant: issues a tool_call (1 message)
  - Tool: returns a JSON result (1 message)
  - ...several round trips...
- **Total: 27 messages.** Our compression (trim) algorithm computes how many messages to keep in this 27-message coordinate system.

### View B: Outlet phase (UI HTML folded view)
- **Characteristics**: a compact view optimized for frontend rendering.
- **Shape**: after the model call, OpenWebUI wraps all intermediate tool interactions in `<details type="tool_calls">...</details>` HTML and stuffs them into a single `role: assistant` message's `content` string, so the frontend can show its collapsible tool-call card.
- **Example**: the same conversation.
  - User: check the weather for me (1 message)
  - Assistant: `<details>markup containing many tool calls and results</details> The weather is great today...` (1 message)
- **Total: 8 messages.**

**💥 Where disaster strikes:**
The original plugin logic assumed `inlet` and `outlet` share the same coordinate system.
1. At `inlet`, the system computes: "summarize the first 10 messages, keep the last 17".
2. The "summarize the first 10" task is handed off to run asynchronously in the background.
3. The background task fires in the `outlet` phase, where the message array it receives is **View B (only 8 items in total)**.
4. The algorithm tries to cut "the first 10 messages" out of an 8-message array and replace them with 1 summary.
5. **Result: array index out of bounds, completely scrambled coordinates, errors, and possibly deleting the newest valid messages as if they were old ones (over-compression).**

---
## 2. What Has Been Fixed (✅ Done)

To immediately stop this coordinate-mismatch data corruption, we shipped a hotfix (local v1.4.0):

**✅ Added a "folded view" probe as a defense:**
- I wrote a function, `_is_compact_tool_details_view`.
- Now, when a background summary is triggered, the system scans the `messages` passed to `outlet`. If it finds any trace of the `<details type="tool_calls">` HTML folding tag, it **immediately aborts and skips** the current summary task.
- **Benefit**: completely eliminates the task errors and forced cropping caused by array misalignment. The UI crashes and history loss are contained.

---

## 3. The Remaining Limitation, Now Resolved (✅ Done: reverse-unfolding fix)

The limitation introduced by skipping generation, namely that **long tool-calling conversations could not automatically get a history summary**, is now fully resolved.

### Final technical approach:
Through source analysis we found that OpenWebUI runs `convert_output_to_messages` on entry to `inlet` to restore tool-calling chains. So we introduced the same **reverse-unfolding** mechanism, `_unfold_messages`, in the plugin's `outlet` phase.

Now, when the background task receives the folded view from `outlet`, it no longer skips. Instead it extracts the native `output` field hidden inside the message objects and **re-expands it into the expanded view** (e.g. turning the illusory 8 messages back into the real 27 underlying messages), so its coordinate system aligns exactly with `inlet`.

With this, long conversations with complex tool calls can be safely auto-compressed in the background, with no remaining risk of truncation or forced deletion.
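The re-expansion step can be pictured as a small splice pass. In this hedged sketch, `expand` stands in for OpenWebUI's `convert_output_to_messages`, and the `output` field name follows the description above; the real `_unfold_messages` may differ:

```python
def unfold_messages(messages, expand):
    """Replace each folded assistant message (one carrying a hidden native
    'output' payload) with the expanded tool-call chain it represents."""
    expanded = []
    for msg in messages:
        output = msg.get("output") if msg.get("role") == "assistant" else None
        if output:
            # expand() plays the role of convert_output_to_messages here.
            expanded.extend(expand(output))
        else:
            expanded.append(msg)
    return expanded
```

Applied to a folded 8-message list whose assistant messages carry `output`, this restores the expanded coordinate system before any index arithmetic runs.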
@@ -0,0 +1,60 @@
# Reply to dhaern — Follow-up on the Latest Review

Thank you for re-reviewing the latest version and for the continued precise analysis. Below I address your two remaining concerns one by one.

---

### 1. `enable_tool_output_trimming` — not a regression; the behavior change is intentional

The trimming logic is still present and functional. Here is how the current version behaves versus the previous one.

**Current behavior (`_trim_native_tool_outputs`, lines 835–945):**
- Iterates over atomic groups via `_get_atomic_groups`.
- Identifies valid tool-calling chains: `assistant(tool_calls)` → `tool` → [optional assistant follow-up].
- If the combined character count of all `tool` role messages in a chain exceeds **1,200 characters**, it collapses *the tool messages' own content* to a localized `[Content collapsed]` placeholder and injects a `metadata.is_trimmed` flag.
- Separately walks assistant messages containing `<details type="tool_calls">` HTML blocks and collapses oversized `result` attributes the same way.
- The function is called in the inlet phase when `enable_tool_output_trimming=True` and `function_calling=native`.

**Difference from the old version:**
The old approach rewrote the *assistant follow-up* message to keep only the "final answer". The new approach collapses the *tool response content* itself. Both shrink the context, but the new method preserves the structural integrity of the tool-calling chain (a precondition for the atomic grouping work in this release).

The plugin header's docstring still carried a stale description ("extract the final answer") that contradicted the actual behavior; the latest commit corrects it to "collapses oversized native tool outputs to a short placeholder".

If you are looking for the old "keep only the final answer" behavior, that path was intentionally removed because it conflicted with the atomic-group integrity guarantees introduced in this release. The current collapse approach is its safe replacement.

---

### 2. `compressed_message_count` — the fix is real; here is the coordinate-system trace

Your concern about "recalculating from the already-modified view" is entirely understandable given the previous architecture. Here is exactly why the current code does not have that problem.

**Key change in `outlet`:**
```python
db_messages = self._load_full_chat_messages(chat_id)
messages_to_unfold = db_messages if (db_messages and len(db_messages) >= len(messages)) else messages
summary_messages = self._unfold_messages(messages_to_unfold)
target_compressed_count = self._calculate_target_compressed_count(summary_messages)
```

`_load_full_chat_messages` fetches the raw persisted history from the OpenWebUI database. Because the synthetic summary message injected during inlet rendering is **never written back to the database**, `summary_messages` from the DB path is always the clean, unmodified original history: no summary marker, no coordinate inflation.

`_calculate_target_compressed_count` called on this clean list computes (still in original-history coordinates):
```
original_count = len(db_messages)
raw_target = original_count - keep_last
target = atomic_align(raw_target)
```

This `target_compressed_count` value is passed into `_generate_summary_async` unchanged. Inside the async task, the same `db_messages` list is sliced to `messages[start:target]` to build `middle_messages`. After generation (with potential atomic truncation from the end), the saved value is:
```python
saved_compressed_count = start_index + len(middle_messages)
```
This is the exact position in the original DB message list that the new summary actually covers: not a target, and not an estimate from a different view.

**The fallback path (when the DB is unavailable)** uses the inlet-rendered body messages. There, `_get_summary_view_state` reads the injected summary marker's `covered_until` field (written as the atomically aligned `start_index`), so `base_progress` is already in original-history coordinates and the calculation continues naturally without mixing views.

In short: throughout the call chain the field now has a single, consistent meaning, namely the index in the original persisted message list up to which the current summary text actually covers.

---

Thank you again for the rigorous review. The two issues you flagged after the last release have been addressed, and the stale documentation description has been corrected. Please let us know if you find anything else.
@@ -0,0 +1,60 @@
# Reply to dhaern - Follow-up on the Latest Review

Thank you for re-checking the latest version and for the continued precise analysis. Let me address your two remaining concerns directly.

---

### 1. `enable_tool_output_trimming` — Not a regression; behavior change is intentional

The trimming logic is present and functional. Here is what it does now versus before.

**Current behavior (`_trim_native_tool_outputs`, lines 835–945):**
- Iterates over atomic groups via `_get_atomic_groups`.
- Identifies valid chains: `assistant(tool_calls)` → `tool` → [optional assistant follow-up].
- If the combined character count of the `tool` role messages in a chain exceeds **1,200 characters**, it collapses *the tool messages themselves* to a localized `[Content collapsed]` placeholder and injects a `metadata.is_trimmed` flag.
- Separately walks assistant messages containing `<details type="tool_calls">` HTML blocks and collapses oversized `result` attributes in the same way.
- The function is called at inlet when `enable_tool_output_trimming=True` and `function_calling=native`.

**What is different from the previous version:**
The old approach rewrote the *assistant follow-up* message to keep only the "final answer". The new approach collapses the *tool response content* itself. Both reduce context size, but the new approach preserves the structural integrity of the tool-calling chain (which the atomic grouping work in this release depends on).

The docstring in the plugin header also contained a stale description ("extract only the final answer") that contradicted the actual behavior. That has been corrected in the latest commit to say "collapses oversized native tool outputs to a short placeholder."

If you are looking for the specific "keep only the final answer" behavior from the old version, that path was intentionally removed because it conflicted with the atomic-group integrity guarantees introduced in this release. The current collapse approach is a safe replacement.
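The collapse rule in the bullets above can be sketched per atomic chain as follows. Names, threshold handling, and return value are illustrative, not the plugin's exact code:

```python
PLACEHOLDER = "... [Content collapsed] ..."
TRIM_THRESHOLD = 1200  # characters, per the description above

def trim_tool_chain(chain):
    """If the tool messages in one atomic chain together exceed the
    threshold, replace each tool message's content with a placeholder
    and flag it, leaving the chain structure intact."""
    tool_msgs = [m for m in chain if m.get("role") == "tool"]
    total = sum(len(m.get("content") or "") for m in tool_msgs)
    if total <= TRIM_THRESHOLD:
        return 0
    for m in tool_msgs:
        m["content"] = PLACEHOLDER
        m.setdefault("metadata", {})["is_trimmed"] = True
    return len(tool_msgs)
```

Because only the tool messages' content is replaced, the `assistant(tool_calls)` → `tool` pairing the API requires is never broken.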
---

### 2. `compressed_message_count` — The fix is real; here is the coordinate trace

The concern about "recalculating from the already-modified view" is understandable given the previous architecture. Here is exactly why the current code does not have that problem.

**Key change in `outlet`:**
```python
db_messages = self._load_full_chat_messages(chat_id)
messages_to_unfold = db_messages if (db_messages and len(db_messages) >= len(messages)) else messages
summary_messages = self._unfold_messages(messages_to_unfold)
target_compressed_count = self._calculate_target_compressed_count(summary_messages)
```

`_load_full_chat_messages` fetches the raw persisted history from the OpenWebUI database. Because the synthetic summary message (injected during inlet rendering) is **never written back to the database**, `summary_messages` from the DB path is always the clean, unmodified original history: no summary marker, no coordinate inflation.

`_calculate_target_compressed_count` called on this clean list simply computes:
```
original_count = len(db_messages)
raw_target = original_count - keep_last
target = atomic_align(raw_target)  # still in original-history coordinates
```

This `target_compressed_count` value is then passed into `_generate_summary_async` unchanged. Inside the async task, the same `db_messages` list is sliced to `messages[start:target]` to build `middle_messages`. After generation (with potential atomic truncation from the end), the saved value is:
```python
saved_compressed_count = start_index + len(middle_messages)
```
This is the exact position in the original DB message list up to which the new summary actually covers: not a target, not an estimate from a different view.

**The fallback path (DB unavailable)** uses the inlet-rendered body messages. In that case `_get_summary_view_state` reads `covered_until` from the injected summary marker (which was written as the atomically aligned `start_index`), so `base_progress` is already in original-history coordinates. The calculation naturally continues from there without mixing views.

In short: the field now has a single, consistent meaning throughout the entire call chain, namely the index (in the original, persisted message list) up to which the current summary text actually covers.

---

Thank you again for the rigorous review. The two points you flagged after the last release are now addressed, and the stale documentation description has been corrected. Please do let us know if you spot anything else.
@@ -1,15 +1,13 @@
# Async Context Compression Filter

-**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.4.0 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT
+**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.4.1 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT

This filter reduces token consumption in long conversations through intelligent summarization and message compression while keeping conversations coherent.

-## What's new in 1.4.0
+## What's new in 1.4.1

- **Atomic Message Grouping**: Introduced structure-aware grouping for `assistant-tool-tool-assistant` chains to prevent "No tool call found" errors.
- **Tail Boundary Alignment**: Implemented automatic correction for truncation points to ensure they don't fall inside a tool-calling sequence.
- **Chat Session Locking**: Added a session-based lock to prevent multiple concurrent summary tasks for the same chat ID.
- **Enhanced Traceability**: Improved summary formatting to include message IDs, names, and metadata for better context tracking.
- **Reverse-Unfolding Mechanism**: Accurately reconstructs the expanded native tool-calling sequence during the outlet phase to permanently fix coordinate drift and missing summaries for long tool-based conversations.
- **Safer Tool Trimming**: Refactored `enable_tool_output_trimming` to strictly use atomic block groups for safe trimming, completely preventing JSON payload corruption.

---
@@ -1,17 +1,15 @@
# Async Context Compression Filter

-**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.4.0 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT
+**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.4.1 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT

> **Note**: To keep every filter maintainable and easy to use, each filter should ship with clear, complete documentation covering its features, configuration, and usage.

This filter significantly reduces token consumption in long conversations through intelligent summarization and message compression while keeping conversations coherent.

-## What's new in 1.4.0
+## What's new in 1.4.1

- **Atomic Message Grouping**: Introduced structure-aware grouping so tool-calling chains are kept or removed as a whole, eliminating "No tool call found" errors.
- **Tail Boundary Auto-Alignment**: Added automatic correction of truncation points so history truncation never lands inside a tool-calling sequence.
- **Session-Level Async Lock**: Added a `chat_id`-based background-task lock to prevent the same chat from triggering multiple concurrent summary tasks.
- **Metadata Traceability**: Improved the summary input format to preserve message IDs, participant names, and key metadata for better context tracking.
- **Reverse-Unfolding Mechanism**: Introduced `_unfold_messages` to precisely align coordinate systems in the `outlet` phase, fixing the progress drift and skipped summaries in long tool-calling conversations caused by the frontend's folded view.
- **Safer Tool Output Trimming**: Refactored `enable_tool_output_trimming` to strictly use atomic grouping for safe native tool output trimming, replacing aggressive regex matching and preventing JSON payload corruption.

---
File diff suppressed because it is too large
@@ -0,0 +1,461 @@
import asyncio
import importlib.util
import os
import sys
import types
import unittest


PLUGIN_PATH = os.path.join(os.path.dirname(__file__), "async_context_compression.py")
MODULE_NAME = "async_context_compression_under_test"


def _ensure_module(name: str) -> types.ModuleType:
    module = sys.modules.get(name)
    if module is None:
        module = types.ModuleType(name)
        sys.modules[name] = module
    return module


def _install_openwebui_stubs() -> None:
    _ensure_module("open_webui")
    _ensure_module("open_webui.utils")
    chat_module = _ensure_module("open_webui.utils.chat")
    _ensure_module("open_webui.models")
    users_module = _ensure_module("open_webui.models.users")
    models_module = _ensure_module("open_webui.models.models")
    chats_module = _ensure_module("open_webui.models.chats")
    main_module = _ensure_module("open_webui.main")
    _ensure_module("fastapi")
    fastapi_requests = _ensure_module("fastapi.requests")

    async def generate_chat_completion(*args, **kwargs):
        return {}

    class DummyUsers:
        pass

    class DummyModels:
        @staticmethod
        def get_model_by_id(model_id):
            return None

    class DummyChats:
        @staticmethod
        def get_chat_by_id(chat_id):
            return None

    class DummyRequest:
        pass

    chat_module.generate_chat_completion = generate_chat_completion
    users_module.Users = DummyUsers
    models_module.Models = DummyModels
    chats_module.Chats = DummyChats
    main_module.app = object()
    fastapi_requests.Request = DummyRequest


_install_openwebui_stubs()
spec = importlib.util.spec_from_file_location(MODULE_NAME, PLUGIN_PATH)
module = importlib.util.module_from_spec(spec)
sys.modules[MODULE_NAME] = module
assert spec.loader is not None
spec.loader.exec_module(module)
module.Filter._init_database = lambda self: None


class TestAsyncContextCompression(unittest.TestCase):
|
||||
def setUp(self):
|
||||
self.filter = module.Filter()
|
||||
|
||||
def test_inlet_logs_tool_trimming_outcome_when_no_oversized_outputs(self):
|
||||
self.filter.valves.show_debug_log = True
|
||||
self.filter.valves.enable_tool_output_trimming = True
|
||||
|
||||
logged_messages = []
|
||||
|
||||
async def fake_log(message, log_type="info", event_call=None):
|
||||
logged_messages.append(message)
|
||||
|
||||
async def fake_user_context(__user__, __event_call__):
|
||||
return {"user_language": "en-US"}
|
||||
|
||||
async def fake_event_call(_payload):
|
||||
return True
|
||||
|
||||
self.filter._log = fake_log
|
||||
self.filter._get_user_context = fake_user_context
|
||||
self.filter._get_chat_context = lambda body, metadata=None: {
|
||||
"chat_id": "",
|
||||
"message_id": "",
|
||||
}
|
||||
self.filter._get_latest_summary = lambda chat_id: None
|
||||
|
||||
body = {
|
||||
"params": {"function_calling": "native"},
|
||||
"messages": [
|
||||
{
|
||||
"role": "assistant",
|
||||
"tool_calls": [{"id": "call_1", "type": "function"}],
|
||||
"content": "",
|
||||
},
|
||||
{"role": "tool", "content": "short result"},
|
||||
{"role": "assistant", "content": "Final answer"},
|
||||
],
|
||||
}
|
||||
|
||||
asyncio.run(self.filter.inlet(body, __event_call__=fake_event_call))
|
||||
|
||||
self.assertTrue(
|
||||
any("Tool trimming check:" in message for message in logged_messages)
|
||||
)
|
||||
self.assertTrue(
|
||||
any(
|
||||
"no oversized native tool outputs were found" in message
|
||||
for message in logged_messages
|
||||
)
|
||||
)
|
||||
|
||||
def test_inlet_logs_tool_trimming_skip_reason_when_disabled(self):
|
||||
self.filter.valves.show_debug_log = True
|
||||
self.filter.valves.enable_tool_output_trimming = False
|
||||
|
||||
logged_messages = []
|
||||
|
||||
async def fake_log(message, log_type="info", event_call=None):
|
||||
logged_messages.append(message)
|
||||
|
||||
async def fake_user_context(__user__, __event_call__):
|
||||
return {"user_language": "en-US"}
|
||||
|
||||
async def fake_event_call(_payload):
|
||||
return True
|
||||
|
||||
self.filter._log = fake_log
|
||||
self.filter._get_user_context = fake_user_context
|
||||
self.filter._get_chat_context = lambda body, metadata=None: {
|
||||
"chat_id": "",
|
||||
"message_id": "",
|
||||
}
|
||||
self.filter._get_latest_summary = lambda chat_id: None
|
||||
|
||||
body = {"messages": [], "params": {"function_calling": "native"}}
|
||||
|
||||
asyncio.run(self.filter.inlet(body, __event_call__=fake_event_call))
|
||||
|
||||
self.assertTrue(
|
||||
any("Tool trimming skipped: tool trimming disabled" in message for message in logged_messages)
|
||||
)
|
||||
|
||||
def test_normalize_native_tool_call_ids_keeps_links_aligned(self):
|
||||
long_tool_call_id = "call_abcdefghijklmnopqrstuvwxyz_1234567890abcd"
|
||||
messages = [
|
||||
{
|
||||
"role": "assistant",
|
||||
"tool_calls": [
|
||||
{
|
||||
"id": long_tool_call_id,
|
||||
"type": "function",
|
||||
"function": {"name": "search", "arguments": "{}"},
|
||||
}
|
||||
],
|
||||
"content": "",
|
||||
},
|
||||
{
|
||||
"role": "tool",
|
||||
"tool_call_id": long_tool_call_id,
|
||||
"content": "tool result",
|
||||
},
|
||||
]
|
||||
|
||||
normalized_count = self.filter._normalize_native_tool_call_ids(messages)
|
||||
|
||||
normalized_id = messages[0]["tool_calls"][0]["id"]
|
||||
self.assertEqual(normalized_count, 1)
|
||||
self.assertLessEqual(len(normalized_id), 40)
|
||||
self.assertNotEqual(normalized_id, long_tool_call_id)
|
||||
self.assertEqual(messages[1]["tool_call_id"], normalized_id)
|
||||
|
||||
def test_trim_native_tool_outputs_restores_real_behavior(self):
|
||||
messages = [
|
||||
{
|
||||
"role": "assistant",
|
||||
"tool_calls": [{"id": "call_1", "type": "function"}],
|
||||
"content": "",
|
||||
},
|
||||
{"role": "tool", "content": "x" * 1600},
|
||||
{"role": "assistant", "content": "Final answer"},
|
||||
]
|
||||
|
||||
trimmed_count = self.filter._trim_native_tool_outputs(messages, "en-US")
|
||||
|
||||
self.assertEqual(trimmed_count, 1)
|
||||
self.assertEqual(messages[1]["content"], "... [Content collapsed] ...")
|
||||
self.assertTrue(messages[1]["metadata"]["is_trimmed"])
|
||||
self.assertTrue(messages[2]["metadata"]["tool_outputs_trimmed"])
|
||||
self.assertIn("Final answer", messages[2]["content"])
|
||||
self.assertIn("Tool outputs trimmed", messages[2]["content"])
|
||||
|
    def test_trim_native_tool_outputs_supports_embedded_tool_call_cards(self):
        messages = [
            {
                "role": "assistant",
                "content": (
                    '<details type="tool_calls" done="true" id="call-1" '
                    'name="execute_code" arguments="&quot;{}&quot;" '
                    f'result="&quot;{"x" * 1600}&quot;">\n'
                    "<summary>Tool Executed</summary>\n"
                    "</details>\n"
                    "Final answer"
                ),
            }
        ]

        trimmed_count = self.filter._trim_native_tool_outputs(messages, "en-US")

        self.assertEqual(trimmed_count, 1)
        self.assertIn(
            'result="&quot;... [Content collapsed] ...&quot;"',
            messages[0]["content"],
        )
        self.assertNotIn("x" * 200, messages[0]["content"])
        self.assertTrue(messages[0]["metadata"]["tool_outputs_trimmed"])

    def test_function_calling_mode_reads_params_fallback(self):
        self.assertEqual(
            self.filter._get_function_calling_mode(
                {"params": {"function_calling": "native"}}
            ),
            "native",
        )

    def test_function_calling_mode_infers_native_from_message_shape(self):
        self.assertEqual(
            self.filter._get_function_calling_mode(
                {
                    "messages": [
                        {
                            "role": "assistant",
                            "tool_calls": [{"id": "call_1", "type": "function"}],
                            "content": "",
                        },
                        {"role": "tool", "content": "tool result"},
                    ]
                }
            ),
            "native",
        )

    def test_trim_native_tool_outputs_handles_pending_tool_chain(self):
        messages = [
            {
                "role": "assistant",
                "tool_calls": [{"id": "call_1", "type": "function"}],
                "content": "",
            },
            {"role": "tool", "content": "x" * 1600},
        ]

        trimmed_count = self.filter._trim_native_tool_outputs(messages, "en-US")

        self.assertEqual(trimmed_count, 1)
        self.assertEqual(messages[1]["content"], "... [Content collapsed] ...")
        self.assertTrue(messages[1]["metadata"]["is_trimmed"])

    def test_target_progress_uses_original_history_coordinates(self):
        self.filter.valves.keep_last = 2
        summary_message = self.filter._build_summary_message(
            "older summary", "en-US", 6
        )
        messages = [
            {"role": "system", "content": "System prompt"},
            summary_message,
            {"role": "user", "content": "Question 1"},
            {"role": "assistant", "content": "Answer 1"},
            {"role": "user", "content": "Question 2"},
            {"role": "assistant", "content": "Answer 2"},
        ]

        # covered_until=6 plus a 4-message tail => original history of 10;
        # target = max(6, 10 - keep_last) = 8.
        self.assertEqual(self.filter._get_original_history_count(messages), 10)
        self.assertEqual(self.filter._calculate_target_compressed_count(messages), 8)

    def test_load_full_chat_messages_rebuilds_active_history_branch(self):
        class FakeChats:
            @staticmethod
            def get_chat_by_id(chat_id):
                return types.SimpleNamespace(
                    chat={
                        "history": {
                            "currentId": "m3",
                            "messages": {
                                "m1": {
                                    "id": "m1",
                                    "role": "user",
                                    "content": "Question",
                                },
                                "m2": {
                                    "id": "m2",
                                    "role": "assistant",
                                    "content": "Tool call",
                                    "tool_calls": [{"id": "call_1"}],
                                    "parentId": "m1",
                                },
                                "m3": {
                                    "id": "m3",
                                    "role": "tool",
                                    "content": "Tool result",
                                    "tool_call_id": "call_1",
                                    "parentId": "m2",
                                },
                            },
                        }
                    }
                )

        original_chats = module.Chats
        module.Chats = FakeChats
        try:
            messages = self.filter._load_full_chat_messages("chat-1")
        finally:
            module.Chats = original_chats

        self.assertEqual([message["id"] for message in messages], ["m1", "m2", "m3"])
        self.assertEqual(messages[2]["role"], "tool")

    def test_outlet_unfolds_compact_tool_details_view(self):
        compact_messages = [
            {"role": "user", "content": "U1"},
            {
                "role": "assistant",
                "content": (
                    '<details type="tool_calls" done="true" id="call-1" '
                    'name="search_notes" arguments="&quot;{}&quot;" '
                    f'result="&quot;{"x" * 3000}&quot;">\n'
                    "<summary>Tool Executed</summary>\n"
                    "</details>\n"
                    "Answer 1"
                ),
            },
            {"role": "user", "content": "U2"},
            {
                "role": "assistant",
                "content": (
                    '<details type="tool_calls" done="true" id="call-2" '
                    'name="merge_notes" arguments="&quot;{}&quot;" '
                    f'result="&quot;{"y" * 4000}&quot;">\n'
                    "<summary>Tool Executed</summary>\n"
                    "</details>\n"
                    "Answer 2"
                ),
            },
        ]

        async def fake_user_context(__user__, __event_call__):
            return {"user_language": "en-US"}

        async def noop_log(*args, **kwargs):
            return None

        create_task_called = False

        def fake_create_task(coro):
            nonlocal create_task_called
            create_task_called = True
            coro.close()
            return None

        self.filter._get_user_context = fake_user_context
        self.filter._get_chat_context = lambda body, metadata=None: {
            "chat_id": "chat-1",
            "message_id": "msg-1",
        }
        self.filter._should_skip_compression = lambda body, model: False
        self.filter._log = noop_log

        # Set a low threshold so the task is guaranteed to trigger
        self.filter.valves.compression_threshold_tokens = 100

        original_create_task = asyncio.create_task
        asyncio.create_task = fake_create_task
        try:
            asyncio.run(
                self.filter.outlet(
                    {"model": "test-model", "messages": compact_messages},
                    __event_call__=None,
                )
            )
        finally:
            asyncio.create_task = original_create_task

        self.assertTrue(create_task_called)

    def test_summary_save_progress_matches_truncated_input(self):
        self.filter.valves.keep_first = 1
        self.filter.valves.keep_last = 1
        self.filter.valves.summary_model = "fake-summary-model"
        self.filter.valves.summary_model_max_context = 0

        captured = {}
        events = []

        async def mock_emitter(event):
            events.append(event)

        async def mock_summary_llm(
            previous_summary,
            new_conversation_text,
            body,
            user_data,
            __event_call__,
        ):
            return "new summary"

        def mock_save_summary(chat_id, summary, compressed_count):
            captured["chat_id"] = chat_id
            captured["summary"] = summary
            captured["compressed_count"] = compressed_count

        async def noop_log(*args, **kwargs):
            return None

        self.filter._log = noop_log
        self.filter._call_summary_llm = mock_summary_llm
        self.filter._save_summary = mock_save_summary
        self.filter._get_model_thresholds = lambda model_id: {
            "max_context_tokens": 3500
        }
        self.filter._calculate_messages_tokens = lambda messages: len(messages) * 1000
        self.filter._count_tokens = lambda text: 1000

        messages = [
            {"role": "system", "content": "System prompt"},
            {"role": "user", "content": "Question 1"},
            {"role": "assistant", "content": "Answer 1"},
            {"role": "user", "content": "Question 2"},
            {"role": "assistant", "content": "Answer 2"},
            {"role": "user", "content": "Question 3"},
        ]

        asyncio.run(
            self.filter._generate_summary_async(
                messages=messages,
                chat_id="chat-1",
                body={"model": "fake-summary-model"},
                user_data={"id": "user-1"},
                target_compressed_count=5,
                lang="en-US",
                __event_emitter__=mock_emitter,
                __event_call__=None,
            )
        )

        # The summary-model window forced truncation, so the saved progress (2)
        # must match what was actually summarized, not the requested target (5).
        self.assertEqual(captured["chat_id"], "chat-1")
        self.assertEqual(captured["summary"], "new summary")
        self.assertEqual(captured["compressed_count"], 2)
        self.assertTrue(any(event["type"] == "status" for event in events))


if __name__ == "__main__":
    unittest.main()
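The atomic grouping that the trimming tests above exercise can be sketched as follows. This is a simplified illustration, not the plugin's actual `_trim_native_tool_outputs` implementation: the function name and grouping policy here are assumptions.

```python
def atomic_groups(messages):
    """Group an assistant tool-call message with the tool results that answer it.

    Trimming whole groups (rather than individual messages) keeps every
    tool_calls entry paired with its tool responses, so the provider never
    sees a dangling call id. Illustrative sketch only.
    """
    groups, i = [], 0
    while i < len(messages):
        group = [messages[i]]
        if messages[i].get("role") == "assistant" and messages[i].get("tool_calls"):
            # Absorb the contiguous run of tool results that follow the call.
            j = i + 1
            while j < len(messages) and messages[j].get("role") == "tool":
                group.append(messages[j])
                j += 1
            i = j
        else:
            i += 1
        groups.append(group)
    return groups


msgs = [
    {"role": "user", "content": "Q"},
    {"role": "assistant", "tool_calls": [{"id": "call_1"}], "content": ""},
    {"role": "tool", "tool_call_id": "call_1", "content": "result"},
    {"role": "assistant", "content": "Answer"},
]
# Three groups: [user], [assistant + tool], [assistant]
```

Trimming then operates on whole groups from the newest end, which is why the pending-tool-chain test expects the tool result and its originating call to be handled together.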
17
plugins/filters/async-context-compression/v1.4.1.md
Normal file
@@ -0,0 +1,17 @@
[](https://openwebui.com/f/fujie/async_context_compression)

## Overview

This release addresses the critical progress coordinate drift issue in OpenWebUI's `outlet` phase, ensuring robust summarization for long tool-calling conversations.

[View on GitHub](https://github.com/Fu-Jie/openwebui-extensions/blob/main/plugins/filters/async-context-compression/README.md)

- **New Features**
  - **Reverse-Unfolding Mechanism**: Accurately reconstructs the expanded native tool-calling sequence during the outlet phase, permanently fixing coordinate drift and missing summaries in long tool-based conversations.
  - **Safer Tool Trimming**: Refactored `enable_tool_output_trimming` to trim strictly along atomic block groups, preventing JSON payload corruption.

- **Bug Fixes**
  - Fixed coordinate drift where `compressed_message_count` could lose track of progress because OpenWebUI's frontend view truncates tool calls.

- **Related Issues**
  - Closes #56
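The progress mapping behind the coordinate-drift fix can be sketched as below. Helper and metadata names (`recover_progress`, `is_context_summary`) are illustrative, not the plugin's actual API; the arithmetic follows the `covered_until` boundary stored on the injected summary message.

```python
def recover_progress(covered_until, visible, keep_last):
    """Map the saved original-history boundary to the next target progress.

    covered_until: boundary stored on the synthetic summary message.
    visible: current message list containing that summary marker.
    """
    # Everything after the summary marker maps 1:1 onto original history
    # starting at covered_until, so the original length is recoverable even
    # though the visible list has been compressed.
    marker_index = next(
        i for i, m in enumerate(visible)
        if m.get("metadata", {}).get("is_context_summary")  # assumed marker key
    )
    tail = visible[marker_index + 1:]
    original_count = covered_until + len(tail)
    # Never move the boundary backwards; always keep the newest keep_last messages.
    return max(covered_until, original_count - keep_last)


messages = [
    {"role": "system", "content": "sys"},
    {"role": "assistant", "content": "summary",
     "metadata": {"is_context_summary": True}},
    {"role": "user", "content": "Q"},
    {"role": "assistant", "content": "A"},
]
print(recover_progress(6, messages, keep_last=1))  # 6 covered + 2 tail - 1 kept = 7
```

Because the boundary lives in original-history coordinates, recomputing it from `len(visible)` (which shrinks after compression) would understate coverage; that is exactly the drift this release removes.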