release: async-context-compression v1.2.0 and markdown-normalizer v1.2.4
@@ -1,7 +1,7 @@
 # Async Context Compression

 <span class="category-badge filter">Filter</span>
-<span class="version-badge">v1.1.3</span>
+<span class="version-badge">v1.2.0</span>

 Reduces token consumption in long conversations through intelligent summarization while maintaining conversational coherence.

@@ -34,6 +34,10 @@ This is especially useful for:
 - :material-check-all: **Open WebUI v0.7.x Compatibility**: Dynamic DB session handling
 - :material-account-convert: **Improved Compatibility**: Summary role changed to `assistant`
 - :material-shield-check: **Enhanced Stability**: Resolved race conditions in state management
+- :material-ruler: **Preflight Context Check**: Validates context fit before sending
+- :material-format-align-justify: **Structure-Aware Trimming**: Preserves document structure
+- :material-content-cut: **Native Tool Output Trimming**: Trims verbose tool outputs (Note: non-native tool outputs are not fully injected into context)
+- :material-chart-bar: **Detailed Token Logging**: Granular token breakdown

 ---

@@ -64,10 +68,13 @@ graph TD

 | Option | Type | Default | Description |
 |--------|------|---------|-------------|
-| `token_threshold` | integer | `4000` | Trigger compression above this token count |
-| `preserve_recent` | integer | `5` | Number of recent messages to keep uncompressed |
-| `summary_model` | string | `"auto"` | Model to use for summarization |
-| `compression_ratio` | float | `0.3` | Target compression ratio |
+| `compression_threshold_tokens` | integer | `64000` | Trigger compression above this token count |
+| `max_context_tokens` | integer | `128000` | Hard limit for context |
+| `keep_first` | integer | `1` | Always keep the first N messages |
+| `keep_last` | integer | `6` | Always keep the last N messages |
+| `summary_model` | string | `None` | Model to use for summarization |
+| `max_summary_tokens` | integer | `16384` | Maximum tokens for the summary |
+| `enable_tool_output_trimming` | boolean | `false` | Enable trimming of large tool outputs |

 ---

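The retention options in the table above interact in a fixed order: compression triggers only past `compression_threshold_tokens`, and `keep_first`/`keep_last` carve out protected head and tail slices, with only the middle eligible for summarization. A minimal sketch of that slicing, using the valve names from the table (the ~4-characters-per-token estimate is an illustrative assumption, not the filter's actual tokenizer):

```python
# Sketch of the retention policy described in the configuration table above.
# The token estimate (~4 chars per token) is an assumption for illustration.

def estimate_tokens(messages: list[dict]) -> int:
    """Rough token estimate: about 4 characters per token."""
    return sum(len(m.get("content", "")) // 4 for m in messages)

def split_for_compression(
    messages: list[dict],
    compression_threshold_tokens: int = 64000,
    keep_first: int = 1,
    keep_last: int = 6,
) -> tuple[list[dict], list[dict], list[dict]]:
    """Return (head, middle, tail); only `middle` is eligible for summarization."""
    if estimate_tokens(messages) <= compression_threshold_tokens:
        return messages, [], []  # below threshold: nothing to compress
    head = messages[:keep_first]
    tail = messages[len(messages) - keep_last:]
    middle = messages[keep_first:len(messages) - keep_last]
    return head, middle, tail

msgs = [{"role": "user", "content": "x" * 400} for _ in range(20)]
head, middle, tail = split_for_compression(msgs, compression_threshold_tokens=100)
print(len(head), len(middle), len(tail))  # → 1 13 6
```

With the defaults from the table, a short conversation passes through untouched; the split only engages once the estimated total crosses the threshold.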
@@ -1,7 +1,7 @@
 # Async Context Compression (异步上下文压缩)

 <span class="category-badge filter">Filter</span>
-<span class="version-badge">v1.1.3</span>
+<span class="version-badge">v1.2.0</span>

 Reduces token consumption in long conversations through intelligent summarization while keeping the conversation coherent.

@@ -34,6 +34,10 @@ The Async Context Compression filter helps manage long-conversation token usage by:
 - :material-check-all: **Open WebUI v0.7.x Compatibility**: Dynamic database session handling
 - :material-account-convert: **Improved Compatibility**: Summary role changed to `assistant`
 - :material-shield-check: **Enhanced Stability**: Resolved race conditions in state management
+- :material-ruler: **Preflight Context Check**: Validates that the context fits before sending
+- :material-format-align-justify: **Structure-Aware Trimming**: Smart trimming that preserves document structure
+- :material-content-cut: **Native Tool Output Trimming**: Automatically trims verbose tool outputs (note: non-native tool call outputs are not fully injected into context)
+- :material-chart-bar: **Detailed Token Logging**: Fine-grained token statistics

 ---

@@ -64,10 +68,13 @@ graph TD

 | Option | Type | Default | Description |
 |--------|------|---------|-------------|
-| `token_threshold` | integer | `4000` | Trigger compression above this token count |
-| `preserve_recent` | integer | `5` | Number of recent messages kept uncompressed |
-| `summary_model` | string | `"auto"` | Model used for summarization |
-| `compression_ratio` | float | `0.3` | Target compression ratio |
+| `compression_threshold_tokens` | integer | `64000` | Trigger compression above this token count |
+| `max_context_tokens` | integer | `128000` | Hard limit for context |
+| `keep_first` | integer | `1` | Always keep the first N messages |
+| `keep_last` | integer | `6` | Always keep the last N messages |
+| `summary_model` | string | `None` | Model used for summarization |
+| `max_summary_tokens` | integer | `16384` | Maximum tokens for the summary |
+| `enable_tool_output_trimming` | boolean | `false` | Enable trimming of long tool outputs |

 ---

@@ -44,7 +44,7 @@ Filters act as middleware in the message pipeline:

 Fixes common Markdown formatting issues in LLM outputs, including Mermaid syntax, code blocks, and LaTeX formulas.

-**Version:** 1.2.3
+**Version:** 1.2.4

 [:octicons-arrow-right-24: Documentation](markdown_normalizer.md)

@@ -44,7 +44,7 @@ Filters act as middleware in the message pipeline:

 Fixes common Markdown formatting issues in LLM outputs, including Mermaid syntax, code blocks, and LaTeX formulas.

-**Version:** 1.2.3
+**Version:** 1.2.4

 [:octicons-arrow-right-24: Documentation](markdown_normalizer.zh.md)

@@ -51,6 +51,10 @@ A content normalizer filter for Open WebUI that fixes common Markdown formatting

 ## Changelog

+### v1.2.4
+
+* **Documentation Updates**: Synchronized version numbers across all documentation and code files.
+
 ### v1.2.3

 * **List Marker Protection Enhancement**: Fixed a bug where list markers (`*`) followed by plain text and emphasis had their spaces incorrectly stripped (e.g., `* U16 forward` became `*U16 forward`).
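The v1.2.3 list-marker fix above guards against an emphasis-normalization rule eating the space after a leading `*`. The normalizer's real implementation is not shown in this diff; the following is only a sketch of the failure mode and the guard, with both function names hypothetical:

```python
import re

# Sketch of the bug fixed in v1.2.3: a naive rule that strips the space after a
# leading "*" (treating it as emphasis) breaks list items like "* U16 forward".
# Checking for a genuine list marker first preserves them.

LIST_MARKER = re.compile(r"^(\s*)([*+-])\s+")

def naive_fix(line: str) -> str:
    # Buggy: removes the space after any leading "*"
    return re.sub(r"^\*\s+", "*", line)

def safe_fix(line: str) -> str:
    # Leave genuine list markers untouched
    if LIST_MARKER.match(line):
        return line
    return naive_fix(line)

print(naive_fix("* U16 forward"))  # → *U16 forward  (the bug)
print(safe_fix("* U16 forward"))   # → * U16 forward (preserved)
```

Lines that start with `*` but carry no following whitespace (real emphasis, e.g. `*word*`) fall through to the normal rules, so the guard changes nothing for them.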
@@ -51,6 +51,10 @@

 ## Changelog

+### v1.2.4
+
+* **Documentation Updates**: Synchronized version numbers across all documentation and code files.
+
 ### v1.2.3

 * **List Marker Protection Enhancement**: Fixed a bug where a list marker (`*`) followed by plain text and emphasis had its space incorrectly stripped (e.g., `* U16 前锋` became `*U16 前锋`).
Binary file not shown. (Before: 162 KiB | After: 234 KiB)
@@ -1,9 +1,19 @@
 # Async Context Compression Filter

-**Author:** [Fu-Jie](https://github.com/Fu-Jie/awesome-openwebui) | **Version:** 1.1.3 | **Project:** [Awesome OpenWebUI](https://github.com/Fu-Jie/awesome-openwebui) | **License:** MIT
+**Author:** [Fu-Jie](https://github.com/Fu-Jie/awesome-openwebui) | **Version:** 1.2.0 | **Project:** [Awesome OpenWebUI](https://github.com/Fu-Jie/awesome-openwebui) | **License:** MIT

 This filter reduces token consumption in long conversations through intelligent summarization and message compression while keeping conversations coherent.

+## What's new in 1.2.0
+
+- **Preflight Context Check**: Before sending to the model, validates that total tokens fit within the context window. Automatically trims or drops the oldest messages if the limit is exceeded.
+- **Structure-Aware Assistant Trimming**: When context exceeds the limit, long AI responses are intelligently collapsed while preserving their structure (headers H1-H6, first line, last line).
+- **Native Tool Output Trimming**: Detects and trims native tool outputs (`function_calling: "native"`), extracting only the final answer. Enable via `enable_tool_output_trimming`. **Note**: Non-native tool outputs are not fully injected into context.
+- **Consolidated Status Notifications**: Unified "Context Usage" and "Context Summary Updated" notifications with appended warnings (e.g., `| ⚠️ High Usage`) for clearer feedback.
+- **Context Usage Warning**: Emits a warning notification when context usage exceeds 90%.
+- **Enhanced Header Detection**: Optimized regex (`^#{1,6}\s+`) to avoid false positives such as `#hashtag`.
+- **Detailed Token Logging**: Logs now show the token breakdown for the System, Head, Summary, and Tail sections, plus the total.
+
 ## What's new in 1.1.3
 - **Improved Compatibility**: Changed the summary injection role from `user` to `assistant` for better compatibility across different LLMs.
 - **Enhanced Stability**: Fixed a race condition in state management that could cause "inlet state not found" warnings in high-concurrency scenarios.
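The "Enhanced Header Detection" item above cites the regex `^#{1,6}\s+`. Requiring one to six hashes followed by whitespace matches Markdown ATX headers while rejecting strings such as `#hashtag` or a run of seven hashes. A standalone check:

```python
import re

# The header-detection regex from the 1.2.0 changelog above.
HEADER_RE = re.compile(r"^#{1,6}\s+")

samples = ["# Title", "###### Deep header", "#hashtag", "####### seven hashes", "no header"]
print([bool(HEADER_RE.match(s)) for s in samples])  # → [True, True, False, False, False]
```

Seven hashes fail because after at most six `#` the next character must be whitespace, matching Markdown's rule that ATX headings stop at level 6.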
@@ -31,6 +41,10 @@ This filter reduces token consumption in long conversations through intelligent
 - ✅ Persistent storage via Open WebUI's shared database connection (PostgreSQL, SQLite, etc.).
 - ✅ Flexible retention policy to keep the first and last N messages.
 - ✅ Smart injection of historical summaries back into the context.
+- ✅ Structure-aware trimming that preserves document structure (headers, intro, conclusion).
+- ✅ Native tool output trimming for cleaner context when using function calling.
+- ✅ Real-time context usage monitoring with warning notifications (>90%).
+- ✅ Detailed token logging for precise debugging and optimization.

 ---

@@ -64,6 +78,7 @@ It is recommended to keep this filter early in the chain so it runs before filte
 | `max_summary_tokens` | `4000` | Maximum tokens for the generated summary. |
 | `summary_temperature` | `0.3` | Randomness for summary generation. Lower is more deterministic. |
 | `model_thresholds` | `{}` | Per-model overrides for `compression_threshold_tokens` and `max_context_tokens` (useful for mixed models). |
+| `enable_tool_output_trimming` | `false` | When enabled and `function_calling: "native"` is active, trims verbose tool outputs to extract only the final answer. |
 | `debug_mode` | `true` | Log verbose debug info. Set to `false` in production. |
 | `show_debug_log` | `false` | Print debug logs to browser console (F12). Useful for frontend debugging. |

@@ -1,11 +1,21 @@
 # Async Context Compression Filter (异步上下文压缩过滤器)

-**Author:** [Fu-Jie](https://github.com/Fu-Jie/awesome-openwebui) | **Version:** 1.1.3 | **Project:** [Awesome OpenWebUI](https://github.com/Fu-Jie/awesome-openwebui) | **License:** MIT
+**Author:** [Fu-Jie](https://github.com/Fu-Jie/awesome-openwebui) | **Version:** 1.2.0 | **Project:** [Awesome OpenWebUI](https://github.com/Fu-Jie/awesome-openwebui) | **License:** MIT

 > **Important note**: To keep every filter maintainable and easy to use, each filter should ship with clear, complete documentation so that its features, configuration, and usage are fully explained.

 This filter significantly reduces the token consumption of long conversations through intelligent summarization and message compression, while keeping the conversation coherent.

+## What's new in 1.2.0
+
+- **Preflight Context Check**: Before sending to the model, validates that the total tokens fit the context window. If exceeded, the oldest messages are automatically trimmed or dropped.
+- **Structure-Aware Assistant Trimming**: When the context exceeds the limit, overly long AI responses are intelligently collapsed while preserving their structure (headers H1-H6, first line, last line).
+- **Native Tool Output Trimming**: Detects and trims native tool outputs (`function_calling: "native"`), keeping only the final answer. Enable via `enable_tool_output_trimming`. **Note**: Non-native tool call outputs are not fully injected into context.
+- **Consolidated Status Notifications**: Unified the "Context Usage" and "Context Summary Updated" notifications, with appended warnings (e.g., `| ⚠️ High Usage`) for clearer feedback.
+- **Context Usage Warning**: Emits a warning notification when context usage exceeds 90%.
+- **Enhanced Header Detection**: Optimized the regex (`^#{1,6}\s+`) to avoid false positives such as `#hashtag`.
+- **Detailed Token Logging**: Logs now show the token breakdown for the System, Head, Summary, and Tail sections, plus the total.
+
 ## What's new in 1.1.3
 - **Improved Compatibility**: Changed the summary injection role from `user` to `assistant` to improve compatibility across different LLMs.
 - **Enhanced Stability**: Fixed a race condition in state management that resolved the "inlet state not found" warnings that could appear under high concurrency.
@@ -33,6 +43,10 @@
 - ✅ **Persistent storage**: Reuses Open WebUI's shared database connection, supporting PostgreSQL/SQLite out of the box.
 - ✅ **Flexible retention policy**: Configurable retention of head and tail messages to keep key information coherent.
 - ✅ **Smart injection**: Intelligently injects historical summaries into the new context.
+- ✅ **Structure-aware trimming**: Intelligently collapses overly long messages while keeping the document skeleton (headers, first and last lines).
+- ✅ **Native tool output trimming**: Supports trimming verbose tool call outputs.
+- ✅ **Real-time monitoring**: Monitors context usage in real time and warns above 90%.
+- ✅ **Detailed logging**: Provides precise token statistics for debugging.

 For a detailed explanation of how it works, see the [Workflow Guide](WORKFLOW_GUIDE_CN.md).

@@ -100,15 +114,12 @@
 }
 ```

-#### `debug_mode`
-
-- **Default**: `true`
-- **Description**: Whether to print detailed debug information (token counts, compression progress, database operations, etc.) in Open WebUI's console logs. Set to `false` in production.
-
-#### `show_debug_log`
-
-- **Default**: `false`
-- **Description**: Whether to print debug logs to the browser console (F12). Useful for frontend debugging.
+| Parameter | Default | Description |
+| :--- | :--- | :--- |
+| `enable_tool_output_trimming` | `false` | When enabled and `function_calling: "native"` is active, trims verbose tool outputs to keep only the final answer. |
+| `debug_mode` | `true` | Whether to print detailed debug information (token counts, compression progress, database operations, etc.) in Open WebUI's console logs. Set to `false` in production. |
+| `show_debug_log` | `false` | Whether to print debug logs to the browser console (F12). Useful for frontend debugging. |
+| `show_token_usage_status` | `true` | Whether to show a token-usage status notification at the end of the conversation. |

 ---

@@ -5,10 +5,20 @@ author: Fu-Jie
 author_url: https://github.com/Fu-Jie/awesome-openwebui
 funding_url: https://github.com/open-webui
 description: Reduces token consumption in long conversations while maintaining coherence through intelligent summarization and message compression.
-version: 1.1.3
+version: 1.2.0
 openwebui_id: b1655bc8-6de9-4cad-8cb5-a6f7829a02ce
 license: MIT

+═══════════════════════════════════════════════════════════════════════════════
+📌 What's new in 1.2.0
+═══════════════════════════════════════════════════════════════════════════════
+
+✅ Preflight Context Check: Validates context fit before sending to the model.
+✅ Structure-Aware Trimming: Collapses long AI responses while keeping H1-H6, intro, and conclusion.
+✅ Native Tool Output Trimming: Cleaner context when using function calling. (Note: non-native tool outputs are not fully injected into context.)
+✅ Context Usage Warning: Notification when usage exceeds 90%.
+✅ Detailed Token Logging: Granular breakdown of System, Head, Summary, and Tail tokens.
+
 ═══════════════════════════════════════════════════════════════════════════════
 📌 Overview
 ═══════════════════════════════════════════════════════════════════════════════
@@ -21,6 +31,8 @@ Core Features:
 ✅ Persistent storage with database support (PostgreSQL and SQLite)
 ✅ Flexible retention policy (configurable to keep first and last N messages)
 ✅ Smart summary injection to maintain context
+✅ Structure-aware trimming to preserve the document skeleton
+✅ Native tool output trimming for function calling support

 ═══════════════════════════════════════════════════════════════════════════════
 🔄 Workflow
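The "Detailed Token Logging" feature listed above reports a per-section token breakdown (System, Head, Summary, Tail, plus the total). A minimal sketch of such a report, where the ~4-characters-per-token estimate and the function name are illustrative assumptions rather than the filter's actual implementation:

```python
# Sketch of the per-section token breakdown described above. The section names
# (System, Head, Summary, Tail) come from the changelog; the estimator is an
# assumption for illustration.

def token_breakdown(system: str, head: list[str], summary: str, tail: list[str]) -> dict:
    def est(text: str) -> int:
        return len(text) // 4  # rough ~4 chars per token

    parts = {
        "System": est(system),
        "Head": sum(est(m) for m in head),
        "Summary": est(summary),
        "Tail": sum(est(m) for m in tail),
    }
    parts["Total"] = sum(parts.values())
    return parts

print(token_breakdown("sys " * 10, ["head" * 5], "sum " * 8, ["tail" * 3, "tail" * 2]))
```

Logging all four sections plus the total makes it obvious which part of the context (protected head, injected summary, or recent tail) dominates the budget.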
@@ -110,6 +122,10 @@ model_thresholds
 Description: Threshold override configuration for specific models.
 Example: {"gpt-4": {"compression_threshold_tokens": 8000, "max_context_tokens": 32000}}

+enable_tool_output_trimming
+Default: false
+Description: When enabled and `function_calling: "native"` is active, trims verbose tool outputs to extract only the final answer.
+
 keep_first
 Default: 1
 Description: Always keep the first N messages of the conversation. Set to 0 to disable. The first message often contains important system prompts.
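The `model_thresholds` valve documented above overrides the global thresholds per model. A sketch of the lookup, using the example mapping from the docstring (the `resolve_thresholds` helper name is hypothetical):

```python
# Per-model threshold resolution, as described for `model_thresholds` above.
# Any value not overridden for a model falls back to the global defaults.

DEFAULTS = {"compression_threshold_tokens": 64000, "max_context_tokens": 128000}
model_thresholds = {
    "gpt-4": {"compression_threshold_tokens": 8000, "max_context_tokens": 32000},
}

def resolve_thresholds(model_id: str) -> dict:
    # Per-model values win; unspecified keys keep the defaults.
    return {**DEFAULTS, **model_thresholds.get(model_id, {})}

print(resolve_thresholds("gpt-4")["max_context_tokens"])   # → 32000
print(resolve_thresholds("llama3")["max_context_tokens"])  # → 128000
```

Merging with dict unpacking keeps the override semantics obvious: the per-model dict is applied last, so it shadows only the keys it defines.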
@@ -245,6 +261,7 @@ Solution:

 from pydantic import BaseModel, Field, model_validator
 from typing import Optional, Dict, Any, List, Union, Callable, Awaitable
+import re
 import asyncio
 import json
 import hashlib
@@ -254,6 +271,7 @@ import contextlib
 # Open WebUI built-in imports
 from open_webui.utils.chat import generate_chat_completion
 from open_webui.models.users import Users
+from open_webui.models.models import Models
 from fastapi.requests import Request
 from open_webui.main import app as webui_app

@@ -370,10 +388,6 @@ class Filter:
         self.valves = self.Valves()
         self._owui_db = owui_db
         self._db_engine = owui_engine
-        self._db_engine = owui_engine
-        self._fallback_session_factory = (
-            sessionmaker(bind=self._db_engine) if self._db_engine else None
-        )
         self._fallback_session_factory = (
             sessionmaker(bind=self._db_engine) if self._db_engine else None
         )
@@ -494,7 +508,14 @@ class Filter:
             default=True, description="Enable detailed logging for debugging."
         )
         show_debug_log: bool = Field(
-            default=False, description="Print debug logs to browser console (F12)"
+            default=False, description="Show debug logs in the frontend console"
+        )
+        show_token_usage_status: bool = Field(
+            default=True, description="Show token usage status notification"
+        )
+        enable_tool_output_trimming: bool = Field(
+            default=False,
+            description="Enable trimming of large tool outputs (only works with native function calling).",
         )

     def _save_summary(self, chat_id: str, summary: str, compressed_count: int):
@@ -758,6 +779,8 @@ class Filter:
         body: dict,
         __user__: Optional[dict] = None,
         __metadata__: dict = None,
+        __request__: Request = None,
+        __model__: dict = None,
         __event_emitter__: Callable[[Any], Awaitable[None]] = None,
         __event_call__: Callable[[Any], Awaitable[None]] = None,
     ) -> dict:
@@ -765,10 +788,211 @@ class Filter:
         Executed before sending to the LLM.
         Compression Strategy: Only responsible for injecting existing summaries, no Token calculation.
         """

         messages = body.get("messages", [])

+        # --- Native Tool Output Trimming (opt-in, only for native function calling) ---
+        metadata = body.get("metadata", {})
+        is_native_func_calling = metadata.get("function_calling") == "native"
+
+        if self.valves.enable_tool_output_trimming and is_native_func_calling:
+            trimmed_count = 0
+
+            for msg in messages:
+                content = msg.get("content", "")
+                if not isinstance(content, str):
+                    continue
+
+                role = msg.get("role")
+
+                # Only process assistant messages with native tool outputs
+                if role == "assistant":
+                    # Detect tool output markers in assistant content
+                    if "tool_call_id:" in content or (
+                        content.startswith('"') and '\\"' in content
+                    ):
+                        # Always trim tool outputs when enabled
+                        if self.valves.show_debug_log and __event_call__:
+                            await self._log(
+                                "[Inlet] 🔍 Native tool output detected in assistant message.",
+                                event_call=__event_call__,
+                            )
+
+                        # Extract the final answer (after the last tool call metadata).
+                        # Pattern: matches escaped JSON strings like """...""" followed by newlines.
+                        # We look for the last occurrence of such a pattern and take everything after it.
+
+                        # 1. Try matching the specific Open WebUI tool output format: """..."""
+                        # This regex finds the last end-quote of a tool output block.
+                        tool_output_pattern = r'""".*?"""\s*'
+
+                        # Find all matches
+                        matches = list(
+                            re.finditer(tool_output_pattern, content, re.DOTALL)
+                        )
+
+                        if matches:
+                            # Get the end position of the last match
+                            last_match_end = matches[-1].end()
+
+                            # Everything after the last tool output is the final answer
+                            final_answer = content[last_match_end:].strip()
+
+                            if final_answer:
+                                msg["content"] = (
+                                    f"... [Tool outputs trimmed]\n{final_answer}"
+                                )
+                                trimmed_count += 1
+                        else:
+                            # Fallback: try splitting on "Arguments:" if the new format isn't found
+                            # (preserves backward compatibility and different model behaviors)
+                            parts = re.split(r"(?:Arguments:\s*\{[^}]+\})\n+", content)
+                            if len(parts) > 1:
+                                final_answer = parts[-1].strip()
+                                if final_answer:
+                                    msg["content"] = (
+                                        f"... [Tool outputs trimmed]\n{final_answer}"
+                                    )
+                                    trimmed_count += 1
+
+            if trimmed_count > 0 and self.valves.show_debug_log and __event_call__:
+                await self._log(
+                    f"[Inlet] ✂️ Trimmed {trimmed_count} tool output message(s).",
+                    event_call=__event_call__,
+                )
+
         chat_ctx = self._get_chat_context(body, __metadata__)
         chat_id = chat_ctx["chat_id"]
+
+        # Extract the system prompt for accurate token calculation:
+        # 1. For custom models: check the DB (Models.get_model_by_id)
+        # 2. For base models: check messages for role='system'
+        system_prompt_content = None
+
+        # Try to get it from the DB (custom model)
+        try:
+            model_id = body.get("model")
+            if model_id:
+                if self.valves.show_debug_log and __event_call__:
+                    await self._log(
+                        f"[Inlet] 🔍 Attempting DB lookup for model: {model_id}",
+                        event_call=__event_call__,
+                    )
+
+                # Clean model ID if needed (though get_model_by_id usually expects the full ID)
+                model_obj = Models.get_model_by_id(model_id)
+
+                if model_obj:
+                    if self.valves.show_debug_log and __event_call__:
+                        await self._log(
+                            f"[Inlet] ✅ Model found in DB: {model_obj.name} (ID: {model_obj.id})",
+                            event_call=__event_call__,
+                        )
+
+                    if model_obj.params:
+                        try:
+                            params = model_obj.params
+                            # Handle the case where params is a JSON string
+                            if isinstance(params, str):
+                                params = json.loads(params)
+
+                            # Handle dict or Pydantic object
+                            if isinstance(params, dict):
+                                system_prompt_content = params.get("system")
+                            else:
+                                # Assume a Pydantic model or object
+                                system_prompt_content = getattr(params, "system", None)
+
+                            if system_prompt_content:
+                                if self.valves.show_debug_log and __event_call__:
+                                    await self._log(
+                                        f"[Inlet] 📝 System prompt found in DB params ({len(system_prompt_content)} chars)",
+                                        event_call=__event_call__,
+                                    )
+                            else:
+                                if self.valves.show_debug_log and __event_call__:
+                                    await self._log(
+                                        "[Inlet] ⚠️ 'system' key missing in model params",
+                                        event_call=__event_call__,
+                                    )
+                        except Exception as e:
+                            if self.valves.show_debug_log and __event_call__:
+                                await self._log(
+                                    f"[Inlet] ❌ Failed to parse model params: {e}",
+                                    type="error",
+                                    event_call=__event_call__,
+                                )
+                    else:
+                        if self.valves.show_debug_log and __event_call__:
+                            await self._log(
+                                "[Inlet] ⚠️ Model params are empty",
+                                event_call=__event_call__,
+                            )
+                else:
+                    if self.valves.show_debug_log and __event_call__:
+                        await self._log(
+                            "[Inlet] ❌ Model NOT found in DB",
+                            type="warning",
+                            event_call=__event_call__,
+                        )
+        except Exception as e:
+            if self.valves.show_debug_log and __event_call__:
+                await self._log(
+                    f"[Inlet] ❌ Error fetching system prompt from DB: {e}",
+                    type="error",
+                    event_call=__event_call__,
+                )
+            if self.valves.debug_mode:
+                print(f"[Inlet] Error fetching system prompt from DB: {e}")
+
+        # Fall back to checking messages (base model, or already included)
+        if not system_prompt_content:
+            for msg in messages:
+                if msg.get("role") == "system":
+                    system_prompt_content = msg.get("content", "")
+                    break
+
+        # Build system_prompt_msg for token calculation
+        system_prompt_msg = None
+        if system_prompt_content:
+            system_prompt_msg = {"role": "system", "content": system_prompt_content}
+            if self.valves.debug_mode:
+                print(
+                    f"[Inlet] Found system prompt ({len(system_prompt_content)} chars). Including in budget."
+                )
+
+        # Log message statistics (moved here to include the extracted system prompt)
+        if self.valves.show_debug_log and __event_call__:
+            try:
+                msg_stats = {
+                    "user": 0,
+                    "assistant": 0,
+                    "system": 0,
+                    "total": len(messages),
+                }
+                for msg in messages:
+                    role = msg.get("role", "unknown")
+                    if role in msg_stats:
+                        msg_stats[role] += 1
+
+                # If the system prompt was extracted from the DB/model but is not in messages, count it
+                if system_prompt_content:
+                    # Check whether it is already counted (i.e., was in messages)
+                    is_in_messages = any(m.get("role") == "system" for m in messages)
+                    if not is_in_messages:
+                        msg_stats["system"] += 1
+                        msg_stats["total"] += 1
+
+                stats_str = f"Total: {msg_stats['total']} | User: {msg_stats['user']} | Assistant: {msg_stats['assistant']} | System: {msg_stats['system']}"
+                await self._log(
+                    f"[Inlet] Message Stats: {stats_str}", event_call=__event_call__
+                )
+            except Exception as e:
+                print(f"[Inlet] Error logging message stats: {e}")
+
         if not chat_id:
             await self._log(
                 "[Inlet] ❌ Missing chat_id in metadata, skipping compression",
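The trimming branch in the hunk above keeps only the text after the last `"""..."""` tool-output block. The same extraction, isolated so it can be exercised on its own (the sample `raw` string is a made-up illustration of the native tool-output shape, not captured Open WebUI output):

```python
import re

# Standalone version of the native tool-output extraction shown in the diff:
# drop everything up to and including the last triple-quoted tool-output block,
# keeping only the final answer that follows it.

tool_output_pattern = r'""".*?"""\s*'

def trim_tool_outputs(content: str) -> str:
    matches = list(re.finditer(tool_output_pattern, content, re.DOTALL))
    if not matches:
        return content  # nothing that looks like a tool output: leave untouched
    final_answer = content[matches[-1].end():].strip()
    if not final_answer:
        return content  # no trailing answer: trimming would lose everything
    return f"... [Tool outputs trimmed]\n{final_answer}"

raw = '"""{"temp": 21}"""\n"""{"wind": 5}"""\nIt is mild with a light breeze.'
print(trim_tool_outputs(raw))
```

Using the *last* match matters: a multi-step tool call leaves several `"""..."""` blocks in one assistant message, and only the text after the final block is the answer the user saw.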
@@ -787,10 +1011,6 @@ class Filter:
        # Target is to compress up to the (total - keep_last) message
        target_compressed_count = max(0, len(messages) - self.valves.keep_last)

-        # Record the target compression progress for the original messages, for use in outlet
-        # Target is to compress up to the (total - keep_last) message
-        target_compressed_count = max(0, len(messages) - self.valves.keep_last)
-
        await self._log(
            f"[Inlet] Recorded target compression progress: {target_compressed_count}",
            event_call=__event_call__,
@@ -799,6 +1019,14 @@ class Filter:
         # Load summary record
         summary_record = await asyncio.to_thread(self._load_summary_record, chat_id)

+        # Calculate effective_keep_first to ensure all system messages are protected
+        last_system_index = -1
+        for i, msg in enumerate(messages):
+            if msg.get("role") == "system":
+                last_system_index = i
+
+        effective_keep_first = max(self.valves.keep_first, last_system_index + 1)
+
         final_messages = []

         if summary_record:
@@ -812,8 +1040,8 @@ class Filter:

             # 1. Head messages (Keep First)
             head_messages = []
-            if self.valves.keep_first > 0:
-                head_messages = messages[: self.valves.keep_first]
+            if effective_keep_first > 0:
+                head_messages = messages[:effective_keep_first]

             # 2. Summary message (Inserted as User message)
             summary_content = (
@@ -826,29 +1054,215 @@ class Filter:

             # 3. Tail messages (Tail) - All messages starting from the last compression point
             # Note: Must ensure head messages are not duplicated
-            start_index = max(compressed_count, self.valves.keep_first)
+            start_index = max(compressed_count, effective_keep_first)
             tail_messages = messages[start_index:]

-            final_messages = head_messages + [summary_msg] + tail_messages
+            if self.valves.show_debug_log and __event_call__:
+                tail_preview = [
+                    f"{i + start_index}: [{m.get('role')}] {m.get('content', '')[:30]}..."
+                    for i, m in enumerate(tail_messages)
+                ]
+                await self._log(
+                    f"[Inlet] 📜 Tail Messages (Start Index: {start_index}): {tail_preview}",
+                    event_call=__event_call__,
+                )
+
+            # --- Preflight Check & Budgeting (Simplified) ---
+
+            # Assemble candidate messages (for output)
+            candidate_messages = head_messages + [summary_msg] + tail_messages
+
+            # Prepare messages for token calculation (include system prompt if missing)
+            calc_messages = candidate_messages
+            if system_prompt_msg:
+                # Check if system prompt is already in head_messages
+                is_in_head = any(m.get("role") == "system" for m in head_messages)
+                if not is_in_head:
+                    calc_messages = [system_prompt_msg] + candidate_messages
+
+            # Get max context limit
+            model = self._clean_model_id(body.get("model"))
+            thresholds = self._get_model_thresholds(model)
+            max_context_tokens = thresholds.get(
+                "max_context_tokens", self.valves.max_context_tokens
+            )
+
+            # Calculate total tokens
+            total_tokens = await asyncio.to_thread(
+                self._calculate_messages_tokens, calc_messages
+            )
+
+            # Preflight Check Log
+            await self._log(
+                f"[Inlet] 🔎 Preflight Check: {total_tokens}t / {max_context_tokens}t ({(total_tokens/max_context_tokens*100):.1f}%)",
+                event_call=__event_call__,
+            )
+
+            # If over budget, reduce history (Keep Last)
+            if total_tokens > max_context_tokens:
+                await self._log(
+                    f"[Inlet] ⚠️ Candidate prompt ({total_tokens} Tokens) exceeds limit ({max_context_tokens}). Reducing history...",
+                    type="warning",
+                    event_call=__event_call__,
+                )
+
+                # Dynamically remove messages from the start of tail_messages
+                # Always try to keep at least the last message (usually user input)
+                while total_tokens > max_context_tokens and len(tail_messages) > 1:
+                    # Strategy 1: Structure-Aware Assistant Trimming
+                    # Retain: Headers (#), First Line, Last Line. Collapse the rest.
+                    target_msg = None
+                    target_idx = -1
+
+                    # Find the oldest assistant message that is long and not yet trimmed
+                    for i, msg in enumerate(tail_messages):
+                        # Skip the last message (usually user input, protect it)
+                        if i == len(tail_messages) - 1:
+                            break
+
+                        if msg.get("role") == "assistant":
+                            content = str(msg.get("content", ""))
+                            is_trimmed = msg.get("metadata", {}).get(
+                                "is_trimmed", False
+                            )
+                            # Only target messages that are reasonably long (> 200 chars)
+                            if len(content) > 200 and not is_trimmed:
+                                target_msg = msg
+                                target_idx = i
+                                break
+
+                    # If found a suitable assistant message, apply structure-aware trimming
+                    if target_msg:
+                        content = str(target_msg.get("content", ""))
+                        lines = content.split("\n")
+                        kept_lines = []
+
+                        # Logic: Keep headers, first non-empty line, last non-empty line
+                        first_line_found = False
+                        last_line_idx = -1
+
+                        # Find last non-empty line index
+                        for idx in range(len(lines) - 1, -1, -1):
+                            if lines[idx].strip():
+                                last_line_idx = idx
+                                break
+
+                        for idx, line in enumerate(lines):
+                            stripped = line.strip()
+                            if not stripped:
+                                continue
+
+                            # Keep headers (H1-H6, requires space after #)
+                            if re.match(r"^#{1,6}\s+", stripped):
+                                kept_lines.append(line)
+                                continue
+
+                            # Keep first non-empty line
+                            if not first_line_found:
+                                kept_lines.append(line)
+                                first_line_found = True
+                                # Add placeholder if there's more content coming
+                                if idx < last_line_idx:
+                                    kept_lines.append("\n... [Content collapsed] ...\n")
+                                continue
+
+                            # Keep last non-empty line
+                            if idx == last_line_idx:
+                                kept_lines.append(line)
+                                continue
+
+                        # Update message content
+                        new_content = "\n".join(kept_lines)
+
+                        # Safety check: If trimming didn't save much (e.g. mostly headers), force drop
+                        if len(new_content) > len(content) * 0.8:
+                            # Fallback to drop if structure preservation is too verbose
+                            pass
+                        else:
+                            target_msg["content"] = new_content
+                            if "metadata" not in target_msg:
+                                target_msg["metadata"] = {}
+                            target_msg["metadata"]["is_trimmed"] = True
+
+                            # Calculate token reduction
+                            old_tokens = self._count_tokens(content)
+                            new_tokens = self._count_tokens(target_msg["content"])
+                            diff = old_tokens - new_tokens
+                            total_tokens -= diff
+
+                            if self.valves.show_debug_log and __event_call__:
+                                await self._log(
+                                    f"[Inlet] 📉 Structure-trimmed Assistant message. Saved: {diff} tokens.",
+                                    event_call=__event_call__,
+                                )
+                            continue
+
+                    # Strategy 2: Fallback - Drop Oldest Message Entirely (FIFO)
+                    # (User requested to remove progressive trimming for other cases)
+                    dropped = tail_messages.pop(0)
+                    dropped_tokens = self._count_tokens(str(dropped.get("content", "")))
+                    total_tokens -= dropped_tokens
+
+                    if self.valves.show_debug_log and __event_call__:
+                        await self._log(
+                            f"[Inlet] 🗑️ Dropped message from history to fit context. Role: {dropped.get('role')}, Tokens: {dropped_tokens}",
+                            event_call=__event_call__,
+                        )
+
+                # Re-assemble
+                candidate_messages = head_messages + [summary_msg] + tail_messages
+
+                await self._log(
+                    f"[Inlet] ✂️ History reduced. New total: {total_tokens} Tokens (Tail size: {len(tail_messages)})",
+                    event_call=__event_call__,
+                )
+
+            final_messages = candidate_messages
+
+            # Calculate detailed token stats for logging
+            system_tokens = (
+                self._count_tokens(system_prompt_msg.get("content", ""))
+                if system_prompt_msg
+                else 0
+            )
+            head_tokens = self._calculate_messages_tokens(head_messages)
+            summary_tokens = self._count_tokens(summary_content)
+            tail_tokens = self._calculate_messages_tokens(tail_messages)
+
+            system_info = (
+                f"System({system_tokens}t)" if system_prompt_msg else "System(0t)"
+            )
+
+            total_section_tokens = (
+                system_tokens + head_tokens + summary_tokens + tail_tokens
+            )
+
+            await self._log(
+                f"[Inlet] Applied summary: {system_info} + Head({len(head_messages)} msg, {head_tokens}t) + Summary({summary_tokens}t) + Tail({len(tail_messages)} msg, {tail_tokens}t) = Total({total_section_tokens}t)",
+                type="success",
+                event_call=__event_call__,
+            )
+
+            # Prepare status message (Context Usage format)
+            if max_context_tokens > 0:
+                usage_ratio = total_section_tokens / max_context_tokens
+                status_msg = f"Context Usage (Estimated): {total_section_tokens} / {max_context_tokens} Tokens ({usage_ratio*100:.1f}%)"
+                if usage_ratio > 0.9:
+                    status_msg += " | ⚠️ High Usage"
+            else:
+                status_msg = f"Loaded historical summary (Hidden {compressed_count} historical messages)"
+
             # Send status notification
             if __event_emitter__:
                 await __event_emitter__(
                     {
                         "type": "status",
                         "data": {
-                            "description": f"Loaded historical summary (Hidden {compressed_count} historical messages)",
+                            "description": status_msg,
                             "done": True,
                         },
                     }
                 )

-            await self._log(
-                f"[Inlet] Applied summary: Head({len(head_messages)}) + Summary + Tail({len(tail_messages)})",
-                type="success",
-                event_call=__event_call__,
-            )
-
             # Emit debug log to frontend (Keep the structured log as well)
             await self._emit_debug_log(
                 __event_call__,
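The structure-aware trimming in the hunk above keeps Markdown headers plus the first and last non-empty lines of a long assistant message and collapses the middle. A minimal standalone sketch of that heuristic (the function name `structure_trim` and the sample message are illustrative, not part of the filter):

```python
import re


def structure_trim(content: str) -> str:
    """Collapse a long Markdown message, keeping headers (H1-H6),
    the first non-empty line, and the last non-empty line."""
    lines = content.split("\n")
    kept = []
    first_found = False

    # Index of the last non-empty line
    last_idx = -1
    for idx in range(len(lines) - 1, -1, -1):
        if lines[idx].strip():
            last_idx = idx
            break

    for idx, line in enumerate(lines):
        stripped = line.strip()
        if not stripped:
            continue
        if re.match(r"^#{1,6}\s+", stripped):  # keep headers
            kept.append(line)
            continue
        if not first_found:  # keep first non-empty body line
            kept.append(line)
            first_found = True
            if idx < last_idx:  # mark the collapsed middle
                kept.append("\n... [Content collapsed] ...\n")
            continue
        if idx == last_idx:  # keep last non-empty line
            kept.append(line)

    return "\n".join(kept)


msg = "# Report\nIntro line\ndetail 1\ndetail 2\n## Section\nmore detail\nFinal line"
trimmed = structure_trim(msg)
print(trimmed)
```

Headers and the opening/closing lines survive, while `detail 1`, `detail 2`, and `more detail` are replaced by the placeholder, which is why the filter marks such messages with `is_trimmed` to avoid collapsing them twice.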
@@ -861,8 +1275,71 @@ class Filter:
             )
         else:
             # No summary, use original messages
+            # But still need to check budget!
             final_messages = messages

+            # Include system prompt in calculation
+            calc_messages = final_messages
+            if system_prompt_msg:
+                is_in_messages = any(m.get("role") == "system" for m in final_messages)
+                if not is_in_messages:
+                    calc_messages = [system_prompt_msg] + final_messages
+
+            # Get max context limit
+            model = self._clean_model_id(body.get("model"))
+            thresholds = self._get_model_thresholds(model)
+            max_context_tokens = thresholds.get(
+                "max_context_tokens", self.valves.max_context_tokens
+            )
+
+            total_tokens = await asyncio.to_thread(
+                self._calculate_messages_tokens, calc_messages
+            )
+
+            if total_tokens > max_context_tokens:
+                await self._log(
+                    f"[Inlet] ⚠️ Original messages ({total_tokens} Tokens) exceed limit ({max_context_tokens}). Reducing history...",
+                    type="warning",
+                    event_call=__event_call__,
+                )
+
+                # Dynamically remove messages from the start
+                # We'll respect effective_keep_first to protect system prompts
+                start_trim_index = effective_keep_first
+
+                while (
+                    total_tokens > max_context_tokens
+                    and len(final_messages)
+                    > start_trim_index + 1  # Keep at least 1 message after keep_first
+                ):
+                    dropped = final_messages.pop(start_trim_index)
+                    total_tokens -= self._count_tokens(str(dropped.get("content", "")))
+
+                await self._log(
+                    f"[Inlet] ✂️ Messages reduced. New total: {total_tokens} Tokens",
+                    event_call=__event_call__,
+                )
+
+            # Send status notification (Context Usage format)
+            if __event_emitter__:
+                status_msg = f"Context Usage (Estimated): {total_tokens} / {max_context_tokens} Tokens"
+                if max_context_tokens > 0:
+                    usage_ratio = total_tokens / max_context_tokens
+                    status_msg += f" ({usage_ratio*100:.1f}%)"
+                    if usage_ratio > 0.9:
+                        status_msg += " | ⚠️ High Usage"
+
+                await __event_emitter__(
+                    {
+                        "type": "status",
+                        "data": {
+                            "description": status_msg,
+                            "done": True,
+                        },
+                    }
+                )
+
         body["messages"] = final_messages

         await self._log(
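The no-summary branch above reduces an over-budget history by popping messages FIFO after the protected head until the estimate fits. A minimal sketch of that loop, using a naive whitespace tokenizer as a stand-in for the filter's `_count_tokens`:

```python
def count_tokens(text: str) -> int:
    # Naive stand-in tokenizer; the real filter uses a proper token counter
    return len(text.split())


def reduce_to_budget(messages: list, max_tokens: int, keep_first: int):
    """Drop messages FIFO starting at index keep_first until the total fits,
    always keeping at least one message after the protected head."""
    total = sum(count_tokens(str(m.get("content", ""))) for m in messages)
    while total > max_tokens and len(messages) > keep_first + 1:
        dropped = messages.pop(keep_first)
        total -= count_tokens(str(dropped.get("content", "")))
    return messages, total


msgs = [
    {"role": "system", "content": "be brief"},
    {"role": "user", "content": "one two three four"},
    {"role": "assistant", "content": "five six seven eight"},
    {"role": "user", "content": "latest question"},
]
reduced, total = reduce_to_budget(msgs, max_tokens=6, keep_first=1)
```

With `keep_first=1`, the system message is protected; the two middle messages are dropped until the total fits, leaving the system message and the latest user turn. This is the same protection the filter derives via `effective_keep_first`.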
@@ -1048,9 +1525,21 @@ class Filter:
             return

         middle_messages = messages[start_index:end_index]
+        tail_preview_msgs = messages[end_index:]
+
+        # Defaults so the log below is valid even when debug previews are disabled
+        middle_preview = []
+        tail_preview = []
+        if self.valves.show_debug_log and __event_call__:
+            middle_preview = [
+                f"{i + start_index}: [{m.get('role')}] {m.get('content', '')[:20]}..."
+                for i, m in enumerate(middle_messages[:3])
+            ]
+            tail_preview = [
+                f"{i + end_index}: [{m.get('role')}] {m.get('content', '')[:20]}..."
+                for i, m in enumerate(tail_preview_msgs)
+            ]
         await self._log(
-            f"[🤖 Async Summary Task] Middle messages to process: {len(middle_messages)}",
+            f"[🤖 Async Summary Task] 📊 Boundary Check:\n"
+            f" - Middle (Compressing): {len(middle_messages)} msgs (Indices {start_index}-{end_index-1}) -> Preview: {middle_preview}\n"
+            f" - Tail (Keeping): {len(tail_preview_msgs)} msgs (Indices {end_index}-End) -> Preview: {tail_preview}",
             event_call=__event_call__,
         )
@@ -1186,6 +1675,109 @@ class Filter:
                 event_call=__event_call__,
             )

+            # --- Token Usage Status Notification ---
+            if self.valves.show_token_usage_status and __event_emitter__:
+                try:
+                    # 1. Fetch System Prompt (DB fallback)
+                    system_prompt_msg = None
+                    model_id = body.get("model")
+                    if model_id:
+                        try:
+                            model_obj = Models.get_model_by_id(model_id)
+                            if model_obj and model_obj.params:
+                                params = model_obj.params
+                                if isinstance(params, str):
+                                    params = json.loads(params)
+                                if isinstance(params, dict):
+                                    sys_content = params.get("system")
+                                else:
+                                    sys_content = getattr(params, "system", None)
+
+                                if sys_content:
+                                    system_prompt_msg = {
+                                        "role": "system",
+                                        "content": sys_content,
+                                    }
+                        except Exception:
+                            pass  # Ignore DB errors here, best effort
+
+                    # 2. Calculate Effective Keep First
+                    last_system_index = -1
+                    for i, msg in enumerate(messages):
+                        if msg.get("role") == "system":
+                            last_system_index = i
+                    effective_keep_first = max(
+                        self.valves.keep_first, last_system_index + 1
+                    )
+
+                    # 3. Construct Next Context
+                    # Head
+                    head_msgs = (
+                        messages[:effective_keep_first]
+                        if effective_keep_first > 0
+                        else []
+                    )
+
+                    # Summary
+                    summary_content = (
+                        f"【System Prompt: The following is a summary of the historical conversation, provided for context only. Do not reply to the summary content itself; answer the subsequent latest questions directly.】\n\n"
+                        f"{new_summary}\n\n"
+                        f"---\n"
+                        f"Below is the recent conversation:"
+                    )
+                    summary_msg = {"role": "assistant", "content": summary_content}
+
+                    # Tail: target_compressed_count (passed into _generate_summary_async)
+                    # is the number of messages covered by the summary (excluding keep_last),
+                    # so the tail starts at max(target_compressed_count, effective_keep_first).
+                    start_index = max(target_compressed_count, effective_keep_first)
+                    tail_msgs = messages[start_index:]
+
+                    # Assemble
+                    next_context = head_msgs + [summary_msg] + tail_msgs
+
+                    # Inject system prompt if needed
+                    if system_prompt_msg:
+                        is_in_head = any(m.get("role") == "system" for m in head_msgs)
+                        if not is_in_head:
+                            next_context = [system_prompt_msg] + next_context
+
+                    # 4. Calculate Tokens
+                    token_count = self._calculate_messages_tokens(next_context)
+
+                    # 5. Get Thresholds & Calculate Ratio
+                    model = self._clean_model_id(body.get("model"))
+                    thresholds = self._get_model_thresholds(model)
+                    max_context_tokens = thresholds.get(
+                        "max_context_tokens", self.valves.max_context_tokens
+                    )
+
+                    # 6. Emit Status
+                    status_msg = f"Context Summary Updated: {token_count} / {max_context_tokens} Tokens"
+                    if max_context_tokens > 0:
+                        ratio = (token_count / max_context_tokens) * 100
+                        status_msg += f" ({ratio:.1f}%)"
+                        if ratio > 90.0:
+                            status_msg += " | ⚠️ High Usage"
+
+                    await __event_emitter__(
+                        {
+                            "type": "status",
+                            "data": {
+                                "description": status_msg,
+                                "done": True,
+                            },
+                        }
+                    )
+                except Exception as e:
+                    await self._log(
+                        f"[Status] Error calculating tokens: {e}",
+                        type="error",
+                        event_call=__event_call__,
+                    )
+
         except Exception as e:
             await self._log(
                 f"[🤖 Async Summary Task] ❌ Error: {str(e)}",
@@ -5,10 +5,20 @@ author: Fu-Jie
 author_url: https://github.com/Fu-Jie/awesome-openwebui
 funding_url: https://github.com/open-webui
 description: Reduces token consumption in long conversations through intelligent summarization and message compression while maintaining conversational coherence.
-version: 1.1.3
+version: 1.2.0
 openwebui_id: 5c0617cb-a9e4-4bd6-a440-d276534ebd18
 license: MIT

+═══════════════════════════════════════════════════════════════════════════════
+📌 Version 1.2.0 Updates
+═══════════════════════════════════════════════════════════════════════════════
+
+✅ Preflight Context Check: validates that the context fits before sending it to the model.
+✅ Structure-Aware Trimming: collapses overly long AI responses while preserving headers (H1-H6), the opening line, and the closing line.
+✅ Native Tool Output Trimming: cleans redundant tool outputs from the context when using function calling. (Note: non-native tool-call outputs are not fully injected into the context.)
+✅ Context Usage Warning: sends a notification when usage exceeds 90%.
+✅ Detailed Token Logging: fine-grained logging of System, Head, Summary, and Tail token consumption.
+
 ═══════════════════════════════════════════════════════════════════════════════
 📌 Feature Overview
 ═══════════════════════════════════════════════════════════════════════════════
@@ -248,9 +258,11 @@ import asyncio
 import json
 import hashlib
 import time
+import re

 # Open WebUI built-in imports
 from open_webui.utils.chat import generate_chat_completion
+from open_webui.models.models import Models
 from open_webui.models.users import Users
 from fastapi.requests import Request
 from open_webui.main import app as webui_app
@@ -353,6 +365,13 @@ class Filter:
         show_debug_log: bool = Field(
             default=False, description="Print debug logs in the browser console (F12)"
         )
+        show_token_usage_status: bool = Field(
+            default=True, description="Show a token-usage status notification when the conversation turn ends"
+        )
+        enable_tool_output_trimming: bool = Field(
+            default=False,
+            description="Enable native tool output trimming (native function calling only); trims overly long tool outputs to save tokens.",
+        )

     def _save_summary(self, chat_id: str, summary: str, compressed_count: int):
         """Save the summary to the database"""
@@ -614,12 +633,217 @@ class Filter:
     ) -> dict:
         """
         Executed before sending to the LLM
-        Compression strategy: only injects the existing summary; no token calculation
+        Compression strategy:
+        1. Inject the existing summary
+        2. Preflight-check the token budget
+        3. If over the limit, apply structure-aware trimming or drop old messages
         """
         messages = body.get("messages", [])

+        # --- Native Tool Output Trimming ---
+        # Even if compression is disabled, always check for and trim overly long tool outputs to save tokens
+        if self.valves.enable_tool_output_trimming:
+            trimmed_count = 0
+            for msg in messages:
+                content = msg.get("content", "")
+                if not isinstance(content, str):
+                    continue
+
+                role = msg.get("role")
+
+                # Only process assistant messages that carry native tool outputs
+                if role == "assistant":
+                    # Detect tool-output markers in the assistant content
+                    if "tool_call_id:" in content or (
+                        content.startswith('"') and '\\"' in content
+                    ):
+                        if self.valves.show_debug_log and __event_call__:
+                            await self._log(
+                                f"[Inlet] 🔍 Detected native tool output in assistant message.",
+                                event_call=__event_call__,
+                            )
+
+                        # Extract the final answer (after the last tool-call metadata)
+                        # Pattern: match escaped JSON strings like \"\"\"...\"\"\" followed by a newline
+                        # We look for the last occurrence of the pattern and take everything after it
+
+                        # 1. Try the specific OpenWebUI tool-output format: """..."""
+                        tool_output_pattern = r'""".*?"""\s*'
+
+                        # Find all matches
+                        matches = list(
+                            re.finditer(tool_output_pattern, content, re.DOTALL)
+                        )
+
+                        if matches:
+                            # Get the end position of the last match
+                            last_match_end = matches[-1].end()
+
+                            # Everything after the last tool output is the final answer
+                            final_answer = content[last_match_end:].strip()
+
+                            if final_answer:
+                                msg["content"] = (
+                                    f"... [Tool outputs trimmed]\n{final_answer}"
+                                )
+                                trimmed_count += 1
+                        else:
+                            # Fallback: if the new format is not found, try splitting on "Arguments:"
+                            # (kept for backward compatibility and differing model behavior)
+                            parts = re.split(r"(?:Arguments:\s*\{[^}]+\})\n+", content)
+                            if len(parts) > 1:
+                                final_answer = parts[-1].strip()
+                                if final_answer:
+                                    msg["content"] = (
+                                        f"... [Tool outputs trimmed]\n{final_answer}"
+                                    )
+                                    trimmed_count += 1
+
+            if trimmed_count > 0 and self.valves.show_debug_log and __event_call__:
+                await self._log(
+                    f"[Inlet] ✂️ Trimmed {trimmed_count} tool-output messages.",
+                    event_call=__event_call__,
+                )
+
         chat_ctx = self._get_chat_context(body, __metadata__)
         chat_id = chat_ctx["chat_id"]

+        # Extract the system prompt for accurate token calculation
+        # 1. For custom models: check the database (Models.get_model_by_id)
+        # 2. For base models: check role='system' in messages
+        system_prompt_content = None
+
+        # Try fetching from the database (custom models)
+        try:
+            model_id = body.get("model")
+            if model_id:
+                if self.valves.show_debug_log and __event_call__:
+                    await self._log(
+                        f"[Inlet] 🔍 Looking up model in database: {model_id}",
+                        event_call=__event_call__,
+                    )
+
+                # Clean the model ID
+                model_obj = Models.get_model_by_id(model_id)
+
+                if model_obj:
+                    if self.valves.show_debug_log and __event_call__:
+                        await self._log(
+                            f"[Inlet] ✅ Model found in database: {model_obj.name} (ID: {model_obj.id})",
+                            event_call=__event_call__,
+                        )
+
+                    if model_obj.params:
+                        try:
+                            params = model_obj.params
+                            # Handle the case where params is a JSON string
+                            if isinstance(params, str):
+                                params = json.loads(params)
+
+                            # Handle dicts or Pydantic objects
+                            if isinstance(params, dict):
+                                system_prompt_content = params.get("system")
+                            else:
+                                # Assume a Pydantic model or object
+                                system_prompt_content = getattr(params, "system", None)
+
+                            if system_prompt_content:
+                                if self.valves.show_debug_log and __event_call__:
+                                    await self._log(
+                                        f"[Inlet] 📝 Found system prompt in database params ({len(system_prompt_content)} chars)",
+                                        event_call=__event_call__,
+                                    )
+                            else:
+                                if self.valves.show_debug_log and __event_call__:
+                                    await self._log(
+                                        f"[Inlet] ⚠️ 'system' key missing from model params",
+                                        event_call=__event_call__,
+                                    )
+                        except Exception as e:
+                            if self.valves.show_debug_log and __event_call__:
+                                await self._log(
+                                    f"[Inlet] ❌ Failed to parse model params: {e}",
+                                    type="error",
+                                    event_call=__event_call__,
+                                )
+                    else:
+                        if self.valves.show_debug_log and __event_call__:
+                            await self._log(
+                                f"[Inlet] ⚠️ Model params are empty",
+                                event_call=__event_call__,
+                            )
+                else:
+                    if self.valves.show_debug_log and __event_call__:
+                        await self._log(
+                            f"[Inlet] ❌ Model not found in database",
+                            type="warning",
+                            event_call=__event_call__,
+                        )
+
+        except Exception as e:
+            if self.valves.show_debug_log and __event_call__:
+                await self._log(
+                    f"[Inlet] ❌ Error fetching system prompt from database: {e}",
+                    type="error",
+                    event_call=__event_call__,
+                )
+            if self.valves.debug_mode:
+                print(f"[Inlet] Error fetching system prompt from database: {e}")
+
+        # Fallback: check the message list (base models, or already included)
+        if not system_prompt_content:
+            for msg in messages:
+                if msg.get("role") == "system":
+                    system_prompt_content = msg.get("content", "")
+                    break
+
+        # Build system_prompt_msg for token calculation
+        system_prompt_msg = None
+        if system_prompt_content:
+            system_prompt_msg = {"role": "system", "content": system_prompt_content}
+            if self.valves.debug_mode:
+                print(
+                    f"[Inlet] Found system prompt ({len(system_prompt_content)} chars). Including in budget."
+                )
+
+        # Log message statistics (moved here to include the extracted system prompt)
+        if self.valves.show_debug_log and __event_call__:
+            try:
+                msg_stats = {
+                    "user": 0,
+                    "assistant": 0,
+                    "system": 0,
+                    "total": len(messages),
+                }
+                for msg in messages:
+                    role = msg.get("role", "unknown")
+                    if role in msg_stats:
+                        msg_stats[role] += 1
+
+                # If the system prompt was extracted from the DB/model but is not in messages, count it
+                if system_prompt_content:
+                    # Check whether it is already counted (i.e., was in messages)
+                    is_in_messages = any(m.get("role") == "system" for m in messages)
+                    if not is_in_messages:
+                        msg_stats["system"] += 1
+                        msg_stats["total"] += 1
+
+                stats_str = f"Total: {msg_stats['total']} | User: {msg_stats['user']} | Assistant: {msg_stats['assistant']} | System: {msg_stats['system']}"
+                await self._log(
+                    f"[Inlet] Message Stats: {stats_str}", event_call=__event_call__
+                )
+            except Exception as e:
+                print(f"[Inlet] Error logging message stats: {e}")
+
+        if not chat_id:
+            await self._log(
+                "[Inlet] ❌ Missing chat_id in metadata, skipping compression",
+                type="error",
+                event_call=__event_call__,
+            )
+            return body
+
         if self.valves.debug_mode or self.valves.show_debug_log:
             await self._log(
                 f"\n{'='*60}\n[Inlet] Chat ID: {chat_id}\n[Inlet] Received {len(messages)} messages",
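The tool-output trimming at the top of this hunk keeps only the final answer that follows the last `"""..."""` block. A standalone sketch under the assumption that tool outputs appear as triple-quoted blocks in the assistant content (the sample `raw` string is illustrative, not a captured real payload):

```python
import re


def trim_tool_outputs(content: str) -> str:
    """Keep only the final answer that follows the last triple-quoted
    tool-output block; prepend a marker so the trim is visible."""
    matches = list(re.finditer(r'""".*?"""\s*', content, re.DOTALL))
    if not matches:
        return content  # no tool-output block found, leave untouched
    final_answer = content[matches[-1].end():].strip()
    if not final_answer:
        return content  # nothing after the tool output, keep as is
    return f"... [Tool outputs trimmed]\n{final_answer}"


raw = 'tool_call_id: abc\n"""{"temp": 21, "unit": "C"}"""\nIt is 21 C today.'
cleaned = trim_tool_outputs(raw)
print(cleaned)
```

`re.DOTALL` lets `.*?` span multi-line tool outputs, and taking the last match rather than the first protects answers that themselves contain intermediate tool calls.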
@@ -630,10 +854,6 @@ class Filter:
         # Target is to compress up to the (total - keep_last) message
         target_compressed_count = max(0, len(messages) - self.valves.keep_last)

-        # Record the target compression progress for the original messages, for use in outlet
-        # Target is to compress up to the (total - keep_last) message
-        target_compressed_count = max(0, len(messages) - self.valves.keep_last)
-
         await self._log(
             f"[Inlet] Recorded target compression progress: {target_compressed_count}",
             event_call=__event_call__,
@@ -642,6 +862,14 @@ class Filter:
         # Load summary record
         summary_record = await asyncio.to_thread(self._load_summary_record, chat_id)

+        # Calculate effective_keep_first to ensure all system messages are protected
+        last_system_index = -1
+        for i, msg in enumerate(messages):
+            if msg.get("role") == "system":
+                last_system_index = i
+
+        effective_keep_first = max(self.valves.keep_first, last_system_index + 1)
+
         final_messages = []

         if summary_record:
@@ -655,8 +883,8 @@ class Filter:

             # 1. Head messages (Keep First)
             head_messages = []
-            if self.valves.keep_first > 0:
-                head_messages = messages[: self.valves.keep_first]
+            if effective_keep_first > 0:
+                head_messages = messages[:effective_keep_first]

             # 2. Summary message (inserted as a User message)
             summary_content = (
@@ -669,29 +897,214 @@ class Filter:

             # 3. Tail messages - everything since the last compression point
             # Note: must not include the head messages twice
-            start_index = max(compressed_count, self.valves.keep_first)
+            start_index = max(compressed_count, effective_keep_first)
             tail_messages = messages[start_index:]

-            final_messages = head_messages + [summary_msg] + tail_messages
+            if self.valves.show_debug_log and __event_call__:
+                tail_preview = [
+                    f"{i + start_index}: [{m.get('role')}] {m.get('content', '')[:30]}..."
+                    for i, m in enumerate(tail_messages)
+                ]
+                await self._log(
+                    f"[Inlet] 📜 Tail messages (start index: {start_index}): {tail_preview}",
+                    event_call=__event_call__,
+                )
+
+            # --- Preflight Check & Budgeting ---
+
+            # Assemble the candidate messages (for output)
+            candidate_messages = head_messages + [summary_msg] + tail_messages
+
+            # Prepare messages for token counting (include the system prompt if missing)
+            calc_messages = candidate_messages
+            if system_prompt_msg:
+                # Check whether the system prompt is already in head_messages
+                is_in_head = any(m.get("role") == "system" for m in head_messages)
+                if not is_in_head:
+                    calc_messages = [system_prompt_msg] + candidate_messages
+
+            # Fetch the maximum context limit
+            model = self._clean_model_id(body.get("model"))
+            thresholds = self._get_model_thresholds(model)
+            max_context_tokens = thresholds.get(
+                "max_context_tokens", self.valves.max_context_tokens
+            )
+
+            # Count total tokens
+            total_tokens = await asyncio.to_thread(
+                self._calculate_messages_tokens, calc_messages
+            )
+
+            # Preflight check log
+            await self._log(
+                f"[Inlet] 🔎 Preflight check: {total_tokens}t / {max_context_tokens}t ({(total_tokens/max_context_tokens*100):.1f}%)",
+                event_call=__event_call__,
+            )
+
+            # If over budget, shrink the history (Keep Last)
+            if total_tokens > max_context_tokens:
+                await self._log(
+                    f"[Inlet] ⚠️ Candidate prompt ({total_tokens} tokens) exceeds the limit ({max_context_tokens}). Shrinking history...",
+                    type="warning",
+                    event_call=__event_call__,
+                )
+
+                # Dynamically remove messages from the start of tail_messages
+                # Always try to keep at least the last message (usually the user input)
+                while total_tokens > max_context_tokens and len(tail_messages) > 1:
+                    # Strategy 1: Structure-Aware Assistant Trimming
+                    # Keep: headings (#), first line, last line. Collapse the rest.
+                    target_msg = None
+                    target_idx = -1
+
+                    # Find the oldest long assistant message not yet trimmed
+                    for i, msg in enumerate(tail_messages):
+                        # Skip the last message (usually the user input; protect it)
+                        if i == len(tail_messages) - 1:
+                            break
+
+                        if msg.get("role") == "assistant":
+                            content = str(msg.get("content", ""))
+                            is_trimmed = msg.get("metadata", {}).get(
+                                "is_trimmed", False
+                            )
+                            # Only target fairly long (> 200 chars) messages
+                            if len(content) > 200 and not is_trimmed:
+                                target_msg = msg
+                                target_idx = i
+                                break
+
+                    # If a suitable assistant message was found, apply structural trimming
+                    if target_msg:
+                        content = str(target_msg.get("content", ""))
+                        lines = content.split("\n")
+                        kept_lines = []
+
+                        # Logic: keep headings, the first non-empty line, the last non-empty line
+                        first_line_found = False
+                        last_line_idx = -1
+
+                        # Find the index of the last non-empty line
+                        for idx in range(len(lines) - 1, -1, -1):
+                            if lines[idx].strip():
+                                last_line_idx = idx
+                                break
+
+                        for idx, line in enumerate(lines):
+                            stripped = line.strip()
+                            if not stripped:
+                                continue
+
+                            # Keep headings (H1-H6; requires a space after #)
+                            if re.match(r"^#{1,6}\s+", stripped):
+                                kept_lines.append(line)
+                                continue
+
+                            # Keep the first non-empty line
+                            if not first_line_found:
+                                kept_lines.append(line)
+                                first_line_found = True
+                                # If more content follows, add a placeholder
+                                if idx < last_line_idx:
+                                    kept_lines.append("\n... [Content collapsed] ...\n")
+                                continue
+
+                            # Keep the last non-empty line
+                            if idx == last_line_idx:
+                                kept_lines.append(line)
+                                continue
+
+                        # Update the message content
+                        new_content = "\n".join(kept_lines)
+
+                        # Safety check: if trimming saves little (e.g. mostly headings), force a drop instead
+                        if len(new_content) > len(content) * 0.8:
+                            # Structure preservation is still too verbose; fall back to dropping
+                            pass
+                        else:
+                            target_msg["content"] = new_content
+                            if "metadata" not in target_msg:
+                                target_msg["metadata"] = {}
+                            target_msg["metadata"]["is_trimmed"] = True
+
+                            # Compute the token savings
+                            old_tokens = self._count_tokens(content)
+                            new_tokens = self._count_tokens(target_msg["content"])
+                            diff = old_tokens - new_tokens
+                            total_tokens -= diff
+
+                            if self.valves.show_debug_log and __event_call__:
+                                await self._log(
+                                    f"[Inlet] 📉 Structurally trimmed an assistant message. Saved: {diff} tokens.",
+                                    event_call=__event_call__,
+                                )
+                            continue
+
+                    # Strategy 2: fallback - drop the oldest message entirely (FIFO)
+                    dropped = tail_messages.pop(0)
+                    dropped_tokens = self._count_tokens(str(dropped.get("content", "")))
+                    total_tokens -= dropped_tokens
+
+                    if self.valves.show_debug_log and __event_call__:
+                        await self._log(
+                            f"[Inlet] 🗑️ Dropped a message from history to fit the context. Role: {dropped.get('role')}, tokens: {dropped_tokens}",
+                            event_call=__event_call__,
+                        )
+
+                # Reassemble
+                candidate_messages = head_messages + [summary_msg] + tail_messages
+
+                await self._log(
+                    f"[Inlet] ✂️ History shrunk. New total: {total_tokens} tokens (tail size: {len(tail_messages)})",
+                    event_call=__event_call__,
+                )
+
+            final_messages = candidate_messages
+
+            # Compute a detailed token breakdown for logging
+            system_tokens = (
+                self._count_tokens(system_prompt_msg.get("content", ""))
+                if system_prompt_msg
+                else 0
+            )
+            head_tokens = self._calculate_messages_tokens(head_messages)
+            summary_tokens = self._count_tokens(summary_content)
+            tail_tokens = self._calculate_messages_tokens(tail_messages)
+
+            system_info = (
+                f"System({system_tokens}t)" if system_prompt_msg else "System(0t)"
+            )
+
+            total_section_tokens = (
+                system_tokens + head_tokens + summary_tokens + tail_tokens
+            )
+
+            await self._log(
+                f"[Inlet] Applied summary: {system_info} + Head({len(head_messages)} msgs, {head_tokens}t) + Summary({summary_tokens}t) + Tail({len(tail_messages)} msgs, {tail_tokens}t) = Total({total_section_tokens}t)",
+                type="success",
+                event_call=__event_call__,
+            )
+
+            # Prepare the status message (context-usage format)
+            if max_context_tokens > 0:
+                usage_ratio = total_section_tokens / max_context_tokens
+                status_msg = f"Context usage (estimated): {total_section_tokens} / {max_context_tokens} tokens ({usage_ratio*100:.1f}%)"
+                if usage_ratio > 0.9:
+                    status_msg += " | ⚠️ High load"
+            else:
+                status_msg = f"Loaded history summary ({compressed_count} historical messages hidden)"
+
-            # Send status notification
             if __event_emitter__:
                 await __event_emitter__(
                     {
                         "type": "status",
                         "data": {
-                            "description": f"Loaded history summary ({compressed_count} historical messages hidden)",
+                            "description": status_msg,
                             "done": True,
                         },
                     }
                 )

-            await self._log(
-                f"[Inlet] Applied summary: Head({len(head_messages)}) + Summary + Tail({len(tail_messages)})",
-                type="success",
-                event_call=__event_call__,
-            )
-
             # Emit debug log to frontend (keep the structured log as well)
             await self._emit_debug_log(
                 __event_call__,
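Strategy 1 above is essentially a pure function over the message text. A simplified sketch, assuming the same heading regex and placeholder (the function name is illustrative, not the filter's API):

```python
import re


def trim_structurally(content: str) -> str:
    """Keep Markdown headings plus the first and last non-empty line;
    collapse everything in between into a placeholder, as in the hunk above."""
    lines = content.split("\n")
    kept = []
    first_found = False
    last_idx = -1
    # Find the last non-empty line
    for idx in range(len(lines) - 1, -1, -1):
        if lines[idx].strip():
            last_idx = idx
            break
    for idx, line in enumerate(lines):
        stripped = line.strip()
        if not stripped:
            continue
        # Headings (H1-H6) always survive
        if re.match(r"^#{1,6}\s+", stripped):
            kept.append(line)
            continue
        # First non-heading line survives; the middle collapses
        if not first_found:
            kept.append(line)
            first_found = True
            if idx < last_idx:
                kept.append("\n... [Content collapsed] ...\n")
            continue
        # Last non-empty line survives
        if idx == last_idx:
            kept.append(line)
    return "\n".join(kept)


text = "# Report\nintro line\nlots of detail\nmore detail\nclosing line"
print(trim_structurally(text))
```

The shape of the answer (heading, opening, closing) stays readable while the bulk disappears, which is why the filter prefers this over dropping the message outright.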
@@ -704,8 +1117,73 @@ class Filter:
             )
         else:
             # No summary; use the original messages
+            # But the budget still has to be checked!
             final_messages = messages

+            # Include the system prompt in the calculation
+            calc_messages = final_messages
+            if system_prompt_msg:
+                is_in_messages = any(m.get("role") == "system" for m in final_messages)
+                if not is_in_messages:
+                    calc_messages = [system_prompt_msg] + final_messages
+
+            # Fetch the maximum context limit
+            model = self._clean_model_id(body.get("model"))
+            thresholds = self._get_model_thresholds(model)
+            max_context_tokens = thresholds.get(
+                "max_context_tokens", self.valves.max_context_tokens
+            )
+
+            total_tokens = await asyncio.to_thread(
+                self._calculate_messages_tokens, calc_messages
+            )
+
+            if total_tokens > max_context_tokens:
+                await self._log(
+                    f"[Inlet] ⚠️ Original messages ({total_tokens} tokens) exceed the limit ({max_context_tokens}). Shrinking history...",
+                    type="warning",
+                    event_call=__event_call__,
+                )
+
+                # Dynamically remove messages from the start
+                # Honor effective_keep_first so the system prompt stays protected
+
+                start_trim_index = effective_keep_first
+
+                while (
+                    total_tokens > max_context_tokens
+                    and len(final_messages)
+                    > start_trim_index + 1  # keep at least 1 message after keep_first
+                ):
+                    dropped = final_messages.pop(start_trim_index)
+                    total_tokens -= self._count_tokens(str(dropped.get("content", "")))
+
+                await self._log(
+                    f"[Inlet] ✂️ Messages shrunk. New total: {total_tokens} tokens",
+                    event_call=__event_call__,
+                )
+
+            # Send a status notification (context-usage format)
+            if __event_emitter__:
+                status_msg = (
+                    f"Context usage (estimated): {total_tokens} / {max_context_tokens} tokens"
+                )
+                if max_context_tokens > 0:
+                    usage_ratio = total_tokens / max_context_tokens
+                    status_msg += f" ({usage_ratio*100:.1f}%)"
+                    if usage_ratio > 0.9:
+                        status_msg += " | ⚠️ High load"
+
+                await __event_emitter__(
+                    {
+                        "type": "status",
+                        "data": {
+                            "description": status_msg,
+                            "done": True,
+                        },
+                    }
+                )
+
         body["messages"] = final_messages

         await self._log(
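The drop loop in this branch can be sketched in isolation; a naive whitespace counter stands in for the filter's real `_count_tokens`, and the function name is hypothetical:

```python
def trim_to_budget(messages, max_tokens, keep_first):
    """Drop messages just after the protected head (FIFO) until the
    estimated total fits, always keeping at least one message after
    the head -- the same loop shape as the hunk above."""
    count = lambda m: len(str(m.get("content", "")).split())  # naive stand-in
    total = sum(count(m) for m in messages)
    while total > max_tokens and len(messages) > keep_first + 1:
        dropped = messages.pop(keep_first)
        total -= count(dropped)
    return messages, total


msgs = [
    {"role": "system", "content": "rules rules rules"},
    {"role": "user", "content": "old question with many many words here"},
    {"role": "assistant", "content": "old answer"},
    {"role": "user", "content": "new question"},
]
# keep_first=1 protects the system message; the oldest unprotected
# message is evicted first until the 8-token budget is met.
trimmed, total = trim_to_budget(msgs, max_tokens=8, keep_first=1)
```

Popping at `keep_first` rather than at index 0 is what keeps the system prompt (and any other protected head messages) out of the eviction path.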
@@ -882,9 +1360,21 @@ class Filter:
                 return

             middle_messages = messages[start_index:end_index]
+            tail_preview_msgs = messages[end_index:]
+
+            if self.valves.show_debug_log and __event_call__:
+                middle_preview = [
+                    f"{i + start_index}: [{m.get('role')}] {m.get('content', '')[:20]}..."
+                    for i, m in enumerate(middle_messages[:3])
+                ]
+                tail_preview = [
+                    f"{i + end_index}: [{m.get('role')}] {m.get('content', '')[:20]}..."
+                    for i, m in enumerate(tail_preview_msgs)
+                ]
             await self._log(
-                f"[🤖 Async summary task] Middle messages to process: {len(middle_messages)}",
+                f"[🤖 Async summary task] 📊 Boundary check:\n"
+                f" - Middle (to compress): {len(middle_messages)} msgs (indexes {start_index}-{end_index-1}) -> preview: {middle_preview}\n"
+                f" - Tail (kept): {len(tail_preview_msgs)} msgs (indexes {end_index}-End) -> preview: {tail_preview}",
                 event_call=__event_call__,
             )
@@ -1020,6 +1510,109 @@ class Filter:
                     event_call=__event_call__,
                 )
+
+                # --- Token usage status notification ---
+                if self.valves.show_token_usage_status and __event_emitter__:
+                    try:
+                        # 1. Fetch the system prompt (DB fallback)
+                        system_prompt_msg = None
+                        model_id = body.get("model")
+                        if model_id:
+                            try:
+                                model_obj = Models.get_model_by_id(model_id)
+                                if model_obj and model_obj.params:
+                                    params = model_obj.params
+                                    if isinstance(params, str):
+                                        params = json.loads(params)
+                                    if isinstance(params, dict):
+                                        sys_content = params.get("system")
+                                    else:
+                                        sys_content = getattr(params, "system", None)
+
+                                    if sys_content:
+                                        system_prompt_msg = {
+                                            "role": "system",
+                                            "content": sys_content,
+                                        }
+                            except Exception:
+                                pass  # Ignore DB errors; best effort
+
+                        # 2. Compute the effective keep-first
+                        last_system_index = -1
+                        for i, msg in enumerate(messages):
+                            if msg.get("role") == "system":
+                                last_system_index = i
+                        effective_keep_first = max(
+                            self.valves.keep_first, last_system_index + 1
+                        )
+
+                        # 3. Build the next context
+                        # Head
+                        head_msgs = (
+                            messages[:effective_keep_first]
+                            if effective_keep_first > 0
+                            else []
+                        )
+
+                        # Summary
+                        summary_content = (
+                            f"[System note: the following is a summary of the earlier conversation, for context only. Do not reply to the summary itself; answer the latest question directly.]\n\n"
+                            f"{new_summary}\n\n"
+                            f"---\n"
+                            f"The recent conversation follows:"
+                        )
+                        summary_msg = {"role": "assistant", "content": summary_content}
+
+                        # Tail (use target_compressed_count, the point we just compressed up to)
+                        # Note: target_compressed_count is the number of messages covered by the summary (excluding keep_last)
+                        # So the tail starts at max(target_compressed_count, effective_keep_first)
+                        start_index = max(target_compressed_count, effective_keep_first)
+                        tail_msgs = messages[start_index:]
+
+                        # Assemble
+                        next_context = head_msgs + [summary_msg] + tail_msgs
+
+                        # Inject the system prompt if needed
+                        if system_prompt_msg:
+                            is_in_head = any(m.get("role") == "system" for m in head_msgs)
+                            if not is_in_head:
+                                next_context = [system_prompt_msg] + next_context
+
+                        # 4. Count tokens
+                        token_count = self._calculate_messages_tokens(next_context)
+
+                        # 5. Fetch thresholds and compute the ratio
+                        model = self._clean_model_id(body.get("model"))
+                        thresholds = self._get_model_thresholds(model)
+                        max_context_tokens = thresholds.get(
+                            "max_context_tokens", self.valves.max_context_tokens
+                        )
+
+                        # 6. Send the status
+                        status_msg = (
+                            f"Context summary updated: {token_count} / {max_context_tokens} tokens"
+                        )
+                        if max_context_tokens > 0:
+                            ratio = (token_count / max_context_tokens) * 100
+                            status_msg += f" ({ratio:.1f}%)"
+                            if ratio > 90.0:
+                                status_msg += " | ⚠️ High load"
+
+                        await __event_emitter__(
+                            {
+                                "type": "status",
+                                "data": {
+                                    "description": status_msg,
+                                    "done": True,
+                                },
+                            }
+                        )
+                    except Exception as e:
+                        await self._log(
+                            f"[Status] Token calculation error: {e}",
+                            type="error",
+                            event_call=__event_call__,
+                        )
+
         except Exception as e:
             await self._log(
                 f"[🤖 Async summary task] ❌ Error: {str(e)}",
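The next-context assembly used for this status estimate reduces to a small helper (hypothetical name; the summary role and start-index choices mirror the hunk above):

```python
def build_next_context(messages, summary_text, keep_first, compressed_count):
    """Assemble head + summary + tail the way the status hunk above does.
    The summary is injected as an assistant message; the tail starts at
    whichever is larger: the compression point or the protected head."""
    head = messages[:keep_first] if keep_first > 0 else []
    summary_msg = {"role": "assistant", "content": summary_text}
    start = max(compressed_count, keep_first)
    return head + [summary_msg] + messages[start:]


msgs = [{"role": "user", "content": f"m{i}"} for i in range(6)]
# 4 messages were summarized, 1 head message is protected:
# head = [m0], then the summary, then the tail [m4, m5].
ctx = build_next_context(msgs, "summary of m0-m3", keep_first=1, compressed_count=4)
```

Taking `max(compressed_count, keep_first)` for the tail start is what prevents a message from appearing both in the protected head and again after the summary.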
@@ -1,6 +1,6 @@
 # Markdown Normalizer Filter

-**Author:** [Fu-Jie](https://github.com/Fu-Jie/awesome-openwebui) | **Version:** 1.2.2 | **Project:** [Awesome OpenWebUI](https://github.com/Fu-Jie/awesome-openwebui) | **License:** MIT
+**Author:** [Fu-Jie](https://github.com/Fu-Jie/awesome-openwebui) | **Version:** 1.2.4 | **Project:** [Awesome OpenWebUI](https://github.com/Fu-Jie/awesome-openwebui) | **License:** MIT

 A content normalizer filter for Open WebUI that fixes common Markdown formatting issues in LLM outputs. It ensures that code blocks, LaTeX formulas, Mermaid diagrams, and other Markdown elements are rendered correctly.
@@ -43,7 +43,7 @@ A content normalizer filter that fixes common Markdown formatting
 * `enable_heading_fix`: Fix missing space in headings.
 * `enable_table_fix`: Fix missing closing pipe in tables.
 * `enable_xml_tag_cleanup`: Cleanup leftover XML tags.
-* `enable_emphasis_spacing_fix`: Fix extra spaces in emphasis (default: True).
+* `enable_emphasis_spacing_fix`: Fix extra spaces in emphasis (default: False).
 * `show_status`: Show status notification when fixes are applied.
 * `show_debug_log`: Print debug logs to browser console.
@@ -53,6 +53,10 @@ A content normalizer filter that fixes common Markdown formatting

 ## Changelog

+### v1.2.4
+
+* **Documentation Updates**: Synchronized version numbers across all documentation and code files.
+
 ### v1.2.3

 * **List Marker Protection Enhancement**: Fixed a bug where list markers (`*`) followed by plain text and emphasis were having their spaces incorrectly stripped (e.g., `* U16 forward` became `*U16 forward`).
@@ -1,6 +1,6 @@
 # Markdown Normalizer Filter (Markdown 格式化过滤器)

-**Author:** [Fu-Jie](https://github.com/Fu-Jie/awesome-openwebui) | **Version:** 1.2.2 | **Project:** [Awesome OpenWebUI](https://github.com/Fu-Jie/awesome-openwebui) | **License:** MIT
+**Author:** [Fu-Jie](https://github.com/Fu-Jie/awesome-openwebui) | **Version:** 1.2.4 | **Project:** [Awesome OpenWebUI](https://github.com/Fu-Jie/awesome-openwebui) | **License:** MIT

 A content normalizer filter for Open WebUI that fixes common Markdown formatting issues in LLM outputs. It ensures that code blocks, LaTeX formulas, Mermaid diagrams, and other Markdown elements are rendered correctly.
@@ -53,6 +53,10 @@

 ## Changelog

+### v1.2.4
+
+* **Documentation Updates**: Synchronized version numbers across all documentation and code files.
+
 ### v1.2.3

 * **List Marker Protection Enhancement**: Fixed a bug where list markers (`*`) followed by plain text and emphasis had their spaces incorrectly stripped (e.g., `* U16 前锋` became `*U16 前锋`).
@@ -3,7 +3,7 @@ title: Markdown Normalizer
 author: Fu-Jie
 author_url: https://github.com/Fu-Jie/awesome-openwebui
 funding_url: https://github.com/open-webui
-version: 1.2.3
+version: 1.2.4
 openwebui_id: baaa8732-9348-40b7-8359-7e009660e23c
 description: A content normalizer filter that fixes common Markdown formatting issues in LLM outputs, such as broken code blocks, LaTeX formulas, and list formatting.
 """
@@ -43,7 +43,7 @@ class NormalizerConfig:
     )
     enable_table_fix: bool = True  # Fix missing closing pipe in tables
     enable_xml_tag_cleanup: bool = True  # Cleanup leftover XML tags
-    enable_emphasis_spacing_fix: bool = True  # Fix spaces inside **emphasis**
+    enable_emphasis_spacing_fix: bool = False  # Fix spaces inside **emphasis**

     # Custom cleaner functions (for advanced extension)
     custom_cleaners: List[Callable[[str], str]] = field(default_factory=list)
@@ -564,7 +564,7 @@ class Filter:
         default=True, description="Cleanup leftover XML tags"
     )
     enable_emphasis_spacing_fix: bool = Field(
-        default=True,
+        default=False,
         description="Fix spaces inside **emphasis** (e.g. ** text ** -> **text**)",
     )
     show_status: bool = Field(
@@ -695,6 +695,15 @@ class Filter:
             if self._contains_html(content):
                 return body

+            # Skip if the content contains tool output markers (native function calling)
+            # Pattern: """...""" or tool_call_id or <details type="tool_calls"...>
+            if (
+                '"""' in content
+                or "tool_call_id" in content
+                or '<details type="tool_calls"' in content
+            ):
+                return body
+
             # Configure normalizer based on valves
             config = NormalizerConfig(
                 enable_escape_fix=self.valves.enable_escape_fix,
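The added guard is a plain substring heuristic; pulled out, it looks like this (names are illustrative):

```python
# Markers that suggest the message carries native tool-call output,
# which the normalizer should leave untouched.
TOOL_MARKERS = ('"""', "tool_call_id", '<details type="tool_calls"')


def looks_like_tool_output(content: str) -> bool:
    """Heuristic from the hunk above: if any native function-calling
    marker appears, skip normalization entirely."""
    return any(marker in content for marker in TOOL_MARKERS)


print(looks_like_tool_output('<details type="tool_calls" done="true">'))  # → True
print(looks_like_tool_output("# A normal markdown reply"))  # → False
```

A substring check is deliberately coarse: a false positive only skips cosmetic fixes, while a false negative could corrupt structured tool output.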
@@ -3,7 +3,7 @@ title: Markdown 格式修复器 (Markdown Normalizer)
 author: Fu-Jie
 author_url: https://github.com/Fu-Jie/awesome-openwebui
 funding_url: https://github.com/open-webui
-version: 1.2.3
+version: 1.2.4
 description: A content normalizer filter that fixes common Markdown formatting issues in LLM outputs, such as broken code blocks, LaTeX formulas, Mermaid diagrams, and list formatting.
 """
@@ -24,6 +24,9 @@ class NormalizerConfig:
     """Configuration class for enabling/disabling specific normalization rules"""

     enable_escape_fix: bool = True  # Fix excessive escape characters
+    enable_escape_fix_in_code_blocks: bool = (
+        False  # Apply escape fixes inside code blocks (default: off, for safety)
+    )
     enable_thought_tag_fix: bool = True  # Normalize chain-of-thought tags
     enable_details_tag_fix: bool = True  # Normalize <details> tags (like thought tags)
     enable_code_block_fix: bool = True  # Fix code block formatting
@@ -35,7 +38,7 @@ class NormalizerConfig:
     enable_heading_fix: bool = True  # Fix missing space in headings (#Header -> # Header)
     enable_table_fix: bool = True  # Fix missing closing pipe in tables
     enable_xml_tag_cleanup: bool = True  # Cleanup leftover XML tags
-    enable_emphasis_spacing_fix: bool = True  # Fix extra spaces inside **emphasis**
+    enable_emphasis_spacing_fix: bool = False  # Fix extra spaces inside **emphasis**

     # Custom cleaner functions (for advanced extension)
     custom_cleaners: List[Callable[[str], str]] = field(default_factory=list)
@@ -239,12 +242,27 @@ class ContentNormalizer:
         return content

     def _fix_escape_characters(self, content: str) -> str:
-        """Fix excessive escape characters"""
+        """Fix excessive escape characters.
+
+        If enable_escape_fix_in_code_blocks is False (the default), this method
+        only fixes escape characters outside code blocks, to avoid breaking valid
+        code samples (e.g. JSON strings containing \\n, regex patterns, etc.).
+        """
+        if self.config.enable_escape_fix_in_code_blocks:
+            # Apply globally (original behavior)
             content = content.replace("\\r\\n", "\n")
             content = content.replace("\\n", "\n")
             content = content.replace("\\t", "\t")
             content = content.replace("\\\\", "\\")
             return content
+        else:
+            # Apply only outside code blocks (safe mode)
+            parts = content.split("```")
+            for i in range(0, len(parts), 2):  # Even indexes are Markdown text (not code)
+                parts[i] = parts[i].replace("\\r\\n", "\n")
+                parts[i] = parts[i].replace("\\n", "\n")
+                parts[i] = parts[i].replace("\\t", "\t")
+                parts[i] = parts[i].replace("\\\\", "\\")
+            return "```".join(parts)

     def _fix_thought_tags(self, content: str) -> str:
         """Normalize thought tags: unify naming and fix spacing"""
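Safe mode leans on a property of splitting Markdown by the triple-backtick fence: even indexes are prose, odd indexes are code-block bodies. A minimal sketch (the fence string is built indirectly so this example's own code fence stays intact):

```python
FENCE = "`" * 3  # the Markdown code fence, built indirectly on purpose


def fix_escapes_outside_code(content: str) -> str:
    """Replace literal \\n escapes with real newlines, but only in the
    prose segments between fences (even indexes after the split)."""
    parts = content.split(FENCE)
    for i in range(0, len(parts), 2):  # even = prose, odd = code body
        parts[i] = parts[i].replace("\\n", "\n")
    return FENCE.join(parts)


raw = (
    "prose\\nhere\n"
    + FENCE + "json\n{\"a\": \"1\\n2\"}\n" + FENCE
    + "\ntail\\nend"
)
fixed = fix_escapes_outside_code(raw)
# The JSON string keeps its literal \n; the prose around it is unescaped.
```

Note the even/odd invariant only holds when fences are balanced; an unterminated fence shifts the parity, which is one reason the full normalizer also repairs code-block structure.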
@@ -501,6 +519,10 @@ class Filter:
     enable_escape_fix: bool = Field(
         default=True, description="Fix excessive escape characters (\\n, \\t, etc.)"
     )
+    enable_escape_fix_in_code_blocks: bool = Field(
+        default=False,
+        description="Apply escape fixes inside code blocks (⚠️ Warning: may break valid code such as JSON strings or regex patterns. Default: off, for safety)",
+    )
     enable_thought_tag_fix: bool = Field(
         default=True, description="Normalize chain-of-thought tags (<think> -> <thought>)"
     )
@@ -539,7 +561,7 @@ class Filter:
         default=True, description="Cleanup leftover XML tags"
     )
     enable_emphasis_spacing_fix: bool = Field(
-        default=True,
+        default=False,
         description="Fix extra spaces in emphasis (e.g. ** text ** -> **text**)",
     )
     show_status: bool = Field(default=True, description="Show a status notification when fixes are applied")
@@ -682,13 +704,23 @@ class Filter:
             content = last.get("content", "") or ""

             if last.get("role") == "assistant" and isinstance(content, str):
                 # Skip if content looks like HTML to avoid breaking it
                 if self._contains_html(content):
                     return body

+                # Skip if the content contains tool output markers (native function calling)
+                # Pattern: """...""" or tool_call_id or <details type="tool_calls"...>
+                if (
+                    '"""' in content
+                    or "tool_call_id" in content
+                    or '<details type="tool_calls"' in content
+                ):
+                    return body
+
                 # Configure the normalizer based on valves
                 config = NormalizerConfig(
                     enable_escape_fix=self.valves.enable_escape_fix,
+                    enable_escape_fix_in_code_blocks=self.valves.enable_escape_fix_in_code_blocks,
                     enable_thought_tag_fix=self.valves.enable_thought_tag_fix,
                     enable_details_tag_fix=self.valves.enable_details_tag_fix,
                     enable_code_block_fix=self.valves.enable_code_block_fix,
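The valve-to-config wiring above follows a plain dataclass pattern; this reduced sketch keeps the field names from the diff but stands in for the real `Valves`/`NormalizerConfig` classes:

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class NormalizerConfig:
    # Reduced stand-in: only the fields touched by this commit.
    enable_escape_fix: bool = True
    enable_escape_fix_in_code_blocks: bool = False  # off by default for safety
    enable_emphasis_spacing_fix: bool = False       # default flipped in this release
    custom_cleaners: List[Callable[[str], str]] = field(default_factory=list)


class Valves:
    # A valve object (here a plain namespace) maps 1:1 onto the config.
    enable_escape_fix = True
    enable_escape_fix_in_code_blocks = False
    enable_emphasis_spacing_fix = False


valves = Valves()
config = NormalizerConfig(
    enable_escape_fix=valves.enable_escape_fix,
    enable_escape_fix_in_code_blocks=valves.enable_escape_fix_in_code_blocks,
    enable_emphasis_spacing_fix=valves.enable_emphasis_spacing_fix,
)
```

Keeping the valve and config field names identical makes each new option a two-line change: one `Field(...)` in the valves, one keyword argument in the constructor call.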