feat(async-context-compression): release v1.2.1 with smart config & optimizations

This release introduces significant improvements to configuration flexibility, performance, and stability. **Key Changes:** * **Smart Configuration:** * Added `summary_model_max_context` to allow independent context limits for the summary model (e.g., using `gemini-flash` with 1M context to summarize `gpt-4` history). * Implemented auto-detection of base model settings for custom models, ensuring correct threshold application. * **Performance & Refactoring:** * Optimized `model_thresholds` parsing with caching to reduce overhead. * Refactored `inlet` and `outlet` logic to remove redundant code and improve maintainability. * Replaced all `print` statements with proper `logging` calls for better production monitoring. * **Bug Fixes & Modernization:** * Fixed `datetime.utcnow()` deprecation warnings by switching to timezone-aware `datetime.now(timezone.utc)`. * Corrected type annotations and improved error handling for `JSONResponse` objects from LLM backends. * Removed hard truncation in summary generation to allow full context usage. **Files Updated:** * Plugin source code (English & Chinese) * Documentation and READMEs * Version bumped to 1.2.1
chore: update community stats - followers increased (136 -> 137)
2026-01-20 19:09:25 +08:00 · 2026-01-20 09:15:24 +00:00 · 2026-01-20 07:14:01 +00:00 · 2026-01-19 23:08:06 +00:00 · 2026-01-19 20:37:37 +08:00 · 2026-01-19 12:15:44 +00:00
15 changed files with 763 additions and 335 deletions
--- a/README.md
+++ b/README.md
@@ -10,28 +10,28 @@ A collection of enhancements, plugins, and prompts for [OpenWebUI](https://githu
 <!-- STATS_START -->
 ## 📊 Community Stats

-> 🕐 Auto-updated: 2026-01-19 18:11
+> 🕐 Auto-updated: 2026-01-20 17:15

 | 👤 Author | 👥 Followers | ⭐ Points | 🏆 Contributions |
 |:---:|:---:|:---:|:---:|
-| [Fu-Jie](https://openwebui.com/u/Fu-Jie) | **133** | **134** | **25** |
+| [Fu-Jie](https://openwebui.com/u/Fu-Jie) | **137** | **134** | **25** |

 | 📝 Posts | ⬇️ Downloads | 👁️ Views | 👍 Upvotes | 💾 Saves |
 |:---:|:---:|:---:|:---:|:---:|
-| **16** | **1792** | **21276** | **120** | **135** |
+| **16** | **1878** | **22027** | **120** | **147** |

 ### 🔥 Top 6 Popular Plugins

-> 🕐 Auto-updated: 2026-01-19 18:11
+> 🕐 Auto-updated: 2026-01-20 17:15

 | Rank | Plugin | Version | Downloads | Views | Updated |
 |:---:|------|:---:|:---:|:---:|:---:|
-| 🥇 | [Smart Mind Map](https://openwebui.com/posts/turn_any_text_into_beautiful_mind_maps_3094c59a) | 0.9.1 | 532 | 4822 | 2026-01-17 |
-| 🥈 | [📊 Smart Infographic (AntV)](https://openwebui.com/posts/smart_infographic_ad6f0c7f) | 1.4.9 | 260 | 2514 | 2026-01-18 |
-| 🥉 | [Export to Excel](https://openwebui.com/posts/export_mulit_table_to_excel_244b8f9d) | 0.3.7 | 209 | 800 | 2026-01-07 |
-| 4️⃣ | [Async Context Compression](https://openwebui.com/posts/async_context_compression_b1655bc8) | 1.1.3 | 180 | 1975 | 2026-01-17 |
-| 5️⃣ | [Export to Word (Enhanced)](https://openwebui.com/posts/export_to_word_enhanced_formatting_fca6a315) | 0.4.3 | 158 | 1377 | 2026-01-17 |
-| 6️⃣ | [Flash Card](https://openwebui.com/posts/flash_card_65a2ea8f) | 0.2.4 | 138 | 2329 | 2026-01-17 |
+| 🥇 | [Smart Mind Map](https://openwebui.com/posts/turn_any_text_into_beautiful_mind_maps_3094c59a) | 0.9.1 | 550 | 4933 | 2026-01-17 |
+| 🥈 | [📊 Smart Infographic (AntV)](https://openwebui.com/posts/smart_infographic_ad6f0c7f) | 1.4.9 | 281 | 2651 | 2026-01-18 |
+| 🥉 | [Export to Excel](https://openwebui.com/posts/export_mulit_table_to_excel_244b8f9d) | 0.3.7 | 213 | 835 | 2026-01-07 |
+| 4️⃣ | [Async Context Compression](https://openwebui.com/posts/async_context_compression_b1655bc8) | 1.2.0 | 189 | 2048 | 2026-01-19 |
+| 5️⃣ | [Export to Word (Enhanced)](https://openwebui.com/posts/export_to_word_enhanced_formatting_fca6a315) | 0.4.3 | 168 | 1449 | 2026-01-17 |
+| 6️⃣ | [Flash Card](https://openwebui.com/posts/flash_card_65a2ea8f) | 0.2.4 | 143 | 2386 | 2026-01-17 |

 *See full stats in [Community Stats Report](./docs/community-stats.md)*
 <!-- STATS_END -->
--- a/README_CN.md
+++ b/README_CN.md
@@ -7,28 +7,28 @@ OpenWebUI 增强功能集合。包含个人开发与收集的插件、提示词
 <!-- STATS_START -->
 ## 📊 社区统计

-> 🕐 自动更新于 2026-01-19 18:11
+> 🕐 自动更新于 2026-01-20 17:15

 | 👤 作者 | 👥 粉丝 | ⭐ 积分 | 🏆 贡献 |
 |:---:|:---:|:---:|:---:|
-| [Fu-Jie](https://openwebui.com/u/Fu-Jie) | **133** | **134** | **25** |
+| [Fu-Jie](https://openwebui.com/u/Fu-Jie) | **137** | **134** | **25** |

 | 📝 发布 | ⬇️ 下载 | 👁️ 浏览 | 👍 点赞 | 💾 收藏 |
 |:---:|:---:|:---:|:---:|:---:|
-| **16** | **1792** | **21276** | **120** | **135** |
+| **16** | **1878** | **22027** | **120** | **147** |

 ### 🔥 热门插件 Top 6

-> 🕐 自动更新于 2026-01-19 18:11
+> 🕐 自动更新于 2026-01-20 17:15

 | 排名 | 插件 | 版本 | 下载 | 浏览 | 更新日期 |
 |:---:|------|:---:|:---:|:---:|:---:|
-| 🥇 | [Smart Mind Map](https://openwebui.com/posts/turn_any_text_into_beautiful_mind_maps_3094c59a) | 0.9.1 | 532 | 4822 | 2026-01-17 |
-| 🥈 | [📊 Smart Infographic (AntV)](https://openwebui.com/posts/smart_infographic_ad6f0c7f) | 1.4.9 | 260 | 2514 | 2026-01-18 |
-| 🥉 | [Export to Excel](https://openwebui.com/posts/export_mulit_table_to_excel_244b8f9d) | 0.3.7 | 209 | 800 | 2026-01-07 |
-| 4️⃣ | [Async Context Compression](https://openwebui.com/posts/async_context_compression_b1655bc8) | 1.1.3 | 180 | 1975 | 2026-01-17 |
-| 5️⃣ | [Export to Word (Enhanced)](https://openwebui.com/posts/export_to_word_enhanced_formatting_fca6a315) | 0.4.3 | 158 | 1377 | 2026-01-17 |
-| 6️⃣ | [Flash Card](https://openwebui.com/posts/flash_card_65a2ea8f) | 0.2.4 | 138 | 2329 | 2026-01-17 |
+| 🥇 | [Smart Mind Map](https://openwebui.com/posts/turn_any_text_into_beautiful_mind_maps_3094c59a) | 0.9.1 | 550 | 4933 | 2026-01-17 |
+| 🥈 | [📊 Smart Infographic (AntV)](https://openwebui.com/posts/smart_infographic_ad6f0c7f) | 1.4.9 | 281 | 2651 | 2026-01-18 |
+| 🥉 | [Export to Excel](https://openwebui.com/posts/export_mulit_table_to_excel_244b8f9d) | 0.3.7 | 213 | 835 | 2026-01-07 |
+| 4️⃣ | [Async Context Compression](https://openwebui.com/posts/async_context_compression_b1655bc8) | 1.2.0 | 189 | 2048 | 2026-01-19 |
+| 5️⃣ | [Export to Word (Enhanced)](https://openwebui.com/posts/export_to_word_enhanced_formatting_fca6a315) | 0.4.3 | 168 | 1449 | 2026-01-17 |
+| 6️⃣ | [Flash Card](https://openwebui.com/posts/flash_card_65a2ea8f) | 0.2.4 | 143 | 2386 | 2026-01-17 |

 *完整统计请查看 [社区统计报告](./docs/community-stats.zh.md)*
 <!-- STATS_END -->
--- a/docs/badges/downloads.json
+++ b/docs/badges/downloads.json
@@ -1,7 +1,7 @@
 {
  "schemaVersion": 1,
  "label": "downloads",
-  "message": "1.8k",
+  "message": "1.9k",
  "color": "blue",
  "namedLogo": "openwebui"
 }
--- a/docs/badges/followers.json
+++ b/docs/badges/followers.json
@@ -1,6 +1,6 @@
 {
  "schemaVersion": 1,
  "label": "followers",
-  "message": "133",
+  "message": "137",
  "color": "blue"
 }
--- a/docs/community-stats.json
+++ b/docs/community-stats.json
@@ -1,13 +1,14 @@
 {
  "total_posts": 16,
-  "total_downloads": 1792,
-  "total_views": 21276,
+  "total_downloads": 1878,
+  "total_views": 22027,
  "total_upvotes": 120,
  "total_downvotes": 2,
-  "total_saves": 135,
+  "total_saves": 147,
  "total_comments": 24,
  "by_type": {
-    "action": 14,
+    "filter": 1,
+    "action": 13,
    "unknown": 2
  },
  "posts": [
@@ -18,10 +19,10 @@
      "version": "0.9.1",
      "author": "Fu-Jie",
      "description": "Intelligently analyzes text content and generates interactive mind maps to help users structure and visualize knowledge.",
-      "downloads": 532,
-      "views": 4822,
+      "downloads": 550,
+      "views": 4933,
      "upvotes": 15,
-      "saves": 28,
+      "saves": 30,
      "comments": 11,
      "created_at": "2025-12-30",
      "updated_at": "2026-01-17",
@@ -34,10 +35,10 @@
      "version": "1.4.9",
      "author": "Fu-Jie",
      "description": "AI-powered infographic generator based on AntV Infographic. Supports professional templates, auto-icon matching, and SVG/PNG downloads.",
-      "downloads": 260,
-      "views": 2514,
+      "downloads": 281,
+      "views": 2651,
      "upvotes": 14,
-      "saves": 20,
+      "saves": 21,
      "comments": 3,
      "created_at": "2025-12-28",
      "updated_at": "2026-01-18",
@@ -50,10 +51,10 @@
      "version": "0.3.7",
      "author": "Fu-Jie",
      "description": "Extracts tables from chat messages and exports them to Excel (.xlsx) files with smart formatting.",
-      "downloads": 209,
-      "views": 800,
+      "downloads": 213,
+      "views": 835,
      "upvotes": 4,
-      "saves": 5,
+      "saves": 6,
      "comments": 0,
      "created_at": "2025-05-30",
      "updated_at": "2026-01-07",
@@ -63,16 +64,16 @@
      "title": "Async Context Compression",
      "slug": "async_context_compression_b1655bc8",
      "type": "action",
-      "version": "1.1.3",
+      "version": "1.2.0",
      "author": "Fu-Jie",
      "description": "Reduces token consumption in long conversations while maintaining coherence through intelligent summarization and message compression.",
-      "downloads": 180,
-      "views": 1975,
+      "downloads": 189,
+      "views": 2048,
      "upvotes": 9,
-      "saves": 19,
+      "saves": 22,
      "comments": 0,
      "created_at": "2025-11-08",
-      "updated_at": "2026-01-17",
+      "updated_at": "2026-01-19",
      "url": "https://openwebui.com/posts/async_context_compression_b1655bc8"
    },
    {
@@ -82,10 +83,10 @@
      "version": "0.4.3",
      "author": "Fu-Jie",
      "description": "Export current conversation from Markdown to Word (.docx) with Mermaid diagrams rendered client-side (Mermaid.js, SVG+PNG), LaTeX math, real hyperlinks, improved tables, syntax highlighting, and blockquote support.",
-      "downloads": 158,
-      "views": 1377,
+      "downloads": 168,
+      "views": 1449,
      "upvotes": 8,
-      "saves": 16,
+      "saves": 17,
      "comments": 0,
      "created_at": "2026-01-03",
      "updated_at": "2026-01-17",
@@ -98,10 +99,10 @@
      "version": "0.2.4",
      "author": "Fu-Jie",
      "description": "Quickly generates beautiful flashcards from text, extracting key points and categories.",
-      "downloads": 138,
-      "views": 2329,
+      "downloads": 143,
+      "views": 2386,
      "upvotes": 10,
-      "saves": 10,
+      "saves": 12,
      "comments": 2,
      "created_at": "2025-12-30",
      "updated_at": "2026-01-17",
@@ -111,16 +112,16 @@
      "title": "Markdown Normalizer",
      "slug": "markdown_normalizer_baaa8732",
      "type": "action",
-      "version": "1.2.3",
+      "version": "1.2.4",
      "author": "Fu-Jie",
      "description": "A content normalizer filter that fixes common Markdown formatting issues in LLM outputs, such as broken code blocks, LaTeX formulas, and list formatting.",
-      "downloads": 84,
-      "views": 2100,
+      "downloads": 95,
+      "views": 2228,
      "upvotes": 10,
      "saves": 17,
      "comments": 5,
      "created_at": "2026-01-12",
-      "updated_at": "2026-01-17",
+      "updated_at": "2026-01-19",
      "url": "https://openwebui.com/posts/markdown_normalizer_baaa8732"
    },
    {
@@ -130,10 +131,10 @@
      "version": "1.0.0",
      "author": "Fu-Jie",
      "description": "A comprehensive thinking lens that dives deep into any content - from context to logic, insights, and action paths.",
-      "downloads": 68,
-      "views": 663,
+      "downloads": 71,
+      "views": 703,
      "upvotes": 4,
-      "saves": 6,
+      "saves": 7,
      "comments": 0,
      "created_at": "2026-01-08",
      "updated_at": "2026-01-08",
@@ -146,8 +147,8 @@
      "version": "0.4.3",
      "author": "Fu-Jie",
      "description": "将对话导出为 Word (.docx)，支持 Mermaid 图表 (客户端渲染 SVG+PNG)、LaTeX 数学公式、真实超链接、增强表格格式、代码高亮和引用块。",
-      "downloads": 63,
-      "views": 1305,
+      "downloads": 65,
+      "views": 1329,
      "upvotes": 11,
      "saves": 3,
      "comments": 1,
@@ -162,8 +163,8 @@
      "version": "1.4.9",
      "author": "Fu-Jie",
      "description": "基于 AntV Infographic 的智能信息图生成插件。支持多种专业模板，自动图标匹配，并提供 SVG/PNG 下载功能。",
-      "downloads": 42,
-      "views": 683,
+      "downloads": 43,
+      "views": 702,
      "upvotes": 6,
      "saves": 0,
      "comments": 0,
@@ -178,8 +179,8 @@
      "version": "0.9.1",
      "author": "Fu-Jie",
      "description": "智能分析文本内容,生成交互式思维导图,帮助用户结构化和可视化知识。",
-      "downloads": 22,
-      "views": 398,
+      "downloads": 24,
+      "views": 406,
      "upvotes": 3,
      "saves": 1,
      "comments": 0,
@@ -195,7 +196,7 @@
      "author": "Fu-Jie",
      "description": "快速将文本提炼为精美的学习记忆卡片，支持核心要点提取与分类。",
      "downloads": 16,
-      "views": 443,
+      "views": 452,
      "upvotes": 5,
      "saves": 1,
      "comments": 0,
@@ -206,17 +207,17 @@
    {
      "title": "异步上下文压缩",
      "slug": "异步上下文压缩_5c0617cb",
-      "type": "action",
-      "version": "1.1.3",
+      "type": "filter",
+      "version": "1.2.0",
      "author": "Fu-Jie",
      "description": "通过智能摘要和消息压缩，降低长对话的 token 消耗，同时保持对话连贯性。",
      "downloads": 14,
-      "views": 351,
+      "views": 374,
      "upvotes": 5,
      "saves": 1,
      "comments": 0,
      "created_at": "2025-11-08",
-      "updated_at": "2026-01-17",
+      "updated_at": "2026-01-19",
      "url": "https://openwebui.com/posts/异步上下文压缩_5c0617cb"
    },
    {
@@ -227,7 +228,7 @@
      "author": "Fu-Jie",
      "description": "全方位的思维透镜 —— 从背景全景到逻辑脉络，从深度洞察到行动路径。",
      "downloads": 6,
-      "views": 259,
+      "views": 261,
      "upvotes": 3,
      "saves": 1,
      "comments": 0,
@@ -243,7 +244,7 @@
      "author": "",
      "description": "",
      "downloads": 0,
-      "views": 59,
+      "views": 62,
      "upvotes": 1,
      "saves": 0,
      "comments": 0,
@@ -259,9 +260,9 @@
      "author": "",
      "description": "",
      "downloads": 0,
-      "views": 1198,
+      "views": 1208,
      "upvotes": 12,
-      "saves": 7,
+      "saves": 8,
      "comments": 2,
      "created_at": "2026-01-10",
      "updated_at": "2026-01-10",
@@ -273,7 +274,7 @@
    "name": "Fu-Jie",
    "profile_url": "https://openwebui.com/u/Fu-Jie",
    "profile_image": "https://community.s3.openwebui.com/uploads/users/b15d1348-4347-42b4-b815-e053342d6cb0/profile_d9510745-4bd4-4f8f-a997-4a21847d9300.webp",
-    "followers": 133,
+    "followers": 137,
    "following": 2,
    "total_points": 134,
    "post_points": 118,
--- a/docs/community-stats.md
+++ b/docs/community-stats.md
@@ -1,40 +1,41 @@
 # 📊 OpenWebUI Community Stats Report

-> 📅 Updated: 2026-01-19 18:11
+> 📅 Updated: 2026-01-20 17:15

 ## 📈 Overview

 | Metric | Value |
 |------|------|
 | 📝 Total Posts | 16 |
-| ⬇️ Total Downloads | 1792 |
-| 👁️ Total Views | 21276 |
+| ⬇️ Total Downloads | 1878 |
+| 👁️ Total Views | 22027 |
 | 👍 Total Upvotes | 120 |
-| 💾 Total Saves | 135 |
+| 💾 Total Saves | 147 |
 | 💬 Total Comments | 24 |

 ## 📂 By Type

- **action**: 14
+- **filter**: 1
+- **action**: 13
 - **unknown**: 2

 ## 📋 Posts List

 | Rank | Title | Type | Version | Downloads | Views | Upvotes | Saves | Updated |
 |:---:|------|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
-| 1 | [Smart Mind Map](https://openwebui.com/posts/turn_any_text_into_beautiful_mind_maps_3094c59a) | action | 0.9.1 | 532 | 4822 | 15 | 28 | 2026-01-17 |
-| 2 | [📊 Smart Infographic (AntV)](https://openwebui.com/posts/smart_infographic_ad6f0c7f) | action | 1.4.9 | 260 | 2514 | 14 | 20 | 2026-01-18 |
-| 3 | [Export to Excel](https://openwebui.com/posts/export_mulit_table_to_excel_244b8f9d) | action | 0.3.7 | 209 | 800 | 4 | 5 | 2026-01-07 |
-| 4 | [Async Context Compression](https://openwebui.com/posts/async_context_compression_b1655bc8) | action | 1.1.3 | 180 | 1975 | 9 | 19 | 2026-01-17 |
-| 5 | [Export to Word (Enhanced)](https://openwebui.com/posts/export_to_word_enhanced_formatting_fca6a315) | action | 0.4.3 | 158 | 1377 | 8 | 16 | 2026-01-17 |
-| 6 | [Flash Card](https://openwebui.com/posts/flash_card_65a2ea8f) | action | 0.2.4 | 138 | 2329 | 10 | 10 | 2026-01-17 |
-| 7 | [Markdown Normalizer](https://openwebui.com/posts/markdown_normalizer_baaa8732) | action | 1.2.3 | 84 | 2100 | 10 | 17 | 2026-01-17 |
-| 8 | [Deep Dive](https://openwebui.com/posts/deep_dive_c0b846e4) | action | 1.0.0 | 68 | 663 | 4 | 6 | 2026-01-08 |
-| 9 | [导出为 Word (增强版)](https://openwebui.com/posts/导出为_word_支持公式流程图表格和代码块_8a6306c0) | action | 0.4.3 | 63 | 1305 | 11 | 3 | 2026-01-17 |
-| 10 | [📊 智能信息图 (AntV Infographic)](https://openwebui.com/posts/智能信息图_e04a48ff) | action | 1.4.9 | 42 | 683 | 6 | 0 | 2026-01-17 |
-| 11 | [思维导图](https://openwebui.com/posts/智能生成交互式思维导图帮助用户可视化知识_8d4b097b) | action | 0.9.1 | 22 | 398 | 3 | 1 | 2026-01-17 |
-| 12 | [闪记卡 (Flash Card)](https://openwebui.com/posts/闪记卡生成插件_4a31eac3) | action | 0.2.4 | 16 | 443 | 5 | 1 | 2026-01-17 |
-| 13 | [异步上下文压缩](https://openwebui.com/posts/异步上下文压缩_5c0617cb) | action | 1.1.3 | 14 | 351 | 5 | 1 | 2026-01-17 |
-| 14 | [精读](https://openwebui.com/posts/精读_99830b0f) | action | 1.0.0 | 6 | 259 | 3 | 1 | 2026-01-08 |
-| 15 | [Review of Claude Haiku 4.5](https://openwebui.com/posts/review_of_claude_haiku_45_41b0db39) | unknown |  | 0 | 59 | 1 | 0 | 2026-01-14 |
-| 16 | [ 🛠️ Debug Open WebUI Plugins in Your Browser](https://openwebui.com/posts/debug_open_webui_plugins_in_your_browser_81bf7960) | unknown |  | 0 | 1198 | 12 | 7 | 2026-01-10 |
+| 1 | [Smart Mind Map](https://openwebui.com/posts/turn_any_text_into_beautiful_mind_maps_3094c59a) | action | 0.9.1 | 550 | 4933 | 15 | 30 | 2026-01-17 |
+| 2 | [📊 Smart Infographic (AntV)](https://openwebui.com/posts/smart_infographic_ad6f0c7f) | action | 1.4.9 | 281 | 2651 | 14 | 21 | 2026-01-18 |
+| 3 | [Export to Excel](https://openwebui.com/posts/export_mulit_table_to_excel_244b8f9d) | action | 0.3.7 | 213 | 835 | 4 | 6 | 2026-01-07 |
+| 4 | [Async Context Compression](https://openwebui.com/posts/async_context_compression_b1655bc8) | action | 1.2.0 | 189 | 2048 | 9 | 22 | 2026-01-19 |
+| 5 | [Export to Word (Enhanced)](https://openwebui.com/posts/export_to_word_enhanced_formatting_fca6a315) | action | 0.4.3 | 168 | 1449 | 8 | 17 | 2026-01-17 |
+| 6 | [Flash Card](https://openwebui.com/posts/flash_card_65a2ea8f) | action | 0.2.4 | 143 | 2386 | 10 | 12 | 2026-01-17 |
+| 7 | [Markdown Normalizer](https://openwebui.com/posts/markdown_normalizer_baaa8732) | action | 1.2.4 | 95 | 2228 | 10 | 17 | 2026-01-19 |
+| 8 | [Deep Dive](https://openwebui.com/posts/deep_dive_c0b846e4) | action | 1.0.0 | 71 | 703 | 4 | 7 | 2026-01-08 |
+| 9 | [导出为 Word (增强版)](https://openwebui.com/posts/导出为_word_支持公式流程图表格和代码块_8a6306c0) | action | 0.4.3 | 65 | 1329 | 11 | 3 | 2026-01-17 |
+| 10 | [📊 智能信息图 (AntV Infographic)](https://openwebui.com/posts/智能信息图_e04a48ff) | action | 1.4.9 | 43 | 702 | 6 | 0 | 2026-01-17 |
+| 11 | [思维导图](https://openwebui.com/posts/智能生成交互式思维导图帮助用户可视化知识_8d4b097b) | action | 0.9.1 | 24 | 406 | 3 | 1 | 2026-01-17 |
+| 12 | [闪记卡 (Flash Card)](https://openwebui.com/posts/闪记卡生成插件_4a31eac3) | action | 0.2.4 | 16 | 452 | 5 | 1 | 2026-01-17 |
+| 13 | [异步上下文压缩](https://openwebui.com/posts/异步上下文压缩_5c0617cb) | filter | 1.2.0 | 14 | 374 | 5 | 1 | 2026-01-19 |
+| 14 | [精读](https://openwebui.com/posts/精读_99830b0f) | action | 1.0.0 | 6 | 261 | 3 | 1 | 2026-01-08 |
+| 15 | [Review of Claude Haiku 4.5](https://openwebui.com/posts/review_of_claude_haiku_45_41b0db39) | unknown |  | 0 | 62 | 1 | 0 | 2026-01-14 |
+| 16 | [ 🛠️ Debug Open WebUI Plugins in Your Browser](https://openwebui.com/posts/debug_open_webui_plugins_in_your_browser_81bf7960) | unknown |  | 0 | 1208 | 12 | 8 | 2026-01-10 |
--- a/docs/community-stats.zh.md
+++ b/docs/community-stats.zh.md
@@ -1,40 +1,41 @@
 # 📊 OpenWebUI 社区统计报告

-> 📅 更新时间: 2026-01-19 18:11
+> 📅 更新时间: 2026-01-20 17:15

 ## 📈 总览

 | 指标 | 数值 |
 |------|------|
 | 📝 发布数量 | 16 |
-| ⬇️ 总下载量 | 1792 |
-| 👁️ 总浏览量 | 21276 |
+| ⬇️ 总下载量 | 1878 |
+| 👁️ 总浏览量 | 22027 |
 | 👍 总点赞数 | 120 |
-| 💾 总收藏数 | 135 |
+| 💾 总收藏数 | 147 |
 | 💬 总评论数 | 24 |

 ## 📂 按类型分类

- **action**: 14
+- **filter**: 1
+- **action**: 13
 - **unknown**: 2

 ## 📋 发布列表

 | 排名 | 标题 | 类型 | 版本 | 下载 | 浏览 | 点赞 | 收藏 | 更新日期 |
 |:---:|------|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
-| 1 | [Smart Mind Map](https://openwebui.com/posts/turn_any_text_into_beautiful_mind_maps_3094c59a) | action | 0.9.1 | 532 | 4822 | 15 | 28 | 2026-01-17 |
-| 2 | [📊 Smart Infographic (AntV)](https://openwebui.com/posts/smart_infographic_ad6f0c7f) | action | 1.4.9 | 260 | 2514 | 14 | 20 | 2026-01-18 |
-| 3 | [Export to Excel](https://openwebui.com/posts/export_mulit_table_to_excel_244b8f9d) | action | 0.3.7 | 209 | 800 | 4 | 5 | 2026-01-07 |
-| 4 | [Async Context Compression](https://openwebui.com/posts/async_context_compression_b1655bc8) | action | 1.1.3 | 180 | 1975 | 9 | 19 | 2026-01-17 |
-| 5 | [Export to Word (Enhanced)](https://openwebui.com/posts/export_to_word_enhanced_formatting_fca6a315) | action | 0.4.3 | 158 | 1377 | 8 | 16 | 2026-01-17 |
-| 6 | [Flash Card](https://openwebui.com/posts/flash_card_65a2ea8f) | action | 0.2.4 | 138 | 2329 | 10 | 10 | 2026-01-17 |
-| 7 | [Markdown Normalizer](https://openwebui.com/posts/markdown_normalizer_baaa8732) | action | 1.2.3 | 84 | 2100 | 10 | 17 | 2026-01-17 |
-| 8 | [Deep Dive](https://openwebui.com/posts/deep_dive_c0b846e4) | action | 1.0.0 | 68 | 663 | 4 | 6 | 2026-01-08 |
-| 9 | [导出为 Word (增强版)](https://openwebui.com/posts/导出为_word_支持公式流程图表格和代码块_8a6306c0) | action | 0.4.3 | 63 | 1305 | 11 | 3 | 2026-01-17 |
-| 10 | [📊 智能信息图 (AntV Infographic)](https://openwebui.com/posts/智能信息图_e04a48ff) | action | 1.4.9 | 42 | 683 | 6 | 0 | 2026-01-17 |
-| 11 | [思维导图](https://openwebui.com/posts/智能生成交互式思维导图帮助用户可视化知识_8d4b097b) | action | 0.9.1 | 22 | 398 | 3 | 1 | 2026-01-17 |
-| 12 | [闪记卡 (Flash Card)](https://openwebui.com/posts/闪记卡生成插件_4a31eac3) | action | 0.2.4 | 16 | 443 | 5 | 1 | 2026-01-17 |
-| 13 | [异步上下文压缩](https://openwebui.com/posts/异步上下文压缩_5c0617cb) | action | 1.1.3 | 14 | 351 | 5 | 1 | 2026-01-17 |
-| 14 | [精读](https://openwebui.com/posts/精读_99830b0f) | action | 1.0.0 | 6 | 259 | 3 | 1 | 2026-01-08 |
-| 15 | [Review of Claude Haiku 4.5](https://openwebui.com/posts/review_of_claude_haiku_45_41b0db39) | unknown |  | 0 | 59 | 1 | 0 | 2026-01-14 |
-| 16 | [ 🛠️ Debug Open WebUI Plugins in Your Browser](https://openwebui.com/posts/debug_open_webui_plugins_in_your_browser_81bf7960) | unknown |  | 0 | 1198 | 12 | 7 | 2026-01-10 |
+| 1 | [Smart Mind Map](https://openwebui.com/posts/turn_any_text_into_beautiful_mind_maps_3094c59a) | action | 0.9.1 | 550 | 4933 | 15 | 30 | 2026-01-17 |
+| 2 | [📊 Smart Infographic (AntV)](https://openwebui.com/posts/smart_infographic_ad6f0c7f) | action | 1.4.9 | 281 | 2651 | 14 | 21 | 2026-01-18 |
+| 3 | [Export to Excel](https://openwebui.com/posts/export_mulit_table_to_excel_244b8f9d) | action | 0.3.7 | 213 | 835 | 4 | 6 | 2026-01-07 |
+| 4 | [Async Context Compression](https://openwebui.com/posts/async_context_compression_b1655bc8) | action | 1.2.0 | 189 | 2048 | 9 | 22 | 2026-01-19 |
+| 5 | [Export to Word (Enhanced)](https://openwebui.com/posts/export_to_word_enhanced_formatting_fca6a315) | action | 0.4.3 | 168 | 1449 | 8 | 17 | 2026-01-17 |
+| 6 | [Flash Card](https://openwebui.com/posts/flash_card_65a2ea8f) | action | 0.2.4 | 143 | 2386 | 10 | 12 | 2026-01-17 |
+| 7 | [Markdown Normalizer](https://openwebui.com/posts/markdown_normalizer_baaa8732) | action | 1.2.4 | 95 | 2228 | 10 | 17 | 2026-01-19 |
+| 8 | [Deep Dive](https://openwebui.com/posts/deep_dive_c0b846e4) | action | 1.0.0 | 71 | 703 | 4 | 7 | 2026-01-08 |
+| 9 | [导出为 Word (增强版)](https://openwebui.com/posts/导出为_word_支持公式流程图表格和代码块_8a6306c0) | action | 0.4.3 | 65 | 1329 | 11 | 3 | 2026-01-17 |
+| 10 | [📊 智能信息图 (AntV Infographic)](https://openwebui.com/posts/智能信息图_e04a48ff) | action | 1.4.9 | 43 | 702 | 6 | 0 | 2026-01-17 |
+| 11 | [思维导图](https://openwebui.com/posts/智能生成交互式思维导图帮助用户可视化知识_8d4b097b) | action | 0.9.1 | 24 | 406 | 3 | 1 | 2026-01-17 |
+| 12 | [闪记卡 (Flash Card)](https://openwebui.com/posts/闪记卡生成插件_4a31eac3) | action | 0.2.4 | 16 | 452 | 5 | 1 | 2026-01-17 |
+| 13 | [异步上下文压缩](https://openwebui.com/posts/异步上下文压缩_5c0617cb) | filter | 1.2.0 | 14 | 374 | 5 | 1 | 2026-01-19 |
+| 14 | [精读](https://openwebui.com/posts/精读_99830b0f) | action | 1.0.0 | 6 | 261 | 3 | 1 | 2026-01-08 |
+| 15 | [Review of Claude Haiku 4.5](https://openwebui.com/posts/review_of_claude_haiku_45_41b0db39) | unknown |  | 0 | 62 | 1 | 0 | 2026-01-14 |
+| 16 | [ 🛠️ Debug Open WebUI Plugins in Your Browser](https://openwebui.com/posts/debug_open_webui_plugins_in_your_browser_81bf7960) | unknown |  | 0 | 1208 | 12 | 8 | 2026-01-10 |
--- a/docs/plugins/filters/async-context-compression.md
+++ b/docs/plugins/filters/async-context-compression.md
@@ -1,7 +1,7 @@
 # Async Context Compression

 <span class="category-badge filter">Filter</span>
-<span class="version-badge">v1.2.0</span>
+<span class="version-badge">v1.2.1</span>

 Reduces token consumption in long conversations through intelligent summarization while maintaining conversational coherence.

@@ -38,6 +38,8 @@ This is especially useful for:
 - :material-format-align-justify: **Structure-Aware Trimming**: Preserves document structure
 - :material-content-cut: **Native Tool Output Trimming**: Trims verbose tool outputs (Note: Non-native tool outputs are not fully injected into context)
 - :material-chart-bar: **Detailed Token Logging**: Granular token breakdown
+- :material-account-search: **Smart Model Matching**: Inherit config from base models
+- :material-image-off: **Multimodal Support**: Images are preserved but tokens are **NOT** calculated

 ---

@@ -73,6 +75,7 @@ graph TD
 | `keep_first` | integer | `1` | Always keep the first N messages |
 | `keep_last` | integer | `6` | Always keep the last N messages |
 | `summary_model` | string | `None` | Model to use for summarization |
+| `summary_model_max_context` | integer | `0` | Max context tokens for summary model |
 | `max_summary_tokens` | integer | `16384` | Maximum tokens for the summary |
 | `enable_tool_output_trimming` | boolean | `false` | Enable trimming of large tool outputs |

--- a/docs/plugins/filters/async-context-compression.zh.md
+++ b/docs/plugins/filters/async-context-compression.zh.md
@@ -1,7 +1,7 @@
 # Async Context Compression（异步上下文压缩）

 <span class="category-badge filter">Filter</span>
-<span class="version-badge">v1.2.0</span>
+<span class="version-badge">v1.2.1</span>

 通过智能摘要减少长对话的 token 消耗，同时保持对话连贯。

@@ -38,6 +38,8 @@ Async Context Compression 过滤器通过以下方式帮助管理长对话的 to
 - :material-format-align-justify: **结构感知裁剪**：保留文档结构的智能裁剪
 - :material-content-cut: **原生工具输出裁剪**：自动裁剪冗长的工具输出（注意：非原生工具调用输出不会完整注入上下文）
 - :material-chart-bar: **详细 Token 日志**：提供细粒度的 Token 统计
+- :material-account-search: **智能模型匹配**：自定义模型自动继承基础模型配置
+- :material-image-off: **多模态支持**：图片内容保留但 Token **不参与计算**

 ---

@@ -73,6 +75,7 @@ graph TD
 | `keep_first` | integer | `1` | 始终保留的前 N 条消息 |
 | `keep_last` | integer | `6` | 始终保留的后 N 条消息 |
 | `summary_model` | string | `None` | 用于摘要的模型 |
+| `summary_model_max_context` | integer | `0` | 摘要模型的最大上下文 Token 数 |
 | `max_summary_tokens` | integer | `16384` | 摘要的最大 token 数 |
 | `enable_tool_output_trimming` | boolean | `false` | 启用长工具输出裁剪 |

--- a/docs/plugins/filters/index.md
+++ b/docs/plugins/filters/index.md
@@ -22,7 +22,7 @@ Filters act as middleware in the message pipeline:

    Reduces token consumption in long conversations through intelligent summarization while maintaining coherence.

-    **Version:** 1.1.3
+    **Version:** 1.2.1

    [:octicons-arrow-right-24: Documentation](async-context-compression.md)

--- a/docs/plugins/filters/index.zh.md
+++ b/docs/plugins/filters/index.zh.md
@@ -22,7 +22,7 @@ Filter 充当消息管线中的中间件：

    通过智能总结减少长对话的 token 消耗，同时保持连贯性。

-    **版本：** 1.1.3
+    **版本：** 1.2.1

    [:octicons-arrow-right-24: 查看文档](async-context-compression.md)

--- a/plugins/filters/async-context-compression/README.md
+++ b/plugins/filters/async-context-compression/README.md
@@ -1,9 +1,15 @@
 # Async Context Compression Filter

-**Author:** [Fu-Jie](https://github.com/Fu-Jie/awesome-openwebui) | **Version:** 1.2.0 | **Project:** [Awesome OpenWebUI](https://github.com/Fu-Jie/awesome-openwebui) | **License:** MIT
+**Author:** [Fu-Jie](https://github.com/Fu-Jie/awesome-openwebui) | **Version:** 1.2.1 | **Project:** [Awesome OpenWebUI](https://github.com/Fu-Jie/awesome-openwebui) | **License:** MIT

 This filter reduces token consumption in long conversations through intelligent summarization and message compression while keeping conversations coherent.

+## What's new in 1.2.1
+
+- **Smart Configuration**: Automatically detects base model settings for custom models and adds `summary_model_max_context` for independent summary limits.
+- **Performance & Refactoring**: Optimized threshold parsing with caching, removed redundant code, and improved LLM response handling (JSONResponse support).
+- **Bug Fixes & Modernization**: Fixed `datetime` deprecation warnings, corrected type annotations, and replaced print statements with proper logging.
+
 ## What's new in 1.2.0

 - **Preflight Context Check**: Before sending to the model, validates that total tokens fit within the context window. Automatically trims or drops oldest messages if exceeded.
@@ -19,18 +25,6 @@ This filter reduces token consumption in long conversations through intelligent
 - **Enhanced Stability**: Fixed a race condition in state management that could cause "inlet state not found" warnings in high-concurrency scenarios.
 - **Bug Fixes**: Corrected default model handling to prevent misleading logs when no model is specified.

-## What's new in 1.1.2
-
- **Open WebUI v0.7.x Compatibility**: Resolved a critical database session binding error affecting Open WebUI v0.7.x users. The plugin now dynamically discovers the database engine and session context, ensuring compatibility across versions.
- **Enhanced Error Reporting**: Errors during background summary generation are now reported via both the status bar and browser console.
- **Robust Model Handling**: Improved handling of missing or invalid model IDs to prevent crashes.
-
-## What's new in 1.1.1
-
- **Frontend Debugging**: Added `show_debug_log` option to print debug info to the browser console (F12).
- **Optimized Compression**: Improved token calculation logic to prevent aggressive truncation of history, ensuring more context is retained.
-
-

 ---

@@ -45,6 +39,8 @@ This filter reduces token consumption in long conversations through intelligent
 - ✅ Native tool output trimming for cleaner context when using function calling.
 - ✅ Real-time context usage monitoring with warning notifications (>90%).
 - ✅ Detailed token logging for precise debugging and optimization.
+- ✅ **Smart Model Matching**: Automatically inherits configuration from base models for custom presets.
+- ⚠ **Multimodal Support**: Images are preserved but their tokens are **NOT** calculated. Please adjust thresholds accordingly.

 ---

@@ -75,7 +71,8 @@ It is recommended to keep this filter early in the chain so it runs before filte
 | `keep_first`                   | `1`      | Always keep the first N messages (protects system prompts).                                                                                                           |
 | `keep_last`                    | `6`      | Always keep the last N messages to preserve recent context.                                                                                                           |
 | `summary_model`                | `None`   | Model for summaries. Strongly recommended to set a fast, economical model (e.g., `gemini-2.5-flash`, `deepseek-v3`). Falls back to the current chat model when empty. |
-| `max_summary_tokens`           | `4000`   | Maximum tokens for the generated summary.                                                                                                                             |
+| `summary_model_max_context`    | `0`      | Max context tokens for the summary model. If 0, falls back to `model_thresholds` or global `max_context_tokens`.                                                      |
+| `max_summary_tokens`           | `16384`  | Maximum tokens for the generated summary.                                                                                                                             |
 | `summary_temperature`          | `0.3`    | Randomness for summary generation. Lower is more deterministic.                                                                                                       |
 | `model_thresholds`             | `{}`     | Per-model overrides for `compression_threshold_tokens` and `max_context_tokens` (useful for mixed models).                                                            |
 | `enable_tool_output_trimming`  | `false`  | When enabled and `function_calling: "native"` is active, trims verbose tool outputs to extract only the final answer.                                                 |
--- a/plugins/filters/async-context-compression/README_CN.md
+++ b/plugins/filters/async-context-compression/README_CN.md
@@ -1,11 +1,17 @@
 # 异步上下文压缩过滤器

-**作者:** [Fu-Jie](https://github.com/Fu-Jie/awesome-openwebui) | **版本:** 1.2.0 | **项目:** [Awesome OpenWebUI](https://github.com/Fu-Jie/awesome-openwebui) | **许可证:** MIT
+**作者:** [Fu-Jie](https://github.com/Fu-Jie/awesome-openwebui) | **版本:** 1.2.1 | **项目:** [Awesome OpenWebUI](https://github.com/Fu-Jie/awesome-openwebui) | **许可证:** MIT

 > **重要提示**：为了确保所有过滤器的可维护性和易用性，每个过滤器都应附带清晰、完整的文档，以确保其功能、配置和使用方法得到充分说明。

 本过滤器通过智能摘要和消息压缩技术，在保持对话连贯性的同时，显著降低长对话的 Token 消耗。

+## 1.2.1 版本更新
+
+- **智能配置增强**: 自动检测自定义模型的基础模型配置，并新增 `summary_model_max_context` 参数以独立控制摘要模型的上下文限制。
+- **性能优化与重构**: 重构了阈值解析逻辑并增加缓存，移除了冗余的处理代码，并增强了 LLM 响应处理（支持 JSONResponse）。
+- **稳定性改进**: 修复了 `datetime` 弃用警告，修正了类型注解，并将 print 语句替换为标准日志记录。
+
 ## 1.2.0 版本更新

 - **预检上下文检查 (Preflight Context Check)**: 在发送给模型之前，验证总 Token 是否符合上下文窗口。如果超出，自动裁剪或丢弃最旧的消息。
@@ -21,18 +27,6 @@
 - **稳定性增强**: 修复了状态管理中的竞态条件，解决了高并发场景下可能出现的“无法获取 inlet 状态”警告。
 - **Bug 修复**: 修正了默认模型处理逻辑，防止在未指定模型时产生误导性日志。

-## 1.1.2 版本更新
-
- **Open WebUI v0.7.x 兼容性**: 修复了影响 Open WebUI v0.7.x 用户的严重数据库会话绑定错误。插件现在动态发现数据库引擎和会话上下文，确保跨版本兼容性。
- **增强错误报告**: 后台摘要生成过程中的错误现在会通过状态栏和浏览器控制台同时报告。
- **健壮的模型处理**: 改进了对缺失或无效模型 ID 的处理，防止程序崩溃。
-
-## 1.1.1 版本更新
-
- **前端调试**: 新增 `show_debug_log` 选项，支持在浏览器控制台 (F12) 打印调试信息。
- **压缩优化**: 优化 Token 计算逻辑，防止历史记录被过度截断，保留更多上下文。
-
-

 ---

@@ -47,6 +41,8 @@
 - ✅ **原生工具输出裁剪**: 支持裁剪冗长的工具调用输出。
 - ✅ **实时监控**: 实时监控上下文使用情况，超过 90% 发出警告。
 - ✅ **详细日志**: 提供精确的 Token 统计日志，便于调试。
+- ✅ **智能模型匹配**: 自定义模型自动继承基础模型的阈值配置。
+- ⚠ **多模态支持**: 图片内容会被保留，但其 Token **不参与计算**。请相应调整阈值。

 详细的工作原理和流程请参考 [工作流程指南](WORKFLOW_GUIDE_CN.md)。

@@ -88,6 +84,7 @@
 | 参数                  | 默认值  | 描述                                                                                                                                        |
 | :-------------------- | :------ | :------------------------------------------------------------------------------------------------------------------------------------------ |
 | `summary_model`       | `None`  | 用于生成摘要的模型 ID。**强烈建议**配置快速、经济、上下文窗口大的模型（如 `gemini-2.5-flash`、`deepseek-v3`）。留空则尝试复用当前对话模型。 |
+| `summary_model_max_context` | `0`     | 摘要模型的最大上下文 Token 数。如果为 0，则回退到 `model_thresholds` 或全局 `max_context_tokens`。                                          |
 | `max_summary_tokens`  | `16384` | 生成摘要时允许的最大 Token 数。                                                                                                             |
 | `summary_temperature` | `0.1`   | 控制摘要生成的随机性，较低的值结果更稳定。                                                                                                  |

--- a/plugins/filters/async-context-compression/async_context_compression.py
+++ b/plugins/filters/async-context-compression/async_context_compression.py
@@ -5,19 +5,17 @@ author: Fu-Jie
 author_url: https://github.com/Fu-Jie/awesome-openwebui
 funding_url: https://github.com/open-webui
 description: Reduces token consumption in long conversations while maintaining coherence through intelligent summarization and message compression.
-version: 1.2.0
+version: 1.2.1
 openwebui_id: b1655bc8-6de9-4cad-8cb5-a6f7829a02ce
 license: MIT

 ═══════════════════════════════════════════════════════════════════════════════
-📌 What's new in 1.2.0
+📌 What's new in 1.2.1
 ═══════════════════════════════════════════════════════════════════════════════

-  ✅ Preflight Context Check: Validates context fit before sending to model.
-  ✅ Structure-Aware Trimming: Collapses long AI responses while keeping H1-H6, intro, and conclusion.
-  ✅ Native Tool Output Trimming: Cleaner context when using function calling. (Note: Non-native tool outputs are not fully injected into context)
-  ✅ Context Usage Warning: Notification when usage exceeds 90%.
-  ✅ Detailed Token Logging: Granular breakdown of System, Head, Summary, and Tail tokens.
+  ✅ Smart Configuration: Automatically detects base model settings for custom models and adds `summary_model_max_context` for independent summary limits.
+  ✅ Performance & Refactoring: Optimized threshold parsing with caching and removed redundant code for better efficiency.
+  ✅ Bug Fixes & Modernization: Fixed `datetime` deprecation warnings and corrected type annotations.

 ═══════════════════════════════════════════════════════════════════════════════
 📌 Overview
@@ -229,6 +227,8 @@ Statistics:
   ✓ This filter supports multimodal messages containing images.
   ✓ The summary is generated only from the text content.
   ✓ Non-text parts (like images) are preserved in their original messages during compression.
+   ⚠ Image tokens are NOT calculated. Different models have vastly different image token costs
+     (GPT-4o: 85-1105, Claude: ~1300, Gemini: ~258 per image). Plan your thresholds accordingly.

 ═══════════════════════════════════════════════════════════════════════════════
 🐛 Troubleshooting
@@ -259,7 +259,7 @@ Solution:

 """

-from pydantic import BaseModel, Field, model_validator
+from pydantic import BaseModel, Field
 from typing import Optional, Dict, Any, List, Union, Callable, Awaitable
 import re
 import asyncio
@@ -267,6 +267,10 @@ import json
 import hashlib
 import time
 import contextlib
+import logging
+
+# Setup logger
+logger = logging.getLogger(__name__)

 # Open WebUI built-in imports
 from open_webui.utils.chat import generate_chat_completion
@@ -291,7 +295,7 @@ except ImportError:
 from sqlalchemy import Column, String, Text, DateTime, Integer, inspect
 from sqlalchemy.orm import declarative_base, sessionmaker
 from sqlalchemy.engine import Engine
-from datetime import datetime
+from datetime import datetime, timezone


 def _discover_owui_engine(db_module: Any) -> Optional[Engine]:
@@ -312,7 +316,7 @@ def _discover_owui_engine(db_module: Any) -> Optional[Engine]:
                        session, "engine", None
                    )
        except Exception as exc:
-            print(f"[DB Discover] get_db_context failed: {exc}")
+            logger.error(f"[DB Discover] get_db_context failed: {exc}")

    for attr in ("engine", "ENGINE", "bind", "BIND"):
        candidate = getattr(db_module, attr, None)
@@ -334,7 +338,7 @@ def _discover_owui_schema(db_module: Any) -> Optional[str]:
        if isinstance(candidate, str) and candidate.strip():
            return candidate.strip()
    except Exception as exc:
-        print(f"[DB Discover] Base metadata schema lookup failed: {exc}")
+        logger.error(f"[DB Discover] Base metadata schema lookup failed: {exc}")

    try:
        metadata_obj = getattr(db_module, "metadata_obj", None)
@@ -344,7 +348,7 @@ def _discover_owui_schema(db_module: Any) -> Optional[str]:
        if isinstance(candidate, str) and candidate.strip():
            return candidate.strip()
    except Exception as exc:
-        print(f"[DB Discover] metadata_obj schema lookup failed: {exc}")
+        logger.error(f"[DB Discover] metadata_obj schema lookup failed: {exc}")

    try:
        from open_webui import env as owui_env
@@ -353,7 +357,7 @@ def _discover_owui_schema(db_module: Any) -> Optional[str]:
        if isinstance(candidate, str) and candidate.strip():
            return candidate.strip()
    except Exception as exc:
-        print(f"[DB Discover] env schema lookup failed: {exc}")
+        logger.error(f"[DB Discover] env schema lookup failed: {exc}")

    return None

@@ -379,8 +383,21 @@ class ChatSummary(owui_Base):
    chat_id = Column(String(255), unique=True, nullable=False, index=True)
    summary = Column(Text, nullable=False)
    compressed_message_count = Column(Integer, default=0)
-    created_at = Column(DateTime, default=datetime.utcnow)
-    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
+    created_at = Column(DateTime, default=lambda: datetime.now(timezone.utc))
+    updated_at = Column(
+        DateTime,
+        default=lambda: datetime.now(timezone.utc),
+        onupdate=lambda: datetime.now(timezone.utc),
+    )
+
+
+# Global cache for tiktoken encoding
+TIKTOKEN_ENCODING = None
+if tiktoken:
+    try:
+        TIKTOKEN_ENCODING = tiktoken.get_encoding("o200k_base")
+    except Exception as e:
+        logger.error(f"[Init] Failed to load tiktoken encoding: {e}")


 class Filter:
@@ -391,8 +408,48 @@ class Filter:
        self._fallback_session_factory = (
            sessionmaker(bind=self._db_engine) if self._db_engine else None
        )
+        self._model_thresholds_cache: Optional[Dict[str, Any]] = None
        self._init_database()

+    def _parse_model_thresholds(self) -> Dict[str, Any]:
+        """Parse model_thresholds string into a dictionary.
+
+        Format: model_id:compression_threshold:max_context, model_id2:threshold2:max2
+        Example: gpt-4:8000:32000, claude-3:100000:200000
+
+        Returns cached result if already parsed.
+        """
+        if self._model_thresholds_cache is not None:
+            return self._model_thresholds_cache
+
+        self._model_thresholds_cache = {}
+        raw_config = self.valves.model_thresholds
+        if not raw_config:
+            return self._model_thresholds_cache
+
+        for entry in raw_config.split(","):
+            entry = entry.strip()
+            if not entry:
+                continue
+
+            parts = entry.split(":")
+            if len(parts) != 3:
+                continue
+
+            try:
+                model_id = parts[0].strip()
+                compression_threshold = int(parts[1].strip())
+                max_context = int(parts[2].strip())
+
+                self._model_thresholds_cache[model_id] = {
+                    "compression_threshold_tokens": compression_threshold,
+                    "max_context_tokens": max_context,
+                }
+            except ValueError:
+                continue
+
+        return self._model_thresholds_cache
+
    @contextlib.contextmanager
    def _db_session(self):
        """Yield a database session using Open WebUI helpers with graceful fallbacks."""
@@ -435,7 +492,7 @@ class Filter:
            try:
                session.close()
            except Exception as exc:  # pragma: no cover - best-effort cleanup
-                print(f"[Database] ⚠️ Failed to close fallback session: {exc}")
+                logger.warning(f"[Database] ⚠️ Failed to close fallback session: {exc}")

    def _init_database(self):
        """Initializes the database table using Open WebUI's shared connection."""
@@ -447,19 +504,26 @@ class Filter:

            # Check if table exists using SQLAlchemy inspect
            inspector = inspect(self._db_engine)
-            if not inspector.has_table("chat_summary"):
+            # Support schema if configured
+            has_table = (
+                inspector.has_table("chat_summary", schema=owui_schema)
+                if owui_schema
+                else inspector.has_table("chat_summary")
+            )
+
+            if not has_table:
                # Create the chat_summary table if it doesn't exist
                ChatSummary.__table__.create(bind=self._db_engine, checkfirst=True)
-                print(
+                logger.info(
                    "[Database] ✅ Successfully created chat_summary table using Open WebUI's shared database connection."
                )
            else:
-                print(
+                logger.info(
                    "[Database] ✅ Using Open WebUI's shared database connection. chat_summary table already exists."
                )

        except Exception as e:
-            print(f"[Database] ❌ Initialization failed: {str(e)}")
+            logger.error(f"[Database] ❌ Initialization failed: {str(e)}")

    class Valves(BaseModel):
        priority: int = Field(
@@ -476,9 +540,9 @@ class Filter:
            ge=0,
            description="Hard limit for context. Exceeding this value will force removal of earliest messages (Global Default)",
        )
-        model_thresholds: dict = Field(
-            default={},
-            description="Threshold override configuration for specific models. Only includes models requiring special configuration.",
+        model_thresholds: str = Field(
+            default="",
+            description="Per-model threshold overrides. Format: model_id:compression_threshold:max_context (comma-separated). Example: gpt-4:8000:32000, claude-3:100000:200000",
        )

        keep_first: int = Field(
@@ -489,10 +553,15 @@ class Filter:
        keep_last: int = Field(
            default=6, ge=0, description="Always keep the last N full messages."
        )
-        summary_model: str = Field(
+        summary_model: Optional[str] = Field(
            default=None,
            description="The model ID used to generate the summary. If empty, uses the current conversation's model. Used to match configurations in model_thresholds.",
        )
+        summary_model_max_context: int = Field(
+            default=0,
+            ge=0,
+            description="Max context tokens for the summary model. If 0, falls back to model_thresholds or global max_context_tokens. Example: gemini-flash=1000000, gpt-4o-mini=128000.",
+        )
        max_summary_tokens: int = Field(
            default=16384,
            ge=1,
@@ -529,7 +598,7 @@ class Filter:
                    # [Optimization] Optimistic lock check: update only if progress moves forward
                    if compressed_count <= existing.compressed_message_count:
                        if self.valves.debug_mode:
-                            print(
+                            logger.info(
                                f"[Storage] Skipping update: New progress ({compressed_count}) is not greater than existing progress ({existing.compressed_message_count})"
                            )
                        return
@@ -537,7 +606,7 @@ class Filter:
                    # Update existing record
                    existing.summary = summary
                    existing.compressed_message_count = compressed_count
-                    existing.updated_at = datetime.utcnow()
+                    existing.updated_at = datetime.now(timezone.utc)
                else:
                    # Create new record
                    new_summary = ChatSummary(
@@ -551,12 +620,12 @@ class Filter:

                if self.valves.debug_mode:
                    action = "Updated" if existing else "Created"
-                    print(
+                    logger.info(
                        f"[Storage] Summary has been {action.lower()} in the database (Chat ID: {chat_id})"
                    )

        except Exception as e:
-            print(f"[Storage] ❌ Database save failed: {str(e)}")
+            logger.error(f"[Storage] ❌ Database save failed: {str(e)}")

    def _load_summary_record(self, chat_id: str) -> Optional[ChatSummary]:
        """Loads the summary record object from the database."""
@@ -568,7 +637,7 @@ class Filter:
                    session.expunge(record)
                    return record
        except Exception as e:
-            print(f"[Load] ❌ Database read failed: {str(e)}")
+            logger.error(f"[Load] ❌ Database read failed: {str(e)}")
        return None

    def _load_summary(self, chat_id: str, body: dict) -> Optional[str]:
@@ -576,8 +645,8 @@ class Filter:
        record = self._load_summary_record(chat_id)
        if record:
            if self.valves.debug_mode:
-                print(f"[Load] Loaded summary from database (Chat ID: {chat_id})")
-                print(
+                logger.info(f"[Load] Loaded summary from database (Chat ID: {chat_id})")
+                logger.info(
                    f"[Load] Last updated: {record.updated_at}, Compressed message count: {record.compressed_message_count}"
                )
            return record.summary
@@ -588,14 +657,12 @@ class Filter:
        if not text:
            return 0

-        if tiktoken:
+        if TIKTOKEN_ENCODING:
            try:
-                # Uniformly use o200k_base encoding (adapted for latest models)
-                encoding = tiktoken.get_encoding("o200k_base")
-                return len(encoding.encode(text))
+                return len(TIKTOKEN_ENCODING.encode(text))
            except Exception as e:
                if self.valves.debug_mode:
-                    print(
+                    logger.warning(
                        f"[Token Count] tiktoken error: {e}, falling back to character estimation"
                    )

@@ -604,6 +671,7 @@ class Filter:

    def _calculate_messages_tokens(self, messages: List[Dict]) -> int:
        """Calculates the total tokens for a list of messages."""
+        start_time = time.time()
        total_tokens = 0
        for msg in messages:
            content = msg.get("content", "")
@@ -616,6 +684,13 @@ class Filter:
                total_tokens += self._count_tokens(text_content)
            else:
                total_tokens += self._count_tokens(str(content))
+
+        duration = (time.time() - start_time) * 1000
+        if self.valves.debug_mode:
+            logger.info(
+                f"[Token Calc] Calculated {total_tokens} tokens for {len(messages)} messages in {duration:.2f}ms"
+            )
+
        return total_tokens

    def _get_model_thresholds(self, model_id: str) -> Dict[str, int]:
@@ -623,17 +698,48 @@ class Filter:

        Priority:
        1. If configuration exists for the model ID in model_thresholds, use it.
-        2. Otherwise, use global parameters compression_threshold_tokens and max_context_tokens.
+        2. If model is a custom model, try to match its base_model_id.
+        3. Otherwise, use global parameters compression_threshold_tokens and max_context_tokens.
        """
-        # Try to match from model-specific configuration
-        if model_id in self.valves.model_thresholds:
-            if self.valves.debug_mode:
-                print(f"[Config] Using model-specific configuration: {model_id}")
-            return self.valves.model_thresholds[model_id]
+        parsed = self._parse_model_thresholds()

-        # Use global default configuration
+        # 1. Direct match with model_id
+        if model_id in parsed:
+            if self.valves.debug_mode:
+                logger.info(f"[Config] Using model-specific configuration: {model_id}")
+            return parsed[model_id]
+
+        # 2. Try to find base_model_id for custom models
+        try:
+            model_obj = Models.get_model_by_id(model_id)
+            if model_obj:
+                # Check for base_model_id (custom model)
+                base_model_id = getattr(model_obj, "base_model_id", None)
+                if not base_model_id:
+                    # Try base_model_ids (array) - take first one
+                    base_model_ids = getattr(model_obj, "base_model_ids", None)
+                    if (
+                        base_model_ids
+                        and isinstance(base_model_ids, list)
+                        and len(base_model_ids) > 0
+                    ):
+                        base_model_id = base_model_ids[0]
+
+                if base_model_id and base_model_id in parsed:
+                    if self.valves.debug_mode:
+                        logger.info(
+                            f"[Config] Custom model '{model_id}' -> base_model '{base_model_id}': using base model configuration"
+                        )
+                    return parsed[base_model_id]
+        except Exception as e:
+            if self.valves.debug_mode:
+                logger.warning(
+                    f"[Config] Failed to lookup base_model for '{model_id}': {e}"
+                )
+
+        # 3. Use global default configuration
        if self.valves.debug_mode:
-            print(
+            logger.info(
                f"[Config] Model {model_id} not in model_thresholds, using global parameters"
            )

@@ -731,13 +837,13 @@ class Filter:
                }
            )
        except Exception as e:
-            print(f"Error emitting debug log: {e}")
+            logger.error(f"Error emitting debug log: {e}")

    async def _log(self, message: str, type: str = "info", event_call=None):
        """Unified logging to both backend (print) and frontend (console.log)"""
        # Backend logging
        if self.valves.debug_mode:
-            print(message)
+            logger.info(message)

        # Frontend logging
        if self.valves.show_debug_log and event_call:
@@ -770,9 +876,17 @@ class Filter:
                js_code = f"""
                    console.log("%c[Compression] {safe_message}", "{css}");
                """
-                await event_call({"type": "execute", "data": {"code": js_code}})
+                # Add timeout to prevent blocking if frontend connection is broken
+                await asyncio.wait_for(
+                    event_call({"type": "execute", "data": {"code": js_code}}),
+                    timeout=2.0,
+                )
+            except asyncio.TimeoutError:
+                logger.warning(
+                    f"Failed to emit log to frontend: Timeout (connection may be broken)"
+                )
            except Exception as e:
-                print(f"Failed to emit log to frontend: {e}")
+                logger.error(f"Failed to emit log to frontend: {type(e).__name__}: {e}")

    async def inlet(
        self,
@@ -819,42 +933,57 @@ class Filter:
                                event_call=__event_call__,
                            )

-                        # Extract the final answer (after last tool call metadata)
-                        # Pattern: Matches escaped JSON strings like ""&quot;...&quot;"" followed by newlines
-                        # We look for the last occurrence of such a pattern and take everything after it
-
-                        # 1. Try matching the specific OpenWebUI tool output format: ""&quot;...&quot;""
-                        # This regex finds the last end-quote of a tool output block
-                        tool_output_pattern = r'""&quot;.*?&quot;""\s*'
-
-                        # Find all matches
-                        matches = list(
-                            re.finditer(tool_output_pattern, content, re.DOTALL)
+                        # Strategy 1: Tool Output / Code Block Trimming
+                        # Detect if message contains large tool outputs or code blocks
+                        # Improved regex to be less brittle
+                        is_tool_output = (
+                            "&quot;" in content
+                            or "Arguments:" in content
+                            or "```" in content
+                            or "<tool_code>" in content
                        )

-                        if matches:
-                            # Get the end position of the last match
-                            last_match_end = matches[-1].end()
+                        if is_tool_output:
+                            # Regex to find the last occurrence of a tool output block or code block
+                            # This pattern looks for:
+                            # 1. OpenWebUI's escaped JSON format: ""&quot;...&quot;""
+                            # 2. "Arguments: {...}" pattern
+                            # 3. Generic code blocks: ```...```
+                            # 4. <tool_code>...</tool_code>
+                            # It captures the content *after* the last such block.
+                            tool_output_pattern = r'(?:""&quot;.*?&quot;""|Arguments:\s*\{[^}]+\}|```.*?```|<tool_code>.*?</tool_code>)\s*'

-                            # Everything after the last tool output is the final answer
-                            final_answer = content[last_match_end:].strip()
+                            # Find all matches
+                            matches = list(
+                                re.finditer(tool_output_pattern, content, re.DOTALL)
+                            )
+
+                            if matches:
+                                # Get the end position of the last match
+                                last_match_end = matches[-1].end()
+
+                                # Everything after the last tool output is the final answer
+                                final_answer = content[last_match_end:].strip()

-                            if final_answer:
-                                msg["content"] = (
-                                    f"... [Tool outputs trimmed]\n{final_answer}"
-                                )
-                                trimmed_count += 1
-                        else:
-                            # Fallback: Try splitting on "Arguments:" if the new format isn't found
-                            # (Preserving backward compatibility or different model behaviors)
-                            parts = re.split(r"(?:Arguments:\s*\{[^}]+\})\n+", content)
-                            if len(parts) > 1:
-                                final_answer = parts[-1].strip()
                                if final_answer:
                                    msg["content"] = (
                                        f"... [Tool outputs trimmed]\n{final_answer}"
                                    )
                                    trimmed_count += 1
+                            else:
+                                # Fallback: If no specific pattern matched, but it was identified as tool output,
+                                # try a simpler split or just mark as trimmed if no final answer can be extracted.
+                                # (Preserving backward compatibility or different model behaviors)
+                                parts = re.split(
+                                    r"(?:Arguments:\s*\{[^}]+\})\n+", content
+                                )
+                                if len(parts) > 1:
+                                    final_answer = parts[-1].strip()
+                                    if final_answer:
+                                        msg["content"] = (
+                                            f"... [Tool outputs trimmed]\n{final_answer}"
+                                        )
+                                        trimmed_count += 1

            if trimmed_count > 0 and self.valves.show_debug_log and __event_call__:
                await self._log(
@@ -881,7 +1010,8 @@ class Filter:
                    )

                # Clean model ID if needed (though get_model_by_id usually expects the full ID)
-                model_obj = Models.get_model_by_id(model_id)
+                # Run in thread to avoid blocking event loop on slow DB queries
+                model_obj = await asyncio.to_thread(Models.get_model_by_id, model_id)

                if model_obj:
                    if self.valves.show_debug_log and __event_call__:
@@ -933,8 +1063,7 @@ class Filter:
                else:
                    if self.valves.show_debug_log and __event_call__:
                        await self._log(
-                            f"[Inlet] ❌ Model NOT found in DB",
-                            type="warning",
+                            f"[Inlet] ℹ️ Not a custom model, skipping custom system prompt check",
                            event_call=__event_call__,
                        )

@@ -946,7 +1075,7 @@ class Filter:
                    event_call=__event_call__,
                )
            if self.valves.debug_mode:
-                print(f"[Inlet] Error fetching system prompt from DB: {e}")
+                logger.error(f"[Inlet] Error fetching system prompt from DB: {e}")

        # Fall back to checking messages (base model or already included)
        if not system_prompt_content:
@@ -960,7 +1089,7 @@ class Filter:
        if system_prompt_content:
            system_prompt_msg = {"role": "system", "content": system_prompt_content}
            if self.valves.debug_mode:
-                print(
+                logger.info(
                    f"[Inlet] Found system prompt ({len(system_prompt_content)} chars). Including in budget."
                )

@@ -991,7 +1120,7 @@ class Filter:
                    f"[Inlet] Message Stats: {stats_str}", event_call=__event_call__
                )
            except Exception as e:
-                print(f"[Inlet] Error logging message stats: {e}")
+                logger.error(f"[Inlet] Error logging message stats: {e}")

        if not chat_id:
            await self._log(
@@ -1007,6 +1136,33 @@ class Filter:
                event_call=__event_call__,
            )

+            # Log custom model configurations
+            raw_config = self.valves.model_thresholds
+            parsed_configs = self._parse_model_thresholds()
+
+            if raw_config:
+                config_list = [
+                    f"{model}: {cfg['compression_threshold_tokens']}t/{cfg['max_context_tokens']}t"
+                    for model, cfg in parsed_configs.items()
+                ]
+
+                if config_list:
+                    await self._log(
+                        f"[Inlet] 📋 Model Configs (Raw: '{raw_config}'): {', '.join(config_list)}",
+                        event_call=__event_call__,
+                    )
+                else:
+                    await self._log(
+                        f"[Inlet] ⚠️ Invalid Model Configs (Raw: '{raw_config}'): No valid configs parsed. Expected format: 'model_id:threshold:max_context'",
+                        type="warning",
+                        event_call=__event_call__,
+                    )
+            else:
+                await self._log(
+                    f"[Inlet] 📋 Model Configs: No custom configuration (Global defaults only)",
+                    event_call=__event_call__,
+                )
+
        # Record the target compression progress for the original messages, for use in outlet
        # Target is to compress up to the (total - keep_last) message
        target_compressed_count = max(0, len(messages) - self.valves.keep_last)
@@ -1043,9 +1199,9 @@ class Filter:
            if effective_keep_first > 0:
                head_messages = messages[:effective_keep_first]

-            # 2. Summary message (Inserted as User message)
+            # 2. Summary message (Inserted as Assistant message)
            summary_content = (
-                f"【System Prompt: The following is a summary of the historical conversation, provided for context only. Do not reply to the summary content itself; answer the subsequent latest questions directly.】\n\n"
+                f"【Previous Summary: The following is a summary of the historical conversation, provided for context only. Do not reply to the summary content itself; answer the subsequent latest questions directly.】\n\n"
                f"{summary_record.summary}\n\n"
                f"---\n"
                f"Below is the recent conversation:"
@@ -1287,7 +1443,7 @@ class Filter:

            # Get max context limit
            model = self._clean_model_id(body.get("model"))
-            thresholds = self._get_model_thresholds(model)
+            thresholds = self._get_model_thresholds(model) or {}
            max_context_tokens = thresholds.get(
                "max_context_tokens", self.valves.max_context_tokens
            )
@@ -1314,7 +1470,8 @@ class Filter:
                    > start_trim_index + 1  # Keep at least 1 message after keep_first
                ):
                    dropped = final_messages.pop(start_trim_index)
-                    total_tokens -= self._count_tokens(str(dropped.get("content", "")))
+                    dropped_tokens = self._count_tokens(str(dropped.get("content", "")))
+                    total_tokens -= dropped_tokens

                await self._log(
                    f"[Inlet] ✂️ Messages reduced. New total: {total_tokens} Tokens",
@@ -1371,18 +1528,11 @@ class Filter:
            )
            return body
        model = body.get("model") or ""
+        messages = body.get("messages", [])

        # Calculate target compression progress directly
-        # Assuming body['messages'] in outlet contains the full history (including new response)
-        messages = body.get("messages", [])
        target_compressed_count = max(0, len(messages) - self.valves.keep_last)

-        if self.valves.debug_mode or self.valves.show_debug_log:
-            await self._log(
-                f"\n{'='*60}\n[Outlet] Chat ID: {chat_id}\n[Outlet] Response complete\n[Outlet] Calculated target compression progress: {target_compressed_count} (Messages: {len(messages)})",
-                event_call=__event_call__,
-            )
-
        # Process Token calculation and summary generation asynchronously in the background (do not wait for completion, do not affect output)
        asyncio.create_task(
            self._check_and_generate_summary_async(
@@ -1396,11 +1546,6 @@ class Filter:
            )
        )

-        await self._log(
-            f"[Outlet] Background processing started\n{'='*60}\n",
-            event_call=__event_call__,
-        )
-
        return body

    async def _check_and_generate_summary_async(
@@ -1416,11 +1561,25 @@ class Filter:
        """
        Background processing: Calculates Token count and generates summary (does not block response).
        """
+
        try:
            messages = body.get("messages", [])

+            # Clean model ID
+            model = self._clean_model_id(model)
+
+            if self.valves.debug_mode or self.valves.show_debug_log:
+                await self._log(
+                    f"\n{'='*60}\n[Outlet] Chat ID: {chat_id}\n[Outlet] Response complete\n[Outlet] Calculated target compression progress: {target_compressed_count} (Messages: {len(messages)})",
+                    event_call=__event_call__,
+                )
+                await self._log(
+                    f"[Outlet] Background processing started\n{'='*60}\n",
+                    event_call=__event_call__,
+                )
+
            # Get threshold configuration for current model
-            thresholds = self._get_model_thresholds(model)
+            thresholds = self._get_model_thresholds(model) or {}
            compression_threshold_tokens = thresholds.get(
                "compression_threshold_tokens", self.valves.compression_threshold_tokens
            )
@@ -1440,6 +1599,28 @@ class Filter:
                event_call=__event_call__,
            )

+            # Send status notification (Context Usage format)
+            if __event_emitter__ and self.valves.show_token_usage_status:
+                max_context_tokens = thresholds.get(
+                    "max_context_tokens", self.valves.max_context_tokens
+                )
+                status_msg = f"Context Usage (Estimated): {current_tokens} / {max_context_tokens} Tokens"
+                if max_context_tokens > 0:
+                    usage_ratio = current_tokens / max_context_tokens
+                    status_msg += f" ({usage_ratio*100:.1f}%)"
+                    if usage_ratio > 0.9:
+                        status_msg += " | ⚠️ High Usage"
+
+                await __event_emitter__(
+                    {
+                        "type": "status",
+                        "data": {
+                            "description": status_msg,
+                            "done": True,
+                        },
+                    }
+                )
+
            # Check if compression is needed
            if current_tokens >= compression_threshold_tokens:
                await self._log(
@@ -1559,10 +1740,13 @@ class Filter:
                return

            thresholds = self._get_model_thresholds(summary_model_id)
-            # Note: Using the summary model's max context limit here
-            max_context_tokens = thresholds.get(
-                "max_context_tokens", self.valves.max_context_tokens
-            )
+            # Priority: 1. summary_model_max_context (if > 0) -> 2. model_thresholds -> 3. global max_context_tokens
+            if self.valves.summary_model_max_context > 0:
+                max_context_tokens = self.valves.summary_model_max_context
+            else:
+                max_context_tokens = thresholds.get(
+                    "max_context_tokens", self.valves.max_context_tokens
+                )

            await self._log(
                f"[🤖 Async Summary Task] Using max limit for model {summary_model_id}: {max_context_tokens} Tokens",
@@ -1753,7 +1937,6 @@ class Filter:
                    max_context_tokens = thresholds.get(
                        "max_context_tokens", self.valves.max_context_tokens
                    )
-
                    # 6. Emit Status
                    status_msg = f"Context Summary Updated: {token_count} / {max_context_tokens} Tokens"
                    if max_context_tokens > 0:
@@ -1798,7 +1981,7 @@ class Filter:

            import traceback

-            traceback.print_exc()
+            logger.exception("[🤖 Async Summary Task] Unhandled exception")

    def _format_messages_for_summary(self, messages: list) -> str:
        """Formats messages for summarization."""
@@ -1818,9 +2001,8 @@ class Filter:
            # Handle role name
            role_name = {"user": "User", "assistant": "Assistant"}.get(role, role)

-            # Limit length of each message to avoid excessive length
-            if len(content) > 500:
-                content = content[:500] + "..."
+            # User requested to remove truncation to allow full context for summary
+            # unless it exceeds model limits (which is handled by the LLM call itself or max_tokens)

            formatted.append(f"[{i}] {role_name}: {content}")

@@ -1927,8 +2109,25 @@ Based on the content above, generate the summary:
            # Call generate_chat_completion
            response = await generate_chat_completion(request, payload, user)

-            if not response or "choices" not in response or not response["choices"]:
-                raise ValueError("LLM response format incorrect or empty")
+            # Handle JSONResponse (some backends return JSONResponse instead of dict)
+            if hasattr(response, "body"):
+                # It's a Response object, extract the body
+                import json as json_module
+
+                try:
+                    response = json_module.loads(response.body.decode("utf-8"))
+                except Exception:
+                    raise ValueError(f"Failed to parse JSONResponse body: {response}")
+
+            if (
+                not response
+                or not isinstance(response, dict)
+                or "choices" not in response
+                or not response["choices"]
+            ):
+                raise ValueError(
+                    f"LLM response format incorrect or empty: {type(response).__name__}"
+                )

            summary = response["choices"][0]["message"]["content"].strip()

--- a/plugins/filters/async-context-compression/async_context_compression_cn.py
+++ b/plugins/filters/async-context-compression/async_context_compression_cn.py
@@ -5,19 +5,17 @@ author: Fu-Jie
 author_url: https://github.com/Fu-Jie/awesome-openwebui
 funding_url: https://github.com/open-webui
 description: 通过智能摘要和消息压缩，降低长对话的 token 消耗，同时保持对话连贯性。
-version: 1.2.0
+version: 1.2.1
 openwebui_id: 5c0617cb-a9e4-4bd6-a440-d276534ebd18
 license: MIT

 ═══════════════════════════════════════════════════════════════════════════════
-📌 1.2.0 版本更新
+📌 1.2.1 版本更新
 ═══════════════════════════════════════════════════════════════════════════════

-  ✅ 预检上下文检查：发送给模型前验证上下文是否适配。
-  ✅ 结构感知裁剪：折叠过长的 AI 响应，同时保留标题 (H1-H6)、开头和结尾。
-  ✅ 原生工具输出裁剪：使用函数调用时清理上下文，去除冗余输出。（注意：非原生工具调用输出不会完整注入上下文）
-  ✅ 上下文使用警告：当使用量超过 90% 时发出通知。
-  ✅ 详细 Token 日志：细粒度记录 System、Head、Summary 和 Tail 的 Token 消耗。
+  ✅ 智能配置增强：自动检测自定义模型的基础模型配置，并新增 `summary_model_max_context` 参数以独立控制摘要模型的上下文限制。
+  ✅ 性能优化与重构：重构了阈值解析逻辑并增加缓存，移除了冗余的处理代码，并增强了 LLM 响应处理（支持 JSONResponse）。
+  ✅ 稳定性改进：修复了 `datetime` 弃用警告，修正了类型注解，并将 print 语句替换为标准日志记录。

 ═══════════════════════════════════════════════════════════════════════════════
 📌 功能概述
@@ -254,23 +252,36 @@ show_debug_log (前端调试日志)

 from pydantic import BaseModel, Field, model_validator
 from typing import Optional, Dict, Any, List, Union, Callable, Awaitable
+import re
 import asyncio
 import json
 import hashlib
-import time
-import re
+import contextlib
+import logging
+
+# 配置日志记录
+logger = logging.getLogger(__name__)
+if not logger.handlers:
+    handler = logging.StreamHandler()
+    formatter = logging.Formatter(
+        "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
+    )
+    handler.setFormatter(formatter)
+    logger.addHandler(handler)
+logger.setLevel(logging.INFO)

 # Open WebUI 内置导入
 from open_webui.utils.chat import generate_chat_completion
-from open_webui.models.models import Models
 from open_webui.models.users import Users
+from open_webui.models.models import Models
 from fastapi.requests import Request
 from open_webui.main import app as webui_app

 # Open WebUI 内部数据库 (复用共享连接)
-from open_webui.internal.db import engine as owui_engine
-from open_webui.internal.db import Session as owui_Session
-from open_webui.internal.db import Base as owui_Base
+try:
+    from open_webui.internal import db as owui_db
+except ModuleNotFoundError:  # pragma: no cover - filter runs inside Open WebUI
+    owui_db = None

 # 尝试导入 tiktoken
 try:
@@ -280,35 +291,167 @@ except ImportError:

 # 数据库导入
 from sqlalchemy import Column, String, Text, DateTime, Integer, inspect
-from datetime import datetime
+from sqlalchemy.orm import declarative_base, sessionmaker
+from sqlalchemy.engine import Engine
+from datetime import datetime, timezone
+
+
+def _discover_owui_engine(db_module: Any) -> Optional[Engine]:
+    """Discover the Open WebUI SQLAlchemy engine via provided db module helpers."""
+    if db_module is None:
+        return None
+
+    db_context = getattr(db_module, "get_db_context", None) or getattr(
+        db_module, "get_db", None
+    )
+    if callable(db_context):
+        try:
+            with db_context() as session:
+                try:
+                    return session.get_bind()
+                except AttributeError:
+                    return getattr(session, "bind", None) or getattr(
+                        session, "engine", None
+                    )
+        except Exception as exc:
+            print(f"[DB Discover] get_db_context failed: {exc}")
+
+    for attr in ("engine", "ENGINE", "bind", "BIND"):
+        candidate = getattr(db_module, attr, None)
+        if candidate is not None:
+            return candidate
+
+    return None
+
+
+def _discover_owui_schema(db_module: Any) -> Optional[str]:
+    """Discover the Open WebUI database schema name if configured."""
+    if db_module is None:
+        return None
+
+    try:
+        base = getattr(db_module, "Base", None)
+        metadata = getattr(base, "metadata", None) if base is not None else None
+        candidate = getattr(metadata, "schema", None) if metadata is not None else None
+        if isinstance(candidate, str) and candidate.strip():
+            return candidate.strip()
+    except Exception as exc:
+        print(f"[DB Discover] Base metadata schema lookup failed: {exc}")
+
+    try:
+        metadata_obj = getattr(db_module, "metadata_obj", None)
+        candidate = (
+            getattr(metadata_obj, "schema", None) if metadata_obj is not None else None
+        )
+        if isinstance(candidate, str) and candidate.strip():
+            return candidate.strip()
+    except Exception as exc:
+        print(f"[DB Discover] metadata_obj schema lookup failed: {exc}")
+
+    try:
+        from open_webui import env as owui_env
+
+        candidate = getattr(owui_env, "DATABASE_SCHEMA", None)
+        if isinstance(candidate, str) and candidate.strip():
+            return candidate.strip()
+    except Exception as exc:
+        print(f"[DB Discover] env schema lookup failed: {exc}")
+
+    return None
+
+
+owui_engine = _discover_owui_engine(owui_db)
+owui_schema = _discover_owui_schema(owui_db)
+owui_Base = getattr(owui_db, "Base", None) if owui_db is not None else None
+if owui_Base is None:
+    owui_Base = declarative_base()


 class ChatSummary(owui_Base):
    """对话摘要存储表"""

    __tablename__ = "chat_summary"
-    __table_args__ = {"extend_existing": True}
+    __table_args__ = (
+        {"extend_existing": True, "schema": owui_schema}
+        if owui_schema
+        else {"extend_existing": True}
+    )

    id = Column(Integer, primary_key=True, autoincrement=True)
    chat_id = Column(String(255), unique=True, nullable=False, index=True)
    summary = Column(Text, nullable=False)
    compressed_message_count = Column(Integer, default=0)
-    created_at = Column(DateTime, default=datetime.utcnow)
-    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
+    created_at = Column(DateTime, default=lambda: datetime.now(timezone.utc))
+    updated_at = Column(
+        DateTime,
+        default=lambda: datetime.now(timezone.utc),
+        onupdate=lambda: datetime.now(timezone.utc),
+    )


 class Filter:
    def __init__(self):
        self.valves = self.Valves()
+        self._owui_db = owui_db
        self._db_engine = owui_engine
-        self._SessionLocal = owui_Session
-        self._SessionLocal = owui_Session
-        self._init_database()
+        self._fallback_session_factory = (
+            sessionmaker(bind=self._db_engine) if self._db_engine else None
+        )
+        self._threshold_cache = {}
        self._init_database()

+    @contextlib.contextmanager
+    def _db_session(self):
+        """Yield a database session using Open WebUI helpers with graceful fallbacks."""
+        db_module = self._owui_db
+        db_context = None
+        if db_module is not None:
+            db_context = getattr(db_module, "get_db_context", None) or getattr(
+                db_module, "get_db", None
+            )
+
+        if callable(db_context):
+            with db_context() as session:
+                yield session
+                return
+
+        factory = None
+        if db_module is not None:
+            factory = getattr(db_module, "SessionLocal", None) or getattr(
+                db_module, "ScopedSession", None
+            )
+        if callable(factory):
+            session = factory()
+            try:
+                yield session
+            finally:
+                close = getattr(session, "close", None)
+                if callable(close):
+                    close()
+            return
+
+        if self._fallback_session_factory is None:
+            raise RuntimeError(
+                "Open WebUI database session is unavailable. Ensure Open WebUI's database layer is initialized."
+            )
+
+        session = self._fallback_session_factory()
+        try:
+            yield session
+        finally:
+            try:
+                session.close()
+            except Exception as exc:  # pragma: no cover - best-effort cleanup
+                print(f"[Database] ⚠️ Failed to close fallback session: {exc}")
+
    def _init_database(self):
        """使用 Open WebUI 的共享连接初始化数据库表"""
        try:
+            if self._db_engine is None:
+                raise RuntimeError(
+                    "Open WebUI database engine is unavailable. Ensure Open WebUI is configured with a valid DATABASE_URL."
+                )
+
            # 使用 SQLAlchemy inspect 检查表是否存在
            inspector = inspect(self._db_engine)
            if not inspector.has_table("chat_summary"):
@@ -340,21 +483,38 @@ class Filter:
            ge=0,
            description="上下文的硬性上限。超过此值将强制移除最早的消息 (全局默认值)",
        )
-        model_thresholds: dict = Field(
+        model_thresholds: Union[str, dict] = Field(
            default={},
-            description="针对特定模型的阈值覆盖配置。仅包含需要特殊配置的模型。",
+            description="针对特定模型的阈值覆盖配置。可以是 JSON 字符串或字典。",
        )

+        @model_validator(mode="before")
+        @classmethod
+        def parse_model_thresholds(cls, data: Any) -> Any:
+            if isinstance(data, dict):
+                thresholds = data.get("model_thresholds")
+                if isinstance(thresholds, str) and thresholds.strip():
+                    try:
+                        data["model_thresholds"] = json.loads(thresholds)
+                    except Exception as e:
+                        logger.error(f"Failed to parse model_thresholds JSON: {e}")
+            return data
+
        keep_first: int = Field(
            default=1, ge=0, description="始终保留最初的 N 条消息。设置为 0 则不保留。"
        )
        keep_last: int = Field(
            default=6, ge=0, description="始终保留最近的 N 条完整消息。"
        )
-        summary_model: str = Field(
+        summary_model: Optional[str] = Field(
            default=None,
            description="用于生成摘要的模型 ID。留空则使用当前对话的模型。用于匹配 model_thresholds 中的配置。",
        )
+        summary_model_max_context: int = Field(
+            default=0,
+            ge=0,
+            description="摘要模型的最大上下文 Token 数。如果为 0，则回退到 model_thresholds 或全局 max_context_tokens。",
+        )
        max_summary_tokens: int = Field(
            default=16384, ge=1, description="摘要的最大 token 数"
        )
@@ -376,7 +536,7 @@ class Filter:
    def _save_summary(self, chat_id: str, summary: str, compressed_count: int):
        """保存摘要到数据库"""
        try:
-            with self._SessionLocal() as session:
+            with self._db_session() as session:
                # 查找现有记录
                existing = session.query(ChatSummary).filter_by(chat_id=chat_id).first()

@@ -384,7 +544,7 @@ class Filter:
                    # [优化] 乐观锁检查：只有进度向前推进时才更新
                    if compressed_count <= existing.compressed_message_count:
                        if self.valves.debug_mode:
-                            print(
+                            logger.debug(
                                f"[存储] 跳过更新：新进度 ({compressed_count}) 不大于现有进度 ({existing.compressed_message_count})"
                            )
                        return
@@ -392,7 +552,7 @@ class Filter:
                    # 更新现有记录
                    existing.summary = summary
                    existing.compressed_message_count = compressed_count
-                    existing.updated_at = datetime.utcnow()
+                    existing.updated_at = datetime.now(timezone.utc)
                else:
                    # 创建新记录
                    new_summary = ChatSummary(
@@ -406,22 +566,22 @@ class Filter:

                if self.valves.debug_mode:
                    action = "更新" if existing else "创建"
-                    print(f"[存储] 摘要已{action}到数据库 (Chat ID: {chat_id})")
+                    logger.info(f"[存储] 摘要已{action}到数据库 (Chat ID: {chat_id})")

        except Exception as e:
-            print(f"[存储] ❌ 数据库保存失败: {str(e)}")
+            logger.error(f"[存储] ❌ 数据库保存失败: {str(e)}")

    def _load_summary_record(self, chat_id: str) -> Optional[ChatSummary]:
        """从数据库加载摘要记录对象"""
        try:
-            with self._SessionLocal() as session:
+            with self._db_session() as session:
                record = session.query(ChatSummary).filter_by(chat_id=chat_id).first()
                if record:
                    # Detach the object from the session so it can be used after session close
                    session.expunge(record)
                    return record
        except Exception as e:
-            print(f"[加载] ❌ 数据库读取失败: {str(e)}")
+            logger.error(f"[加载] ❌ 数据库读取失败: {str(e)}")
        return None

    def _load_summary(self, chat_id: str, body: dict) -> Optional[str]:
@@ -429,8 +589,8 @@ class Filter:
        record = self._load_summary_record(chat_id)
        if record:
            if self.valves.debug_mode:
-                print(f"[加载] 从数据库加载摘要 (Chat ID: {chat_id})")
-                print(
+                logger.debug(f"[加载] 从数据库加载摘要 (Chat ID: {chat_id})")
+                logger.debug(
                    f"[加载] 更新时间: {record.updated_at}, 已压缩消息数: {record.compressed_message_count}"
                )
            return record.summary
@@ -473,23 +633,68 @@ class Filter:
        """获取特定模型的阈值配置

        优先级：
-        1. 如果 model_thresholds 中存在该模型ID的配置，使用该配置
-        2. 否则使用全局参数 compression_threshold_tokens 和 max_context_tokens
+        1. 缓存匹配
+        2. model_thresholds 直接匹配
+        3. 基础模型 (base_model_id) 匹配
+        4. 全局默认配置
        """
-        # 尝试从模型特定配置中匹配
-        if model_id in self.valves.model_thresholds:
+        if not model_id:
+            return {
+                "compression_threshold_tokens": self.valves.compression_threshold_tokens,
+                "max_context_tokens": self.valves.max_context_tokens,
+            }
+
+        # 1. 检查缓存
+        if model_id in self._threshold_cache:
+            return self._threshold_cache[model_id]
+
+        # 获取解析后的阈值配置
+        parsed = self.valves.model_thresholds
+        if isinstance(parsed, str):
+            try:
+                parsed = json.loads(parsed)
+            except Exception:
+                parsed = {}
+
+        # 2. 尝试直接匹配
+        if model_id in parsed:
+            res = parsed[model_id]
+            self._threshold_cache[model_id] = res
            if self.valves.debug_mode:
-                print(f"[配置] 使用模型特定配置: {model_id}")
-            return self.valves.model_thresholds[model_id]
+                logger.debug(f"[配置] 模型 {model_id} 命中直接配置")
+            return res

-        # 使用全局默认配置
-        if self.valves.debug_mode:
-            print(f"[配置] 模型 {model_id} 未在 model_thresholds 中，使用全局参数")
+        # 3. 尝试匹配基础模型 (base_model_id)
+        try:
+            model_obj = Models.get_model_by_id(model_id)
+            if model_obj:
+                # 某些模型可能有多个基础模型 ID
+                base_ids = []
+                if hasattr(model_obj, "base_model_id") and model_obj.base_model_id:
+                    base_ids.append(model_obj.base_model_id)
+                if hasattr(model_obj, "base_model_ids") and model_obj.base_model_ids:
+                    if isinstance(model_obj.base_model_ids, list):
+                        base_ids.extend(model_obj.base_model_ids)

-        return {
+                for b_id in base_ids:
+                    if b_id in parsed:
+                        res = parsed[b_id]
+                        self._threshold_cache[model_id] = res
+                        if self.valves.debug_mode:
+                            logger.info(
+                                f"[配置] 模型 {model_id} 匹配到基础模型 {b_id} 的配置"
+                            )
+                        return res
+        except Exception as e:
+            logger.error(f"[配置] 查找基础模型失败: {e}")
+
+        # 4. 使用全局默认配置
+        res = {
            "compression_threshold_tokens": self.valves.compression_threshold_tokens,
            "max_context_tokens": self.valves.max_context_tokens,
        }
+        self._threshold_cache[model_id] = res
+        return res

    def _get_chat_context(
        self, body: dict, __metadata__: Optional[dict] = None
@@ -621,13 +826,15 @@ class Filter:
                """
                await event_call({"type": "execute", "data": {"code": js_code}})
            except Exception as e:
-                print(f"发送前端日志失败: {e}")
+                logger.error(f"发送前端日志失败: {e}")

    async def inlet(
        self,
        body: dict,
        __user__: Optional[dict] = None,
        __metadata__: dict = None,
+        __request__: Request = None,
+        __model__: dict = None,
        __event_emitter__: Callable[[Any], Awaitable[None]] = None,
        __event_call__: Callable[[Any], Awaitable[None]] = None,
    ) -> dict:
@@ -641,8 +848,10 @@ class Filter:
        messages = body.get("messages", [])

        # --- 原生工具输出裁剪 (Native Tool Output Trimming) ---
-        # 即使未启用压缩，也始终检查并裁剪过长的工具输出，以节省 Token
-        if self.valves.enable_tool_output_trimming:
+        metadata = body.get("metadata", {})
+        is_native_func_calling = metadata.get("function_calling") == "native"
+
+        if self.valves.enable_tool_output_trimming and is_native_func_calling:
            trimmed_count = 0
            for msg in messages:
                content = msg.get("content", "")
@@ -789,7 +998,7 @@ class Filter:
                    event_call=__event_call__,
                )
            if self.valves.debug_mode:
-                print(f"[Inlet] 从数据库获取系统提示词错误: {e}")
+                logger.error(f"[Inlet] 从数据库获取系统提示词错误: {e}")

        # 回退：检查消息列表 (基础模型或已包含)
        if not system_prompt_content:
@@ -803,7 +1012,7 @@ class Filter:
        if system_prompt_content:
            system_prompt_msg = {"role": "system", "content": system_prompt_content}
            if self.valves.debug_mode:
-                print(
+                logger.debug(
                    f"[Inlet] 找到系统提示词 ({len(system_prompt_content)} 字符)。计入预算。"
                )

@@ -834,7 +1043,7 @@ class Filter:
                    f"[Inlet] 消息统计: {stats_str}", event_call=__event_call__
                )
            except Exception as e:
-                print(f"[Inlet] 记录消息统计错误: {e}")
+                logger.error(f"[Inlet] 记录消息统计错误: {e}")

        if not chat_id:
            await self._log(
@@ -925,7 +1134,7 @@ class Filter:

            # 获取最大上下文限制
            model = self._clean_model_id(body.get("model"))
-            thresholds = self._get_model_thresholds(model)
+            thresholds = self._get_model_thresholds(model) or {}
            max_context_tokens = thresholds.get(
                "max_context_tokens", self.valves.max_context_tokens
            )
@@ -1129,7 +1338,7 @@ class Filter:

            # 获取最大上下文限制
            model = self._clean_model_id(body.get("model"))
-            thresholds = self._get_model_thresholds(model)
+            thresholds = self._get_model_thresholds(model) or {}
            max_context_tokens = thresholds.get(
                "max_context_tokens", self.valves.max_context_tokens
            )
@@ -1156,7 +1365,8 @@ class Filter:
                    > start_trim_index + 1  # 保留 keep_first 之后至少 1 条消息
                ):
                    dropped = final_messages.pop(start_trim_index)
-                    total_tokens -= self._count_tokens(str(dropped.get("content", "")))
+                    dropped_tokens = self._count_tokens(str(dropped.get("content", "")))
+                    total_tokens -= dropped_tokens

                await self._log(
                    f"[Inlet] ✂️ 消息已缩减。新总数: {total_tokens} Tokens",
@@ -1207,18 +1417,18 @@ class Filter:
        """
        chat_ctx = self._get_chat_context(body, __metadata__)
        chat_id = chat_ctx["chat_id"]
-        model = body.get("model") or ""
-
-        # 直接计算目标压缩进度
-        # 假设 outlet 中的 body['messages'] 包含完整历史（包括新响应）
-        messages = body.get("messages", [])
-        target_compressed_count = max(0, len(messages) - self.valves.keep_last)
-
-        if self.valves.debug_mode or self.valves.show_debug_log:
+        if not chat_id:
            await self._log(
-                f"\n{'='*60}\n[Outlet] Chat ID: {chat_id}\n[Outlet] 响应完成\n[Outlet] 计算目标压缩进度: {target_compressed_count} (消息数: {len(messages)})",
+                "[Outlet] ❌ metadata 中缺少 chat_id，跳过压缩",
+                type="error",
                event_call=__event_call__,
            )
+            return body
+        model = body.get("model") or ""
+        messages = body.get("messages", [])
+
+        # 直接计算目标压缩进度
+        target_compressed_count = max(0, len(messages) - self.valves.keep_last)

        # 在后台异步处理 Token 计算和摘要生成（不等待完成，不影响输出）
        asyncio.create_task(
@@ -1233,11 +1443,6 @@ class Filter:
            )
        )

-        await self._log(
-            f"[Outlet] 后台处理已启动\n{'='*60}\n",
-            event_call=__event_call__,
-        )
-
        return body

    async def _check_and_generate_summary_async(
@@ -1257,7 +1462,7 @@ class Filter:
            messages = body.get("messages", [])

            # 获取当前模型的阈值配置
-            thresholds = self._get_model_thresholds(model)
+            thresholds = self._get_model_thresholds(model) or {}
            compression_threshold_tokens = thresholds.get(
                "compression_threshold_tokens", self.valves.compression_threshold_tokens
            )
@@ -1393,11 +1598,14 @@ class Filter:
                )
                return

-            thresholds = self._get_model_thresholds(summary_model_id)
-            # 注意：这里使用的是摘要模型的最大上下文限制
-            max_context_tokens = thresholds.get(
-                "max_context_tokens", self.valves.max_context_tokens
-            )
+            thresholds = self._get_model_thresholds(summary_model_id) or {}
+            # Priority: 1. summary_model_max_context (if > 0) -> 2. model_thresholds -> 3. global max_context_tokens
+            if self.valves.summary_model_max_context > 0:
+                max_context_tokens = self.valves.summary_model_max_context
+            else:
+                max_context_tokens = thresholds.get(
+                    "max_context_tokens", self.valves.max_context_tokens
+                )

            await self._log(
                f"[🤖 异步摘要任务] 使用模型 {summary_model_id} 的上限: {max_context_tokens} Tokens",
@@ -1582,10 +1790,14 @@ class Filter:

                    # 5. 获取阈值并计算比例
                    model = self._clean_model_id(body.get("model"))
-                    thresholds = self._get_model_thresholds(model)
-                    max_context_tokens = thresholds.get(
-                        "max_context_tokens", self.valves.max_context_tokens
-                    )
+                    thresholds = self._get_model_thresholds(model) or {}
+                    # Priority: 1. summary_model_max_context (if > 0) -> 2. model_thresholds -> 3. global max_context_tokens
+                    if self.valves.summary_model_max_context > 0:
+                        max_context_tokens = self.valves.summary_model_max_context
+                    else:
+                        max_context_tokens = thresholds.get(
+                            "max_context_tokens", self.valves.max_context_tokens
+                        )

                    # 6. 发送状态
                    status_msg = (
@@ -1631,9 +1843,7 @@ class Filter:
                    }
                )

-            import traceback
-
-            traceback.print_exc()
+            logger.exception("[🤖 异步摘要任务] ❌ 发生异常")

    def _format_messages_for_summary(self, messages: list) -> str:
        """Formats messages for summarization."""
@@ -1653,9 +1863,8 @@ class Filter:
            # Handle role name
            role_name = {"user": "User", "assistant": "Assistant"}.get(role, role)

-            # Limit length of each message to avoid excessive length
-            if len(content) > 500:
-                content = content[:500] + "..."
+            # User requested to remove truncation to allow full context for summary
+            # unless it exceeds model limits (which is handled by the LLM call itself or max_tokens)

            formatted.append(f"[{i}] {role_name}: {content}")

@@ -1762,8 +1971,25 @@ class Filter:
            # 调用 generate_chat_completion
            response = await generate_chat_completion(request, payload, user)

-            if not response or "choices" not in response or not response["choices"]:
-                raise ValueError("LLM 响应格式不正确或为空")
+            # Handle JSONResponse (some backends return JSONResponse instead of dict)
+            if hasattr(response, "body"):
+                # It's a Response object, extract the body
+                import json as json_module
+
+                try:
+                    response = json_module.loads(response.body.decode("utf-8"))
+                except Exception:
+                    raise ValueError(f"Failed to parse JSONResponse body: {response}")
+
+            if (
+                not response
+                or not isinstance(response, dict)
+                or "choices" not in response
+                or not response["choices"]
+            ):
+                raise ValueError(
+                    f"LLM response format incorrect or empty: {type(response).__name__}"
+                )

            summary = response["choices"][0]["message"]["content"].strip()
Author	SHA1	Message	Date
fujie	25c9d20f3d	feat(async-context-compression): release v1.2.1 with smart config & optimizations This release introduces significant improvements to configuration flexibility, performance, and stability. Key Changes: * Smart Configuration: * Added `summary_model_max_context` to allow independent context limits for the summary model (e.g., using `gemini-flash` with 1M context to summarize `gpt-4` history). * Implemented auto-detection of base model settings for custom models, ensuring correct threshold application. * Performance & Refactoring: * Optimized `model_thresholds` parsing with caching to reduce overhead. * Refactored `inlet` and `outlet` logic to remove redundant code and improve maintainability. * Replaced all `print` statements with proper `logging` calls for better production monitoring. * Bug Fixes & Modernization: * Fixed `datetime.utcnow()` deprecation warnings by switching to timezone-aware `datetime.now(timezone.utc)`. * Corrected type annotations and improved error handling for `JSONResponse` objects from LLM backends. * Removed hard truncation in summary generation to allow full context usage. Files Updated: * Plugin source code (English & Chinese) * Documentation and READMEs * Version bumped to 1.2.1	2026-01-20 19:09:25 +08:00
github-actions[bot]	0d853577df	chore: update community stats - followers increased (136 -> 137)	2026-01-20 09:15:24 +00:00
github-actions[bot]	f91f3d8692	chore: update community stats - followers increased (135 -> 136)	2026-01-20 07:14:01 +00:00
github-actions[bot]	0f7cad8dfa	chore: update community stats - followers increased (134 -> 135)	2026-01-19 23:08:06 +00:00
fujie	db1a1e7ef0	fix(async-context-compression): sync CN version with EN version logic - Add missing imports (contextlib, sessionmaker, Engine) - Add database engine discovery functions (_discover_owui_engine, _discover_owui_schema) - Fix ChatSummary table to support schema configuration - Fix duplicate code in __init__ method - Add _db_session context manager for robust session handling - Fix inlet method signature (add __request__, __model__ parameters) - Fix tool output trimming to check native function calling - Add chat_id empty check in outlet method	2026-01-19 20:37:37 +08:00
github-actions[bot]	e7de80a059	chore: update community stats - plugin version updated, followers increased (133 -> 134)	2026-01-19 12:15:44 +00:00