Compare commits

..

15 Commits

Author SHA1 Message Date
fujie
adc5e0a1f4 feat(filters): release v1.3.0 for async context compression
- Add native i18n support across 9 languages
- Implement non-blocking frontend log emission for zero TTFB delay
- Add token_usage_status_threshold to intelligently control status notifications
- Automatically detect and skip compression for copilot_sdk models
- Set debug_mode default to false for a quieter production environment
- Update documentation and remove legacy bilingual code
2026-02-21 23:44:12 +08:00
fujie
04b8108890 chore: ignore and stop tracking .git-worktrees 2026-02-21 21:50:35 +08:00
fujie
f1ba03e3bd Merge remote-tracking branch 'origin/main' into copilot/sub-pr-42
# Conflicts:
#	plugins/actions/smart-mind-map/README.md
#	plugins/actions/smart-mind-map/README_CN.md
#	plugins/actions/smart-mind-map/smart_mind_map.py
#	plugins/actions/smart-mind-map/smart_mind_map_cn.py
2026-02-21 18:04:48 +08:00
fujie
cdd9950973 docs: update pr submission guidelines to require gh tool 2026-02-21 17:54:09 +08:00
fujie
473012fa6f feat(actions): release Smart Mind Map v1.0.0 - a milestone with Native i18n & Direct Embed 2026-02-21 17:50:44 +08:00
copilot-swe-agent[bot]
dc66610cb2 docs: Align with copilot-instructions.md standards and update all repo references
Co-authored-by: Fu-Jie <33599649+Fu-Jie@users.noreply.github.com>
2026-02-13 03:36:30 +00:00
copilot-swe-agent[bot]
655b5311cf chore: Update repository references from awesome-openwebui to openwebui-extensions
Co-authored-by: Fu-Jie <33599649+Fu-Jie@users.noreply.github.com>
2026-02-13 03:31:06 +00:00
copilot-swe-agent[bot]
4390ee2085 Initial plan 2026-02-13 03:29:45 +00:00
google-labs-jules[bot]
bfb2039095 feat: Add full i18n support, security fixes, and zh-TW to Smart Mind Map plugin
- Consolidated smart_mind_map.py and smart_mind_map_cn.py into a single file.
- Added TRANSLATIONS dictionary supporting 18 languages (including explicit zh-TW support).
- Implemented robust language detection with fallback to browser/local storage.
- Added localized date formatting for various locales.
- Added base language fallback (e.g., fr-BE -> fr-FR) and variant mapping.
- Fixed critical security vulnerabilities:
    - Prevented JS injection by safely escaping IDs with `json.dumps`.
    - Prevented XSS by sanitizing user input and language codes.
    - Prevented DoS crashes from curly braces in LLM output by replacing `.format()` with safe string replacement.
- Fixed regex regression by using standard strings with escaped backslashes.
- Restored clickable "Markmap" link in the footer.
- Verified all changes with comprehensive unit and security tests.

Co-authored-by: Fu-Jie <33599649+Fu-Jie@users.noreply.github.com>
2026-02-12 21:05:45 +00:00
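The escaping fixes this commit describes (safe ID embedding via `json.dumps`, plain string replacement instead of `.format()`) can be sketched as follows. This is illustrative only — the template and function names are assumptions, not the plugin's actual code:

```python
import html
import json

# Hypothetical JS template with placeholder tokens instead of {}-style slots.
JS_TEMPLATE = "const mapId = __MAP_ID__; render(mapId, __CONTENT__);"

def build_embed(map_id: str, llm_markdown: str) -> str:
    # json.dumps yields a quoted, escaped JS string literal, so quotes or
    # </script> in the input cannot break out of the string context.
    safe_id = json.dumps(map_id)
    safe_content = json.dumps(html.escape(llm_markdown))
    # Plain .replace() instead of .format(): curly braces in LLM output
    # would otherwise raise KeyError/IndexError inside str.format().
    return (JS_TEMPLATE
            .replace("__MAP_ID__", safe_id)
            .replace("__CONTENT__", safe_content))
```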
google-labs-jules[bot]
86091f77cf feat: Security and i18n improvements for Smart Mind Map plugin
- Fixed high-severity XSS and JS injection vulnerabilities by safely escaping IDs and user input using `json.dumps` and HTML entity encoding.
- Prevented potential DoS crashes caused by curly braces in LLM output by replacing `.format()` with safe string replacement.
- Refactored language resolution into a `_resolve_language` helper method, implementing base language fallback (e.g., `fr-BE` -> `fr-FR`).
- Refactored date formatting to use a cleaner, dictionary-based approach.
- Consolidated i18n logic into a single file with robust fallback handling.
- Verified all changes with comprehensive unit and security tests.

Co-authored-by: Fu-Jie <33599649+Fu-Jie@users.noreply.github.com>
2026-02-12 17:41:52 +00:00
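The base-language fallback described here (e.g. `fr-BE` -> `fr-FR`) might look like this minimal sketch; the function name matches the commit's `_resolve_language`, but the translation table and lookup order are assumptions:

```python
# Illustrative TRANSLATIONS subset; the real plugin supports many more locales.
TRANSLATIONS = {
    "en-US": {"title": "Mind Map"},
    "fr-FR": {"title": "Carte mentale"},
    "zh-CN": {"title": "思维导图"},
}
DEFAULT_LANG = "en-US"

def resolve_language(requested: str) -> str:
    """Return the best-supported locale for a browser language tag."""
    if not requested:
        return DEFAULT_LANG
    tag = requested.strip().replace("_", "-")
    if tag in TRANSLATIONS:
        return tag
    # Base-language fallback: fr-BE -> first supported fr-* variant (fr-FR).
    base = tag.split("-")[0].lower()
    for lang in TRANSLATIONS:
        if lang.split("-")[0].lower() == base:
            return lang
    return DEFAULT_LANG
```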
google-labs-jules[bot]
eb223e3e75 feat: Add full i18n support to Smart Mind Map plugin
- Consolidated smart_mind_map.py and smart_mind_map_cn.py into a single file.
- Added TRANSLATIONS dictionary supporting 18 languages (en-US, ko-KR, fr-FR, es-AR, en-CA, fr-CA, ja-JP, de-DE, zh-HK, it-IT, zh-CN, en-GB, es-MX, id-ID, es-ES, de-AT, en-AU, vi-VN, zh-TW).
- Implemented automatic language detection with fallback to browser/local storage.
- Added localized date formatting for various locales.
- Added explicit support for zh-TW (Traditional Chinese) with correct translations.
- Updated HTML/JS templates to use injected translations.
- Restored clickable "Markmap" link in the footer for all languages.
- Fixed SyntaxWarning in regex strings by properly escaping backslashes in standard strings.
- Implemented robust UI translation loading to prevent crashes on missing keys.
- Verified frontend rendering with Playwright and backend logic with unit tests.

Co-authored-by: Fu-Jie <33599649+Fu-Jie@users.noreply.github.com>
2026-02-12 17:15:50 +00:00
google-labs-jules[bot]
d1bbbd9071 feat: Add full i18n support to Smart Mind Map plugin
- Consolidated smart_mind_map.py and smart_mind_map_cn.py into a single file.
- Added TRANSLATIONS dictionary supporting 18 languages (en-US, ko-KR, fr-FR, es-AR, en-CA, fr-CA, ja-JP, de-DE, zh-HK, it-IT, zh-CN, en-GB, es-MX, id-ID, es-ES, de-AT, en-AU, vi-VN, zh-TW).
- Implemented automatic language detection with fallback to browser/local storage.
- Added localized date formatting for various locales.
- Added explicit support for zh-TW (Traditional Chinese) with correct translations.
- Updated HTML/JS templates to use injected translations.
- Restored clickable "Markmap" link in the footer for all languages.
- Fixed SyntaxWarning in regex strings.
- Verified frontend rendering with Playwright.

Co-authored-by: Fu-Jie <33599649+Fu-Jie@users.noreply.github.com>
2026-02-12 17:06:28 +00:00
google-labs-jules[bot]
840c77ea2f feat: Add full i18n support to Smart Mind Map plugin
- Consolidated smart_mind_map.py and smart_mind_map_cn.py into a single file.
- Added TRANSLATIONS dictionary supporting 18 languages (including explicit zh-TW support).
- Implemented automatic language detection with fallback to browser/local storage.
- Added localized date formatting for various locales.
- Added fallback mapping for regional variants (e.g., es-AR -> es-ES).
- Updated HTML/JS templates to use injected translations.
- Fixed SyntaxWarning in regex strings.
- Verified frontend rendering with Playwright and backend logic with unit tests.

Co-authored-by: Fu-Jie <33599649+Fu-Jie@users.noreply.github.com>
2026-02-12 16:54:48 +00:00
google-labs-jules[bot]
91ba7df086 feat: Add full i18n support to Smart Mind Map plugin
- Consolidated smart_mind_map.py and smart_mind_map_cn.py into a single file.
- Added TRANSLATIONS dictionary supporting 17 languages (en-US, ko-KR, fr-FR, es-AR, en-CA, fr-CA, ja-JP, de-DE, zh-HK, it-IT, zh-CN, en-GB, es-MX, id-ID, es-ES, de-AT, en-AU, vi-VN).
- Implemented automatic language detection with fallback to browser/local storage.
- Added localized date formatting for various locales.
- Added fallback mapping for regional variants (e.g. zh-TW -> zh-HK, es-AR -> es-ES).
- Updated HTML/JS templates to use injected translations.
- Fixed SyntaxWarning in regex strings.
- Verified frontend rendering with Playwright.

Co-authored-by: Fu-Jie <33599649+Fu-Jie@users.noreply.github.com>
2026-02-12 16:46:27 +00:00
google-labs-jules[bot]
fa636c7bc5 feat: Add full i18n support to Smart Mind Map plugin
- Consolidated smart_mind_map.py and smart_mind_map_cn.py into a single file.
- Added TRANSLATIONS dictionary supporting 17 languages (en-US, ko-KR, fr-FR, es-AR, en-CA, fr-CA, ja-JP, de-DE, zh-HK, it-IT, zh-CN, en-GB, es-MX, id-ID, es-ES, de-AT, en-AU, vi-VN).
- Implemented automatic language detection with fallback to browser/local storage.
- Added localized date formatting for various locales.
- Updated HTML/JS templates to use injected translations.
- Fixed SyntaxWarning in regex strings.
- Verified frontend rendering with Playwright.

Co-authored-by: Fu-Jie <33599649+Fu-Jie@users.noreply.github.com>
2026-02-12 16:33:04 +00:00
20 changed files with 2289 additions and 4347 deletions

View File

@@ -12,11 +12,11 @@ Reference: `.github/copilot-instructions.md`
### Bilingual Requirement
Every plugin **MUST** have bilingual versions for both code and documentation:
Every plugin **MUST** have a single internationalized code file and bilingual documentation:
- **Code**:
- English: `plugins/{type}/{name}/{name}.py`
- Chinese: `plugins/{type}/{name}/{name_cn}.py` (or `中文名.py`)
- **Code (i18n)**:
- `plugins/{type}/{name}/{name}.py`
- The single `.py` file must implement internal i18n (e.g., using `navigator.language` or backend headers) to support multiple languages natively, rather than splitting into separate files.
- **README**:
- English: `plugins/{type}/{name}/README.md`
- Chinese: `plugins/{type}/{name}/README_CN.md`
@@ -81,14 +81,13 @@ Reference: `.github/workflows/release.yml`
- **Release Information Compliance**: When a release is requested, the agent must generate a standard release summary (English commit title + bilingual bullet points) as defined in Section 3 & 5.
- **Default Action (Prepare Only)**: When performing a version bump or update, the agent should update all files locally but **STOP** before committing. Present the changes and the **proposed Release/Commit Message** to the user and wait for explicit confirmation to commit/push.
- **Consistency**: When bumping, update version in **ALL** locations:
1. English Code (`.py`)
2. Chinese Code (`.py`)
3. English README (`README.md`)
4. Chinese README (`README_CN.md`)
5. Docs Index (`docs/.../index.md`)
6. Docs Index CN (`docs/.../index.zh.md`)
7. Docs Detail (`docs/.../{name}.md`)
8. Docs Detail CN (`docs/.../{name}.zh.md`)
1. Code (`.py`)
2. English README (`README.md`)
3. Chinese README (`README_CN.md`)
4. Docs Index (`docs/.../index.md`)
5. Docs Index CN (`docs/.../index.zh.md`)
6. Docs Detail (`docs/.../{name}.md`)
7. Docs Detail CN (`docs/.../{name}.zh.md`)
### Automated Release Process
@@ -120,7 +119,7 @@ When the user confirms a release, the agent **MUST** follow these content standa
- Before committing, present a "Release Draft" containing:
- **Title**: e.g., `Release v0.1.1: [Plugin Name] - [Brief Summary]`
- **Changelog**: English-only list of commits since the last release, including hashes (e.g., `896de02 docs(config): reorder antigravity model alias example`).
- **Verification Status**: Confirm all 8+ files have been updated and synced.
- **Verification Status**: Confirm all 7+ files have been updated and synced.
3. **Internal Documentation**: Ensure "What's New" sections in READMEs and `docs/` match exactly the changes being released.
### Pull Request Check
@@ -134,7 +133,7 @@ When the user confirms a release, the agent **MUST** follow these content standa
Before committing:
- [ ] Code is bilingual and functional?
- [ ] Code is internal i18n supported (`.py`) and fully functional?
- [ ] Docstrings have updated version?
- [ ] READMEs are updated and bilingual?
- [ ] **Key Capabilities** in READMEs still cover all legacy core features + new features?

View File

@@ -8,27 +8,26 @@ This document defines the standard conventions and best practices for OpenWebUI
## 🏗️ 项目结构与命名 (Project Structure & Naming)
### 1. 双语版本要求 (Bilingual Version Requirements)
### 1. 语言与代码规范 (Language & Code Requirements)
#### 插件代码 (Plugin Code)
每个插件必须提供两个版本:
每个插件**必须**采用单文件国际化 (i18n) 设计。严禁为不同语言创建独立的源代码文件(如 `_cn.py`)。
1. **英文版本**: `plugin_name.py` - 英文界面、提示词和注释
2. **中文版本**: `plugin_name_cn.py` - 中文界面、提示词和注释
1. **单代码文件**: `plugins/{type}/{name}/{name}.py`
2. **内置 i18n**: 必须在代码中根据前端传来的用户语言(如 `__user__` 中的 `language` 或通过 `get_user_language` 脚本读取)动态切换界面显示、提示词和状态日志。
示例:
示例目录结构
```
plugins/actions/export_to_docx/
├── export_to_word.py # English version
├── export_to_word_cn.py # Chinese version
├── README.md # English documentation
└── README_CN.md # Chinese documentation
├── export_to_word.py # 单个代码文件,内置多语言支持
├── README.md # 英文文档 (English documentation)
└── README_CN.md # 中文文档
```
#### 文档 (Documentation)
每个插件目录必须包含双语 README 文件:
尽管代码是合一的,但为了市场展示和 SEO每个插件目录仍**必须**包含双语 README 文件:
- `README.md` - English documentation
- `README_CN.md` - 中文文档
@@ -58,12 +57,10 @@ plugins/actions/export_to_docx/
plugins/
├── actions/ # Action 插件 (用户触发的功能)
│ ├── my_action/
│ │ ├── my_action.py # English version
│ │ ├── 我的动作.py # Chinese version
│ │ ├── my_action.py # 单文件,内置 i18n
│ │ ├── README.md # English documentation
│ │ └── README_CN.md # Chinese documentation
│ ├── ACTION_PLUGIN_TEMPLATE.py # English template
│ ├── ACTION_PLUGIN_TEMPLATE_CN.py # Chinese template
│ ├── ACTION_PLUGIN_TEMPLATE.py # 通用 i18n 模板
│ └── README.md
├── filters/ # Filter 插件 (输入处理)
│ └── ...
@@ -474,7 +471,7 @@ async def get_user_language(self):
#### 适用场景与引导 (Usage Guidelines)
- **语言适配**: 动态获取界面语言 (`ru-RU`, `zh-CN`) 自动切换输出语言。
- **语言适配**: 动态获取界面语言 (`ru-RU`, `zh-CN`) 自动切换输出语言和 UI 翻译。这对于单文件 i18n 插件至关重要
- **时区处理**: 获取 `Intl.DateTimeFormat().resolvedOptions().timeZone` 处理时间。
- **客户端存储**: 读取 `localStorage` 中的用户偏好设置。
- **硬件能力**: 获取 `navigator.clipboard` 或 `navigator.geolocation` (需授权)。
@@ -932,8 +929,7 @@ Filter 实例是**单例 (Singleton)**。
### 1. ✅ 开发检查清单 (Development Checklist)
- [ ] 创建英文版插件代码 (`plugin_name.py`)
- [ ] 创建中文版插件代码 (`plugin_name_cn.py`)
- [ ] 代码实现了内置 i18n 逻辑 (`.py`)
- [ ] 编写英文 README (`README.md`)
- [ ] 编写中文 README (`README_CN.md`)
- [ ] 包含标准化文档字符串
@@ -941,7 +937,7 @@ Filter 实例是**单例 (Singleton)**。
- [ ] 使用 Lucide 图标
- [ ] 实现 Valves 配置
- [ ] 使用 logging 而非 print
- [ ] 测试双语界面
- [ ] 测试 i18n 界面适配
- [ ] **一致性检查**: 确保文档、代码、README 同步
- [ ] **README 结构**:
- **Key Capabilities** (英文) / **核心功能** (中文): 必须包含所有核心功能
@@ -988,13 +984,14 @@ Filter 实例是**单例 (Singleton)**。
2. **变更列表 (Bilingual Changes)**:
- 英文: Clear descriptions of technical/functional changes.
- 中文: 清晰描述用户可见的功能改进或修复。
3. **核查状态 (Verification)**: 确认版本号已在相关 8+ 处位置同步更新。
3. **核查状态 (Verification)**: 确认版本号已在相关 7+ 处位置同步更新(1 个代码文件 + 2 个 README + 4 个 Docs 文件)。
### 4. 🤖 Git 提交与推送规范 (Git Operations & Push Rules)
- **核心原则**: 默认仅进行**本地文件准备**(更新代码、READMEs、Docs、版本号),**严禁**在未获用户明确许可的情况下自动执行 `git commit` 或 `git push`。
- **允许 (需确认)**: 只有在用户明确表示“发布”、“Commit it”、“Release”或“提交”后才允许直接推送到 `main` 分支或创建 PR。
- **功能分支**: 推荐在进行大规模重构或实验性功能开发时,创建功能分支 (`feature/xxx`) 进行隔离。
- **PR 提交**: 必须使用 GitHub CLI (`gh`) 创建 Pull Request。示例:`gh pr create --title "feat: ..." --body "..."`
### 5. 🤝 贡献者认可规范 (Contributor Recognition)
@@ -1004,8 +1001,7 @@ Filter 实例是**单例 (Singleton)**。
## 📚 参考资源 (Reference Resources)
- [Action 插件模板 (英文)](plugins/actions/ACTION_PLUGIN_TEMPLATE.py)
- [Action 插件模板 (中文)](plugins/actions/ACTION_PLUGIN_TEMPLATE_CN.py)
- [Action 插件模板](plugins/actions/ACTION_PLUGIN_TEMPLATE.py)
- [插件开发指南](plugins/actions/PLUGIN_DEVELOPMENT_GUIDE.md)
- [Lucide Icons](https://lucide.dev/icons/)
- [OpenWebUI 文档](https://docs.openwebui.com/)

.gitignore vendored
View File

@@ -139,3 +139,4 @@ logs/
# OpenWebUI specific
# Add any specific ignores for OpenWebUI plugins if needed
.git-worktrees/

View File

@@ -23,7 +23,7 @@ Actions are interactive plugins that:
Intelligently analyzes text content and generates interactive mind maps with beautiful visualizations.
**Version:** 0.9.2
**Version:** 1.0.0
[:octicons-arrow-right-24: Documentation](smart-mind-map.md)

View File

@@ -23,7 +23,7 @@ Actions 是交互式插件,能够:
智能分析文本并生成交互式、精美的思维导图。
**版本:** 0.8.0
**版本:** 1.0.0
[:octicons-arrow-right-24: 查看文档](smart-mind-map.md)

View File

@@ -1,7 +1,7 @@
# Smart Mind Map
<span class="category-badge action">Action</span>
<span class="version-badge">v0.9.2</span>
<span class="version-badge">v1.0.0</span>
Intelligently analyzes text content and generates interactive mind maps for better visualization and understanding.
@@ -17,7 +17,8 @@ The Smart Mind Map plugin transforms text content into beautiful, interactive mi
- :material-gesture-swipe: **Rich Controls**: Zoom, reset view, expand level selector (All/2/3) and fullscreen
- :material-palette: **Theme Aware**: Auto-detects OpenWebUI light/dark theme with manual toggle
- :material-download: **One-Click Export**: Download high-res PNG, copy SVG, or copy Markdown source
- :material-translate: **Multi-language**: Matches output language to the input text
- :material-translate: **i18n Embedded**: One code file smartly detects frontend languages and translates the output.
- :material-arrow-all: **Auto-Sizing & Direct Embed**: Seamlessly scales to display massive canvas inline (requires setting toggle).
---
@@ -50,6 +51,7 @@ The Smart Mind Map plugin transforms text content into beautiful, interactive mi
| `MIN_TEXT_LENGTH` | integer | `100` | Minimum characters required before analysis runs |
| `CLEAR_PREVIOUS_HTML` | boolean | `false` | Clear previous plugin HTML instead of merging |
| `MESSAGE_COUNT` | integer | `1` | Number of recent messages to include (1-5) |
| `ENABLE_DIRECT_EMBED_MODE` | boolean | `false` | Enable inline full-width UI for OpenWebUI 0.8.0+ |
---

View File

@@ -1,7 +1,7 @@
# Smart Mind Map(智能思维导图)
<span class="category-badge action">Action</span>
<span class="version-badge">v0.9.2</span>
<span class="version-badge">v1.0.0</span>
智能分析文本内容,生成交互式思维导图,帮助你更直观地理解信息结构。
@@ -17,7 +17,8 @@ Smart Mind Map 会将文本转换成漂亮的交互式思维导图。插件会
- :material-gesture-swipe: **丰富控制**:缩放/重置、展开层级(全部/2/3 级)与全屏
- :material-palette: **主题感知**:自动检测 OpenWebUI 亮/暗色主题并支持手动切换
- :material-download: **一键导出**:下载高分辨率 PNG、复制 SVG 或 Markdown
- :material-translate: **多语言**:输出语言与输入文本一致
- :material-translate: **内置 i18n 语言识别**:单个文件自动检测控制台前端语言,无需繁杂的各种语言包版本。
- :material-arrow-all: **直出全屏版体验 (需配置开启)**:新版直出渲染抛开沙盒限制,纵情铺满屏幕,享受原生的图表体验。
---
@@ -50,6 +51,7 @@ Smart Mind Map 会将文本转换成漂亮的交互式思维导图。插件会
| `MIN_TEXT_LENGTH` | integer | `100` | 开始分析所需的最少字符数 |
| `CLEAR_PREVIOUS_HTML` | boolean | `false` | 生成新导图时是否清除之前的插件 HTML |
| `MESSAGE_COUNT` | integer | `1` | 用于生成的最近消息数量(1-5) |
| `ENABLE_DIRECT_EMBED_MODE` | boolean | `false` | 是否开启沉浸式直出模式 (需要 Open WebUI 0.8.0+ ) |
---

View File

@@ -1,137 +1,81 @@
# Async Context Compression
# Async Context Compression Filter
<span class="category-badge filter">Filter</span>
<span class="version-badge">v1.2.2</span>
**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.3.0 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT
Reduces token consumption in long conversations through intelligent summarization while maintaining conversational coherence.
This filter reduces token consumption in long conversations through intelligent summarization and message compression while keeping conversations coherent.
## What's new in 1.3.0
- **Internationalization (i18n)**: Complete localization of user-facing messages across 9 languages (including English, Chinese, Japanese, Korean, French, German, Spanish, and Italian).
- **Smart Status Display**: Added `token_usage_status_threshold` valve (default 80%) to intelligently control when token usage status is shown.
- **Improved Performance**: Frontend language detection and logging are optimized to be completely non-blocking, maintaining lightning-fast TTFB.
- **Copilot SDK Integration**: Automatically detects and skips compression for copilot_sdk based models to prevent conflicts.
- **Configuration**: `debug_mode` is now set to `false` by default for a quieter production experience.
---
## Overview
## Core Features
The Async Context Compression filter helps manage token usage in long conversations by:
- Intelligently summarizing older messages
- Preserving important context
- Reducing API costs
- Maintaining conversation coherence
This is especially useful for:
- Long-running conversations
- Complex multi-turn discussions
- Cost optimization
- Token limit management
## Features
- :material-arrow-collapse-vertical: **Smart Compression**: AI-powered context summarization
- :material-clock-fast: **Async Processing**: Non-blocking background compression
- :material-memory: **Context Preservation**: Keeps important information
- :material-currency-usd-off: **Cost Reduction**: Minimize token usage
- :material-console: **Frontend Debugging**: Debug logs in browser console
- :material-alert-circle-check: **Enhanced Error Reporting**: Clear error status notifications
- :material-check-all: **Open WebUI v0.7.x Compatibility**: Dynamic DB session handling
- :material-account-convert: **Improved Compatibility**: Summary role changed to `assistant`
- :material-shield-check: **Enhanced Stability**: Resolved race conditions in state management
- :material-ruler: **Preflight Context Check**: Validates context fit before sending
- :material-format-align-justify: **Structure-Aware Trimming**: Preserves document structure
- :material-content-cut: **Native Tool Output Trimming**: Trims verbose tool outputs (Note: Non-native tool outputs are not fully injected into context)
- :material-chart-bar: **Detailed Token Logging**: Granular token breakdown
- :material-account-search: **Smart Model Matching**: Inherit config from base models
- :material-image-off: **Multimodal Support**: Images are preserved but tokens are **NOT** calculated
- ✅ **Full i18n Support**: Native localization across 9 languages.
- ✅ Automatic compression triggered by token thresholds.
- ✅ Asynchronous summarization that does not block chat responses.
- ✅ Persistent storage via Open WebUI's shared database connection (PostgreSQL, SQLite, etc.).
- ✅ Flexible retention policy to keep the first and last N messages.
- ✅ Smart injection of historical summaries back into the context.
- ✅ Structure-aware trimming that preserves document structure (headers, intro, conclusion).
- ✅ Native tool output trimming for cleaner context when using function calling.
- ✅ Real-time context usage monitoring with warning notifications (>90%).
- ✅ Detailed token logging for precise debugging and optimization.
- **Smart Model Matching**: Automatically inherits configuration from base models for custom presets.
- **Multimodal Support**: Images are preserved but their tokens are **NOT** calculated. Please adjust thresholds accordingly.
---
## Installation
## Installation & Configuration
1. Download the plugin file: [`async_context_compression.py`](https://github.com/Fu-Jie/openwebui-extensions/tree/main/plugins/filters/async-context-compression)
2. Upload to OpenWebUI: **Admin Panel** → **Settings** → **Functions**
3. Configure compression settings
4. Enable the filter
### 1) Database (automatic)
- Uses Open WebUI's shared database connection; no extra configuration needed.
- The `chat_summary` table is created on first run.
### 2) Filter order
- Recommended order: pre-filters (<10) → this filter (10) → post-filters (>10).
---
## How It Works
## Configuration Parameters
```mermaid
graph TD
A[Incoming Messages] --> B{Token Count > Threshold?}
B -->|No| C[Pass Through]
B -->|Yes| D[Summarize Older Messages]
D --> E[Preserve Recent Messages]
E --> F[Combine Summary + Recent]
F --> G[Send to LLM]
```
| Parameter | Default | Description |
| :----------------------------- | :------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `priority` | `10` | Execution order; lower runs earlier. |
| `compression_threshold_tokens` | `64000` | Trigger asynchronous summary when total tokens exceed this value. Set to 50%-70% of your model's context window. |
| `max_context_tokens` | `128000` | Hard cap for context; older messages (except protected ones) are dropped if exceeded. |
| `keep_first` | `1` | Always keep the first N messages (protects system prompts). |
| `keep_last` | `6` | Always keep the last N messages to preserve recent context. |
| `summary_model` | `None` | Model for summaries. Strongly recommended to set a fast, economical model (e.g., `gemini-2.5-flash`, `deepseek-v3`). Falls back to the current chat model when empty. |
| `summary_model_max_context` | `0` | Max context tokens for the summary model. If 0, falls back to `model_thresholds` or global `max_context_tokens`. |
| `max_summary_tokens` | `16384` | Maximum tokens for the generated summary. |
| `summary_temperature` | `0.3` | Randomness for summary generation. Lower is more deterministic. |
| `model_thresholds` | `{}` | Per-model overrides for `compression_threshold_tokens` and `max_context_tokens` (useful for mixed models). |
| `enable_tool_output_trimming` | `false` | When enabled and `function_calling: "native"` is active, trims verbose tool outputs to extract only the final answer. |
| `debug_mode` | `false` | Log verbose debug info. Set to `false` in production. |
| `show_debug_log` | `false` | Print debug logs to browser console (F12). Useful for frontend debugging. |
| `show_token_usage_status` | `true` | Show token usage status notification in the chat interface. |
| `token_usage_status_threshold` | `80` | The minimum usage percentage (0-100) required to show a context usage status notification. |
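The `model_thresholds` override behavior described in the table can be sketched as below. The valve and field names follow the table, but the exact-then-prefix lookup logic is an assumption about how custom presets inherit from base models:

```python
# Global defaults from the configuration table above.
DEFAULTS = {"compression_threshold_tokens": 64000, "max_context_tokens": 128000}

def effective_thresholds(model_id: str, model_thresholds: dict) -> dict:
    """Merge per-model overrides from `model_thresholds` onto the defaults."""
    merged = dict(DEFAULTS)
    # Exact match first, then prefix match so a custom preset such as
    # "gpt-4-my-preset" can inherit the "gpt-4" base-model entry.
    override = model_thresholds.get(model_id)
    if override is None:
        for base, cfg in model_thresholds.items():
            if model_id.startswith(base):
                override = cfg
                break
    if override:
        merged.update(override)
    return merged
```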
---
## Configuration
## ⭐ Support
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `compression_threshold_tokens` | integer | `64000` | Trigger compression above this token count |
| `max_context_tokens` | integer | `128000` | Hard limit for context |
| `keep_first` | integer | `1` | Always keep the first N messages |
| `keep_last` | integer | `6` | Always keep the last N messages |
| `summary_model` | string | `None` | Model to use for summarization |
| `summary_model_max_context` | integer | `0` | Max context tokens for summary model |
| `max_summary_tokens` | integer | `16384` | Maximum tokens for the summary |
| `enable_tool_output_trimming` | boolean | `false` | Enable trimming of large tool outputs |
If this plugin has been useful, a star on [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) is a big motivation for me. Thank you for the support.
---
## Troubleshooting ❓
## Example
- **Initial system prompt is lost**: Keep `keep_first` greater than 0 to protect the initial message.
- **Compression effect is weak**: Lower `compression_threshold_tokens` or reduce `keep_first` / `keep_last` to allow more aggressive compression.
- **Submit an Issue**: If you encounter any problems, please submit an issue on GitHub: [OpenWebUI Extensions Issues](https://github.com/Fu-Jie/openwebui-extensions/issues)
### Before Compression
## Changelog
```
[Message 1] User: Tell me about Python...
[Message 2] AI: Python is a programming language...
[Message 3] User: What about its history?
[Message 4] AI: Python was created by Guido...
[Message 5] User: And its features?
[Message 6] AI: Python has many features...
... (many more messages)
[Message 20] User: Current question
```
### After Compression
```
[Summary] Previous conversation covered Python basics,
history, features, and common use cases...
[Message 18] User: Recent question about decorators
[Message 19] AI: Decorators in Python are...
[Message 20] User: Current question
```
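The before/after example above boils down to the keep_first / keep_last retention policy. A toy sketch (the real filter summarizes asynchronously and persists summaries to the `chat_summary` table, which this version omits):

```python
def compress(messages: list[dict], summary: str,
             keep_first: int = 1, keep_last: int = 6) -> list[dict]:
    """Replace the middle of the conversation with a single summary message."""
    if len(messages) <= keep_first + keep_last:
        return messages  # nothing worth compressing
    head = messages[:keep_first]            # protects the system prompt
    tail = messages[-keep_last:] if keep_last else []
    summary_msg = {"role": "assistant", "content": f"[Summary] {summary}"}
    return head + [summary_msg] + tail
```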
---
## Requirements
!!! note "Prerequisites"
- OpenWebUI v0.3.0 or later
- Access to an LLM for summarization
!!! tip "Best Practices"
- Set appropriate token thresholds based on your model's context window
- Preserve more recent messages for technical discussions
- Test compression settings in non-critical conversations first
---
## Troubleshooting
??? question "Compression not triggering?"
Check if the token count exceeds your configured threshold. Enable debug logging for more details.
??? question "Important context being lost?"
Increase the `preserve_recent` setting or lower the compression ratio.
---
## Source Code
[:fontawesome-brands-github: View on GitHub](https://github.com/Fu-Jie/openwebui-extensions/tree/main/plugins/filters/async-context-compression){ .md-button }
See the full history on GitHub: [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions)

View File

@@ -1,137 +1,119 @@
# Async Context Compression(异步上下文压缩)
# 异步上下文压缩过滤器
<span class="category-badge filter">Filter</span>
<span class="version-badge">v1.2.2</span>
**作者:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **版本:** 1.3.0 | **项目:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **许可证:** MIT
通过智能摘要减少长对话的 token 消耗,同时保持对话连贯
> **重要提示**:为了确保所有过滤器的可维护性和易用性,每个过滤器都应附带清晰、完整的文档,以确保其功能、配置和使用方法得到充分说明
本过滤器通过智能摘要和消息压缩技术,在保持对话连贯性的同时,显著降低长对话的 Token 消耗。
## 1.3.0 版本更新
- **国际化 (i18n) 支持**: 完成了所有用户可见消息的本地化,现已原生支持 9 种语言(含中、英、日、韩及欧洲主要语言)。
- **智能状态显示**: 新增 `token_usage_status_threshold` 阀门(默认 80%),可以智能控制何时显示 Token 用量状态,减少不必要的打扰。
- **性能大幅优化**: 对前端语言检测和日志处理流程进行了非阻塞重构,完全不影响首字节响应时间(TTFB),保持毫秒级极速推流。
- **Copilot SDK 兼容**: 自动检测并跳过基于 `copilot_sdk` 模型的上下文压缩,避免冲突。
- **配置项调整**: 为了提供更安静的生产环境体验,`debug_mode` 现已默认设置为 `false`
---
## 概览
## 核心特性
Async Context Compression 过滤器通过以下方式帮助管理长对话的 token 使用:
- ✅ **全方位国际化**: 原生支持 9 种界面语言。
- ✅ **自动压缩**: 基于 Token 阈值自动触发上下文压缩。
- ✅ **异步摘要**: 后台生成摘要,不阻塞当前对话响应。
- ✅ **持久化存储**: 复用 Open WebUI 共享数据库连接,自动支持 PostgreSQL/SQLite 等。
- ✅ **灵活保留策略**: 可配置保留对话头部和尾部消息,确保关键信息连贯。
- ✅ **智能注入**: 将历史摘要智能注入到新上下文中。
- ✅ **结构感知裁剪**: 智能折叠过长消息,保留文档骨架(标题、首尾)。
- ✅ **原生工具输出裁剪**: 支持裁剪冗长的工具调用输出。
- ✅ **实时监控**: 实时监控上下文使用情况,超过 90% 发出警告。
- ✅ **详细日志**: 提供精确的 Token 统计日志,便于调试。
- **智能模型匹配**: 自定义模型自动继承基础模型的阈值配置。
- **多模态支持**: 图片内容会被保留,但其 Token **不参与计算**。请相应调整阈值。
- 智能总结较早的消息
- 保留关键信息
- 降低 API 成本
- 保持对话一致性
特别适用于:
- 长时间会话
- 多轮复杂讨论
- 成本优化
- 上下文长度控制
## 功能特性
- :material-arrow-collapse-vertical: **智能压缩**AI 驱动的上下文摘要
- :material-clock-fast: **异步处理**:后台非阻塞压缩
- :material-memory: **保留上下文**:尽量保留重要信息
- :material-currency-usd-off: **降低成本**:减少 token 使用
- :material-console: **前端调试**:支持浏览器控制台日志
- :material-alert-circle-check: **增强错误报告**:清晰的错误状态通知
- :material-check-all: **Open WebUI v0.7.x 兼容性**:动态数据库会话处理
- :material-account-convert: **兼容性提升**:摘要角色改为 `assistant`
- :material-shield-check: **稳定性增强**:解决状态管理竞态条件
- :material-ruler: **预检上下文检查**:发送前验证上下文是否超限
- :material-format-align-justify: **结构感知裁剪**:保留文档结构的智能裁剪
- :material-content-cut: **原生工具输出裁剪**:自动裁剪冗长的工具输出(注意:非原生工具调用输出不会完整注入上下文)
- :material-chart-bar: **详细 Token 日志**:提供细粒度的 Token 统计
- :material-account-search: **智能模型匹配**:自定义模型自动继承基础模型配置
- :material-image-off: **多模态支持**:图片内容保留但 Token **不参与计算**
详细的工作原理和流程请参考 [工作流程指南](WORKFLOW_GUIDE_CN.md)。
---
## 安装
## 安装与配置
1. 下载插件文件:[`async_context_compression.py`](https://github.com/Fu-Jie/openwebui-extensions/tree/main/plugins/filters/async-context-compression)
2. 上传到 OpenWebUI:**Admin Panel** → **Settings** → **Functions**
3. 配置压缩参数
4. 启用过滤器
### 1. 数据库(自动)
- 自动使用 Open WebUI 的共享数据库连接,**无需额外配置**。
- 首次运行自动创建 `chat_summary` 表。
### 2. 过滤器顺序
- 建议顺序:前置过滤器(<10)→ 本过滤器(10)→ 后置过滤器(>10)
---
## 工作原理
## 配置参数
```mermaid
graph TD
A[Incoming Messages] --> B{Token Count > Threshold?}
B -->|No| C[Pass Through]
B -->|Yes| D[Summarize Older Messages]
D --> E[Preserve Recent Messages]
E --> F[Combine Summary + Recent]
F --> G[Send to LLM]
```
您可以在过滤器的设置中调整以下参数:
### 核心参数
| 参数 | 默认值 | 描述 |
| :----------------------------- | :------- | :------------------------------------------------------------------------------------ |
| `priority` | `10` | 过滤器执行顺序,数值越小越先执行。 |
| `compression_threshold_tokens` | `64000` | **重要**: 当上下文总 Token 超过此值时后台生成摘要,建议设为模型上下文窗口的 50%-70%。 |
| `max_context_tokens` | `128000` | **重要**: 上下文硬上限,超过即移除最早消息(保留受保护消息)。 |
| `keep_first` | `1` | 始终保留对话开始的 N 条消息,保护系统提示或环境变量。 |
| `keep_last` | `6` | 始终保留对话末尾的 N 条消息,确保最近上下文连贯。 |
### 摘要生成配置
| 参数 | 默认值 | 描述 |
| :-------------------- | :------ | :------------------------------------------------------------------------------------------------------------------------------------------ |
| `summary_model` | `None` | 用于生成摘要的模型 ID。**强烈建议**配置快速、经济、上下文窗口大的模型(如 `gemini-2.5-flash` 或 `deepseek-v3`)。留空则尝试复用当前对话模型。 |
| `summary_model_max_context` | `0` | 摘要模型的最大上下文 Token 数。如果为 0则回退到 `model_thresholds` 或全局 `max_context_tokens`。 |
| `max_summary_tokens` | `16384` | 生成摘要时允许的最大 Token 数。 |
| `summary_temperature` | `0.1` | 控制摘要生成的随机性,较低的值结果更稳定。 |
### 高级配置
#### `model_thresholds` (模型特定阈值)
这是一个字典配置,可为特定模型 ID 覆盖全局 `compression_threshold_tokens` 和 `max_context_tokens`,适用于混合不同上下文窗口的模型。
**默认包含 GPT-4、Claude 3.5、Gemini 1.5/2.0、Qwen 2.5/3、DeepSeek V3 等推荐阈值。**
**配置示例:**
```json
{
"gpt-4": {
"compression_threshold_tokens": 8000,
"max_context_tokens": 32000
},
"gemini-2.5-flash": {
"compression_threshold_tokens": 734000,
"max_context_tokens": 1048576
}
}
```
#### 其他高级参数
| 参数 | 默认值 | 描述 |
| :----------------------------- | :------ | :------------------------------------------------------------------------------------ |
| `enable_tool_output_trimming` | `false` | 启用时,若 `function_calling: "native"` 激活,将裁剪冗长的工具输出以仅提取最终答案。 |
| `debug_mode` | `false` | 是否在 Open WebUI 的控制台日志中打印详细的调试信息。生产环境默认且建议设为 `false`。 |
| `show_debug_log` | `false` | 是否在浏览器控制台 (F12) 打印调试日志。便于前端调试。 |
| `show_token_usage_status` | `true` | 是否在对话结束时显示 Token 使用情况的状态通知。 |
| `token_usage_status_threshold` | `80` | 触发显示上下文用量状态通知的最低百分比阈值 (0-100)。 |
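`token_usage_status_threshold` 的判定逻辑很简单:用量比例达到阈值才显示通知,阈值为 0 时总是显示。示意如下(与源码中的 `_should_show_status` 对应):

```python
def should_show_status(usage_ratio: float, show_status: bool = True, threshold_pct: int = 80) -> bool:
    """usage_ratio 为 0.0-1.0 的上下文占用比例(示意)。"""
    if not show_status:
        return False
    if threshold_pct == 0:
        # 阈值为 0 时总是显示
        return True
    return usage_ratio >= threshold_pct / 100.0
```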
---
## 示例
### 压缩前
```
[Message 1] User: Tell me about Python...
[Message 2] AI: Python is a programming language...
[Message 3] User: What about its history?
[Message 4] AI: Python was created by Guido...
[Message 5] User: And its features?
[Message 6] AI: Python has many features...
... (many more messages)
[Message 20] User: Current question
```
### 压缩后
```
[Summary] Previous conversation covered Python basics,
history, features, and common use cases...
[Message 18] User: Recent question about decorators
[Message 19] AI: Decorators in Python are...
[Message 20] User: Current question
```
---
## 故障排除 (Troubleshooting) ❓
- **初始系统提示丢失**:将 `keep_first` 设置为大于 0。
- **压缩效果不明显**:降低 `compression_threshold_tokens`,或降低 `keep_first` / `keep_last` 以增强压缩力度。
- **提交 Issue**:如果遇到任何问题,请在 GitHub 上提交 Issue:[OpenWebUI Extensions Issues](https://github.com/Fu-Jie/openwebui-extensions/issues)
---
## ⭐ 支持
如果这个插件对你有帮助,欢迎到 [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) 点个 Star!这将是我持续改进的动力,感谢支持。
---
## 运行要求
!!! note "前置条件"
    - OpenWebUI v0.3.0 及以上
    - 需要可用的 LLM 用于摘要

!!! tip "最佳实践"
    - 根据模型上下文窗口设置合适的 token 阈值
    - 技术讨论可适当提高 `keep_last`
    - 先在非关键对话中测试压缩效果
---
## 常见问题
??? question "没有触发压缩?"
    检查 token 数是否超过配置的阈值,并开启调试日志了解细节。

??? question "重要上下文丢失?"
    提高 `keep_last` 或降低压缩比例。
---
## 源码
[:fontawesome-brands-github: 在 GitHub 查看](https://github.com/Fu-Jie/openwebui-extensions/tree/main/plugins/filters/async-context-compression){ .md-button }
---
## 更新日志
完整历史请查看 GitHub 项目:[OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions)


@@ -22,7 +22,7 @@ Filters act as middleware in the message pipeline:
Reduces token consumption in long conversations through intelligent summarization while maintaining coherence.
**Version:** 1.2.2
**Version:** 1.3.0
[:octicons-arrow-right-24: Documentation](async-context-compression.md)


@@ -22,7 +22,7 @@ Filter 充当消息管线中的中间件:
通过智能总结减少长对话的 token 消耗,同时保持连贯性。
**版本:** 1.2.2
**版本:** 1.3.0
[:octicons-arrow-right-24: 查看文档](async-context-compression.md)


@@ -2,21 +2,26 @@
Smart Mind Map is a powerful OpenWebUI action plugin that intelligently analyzes long-form text content and automatically generates interactive mind maps, helping users structure and visualize knowledge.
**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 0.9.2 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT
**Author:** [Fu-Jie](https://github.com/Fu-Jie) | **Version:** 1.0.0 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT
## What's New in v0.9.2
## What's New in v1.0.0
**Language Rule Alignment**
### Direct Embed & UI Refinements
- **Input Language First**: Mind map output now strictly matches the input text language.
- **Consistent Behavior**: Matches the infographic language rule for predictable multilingual output.
- **Native Multi-language UI (i18n)**: The plugin interface (buttons, settings, status) now automatically adapts to your browser's language setting for a seamless global experience.
- **Direct Embed Mode**: Introduced a native-like inline display mode for Open WebUI 0.8.0+, enabling a seamless full-width canvas.
- **Adaptive Auto-Sizing**: Mind map now dynamically scales its height and perfectly refits to the window to eliminate scrollbar artifacts.
- **Subdued & Compact UI**: Completely redesigned the header tooling bar to a slender, single-line configuration to maximize visual rendering space.
- **Configurable Experience**: Added `ENABLE_DIRECT_EMBED_MODE` valve to explicitly toggle the new inline rendering behavior.
## Key Features 🔑
- ✅ **Intelligent Text Analysis**: Automatically identifies core themes, key concepts, and hierarchical structures.
- ✅ **Native Multi-language UI**: Automatic interface translation (i18n) based on system language for a native feel.
- ✅ **Interactive Visualization**: Generates beautiful interactive mind maps based on Markmap.js.
- ✅ **Direct Embed Mode**: (Optional) For Open WebUI 0.8.0+, render natively inline to fill the entire UI width.
- ✅ **High-Resolution PNG Export**: Export mind maps as high-quality PNG images (9x scale).
- ✅ **Complete Control Panel**: Zoom controls, expand level selection, and fullscreen mode.
- ✅ **Complete Control Panel**: Zoom controls, expand level selection, and fullscreen mode within a compact toolbar.
- ✅ **Theme Switching**: Manual theme toggle button with automatic theme detection.
- ✅ **Image Output Mode**: Generate static SVG images embedded directly in Markdown for cleaner history.
@@ -37,6 +42,7 @@ Smart Mind Map is a powerful OpenWebUI action plugin that intelligently analyzes
| `CLEAR_PREVIOUS_HTML` | `false` | Whether to clear previous plugin-generated HTML content. |
| `MESSAGE_COUNT` | `1` | Number of recent messages to use for generation (1-5). |
| `OUTPUT_MODE` | `html` | Output mode: `html` (interactive) or `image` (static). |
| `ENABLE_DIRECT_EMBED_MODE` | `false` | Enable Direct Embed Mode (Open WebUI 0.8.0+ native layout) instead of Legacy Mode. |
## ⭐ Support


@@ -2,21 +2,26 @@
思维导图是一个强大的 OpenWebUI 动作插件,能够智能分析长篇文本内容,自动生成交互式思维导图,帮助用户结构化和可视化知识。
**作者:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **版本:** 0.9.2 | **项目:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **许可证:** MIT
**作者:** [Fu-Jie](https://github.com/Fu-Jie) | **版本:** 1.0.0 | **项目:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **许可证:** MIT
## v0.9.2 更新亮点
## v1.0.0 最新更新
**语言规则对齐**
### 嵌入式直出与 UI 细节全线重构
- **输入语言优先**:导图输出严格与输入文本语言一致
- **一致性提升**:与信息图语言规则保持一致,多语言输出更可预期
- **原生多语言界面 (Native i18n)**:插件界面(按钮、设置说明、状态提示)现在会根据您浏览器的语言设置自动适配系统语言
- **原生态嵌入模式 (Direct Embed)**:针对 Open WebUI 0.8.0+ 的前端架构,支持了纯正的内容内联(Inline)直出模式,不再受气泡和 Markdown 隔离,真正撑满屏幕宽度。
- **自动响应边界 (Auto-Sizing)**:突破以前高度僵死的问题。思维导图现在可以根据您的当前屏幕大小弹性伸缩(动态 `clamp()` 高度),彻底消灭丑陋的局部滚动条与白边。
- **极简专业 UI (Compact UI)**:推倒重做了头部的菜单栏,统一使用了一套干净、单行的极简全透明微拟物 Toolbar 设计,为导图画布省下极大的垂直空间。
- **模式配置自由**:为了照顾阅读流连贯的习惯,新增了 `ENABLE_DIRECT_EMBED_MODE` 配置开关。您必须在设置中显式开启才能体验宽广内联全屏模式。
## 核心特性 🔑
- ✅ **智能文本分析**:自动识别文本的核心主题、关键概念和层次结构。
- ✅ **原生多语言界面**:根据系统语言自动切换界面语言 (i18n),提供原生交互体验。
- ✅ **交互式可视化**:基于 Markmap.js 生成美观的交互式思维导图。
- ✅ **直出全景内嵌 (Direct Embed)**:(可选开关)对于 Open WebUI 0.8.0+,直接填补整个前端宽度,去除气泡剥离感。
- ✅ **高分辨率 PNG 导出**:导出高质量的 PNG 图片(9 倍分辨率)。
- ✅ **完整控制面板**:缩放控制、展开层级选择、全屏模式。
- ✅ **完整控制面板**:极简清爽的单行大屏缩放控制、展开层级选择、全局全屏等核心操作。
- ✅ **主题切换**:手动主题切换按钮与自动主题检测。
- ✅ **图片输出模式**:生成静态 SVG 图片直接嵌入 Markdown,聊天记录更简洁。
@@ -37,6 +42,7 @@
| `CLEAR_PREVIOUS_HTML` | `false` | 在生成新的思维导图时,是否清除之前的 HTML 内容。 |
| `MESSAGE_COUNT` | `1` | 用于生成思维导图的最近消息数量(1-5)。 |
| `OUTPUT_MODE` | `html` | 输出模式:`html`(交互式)或 `image`(静态图片)。 |
| `ENABLE_DIRECT_EMBED_MODE` | `false` | 是否开启沉浸式直出嵌入模式(需要 Open WebUI v0.8.0+ 环境)。如果保持 `false` 将会维持旧版的对话流 Markdown 渲染模式。 |
## ⭐ 支持


@@ -1,18 +1,22 @@
# Async Context Compression Filter
**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.2.2 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT
**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.3.0 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT
This filter reduces token consumption in long conversations through intelligent summarization and message compression while keeping conversations coherent.
## What's new in 1.2.2
## What's new in 1.3.0
- **Critical Fix**: Resolved `TypeError: 'str' object is not callable` caused by variable name conflict in logging function.
- **Compatibility**: Enhanced `params` handling to support Pydantic objects, improving compatibility with different OpenWebUI versions.
- **Internationalization (i18n)**: Complete localization of user-facing messages across 9 languages (English, Simplified Chinese, Traditional Chinese, Japanese, Korean, French, German, Spanish, Italian).
- **Smart Status Display**: Added `token_usage_status_threshold` valve (default 80%) to intelligently control when token usage status is shown.
- **Improved Performance**: Frontend language detection and logging are optimized to be completely non-blocking, maintaining lightning-fast TTFB.
- **Copilot SDK Integration**: Automatically detects and skips compression for copilot_sdk based models to prevent conflicts.
- **Configuration**: `debug_mode` is now set to `false` by default for a quieter production experience.
---
## Core Features
- ✅ **Full i18n Support**: Native localization across 9 languages.
- ✅ Automatic compression triggered by token thresholds.
- ✅ Asynchronous summarization that does not block chat responses.
- ✅ Persistent storage via Open WebUI's shared database connection (PostgreSQL, SQLite, etc.).
@@ -55,8 +59,10 @@ This filter reduces token consumption in long conversations through intelligent
| `summary_temperature` | `0.3` | Randomness for summary generation. Lower is more deterministic. |
| `model_thresholds` | `{}` | Per-model overrides for `compression_threshold_tokens` and `max_context_tokens` (useful for mixed models). |
| `enable_tool_output_trimming` | `false` | When enabled and `function_calling: "native"` is active, trims verbose tool outputs to extract only the final answer. |
| `debug_mode` | `true` | Log verbose debug info. Set to `false` in production. |
| `debug_mode` | `false` | Log verbose debug info. Set to `false` in production. |
| `show_debug_log` | `false` | Print debug logs to browser console (F12). Useful for frontend debugging. |
| `show_token_usage_status` | `true` | Show token usage status notification in the chat interface. |
| `token_usage_status_threshold` | `80` | The minimum usage percentage (0-100) required to show a context usage status notification. |
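The 1.3.0 source also introduces a two-stage preflight check: it first runs a cheap character-based estimate (1 token ≈ 4 chars) and only pays for an exact tiktoken count when the estimate is near the hard limit. A minimal sketch of that strategy (the 0.85 margin mirrors the source; `count_exact` stands in for the tiktoken path):

```python
def preflight_total_tokens(messages, max_context_tokens, count_exact):
    """Two-stage preflight: cheap estimate first, exact count only near the limit."""
    estimated = sum(len(str(m.get("content", ""))) for m in messages) // 4
    if estimated < max_context_tokens * 0.85:
        # Well within the limit: the rough estimate is good enough
        return estimated
    # Close to the hard limit: fall back to a precise (tiktoken) count
    return count_exact(messages)
```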
---


@@ -1,20 +1,24 @@
# 异步上下文压缩过滤器
**作者:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **版本:** 1.2.2 | **项目:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **许可证:** MIT
**作者:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **版本:** 1.3.0 | **项目:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **许可证:** MIT
> **重要提示**:为了确保所有过滤器的可维护性和易用性,每个过滤器都应附带清晰、完整的文档,充分说明其功能、配置和使用方法。
本过滤器通过智能摘要和消息压缩技术,在保持对话连贯性的同时,显著降低长对话的 Token 消耗。
## 1.2.2 版本更新
## 1.3.0 版本更新
- **严重错误修复**: 解决了因日志函数变量名冲突导致的 `TypeError: 'str' object is not callable` 错误
- **兼容性增强**: 改进了 `params` 处理逻辑以支持 Pydantic 对象,提高了对不同 OpenWebUI 版本的兼容性
- **国际化 (i18n) 支持**: 完成了所有用户可见消息的本地化,现已原生支持 9 种语言(含中、英、日、韩及欧洲主要语言)
- **智能状态显示**: 新增 `token_usage_status_threshold` 阀门(默认 80%),可以智能控制何时显示 Token 用量状态,减少不必要的打扰
- **性能大幅优化**: 对前端语言检测和日志处理流程进行了非阻塞重构完全不影响首字节响应时间TTFB保持毫秒级极速推流。
- **Copilot SDK 兼容**: 自动检测并跳过基于 `copilot_sdk` 模型的上下文压缩,避免冲突。
- **配置项调整**: 为了提供更安静的生产环境体验,`debug_mode` 现已默认设置为 `false`
---
## 核心特性
- ✅ **全方位国际化**:原生支持 9 种界面语言。
- ✅ **自动压缩**:基于 Token 阈值自动触发上下文压缩。
- ✅ **异步摘要**:后台生成摘要,不阻塞当前对话响应。
- ✅ **持久化存储**:复用 Open WebUI 共享数据库连接,自动支持 PostgreSQL/SQLite 等。
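1.3.0 新增的 copilot_sdk 检测逻辑大致如下(示意,对应源码中的 `_should_skip_compression`;命中时整条请求直接跳过压缩):

```python
def should_skip_compression(body: dict, model=None) -> bool:
    """当基础模型或请求模型 ID 含 copilot_sdk 时跳过压缩(示意)。"""
    if model and "copilot_sdk" in model.get("base_model_id", "").lower():
        return True
    return "copilot_sdk" in body.get("model", "").lower()
```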
@@ -93,9 +97,10 @@
| 参数 | 默认值 | 描述 |
| :----------------------------- | :------- | :-------------------------------------------------------------------------------------------------------------------------------------- |
| `enable_tool_output_trimming` | `false` | 启用时,若 `function_calling: "native"` 激活,将裁剪冗长的工具输出以仅提取最终答案。 |
| `debug_mode` | `true` | 是否在 Open WebUI 的控制台日志中打印详细的调试信息(如 Token 计数、压缩进度、数据库操作等)。生产环境建议设为 `false`。 |
| `debug_mode` | `false` | 是否在 Open WebUI 的控制台日志中打印详细的调试信息。生产环境默认且建议设为 `false`。 |
| `show_debug_log` | `false` | 是否在浏览器控制台 (F12) 打印调试日志。便于前端调试。 |
| `show_token_usage_status` | `true` | 是否在对话结束时显示 Token 使用情况的状态通知。 |
| `token_usage_status_threshold` | `80` | 触发显示上下文用量状态通知的最低百分比阈值 (0-100)。 |
---


@@ -5,17 +5,17 @@ author: Fu-Jie
author_url: https://github.com/Fu-Jie/openwebui-extensions
funding_url: https://github.com/open-webui
description: Reduces token consumption in long conversations while maintaining coherence through intelligent summarization and message compression.
version: 1.2.2
version: 1.3.0
openwebui_id: b1655bc8-6de9-4cad-8cb5-a6f7829a02ce
license: MIT
═══════════════════════════════════════════════════════════════════════════════
📌 What's new in 1.2.1
📌 What's new in 1.3.0
═══════════════════════════════════════════════════════════════════════════════
✅ Smart Configuration: Automatically detects base model settings for custom models and adds `summary_model_max_context` for independent summary limits.
✅ Performance & Refactoring: Optimized threshold parsing with caching and removed redundant code for better efficiency.
✅ Bug Fixes & Modernization: Fixed `datetime` deprecation warnings and corrected type annotations.
✅ Smart Status Display: Added `token_usage_status_threshold` valve (default 80%) to control when token usage status is shown, reducing unnecessary notifications.
✅ Copilot SDK Integration: Automatically detects and skips compression for copilot_sdk based models to prevent conflicts.
✅ Improved User Experience: Status messages now only appear when token usage exceeds the configured threshold, keeping the interface cleaner.
═══════════════════════════════════════════════════════════════════════════════
📌 Overview
@@ -150,7 +150,7 @@ summary_temperature
Description: Controls the randomness of the summary generation. Lower values produce more deterministic output.
debug_mode
Default: true
Default: false
Description: Prints detailed debug information to the log. Recommended to set to `false` in production.
show_debug_log
@@ -268,6 +268,7 @@ import hashlib
import time
import contextlib
import logging
from functools import lru_cache
# Setup logger
logger = logging.getLogger(__name__)
@@ -391,6 +392,130 @@ class ChatSummary(owui_Base):
)
TRANSLATIONS = {
"en-US": {
"status_context_usage": "Context Usage (Estimated): {tokens} / {max_tokens} Tokens ({ratio}%)",
"status_high_usage": " | ⚠️ High Usage",
"status_loaded_summary": "Loaded historical summary (Hidden {count} historical messages)",
"status_context_summary_updated": "Context Summary Updated: {tokens} / {max_tokens} Tokens ({ratio}%)",
"status_generating_summary": "Generating context summary in background...",
"status_summary_error": "Summary Error: {error}",
"summary_prompt_prefix": "【Previous Summary: The following is a summary of the historical conversation, provided for context only. Do not reply to the summary content itself; answer the subsequent latest questions directly.】\n\n",
"summary_prompt_suffix": "\n\n---\nBelow is the recent conversation:",
"tool_trimmed": "... [Tool outputs trimmed]\n{content}",
"content_collapsed": "\n... [Content collapsed] ...\n",
},
"zh-CN": {
"status_context_usage": "上下文用量 (预估): {tokens} / {max_tokens} Tokens ({ratio}%)",
"status_high_usage": " | ⚠️ 用量较高",
"status_loaded_summary": "已加载历史总结 (隐藏了 {count} 条历史消息)",
"status_context_summary_updated": "上下文总结已更新: {tokens} / {max_tokens} Tokens ({ratio}%)",
"status_generating_summary": "正在后台生成上下文总结...",
"status_summary_error": "总结生成错误: {error}",
"summary_prompt_prefix": "【前情提要:以下是历史对话的总结,仅供上下文参考。请不要回复总结内容本身,直接回答之后最新的问题。】\n\n",
"summary_prompt_suffix": "\n\n---\n以下是最近的对话:",
"tool_trimmed": "... [工具输出已裁剪]\n{content}",
"content_collapsed": "\n... [内容已折叠] ...\n",
},
"zh-HK": {
"status_context_usage": "上下文用量 (預估): {tokens} / {max_tokens} Tokens ({ratio}%)",
"status_high_usage": " | ⚠️ 用量較高",
"status_loaded_summary": "已載入歷史總結 (隱藏了 {count} 條歷史訊息)",
"status_context_summary_updated": "上下文總結已更新: {tokens} / {max_tokens} Tokens ({ratio}%)",
"status_generating_summary": "正在後台生成上下文總結...",
"status_summary_error": "總結生成錯誤: {error}",
"summary_prompt_prefix": "【前情提要:以下是歷史對話的總結,僅供上下文參考。請不要回覆總結內容本身,直接回答之後最新的問題。】\n\n",
"summary_prompt_suffix": "\n\n---\n以下是最近的對話:",
"tool_trimmed": "... [工具輸出已裁剪]\n{content}",
"content_collapsed": "\n... [內容已折疊] ...\n",
},
"zh-TW": {
"status_context_usage": "上下文用量 (預估): {tokens} / {max_tokens} Tokens ({ratio}%)",
"status_high_usage": " | ⚠️ 用量較高",
"status_loaded_summary": "已載入歷史總結 (隱藏了 {count} 條歷史訊息)",
"status_context_summary_updated": "上下文總結已更新: {tokens} / {max_tokens} Tokens ({ratio}%)",
"status_generating_summary": "正在後台生成上下文總結...",
"status_summary_error": "總結生成錯誤: {error}",
"summary_prompt_prefix": "【前情提要:以下是歷史對話的總結,僅供上下文参考。請不要回覆總結內容本身,直接回答之後最新的問題。】\n\n",
"summary_prompt_suffix": "\n\n---\n以下是最近的對話:",
"tool_trimmed": "... [工具輸出已裁剪]\n{content}",
"content_collapsed": "\n... [內容已折疊] ...\n",
},
"ja-JP": {
"status_context_usage": "コンテキスト使用量 (推定): {tokens} / {max_tokens} トークン ({ratio}%)",
"status_high_usage": " | ⚠️ 使用量高",
"status_loaded_summary": "履歴の要約を読み込みました ({count} 件の履歴メッセージを非表示)",
"status_context_summary_updated": "コンテキストの要約が更新されました: {tokens} / {max_tokens} トークン ({ratio}%)",
"status_generating_summary": "バックグラウンドでコンテキスト要約を生成しています...",
"status_summary_error": "要約エラー: {error}",
"summary_prompt_prefix": "【これまでのあらすじ:以下は過去の会話の要約であり、コンテキストの参考としてのみ提供されます。要約の内容自体には返答せず、その後の最新の質問に直接答えてください。】\n\n",
"summary_prompt_suffix": "\n\n---\n以下は最近の会話です:",
"tool_trimmed": "... [ツールの出力をトリミングしました]\n{content}",
"content_collapsed": "\n... [コンテンツが折りたたまれました] ...\n",
},
"ko-KR": {
"status_context_usage": "컨텍스트 사용량 (예상): {tokens} / {max_tokens} 토큰 ({ratio}%)",
"status_high_usage": " | ⚠️ 사용량 높음",
"status_loaded_summary": "이전 요약 불러옴 ({count}개의 이전 메시지 숨김)",
"status_context_summary_updated": "컨텍스트 요약 업데이트됨: {tokens} / {max_tokens} 토큰 ({ratio}%)",
"status_generating_summary": "백그라운드에서 컨텍스트 요약 생성 중...",
"status_summary_error": "요약 오류: {error}",
"summary_prompt_prefix": "【이전 요약: 다음은 이전 대화의 요약이며 문맥 참고용으로만 제공됩니다. 요약 내용 자체에 답하지 말고 последу의 최신 질문에 직접 답하세요.】\n\n",
"summary_prompt_suffix": "\n\n---\n다음은 최근 대화입니다:",
"tool_trimmed": "... [도구 출력 잘림]\n{content}",
"content_collapsed": "\n... [내용 접힘] ...\n",
},
"fr-FR": {
"status_context_usage": "Utilisation du contexte (estimée) : {tokens} / {max_tokens} jetons ({ratio}%)",
"status_high_usage": " | ⚠️ Utilisation élevée",
"status_loaded_summary": "Résumé historique chargé ({count} messages d'historique masqués)",
"status_context_summary_updated": "Résumé du contexte mis à jour : {tokens} / {max_tokens} jetons ({ratio}%)",
"status_generating_summary": "Génération du résumé du contexte en arrière-plan...",
"status_summary_error": "Erreur de résumé : {error}",
"summary_prompt_prefix": "【Résumé précédent : Ce qui suit est un résumé de la conversation historique, fourni uniquement pour le contexte. Ne répondez pas au contenu du résumé lui-même ; répondez directement aux dernières questions.】\n\n",
"summary_prompt_suffix": "\n\n---\nVoici la conversation récente :",
"tool_trimmed": "... [Sorties d'outils coupées]\n{content}",
"content_collapsed": "\n... [Contenu réduit] ...\n",
},
"de-DE": {
"status_context_usage": "Kontextnutzung (geschätzt): {tokens} / {max_tokens} Tokens ({ratio}%)",
"status_high_usage": " | ⚠️ Hohe Nutzung",
"status_loaded_summary": "Historische Zusammenfassung geladen ({count} historische Nachrichten ausgeblendet)",
"status_context_summary_updated": "Kontextzusammenfassung aktualisiert: {tokens} / {max_tokens} Tokens ({ratio}%)",
"status_generating_summary": "Kontextzusammenfassung wird im Hintergrund generiert...",
"status_summary_error": "Zusammenfassungsfehler: {error}",
"summary_prompt_prefix": "【Vorherige Zusammenfassung: Das Folgende ist eine Zusammenfassung der historischen Konversation, die nur als Kontext dient. Antworten Sie nicht auf den Inhalt der Zusammenfassung selbst, sondern direkt auf die nachfolgenden neuesten Fragen.】\n\n",
"summary_prompt_suffix": "\n\n---\nHier ist die jüngste Konversation:",
"tool_trimmed": "... [Werkzeugausgaben gekürzt]\n{content}",
"content_collapsed": "\n... [Inhalt ausgeblendet] ...\n",
},
"es-ES": {
"status_context_usage": "Uso del contexto (estimado): {tokens} / {max_tokens} Tokens ({ratio}%)",
"status_high_usage": " | ⚠️ Uso elevado",
"status_loaded_summary": "Resumen histórico cargado ({count} mensajes históricos ocultos)",
"status_context_summary_updated": "Resumen del contexto actualizado: {tokens} / {max_tokens} Tokens ({ratio}%)",
"status_generating_summary": "Generando resumen del contexto en segundo plano...",
"status_summary_error": "Error de resumen: {error}",
"summary_prompt_prefix": "【Resumen anterior: El siguiente es un resumen de la conversación histórica, proporcionado solo como contexto. No responda al contenido del resumen en sí; responda directamente a las preguntas más recientes.】\n\n",
"summary_prompt_suffix": "\n\n---\nA continuación se muestra la conversación reciente:",
"tool_trimmed": "... [Salidas de herramientas recortadas]\n{content}",
"content_collapsed": "\n... [Contenido contraído] ...\n",
},
"it-IT": {
"status_context_usage": "Utilizzo contesto (stimato): {tokens} / {max_tokens} Token ({ratio}%)",
"status_high_usage": " | ⚠️ Utilizzo elevato",
"status_loaded_summary": "Riepilogo storico caricato ({count} messaggi storici nascosti)",
"status_context_summary_updated": "Riepilogo contesto aggiornato: {tokens} / {max_tokens} Token ({ratio}%)",
"status_generating_summary": "Generazione riepilogo contesto in background...",
"status_summary_error": "Errore riepilogo: {error}",
"summary_prompt_prefix": "【Riepilogo precedente: Il seguente è un riepilogo della conversazione storica, fornito solo per contesto. Non rispondere al contenuto del riepilogo stesso; rispondi direttamente alle domande più recenti.】\n\n",
"summary_prompt_suffix": "\n\n---\nDi seguito è riportata la conversazione recente:",
"tool_trimmed": "... [Output degli strumenti tagliati]\n{content}",
"content_collapsed": "\n... [Contenuto compresso] ...\n",
},
}
# Global cache for tiktoken encoding
TIKTOKEN_ENCODING = None
if tiktoken:
@@ -400,6 +525,26 @@ if tiktoken:
logger.error(f"[Init] Failed to load tiktoken encoding: {e}")
@lru_cache(maxsize=1024)
def _get_cached_tokens(text: str) -> int:
    """Calculates tokens with LRU caching for exact string matches."""
    if not text:
        return 0
    if TIKTOKEN_ENCODING:
        try:
            # tiktoken encoding is relatively fast, but caching by exact string match
            # turns O(N) encoding time into an O(1) lookup for historical messages.
            return len(TIKTOKEN_ENCODING.encode(text))
        except Exception as e:
            logger.warning(
                f"[Token Count] tiktoken error: {e}, falling back to character estimation"
            )
    # Fallback strategy: Rough estimation (1 token ≈ 4 chars)
    return len(text) // 4
class Filter:
def __init__(self):
self.valves = self.Valves()
@@ -409,8 +554,105 @@ class Filter:
sessionmaker(bind=self._db_engine) if self._db_engine else None
)
self._model_thresholds_cache: Optional[Dict[str, Any]] = None
# Fallback mapping for variants not in TRANSLATIONS keys
self.fallback_map = {
"es-AR": "es-ES",
"es-MX": "es-ES",
"fr-CA": "fr-FR",
"en-CA": "en-US",
"en-GB": "en-US",
"en-AU": "en-US",
"de-AT": "de-DE",
}
self._init_database()
    def _resolve_language(self, lang: str) -> str:
        """Resolve the best matching language code from the TRANSLATIONS dict."""
        target_lang = lang
        # 1. Direct match
        if target_lang in TRANSLATIONS:
            return target_lang
        # 2. Variant fallback (explicit mapping)
        if target_lang in self.fallback_map:
            target_lang = self.fallback_map[target_lang]
            if target_lang in TRANSLATIONS:
                return target_lang
        # 3. Base language fallback (e.g. fr-BE -> fr-FR)
        if "-" in lang:
            base_lang = lang.split("-")[0]
            for supported_lang in TRANSLATIONS:
                if supported_lang.startswith(base_lang + "-"):
                    return supported_lang
        # 4. Final fallback to en-US
        return "en-US"
    def _get_translation(self, lang: str, key: str, **kwargs) -> str:
        """Get translated string for the given language and key."""
        target_lang = self._resolve_language(lang)
        lang_dict = TRANSLATIONS.get(target_lang, TRANSLATIONS["en-US"])
        text = lang_dict.get(key, TRANSLATIONS["en-US"].get(key, key))
        if kwargs:
            try:
                text = text.format(**kwargs)
            except Exception as e:
                logger.warning(f"Translation formatting failed for {key}: {e}")
        return text
    async def _get_user_context(
        self,
        __user__: Optional[Dict[str, Any]],
        __event_call__: Optional[Callable[[Any], Awaitable[None]]] = None,
    ) -> Dict[str, str]:
        """Extract basic user context with safe fallbacks."""
        if isinstance(__user__, (list, tuple)):
            user_data = __user__[0] if __user__ else {}
        elif isinstance(__user__, dict):
            user_data = __user__
        else:
            user_data = {}
        user_id = user_data.get("id", "unknown_user")
        user_name = user_data.get("name", "User")
        user_language = user_data.get("language", "en-US")
        if __event_call__:
            try:
                js_code = """
                return (
                    document.documentElement.lang ||
                    localStorage.getItem('locale') ||
                    localStorage.getItem('language') ||
                    navigator.language ||
                    'en-US'
                );
                """
                frontend_lang = await asyncio.wait_for(
                    __event_call__({"type": "execute", "data": {"code": js_code}}),
                    timeout=1.0,
                )
                if frontend_lang and isinstance(frontend_lang, str):
                    user_language = frontend_lang
            except asyncio.TimeoutError:
                logger.warning(
                    "Failed to retrieve frontend language: Timeout (using fallback)"
                )
            except Exception as e:
                logger.warning(
                    f"Failed to retrieve frontend language: {type(e).__name__}: {e}"
                )
        return {
            "user_id": user_id,
            "user_name": user_name,
            "user_language": user_language,
        }
def _parse_model_thresholds(self) -> Dict[str, Any]:
"""Parse model_thresholds string into a dictionary.
@@ -574,7 +816,7 @@ class Filter:
description="The temperature for summary generation.",
)
debug_mode: bool = Field(
default=True, description="Enable detailed logging for debugging."
default=False, description="Enable detailed logging for debugging."
)
show_debug_log: bool = Field(
default=False, description="Show debug logs in the frontend console"
@@ -582,6 +824,12 @@ class Filter:
show_token_usage_status: bool = Field(
default=True, description="Show token usage status notification"
)
token_usage_status_threshold: int = Field(
default=80,
ge=0,
le=100,
description="Only show token usage status when usage exceeds this percentage (0-100). Set to 0 to always show.",
)
enable_tool_output_trimming: bool = Field(
default=False,
description="Enable trimming of large tool outputs (only works with native function calling).",
@@ -654,20 +902,7 @@ class Filter:
def _count_tokens(self, text: str) -> int:
"""Counts the number of tokens in the text."""
if not text:
return 0
if TIKTOKEN_ENCODING:
try:
return len(TIKTOKEN_ENCODING.encode(text))
except Exception as e:
if self.valves.debug_mode:
logger.warning(
f"[Token Count] tiktoken error: {e}, falling back to character estimation"
)
# Fallback strategy: Rough estimation (1 token ≈ 4 chars)
return len(text) // 4
return _get_cached_tokens(text)
def _calculate_messages_tokens(self, messages: List[Dict]) -> int:
"""Calculates the total tokens for a list of messages."""
@@ -693,6 +928,20 @@ class Filter:
return total_tokens
    def _estimate_messages_tokens(self, messages: List[Dict]) -> int:
        """Fast estimation of tokens based on character count (1/4 ratio)."""
        total_chars = 0
        for msg in messages:
            content = msg.get("content", "")
            if isinstance(content, list):
                for part in content:
                    if isinstance(part, dict) and part.get("type") == "text":
                        total_chars += len(part.get("text", ""))
            else:
                total_chars += len(str(content))
        return total_chars // 4
def _get_model_thresholds(self, model_id: str) -> Dict[str, int]:
"""Gets threshold configuration for a specific model.
@@ -830,12 +1079,14 @@ class Filter:
}})();
"""
await __event_call__(
asyncio.create_task(
__event_call__(
{
"type": "execute",
"data": {"code": js_code},
}
)
)
except Exception as e:
logger.error(f"Error emitting debug log: {e}")
@@ -876,17 +1127,55 @@ class Filter:
js_code = f"""
console.log("%c[Compression] {safe_message}", "{css}");
"""
# Add timeout to prevent blocking if frontend connection is broken
await asyncio.wait_for(
event_call({"type": "execute", "data": {"code": js_code}}),
timeout=2.0,
)
except asyncio.TimeoutError:
logger.warning(
f"Failed to emit log to frontend: Timeout (connection may be broken)"
asyncio.create_task(
event_call({"type": "execute", "data": {"code": js_code}})
)
except Exception as e:
logger.error(f"Failed to emit log to frontend: {type(e).__name__}: {e}")
logger.error(
f"Failed to process log to frontend: {type(e).__name__}: {e}"
)
    def _should_show_status(self, usage_ratio: float) -> bool:
        """
        Check if token usage status should be shown based on threshold.

        Args:
            usage_ratio: Current usage ratio (0.0 to 1.0)

        Returns:
            True if status should be shown, False otherwise
        """
        if not self.valves.show_token_usage_status:
            return False
        # If threshold is 0, always show
        if self.valves.token_usage_status_threshold == 0:
            return True
        # Check if usage exceeds threshold
        threshold_ratio = self.valves.token_usage_status_threshold / 100.0
        return usage_ratio >= threshold_ratio
    def _should_skip_compression(
        self, body: dict, __model__: Optional[dict] = None
    ) -> bool:
        """
        Check if compression should be skipped.

        Returns True if:
        1. The base model includes 'copilot_sdk'
        """
        # Check if the base model includes copilot_sdk
        if __model__:
            base_model_id = __model__.get("base_model_id", "")
            if "copilot_sdk" in base_model_id.lower():
                return True
        # Also check the model in body
        model_id = body.get("model", "")
        if "copilot_sdk" in model_id.lower():
            return True
        return False
async def inlet(
self,
@@ -903,6 +1192,19 @@ class Filter:
Compression Strategy: Only responsible for injecting existing summaries, no Token calculation.
"""
# Check if compression should be skipped (e.g., for copilot_sdk)
if self._should_skip_compression(body, __model__):
if self.valves.debug_mode:
logger.info(
"[Inlet] Skipping compression: copilot_sdk detected in base model"
)
if self.valves.show_debug_log and __event_call__:
await self._log(
"[Inlet] ⏭️ Skipping compression: copilot_sdk detected",
event_call=__event_call__,
)
return body
messages = body.get("messages", [])
# --- Native Tool Output Trimming (Opt-in, only for native function calling) ---
@@ -966,8 +1268,14 @@ class Filter:
final_answer = content[last_match_end:].strip()
if final_answer:
msg["content"] = (
f"... [Tool outputs trimmed]\n{final_answer}"
msg["content"] = self._get_translation(
(
__user__.get("language", "en-US")
if __user__
else "en-US"
),
"tool_trimmed",
content=final_answer,
)
trimmed_count += 1
else:
@@ -980,8 +1288,14 @@ class Filter:
if len(parts) > 1:
final_answer = parts[-1].strip()
if final_answer:
msg["content"] = (
f"... [Tool outputs trimmed]\n{final_answer}"
msg["content"] = self._get_translation(
(
__user__.get("language", "en-US")
if __user__
else "en-US"
),
"tool_trimmed",
content=final_answer,
)
trimmed_count += 1
@@ -1173,6 +1487,10 @@ class Filter:
# Target is to compress up to the (total - keep_last) message
target_compressed_count = max(0, len(messages) - self.valves.keep_last)
# Get user context for i18n
user_ctx = await self._get_user_context(__user__, __event_call__)
lang = user_ctx["user_language"]
await self._log(
f"[Inlet] Recorded target compression progress: {target_compressed_count}",
event_call=__event_call__,
@@ -1207,10 +1525,9 @@ class Filter:
# 2. Summary message (Inserted as Assistant message)
summary_content = (
self._get_translation(lang, "summary_prompt_prefix")
+ f"{summary_record.summary}"
+ self._get_translation(lang, "summary_prompt_suffix")
)
summary_msg = {"role": "assistant", "content": summary_content}
@@ -1249,14 +1566,25 @@ class Filter:
"max_context_tokens", self.valves.max_context_tokens
)
# Calculate total tokens
# --- Fast Estimation Check ---
estimated_tokens = self._estimate_messages_tokens(calc_messages)
# Since this is a hard limit check, only skip precise calculation if we are far below it (margin of 15%)
if estimated_tokens < max_context_tokens * 0.85:
total_tokens = estimated_tokens
await self._log(
f"[Inlet] 🔎 Fast Preflight Check (Est): {total_tokens}t / {max_context_tokens}t (Well within limit)",
event_call=__event_call__,
)
else:
# Calculate exact total tokens via tiktoken
total_tokens = await asyncio.to_thread(
self._calculate_messages_tokens, calc_messages
)
# Preflight Check Log
await self._log(
f"[Inlet] 🔎 Precise Preflight Check: {total_tokens}t / {max_context_tokens}t ({(total_tokens/max_context_tokens*100):.1f}%)",
event_call=__event_call__,
)
@@ -1325,7 +1653,9 @@ class Filter:
first_line_found = True
# Add placeholder if there's more content coming
if idx < last_line_idx:
kept_lines.append(
self._get_translation(lang, "content_collapsed")
)
continue
# Keep last non-empty line
@@ -1347,6 +1677,11 @@ class Filter:
target_msg["metadata"]["is_trimmed"] = True
# Calculate token reduction
# Use current token strategy
if total_tokens == estimated_tokens:
old_tokens = len(content) // 4
new_tokens = len(target_msg["content"]) // 4
else:
old_tokens = self._count_tokens(content)
new_tokens = self._count_tokens(target_msg["content"])
diff = old_tokens - new_tokens
@@ -1362,7 +1697,12 @@ class Filter:
# Strategy 2: Fallback - Drop Oldest Message Entirely (FIFO)
# (User requested to remove progressive trimming for other cases)
dropped = tail_messages.pop(0)
dropped_tokens = self._count_tokens(str(dropped.get("content", "")))
if total_tokens == estimated_tokens:
dropped_tokens = len(str(dropped.get("content", ""))) // 4
else:
dropped_tokens = self._count_tokens(
str(dropped.get("content", ""))
)
total_tokens -= dropped_tokens
if self.valves.show_debug_log and __event_call__:
@@ -1382,6 +1722,16 @@ class Filter:
final_messages = candidate_messages
# Calculate detailed token stats for logging
if total_tokens == estimated_tokens:
system_tokens = (
len(system_prompt_msg.get("content", "")) // 4
if system_prompt_msg
else 0
)
head_tokens = self._estimate_messages_tokens(head_messages)
summary_tokens = len(summary_content) // 4
tail_tokens = self._estimate_messages_tokens(tail_messages)
else:
system_tokens = (
self._count_tokens(system_prompt_msg.get("content", ""))
if system_prompt_msg
@@ -1408,11 +1758,17 @@ class Filter:
# Prepare status message (Context Usage format)
if max_context_tokens > 0:
usage_ratio = total_section_tokens / max_context_tokens
# Only show status if threshold is met
if self._should_show_status(usage_ratio):
status_msg = self._get_translation(
lang,
"status_context_usage",
tokens=total_section_tokens,
max_tokens=max_context_tokens,
ratio=f"{usage_ratio*100:.1f}",
)
if usage_ratio > 0.9:
status_msg += self._get_translation(lang, "status_high_usage")
if __event_emitter__:
await __event_emitter__(
@@ -1424,6 +1780,21 @@ class Filter:
},
}
)
else:
# For the case where max_context_tokens is 0, show summary info without threshold check
if self.valves.show_token_usage_status and __event_emitter__:
status_msg = self._get_translation(
lang, "status_loaded_summary", count=compressed_count
)
await __event_emitter__(
{
"type": "status",
"data": {
"description": status_msg,
"done": True,
},
}
)
# Emit debug log to frontend (Keep the structured log as well)
await self._emit_debug_log(
@@ -1454,6 +1825,17 @@ class Filter:
"max_context_tokens", self.valves.max_context_tokens
)
# --- Fast Estimation Check ---
estimated_tokens = self._estimate_messages_tokens(calc_messages)
# Only skip precise calculation if we are clearly below the limit
if estimated_tokens < max_context_tokens * 0.85:
total_tokens = estimated_tokens
await self._log(
f"[Inlet] 🔎 Fast limit check (Est): {total_tokens}t / {max_context_tokens}t",
event_call=__event_call__,
)
else:
total_tokens = await asyncio.to_thread(
self._calculate_messages_tokens, calc_messages
)
@@ -1476,7 +1858,12 @@ class Filter:
> start_trim_index + 1 # Keep at least 1 message after keep_first
):
dropped = final_messages.pop(start_trim_index)
dropped_tokens = self._count_tokens(str(dropped.get("content", "")))
if total_tokens == estimated_tokens:
dropped_tokens = len(str(dropped.get("content", ""))) // 4
else:
dropped_tokens = self._count_tokens(
str(dropped.get("content", ""))
)
total_tokens -= dropped_tokens
await self._log(
@@ -1485,14 +1872,21 @@ class Filter:
)
# Send status notification (Context Usage format)
if max_context_tokens > 0:
usage_ratio = total_tokens / max_context_tokens
# Only show status if threshold is met
if self._should_show_status(usage_ratio):
status_msg = self._get_translation(
lang,
"status_context_usage",
tokens=total_tokens,
max_tokens=max_context_tokens,
ratio=f"{usage_ratio*100:.1f}",
)
if usage_ratio > 0.9:
status_msg += self._get_translation(lang, "status_high_usage")
if __event_emitter__:
await __event_emitter__(
{
"type": "status",
@@ -1517,6 +1911,7 @@ class Filter:
body: dict,
__user__: Optional[dict] = None,
__metadata__: dict = None,
__model__: dict = None,
__event_emitter__: Callable[[Any], Awaitable[None]] = None,
__event_call__: Callable[[Any], Awaitable[None]] = None,
) -> dict:
@@ -1524,6 +1919,23 @@ class Filter:
Executed after the LLM response is complete.
Calculates Token count in the background and triggers summary generation (does not block current response, does not affect content output).
"""
# Check if compression should be skipped (e.g., for copilot_sdk)
if self._should_skip_compression(body, __model__):
if self.valves.debug_mode:
logger.info(
"[Outlet] Skipping compression: copilot_sdk detected in base model"
)
if self.valves.show_debug_log and __event_call__:
await self._log(
"[Outlet] ⏭️ Skipping compression: copilot_sdk detected",
event_call=__event_call__,
)
return body
# Get user context for i18n
user_ctx = await self._get_user_context(__user__, __event_call__)
lang = user_ctx["user_language"]
chat_ctx = self._get_chat_context(body, __metadata__)
chat_id = chat_ctx["chat_id"]
if not chat_id:
@@ -1547,6 +1959,7 @@ class Filter:
body,
__user__,
target_compressed_count,
lang,
__event_emitter__,
__event_call__,
)
@@ -1561,6 +1974,7 @@ class Filter:
body: dict,
user_data: Optional[dict],
target_compressed_count: Optional[int],
lang: str = "en-US",
__event_emitter__: Callable[[Any], Awaitable[None]] = None,
__event_call__: Callable[[Any], Awaitable[None]] = None,
):
@@ -1595,27 +2009,48 @@ class Filter:
event_call=__event_call__,
)
# Calculate Token count in a background thread
# --- Fast Estimation Check ---
estimated_tokens = self._estimate_messages_tokens(messages)
# For triggering summary generation, we need to be more precise if we are in the grey zone
# Margin is 15% (skip tiktoken if estimated is < 85% of threshold)
# Note: We still use tiktoken if we exceed threshold, because we want an accurate usage status report
if estimated_tokens < compression_threshold_tokens * 0.85:
current_tokens = estimated_tokens
await self._log(
f"[🔍 Background Calculation] Fast estimate ({current_tokens}) is well below threshold ({compression_threshold_tokens}). Skipping tiktoken.",
event_call=__event_call__,
)
else:
# Calculate Token count precisely in a background thread
current_tokens = await asyncio.to_thread(
self._calculate_messages_tokens, messages
)
await self._log(
f"[🔍 Background Calculation] Precise token count: {current_tokens}",
event_call=__event_call__,
)
# Send status notification (Context Usage format)
if __event_emitter__:
max_context_tokens = thresholds.get(
"max_context_tokens", self.valves.max_context_tokens
)
if max_context_tokens > 0:
usage_ratio = current_tokens / max_context_tokens
# Only show status if threshold is met
if self._should_show_status(usage_ratio):
status_msg = self._get_translation(
lang,
"status_context_usage",
tokens=current_tokens,
max_tokens=max_context_tokens,
ratio=f"{usage_ratio*100:.1f}",
)
if usage_ratio > 0.9:
status_msg += self._get_translation(
lang, "status_high_usage"
)
await __event_emitter__(
{
@@ -1642,6 +2077,7 @@ class Filter:
body,
user_data,
target_compressed_count,
lang,
__event_emitter__,
__event_call__,
)
@@ -1672,6 +2108,7 @@ class Filter:
body: dict,
user_data: Optional[dict],
target_compressed_count: Optional[int],
lang: str = "en-US",
__event_emitter__: Callable[[Any], Awaitable[None]] = None,
__event_call__: Callable[[Any], Awaitable[None]] = None,
):
@@ -1811,7 +2248,9 @@ class Filter:
{
"type": "status",
"data": {
"description": self._get_translation(
lang, "status_generating_summary"
),
"done": False,
},
}
@@ -1849,7 +2288,11 @@ class Filter:
{
"type": "status",
"data": {
"description": self._get_translation(
lang,
"status_loaded_summary",
count=len(middle_messages),
),
"done": True,
},
}
@@ -1910,10 +2353,9 @@ class Filter:
# Summary
summary_content = (
self._get_translation(lang, "summary_prompt_prefix")
+ f"{new_summary}"
+ self._get_translation(lang, "summary_prompt_suffix")
)
summary_msg = {"role": "assistant", "content": summary_content}
@@ -1943,13 +2385,22 @@ class Filter:
max_context_tokens = thresholds.get(
"max_context_tokens", self.valves.max_context_tokens
)
# 6. Emit Status (only if threshold is met)
if max_context_tokens > 0:
usage_ratio = token_count / max_context_tokens
# Only show status if threshold is met
if self._should_show_status(usage_ratio):
status_msg = self._get_translation(
lang,
"status_context_summary_updated",
tokens=token_count,
max_tokens=max_context_tokens,
ratio=f"{usage_ratio*100:.1f}",
)
if usage_ratio > 0.9:
status_msg += self._get_translation(
lang, "status_high_usage"
)
await __event_emitter__(
{
@@ -1979,7 +2430,9 @@ class Filter:
{
"type": "status",
"data": {
"description": self._get_translation(
lang, "status_summary_error", error=str(e)[:100]
),
"done": True,
},
}