# Compare commits

`v2026.02.2...copilot/op` — 1 commit (`00e2593801`)
```diff
@@ -12,11 +12,11 @@ Reference: `.github/copilot-instructions.md`
 ### Bilingual Requirement

-Every plugin **MUST** have a single internationalized code file and bilingual documentation:
+Every plugin **MUST** have bilingual versions for both code and documentation:

-- **Code (i18n)**:
-  - `plugins/{type}/{name}/{name}.py`
-  - The single `.py` file must implement internal i18n (e.g., using `navigator.language` or backend headers) to support multiple languages natively, rather than splitting into separate files.
+- **Code**:
+  - English: `plugins/{type}/{name}/{name}.py`
+  - Chinese: `plugins/{type}/{name}/{name_cn}.py` (or `中文名.py`)
 - **README**:
   - English: `plugins/{type}/{name}/README.md`
   - Chinese: `plugins/{type}/{name}/README_CN.md`
```
```diff
@@ -81,13 +81,14 @@ Reference: `.github/workflows/release.yml`
 - **Release Information Compliance**: When a release is requested, the agent must generate a standard release summary (English commit title + bilingual bullet points) as defined in Section 3 & 5.
 - **Default Action (Prepare Only)**: When performing a version bump or update, the agent should update all files locally but **STOP** before committing. Present the changes and the **proposed Release/Commit Message** to the user and wait for explicit confirmation to commit/push.
 - **Consistency**: When bumping, update version in **ALL** locations:
-  1. Code (`.py`)
-  2. English README (`README.md`)
-  3. Chinese README (`README_CN.md`)
-  4. Docs Index (`docs/.../index.md`)
-  5. Docs Index CN (`docs/.../index.zh.md`)
-  6. Docs Detail (`docs/.../{name}.md`)
-  7. Docs Detail CN (`docs/.../{name}.zh.md`)
+  1. English Code (`.py`)
+  2. Chinese Code (`.py`)
+  3. English README (`README.md`)
+  4. Chinese README (`README_CN.md`)
+  5. Docs Index (`docs/.../index.md`)
+  6. Docs Index CN (`docs/.../index.zh.md`)
+  7. Docs Detail (`docs/.../{name}.md`)
+  8. Docs Detail CN (`docs/.../{name}.zh.md`)

 ### Automated Release Process
```
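One way to enforce the "update version in **ALL** locations" rule above is a small consistency check before committing. This is a hedged sketch, not tooling from the repository: file contents are passed as an in-memory mapping, and `find_versions`/`is_consistent` are hypothetical names.

```python
import re

# Matches the first semver-looking string (e.g. "0.1.1") in a file's text.
VERSION_RE = re.compile(r"\b(\d+\.\d+\.\d+)\b")

def find_versions(texts: dict) -> dict:
    """Map each file name to the first version string it contains (or None)."""
    found = {}
    for name, text in texts.items():
        m = VERSION_RE.search(text)
        found[name] = m.group(1) if m else None
    return found

def is_consistent(texts: dict) -> bool:
    """True only if every file declares exactly the same version."""
    versions = set(find_versions(texts).values())
    return len(versions) == 1 and None not in versions
```

Running it over the code file, both READMEs, and the four docs pages would flag any location the bump missed.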
```diff
@@ -119,7 +120,7 @@ When the user confirms a release, the agent **MUST** follow these content standa
 - Before committing, present a "Release Draft" containing:
   - **Title**: e.g., `Release v0.1.1: [Plugin Name] - [Brief Summary]`
   - **Changelog**: English-only list of commits since the last release, including hashes (e.g., `896de02 docs(config): reorder antigravity model alias example`).
-  - **Verification Status**: Confirm all 7+ files have been updated and synced.
+  - **Verification Status**: Confirm all 8+ files have been updated and synced.
 3. **Internal Documentation**: Ensure "What's New" sections in READMEs and `docs/` match exactly the changes being released.

 ### Pull Request Check
```
```diff
@@ -133,7 +134,7 @@ When the user confirms a release, the agent **MUST** follow these content standa
 Before committing:

-- [ ] Code is internal i18n supported (`.py`) and fully functional?
+- [ ] Code is bilingual and functional?
 - [ ] Docstrings have updated version?
 - [ ] READMEs are updated and bilingual?
 - [ ] **Key Capabilities** in READMEs still cover all legacy core features + new features?
```
`.github/copilot-instructions.md` (vendored) — 38 lines changed
````diff
@@ -8,26 +8,27 @@ This document defines the standard conventions and best practices for OpenWebUI
 ## 🏗️ 项目结构与命名 (Project Structure & Naming)

-### 1. 语言与代码规范 (Language & Code Requirements)
+### 1. 双语版本要求 (Bilingual Version Requirements)

 #### 插件代码 (Plugin Code)

-每个插件**必须**采用单文件国际化 (i18n) 设计。严禁为不同语言创建独立的源代码文件(如 `_cn.py`)。
+每个插件必须提供两个版本:

-1. **单代码文件**: `plugins/{type}/{name}/{name}.py`
-2. **内置 i18n**: 必须在代码中根据前端传来的用户语言(如 `__user__` 中的 `language` 或通过 `get_user_language` 脚本读取)动态切换界面显示、提示词和状态日志。
+1. **英文版本**: `plugin_name.py` - 英文界面、提示词和注释
+2. **中文版本**: `plugin_name_cn.py` - 中文界面、提示词和注释

-示例目录结构:
+示例:

 ```
 plugins/actions/export_to_docx/
-├── export_to_word.py     # 单个代码文件,内置多语言支持
-├── README.md             # 英文文档 (English documentation)
-└── README_CN.md          # 中文文档
+├── export_to_word.py     # English version
+├── export_to_word_cn.py  # Chinese version
+├── README.md             # English documentation
+└── README_CN.md          # Chinese documentation
 ```

 #### 文档 (Documentation)

-尽管代码是合一的,但为了市场展示和 SEO,每个插件目录仍**必须**包含双语 README 文件:
+每个插件目录必须包含双语 README 文件:

 - `README.md` - English documentation
 - `README_CN.md` - 中文文档
````
```diff
@@ -57,10 +58,12 @@ plugins/actions/export_to_docx/
 plugins/
 ├── actions/                      # Action 插件 (用户触发的功能)
 │   ├── my_action/
-│   │   ├── my_action.py          # 单文件,内置 i18n
+│   │   ├── my_action.py          # English version
+│   │   ├── 我的动作.py            # Chinese version
 │   │   ├── README.md             # English documentation
 │   │   └── README_CN.md          # Chinese documentation
-│   ├── ACTION_PLUGIN_TEMPLATE.py # 通用 i18n 模板
+│   ├── ACTION_PLUGIN_TEMPLATE.py # English template
+│   ├── ACTION_PLUGIN_TEMPLATE_CN.py # Chinese template
 │   └── README.md
 ├── filters/                      # Filter 插件 (输入处理)
 │   └── ...
```
```diff
@@ -471,7 +474,7 @@ async def get_user_language(self):
 #### 适用场景与引导 (Usage Guidelines)

-- **语言适配**: 动态获取界面语言 (`ru-RU`, `zh-CN`) 自动切换输出语言和 UI 翻译。这对于单文件 i18n 插件至关重要。
+- **语言适配**: 动态获取界面语言 (`ru-RU`, `zh-CN`) 自动切换输出语言。
 - **时区处理**: 获取 `Intl.DateTimeFormat().resolvedOptions().timeZone` 处理时间。
 - **客户端存储**: 读取 `localStorage` 中的用户偏好设置。
 - **硬件能力**: 获取 `navigator.clipboard` 或 `navigator.geolocation` (需授权)。
```
```diff
@@ -929,7 +932,8 @@ Filter 实例是**单例 (Singleton)**。
 ### 1. ✅ 开发检查清单 (Development Checklist)

-- [ ] 代码实现了内置 i18n 逻辑 (`.py`)
+- [ ] 创建英文版插件代码 (`plugin_name.py`)
+- [ ] 创建中文版插件代码 (`plugin_name_cn.py`)
 - [ ] 编写英文 README (`README.md`)
 - [ ] 编写中文 README (`README_CN.md`)
 - [ ] 包含标准化文档字符串
@@ -937,7 +941,7 @@ Filter 实例是**单例 (Singleton)**。
 - [ ] 使用 Lucide 图标
 - [ ] 实现 Valves 配置
 - [ ] 使用 logging 而非 print
-- [ ] 测试 i18n 界面适配
+- [ ] 测试双语界面
 - [ ] **一致性检查**: 确保文档、代码、README 同步
 - [ ] **README 结构**:
   - **Key Capabilities** (英文) / **核心功能** (中文): 必须包含所有核心功能
```
```diff
@@ -984,14 +988,13 @@ Filter 实例是**单例 (Singleton)**。
 2. **变更列表 (Bilingual Changes)**:
    - 英文: Clear descriptions of technical/functional changes.
    - 中文: 清晰描述用户可见的功能改进或修复。
-3. **核查状态 (Verification)**: 确认版本号已在相关 7+ 处位置同步更新(1 个代码文件 + 2 个 README + 4 个 Docs 文件)。
+3. **核查状态 (Verification)**: 确认版本号已在相关 8+ 处位置同步更新。

 ### 4. 🤖 Git 提交与推送规范 (Git Operations & Push Rules)

 - **核心原则**: 默认仅进行**本地文件准备**(更新代码、READMEs、Docs、版本号),**严禁**在未获用户明确许可的情况下自动执行 `git commit` 或 `git push`。
 - **允许 (需确认)**: 只有在用户明确表示“发布”、“Commit it”、“Release”或“提交”后,才允许直接推送到 `main` 分支或创建 PR。
 - **功能分支**: 推荐在进行大规模重构或实验性功能开发时,创建功能分支 (`feature/xxx`) 进行隔离。
-- **PR 提交**: 必须使用 GitHub CLI (`gh`) 创建 Pull Request。示例:`gh pr create --title "feat: ..." --body "..."`。

 ### 5. 🤝 贡献者认可规范 (Contributor Recognition)
```
```diff
@@ -1001,7 +1004,8 @@ Filter 实例是**单例 (Singleton)**。
 ## 📚 参考资源 (Reference Resources)

-- [Action 插件模板](plugins/actions/ACTION_PLUGIN_TEMPLATE.py)
+- [Action 插件模板 (英文)](plugins/actions/ACTION_PLUGIN_TEMPLATE.py)
+- [Action 插件模板 (中文)](plugins/actions/ACTION_PLUGIN_TEMPLATE_CN.py)
 - [插件开发指南](plugins/actions/PLUGIN_DEVELOPMENT_GUIDE.md)
 - [Lucide Icons](https://lucide.dev/icons/)
 - [OpenWebUI 文档](https://docs.openwebui.com/)
```
`.gitignore` (vendored) — 1 line changed
```diff
@@ -139,4 +139,3 @@ logs/
 # OpenWebUI specific
 # Add any specific ignores for OpenWebUI plugins if needed
-.git-worktrees/
```
```diff
@@ -23,7 +23,7 @@ Actions are interactive plugins that:
 Intelligently analyzes text content and generates interactive mind maps with beautiful visualizations.

-**Version:** 1.0.0
+**Version:** 0.9.2

 [:octicons-arrow-right-24: Documentation](smart-mind-map.md)
```
```diff
@@ -23,7 +23,7 @@ Actions 是交互式插件,能够:
 智能分析文本并生成交互式、精美的思维导图。

-**版本:** 1.0.0
+**版本:** 0.8.0

 [:octicons-arrow-right-24: 查看文档](smart-mind-map.md)
```
```diff
@@ -1,7 +1,7 @@
 # Smart Mind Map

 <span class="category-badge action">Action</span>
-<span class="version-badge">v1.0.0</span>
+<span class="version-badge">v0.9.2</span>

 Intelligently analyzes text content and generates interactive mind maps for better visualization and understanding.
```
```diff
@@ -17,8 +17,7 @@ The Smart Mind Map plugin transforms text content into beautiful, interactive mi
 - :material-gesture-swipe: **Rich Controls**: Zoom, reset view, expand level selector (All/2/3) and fullscreen
 - :material-palette: **Theme Aware**: Auto-detects OpenWebUI light/dark theme with manual toggle
 - :material-download: **One-Click Export**: Download high-res PNG, copy SVG, or copy Markdown source
-- :material-translate: **i18n Embedded**: One code file smartly detects frontend languages and translates the output.
-- :material-arrow-all: **Auto-Sizing & Direct Embed**: Seamlessly scales to display massive canvas inline (requires setting toggle).
+- :material-translate: **Multi-language**: Matches output language to the input text

 ---
```
```diff
@@ -51,7 +50,6 @@ The Smart Mind Map plugin transforms text content into beautiful, interactive mi
 | `MIN_TEXT_LENGTH` | integer | `100` | Minimum characters required before analysis runs |
 | `CLEAR_PREVIOUS_HTML` | boolean | `false` | Clear previous plugin HTML instead of merging |
 | `MESSAGE_COUNT` | integer | `1` | Number of recent messages to include (1–5) |
-| `ENABLE_DIRECT_EMBED_MODE` | boolean | `false` | Enable inline full-width UI for OpenWebUI 0.8.0+ |

 ---
```
```diff
@@ -1,7 +1,7 @@
 # Smart Mind Map(智能思维导图)

 <span class="category-badge action">Action</span>
-<span class="version-badge">v1.0.0</span>
+<span class="version-badge">v0.9.2</span>

 智能分析文本内容,生成交互式思维导图,帮助你更直观地理解信息结构。
```
```diff
@@ -17,8 +17,7 @@ Smart Mind Map 会将文本转换成漂亮的交互式思维导图。插件会
 - :material-gesture-swipe: **丰富控制**:缩放/重置、展开层级(全部/2/3 级)与全屏
 - :material-palette: **主题感知**:自动检测 OpenWebUI 亮/暗色主题并支持手动切换
 - :material-download: **一键导出**:下载高分辨率 PNG、复制 SVG 或 Markdown
-- :material-translate: **内置 i18n 语言识别**:单个文件自动检测控制台前端语言,无需繁杂的各种语言包版本。
-- :material-arrow-all: **直出全屏版体验 (需配置开启)**:新版直出渲染抛开沙盒限制,纵情铺满屏幕,享受原生的图表体验。
+- :material-translate: **多语言**:输出语言与输入文本一致

 ---
@@ -51,7 +50,6 @@ Smart Mind Map 会将文本转换成漂亮的交互式思维导图。插件会
 | `MIN_TEXT_LENGTH` | integer | `100` | 开始分析所需的最少字符数 |
 | `CLEAR_PREVIOUS_HTML` | boolean | `false` | 生成新导图时是否清除之前的插件 HTML |
 | `MESSAGE_COUNT` | integer | `1` | 用于生成的最近消息数量(1–5) |
-| `ENABLE_DIRECT_EMBED_MODE` | boolean | `false` | 是否开启沉浸式直出模式 (需要 Open WebUI 0.8.0+ ) |

 ---
```
````diff
@@ -1,81 +1,137 @@
-# Async Context Compression Filter
-
-**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.3.0 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT
-
-This filter reduces token consumption in long conversations through intelligent summarization and message compression while keeping conversations coherent.
-
-## What's new in 1.3.0
-
-- **Internationalization (i18n)**: Complete localization of user-facing messages across 9 languages (English, Chinese, Japanese, Korean, French, German, Spanish, Italian).
-- **Smart Status Display**: Added `token_usage_status_threshold` valve (default 80%) to intelligently control when token usage status is shown.
-- **Improved Performance**: Frontend language detection and logging are optimized to be completely non-blocking, maintaining lightning-fast TTFB.
-- **Copilot SDK Integration**: Automatically detects and skips compression for copilot_sdk based models to prevent conflicts.
-- **Configuration**: `debug_mode` is now set to `false` by default for a quieter production experience.
-
----
-
-## Core Features
-
-- ✅ **Full i18n Support**: Native localization across 9 languages.
-- ✅ Automatic compression triggered by token thresholds.
-- ✅ Asynchronous summarization that does not block chat responses.
-- ✅ Persistent storage via Open WebUI's shared database connection (PostgreSQL, SQLite, etc.).
-- ✅ Flexible retention policy to keep the first and last N messages.
-- ✅ Smart injection of historical summaries back into the context.
-- ✅ Structure-aware trimming that preserves document structure (headers, intro, conclusion).
-- ✅ Native tool output trimming for cleaner context when using function calling.
-- ✅ Real-time context usage monitoring with warning notifications (>90%).
-- ✅ Detailed token logging for precise debugging and optimization.
-- ✅ **Smart Model Matching**: Automatically inherits configuration from base models for custom presets.
-- ⚠ **Multimodal Support**: Images are preserved but their tokens are **NOT** calculated. Please adjust thresholds accordingly.
-
----
-
-## Installation & Configuration
-
-### 1) Database (automatic)
-
-- Uses Open WebUI's shared database connection; no extra configuration needed.
-- The `chat_summary` table is created on first run.
-
-### 2) Filter order
-
-- Recommended order: pre-filters (<10) → this filter (10) → post-filters (>10).
-
----
-
-## Configuration Parameters
-
-| Parameter | Default | Description |
-| :--- | :--- | :--- |
-| `priority` | `10` | Execution order; lower runs earlier. |
-| `compression_threshold_tokens` | `64000` | Trigger asynchronous summary when total tokens exceed this value. Set to 50%-70% of your model's context window. |
-| `max_context_tokens` | `128000` | Hard cap for context; older messages (except protected ones) are dropped if exceeded. |
-| `keep_first` | `1` | Always keep the first N messages (protects system prompts). |
-| `keep_last` | `6` | Always keep the last N messages to preserve recent context. |
-| `summary_model` | `None` | Model for summaries. Strongly recommended to set a fast, economical model (e.g., `gemini-2.5-flash`, `deepseek-v3`). Falls back to the current chat model when empty. |
-| `summary_model_max_context` | `0` | Max context tokens for the summary model. If 0, falls back to `model_thresholds` or global `max_context_tokens`. |
-| `max_summary_tokens` | `16384` | Maximum tokens for the generated summary. |
-| `summary_temperature` | `0.3` | Randomness for summary generation. Lower is more deterministic. |
-| `model_thresholds` | `{}` | Per-model overrides for `compression_threshold_tokens` and `max_context_tokens` (useful for mixed models). |
-| `enable_tool_output_trimming` | `false` | When enabled and `function_calling: "native"` is active, trims verbose tool outputs to extract only the final answer. |
-| `debug_mode` | `false` | Log verbose debug info. Set to `false` in production. |
-| `show_debug_log` | `false` | Print debug logs to browser console (F12). Useful for frontend debugging. |
-| `show_token_usage_status` | `true` | Show token usage status notification in the chat interface. |
-| `token_usage_status_threshold` | `80` | The minimum usage percentage (0-100) required to show a context usage status notification. |
-
----
-
-## ⭐ Support
-
-If this plugin has been useful, a star on [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) is a big motivation for me. Thank you for the support.
-
-## Troubleshooting ❓
-
-- **Initial system prompt is lost**: Keep `keep_first` greater than 0 to protect the initial message.
-- **Compression effect is weak**: Raise `compression_threshold_tokens` or lower `keep_first` / `keep_last` to allow more aggressive compression.
-- **Submit an Issue**: If you encounter any problems, please submit an issue on GitHub: [OpenWebUI Extensions Issues](https://github.com/Fu-Jie/openwebui-extensions/issues)
-
-## Changelog
-
-See the full history on GitHub: [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions)
+# Async Context Compression
+
+<span class="category-badge filter">Filter</span>
+<span class="version-badge">v1.2.2</span>
+
+Reduces token consumption in long conversations through intelligent summarization while maintaining conversational coherence.
+
+---
+
+## Overview
+
+The Async Context Compression filter helps manage token usage in long conversations by:
+
+- Intelligently summarizing older messages
+- Preserving important context
+- Reducing API costs
+- Maintaining conversation coherence
+
+This is especially useful for:
+
+- Long-running conversations
+- Complex multi-turn discussions
+- Cost optimization
+- Token limit management
+
+## Features
+
+- :material-arrow-collapse-vertical: **Smart Compression**: AI-powered context summarization
+- :material-clock-fast: **Async Processing**: Non-blocking background compression
+- :material-memory: **Context Preservation**: Keeps important information
+- :material-currency-usd-off: **Cost Reduction**: Minimize token usage
+- :material-console: **Frontend Debugging**: Debug logs in browser console
+- :material-alert-circle-check: **Enhanced Error Reporting**: Clear error status notifications
+- :material-check-all: **Open WebUI v0.7.x Compatibility**: Dynamic DB session handling
+- :material-account-convert: **Improved Compatibility**: Summary role changed to `assistant`
+- :material-shield-check: **Enhanced Stability**: Resolved race conditions in state management
+- :material-ruler: **Preflight Context Check**: Validates context fit before sending
+- :material-format-align-justify: **Structure-Aware Trimming**: Preserves document structure
+- :material-content-cut: **Native Tool Output Trimming**: Trims verbose tool outputs (Note: Non-native tool outputs are not fully injected into context)
+- :material-chart-bar: **Detailed Token Logging**: Granular token breakdown
+- :material-account-search: **Smart Model Matching**: Inherit config from base models
+- :material-image-off: **Multimodal Support**: Images are preserved but tokens are **NOT** calculated
+
+---
+
+## Installation
+
+1. Download the plugin file: [`async_context_compression.py`](https://github.com/Fu-Jie/openwebui-extensions/tree/main/plugins/filters/async-context-compression)
+2. Upload to OpenWebUI: **Admin Panel** → **Settings** → **Functions**
+3. Configure compression settings
+4. Enable the filter
+
+---
+
+## How It Works
+
+```mermaid
+graph TD
+    A[Incoming Messages] --> B{Token Count > Threshold?}
+    B -->|No| C[Pass Through]
+    B -->|Yes| D[Summarize Older Messages]
+    D --> E[Preserve Recent Messages]
+    E --> F[Combine Summary + Recent]
+    F --> G[Send to LLM]
+```
+
+---
+
+## Configuration
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `compression_threshold_tokens` | integer | `64000` | Trigger compression above this token count |
+| `max_context_tokens` | integer | `128000` | Hard limit for context |
+| `keep_first` | integer | `1` | Always keep the first N messages |
+| `keep_last` | integer | `6` | Always keep the last N messages |
+| `summary_model` | string | `None` | Model to use for summarization |
+| `summary_model_max_context` | integer | `0` | Max context tokens for summary model |
+| `max_summary_tokens` | integer | `16384` | Maximum tokens for the summary |
+| `enable_tool_output_trimming` | boolean | `false` | Enable trimming of large tool outputs |
+
+---
+
+## Example
+
+### Before Compression
+
+```
+[Message 1] User: Tell me about Python...
+[Message 2] AI: Python is a programming language...
+[Message 3] User: What about its history?
+[Message 4] AI: Python was created by Guido...
+[Message 5] User: And its features?
+[Message 6] AI: Python has many features...
+... (many more messages)
+[Message 20] User: Current question
+```
+
+### After Compression
+
+```
+[Summary] Previous conversation covered Python basics,
+history, features, and common use cases...
+
+[Message 18] User: Recent question about decorators
+[Message 19] AI: Decorators in Python are...
+[Message 20] User: Current question
+```
+
+---
+
+## Requirements
+
+!!! note "Prerequisites"
+    - OpenWebUI v0.3.0 or later
+    - Access to an LLM for summarization
+
+!!! tip "Best Practices"
+    - Set appropriate token thresholds based on your model's context window
+    - Preserve more recent messages for technical discussions
+    - Test compression settings in non-critical conversations first
+
+---
+
+## Troubleshooting
+
+??? question "Compression not triggering?"
+    Check if the token count exceeds your configured threshold. Enable debug logging for more details.
+
+??? question "Important context being lost?"
+    Increase the `preserve_recent` setting or lower the compression ratio.
+
+---
+
+## Source Code
+
+[:fontawesome-brands-github: View on GitHub](https://github.com/Fu-Jie/openwebui-extensions/tree/main/plugins/filters/async-context-compression){ .md-button }
````
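The retention policy appearing on both sides of this diff (`keep_first`, `keep_last`, and a `compression_threshold_tokens` trigger) can be sketched as follows. Function names here are illustrative assumptions, not the filter's actual internals; the real plugin does the summarization asynchronously via an LLM.

```python
def should_compress(total_tokens: int, threshold: int = 64000) -> bool:
    """Compression is triggered only once the context exceeds the threshold."""
    return total_tokens > threshold

def select_for_compression(messages, keep_first: int = 1, keep_last: int = 6):
    """Split messages into (to_summarize, kept).

    The first `keep_first` messages (e.g., the system prompt) and the last
    `keep_last` messages are always kept verbatim; only the middle of the
    conversation is handed to the summarizer.
    """
    if len(messages) <= keep_first + keep_last:
        return [], list(messages)
    head = messages[:keep_first]
    tail = messages[-keep_last:]
    middle = messages[keep_first:-keep_last]
    return middle, head + tail
```

The summary produced from the middle slice would then be injected between `head` and `tail`, matching the "Combine Summary + Recent" step in the flow diagram.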
````diff
@@ -1,119 +1,137 @@
-# 异步上下文压缩过滤器
-
-**作者:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **版本:** 1.3.0 | **项目:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **许可证:** MIT
-
-> **重要提示**:为了确保所有过滤器的可维护性和易用性,每个过滤器都应附带清晰、完整的文档,以确保其功能、配置和使用方法得到充分说明。
-
-本过滤器通过智能摘要和消息压缩技术,在保持对话连贯性的同时,显著降低长对话的 Token 消耗。
-
-## 1.3.0 版本更新
-
-- **国际化 (i18n) 支持**: 完成了所有用户可见消息的本地化,现已原生支持 9 种语言(含中、英、日、韩及欧洲主要语言)。
-- **智能状态显示**: 新增 `token_usage_status_threshold` 阀门(默认 80%),可以智能控制何时显示 Token 用量状态,减少不必要的打扰。
-- **性能大幅优化**: 对前端语言检测和日志处理流程进行了非阻塞重构,完全不影响首字节响应时间(TTFB),保持毫秒级极速推流。
-- **Copilot SDK 兼容**: 自动检测并跳过基于 `copilot_sdk` 模型的上下文压缩,避免冲突。
-- **配置项调整**: 为了提供更安静的生产环境体验,`debug_mode` 现已默认设置为 `false`。
-
----
-
-## 核心特性
-
-- ✅ **全方位国际化**: 原生支持 9 种界面语言。
-- ✅ **自动压缩**: 基于 Token 阈值自动触发上下文压缩。
-- ✅ **异步摘要**: 后台生成摘要,不阻塞当前对话响应。
-- ✅ **持久化存储**: 复用 Open WebUI 共享数据库连接,自动支持 PostgreSQL/SQLite 等。
-- ✅ **灵活保留策略**: 可配置保留对话头部和尾部消息,确保关键信息连贯。
-- ✅ **智能注入**: 将历史摘要智能注入到新上下文中。
-- ✅ **结构感知裁剪**: 智能折叠过长消息,保留文档骨架(标题、首尾)。
-- ✅ **原生工具输出裁剪**: 支持裁剪冗长的工具调用输出。
-- ✅ **实时监控**: 实时监控上下文使用情况,超过 90% 发出警告。
-- ✅ **详细日志**: 提供精确的 Token 统计日志,便于调试。
-- ✅ **智能模型匹配**: 自定义模型自动继承基础模型的阈值配置。
-- ⚠ **多模态支持**: 图片内容会被保留,但其 Token **不参与计算**。请相应调整阈值。
-
-详细的工作原理和流程请参考 [工作流程指南](WORKFLOW_GUIDE_CN.md)。
-
----
-
-## 安装与配置
-
-### 1. 数据库(自动)
-
-- 自动使用 Open WebUI 的共享数据库连接,**无需额外配置**。
-- 首次运行自动创建 `chat_summary` 表。
-
-### 2. 过滤器顺序
-
-- 建议顺序:前置过滤器(<10)→ 本过滤器(10)→ 后置过滤器(>10)。
-
----
-
-## 配置参数
-
-您可以在过滤器的设置中调整以下参数:
-
-### 核心参数
-
-| 参数 | 默认值 | 描述 |
-| :--- | :--- | :--- |
-| `priority` | `10` | 过滤器执行顺序,数值越小越先执行。 |
-| `compression_threshold_tokens` | `64000` | **重要**: 当上下文总 Token 超过此值时后台生成摘要,建议设为模型上下文窗口的 50%-70%。 |
-| `max_context_tokens` | `128000` | **重要**: 上下文硬上限,超过即移除最早消息(保留受保护消息)。 |
-| `keep_first` | `1` | 始终保留对话开始的 N 条消息,保护系统提示或环境变量。 |
-| `keep_last` | `6` | 始终保留对话末尾的 N 条消息,确保最近上下文连贯。 |
-
-### 摘要生成配置
-
-| 参数 | 默认值 | 描述 |
-| :--- | :--- | :--- |
-| `summary_model` | `None` | 用于生成摘要的模型 ID。**强烈建议**配置快速、经济、上下文窗口大的模型(如 `gemini-2.5-flash`、`deepseek-v3`)。留空则尝试复用当前对话模型。 |
-| `summary_model_max_context` | `0` | 摘要模型的最大上下文 Token 数。如果为 0,则回退到 `model_thresholds` 或全局 `max_context_tokens`。 |
-| `max_summary_tokens` | `16384` | 生成摘要时允许的最大 Token 数。 |
-| `summary_temperature` | `0.1` | 控制摘要生成的随机性,较低的值结果更稳定。 |
-
-### 高级配置
-
-#### `model_thresholds` (模型特定阈值)
-
-这是一个字典配置,可为特定模型 ID 覆盖全局 `compression_threshold_tokens` 与 `max_context_tokens`,适用于混合不同上下文窗口的模型。
-
-**默认包含 GPT-4、Claude 3.5、Gemini 1.5/2.0、Qwen 2.5/3、DeepSeek V3 等推荐阈值。**
-
-**配置示例:**
-
-```json
-{
-  "gpt-4": {
-    "compression_threshold_tokens": 8000,
-    "max_context_tokens": 32000
-  },
-  "gemini-2.5-flash": {
-    "compression_threshold_tokens": 734000,
+# Async Context Compression(异步上下文压缩)
+
+<span class="category-badge filter">Filter</span>
+<span class="version-badge">v1.2.2</span>
+
+通过智能摘要减少长对话的 token 消耗,同时保持对话连贯。
+
+---
+
+## 概览
+
+Async Context Compression 过滤器通过以下方式帮助管理长对话的 token 使用:
+
+- 智能总结较早的消息
+- 保留关键信息
+- 降低 API 成本
+- 保持对话一致性
+
+特别适用于:
+
+- 长时间会话
+- 多轮复杂讨论
+- 成本优化
+- 上下文长度控制
+
+## 功能特性
+
+- :material-arrow-collapse-vertical: **智能压缩**:AI 驱动的上下文摘要
+- :material-clock-fast: **异步处理**:后台非阻塞压缩
+- :material-memory: **保留上下文**:尽量保留重要信息
+- :material-currency-usd-off: **降低成本**:减少 token 使用
+- :material-console: **前端调试**:支持浏览器控制台日志
+- :material-alert-circle-check: **增强错误报告**:清晰的错误状态通知
+- :material-check-all: **Open WebUI v0.7.x 兼容性**:动态数据库会话处理
+- :material-account-convert: **兼容性提升**:摘要角色改为 `assistant`
+- :material-shield-check: **稳定性增强**:解决状态管理竞态条件
+- :material-ruler: **预检上下文检查**:发送前验证上下文是否超限
+- :material-format-align-justify: **结构感知裁剪**:保留文档结构的智能裁剪
+- :material-content-cut: **原生工具输出裁剪**:自动裁剪冗长的工具输出(注意:非原生工具调用输出不会完整注入上下文)
+- :material-chart-bar: **详细 Token 日志**:提供细粒度的 Token 统计
+- :material-account-search: **智能模型匹配**:自定义模型自动继承基础模型配置
+- :material-image-off: **多模态支持**:图片内容保留但 Token **不参与计算**
+
+---
+
+## 安装
+
+1. 下载插件文件:[`async_context_compression.py`](https://github.com/Fu-Jie/openwebui-extensions/tree/main/plugins/filters/async-context-compression)
+2. 上传到 OpenWebUI:**Admin Panel** → **Settings** → **Functions**
+3. 配置压缩参数
+4. 启用过滤器
+
+---
+
+## 工作原理
+
+```mermaid
+graph TD
+    A[Incoming Messages] --> B{Token Count > Threshold?}
+    B -->|No| C[Pass Through]
+    B -->|Yes| D[Summarize Older Messages]
+    D --> E[Preserve Recent Messages]
+    E --> F[Combine Summary + Recent]
+    F --> G[Send to LLM]
````
|
|
||||||
"max_context_tokens": 1048576
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
|
```
|
||||||
|
|
||||||
| 参数 | 默认值 | 描述 |
|
---
|
||||||
| :----------------------------- | :------- | :-------------------------------------------------------------------------------------------------------------------------------------- |
|
|
||||||
| `enable_tool_output_trimming` | `false` | 启用时,若 `function_calling: "native"` 激活,将裁剪冗长的工具输出以仅提取最终答案。 |
|
## 配置项
|
||||||
| `debug_mode` | `false` | 是否在 Open WebUI 的控制台日志中打印详细的调试信息。生产环境默认且建议设为 `false`。 |
|
|
||||||
| `show_debug_log` | `false` | 是否在浏览器控制台 (F12) 打印调试日志。便于前端调试。 |
|
| 选项 | 类型 | 默认值 | 说明 |
|
||||||
| `show_token_usage_status` | `true` | 是否在对话结束时显示 Token 使用情况的状态通知。 |
|
|--------|------|---------|-------------|
|
||||||
| `token_usage_status_threshold` | `80` | 触发显示上下文用量状态通知的最低百分比阈值 (0-100)。 |
|
| `compression_threshold_tokens` | integer | `64000` | 超过该 token 数触发压缩 |
|
||||||
|
| `max_context_tokens` | integer | `128000` | 上下文硬性上限 |
|
||||||
|
| `keep_first` | integer | `1` | 始终保留的前 N 条消息 |
|
||||||
|
| `keep_last` | integer | `6` | 始终保留的后 N 条消息 |
|
||||||
|
| `summary_model` | string | `None` | 用于摘要的模型 |
|
||||||
|
| `summary_model_max_context` | integer | `0` | 摘要模型的最大上下文 Token 数 |
|
||||||
|
| `max_summary_tokens` | integer | `16384` | 摘要的最大 token 数 |
|
||||||
|
| `enable_tool_output_trimming` | boolean | `false` | 启用长工具输出裁剪 |
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## ⭐ 支持
|
## 示例
|
||||||
|
|
||||||
如果这个插件对你有帮助,欢迎到 [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) 点个 Star,这将是我持续改进的动力,感谢支持。
|
### 压缩前
|
||||||
|
|
||||||
## 故障排除 (Troubleshooting) ❓
|
```
|
||||||
|
[Message 1] User: Tell me about Python...
|
||||||
|
[Message 2] AI: Python is a programming language...
|
||||||
|
[Message 3] User: What about its history?
|
||||||
|
[Message 4] AI: Python was created by Guido...
|
||||||
|
[Message 5] User: And its features?
|
||||||
|
[Message 6] AI: Python has many features...
|
||||||
|
... (many more messages)
|
||||||
|
[Message 20] User: Current question
|
||||||
|
```
|
||||||
|
|
||||||
- **初始系统提示丢失**:将 `keep_first` 设置为大于 0。
|
### 压缩后
|
||||||
- **压缩效果不明显**:提高 `compression_threshold_tokens`,或降低 `keep_first` / `keep_last` 以增强压缩力度。
|
|
||||||
- **提交 Issue**: 如果遇到任何问题,请在 GitHub 上提交 Issue:[OpenWebUI Extensions Issues](https://github.com/Fu-Jie/openwebui-extensions/issues)
|
|
||||||
|
|
||||||
## 更新日志
|
```
|
||||||
|
[Summary] Previous conversation covered Python basics,
|
||||||
|
history, features, and common use cases...
|
||||||
|
|
||||||
完整历史请查看 GitHub 项目: [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions)
|
[Message 18] User: Recent question about decorators
|
||||||
|
[Message 19] AI: Decorators in Python are...
|
||||||
|
[Message 20] User: Current question
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 运行要求
|
||||||
|
|
||||||
|
!!! note "前置条件"
|
||||||
|
- OpenWebUI v0.3.0 及以上
|
||||||
|
- 需要可用的 LLM 用于摘要
|
||||||
|
|
||||||
|
!!! tip "最佳实践"
|
||||||
|
- 根据模型上下文窗口设置合适的 token 阈值
|
||||||
|
- 技术讨论可适当提高 `preserve_recent`
|
||||||
|
- 先在非关键对话中测试压缩效果
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 常见问题
|
||||||
|
|
||||||
|
??? question "没有触发压缩?"
|
||||||
|
检查 token 数是否超过配置的阈值,并开启调试日志了解细节。
|
||||||
|
|
||||||
|
??? question "重要上下文丢失?"
|
||||||
|
提高 `preserve_recent` 或降低压缩比例。
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 源码
|
||||||
|
|
||||||
|
[:fontawesome-brands-github: 在 GitHub 查看](https://github.com/Fu-Jie/openwebui-extensions/tree/main/plugins/filters/async-context-compression){ .md-button }
|
||||||
|
|||||||
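上文两个版本的文档都提到"智能模型匹配":自定义模型自动继承基础模型的阈值配置。下面是一个极简的 Python 示意,演示按基础模型键匹配并回退到全局默认值的解析思路;其中的函数名与"最长匹配键"策略均为本文假设,并非插件源码的实际实现。

```python
# 示意:根据模型 ID 解析阈值配置(匹配策略为假设,仅供理解原理)
DEFAULT = {"compression_threshold_tokens": 64000, "max_context_tokens": 128000}

MODEL_THRESHOLDS = {
    "gpt-4": {"compression_threshold_tokens": 8000, "max_context_tokens": 32000},
    "gemini-2.5-flash": {"compression_threshold_tokens": 734000, "max_context_tokens": 1048576},
}

def resolve_thresholds(model_id: str, table: dict = MODEL_THRESHOLDS) -> dict:
    """自定义模型(如 'my-gpt-4-assistant')按最长匹配键继承基础模型配置。"""
    best_key = ""
    for key in table:
        if key in model_id and len(key) > len(best_key):
            best_key = key
    return table.get(best_key, DEFAULT)
```

例如 `resolve_thresholds("my-gpt-4-assistant")` 会命中 `gpt-4` 的配置,而未知模型则回退到全局默认值。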
@@ -22,7 +22,7 @@ Filters act as middleware in the message pipeline:

Reduces token consumption in long conversations through intelligent summarization while maintaining coherence.

-**Version:** 1.3.0
+**Version:** 1.2.2

[:octicons-arrow-right-24: Documentation](async-context-compression.md)

@@ -22,7 +22,7 @@ Filter 充当消息管线中的中间件:

通过智能总结减少长对话的 token 消耗,同时保持连贯性。

-**版本:** 1.3.0
+**版本:** 1.2.2

[:octicons-arrow-right-24: 查看文档](async-context-compression.md)

@@ -2,26 +2,21 @@

Smart Mind Map is a powerful OpenWebUI action plugin that intelligently analyzes long-form text content and automatically generates interactive mind maps, helping users structure and visualize knowledge.

-**Author:** [Fu-Jie](https://github.com/Fu-Jie) | **Version:** 1.0.0 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT
+**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 0.9.2 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT

-## What's New in v1.0.0
+## What's New in v0.9.2

-### Direct Embed & UI Refinements
+**Language Rule Alignment**

-- **Native Multi-language UI (i18n)**: The plugin interface (buttons, settings, status) now automatically adapts to your browser's language setting for a seamless global experience.
-- **Direct Embed Mode**: Introduced a native-like inline display mode for Open WebUI 0.8.0+, enabling a seamless full-width canvas.
-- **Adaptive Auto-Sizing**: Mind map now dynamically scales its height and perfectly refits to the window to eliminate scrollbar artifacts.
-- **Subdued & Compact UI**: Completely redesigned the header tooling bar to a slender, single-line configuration to maximize visual rendering space.
-- **Configurable Experience**: Added `ENABLE_DIRECT_EMBED_MODE` valve to explicitly toggle the new inline rendering behavior.
+- **Input Language First**: Mind map output now strictly matches the input text language.
+- **Consistent Behavior**: Matches the infographic language rule for predictable multilingual output.

## Key Features 🔑

- ✅ **Intelligent Text Analysis**: Automatically identifies core themes, key concepts, and hierarchical structures.
-- ✅ **Native Multi-language UI**: Automatic interface translation (i18n) based on system language for a native feel.
- ✅ **Interactive Visualization**: Generates beautiful interactive mind maps based on Markmap.js.
-- ✅ **Direct Embed Mode**: (Optional) For Open WebUI 0.8.0+, render natively inline to fill entire UI width.
- ✅ **High-Resolution PNG Export**: Export mind maps as high-quality PNG images (9x scale).
-- ✅ **Complete Control Panel**: Zoom controls, expand level selection, and fullscreen mode within a compact toolbar.
+- ✅ **Complete Control Panel**: Zoom controls, expand level selection, and fullscreen mode.
- ✅ **Theme Switching**: Manual theme toggle button with automatic theme detection.
- ✅ **Image Output Mode**: Generate static SVG images embedded directly in Markdown for cleaner history.

@@ -42,7 +37,6 @@ Smart Mind Map is a powerful OpenWebUI action plugin that intelligently analyzes

| `CLEAR_PREVIOUS_HTML` | `false` | Whether to clear previous plugin-generated HTML content. |
| `MESSAGE_COUNT` | `1` | Number of recent messages to use for generation (1-5). |
| `OUTPUT_MODE` | `html` | Output mode: `html` (interactive) or `image` (static). |
-| `ENABLE_DIRECT_EMBED_MODE` | `false` | Enable Direct Embed Mode (Open WebUI 0.8.0+ native layout) instead of Legacy Mode. |

## ⭐ Support

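The valve table above documents `MESSAGE_COUNT` as limited to 1-5 and `OUTPUT_MODE` as either `html` or `image`. A minimal sketch of how such valves might be normalized is shown below; the `Valves` dataclass and the `normalized` helper are illustrative assumptions, not the plugin's actual validation code.

```python
from dataclasses import dataclass

@dataclass
class Valves:
    # Field names and defaults follow the documented valve table.
    CLEAR_PREVIOUS_HTML: bool = False
    MESSAGE_COUNT: int = 1       # documented range: 1-5
    OUTPUT_MODE: str = "html"    # "html" (interactive) or "image" (static)

    def normalized(self) -> "Valves":
        # Clamp MESSAGE_COUNT into its documented range and fall back to
        # "html" for any unrecognized output mode.
        count = min(5, max(1, self.MESSAGE_COUNT))
        mode = self.OUTPUT_MODE if self.OUTPUT_MODE in ("html", "image") else "html"
        return Valves(self.CLEAR_PREVIOUS_HTML, count, mode)
```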
@@ -2,26 +2,21 @@

思维导图是一个强大的 OpenWebUI 动作插件,能够智能分析长篇文本内容,自动生成交互式思维导图,帮助用户结构化和可视化知识。

-**作者:** [Fu-Jie](https://github.com/Fu-Jie) | **版本:** 1.0.0 | **项目:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **许可证:** MIT
+**作者:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **版本:** 0.9.2 | **项目:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **许可证:** MIT

-## v1.0.0 最新更新
+## v0.9.2 更新亮点

-### 嵌入式直出与 UI 细节全线重构
+**语言规则对齐**

-- **原生多语言界面 (Native i18n)**:插件界面(按钮、设置说明、状态提示)现在会根据您浏览器的语言设置自动适配系统语言。
-- **原生态嵌入模式 (Direct Embed)**:针对 Open WebUI 0.8.0+ 的前端架构支持了纯正的内容内联(Inline)直出模式,不再受气泡和 Markdown 隔离,真正撑满屏幕宽度。
-- **自动响应边界 (Auto-Sizing)**:突破以前高度僵死的问题。思维导图现在可以根据您的当前屏幕大小弹性伸缩(动态 `clamp()` 高度),彻底消灭丑陋的局部滚动条与白边。
-- **极简专业 UI (Compact UI)**:推倒重做了头部的菜单栏,统一使用了一套干净、单行的极简全透明微拟物 Toolbar 设计,为导图画布省下极大的垂直空间。
-- **模式配置自由**:为了照顾阅读流连贯的习惯,新增了 `ENABLE_DIRECT_EMBED_MODE` 配置开关。您必须在设置中显式开启才能体验宽广内联全屏模式。
+- **输入语言优先**:导图输出严格与输入文本语言一致。
+- **一致性提升**:与信息图语言规则保持一致,多语言输出更可预期。

## 核心特性 🔑

- ✅ **智能文本分析**:自动识别文本的核心主题、关键概念和层次结构。
-- ✅ **原生多语言界面**:根据系统语言自动切换界面语言 (i18n),提供原生交互体验。
- ✅ **交互式可视化**:基于 Markmap.js 生成美观的交互式思维导图。
-- ✅ **直出全景内嵌 (Direct Embed)**:(可选开关) 对于 Open WebUI 0.8.0+,直接填补整个前端宽度,去除气泡剥离感。
- ✅ **高分辨率 PNG 导出**:导出高质量的 PNG 图片(9 倍分辨率)。
-- ✅ **完整控制面板**:极简清爽的单行大屏缩放控制、展开层级选择、全局全屏等核心操作。
+- ✅ **完整控制面板**:缩放控制、展开层级选择、全屏模式。
- ✅ **主题切换**:手动主题切换按钮与自动主题检测。
- ✅ **图片输出模式**:生成静态 SVG 图片直接嵌入 Markdown,聊天记录更简洁。

@@ -42,7 +37,6 @@

| `CLEAR_PREVIOUS_HTML` | `false` | 在生成新的思维导图时,是否清除之前的 HTML 内容。 |
| `MESSAGE_COUNT` | `1` | 用于生成思维导图的最近消息数量(1-5)。 |
| `OUTPUT_MODE` | `html` | 输出模式:`html`(交互式)或 `image`(静态图片)。 |
-| `ENABLE_DIRECT_EMBED_MODE` | `false` | 是否开启沉浸式直出嵌入模式(需要 Open WebUI v0.8.0+ 环境)。如果保持 `false` 将会维持旧版的对话流 Markdown 渲染模式。 |

## ⭐ 支持

File diff suppressed because it is too large

BIN  plugins/actions/smart-mind-map/smart_mind_map_cn.png (new file; binary file not shown. After: 216 KiB)

1617  plugins/actions/smart-mind-map/smart_mind_map_cn.py (new file; diff suppressed because it is too large)

@@ -1,22 +1,18 @@

# Async Context Compression Filter

-**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.3.0 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT
+**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.2.2 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT

This filter reduces token consumption in long conversations through intelligent summarization and message compression while keeping conversations coherent.

-## What's new in 1.3.0
+## What's new in 1.2.2

-- **Internationalization (i18n)**: Complete localization of user-facing messages across 9 languages (English, Chinese, Japanese, Korean, French, German, Spanish, Italian).
-- **Smart Status Display**: Added `token_usage_status_threshold` valve (default 80%) to intelligently control when token usage status is shown.
-- **Improved Performance**: Frontend language detection and logging are optimized to be completely non-blocking, maintaining lightning-fast TTFB.
-- **Copilot SDK Integration**: Automatically detects and skips compression for copilot_sdk based models to prevent conflicts.
-- **Configuration**: `debug_mode` is now set to `false` by default for a quieter production experience.
+- **Critical Fix**: Resolved `TypeError: 'str' object is not callable` caused by variable name conflict in logging function.
+- **Compatibility**: Enhanced `params` handling to support Pydantic objects, improving compatibility with different OpenWebUI versions.

---

## Core Features

-- ✅ **Full i18n Support**: Native localization across 9 languages.
- ✅ Automatic compression triggered by token thresholds.
- ✅ Asynchronous summarization that does not block chat responses.
- ✅ Persistent storage via Open WebUI's shared database connection (PostgreSQL, SQLite, etc.).

@@ -59,10 +55,8 @@ This filter reduces token consumption in long conversations through intelligent

| `summary_temperature` | `0.3` | Randomness for summary generation. Lower is more deterministic. |
| `model_thresholds` | `{}` | Per-model overrides for `compression_threshold_tokens` and `max_context_tokens` (useful for mixed models). |
| `enable_tool_output_trimming` | `false` | When enabled and `function_calling: "native"` is active, trims verbose tool outputs to extract only the final answer. |
-| `debug_mode` | `false` | Log verbose debug info. Set to `false` in production. |
+| `debug_mode` | `true` | Log verbose debug info. Set to `false` in production. |
| `show_debug_log` | `false` | Print debug logs to browser console (F12). Useful for frontend debugging. |
-| `show_token_usage_status` | `true` | Show token usage status notification in the chat interface. |
-| `token_usage_status_threshold` | `80` | The minimum usage percentage (0-100) required to show a context usage status notification. |

---

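The README excerpts above describe threshold-triggered compression with protected head (`keep_first`) and tail (`keep_last`) messages. That selection logic can be sketched as follows; the character-based token estimate and all function names here are illustrative assumptions (the filter itself uses a real tokenizer such as tiktoken when available).

```python
def estimate_tokens(messages, chars_per_token=4):
    # Rough heuristic: roughly 4 characters per token for English text.
    return sum(len(m.get("content", "")) for m in messages) // chars_per_token

def should_compress(messages, threshold_tokens=64000):
    # Background summarization is triggered once the estimated total
    # exceeds compression_threshold_tokens.
    return estimate_tokens(messages) > threshold_tokens

def select_for_summary(messages, keep_first=1, keep_last=6):
    # Only the middle of the conversation is summarized; the protected
    # head and tail are always passed through untouched.
    if len(messages) <= keep_first + keep_last:
        return []
    return messages[keep_first:len(messages) - keep_last]
```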
@@ -1,24 +1,20 @@

# 异步上下文压缩过滤器

-**作者:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **版本:** 1.3.0 | **项目:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **许可证:** MIT
+**作者:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **版本:** 1.2.2 | **项目:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **许可证:** MIT

> **重要提示**:为了确保所有过滤器的可维护性和易用性,每个过滤器都应附带清晰、完整的文档,以确保其功能、配置和使用方法得到充分说明。

本过滤器通过智能摘要和消息压缩技术,在保持对话连贯性的同时,显著降低长对话的 Token 消耗。

-## 1.3.0 版本更新
+## 1.2.2 版本更新

-- **国际化 (i18n) 支持**: 完成了所有用户可见消息的本地化,现已原生支持 9 种语言(含中、英、日、韩及欧洲主要语言)。
-- **智能状态显示**: 新增 `token_usage_status_threshold` 阀门(默认 80%),可以智能控制何时显示 Token 用量状态,减少不必要的打扰。
-- **性能大幅优化**: 对前端语言检测和日志处理流程进行了非阻塞重构,完全不影响首字节响应时间(TTFB),保持毫秒级极速推流。
-- **Copilot SDK 兼容**: 自动检测并跳过基于 `copilot_sdk` 模型的上下文压缩,避免冲突。
-- **配置项调整**: 为了提供更安静的生产环境体验,`debug_mode` 现已默认设置为 `false`。
+- **严重错误修复**: 解决了因日志函数变量名冲突导致的 `TypeError: 'str' object is not callable` 错误。
+- **兼容性增强**: 改进了 `params` 处理逻辑以支持 Pydantic 对象,提高了对不同 OpenWebUI 版本的兼容性。

---

## 核心特性

-- ✅ **全方位国际化**: 原生支持 9 种界面语言。
- ✅ **自动压缩**: 基于 Token 阈值自动触发上下文压缩。
- ✅ **异步摘要**: 后台生成摘要,不阻塞当前对话响应。
- ✅ **持久化存储**: 复用 Open WebUI 共享数据库连接,自动支持 PostgreSQL/SQLite 等。

@@ -97,10 +93,9 @@

| 参数 | 默认值 | 描述 |
| :----------------------------- | :------- | :-------------------------------------------------------------------------------------------------------------------------------------- |
| `enable_tool_output_trimming` | `false` | 启用时,若 `function_calling: "native"` 激活,将裁剪冗长的工具输出以仅提取最终答案。 |
-| `debug_mode` | `false` | 是否在 Open WebUI 的控制台日志中打印详细的调试信息。生产环境默认且建议设为 `false`。 |
+| `debug_mode` | `true` | 是否在 Open WebUI 的控制台日志中打印详细的调试信息(如 Token 计数、压缩进度、数据库操作等)。生产环境建议设为 `false`。 |
| `show_debug_log` | `false` | 是否在浏览器控制台 (F12) 打印调试日志。便于前端调试。 |
| `show_token_usage_status` | `true` | 是否在对话结束时显示 Token 使用情况的状态通知。 |
-| `token_usage_status_threshold` | `80` | 触发显示上下文用量状态通知的最低百分比阈值 (0-100)。 |

---

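上面的配置表中,`max_context_tokens` 是上下文的"硬上限":超过即移除最早的非保护消息。下面的 Python 片段是这一裁剪策略的简化示意,函数名与具体细节均为本文假设,实际行为以插件源码为准。

```python
def enforce_hard_limit(messages, max_context_tokens, token_of, keep_first=1):
    """示意:超出硬上限时,从最早的非保护消息开始依次移除。

    token_of 是单条消息的 token 计数函数;前 keep_first 条消息受保护,
    永远不会被移除。"""
    msgs = list(messages)
    total = sum(token_of(m) for m in msgs)
    while total > max_context_tokens and len(msgs) > keep_first:
        total -= token_of(msgs[keep_first])  # 最早的非保护消息
        del msgs[keep_first]
    return msgs
```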
@@ -5,17 +5,17 @@ author: Fu-Jie

author_url: https://github.com/Fu-Jie/openwebui-extensions
funding_url: https://github.com/open-webui
description: Reduces token consumption in long conversations while maintaining coherence through intelligent summarization and message compression.
-version: 1.3.0
+version: 1.2.2
openwebui_id: b1655bc8-6de9-4cad-8cb5-a6f7829a02ce
license: MIT

═══════════════════════════════════════════════════════════════════════════════
-📌 What's new in 1.3.0
+📌 What's new in 1.2.1
═══════════════════════════════════════════════════════════════════════════════

-✅ Smart Status Display: Added `token_usage_status_threshold` valve (default 80%) to control when token usage status is shown, reducing unnecessary notifications.
-✅ Copilot SDK Integration: Automatically detects and skips compression for copilot_sdk based models to prevent conflicts.
-✅ Improved User Experience: Status messages now only appear when token usage exceeds the configured threshold, keeping the interface cleaner.
+✅ Smart Configuration: Automatically detects base model settings for custom models and adds `summary_model_max_context` for independent summary limits.
+✅ Performance & Refactoring: Optimized threshold parsing with caching and removed redundant code for better efficiency.
+✅ Bug Fixes & Modernization: Fixed `datetime` deprecation warnings and corrected type annotations.

═══════════════════════════════════════════════════════════════════════════════
📌 Overview
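The 1.3.0 changelog above notes that status messages only appear when token usage exceeds the configured `token_usage_status_threshold` (default 80%). A plausible sketch of that gate follows; the function name and signature are assumptions for illustration, not the plugin's actual code.

```python
def should_show_usage_status(used_tokens, max_tokens, threshold_pct=80):
    # Show the context-usage notification only once usage reaches the
    # configured percentage threshold; a non-positive max disables it.
    if max_tokens <= 0:
        return False
    return used_tokens * 100 / max_tokens >= threshold_pct
```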
@@ -150,7 +150,7 @@ summary_temperature

Description: Controls the randomness of the summary generation. Lower values produce more deterministic output.

debug_mode
-Default: false
+Default: true
Description: Prints detailed debug information to the log. Recommended to set to `false` in production.

show_debug_log
@@ -268,7 +268,6 @@ import hashlib

import time
import contextlib
import logging
-from functools import lru_cache

# Setup logger
logger = logging.getLogger(__name__)
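The base (v2026.02.2) side of the hunk above imports `functools.lru_cache`, which the 1.2.2 changelog associates with cached threshold parsing; the branch drops the import. A typical use, caching the parse of a JSON-valued `model_thresholds` valve string, might look like this (an illustrative assumption, not the plugin's actual function):

```python
import json
from functools import lru_cache

@lru_cache(maxsize=32)
def parse_thresholds(raw: str) -> dict:
    # Each distinct valve string is parsed only once; repeated requests
    # with the same configuration hit the cache instead of re-parsing.
    try:
        return json.loads(raw)
    except (TypeError, ValueError):
        return {}
```

Because `lru_cache` hands the same dict object back to every caller, callers should treat the result as read-only.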
@@ -392,130 +391,6 @@ class ChatSummary(owui_Base):

    )

-
-TRANSLATIONS = {
-    "en-US": {
-        "status_context_usage": "Context Usage (Estimated): {tokens} / {max_tokens} Tokens ({ratio}%)",
-        "status_high_usage": " | ⚠️ High Usage",
-        "status_loaded_summary": "Loaded historical summary (Hidden {count} historical messages)",
-        "status_context_summary_updated": "Context Summary Updated: {tokens} / {max_tokens} Tokens ({ratio}%)",
-        "status_generating_summary": "Generating context summary in background...",
-        "status_summary_error": "Summary Error: {error}",
-        "summary_prompt_prefix": "【Previous Summary: The following is a summary of the historical conversation, provided for context only. Do not reply to the summary content itself; answer the subsequent latest questions directly.】\n\n",
-        "summary_prompt_suffix": "\n\n---\nBelow is the recent conversation:",
-        "tool_trimmed": "... [Tool outputs trimmed]\n{content}",
-        "content_collapsed": "\n... [Content collapsed] ...\n",
-    },
-    "zh-CN": {
-        "status_context_usage": "上下文用量 (预估): {tokens} / {max_tokens} Tokens ({ratio}%)",
-        "status_high_usage": " | ⚠️ 用量较高",
-        "status_loaded_summary": "已加载历史总结 (隐藏了 {count} 条历史消息)",
-        "status_context_summary_updated": "上下文总结已更新: {tokens} / {max_tokens} Tokens ({ratio}%)",
-        "status_generating_summary": "正在后台生成上下文总结...",
-        "status_summary_error": "总结生成错误: {error}",
-        "summary_prompt_prefix": "【前情提要:以下是历史对话的总结,仅供上下文参考。请不要回复总结内容本身,直接回答之后最新的问题。】\n\n",
-        "summary_prompt_suffix": "\n\n---\n以下是最近的对话:",
-        "tool_trimmed": "... [工具输出已裁剪]\n{content}",
-        "content_collapsed": "\n... [内容已折叠] ...\n",
-    },
-    "zh-HK": {
-        "status_context_usage": "上下文用量 (預估): {tokens} / {max_tokens} Tokens ({ratio}%)",
-        "status_high_usage": " | ⚠️ 用量較高",
-        "status_loaded_summary": "已載入歷史總結 (隱藏了 {count} 條歷史訊息)",
-        "status_context_summary_updated": "上下文總結已更新: {tokens} / {max_tokens} Tokens ({ratio}%)",
-        "status_generating_summary": "正在後台生成上下文總結...",
-        "status_summary_error": "總結生成錯誤: {error}",
-        "summary_prompt_prefix": "【前情提要:以下是歷史對話的總結,僅供上下文參考。請不要回覆總結內容本身,直接回答之後最新的問題。】\n\n",
-        "summary_prompt_suffix": "\n\n---\n以下是最近的對話:",
-        "tool_trimmed": "... [工具輸出已裁剪]\n{content}",
-        "content_collapsed": "\n... [內容已折疊] ...\n",
-    },
-    "zh-TW": {
-        "status_context_usage": "上下文用量 (預估): {tokens} / {max_tokens} Tokens ({ratio}%)",
-        "status_high_usage": " | ⚠️ 用量較高",
-        "status_loaded_summary": "已載入歷史總結 (隱藏了 {count} 條歷史訊息)",
-        "status_context_summary_updated": "上下文總結已更新: {tokens} / {max_tokens} Tokens ({ratio}%)",
-        "status_generating_summary": "正在後台生成上下文總結...",
-        "status_summary_error": "總結生成錯誤: {error}",
-        "summary_prompt_prefix": "【前情提要:以下是歷史對話的總結,僅供上下文參考。請不要回覆總結內容本身,直接回答之後最新的問題。】\n\n",
-        "summary_prompt_suffix": "\n\n---\n以下是最近的對話:",
-        "tool_trimmed": "... [工具輸出已裁剪]\n{content}",
-        "content_collapsed": "\n... [內容已折疊] ...\n",
-    },
-    "ja-JP": {
-        "status_context_usage": "コンテキスト使用量 (推定): {tokens} / {max_tokens} トークン ({ratio}%)",
-        "status_high_usage": " | ⚠️ 使用量高",
-        "status_loaded_summary": "履歴の要約を読み込みました ({count} 件の履歴メッセージを非表示)",
-        "status_context_summary_updated": "コンテキストの要約が更新されました: {tokens} / {max_tokens} トークン ({ratio}%)",
-        "status_generating_summary": "バックグラウンドでコンテキスト要約を生成しています...",
-        "status_summary_error": "要約エラー: {error}",
-        "summary_prompt_prefix": "【これまでのあらすじ:以下は過去の会話の要約であり、コンテキストの参考としてのみ提供されます。要約の内容自体には返答せず、その後の最新の質問に直接答えてください。】\n\n",
-        "summary_prompt_suffix": "\n\n---\n以下は最近の会話です:",
-        "tool_trimmed": "... [ツールの出力をトリミングしました]\n{content}",
-        "content_collapsed": "\n... [コンテンツが折りたたまれました] ...\n",
-    },
-    "ko-KR": {
-        "status_context_usage": "컨텍스트 사용량 (예상): {tokens} / {max_tokens} 토큰 ({ratio}%)",
-        "status_high_usage": " | ⚠️ 사용량 높음",
-        "status_loaded_summary": "이전 요약 불러옴 ({count}개의 이전 메시지 숨김)",
-        "status_context_summary_updated": "컨텍스트 요약 업데이트됨: {tokens} / {max_tokens} 토큰 ({ratio}%)",
-        "status_generating_summary": "백그라운드에서 컨텍스트 요약 생성 중...",
-        "status_summary_error": "요약 오류: {error}",
-        "summary_prompt_prefix": "【이전 요약: 다음은 이전 대화의 요약이며 문맥 참고용으로만 제공됩니다. 요약 내용 자체에 답하지 말고 이후의 최신 질문에 직접 답하세요.】\n\n",
-        "summary_prompt_suffix": "\n\n---\n다음은 최근 대화입니다:",
-        "tool_trimmed": "... [도구 출력 잘림]\n{content}",
-        "content_collapsed": "\n... [내용 접힘] ...\n",
-    },
-    "fr-FR": {
-        "status_context_usage": "Utilisation du contexte (estimée) : {tokens} / {max_tokens} jetons ({ratio}%)",
-        "status_high_usage": " | ⚠️ Utilisation élevée",
-        "status_loaded_summary": "Résumé historique chargé ({count} messages d'historique masqués)",
-        "status_context_summary_updated": "Résumé du contexte mis à jour : {tokens} / {max_tokens} jetons ({ratio}%)",
-        "status_generating_summary": "Génération du résumé du contexte en arrière-plan...",
-        "status_summary_error": "Erreur de résumé : {error}",
-        "summary_prompt_prefix": "【Résumé précédent : Ce qui suit est un résumé de la conversation historique, fourni uniquement pour le contexte. Ne répondez pas au contenu du résumé lui-même ; répondez directement aux dernières questions.】\n\n",
-        "summary_prompt_suffix": "\n\n---\nVoici la conversation récente :",
-        "tool_trimmed": "... [Sorties d'outils coupées]\n{content}",
-        "content_collapsed": "\n... [Contenu réduit] ...\n",
-    },
-    "de-DE": {
-        "status_context_usage": "Kontextnutzung (geschätzt): {tokens} / {max_tokens} Tokens ({ratio}%)",
-        "status_high_usage": " | ⚠️ Hohe Nutzung",
-        "status_loaded_summary": "Historische Zusammenfassung geladen ({count} historische Nachrichten ausgeblendet)",
-        "status_context_summary_updated": "Kontextzusammenfassung aktualisiert: {tokens} / {max_tokens} Tokens ({ratio}%)",
-        "status_generating_summary": "Kontextzusammenfassung wird im Hintergrund generiert...",
-        "status_summary_error": "Zusammenfassungsfehler: {error}",
-        "summary_prompt_prefix": "【Vorherige Zusammenfassung: Das Folgende ist eine Zusammenfassung der historischen Konversation, die nur als Kontext dient. Antworten Sie nicht auf den Inhalt der Zusammenfassung selbst, sondern direkt auf die nachfolgenden neuesten Fragen.】\n\n",
-        "summary_prompt_suffix": "\n\n---\nHier ist die jüngste Konversation:",
-        "tool_trimmed": "... [Werkzeugausgaben gekürzt]\n{content}",
-        "content_collapsed": "\n... [Inhalt ausgeblendet] ...\n",
-    },
-    "es-ES": {
-        "status_context_usage": "Uso del contexto (estimado): {tokens} / {max_tokens} Tokens ({ratio}%)",
-        "status_high_usage": " | ⚠️ Uso elevado",
-        "status_loaded_summary": "Resumen histórico cargado ({count} mensajes históricos ocultos)",
-        "status_context_summary_updated": "Resumen del contexto actualizado: {tokens} / {max_tokens} Tokens ({ratio}%)",
-        "status_generating_summary": "Generando resumen del contexto en segundo plano...",
-        "status_summary_error": "Error de resumen: {error}",
-        "summary_prompt_prefix": "【Resumen anterior: El siguiente es un resumen de la conversación histórica, proporcionado solo como contexto. No responda al contenido del resumen en sí; responda directamente a las preguntas más recientes.】\n\n",
-        "summary_prompt_suffix": "\n\n---\nA continuación se muestra la conversación reciente:",
-        "tool_trimmed": "... [Salidas de herramientas recortadas]\n{content}",
-        "content_collapsed": "\n... [Contenido contraído] ...\n",
-    },
-    "it-IT": {
-        "status_context_usage": "Utilizzo contesto (stimato): {tokens} / {max_tokens} Token ({ratio}%)",
-        "status_high_usage": " | ⚠️ Utilizzo elevato",
-        "status_loaded_summary": "Riepilogo storico caricato ({count} messaggi storici nascosti)",
-        "status_context_summary_updated": "Riepilogo contesto aggiornato: {tokens} / {max_tokens} Token ({ratio}%)",
-        "status_generating_summary": "Generazione riepilogo contesto in background...",
-        "status_summary_error": "Errore riepilogo: {error}",
-        "summary_prompt_prefix": "【Riepilogo precedente: Il seguente è un riepilogo della conversazione storica, fornito solo per contesto. Non rispondere al contenuto del riepilogo stesso; rispondi direttamente alle domande più recenti.】\n\n",
-        "summary_prompt_suffix": "\n\n---\nDi seguito è riportata la conversazione recente:",
-        "tool_trimmed": "... [Output degli strumenti tagliati]\n{content}",
-        "content_collapsed": "\n... [Contenuto compresso] ...\n",
-    },
-}

# Global cache for tiktoken encoding
TIKTOKEN_ENCODING = None
if tiktoken:
@@ -525,26 +400,6 @@ if tiktoken:
         logger.error(f"[Init] Failed to load tiktoken encoding: {e}")
-
-
-@lru_cache(maxsize=1024)
-def _get_cached_tokens(text: str) -> int:
-    """Calculates tokens with LRU caching for exact string matches."""
-    if not text:
-        return 0
-    if TIKTOKEN_ENCODING:
-        try:
-            # tiktoken logic is relatively fast, but caching it based on exact string match
-            # turns O(N) encoding time to O(1) dictionary lookup for historical messages.
-            return len(TIKTOKEN_ENCODING.encode(text))
-        except Exception as e:
-            logger.warning(
-                f"[Token Count] tiktoken error: {e}, falling back to character estimation"
-            )
-            pass
-
-    # Fallback strategy: Rough estimation (1 token ≈ 4 chars)
-    return len(text) // 4
-
-
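For reference, the memoization idea being removed here (count tokens once per unique string, then serve repeats from an LRU cache) can be reproduced standalone. This is a minimal sketch, not the plugin's code: the encoding name and function names are illustrative assumptions, and it degrades to the same ~4-characters-per-token heuristic when `tiktoken` is unavailable.

```python
from functools import lru_cache

try:
    import tiktoken
    _ENC = tiktoken.get_encoding("cl100k_base")
except Exception:  # tiktoken missing or failed to load: use the heuristic below
    _ENC = None


@lru_cache(maxsize=1024)
def cached_token_count(text: str) -> int:
    """Count tokens once per unique string; repeated calls become dict lookups."""
    if not text:
        return 0
    if _ENC is not None:
        return len(_ENC.encode(text))
    return len(text) // 4  # rough heuristic: ~4 characters per token
```

Because chat history re-sends the same historical messages on every turn, exact-string memoization turns repeated O(N) encodings into O(1) lookups — at the cost of holding up to `maxsize` strings alive, which is presumably why the diff inlines an uncached version instead.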
 class Filter:
     def __init__(self):
         self.valves = self.Valves()
@@ -554,105 +409,8 @@ class Filter:
             sessionmaker(bind=self._db_engine) if self._db_engine else None
         )
         self._model_thresholds_cache: Optional[Dict[str, Any]] = None
-
-        # Fallback mapping for variants not in TRANSLATIONS keys
-        self.fallback_map = {
-            "es-AR": "es-ES",
-            "es-MX": "es-ES",
-            "fr-CA": "fr-FR",
-            "en-CA": "en-US",
-            "en-GB": "en-US",
-            "en-AU": "en-US",
-            "de-AT": "de-DE",
-        }
-
         self._init_database()
-
-    def _resolve_language(self, lang: str) -> str:
-        """Resolve the best matching language code from the TRANSLATIONS dict."""
-        target_lang = lang
-
-        # 1. Direct match
-        if target_lang in TRANSLATIONS:
-            return target_lang
-
-        # 2. Variant fallback (explicit mapping)
-        if target_lang in self.fallback_map:
-            target_lang = self.fallback_map[target_lang]
-            if target_lang in TRANSLATIONS:
-                return target_lang
-
-        # 3. Base language fallback (e.g. fr-BE -> fr-FR)
-        if "-" in lang:
-            base_lang = lang.split("-")[0]
-            for supported_lang in TRANSLATIONS:
-                if supported_lang.startswith(base_lang + "-"):
-                    return supported_lang
-
-        # 4. Final Fallback to en-US
-        return "en-US"
-
-    def _get_translation(self, lang: str, key: str, **kwargs) -> str:
-        """Get translated string for the given language and key."""
-        target_lang = self._resolve_language(lang)
-        lang_dict = TRANSLATIONS.get(target_lang, TRANSLATIONS["en-US"])
-        text = lang_dict.get(key, TRANSLATIONS["en-US"].get(key, key))
-        if kwargs:
-            try:
-                text = text.format(**kwargs)
-            except Exception as e:
-                logger.warning(f"Translation formatting failed for {key}: {e}")
-        return text
-
-    async def _get_user_context(
-        self,
-        __user__: Optional[Dict[str, Any]],
-        __event_call__: Optional[Callable[[Any], Awaitable[None]]] = None,
-    ) -> Dict[str, str]:
-        """Extract basic user context with safe fallbacks."""
-        if isinstance(__user__, (list, tuple)):
-            user_data = __user__[0] if __user__ else {}
-        elif isinstance(__user__, dict):
-            user_data = __user__
-        else:
-            user_data = {}
-
-        user_id = user_data.get("id", "unknown_user")
-        user_name = user_data.get("name", "User")
-        user_language = user_data.get("language", "en-US")
-
-        if __event_call__:
-            try:
-                js_code = """
-                return (
-                    document.documentElement.lang ||
-                    localStorage.getItem('locale') ||
-                    localStorage.getItem('language') ||
-                    navigator.language ||
-                    'en-US'
-                );
-                """
-                frontend_lang = await asyncio.wait_for(
-                    __event_call__({"type": "execute", "data": {"code": js_code}}),
-                    timeout=1.0,
-                )
-                if frontend_lang and isinstance(frontend_lang, str):
-                    user_language = frontend_lang
-            except asyncio.TimeoutError:
-                logger.warning(
-                    "Failed to retrieve frontend language: Timeout (using fallback)"
-                )
-            except Exception as e:
-                logger.warning(
-                    f"Failed to retrieve frontend language: {type(e).__name__}: {e}"
-                )
-
-        return {
-            "user_id": user_id,
-            "user_name": user_name,
-            "user_language": user_language,
-        }
-
     def _parse_model_thresholds(self) -> Dict[str, Any]:
         """Parse model_thresholds string into a dictionary.
 
@@ -816,7 +574,7 @@ class Filter:
             description="The temperature for summary generation.",
         )
         debug_mode: bool = Field(
-            default=False, description="Enable detailed logging for debugging."
+            default=True, description="Enable detailed logging for debugging."
         )
         show_debug_log: bool = Field(
             default=False, description="Show debug logs in the frontend console"
@@ -824,12 +582,6 @@ class Filter:
         show_token_usage_status: bool = Field(
             default=True, description="Show token usage status notification"
         )
-        token_usage_status_threshold: int = Field(
-            default=80,
-            ge=0,
-            le=100,
-            description="Only show token usage status when usage exceeds this percentage (0-100). Set to 0 to always show.",
-        )
         enable_tool_output_trimming: bool = Field(
             default=False,
             description="Enable trimming of large tool outputs (only works with native function calling).",
@@ -902,7 +654,20 @@ class Filter:
 
     def _count_tokens(self, text: str) -> int:
         """Counts the number of tokens in the text."""
-        return _get_cached_tokens(text)
+        if not text:
+            return 0
+
+        if TIKTOKEN_ENCODING:
+            try:
+                return len(TIKTOKEN_ENCODING.encode(text))
+            except Exception as e:
+                if self.valves.debug_mode:
+                    logger.warning(
+                        f"[Token Count] tiktoken error: {e}, falling back to character estimation"
+                    )
+
+        # Fallback strategy: Rough estimation (1 token ≈ 4 chars)
+        return len(text) // 4
 
     def _calculate_messages_tokens(self, messages: List[Dict]) -> int:
         """Calculates the total tokens for a list of messages."""
@@ -928,20 +693,6 @@ class Filter:
 
         return total_tokens
 
-    def _estimate_messages_tokens(self, messages: List[Dict]) -> int:
-        """Fast estimation of tokens based on character count (1/4 ratio)."""
-        total_chars = 0
-        for msg in messages:
-            content = msg.get("content", "")
-            if isinstance(content, list):
-                for part in content:
-                    if isinstance(part, dict) and part.get("type") == "text":
-                        total_chars += len(part.get("text", ""))
-            else:
-                total_chars += len(str(content))
-
-        return total_chars // 4
-
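The deleted `_estimate_messages_tokens` traded accuracy for speed: instead of encoding every message, it summed character counts (covering both plain-string content and OpenAI-style multimodal content lists) and divided by four. As a standalone function the same logic looks like this:

```python
def estimate_messages_tokens(messages: list[dict]) -> int:
    """Fast token estimate: total characters // 4, handling both string
    content and multimodal content lists of {"type": "text", ...} parts."""
    total_chars = 0
    for msg in messages:
        content = msg.get("content", "")
        if isinstance(content, list):
            for part in content:
                if isinstance(part, dict) and part.get("type") == "text":
                    total_chars += len(part.get("text", ""))
        else:
            total_chars += len(str(content))
    return total_chars // 4
```

Non-text parts (images, tool payloads) contribute nothing to the estimate, which is one reason the diff drops this path in favor of always calling the precise counter.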
     def _get_model_thresholds(self, model_id: str) -> Dict[str, int]:
         """Gets threshold configuration for a specific model.
 
@@ -1079,13 +830,11 @@ class Filter:
             }})();
             """
 
-            asyncio.create_task(
-                __event_call__(
-                    {
-                        "type": "execute",
-                        "data": {"code": js_code},
-                    }
-                )
-            )
+            await __event_call__(
+                {
+                    "type": "execute",
+                    "data": {"code": js_code},
+                }
+            )
         except Exception as e:
             logger.error(f"Error emitting debug log: {e}")
@@ -1127,55 +876,17 @@ class Filter:
             js_code = f"""
            console.log("%c[Compression] {safe_message}", "{css}");
             """
-            asyncio.create_task(
-                event_call({"type": "execute", "data": {"code": js_code}})
+            # Add timeout to prevent blocking if frontend connection is broken
+            await asyncio.wait_for(
+                event_call({"type": "execute", "data": {"code": js_code}}),
+                timeout=2.0,
             )
+        except asyncio.TimeoutError:
+            logger.warning(
+                f"Failed to emit log to frontend: Timeout (connection may be broken)"
+            )
         except Exception as e:
-            logger.error(
-                f"Failed to process log to frontend: {type(e).__name__}: {e}"
-            )
+            logger.error(f"Failed to emit log to frontend: {type(e).__name__}: {e}")
-
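The change above replaces fire-and-forget `asyncio.create_task` with an awaited call bounded by `asyncio.wait_for`, so a broken frontend connection fails fast instead of hanging or leaking tasks. The pattern in isolation (the event callable below is a stand-in simulating a stalled frontend, not the plugin's API):

```python
import asyncio


async def emit_with_timeout(event_call, payload: dict, timeout: float = 2.0) -> bool:
    """Fire an event but give up after `timeout` seconds instead of blocking."""
    try:
        await asyncio.wait_for(event_call(payload), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        return False  # caller can log and continue


async def stalled_frontend(_payload):
    # Simulated broken connection: the awaited call never completes in time.
    await asyncio.sleep(10)


ok = asyncio.run(emit_with_timeout(stalled_frontend, {"type": "execute"}, timeout=0.05))
```

`asyncio.wait_for` cancels the inner coroutine on timeout, so the stalled call does not linger after the function returns.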
-    def _should_show_status(self, usage_ratio: float) -> bool:
-        """
-        Check if token usage status should be shown based on threshold.
-
-        Args:
-            usage_ratio: Current usage ratio (0.0 to 1.0)
-
-        Returns:
-            True if status should be shown, False otherwise
-        """
-        if not self.valves.show_token_usage_status:
-            return False
-
-        # If threshold is 0, always show
-        if self.valves.token_usage_status_threshold == 0:
-            return True
-
-        # Check if usage exceeds threshold
-        threshold_ratio = self.valves.token_usage_status_threshold / 100.0
-        return usage_ratio >= threshold_ratio
-
-    def _should_skip_compression(
-        self, body: dict, __model__: Optional[dict] = None
-    ) -> bool:
-        """
-        Check if compression should be skipped.
-        Returns True if:
-        1. The base model includes 'copilot_sdk'
-        """
-        # Check if base model includes copilot_sdk
-        if __model__:
-            base_model_id = __model__.get("base_model_id", "")
-            if "copilot_sdk" in base_model_id.lower():
-                return True
-
-        # Also check model in body
-        model_id = body.get("model", "")
-        if "copilot_sdk" in model_id.lower():
-            return True
-
-        return False
-
     async def inlet(
         self,
@@ -1192,19 +903,6 @@ class Filter:
         Compression Strategy: Only responsible for injecting existing summaries, no Token calculation.
         """
-
-        # Check if compression should be skipped (e.g., for copilot_sdk)
-        if self._should_skip_compression(body, __model__):
-            if self.valves.debug_mode:
-                logger.info(
-                    "[Inlet] Skipping compression: copilot_sdk detected in base model"
-                )
-            if self.valves.show_debug_log and __event_call__:
-                await self._log(
-                    "[Inlet] ⏭️ Skipping compression: copilot_sdk detected",
-                    event_call=__event_call__,
-                )
-            return body
-
         messages = body.get("messages", [])
 
         # --- Native Tool Output Trimming (Opt-in, only for native function calling) ---
@@ -1268,14 +966,8 @@ class Filter:
                         final_answer = content[last_match_end:].strip()
 
                         if final_answer:
-                            msg["content"] = self._get_translation(
-                                (
-                                    __user__.get("language", "en-US")
-                                    if __user__
-                                    else "en-US"
-                                ),
-                                "tool_trimmed",
-                                content=final_answer,
-                            )
+                            msg["content"] = (
+                                f"... [Tool outputs trimmed]\n{final_answer}"
+                            )
                             trimmed_count += 1
                         else:
@@ -1288,14 +980,8 @@ class Filter:
                     if len(parts) > 1:
                         final_answer = parts[-1].strip()
                         if final_answer:
-                            msg["content"] = self._get_translation(
-                                (
-                                    __user__.get("language", "en-US")
-                                    if __user__
-                                    else "en-US"
-                                ),
-                                "tool_trimmed",
-                                content=final_answer,
-                            )
+                            msg["content"] = (
+                                f"... [Tool outputs trimmed]\n{final_answer}"
+                            )
                             trimmed_count += 1
 
@@ -1487,10 +1173,6 @@ class Filter:
         # Target is to compress up to the (total - keep_last) message
         target_compressed_count = max(0, len(messages) - self.valves.keep_last)
-
-        # Get user context for i18n
-        user_ctx = await self._get_user_context(__user__, __event_call__)
-        lang = user_ctx["user_language"]
 
         await self._log(
             f"[Inlet] Recorded target compression progress: {target_compressed_count}",
             event_call=__event_call__,
@@ -1525,9 +1207,10 @@ class Filter:
 
         # 2. Summary message (Inserted as Assistant message)
         summary_content = (
-            self._get_translation(lang, "summary_prompt_prefix")
-            + f"{summary_record.summary}"
-            + self._get_translation(lang, "summary_prompt_suffix")
+            f"【Previous Summary: The following is a summary of the historical conversation, provided for context only. Do not reply to the summary content itself; answer the subsequent latest questions directly.】\n\n"
+            f"{summary_record.summary}\n\n"
+            f"---\n"
+            f"Below is the recent conversation:"
         )
         summary_msg = {"role": "assistant", "content": summary_content}
 
@@ -1566,27 +1249,16 @@ class Filter:
             "max_context_tokens", self.valves.max_context_tokens
         )
 
-        # --- Fast Estimation Check ---
-        estimated_tokens = self._estimate_messages_tokens(calc_messages)
-
-        # Since this is a hard limit check, only skip precise calculation if we are far below it (margin of 15%)
-        if estimated_tokens < max_context_tokens * 0.85:
-            total_tokens = estimated_tokens
-            await self._log(
-                f"[Inlet] 🔎 Fast Preflight Check (Est): {total_tokens}t / {max_context_tokens}t (Well within limit)",
-                event_call=__event_call__,
-            )
-        else:
-            # Calculate exact total tokens via tiktoken
-            total_tokens = await asyncio.to_thread(
-                self._calculate_messages_tokens, calc_messages
-            )
-
-            # Preflight Check Log
-            await self._log(
-                f"[Inlet] 🔎 Precise Preflight Check: {total_tokens}t / {max_context_tokens}t ({(total_tokens/max_context_tokens*100):.1f}%)",
-                event_call=__event_call__,
-            )
+        # Calculate total tokens
+        total_tokens = await asyncio.to_thread(
+            self._calculate_messages_tokens, calc_messages
+        )
+
+        # Preflight Check Log
+        await self._log(
+            f"[Inlet] 🔎 Preflight Check: {total_tokens}t / {max_context_tokens}t ({(total_tokens/max_context_tokens*100):.1f}%)",
+            event_call=__event_call__,
+        )
 
         # If over budget, reduce history (Keep Last)
         if total_tokens > max_context_tokens:
@@ -1653,9 +1325,7 @@ class Filter:
                         first_line_found = True
                         # Add placeholder if there's more content coming
                         if idx < last_line_idx:
-                            kept_lines.append(
-                                self._get_translation(lang, "content_collapsed")
-                            )
+                            kept_lines.append("\n... [Content collapsed] ...\n")
                         continue
 
                     # Keep last non-empty line
@@ -1677,13 +1347,8 @@ class Filter:
                 target_msg["metadata"]["is_trimmed"] = True
 
                 # Calculate token reduction
-                # Use current token strategy
-                if total_tokens == estimated_tokens:
-                    old_tokens = len(content) // 4
-                    new_tokens = len(target_msg["content"]) // 4
-                else:
-                    old_tokens = self._count_tokens(content)
-                    new_tokens = self._count_tokens(target_msg["content"])
+                old_tokens = self._count_tokens(content)
+                new_tokens = self._count_tokens(target_msg["content"])
                 diff = old_tokens - new_tokens
                 total_tokens -= diff
 
@@ -1697,12 +1362,7 @@ class Filter:
                 # Strategy 2: Fallback - Drop Oldest Message Entirely (FIFO)
                 # (User requested to remove progressive trimming for other cases)
                 dropped = tail_messages.pop(0)
-                if total_tokens == estimated_tokens:
-                    dropped_tokens = len(str(dropped.get("content", ""))) // 4
-                else:
-                    dropped_tokens = self._count_tokens(
-                        str(dropped.get("content", ""))
-                    )
+                dropped_tokens = self._count_tokens(str(dropped.get("content", "")))
                 total_tokens -= dropped_tokens
 
                 if self.valves.show_debug_log and __event_call__:
@@ -1722,24 +1382,14 @@ class Filter:
         final_messages = candidate_messages
 
         # Calculate detailed token stats for logging
-        if total_tokens == estimated_tokens:
-            system_tokens = (
-                len(system_prompt_msg.get("content", "")) // 4
-                if system_prompt_msg
-                else 0
-            )
-            head_tokens = self._estimate_messages_tokens(head_messages)
-            summary_tokens = len(summary_content) // 4
-            tail_tokens = self._estimate_messages_tokens(tail_messages)
-        else:
-            system_tokens = (
-                self._count_tokens(system_prompt_msg.get("content", ""))
-                if system_prompt_msg
-                else 0
-            )
-            head_tokens = self._calculate_messages_tokens(head_messages)
-            summary_tokens = self._count_tokens(summary_content)
-            tail_tokens = self._calculate_messages_tokens(tail_messages)
+        system_tokens = (
+            self._count_tokens(system_prompt_msg.get("content", ""))
+            if system_prompt_msg
+            else 0
+        )
+        head_tokens = self._calculate_messages_tokens(head_messages)
+        summary_tokens = self._count_tokens(summary_content)
+        tail_tokens = self._calculate_messages_tokens(tail_messages)
 
         system_info = (
             f"System({system_tokens}t)" if system_prompt_msg else "System(0t)"
@@ -1758,43 +1408,22 @@ class Filter:
         # Prepare status message (Context Usage format)
         if max_context_tokens > 0:
             usage_ratio = total_section_tokens / max_context_tokens
-            # Only show status if threshold is met
-            if self._should_show_status(usage_ratio):
-                status_msg = self._get_translation(
-                    lang,
-                    "status_context_usage",
-                    tokens=total_section_tokens,
-                    max_tokens=max_context_tokens,
-                    ratio=f"{usage_ratio*100:.1f}",
-                )
-                if usage_ratio > 0.9:
-                    status_msg += self._get_translation(lang, "status_high_usage")
-
-                if __event_emitter__:
-                    await __event_emitter__(
-                        {
-                            "type": "status",
-                            "data": {
-                                "description": status_msg,
-                                "done": True,
-                            },
-                        }
-                    )
+            status_msg = f"Context Usage (Estimated): {total_section_tokens} / {max_context_tokens} Tokens ({usage_ratio*100:.1f}%)"
+            if usage_ratio > 0.9:
+                status_msg += " | ⚠️ High Usage"
         else:
-            # For the case where max_context_tokens is 0, show summary info without threshold check
-            if self.valves.show_token_usage_status and __event_emitter__:
-                status_msg = self._get_translation(
-                    lang, "status_loaded_summary", count=compressed_count
-                )
-                await __event_emitter__(
-                    {
-                        "type": "status",
-                        "data": {
-                            "description": status_msg,
-                            "done": True,
-                        },
-                    }
-                )
+            status_msg = f"Loaded historical summary (Hidden {compressed_count} historical messages)"
+
+        if __event_emitter__:
+            await __event_emitter__(
+                {
+                    "type": "status",
+                    "data": {
+                        "description": status_msg,
+                        "done": True,
+                    },
+                }
+            )
 
         # Emit debug log to frontend (Keep the structured log as well)
         await self._emit_debug_log(
@@ -1825,20 +1454,9 @@ class Filter:
             "max_context_tokens", self.valves.max_context_tokens
         )
 
-        # --- Fast Estimation Check ---
-        estimated_tokens = self._estimate_messages_tokens(calc_messages)
-
-        # Only skip precise calculation if we are clearly below the limit
-        if estimated_tokens < max_context_tokens * 0.85:
-            total_tokens = estimated_tokens
-            await self._log(
-                f"[Inlet] 🔎 Fast limit check (Est): {total_tokens}t / {max_context_tokens}t",
-                event_call=__event_call__,
-            )
-        else:
-            total_tokens = await asyncio.to_thread(
-                self._calculate_messages_tokens, calc_messages
-            )
+        total_tokens = await asyncio.to_thread(
+            self._calculate_messages_tokens, calc_messages
+        )
 
         if total_tokens > max_context_tokens:
             await self._log(
@@ -1858,12 +1476,7 @@ class Filter:
                 > start_trim_index + 1  # Keep at least 1 message after keep_first
             ):
                 dropped = final_messages.pop(start_trim_index)
-                if total_tokens == estimated_tokens:
-                    dropped_tokens = len(str(dropped.get("content", ""))) // 4
-                else:
-                    dropped_tokens = self._count_tokens(
-                        str(dropped.get("content", ""))
-                    )
+                dropped_tokens = self._count_tokens(str(dropped.get("content", "")))
                 total_tokens -= dropped_tokens
 
                 await self._log(
@@ -1872,30 +1485,23 @@ class Filter:
                 )
 
         # Send status notification (Context Usage format)
-        if max_context_tokens > 0:
-            usage_ratio = total_tokens / max_context_tokens
-            # Only show status if threshold is met
-            if self._should_show_status(usage_ratio):
-                status_msg = self._get_translation(
-                    lang,
-                    "status_context_usage",
-                    tokens=total_tokens,
-                    max_tokens=max_context_tokens,
-                    ratio=f"{usage_ratio*100:.1f}",
-                )
-                if usage_ratio > 0.9:
-                    status_msg += self._get_translation(lang, "status_high_usage")
-
-                if __event_emitter__:
-                    await __event_emitter__(
-                        {
-                            "type": "status",
-                            "data": {
-                                "description": status_msg,
-                                "done": True,
-                            },
-                        }
-                    )
+        if __event_emitter__:
+            status_msg = f"Context Usage (Estimated): {total_tokens} / {max_context_tokens} Tokens"
+            if max_context_tokens > 0:
+                usage_ratio = total_tokens / max_context_tokens
+                status_msg += f" ({usage_ratio*100:.1f}%)"
+                if usage_ratio > 0.9:
+                    status_msg += " | ⚠️ High Usage"
+
+            await __event_emitter__(
+                {
+                    "type": "status",
+                    "data": {
+                        "description": status_msg,
+                        "done": True,
+                    },
+                }
+            )
 
         body["messages"] = final_messages
 
@@ -1911,7 +1517,6 @@ class Filter:
|
|||||||
body: dict,
|
body: dict,
|
||||||
__user__: Optional[dict] = None,
|
__user__: Optional[dict] = None,
|
||||||
__metadata__: dict = None,
|
__metadata__: dict = None,
|
||||||
__model__: dict = None,
|
|
||||||
__event_emitter__: Callable[[Any], Awaitable[None]] = None,
|
__event_emitter__: Callable[[Any], Awaitable[None]] = None,
|
||||||
__event_call__: Callable[[Any], Awaitable[None]] = None,
|
__event_call__: Callable[[Any], Awaitable[None]] = None,
|
||||||
) -> dict:
|
) -> dict:
|
||||||
@@ -1919,23 +1524,6 @@ class Filter:
|
|||||||
Executed after the LLM response is complete.
|
Executed after the LLM response is complete.
|
||||||
Calculates Token count in the background and triggers summary generation (does not block current response, does not affect content output).
|
Calculates Token count in the background and triggers summary generation (does not block current response, does not affect content output).
|
||||||
"""
|
"""
|
||||||
# Check if compression should be skipped (e.g., for copilot_sdk)
|
|
||||||
if self._should_skip_compression(body, __model__):
|
|
||||||
if self.valves.debug_mode:
|
|
||||||
logger.info(
|
|
||||||
"[Outlet] Skipping compression: copilot_sdk detected in base model"
|
|
||||||
)
|
|
||||||
if self.valves.show_debug_log and __event_call__:
|
|
||||||
await self._log(
|
|
||||||
"[Outlet] ⏭️ Skipping compression: copilot_sdk detected",
|
|
||||||
event_call=__event_call__,
|
|
||||||
)
|
|
||||||
return body
|
|
||||||
|
|
||||||
# Get user context for i18n
|
|
||||||
user_ctx = await self._get_user_context(__user__, __event_call__)
|
|
||||||
lang = user_ctx["user_language"]
|
|
||||||
|
|
||||||
chat_ctx = self._get_chat_context(body, __metadata__)
|
chat_ctx = self._get_chat_context(body, __metadata__)
|
||||||
chat_id = chat_ctx["chat_id"]
|
chat_id = chat_ctx["chat_id"]
|
||||||
if not chat_id:
|
if not chat_id:
|
||||||
@@ -1959,7 +1547,6 @@ class Filter:
|
|||||||
body,
|
body,
|
||||||
__user__,
|
__user__,
|
||||||
target_compressed_count,
|
target_compressed_count,
|
||||||
lang,
|
|
||||||
__event_emitter__,
|
__event_emitter__,
|
||||||
__event_call__,
|
__event_call__,
|
||||||
)
|
)
|
||||||
@@ -1974,7 +1561,6 @@ class Filter:
|
|||||||
body: dict,
|
body: dict,
|
||||||
user_data: Optional[dict],
|
user_data: Optional[dict],
|
||||||
target_compressed_count: Optional[int],
|
target_compressed_count: Optional[int],
|
||||||
lang: str = "en-US",
|
|
||||||
__event_emitter__: Callable[[Any], Awaitable[None]] = None,
|
__event_emitter__: Callable[[Any], Awaitable[None]] = None,
|
||||||
__event_call__: Callable[[Any], Awaitable[None]] = None,
|
__event_call__: Callable[[Any], Awaitable[None]] = None,
|
||||||
):
|
):
|
||||||
@@ -2009,58 +1595,37 @@ class Filter:
                 event_call=__event_call__,
             )

-            # --- Fast Estimation Check ---
-            estimated_tokens = self._estimate_messages_tokens(messages)
+            # Calculate Token count in a background thread
+            current_tokens = await asyncio.to_thread(
+                self._calculate_messages_tokens, messages
+            )

-            # For triggering summary generation, we need to be more precise if we are in the grey zone
-            # Margin is 15% (skip tiktoken if estimated is < 85% of threshold)
-            # Note: We still use tiktoken if we exceed threshold, because we want an accurate usage status report
-            if estimated_tokens < compression_threshold_tokens * 0.85:
-                current_tokens = estimated_tokens
-                await self._log(
-                    f"[🔍 Background Calculation] Fast estimate ({current_tokens}) is well below threshold ({compression_threshold_tokens}). Skipping tiktoken.",
-                    event_call=__event_call__,
-                )
-            else:
-                # Calculate Token count precisely in a background thread
-                current_tokens = await asyncio.to_thread(
-                    self._calculate_messages_tokens, messages
-                )
-                await self._log(
-                    f"[🔍 Background Calculation] Precise token count: {current_tokens}",
-                    event_call=__event_call__,
-                )
+            await self._log(
+                f"[🔍 Background Calculation] Token count: {current_tokens}",
+                event_call=__event_call__,
+            )

             # Send status notification (Context Usage format)
-            if __event_emitter__:
+            if __event_emitter__ and self.valves.show_token_usage_status:
                 max_context_tokens = thresholds.get(
                     "max_context_tokens", self.valves.max_context_tokens
                 )
+                status_msg = f"Context Usage (Estimated): {current_tokens} / {max_context_tokens} Tokens"
                 if max_context_tokens > 0:
                     usage_ratio = current_tokens / max_context_tokens
-                    # Only show status if threshold is met
-                    if self._should_show_status(usage_ratio):
-                        status_msg = self._get_translation(
-                            lang,
-                            "status_context_usage",
-                            tokens=current_tokens,
-                            max_tokens=max_context_tokens,
-                            ratio=f"{usage_ratio*100:.1f}",
-                        )
-                        if usage_ratio > 0.9:
-                            status_msg += self._get_translation(
-                                lang, "status_high_usage"
-                            )
+                    status_msg += f" ({usage_ratio*100:.1f}%)"
+                    if usage_ratio > 0.9:
+                        status_msg += " | ⚠️ High Usage"

                     await __event_emitter__(
                         {
                             "type": "status",
                             "data": {
                                 "description": status_msg,
                                 "done": True,
                             },
                         }
                     )

             # Check if compression is needed
             if current_tokens >= compression_threshold_tokens:
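The fast-estimation logic removed in the hunk above (cheap length-based estimate, with a 15% safety margin before falling back to a precise count off the event loop) can be sketched as follows. This is a minimal standalone illustration, not the plugin's code: `estimate_messages_tokens` and `calculate_messages_tokens` are hypothetical stand-ins for the filter's `_estimate_messages_tokens` and tiktoken-based `_calculate_messages_tokens` methods.

```python
import asyncio


def estimate_messages_tokens(messages: list[dict]) -> int:
    # Cheap heuristic assumed here: roughly 4 characters per token.
    return sum(len(m.get("content", "")) for m in messages) // 4


def calculate_messages_tokens(messages: list[dict]) -> int:
    # Placeholder for a precise (and blocking) tokenizer-based count,
    # e.g. tiktoken in the real plugin.
    return estimate_messages_tokens(messages)


async def count_tokens(messages: list[dict], threshold: int) -> int:
    estimated = estimate_messages_tokens(messages)
    # 15% margin: trust the cheap estimate only when it is well below threshold.
    if estimated < threshold * 0.85:
        return estimated
    # In the grey zone or above threshold, run the precise count in a
    # background thread so the event loop is not blocked.
    return await asyncio.to_thread(calculate_messages_tokens, messages)


msgs = [{"role": "user", "content": "x" * 400}]
print(asyncio.run(count_tokens(msgs, threshold=1000)))  # fast path: 100
```

The margin exists because the estimate only needs to be accurate near the compression threshold; far below it, skipping the tokenizer saves a blocking call per request.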
@@ -2077,7 +1642,6 @@ class Filter:
                 body,
                 user_data,
                 target_compressed_count,
-                lang,
                 __event_emitter__,
                 __event_call__,
             )
@@ -2108,7 +1672,6 @@ class Filter:
         body: dict,
         user_data: Optional[dict],
         target_compressed_count: Optional[int],
-        lang: str = "en-US",
         __event_emitter__: Callable[[Any], Awaitable[None]] = None,
         __event_call__: Callable[[Any], Awaitable[None]] = None,
     ):
@@ -2248,9 +1811,7 @@ class Filter:
                     {
                         "type": "status",
                         "data": {
-                            "description": self._get_translation(
-                                lang, "status_generating_summary"
-                            ),
+                            "description": "Generating context summary in background...",
                             "done": False,
                         },
                     }
@@ -2288,11 +1849,7 @@ class Filter:
                     {
                         "type": "status",
                         "data": {
-                            "description": self._get_translation(
-                                lang,
-                                "status_loaded_summary",
-                                count=len(middle_messages),
-                            ),
+                            "description": f"Context summary updated (Compressed {len(middle_messages)} messages)",
                             "done": True,
                         },
                     }
@@ -2353,9 +1910,10 @@ class Filter:

             # Summary
             summary_content = (
-                self._get_translation(lang, "summary_prompt_prefix")
-                + f"{new_summary}"
-                + self._get_translation(lang, "summary_prompt_suffix")
+                f"【System Prompt: The following is a summary of the historical conversation, provided for context only. Do not reply to the summary content itself; answer the subsequent latest questions directly.】\n\n"
+                f"{new_summary}\n\n"
+                f"---\n"
+                f"Below is the recent conversation:"
             )
             summary_msg = {"role": "assistant", "content": summary_content}

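The summary message assembled in the hunk above can be sketched in isolation as follows. This is a minimal illustration assuming only what the added lines show: the hardcoded English prompt wrapper and the `{"role": "assistant", ...}` message shape; the helper name `build_summary_message` is hypothetical.

```python
def build_summary_message(new_summary: str) -> dict:
    # Wrap the generated summary in the fixed prompt frame from the diff:
    # a system-style preamble, the summary body, then a separator leading
    # into the recent (uncompressed) conversation.
    summary_content = (
        "【System Prompt: The following is a summary of the historical "
        "conversation, provided for context only. Do not reply to the "
        "summary content itself; answer the subsequent latest questions "
        "directly.】\n\n"
        f"{new_summary}\n\n"
        "---\n"
        "Below is the recent conversation:"
    )
    return {"role": "assistant", "content": summary_content}


msg = build_summary_message("User asked about weather; assistant answered.")
print(msg["role"])  # assistant
```

In the plugin this message would replace the compressed middle of the history, so the model sees one assistant turn carrying the summary followed by the recent messages verbatim.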
@@ -2385,32 +1943,23 @@ class Filter:
                 max_context_tokens = thresholds.get(
                     "max_context_tokens", self.valves.max_context_tokens
                 )
-                # 6. Emit Status (only if threshold is met)
+                # 6. Emit Status
+                status_msg = f"Context Summary Updated: {token_count} / {max_context_tokens} Tokens"
                 if max_context_tokens > 0:
-                    usage_ratio = token_count / max_context_tokens
-                    # Only show status if threshold is met
-                    if self._should_show_status(usage_ratio):
-                        status_msg = self._get_translation(
-                            lang,
-                            "status_context_summary_updated",
-                            tokens=token_count,
-                            max_tokens=max_context_tokens,
-                            ratio=f"{usage_ratio*100:.1f}",
-                        )
-                        if usage_ratio > 0.9:
-                            status_msg += self._get_translation(
-                                lang, "status_high_usage"
-                            )
+                    ratio = (token_count / max_context_tokens) * 100
+                    status_msg += f" ({ratio:.1f}%)"
+                    if ratio > 90.0:
+                        status_msg += " | ⚠️ High Usage"

                     await __event_emitter__(
                         {
                             "type": "status",
                             "data": {
                                 "description": status_msg,
                                 "done": True,
                             },
                         }
                     )
             except Exception as e:
                 await self._log(
                     f"[Status] Error calculating tokens: {e}",
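The status-string construction on the added side of the hunk above can be sketched as a small pure function. This follows the diff's added lines (base message, percentage, high-usage flag above 90%); the function name `format_usage_status` is hypothetical.

```python
def format_usage_status(token_count: int, max_context_tokens: int) -> str:
    # Base message always present, mirroring the added lines in the diff.
    status_msg = f"Context Summary Updated: {token_count} / {max_context_tokens} Tokens"
    if max_context_tokens > 0:
        # Percentage of the context window in use.
        ratio = (token_count / max_context_tokens) * 100
        status_msg += f" ({ratio:.1f}%)"
        # Warn once usage crosses 90% of the window.
        if ratio > 90.0:
            status_msg += " | ⚠️ High Usage"
    return status_msg


print(format_usage_status(950, 1000))
# Context Summary Updated: 950 / 1000 Tokens (95.0%) | ⚠️ High Usage
```

The `max_context_tokens > 0` guard avoids a division by zero when no window size is configured, in which case only the raw token count is shown.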
@@ -2430,9 +1979,7 @@ class Filter:
                     {
                         "type": "status",
                         "data": {
-                            "description": self._get_translation(
-                                lang, "status_summary_error", error=str(e)[:100]
-                            ),
+                            "description": f"Summary Error: {str(e)[:100]}...",
                             "done": True,
                         },
                     }
File diff suppressed because it is too large