feat(filters): release v1.3.0 for async context compression
- Add native i18n support across 9 languages
- Implement non-blocking frontend log emission for zero TTFB delay
- Add `token_usage_status_threshold` to intelligently control status notifications
- Automatically detect and skip compression for copilot_sdk models
- Set `debug_mode` default to `false` for a quieter production environment
- Update documentation and remove legacy bilingual code
@@ -1,137 +1,81 @@
# Async Context Compression Filter

<span class="category-badge filter">Filter</span>
**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.3.0 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT

This filter reduces token consumption in long conversations through intelligent summarization and message compression while keeping conversations coherent.
## What's new in 1.3.0

- **Internationalization (i18n)**: Complete localization of user-facing messages across 9 languages, including English, Chinese, Japanese, Korean, French, German, Spanish, and Italian.
- **Smart Status Display**: Added the `token_usage_status_threshold` valve (default 80%) to control when the token usage status is shown.
- **Improved Performance**: Frontend language detection and logging are fully non-blocking, preserving fast TTFB.
- **Copilot SDK Integration**: Automatically detects and skips compression for `copilot_sdk`-based models to prevent conflicts.
- **Configuration**: `debug_mode` now defaults to `false` for a quieter production experience.
---

## Core Features

The filter manages token usage in long conversations by intelligently summarizing older messages while preserving important context and reducing API costs. It is especially useful for long-running conversations, complex multi-turn discussions, cost optimization, and token limit management.

- ✅ **Full i18n Support**: Native localization across 9 languages.
- ✅ **Automatic Compression**: Triggered when token thresholds are exceeded.
- ✅ **Asynchronous Summarization**: Runs in the background without blocking chat responses.
- ✅ **Persistent Storage**: Uses Open WebUI's shared database connection (PostgreSQL, SQLite, etc.).
- ✅ **Flexible Retention Policy**: Always keeps the first and last N messages.
- ✅ **Smart Injection**: Injects historical summaries back into the context.
- ✅ **Structure-Aware Trimming**: Preserves document structure (headers, intro, conclusion).
- ✅ **Native Tool Output Trimming**: Cleaner context when using function calling (note: non-native tool outputs are not fully injected into the context).
- ✅ **Compatibility**: Summaries are stored with the `assistant` role.
- ✅ **Real-Time Monitoring**: Context usage monitoring with warning notifications (>90%).
- ✅ **Detailed Token Logging**: Granular token breakdown for precise debugging and optimization.
- ✅ **Smart Model Matching**: Automatically inherits configuration from base models for custom presets.
- ⚠ **Multimodal Support**: Images are preserved, but their tokens are **not** counted; adjust thresholds accordingly.
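The asynchronous summarization above can be pictured as launching a background task while the chat response proceeds. The following is a minimal sketch, not the plugin's actual code: `summarize` stands in for the real LLM call, and the message/threshold shapes are assumptions.

```python
import asyncio

async def summarize(messages):
    # Stand-in for the real LLM summary request.
    await asyncio.sleep(0)
    return f"summary of {len(messages)} messages"

async def handle_turn(messages, threshold_tokens, token_count):
    """Kick off summarization in the background when the threshold is crossed.

    The chat response is produced immediately; the summary task runs
    concurrently and its result is persisted later.
    """
    task = None
    if token_count > threshold_tokens:
        task = asyncio.create_task(summarize(messages))
    # ... the normal chat response would be returned here, unblocked ...
    return task
```

Because `asyncio.create_task` schedules the coroutine without awaiting it, the current turn's latency is unaffected by summary generation.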
---

## Installation & Configuration

1. Download the plugin file: [`async_context_compression.py`](https://github.com/Fu-Jie/openwebui-extensions/tree/main/plugins/filters/async-context-compression)
2. Upload it to OpenWebUI: **Admin Panel** → **Settings** → **Functions**
3. Configure the compression settings
4. Enable the filter

### 1) Database (automatic)

- Uses Open WebUI's shared database connection; no extra configuration is needed.
- The `chat_summary` table is created on first run.

### 2) Filter order

- Recommended order: pre-filters (<10) → this filter (10) → post-filters (>10).
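The ordering rule amounts to sorting filters by their `priority` valve in ascending order. A hypothetical sketch (the filter names here are invented for illustration):

```python
# Open WebUI applies filters in ascending priority order, so a pre-filter (<10)
# runs before this filter (10), and post-filters (>10) run after it.
filters = [
    {"name": "post_processor", "priority": 20},
    {"name": "async_context_compression", "priority": 10},
    {"name": "pre_cleaner", "priority": 5},
]

execution_order = [f["name"] for f in sorted(filters, key=lambda f: f["priority"])]
```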
---

## How It Works

```mermaid
graph TD
    A[Incoming Messages] --> B{Token Count > Threshold?}
    B -->|No| C[Pass Through]
    B -->|Yes| D[Summarize Older Messages]
    D --> E[Preserve Recent Messages]
    E --> F[Combine Summary + Recent]
    F --> G[Send to LLM]
```
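The branch at the top of the flow can be sketched in Python. This is an illustrative reduction only, assuming a trivial whitespace-based token estimate rather than the filter's real tokenizer:

```python
def estimate_tokens(messages):
    """Very rough token estimate: one token per whitespace-separated word."""
    return sum(len(str(m.get("content", "")).split()) for m in messages)

def decide(messages, compression_threshold_tokens=64000):
    """Pass messages through untouched until the threshold is exceeded."""
    if estimate_tokens(messages) > compression_threshold_tokens:
        return "compress"
    return "pass_through"
```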
## Configuration Parameters

| Parameter | Default | Description |
| :--- | :--- | :--- |
| `priority` | `10` | Execution order; lower runs earlier. |
| `compression_threshold_tokens` | `64000` | Trigger an asynchronous summary when total tokens exceed this value. Set to 50-70% of your model's context window. |
| `max_context_tokens` | `128000` | Hard cap for the context; older messages (except protected ones) are dropped when exceeded. |
| `keep_first` | `1` | Always keep the first N messages (protects system prompts). |
| `keep_last` | `6` | Always keep the last N messages to preserve recent context. |
| `summary_model` | `None` | Model used for summaries. A fast, economical model (e.g., `gemini-2.5-flash`, `deepseek-v3`) is strongly recommended. Falls back to the current chat model when empty. |
| `summary_model_max_context` | `0` | Max context tokens for the summary model. If 0, falls back to `model_thresholds` or the global `max_context_tokens`. |
| `max_summary_tokens` | `16384` | Maximum tokens for the generated summary. |
| `summary_temperature` | `0.3` | Randomness of summary generation; lower is more deterministic. |
| `model_thresholds` | `{}` | Per-model overrides for `compression_threshold_tokens` and `max_context_tokens` (useful when mixing models). |
| `enable_tool_output_trimming` | `false` | When enabled and `function_calling: "native"` is active, trims verbose tool outputs to extract only the final answer. |
| `debug_mode` | `false` | Log verbose debug info. Keep `false` in production. |
| `show_debug_log` | `false` | Print debug logs to the browser console (F12). Useful for frontend debugging. |
| `show_token_usage_status` | `true` | Show a token usage status notification in the chat interface. |
| `token_usage_status_threshold` | `80` | Minimum usage percentage (0-100) required to show the context usage status notification. |
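The `keep_first` / `keep_last` retention policy amounts to slicing the message list into a protected head, a compressible middle, and a protected tail. A minimal sketch under that assumption (not the plugin's actual code):

```python
def split_for_compression(messages, keep_first=1, keep_last=6):
    """Split messages into (protected head, summarizable middle, protected tail)."""
    if len(messages) <= keep_first + keep_last:
        # Nothing to compress: every message is protected.
        return list(messages), [], []
    head = list(messages[:keep_first])
    tail = list(messages[-keep_last:]) if keep_last > 0 else []
    middle = list(messages[keep_first:len(messages) - keep_last])
    return head, middle, tail
```

Only the middle slice is sent to the summary model; the head (system prompt) and tail (recent turns) survive verbatim.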
---

## ⭐ Support

If this plugin has been useful, a star on [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) is a big motivation for me. Thank you for the support.
---

## Troubleshooting ❓

- **Initial system prompt is lost**: Keep `keep_first` greater than 0 to protect the initial message.
- **Compression effect is weak**: Lower `compression_threshold_tokens`, or reduce `keep_first` / `keep_last`, to allow more aggressive compression.
- **Submit an issue**: If you encounter any problems, please open an issue on GitHub: [OpenWebUI Extensions Issues](https://github.com/Fu-Jie/openwebui-extensions/issues)
## Example

### Before Compression

```
[Message 1] User: Tell me about Python...
[Message 2] AI: Python is a programming language...
[Message 3] User: What about its history?
[Message 4] AI: Python was created by Guido...
[Message 5] User: And its features?
[Message 6] AI: Python has many features...
... (many more messages)
[Message 20] User: Current question
```

### After Compression

```
[Summary] Previous conversation covered Python basics,
history, features, and common use cases...

[Message 18] User: Recent question about decorators
[Message 19] AI: Decorators in Python are...
[Message 20] User: Current question
```
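The after-compression shape, a summary followed by the retained recent turns, reduces to a simple injection step. This is a sketch only: the source notes that summaries use the `assistant` role, but the exact message layout here is an assumption.

```python
def inject_summary(summary_text, recent_messages):
    """Prepend the stored summary (as an assistant message) to recent turns."""
    summary_message = {
        "role": "assistant",  # role used for summaries per the filter's notes
        "content": f"[Summary] {summary_text}",
    }
    return [summary_message] + list(recent_messages)
```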
---

## Requirements

!!! note "Prerequisites"
    - OpenWebUI v0.3.0 or later
    - Access to an LLM for summarization

!!! tip "Best Practices"
    - Set token thresholds appropriate to your model's context window
    - Keep more recent messages (a higher `keep_last`) for technical discussions
    - Test compression settings in non-critical conversations first

---

## Changelog

See the full history on GitHub: [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions)

---

## Source Code

[:fontawesome-brands-github: View on GitHub](https://github.com/Fu-Jie/openwebui-extensions/tree/main/plugins/filters/async-context-compression){ .md-button }
@@ -1,137 +1,119 @@

# Async Context Compression Filter

<span class="category-badge filter">Filter</span>
**Author:** [Fu-Jie](https://github.com/Fu-Jie/openwebui-extensions) | **Version:** 1.3.0 | **Project:** [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions) | **License:** MIT

This filter significantly reduces token consumption in long conversations through intelligent summarization and message compression, while keeping the conversation coherent.
## What's new in 1.3.0

- **Internationalization (i18n)**: All user-facing messages are now localized, with native support for 9 languages (including Chinese, English, Japanese, Korean, and major European languages).
- **Smart Status Display**: New `token_usage_status_threshold` valve (default 80%) controls when the token usage status is shown, reducing unnecessary interruptions.
- **Major Performance Optimization**: Frontend language detection and log handling were refactored to be non-blocking, leaving time-to-first-byte (TTFB) completely unaffected.
- **Copilot SDK Compatibility**: Automatically detects and skips context compression for `copilot_sdk`-based models to avoid conflicts.
- **Configuration**: `debug_mode` now defaults to `false` for a quieter production experience.
---

## Core Features

- ✅ **Full i18n**: Native support for 9 interface languages.
- ✅ **Automatic Compression**: Context compression is triggered automatically by token thresholds.
- ✅ **Asynchronous Summarization**: Summaries are generated in the background without blocking the current response.
- ✅ **Persistent Storage**: Reuses Open WebUI's shared database connection, automatically supporting PostgreSQL, SQLite, and more.
- ✅ **Flexible Retention Policy**: Configurable retention of the head and tail of the conversation keeps key information coherent.
- ✅ **Smart Injection**: Historical summaries are intelligently injected into the new context.
- ✅ **Structure-Aware Trimming**: Over-long messages are collapsed intelligently while the document skeleton (headers, intro, conclusion) is preserved.
- ✅ **Native Tool Output Trimming**: Verbose tool-call outputs can be trimmed.
- ✅ **Real-Time Monitoring**: Context usage is monitored live, with a warning above 90%.
- ✅ **Detailed Logging**: Precise token statistics make debugging easier.
- ✅ **Smart Model Matching**: Custom models automatically inherit threshold configuration from their base models.
- ⚠ **Multimodal Support**: Image content is preserved, but its tokens are **not counted**; adjust thresholds accordingly.
For a detailed explanation of how it works, see the [Workflow Guide](WORKFLOW_GUIDE_CN.md).
---

## Installation & Configuration

1. Download the plugin file: [`async_context_compression.py`](https://github.com/Fu-Jie/openwebui-extensions/tree/main/plugins/filters/async-context-compression)
2. Upload it to OpenWebUI: **Admin Panel** → **Settings** → **Functions**
3. Configure the compression parameters
4. Enable the filter

### 1. Database (automatic)

- Automatically uses Open WebUI's shared database connection; **no extra configuration is needed**.
- The `chat_summary` table is created automatically on first run.

### 2. Filter order

- Recommended order: pre-filters (<10) → this filter (10) → post-filters (>10).
---

## Configuration Parameters

You can adjust the following parameters in the filter's settings:

### Core Parameters
| Parameter | Default | Description |
| :--- | :--- | :--- |
| `priority` | `10` | Filter execution order; lower values run earlier. |
| `compression_threshold_tokens` | `64000` | **Important**: a background summary is generated when total context tokens exceed this value. 50-70% of the model's context window is recommended. |
| `max_context_tokens` | `128000` | **Important**: hard context cap; the earliest messages (except protected ones) are removed once exceeded. |
| `keep_first` | `1` | Always keep the first N messages, protecting system prompts or environment variables. |
| `keep_last` | `6` | Always keep the last N messages so recent context stays coherent. |
### Summary Generation

| Parameter | Default | Description |
| :--- | :--- | :--- |
| `summary_model` | `None` | Model ID used to generate summaries. A fast, economical model with a large context window (e.g., `gemini-2.5-flash`, `deepseek-v3`) is **strongly recommended**. If empty, the current chat model is reused. |
| `summary_model_max_context` | `0` | Maximum context tokens for the summary model. If 0, falls back to `model_thresholds` or the global `max_context_tokens`. |
| `max_summary_tokens` | `16384` | Maximum tokens allowed for the generated summary. |
| `summary_temperature` | `0.1` | Randomness of summary generation; lower values give more stable results. |
### Advanced

#### `model_thresholds` (per-model thresholds)

A dictionary that overrides the global `compression_threshold_tokens` and `max_context_tokens` for specific model IDs, useful when mixing models with different context windows.

**The default includes recommended thresholds for GPT-4, Claude 3.5, Gemini 1.5/2.0, Qwen 2.5/3, DeepSeek V3, and more.**

**Example:**
```json
{
  "gpt-4": {
    "compression_threshold_tokens": 8000,
    "max_context_tokens": 32000
  },
  "gemini-2.5-flash": {
    "compression_threshold_tokens": 734000,
    "max_context_tokens": 1048576
  }
}
```
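Resolving a per-model override is a dictionary lookup with a fallback to the global valves. A hypothetical sketch of that lookup (function name invented for illustration):

```python
def resolve_thresholds(
    model_id,
    model_thresholds,
    compression_threshold_tokens=64000,
    max_context_tokens=128000,
):
    """Return (compression threshold, max context) for a model,
    falling back to the global defaults when no override exists."""
    override = model_thresholds.get(model_id, {})
    return (
        override.get("compression_threshold_tokens", compression_threshold_tokens),
        override.get("max_context_tokens", max_context_tokens),
    )
```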
---

The remaining valves control tool output trimming, debugging, and status display:

| Parameter | Default | Description |
| :--- | :--- | :--- |
| `enable_tool_output_trimming` | `false` | When enabled and `function_calling: "native"` is active, verbose tool outputs are trimmed to extract only the final answer. |
| `debug_mode` | `false` | Print detailed debug information to Open WebUI's console log. Defaults to `false`, which is recommended for production. |
| `show_debug_log` | `false` | Print debug logs to the browser console (F12); convenient for frontend debugging. |
| `show_token_usage_status` | `true` | Show a token usage status notification when the response finishes. |
| `token_usage_status_threshold` | `80` | Minimum usage percentage (0-100) that triggers the context usage status notification. |
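The status-notification gate described by `token_usage_status_threshold` reduces to a percentage check. A minimal sketch under those assumptions:

```python
def should_show_status(used_tokens, max_context_tokens, threshold_percent=80):
    """Show the usage notification only at or above the configured percentage."""
    if max_context_tokens <= 0:
        return False
    usage_percent = 100.0 * used_tokens / max_context_tokens
    return usage_percent >= threshold_percent
```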
---

## ⭐ Support

If this plugin helps you, please star [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions); it keeps me motivated to improve it. Thank you for the support.
---

## Troubleshooting ❓

- **The initial system prompt is lost**: Set `keep_first` to a value greater than 0.
- **Compression effect is weak**: Lower `compression_threshold_tokens`, or reduce `keep_first` / `keep_last`, to compress more aggressively.
- **Submit an issue**: If you run into any problems, please open an issue on GitHub: [OpenWebUI Extensions Issues](https://github.com/Fu-Jie/openwebui-extensions/issues)
---

## Changelog

See the full history on GitHub: [OpenWebUI Extensions](https://github.com/Fu-Jie/openwebui-extensions)
@@ -22,7 +22,7 @@ Filters act as middleware in the message pipeline:

Reduces token consumption in long conversations through intelligent summarization while maintaining coherence.

**Version:** 1.3.0

[:octicons-arrow-right-24: Documentation](async-context-compression.md)
@@ -22,7 +22,7 @@ Filters act as middleware in the message pipeline:

Reduces token consumption in long conversations through intelligent summarization while maintaining coherence.

**Version:** 1.3.0

[:octicons-arrow-right-24: Documentation](async-context-compression.md)