Update docs and parameters: bump the version to 1.2.0, document the new features, and raise the maximum summary token count to 16384
# Async Context Compression Filter

**Author:** [Fu-Jie](https://github.com/Fu-Jie) | **Version:** 1.2.0 | **License:** MIT

> **Important Note**: To ensure the maintainability and usability of all filters, each filter should be accompanied by clear and complete documentation to fully explain its functionality, configuration, and usage.

This filter significantly reduces token consumption in long conversations by using intelligent summarization and message compression, while maintaining conversational coherence.
## What's new in 1.1.0

- Reuses Open WebUI's shared database connection by default (no custom engine or env vars required).
- Token-based thresholds (`compression_threshold_tokens`, `max_context_tokens`) for safer long-context handling.
- Per-model overrides via `model_thresholds` for mixed-model workflows.
- Documentation now mirrors the latest async workflow and retention-first injection.

---
## Core Features

- ✅ Automatic compression triggered by token thresholds.
- ✅ Asynchronous summarization that does not block chat responses.
- ✅ Persistent storage via Open WebUI's shared database connection (PostgreSQL, SQLite, etc.).
- ✅ Flexible retention policy to keep the first and last N messages.
- ✅ Smart injection of historical summaries back into the context.

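The retention policy can be sketched as a simple slice (an illustrative sketch, not the filter's actual code; `keep_first` and `keep_last` mirror the parameters of the same names):

```python
def split_for_compression(messages, keep_first=1, keep_last=6):
    """Return (protected, compressible): head and tail stay, the middle may be summarized."""
    if len(messages) <= keep_first + keep_last:
        return list(messages), []  # conversation too short, nothing to compress
    head = messages[:keep_first]
    tail = messages[len(messages) - keep_last:]
    middle = messages[keep_first:len(messages) - keep_last]
    return head + tail, middle
```

For example, with ten messages, `keep_first=1`, and `keep_last=3`, the first and last three messages are protected while the six in the middle become candidates for summarization.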
---

## Installation & Configuration

### 1) Database (automatic)

- Uses Open WebUI's shared database connection; no extra configuration needed.
- The `chat_summary` table is created on first run.

### 2) Filter order

It is recommended to keep this filter early in the chain so it runs before filters that mutate messages:

1. Pre-filters (priority < 10) — e.g., system prompt injectors.
2. This compression filter (priority = 10).
3. Post-filters (priority > 10) — e.g., output formatting.
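The ordering above can be illustrated with a minimal sketch (the filter names are hypothetical; Open WebUI's real dispatch logic may differ):

```python
# Filters are executed in ascending priority order.
filters = [
    {"name": "output_formatter", "priority": 20},       # post-filter
    {"name": "context_compression", "priority": 10},    # this filter
    {"name": "system_prompt_injector", "priority": 5},  # pre-filter
]
execution_order = [f["name"] for f in sorted(filters, key=lambda f: f["priority"])]
```

Sorting by priority yields `system_prompt_injector`, then `context_compression`, then `output_formatter`, so the compression filter sees messages before any post-processing mutates them.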

---

## Configuration Parameters

You can adjust the following parameters in the filter's settings:

| Parameter | Default | Description |
| :--- | :--- | :--- |
| `priority` | `10` | Execution order; lower runs earlier. |
| `compression_threshold_tokens` | `64000` | Trigger asynchronous summary when total tokens exceed this value. Set to 50%-70% of your model's context window. |
| `max_context_tokens` | `128000` | Hard cap for context; older messages (except protected ones) are dropped if exceeded. |
| `keep_first` | `1` | Always keep the first N messages (protects system prompts). |
| `keep_last` | `6` | Always keep the last N messages to preserve recent context. |
| `summary_model` | `None` | Model for summaries. Strongly recommended to set a fast, economical model (e.g., `gemini-2.5-flash`, `deepseek-v3`). Falls back to the current chat model when empty. |
| `max_summary_tokens` | `16384` | Maximum tokens for the generated summary. |
| `summary_temperature` | `0.3` | Randomness for summary generation. Lower is more deterministic. |
| `model_thresholds` | `{}` | Per-model overrides for `compression_threshold_tokens` and `max_context_tokens` (useful for mixed models). |
| `debug_mode` | `true` | Log verbose debug info. Set to `false` in production. |

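The interplay of the two token thresholds can be sketched as follows (a simplified model of the behavior the table describes, not the plugin's actual implementation):

```python
def plan_compression(total_tokens,
                     compression_threshold_tokens=64_000,
                     max_context_tokens=128_000):
    """Decide which actions the filter would take for a given context size."""
    return {
        # Soft threshold: schedule an asynchronous summary in the background.
        "summarize_async": total_tokens > compression_threshold_tokens,
        # Hard cap: immediately drop the oldest unprotected messages.
        "force_trim": total_tokens > max_context_tokens,
    }
```

With the defaults, a 70k-token context schedules a background summary but trims nothing; only past 128k tokens are old messages forcibly dropped.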
---

## Troubleshooting

- **Database table not created**: Ensure Open WebUI is configured with a database and check Open WebUI logs for errors.
- **Summary not generated**: Confirm `compression_threshold_tokens` was hit and `summary_model` is compatible. Review logs for details.
- **Initial system prompt is lost**: Keep `keep_first` greater than 0 to protect the initial message.
- **Compression effect is weak**: Raise `compression_threshold_tokens` or lower `keep_first` / `keep_last` to allow more aggressive compression.

---

# Async Context Compression Filter

**Author:** [Fu-Jie](https://github.com/Fu-Jie) | **Version:** 1.2.0 | **License:** MIT

> **Important Note**: To ensure the maintainability and usability of all filters, each filter should be accompanied by clear and complete documentation so that its functionality, configuration, and usage are fully explained.

This filter significantly reduces token consumption in long conversations through intelligent summarization and message compression, while maintaining conversational coherence.

## What's new in 1.1.0

- Reuses Open WebUI's built-in database connection by default; no custom engine or `DATABASE_URL` configuration required.
- Token-based thresholds (`compression_threshold_tokens`, `max_context_tokens`) make long-context handling safer.
- Supports per-model thresholds via `model_thresholds`, suited to mixed-model workflows.
- Documentation updated to match the latest async workflow and the retain-then-inject strategy.

---

## Core Features

- ✅ **Automatic compression**: context compression is triggered automatically by token thresholds.
- ✅ **Asynchronous summarization**: summaries are generated in the background without blocking the current response.
- ✅ **Persistent storage**: reuses Open WebUI's shared database connection, automatically supporting PostgreSQL, SQLite, and other backends.
- ✅ **Flexible retention policy**: configurable retention of leading and trailing messages keeps key information coherent.
- ✅ **Smart injection**: intelligently injects the historical summary into the new context.

For a detailed explanation of how this works, see the [Workflow Guide](WORKFLOW_GUIDE_CN.md).

## Installation & Configuration

### 1) Database (automatic)

- Automatically uses Open WebUI's shared database connection; **no extra configuration needed**.
- The `chat_summary` table is created automatically on first run.

### 2) Filter order

It is recommended to give this filter a relatively high priority (a smaller number) so it runs before other filters that might modify message content. A typical order:

1. Pre-filters (priority < 10), e.g., system prompt injection.
2. This compression filter (priority = 10).
3. Post-filters (priority > 10), e.g., final output formatting.
---

## Configuration Parameters

### Core parameters

| Parameter | Default | Description |
| :--- | :--- | :--- |
| `priority` | `10` | Filter execution order; smaller values run earlier. |
| `compression_threshold_tokens` | `64000` | **Important**: when total context tokens exceed this value, summary generation is triggered in the background. Recommended: 50%-70% of the model's context window. |
| `max_context_tokens` | `128000` | **Important**: hard cap on context size; when exceeded, the oldest messages are removed (protected messages are kept). |
| `keep_first` | `1` | Always keep the first N messages, protecting system prompts or environment variables. |
| `keep_last` | `6` | Always keep the last N messages to keep the most recent context coherent. |

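The `max_context_tokens` hard cap amounts to dropping the oldest unprotected messages until the context fits. A sketch under simplified assumptions (per-message token counts are supplied directly; the real filter uses its own tokenizer):

```python
def trim_to_cap(messages, token_counts, max_context_tokens, keep_first=1, keep_last=6):
    """Drop the oldest middle messages until the total fits under the hard cap."""
    msgs, counts = list(messages), list(token_counts)
    while sum(counts) > max_context_tokens and len(msgs) > keep_first + keep_last:
        # Index `keep_first` is the oldest message that is not protected.
        del msgs[keep_first], counts[keep_first]
    return msgs
```

Note that trimming never touches the first `keep_first` or last `keep_last` messages, matching the protection rules in the table above.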

### Summary generation

| Parameter | Default | Description |
| :--- | :--- | :--- |
| `summary_model` | `None` | Model ID used to generate summaries. **Strongly recommended** to configure a fast, economical model with a large context window (e.g., `gemini-2.5-flash`, `deepseek-v3`). If left empty, the current chat model is reused. |
| `max_summary_tokens` | `16384` | Maximum number of tokens allowed for the generated summary. |
| `summary_temperature` | `0.1` | Controls randomness of summary generation; lower values give more stable results. |

### Advanced configuration

#### `model_thresholds` (per-model thresholds)

A dictionary that lets you override the global `compression_threshold_tokens` and `max_context_tokens` for specific model IDs, useful when mixing models with different context window sizes.

**The defaults include recommended thresholds for mainstream models (GPT-4, Claude 3.5, Gemini 1.5/2.0, Qwen 2.5/3, DeepSeek V3, etc.).**

**Configuration example:**
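The configuration example itself was elided in this diff; a sketch of what such a dictionary might look like, together with the fallback lookup it implies (the model IDs and numbers are illustrative assumptions, not the plugin's shipped defaults):

```python
# Per-model overrides; models not listed fall back to the global settings.
model_thresholds = {
    "gemini-2.5-flash": {
        "compression_threshold_tokens": 500_000,
        "max_context_tokens": 1_000_000,
    },
    "deepseek-v3": {
        "compression_threshold_tokens": 40_000,
        "max_context_tokens": 64_000,
    },
}

def thresholds_for(model_id, global_threshold=64_000, global_cap=128_000):
    """Resolve thresholds for a model, falling back to the global values."""
    override = model_thresholds.get(model_id, {})
    return (override.get("compression_threshold_tokens", global_threshold),
            override.get("max_context_tokens", global_cap))
```

A model with a large context window gets laxer limits, while an unknown model simply inherits the global `compression_threshold_tokens` and `max_context_tokens`.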

#### `debug_mode`

- **Default**: `true`
- **Description**: Whether to print detailed debug information (token counts, compression progress, database operations, etc.) to the Open WebUI console log. Recommended: set to `false` in production.

---

## Troubleshooting

- **Database table not created**: Ensure Open WebUI has a database configured, and check the logs for detailed errors.
- **Summary not generated**: Check whether `compression_threshold_tokens` has been reached, confirm `summary_model` is usable, and review the logs.
- **Initial system prompt lost**: Set `keep_first` to a value greater than 0.
- **Compression effect is weak**: Raise `compression_threshold_tokens`, or lower `keep_first` / `keep_last` for stronger compression.

---

summary_model

- If the current conversation uses a pipeline (Pipe) model or a model that does not support standard generation APIs, leaving this field empty may cause summary generation to fail. In this case, you must specify a valid model.

max_summary_tokens

Default: 16384

Description: The maximum number of tokens allowed for the generated summary.

summary_temperature

summary_model (summary model)

- If the current conversation uses a pipeline (Pipe) model or a model that does not directly support standard generation APIs, leaving this field empty may cause summary generation to fail. In that case, a valid model must be specified.

max_summary_tokens (summary length)

Default: 16384

Description: The maximum number of tokens allowed for the generated summary.

summary_temperature (summary temperature)