Update docs and parameters: bump version to 1.2.0, add new feature notes, raise the maximum summary token count to 16384

Jeff fu
2025-12-31 13:21:33 +08:00
parent 2cf64c085a
commit 59ee25754d
4 changed files with 78 additions and 86 deletions


@@ -1,70 +1,65 @@
# Async Context Compression Filter
**Author:** [Fu-Jie](https://github.com/Fu-Jie) | **Version:** 1.2.0 | **License:** MIT
> **Important Note**: To ensure the maintainability and usability of all filters, each filter should be accompanied by clear and complete documentation to fully explain its functionality, configuration, and usage.
This filter significantly reduces token consumption in long conversations by using intelligent summarization and message compression, while maintaining conversational coherence.
## What's new in 1.2.0
- Reuses Open WebUI's shared database connection by default (no custom engine or env vars required).
- Token-based thresholds (`compression_threshold_tokens`, `max_context_tokens`) for safer long-context handling.
- Per-model overrides via `model_thresholds` for mixed-model workflows.
- Documentation now mirrors the latest async workflow and retention-first injection.
---
## Core Features
- Automatic compression triggered by token thresholds.
- Asynchronous summarization that does not block chat responses.
- Persistent storage via Open WebUI's shared database connection (PostgreSQL, SQLite, etc.).
- Flexible retention policy to keep the first and last N messages.
- Smart injection of historical summaries back into the context.
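The retention-first flow behind these features (keep the head, keep the tail, summarize the middle) can be sketched as follows; the function name and message shapes are illustrative, not the filter's actual internals.

```python
# Hypothetical sketch of the retention-first policy: keep the first
# `keep_first` and last `keep_last` messages, and replace the middle
# span with a single summary message.
def compress(messages, keep_first=1, keep_last=6, summary_text="[summary]"):
    """Return messages with the middle span replaced by a summary message."""
    if len(messages) <= keep_first + keep_last:
        return list(messages)  # nothing to compress
    head = messages[:keep_first]
    tail = messages[-keep_last:]
    summary = {"role": "system", "content": summary_text}
    return head + [summary] + tail
```

In practice the summary message is generated asynchronously by `summary_model` and injected on the next request, so the current response is never blocked.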
---
## Installation & Configuration
### 1) Database (automatic)
- Uses Open WebUI's shared database connection; no extra configuration needed.
- The `chat_summary` table is created on first run.
### 2) Filter order
It is recommended to keep this filter early in the chain so it runs before filters that mutate messages:
1. Pre-filters (priority < 10) — e.g., system prompt injectors.
2. This compression filter (priority = 10).
3. Post-filters (priority > 10) — e.g., output formatting.
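As a rough illustration of this ordering (the names here are hypothetical, not Open WebUI's actual filter API), a chain sorted by `priority` runs lower numbers first, so this filter sees messages before post-filters touch them:

```python
# Minimal sketch of priority-ordered filter execution.
# Each filter is a dict with a `priority` and an `inlet` callable
# that receives and returns the request body.
def run_chain(filters, body):
    for f in sorted(filters, key=lambda f: f["priority"]):
        body = f["inlet"](body)
    return body
```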
---
## Configuration Parameters
You can adjust the following parameters in the filter's settings:
| Parameter | Default | Description |
| :----------------------------- | :------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `priority` | `10` | Execution order; lower runs earlier. |
| `compression_threshold_tokens` | `64000` | Trigger asynchronous summary when total tokens exceed this value. Set to 50%-70% of your model's context window. |
| `max_context_tokens` | `128000` | Hard cap for context; older messages (except protected ones) are dropped if exceeded. |
| `keep_first` | `1` | Always keep the first N messages (protects system prompts). |
| `keep_last` | `6` | Always keep the last N messages to preserve recent context. |
| `summary_model` | `None` | Model for summaries. Strongly recommended to set a fast, economical model (e.g., `gemini-2.5-flash`, `deepseek-v3`). Falls back to the current chat model when empty. |
| `max_summary_tokens`           | `16384`  | Maximum tokens for the generated summary. |
| `summary_temperature` | `0.3` | Randomness for summary generation. Lower is more deterministic. |
| `model_thresholds` | `{}` | Per-model overrides for `compression_threshold_tokens` and `max_context_tokens` (useful for mixed models). |
| `debug_mode` | `true` | Log verbose debug info. Set to `false` in production. |
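How a `model_thresholds` entry might resolve against the global valves can be sketched as follows; `resolve_thresholds` is a hypothetical helper for illustration, not part of the filter's public API.

```python
# Hedged sketch: per-model overrides win over the global defaults,
# and unknown models fall back to the global valves.
def resolve_thresholds(model_id, model_thresholds,
                       default_trigger=64000, default_cap=128000):
    """Return (compression_threshold_tokens, max_context_tokens) for a model."""
    override = model_thresholds.get(model_id, {})
    return (
        override.get("compression_threshold_tokens", default_trigger),
        override.get("max_context_tokens", default_cap),
    )
```

For example, a large-context model could carry a much higher trigger than the global `64000` default, while every other model keeps the global behavior.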
---
## Troubleshooting
- **Database table not created**: Ensure Open WebUI is configured with a database and check Open WebUI logs for errors.
- **Summary not generated**: Confirm `compression_threshold_tokens` was hit and `summary_model` is compatible. Review logs for details.
- **Initial system prompt is lost**: Keep `keep_first` greater than 0 to protect the initial message.
- **Compression effect is weak**: Raise `compression_threshold_tokens` or lower `keep_first` / `keep_last` to allow more aggressive compression.


@@ -1,20 +1,27 @@
# Async Context Compression Filter
**Author:** [Fu-Jie](https://github.com/Fu-Jie) | **Version:** 1.2.0 | **License:** MIT
> **Important note**: To keep every filter maintainable and easy to use, each filter should ship with clear, complete documentation that fully explains its functionality, configuration, and usage.
This filter significantly reduces token consumption in long conversations through intelligent summarization and message compression, while maintaining conversational coherence.
## What's new in 1.2.0
- Reuses Open WebUI's built-in database connection by default; no custom engine and no `DATABASE_URL` configuration needed.
- Token-based thresholds (`compression_threshold_tokens` and `max_context_tokens`) for safer long-context handling.
- Per-model thresholds via `model_thresholds`, suited to mixed-model workflows.
- Documentation updated to match the latest async workflow and the retention-first injection strategy.
---
## Core Features
- **Automatic compression**: triggered automatically by token thresholds.
- **Asynchronous summarization**: summaries are generated in the background without blocking the current chat response.
- **Persistent storage**: uses Open WebUI's shared database connection; PostgreSQL, SQLite, etc. are supported automatically.
- **Flexible retention policy**: configurable numbers of head and tail messages to keep, preserving key information and coherence.
- **Smart injection**: injects the generated historical summary into the new context.
For a detailed explanation of how it works, see the [Workflow Guide](WORKFLOW_GUIDE_CN.md).
@@ -24,19 +31,16 @@
### 1. Database (automatic)
- Automatically uses Open WebUI's shared database connection; **no extra configuration needed**.
- The `chat_summary` table is created automatically on first run.
### 2. Filter Order
It is recommended to give this filter a relatively high priority (a smaller number) so it runs before other filters that might modify message content. A typical order:
1. Pre-filters (priority < 10), e.g., system prompt injection.
2. This compression filter (priority = 10).
3. Post-filters (priority > 10), e.g., final output formatting.
---
@@ -46,29 +50,29 @@
### Core Parameters
| Parameter | Default | Description |
| :--- | :--- | :--- |
| `priority` | `10` | Execution order; lower values run first. |
| `compression_threshold_tokens` | `64000` | **Important**: when total context tokens exceed this value, a summary is generated in the background. Recommended: 50%-70% of the model's context window. |
| `max_context_tokens` | `128000` | **Important**: hard cap on context; once exceeded, the oldest messages are removed (protected messages are kept). |
| `keep_first` | `1` | Always keep the first N messages, protecting system prompts or environment variables. |
| `keep_last` | `6` | Always keep the last N messages to keep recent context coherent. |
### Summary Generation Settings
| Parameter | Default | Description |
| :--- | :--- | :--- |
| `summary_model` | `None` | Model ID used to generate summaries. **Strongly recommended**: a fast, economical model with a large context window (e.g., `gemini-2.5-flash`, `deepseek-v3`). If empty, the current chat model is tried. |
| `max_summary_tokens` | `16384` | Maximum number of tokens allowed for the generated summary. |
| `summary_temperature` | `0.1` | Randomness of summary generation; lower values give more stable results. |
### Advanced Settings
#### `model_thresholds` (model-specific thresholds)
A dictionary that overrides the global `compression_threshold_tokens` and `max_context_tokens` for specific model IDs, useful when mixing models with different context window sizes.
**The defaults include recommended thresholds for mainstream models such as GPT-4, Claude 3.5, Gemini 1.5/2.0, Qwen 2.5/3, and DeepSeek V3.**
**Example configuration:**
@@ -87,21 +91,14 @@
#### `debug_mode`
- **Default**: `true`
- **Description**: Whether to print verbose debug information (token counts, compression progress, database operations, etc.) to the Open WebUI console log. Recommended: `false` in production.
---
## Troubleshooting
- **Database table not created**: Ensure Open WebUI is configured with a database, and check the logs for error details.
- **Summary not generated**: Check whether `compression_threshold_tokens` has been reached, confirm `summary_model` is usable, and review the logs.
- **Initial system prompt lost**: Set `keep_first` to a value greater than 0.
- **Compression effect is weak**: Raise `compression_threshold_tokens`, or lower `keep_first` / `keep_last` for stronger compression.


@@ -127,7 +127,7 @@ summary_model
- If the current conversation uses a pipeline (Pipe) model or a model that does not support standard generation APIs, leaving this field empty may cause summary generation to fail. In this case, you must specify a valid model.
max_summary_tokens
Default: 16384
Description: The maximum number of tokens allowed for the generated summary.
summary_temperature


@@ -126,7 +126,7 @@ summary_model (summary model)
- If the current conversation uses a pipeline (Pipe) model or a model that does not directly support standard generation APIs, leaving this field empty may cause summary generation to fail. In that case, a valid model must be specified.
max_summary_tokens (summary length)
Default: 16384
Description: The maximum number of tokens allowed for the generated summary.
summary_temperature (summary temperature)