feat: Add plugin system, multiple plugin types, development guide, and multilingual documentation.

45  plugins/filters/README.md  Normal file
@@ -0,0 +1,45 @@
# Filters

English | [中文](./README_CN.md)

Filters process and modify user input before it is sent to the LLM. This directory contains various filters that can be used to extend OpenWebUI functionality.

## 📋 Filter List

| Filter Name | Description | Documentation |
| :--- | :--- | :--- |
| **Async Context Compression** | Reduces token consumption in long conversations through intelligent summarization and message compression while maintaining conversational coherence. | [English](./async-context-compression/async_context_compression.md) / [中文](./async-context-compression/async_context_compression_cn.md) |

## 🚀 Quick Start

### Installing a Filter

1. Navigate to the desired filter directory
2. Download the corresponding `.py` file to your local machine
3. Open OpenWebUI Admin Settings and find the "Filters" section
4. Upload the Python file
5. Configure the filter parameters according to its documentation
6. Refresh the page and enable the filter in your chat settings

## 📖 Development Guide

When adding a new filter, please follow these steps:

1. **Create Filter Directory**: Create a new folder in the current directory (e.g., `my_filter/`)
2. **Write Filter Code**: Create a `.py` file with clear documentation of functionality and configuration in comments
3. **Write Documentation**:
   - Create `filter_name.md` (English version)
   - Create `filter_name_cn.md` (Chinese version)
   - Documentation should include: feature description, configuration parameters, usage examples, and troubleshooting
4. **Update This List**: Add your new filter to the table above

## ⚙️ Configuration Best Practices

- **Priority Management**: Set appropriate filter priority to ensure correct execution order
- **Parameter Tuning**: Adjust filter parameters based on your specific needs
- **Debug Logging**: Enable debug mode during development, disable in production
- **Performance Testing**: Test filter performance under high load

---

> **Contributor Note**: To ensure project maintainability and user experience, please provide clear and complete documentation for each new filter, including a feature description, parameter configuration, usage examples, and a troubleshooting guide.
67  plugins/filters/README_CN.md  Normal file
@@ -0,0 +1,67 @@
# Auto Context Merger Filter

## Overview

`auto_context_merger` is an Open WebUI filter plugin designed to improve the coherence and depth of follow-up conversation by automatically collecting and injecting the context of the previous round's multi-model answers. When the user asks a new follow-up question after a multi-model round, this filter activates automatically.

It identifies all AI model answers from the previous round in the conversation history, concatenates them in a clear format, and injects them into the current request as a system message. The current model can then directly reference all previous AI viewpoints when handling the user's new question, producing a more comprehensive and coherent answer.

## How It Works

1. **Trigger**: The filter activates automatically when the user sends a new follow-up question after a "multi-model answer" round.
2. **Fetch history**: The filter loads the full conversation history from the database using the current conversation's `chat_id`.
3. **Analyze the previous round**: By analyzing the conversation tree structure, it accurately locates the user's previous question and the parallel answers given by all AI models at the time.
4. **Direct formatting**: If it detects that the previous round indeed had multiple AI answers, it collects the content of all of them.
5. **Smart injection**: The formatted answers are injected as a system message at the beginning of the current request's `messages` list, immediately before the user's new question.
6. **Pass to the target model**: The modified request body (including the formatted context) is passed to the model the user originally selected, which can draw on this richer context when generating its response.
7. **Status updates**: Throughout processing, the filter emits real-time status updates via `__event_emitter__` so the user can follow the progress.

## Configuration (Valves)

You can configure this filter's `Valves` in the Open WebUI admin interface.

* **`CONTEXT_PREFIX`** (string, required):
  * **Description**: The prefix text of the injected system message. It appears before the merged context and explains to the model where this content comes from and why it is there.
  * **Example**: `**Background**: To better answer your new question, please refer to the answers given by multiple AI models in the previous round:\n\n`

## Usage

1. **Deploy the filter**: Place the `auto_context_merger.py` file in the `plugins/filters/` directory of your Open WebUI instance.
2. **Enable the filter**: Log in to the Open WebUI admin interface and navigate to **Workspace -> Functions**. Find the `auto_context_merger` filter and enable it.
3. **Configure parameters**: Click the edit button next to the `auto_context_merger` filter and configure `CONTEXT_PREFIX` as needed.
4. **Start a conversation**:
   * First, ask a question and make sure multiple models answer (for example via `gemini_manifold` or another multi-model tool).
   * Then ask a follow-up question about that multi-model answer.
   * This filter will activate automatically, merging all AI answers from the previous round and injecting them into the current request.

## Example

Suppose `CONTEXT_PREFIX` is set to its default value.

1. **User asks**: "Explain quantum mechanics"
2. **Multiple AI answers** (for example, both model A and model B respond)
3. **User asks again**: "So what is the difference between quantum entanglement and quantum tunneling?"

At this point, the `auto_context_merger` filter activates automatically:

1. It fetches model A's and model B's answers to "Explain quantum mechanics".
2. It formats them as:

   ```
   **Background**: To better answer your new question, please refer to the answers given by multiple AI models in the previous round:

   **Answer from model 'Model A name':**
   [Model A's explanation of quantum mechanics]

   ---

   **Answer from model 'Model B name':**
   [Model B's explanation of quantum mechanics]
   ```

3. It then injects this content as a system message into the current request, immediately before the user question "So what is the difference between quantum entanglement and quantum tunneling?".

Finally, the model receives a request containing all relevant context and can answer the follow-up question more accurately and comprehensively.

## Notes

* This filter is designed to improve the coherence of multi-model conversations by giving the model richer context for follow-up questions.
* Make sure your Open WebUI instance has `gemini_manifold` or another tool capable of producing multi-model answers configured and enabled, so this filter can detect multi-model history.
* This filter does not make additional model calls, so it does not noticeably increase latency or cost. It only formats and injects existing history data.
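The injection flow above can be sketched in a few lines. This is a simplified, hypothetical helper rather than the actual plugin code: the real filter reconstructs the previous round from the conversation tree in the database, while this sketch only inspects the in-memory `messages` list and assumes each assistant message carries a `model` key.

```python
def merge_previous_answers(messages: list, context_prefix: str) -> list:
    """Collect the parallel assistant answers from the previous round and
    inject them as a single system message before the latest user question."""
    if not messages or messages[-1].get("role") != "user":
        return messages

    # Walk backwards (skipping the new user question) to collect the
    # consecutive assistant answers of the previous round.
    answers = []
    for msg in reversed(messages[:-1]):
        if msg.get("role") == "assistant":
            answers.append(msg)
        else:
            break
    if len(answers) < 2:  # only activate after a multi-model round
        return messages

    blocks = [
        f"**Answer from model '{a.get('model', 'unknown')}':**\n{a.get('content', '')}"
        for a in reversed(answers)  # restore original order
    ]
    system_msg = {
        "role": "system",
        "content": context_prefix + "\n\n---\n\n".join(blocks),
    }
    # Inject immediately before the user's new question.
    return messages[:-1] + [system_msg, messages[-1]]
```

Because only existing history is formatted and re-inserted, no extra model calls are made, matching the cost note above.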
@@ -0,0 +1,77 @@
# Async Context Compression Filter

**Author:** [Fu-Jie](https://github.com/Fu-Jie) | **Version:** 1.0.0 | **License:** MIT

> **Important Note**: To ensure the maintainability and usability of all filters, each filter should be accompanied by clear and complete documentation that fully explains its functionality, configuration, and usage.

This filter significantly reduces token consumption in long conversations by using intelligent summarization and message compression, while maintaining conversational coherence.

---

## Core Features

- ✅ **Automatic Compression**: Triggers context compression automatically based on a message count threshold.
- ✅ **Asynchronous Summarization**: Generates summaries in the background without blocking the current chat response.
- ✅ **Persistent Storage**: Supports both PostgreSQL and SQLite databases to ensure summaries are not lost after a service restart.
- ✅ **Flexible Retention Policy**: Freely configure the number of initial and final messages to keep, ensuring critical information and context continuity.
- ✅ **Smart Injection**: Intelligently injects the generated historical summary into the new context.

---

## Installation & Configuration

### 1. Environment Variable

This plugin requires a database connection. You **must** configure the `DATABASE_URL` in your Open WebUI environment variables.

- **PostgreSQL Example**:
  ```
  DATABASE_URL=postgresql://user:password@host:5432/openwebui
  ```
- **SQLite Example**:
  ```
  DATABASE_URL=sqlite:///path/to/your/data/webui.db
  ```

### 2. Filter Order

It is recommended to set the priority of this filter relatively high (a smaller number) to ensure it runs before other filters that might modify message content. A typical order might be:

1. **Pre-Filters (priority < 10)**
   - e.g., a filter that injects a system-level prompt.
2. **This Compression Filter (priority = 10)**
3. **Post-Filters (priority > 10)**
   - e.g., a filter that formats the final output.

---

## Configuration Parameters

You can adjust the following parameters in the filter's settings:

| Parameter | Default | Description |
| :--- | :--- | :--- |
| `priority` | `10` | The execution order of the filter. Lower numbers run first. |
| `compression_threshold` | `15` | When the total message count reaches this value, background summary generation is triggered. |
| `keep_first` | `1` | Always keep the first N messages. The first message often contains important system prompts. |
| `keep_last` | `6` | Always keep the last N messages to ensure contextual coherence. |
| `summary_model` | `None` | The model used for generating summaries. **Strongly recommended**: set a fast, economical, and compatible model (e.g., `gemini-2.5-flash`). If left empty, it will try to use the current chat's model, which may fail if that is an incompatible model type (such as a Pipe model). |
| `max_summary_tokens` | `4000` | The maximum number of tokens allowed for the generated summary. |
| `summary_temperature` | `0.3` | Controls the randomness of the summary. Lower values are more deterministic. |
| `debug_mode` | `true` | Whether to print detailed debug information to the log. Recommended to set to `false` in production. |

---

## Troubleshooting

- **Problem: Database connection failed.**
  - **Solution**: Ensure the `DATABASE_URL` environment variable is set correctly and that the database service is running.

- **Problem: Summary not generated.**
  - **Solution**: Check whether the `compression_threshold` has been met and verify that `summary_model` is configured correctly. Check the logs for detailed errors.

- **Problem: Initial system prompt is lost.**
  - **Solution**: Ensure `keep_first` is set to a value greater than 0 to preserve the initial messages containing important information.

- **Problem: Compression effect is not significant.**
  - **Solution**: Try increasing the `compression_threshold` or decreasing the `keep_first` / `keep_last` values.
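The retention policy described above can be illustrated with a small standalone sketch. It is simplified from the actual filter (which also persists summaries to the database and handles multimodal content) and only shows how `keep_first`, `keep_last`, and the injected summary shape the message list.

```python
def compress(messages: list, summary: str, keep_first: int = 1, keep_last: int = 6) -> list:
    """Return [first N messages (summary prepended to the first)] + [last N messages]."""
    if len(messages) <= keep_first + keep_last:
        return messages  # nothing to compress yet

    head = [m.copy() for m in messages[:keep_first]]
    block = (
        f"【Historical Conversation Summary】\n{summary}\n\n"
        "---\nBelow is the recent conversation:\n\n"
    )
    if head:
        # Inject the summary into the very first kept message.
        head[0]["content"] = block + head[0].get("content", "")
    else:
        # keep_first == 0: carry the summary in a fresh system message.
        head = [{"role": "system", "content": block}]
    tail = messages[-keep_last:] if keep_last > 0 else []
    return head + tail
```

With the defaults (`keep_first=1`, `keep_last=6`), a 20-message history collapses to 7 messages, matching the compression example in the filter's docstring.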
@@ -0,0 +1,780 @@
"""
title: Async Context Compression
id: async_context_compression
author: Fu-Jie
author_url: https://github.com/Fu-Jie
funding_url: https://github.com/Fu-Jie/awesome-openwebui
description: Reduces token consumption in long conversations while maintaining coherence through intelligent summarization and message compression.
version: 1.0.1
license: MIT

═══════════════════════════════════════════════════════════════════════════════
📌 Overview
═══════════════════════════════════════════════════════════════════════════════

This filter significantly reduces token consumption in long conversations by using intelligent summarization and message compression, while maintaining conversational coherence.

Core Features:
✅ Automatic compression triggered by a message count threshold
✅ Asynchronous summary generation (does not block the user response)
✅ Persistent storage with database support (PostgreSQL and SQLite)
✅ Flexible retention policy (configurable to keep the first and last N messages)
✅ Smart summary injection to maintain context

═══════════════════════════════════════════════════════════════════════════════
🔄 Workflow
═══════════════════════════════════════════════════════════════════════════════

Phase 1: Inlet (Pre-request processing)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. Receives all messages in the current conversation.
2. Checks for a previously saved summary.
3. If a summary exists and the message count exceeds the retention threshold:
   ├─ Extracts the first N messages to be kept.
   ├─ Injects the summary into the first message.
   ├─ Extracts the last N messages to be kept.
   └─ Combines them into a new message list: [Kept First Messages + Summary] + [Kept Last Messages].
4. Sends the compressed message list to the LLM.

Phase 2: Outlet (Post-response processing)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. Triggered after the LLM response is complete.
2. Checks if the message count has reached the compression threshold.
3. If the threshold is met, an asynchronous background task is started to generate a summary:
   ├─ Extracts messages to be summarized (excluding the kept first and last messages).
   ├─ Calls the LLM to generate a concise summary.
   └─ Saves the summary to the database.

═══════════════════════════════════════════════════════════════════════════════
💾 Storage
═══════════════════════════════════════════════════════════════════════════════

This filter uses a database for persistent storage, configured via the `DATABASE_URL` environment variable. It supports both PostgreSQL and SQLite.

Configuration:
- The `DATABASE_URL` environment variable must be set.
- PostgreSQL Example: `postgresql://user:password@host:5432/openwebui`
- SQLite Example: `sqlite:///path/to/your/database.db`

The filter automatically selects the appropriate database driver based on the `DATABASE_URL` prefix (`postgres` or `sqlite`).

Table Structure (`chat_summary`):
- id: Primary key (auto-increment)
- chat_id: Unique chat identifier (indexed)
- summary: The summary content (TEXT)
- compressed_message_count: The original number of messages
- created_at: Timestamp of creation
- updated_at: Timestamp of last update

═══════════════════════════════════════════════════════════════════════════════
📊 Compression Example
═══════════════════════════════════════════════════════════════════════════════

Scenario: A 20-message conversation (default settings: keep first 1, keep last 6)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Before compression:
Message 1: [Initial prompt + First question]
Messages 2-14: [Historical conversation]
Messages 15-20: [Recent conversation]
Total: 20 full messages

After compression:
Message 1: [Initial prompt + Historical summary + First question]
Messages 15-20: [Last 6 full messages]
Total: 7 messages

Effect:
✓ Saves 13 messages (approx. 65%)
✓ Retains full context
✓ Protects important initial prompts

═══════════════════════════════════════════════════════════════════════════════
⚙️ Configuration
═══════════════════════════════════════════════════════════════════════════════

priority
Default: 10
Description: The execution order of the filter. Lower numbers run first.

compression_threshold
Default: 15
Description: When the message count reaches this value, background summary generation is triggered after the conversation ends.
Recommendation: Adjust based on your model's context window and cost.

keep_first
Default: 1
Description: Always keep the first N messages of the conversation. Set to 0 to disable. The first message often contains important system prompts.

keep_last
Default: 6
Description: Always keep the last N full messages of the conversation to ensure context coherence.

summary_model
Default: None
Description: The LLM used to generate the summary.
Recommendation:
- It is strongly recommended to configure a fast, economical, and compatible model, such as `deepseek-v3`, `gemini-2.5-flash`, or `gpt-4.1`.
- If left empty, the filter will attempt to use the model from the current conversation.
Note:
- If the current conversation uses a pipeline (Pipe) model or a model that does not support standard generation APIs, leaving this field empty may cause summary generation to fail. In this case, you must specify a valid model.

max_summary_tokens
Default: 4000
Description: The maximum number of tokens allowed for the generated summary.

summary_temperature
Default: 0.3
Description: Controls the randomness of the summary generation. Lower values produce more deterministic output.

debug_mode
Default: true
Description: Prints detailed debug information to the log. Recommended to set to `false` in production.

═══════════════════════════════════════════════════════════════════════════════
🔧 Deployment
═══════════════════════════════════════════════════════════════════════════════

Docker Compose Example:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
services:
  openwebui:
    environment:
      DATABASE_URL: postgresql://user:password@postgres:5432/openwebui
    depends_on:
      - postgres

  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
      POSTGRES_DB: openwebui

Suggested Filter Installation Order:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
It is recommended to set the priority of this filter relatively high (a smaller number) to ensure it runs before other filters that might modify message content. A typical order might be:

1. Filters that need access to the full, uncompressed history (priority < 10)
   (e.g., a filter that injects a system-level prompt)
2. This compression filter (priority = 10)
3. Filters that run after compression (priority > 10)
   (e.g., a final output formatting filter)

═══════════════════════════════════════════════════════════════════════════════
📝 Database Query Examples
═══════════════════════════════════════════════════════════════════════════════

View all summaries:
SELECT
    chat_id,
    LEFT(summary, 100) AS summary_preview,
    compressed_message_count,
    updated_at
FROM chat_summary
ORDER BY updated_at DESC;

Query a specific conversation:
SELECT *
FROM chat_summary
WHERE chat_id = 'your_chat_id';

Delete old summaries:
DELETE FROM chat_summary
WHERE updated_at < NOW() - INTERVAL '30 days';

Statistics:
SELECT
    COUNT(*) AS total_summaries,
    AVG(LENGTH(summary)) AS avg_summary_length,
    AVG(compressed_message_count) AS avg_msg_count
FROM chat_summary;

═══════════════════════════════════════════════════════════════════════════════
⚠️ Important Notes
═══════════════════════════════════════════════════════════════════════════════

1. Database Permissions
   ⚠ Ensure the user specified in `DATABASE_URL` has permission to create tables.
   ⚠ The `chat_summary` table will be created automatically on first run.

2. Retention Policy
   ⚠ The `keep_first` setting is crucial for preserving initial messages that contain system prompts. Configure it as needed.

3. Performance
   ⚠ Summary generation is asynchronous and will not block the user response.
   ⚠ There will be a brief background processing time when the threshold is first met.

4. Cost Optimization
   ⚠ The summary model is called once each time the threshold is met.
   ⚠ Set `compression_threshold` reasonably to avoid frequent calls.
   ⚠ It is recommended to use a fast and economical model to generate summaries.

5. Multimodal Support
   ✓ This filter supports multimodal messages containing images.
   ✓ The summary is generated only from the text content.
   ✓ Non-text parts (like images) are preserved in their original messages during compression.

═══════════════════════════════════════════════════════════════════════════════
🐛 Troubleshooting
═══════════════════════════════════════════════════════════════════════════════

Problem: Database connection failed
Solution:
1. Verify that the `DATABASE_URL` environment variable is set correctly.
2. Confirm that `DATABASE_URL` starts with either `sqlite` or `postgres`.
3. Ensure the database service is running and network connectivity is normal.
4. Validate the username, password, host, and port in the connection URL.
5. Check the Open WebUI container logs for detailed error messages.

Problem: Summary not generated
Solution:
1. Check if the `compression_threshold` has been met.
2. Verify that the `summary_model` is configured correctly.
3. Check the debug logs for any error messages.

Problem: Initial system prompt is lost
Solution:
- Ensure `keep_first` is set to a value greater than 0 to preserve the initial messages containing this information.

Problem: Compression effect is not significant
Solution:
1. Increase the `compression_threshold` appropriately.
2. Decrease the value of `keep_last` or `keep_first`.
3. Check if the conversation is actually long enough.
"""
from pydantic import BaseModel, Field, model_validator
|
||||
from typing import Optional
|
||||
import asyncio
|
||||
import json
|
||||
import hashlib
|
||||
import os
|
||||
|
||||
# Open WebUI built-in imports
|
||||
from open_webui.utils.chat import generate_chat_completion
|
||||
from open_webui.models.users import Users
|
||||
from fastapi.requests import Request
|
||||
from open_webui.main import app as webui_app
|
||||
|
||||
# Database imports
|
||||
from sqlalchemy import create_engine, Column, String, Text, DateTime, Integer
|
||||
from sqlalchemy.ext.declarative import declarative_base
|
||||
from sqlalchemy.orm import sessionmaker
|
||||
from datetime import datetime
|
||||
|
||||
Base = declarative_base()
|
||||
|
||||
|
||||
class ChatSummary(Base):
|
||||
"""Chat Summary Storage Table"""
|
||||
|
||||
__tablename__ = "chat_summary"
|
||||
|
||||
id = Column(Integer, primary_key=True, autoincrement=True)
|
||||
chat_id = Column(String(255), unique=True, nullable=False, index=True)
|
||||
summary = Column(Text, nullable=False)
|
||||
compressed_message_count = Column(Integer, default=0)
|
||||
created_at = Column(DateTime, default=datetime.utcnow)
|
||||
updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
|
||||
|
||||
|
||||
class Filter:
|
||||
def __init__(self):
|
||||
self.valves = self.Valves()
|
||||
self._db_engine = None
|
||||
self._SessionLocal = None
|
||||
self._init_database()
|
||||
|
||||
def _init_database(self):
|
||||
"""Initializes the database connection and table."""
|
||||
try:
|
||||
database_url = os.getenv("DATABASE_URL")
|
||||
|
||||
if not database_url:
|
||||
print("[Database] ❌ Error: DATABASE_URL environment variable is not set. Please set this variable.")
|
||||
self._db_engine = None
|
||||
self._SessionLocal = None
|
||||
return
|
||||
|
||||
db_type = None
|
||||
engine_args = {}
|
||||
|
||||
if database_url.startswith("sqlite"):
|
||||
db_type = "SQLite"
|
||||
engine_args = {
|
||||
"connect_args": {"check_same_thread": False},
|
||||
"echo": False,
|
||||
}
|
||||
elif database_url.startswith("postgres"):
|
||||
db_type = "PostgreSQL"
|
||||
if database_url.startswith("postgres://"):
|
||||
database_url = database_url.replace(
|
||||
"postgres://", "postgresql://", 1
|
||||
)
|
||||
print("[Database] ℹ️ Automatically converted postgres:// to postgresql://")
|
||||
engine_args = {
|
||||
"pool_pre_ping": True,
|
||||
"pool_recycle": 3600,
|
||||
"echo": False,
|
||||
}
|
||||
else:
|
||||
print(
|
||||
f"[Database] ❌ Error: Unsupported database type. DATABASE_URL must start with 'sqlite' or 'postgres'. Current value: {database_url}"
|
||||
)
|
||||
self._db_engine = None
|
||||
self._SessionLocal = None
|
||||
return
|
||||
|
||||
# Create database engine
|
||||
self._db_engine = create_engine(database_url, **engine_args)
|
||||
|
||||
# Create session factory
|
||||
self._SessionLocal = sessionmaker(
|
||||
autocommit=False, autoflush=False, bind=self._db_engine
|
||||
)
|
||||
|
||||
# Create table if it doesn't exist
|
||||
Base.metadata.create_all(bind=self._db_engine)
|
||||
|
||||
print(f"[Database] ✅ Successfully connected to {db_type} and initialized the chat_summary table.")
|
||||
|
||||
except Exception as e:
|
||||
print(f"[Database] ❌ Initialization failed: {str(e)}")
|
||||
self._db_engine = None
|
||||
self._SessionLocal = None
|
||||
|
||||
class Valves(BaseModel):
|
||||
priority: int = Field(
|
||||
default=10, description="Priority level for the filter operations."
|
||||
)
|
||||
compression_threshold: int = Field(
|
||||
default=15, ge=0, description="The number of messages at which to trigger compression."
|
||||
)
|
||||
keep_first: int = Field(
|
||||
default=1, ge=0, description="Always keep the first N messages. Set to 0 to disable."
|
||||
)
|
||||
keep_last: int = Field(default=6, ge=0, description="Always keep the last N messages.")
|
||||
summary_model: str = Field(
|
||||
default=None,
|
||||
description="The model to use for generating the summary. If empty, uses the current conversation's model.",
|
||||
)
|
||||
max_summary_tokens: int = Field(
|
||||
default=4000, ge=1, description="The maximum number of tokens for the summary."
|
||||
)
|
||||
summary_temperature: float = Field(
|
||||
default=0.3, ge=0.0, le=2.0, description="The temperature for summary generation."
|
||||
)
|
||||
debug_mode: bool = Field(default=True, description="Enable detailed logging for debugging.")
|
||||
|
||||
@model_validator(mode="after")
|
||||
def check_thresholds(self) -> "Valves":
|
||||
kept_count = self.keep_first + self.keep_last
|
||||
if self.compression_threshold <= kept_count:
|
||||
raise ValueError(
|
||||
f"compression_threshold ({self.compression_threshold}) must be greater than "
|
||||
f"the sum of keep_first ({self.keep_first}) and keep_last ({self.keep_last}) ({kept_count})."
|
||||
)
|
||||
return self
|
||||
|
||||
def _save_summary(self, chat_id: str, summary: str, body: dict):
|
||||
"""Saves the summary to the database."""
|
||||
if not self._SessionLocal:
|
||||
if self.valves.debug_mode:
|
||||
print("[Storage] Database not initialized, skipping summary save.")
|
||||
return
|
||||
|
||||
try:
|
||||
session = self._SessionLocal()
|
||||
try:
|
||||
# Find existing record
|
||||
existing = (
|
||||
session.query(ChatSummary).filter_by(chat_id=chat_id).first()
|
||||
)
|
||||
|
||||
if existing:
|
||||
# Update existing record
|
||||
existing.summary = summary
|
||||
existing.compressed_message_count = len(body.get("messages", []))
|
||||
existing.updated_at = datetime.utcnow()
|
||||
else:
|
||||
# Create new record
|
||||
new_summary = ChatSummary(
|
||||
chat_id=chat_id,
|
||||
summary=summary,
|
||||
compressed_message_count=len(body.get("messages", [])),
|
||||
)
|
||||
session.add(new_summary)
|
||||
|
||||
session.commit()
|
||||
|
||||
if self.valves.debug_mode:
|
||||
action = "Updated" if existing else "Created"
|
||||
print(f"[Storage] Summary has been {action.lower()} in the database (Chat ID: {chat_id})")
|
||||
|
||||
finally:
|
||||
session.close()
|
||||
|
||||
except Exception as e:
|
||||
print(f"[Storage] ❌ Database save failed: {str(e)}")
|
||||
|
||||
def _load_summary(self, chat_id: str, body: dict) -> Optional[str]:
|
||||
"""Loads the summary from the database."""
|
||||
if not self._SessionLocal:
|
||||
if self.valves.debug_mode:
|
||||
print("[Storage] Database not initialized, cannot load summary.")
|
||||
return None
|
||||
|
||||
try:
|
||||
session = self._SessionLocal()
|
||||
try:
|
||||
record = (
|
||||
session.query(ChatSummary).filter_by(chat_id=chat_id).first()
|
||||
)
|
||||
|
||||
if record:
|
||||
if self.valves.debug_mode:
|
||||
print(f"[Storage] Loaded summary from database (Chat ID: {chat_id})")
|
||||
print(
|
||||
f"[Storage] Last updated: {record.updated_at}, Original message count: {record.compressed_message_count}"
|
||||
)
|
||||
return record.summary
|
||||
|
||||
finally:
|
||||
session.close()
|
||||
|
||||
except Exception as e:
|
||||
print(f"[Storage] ❌ Database read failed: {str(e)}")
|
||||
|
||||
return None
|
||||
|
||||
def _inject_summary_to_first_message(self, message: dict, summary: str) -> dict:
|
||||
"""Injects the summary into the first message by prepending it."""
|
||||
content = message.get("content", "")
|
||||
summary_block = f"【Historical Conversation Summary】\n{summary}\n\n---\nBelow is the recent conversation:\n\n"
|
||||
|
||||
# Handle different content types
|
||||
if isinstance(content, list): # Multimodal content
|
||||
# Find the first text part and insert the summary before it
|
||||
new_content = []
|
||||
summary_inserted = False
|
||||
|
||||
for part in content:
|
||||
if (
|
||||
isinstance(part, dict)
|
||||
and part.get("type") == "text"
|
||||
and not summary_inserted
|
||||
):
|
||||
# Prepend summary to the first text part
|
||||
new_content.append(
|
||||
{"type": "text", "text": summary_block + part.get("text", "")}
|
||||
)
|
||||
summary_inserted = True
|
||||
else:
|
||||
new_content.append(part)
|
||||
|
||||
# If no text part, insert at the beginning
|
||||
if not summary_inserted:
|
||||
new_content.insert(0, {"type": "text", "text": summary_block})
|
||||
|
||||
message["content"] = new_content
|
||||
|
||||
elif isinstance(content, str): # Plain text
|
||||
message["content"] = summary_block + content
|
||||
|
||||
return message
|
||||
|
||||
async def inlet(
|
||||
self, body: dict, __user__: Optional[dict] = None, __metadata__: dict = None
|
||||
) -> dict:
|
||||
"""
|
||||
Executed before sending to the LLM.
|
||||
Compression Strategy:
|
||||
1. Keep the first N messages.
|
||||
2. Inject the summary into the first message (if keep_first > 0).
|
||||
3. Keep the last N messages.
|
||||
"""
|
||||
messages = body.get("messages", [])
|
||||
chat_id = __metadata__["chat_id"]
|
||||
|
||||
if self.valves.debug_mode:
|
||||
print(f"\n{'='*60}")
|
||||
print(f"[Inlet] Chat ID: {chat_id}")
|
||||
print(f"[Inlet] Received {len(messages)} messages")
|
||||
|
||||
# [Optimization] Load summary in a background thread to avoid blocking the event loop.
|
||||
if self.valves.debug_mode:
|
||||
print("[Optimization] Loading summary in a background thread to avoid blocking the event loop.")
|
||||
saved_summary = await asyncio.to_thread(self._load_summary, chat_id, body)
|
||||
|
||||
total_kept_count = self.valves.keep_first + self.valves.keep_last
|
||||
|
||||
if saved_summary and len(messages) > total_kept_count:
|
||||
if self.valves.debug_mode:
|
||||
print(f"[Inlet] Found saved summary, applying compression.")
|
||||
|
||||
first_messages_to_keep = []
|
||||
|
||||
if self.valves.keep_first > 0:
|
||||
# Copy the initial messages to keep
|
||||
first_messages_to_keep = [
|
||||
m.copy() for m in messages[: self.valves.keep_first]
|
||||
]
|
||||
# Inject the summary into the very first message
|
||||
first_messages_to_keep[0] = self._inject_summary_to_first_message(
|
||||
first_messages_to_keep[0], saved_summary
|
||||
)
|
||||
else:
|
||||
# If not keeping initial messages, create a new system message for the summary
|
||||
summary_block = (
|
||||
f"【Historical Conversation Summary】\n{saved_summary}\n\n---\nBelow is the recent conversation:\n\n"
|
||||
)
|
||||
first_messages_to_keep.append(
|
||||
{"role": "system", "content": summary_block}
|
||||
)
|
||||
|
||||
# Keep the last messages
|
||||
last_messages_to_keep = (
|
||||
messages[-self.valves.keep_last :] if self.valves.keep_last > 0 else []
|
||||
)
|
||||
|
||||
# Combine: [Kept initial messages (with summary)] + [Kept recent messages]
|
||||
body["messages"] = first_messages_to_keep + last_messages_to_keep
|
||||
|
||||
if self.valves.debug_mode:
|
||||
print(f"[Inlet] ✂️ Compression complete:")
|
||||
print(f" - Original messages: {len(messages)}")
|
||||
print(f" - Compressed to: {len(body['messages'])}")
|
||||
print(
|
||||
f" - Structure: [Keep first {self.valves.keep_first} (with summary)] + [Keep last {self.valves.keep_last}]"
|
||||
)
|
||||
print(f" - Saved: {len(messages) - len(body['messages'])} messages")
|
||||
else:
|
||||
if self.valves.debug_mode:
|
||||
if not saved_summary:
|
||||
print(f"[Inlet] No summary found, using full conversation history.")
|
||||
else:
|
||||
print(f"[Inlet] Message count does not exceed retention threshold, no compression applied.")
|
||||
|
||||
if self.valves.debug_mode:
|
||||
print(f"{'='*60}\n")
|
||||
|
||||
return body
|
||||
|
||||
async def outlet(
|
||||
self, body: dict, __user__: Optional[dict] = None, __metadata__: dict = None
|
||||
) -> dict:
|
||||
"""
|
||||
Executed after the LLM response is complete.
|
||||
Triggers summary generation asynchronously.
|
||||
"""
|
||||
messages = body.get("messages", [])
|
||||
chat_id = (__metadata__ or {}).get("chat_id")
|
||||
|
||||
if self.valves.debug_mode:
|
||||
print(f"\n{'='*60}")
|
||||
print(f"[Outlet] Chat ID: {chat_id}")
|
||||
print(f"[Outlet] Response complete, current message count: {len(messages)}")
|
||||
|
||||
# Check if compression is needed
|
||||
if len(messages) >= self.valves.compression_threshold:
|
||||
if self.valves.debug_mode:
|
||||
print(
|
||||
f"[Outlet] ⚡ Compression threshold reached ({len(messages)} >= {self.valves.compression_threshold})"
|
||||
)
|
||||
print(f"[Outlet] Preparing to generate summary in the background...")
|
||||
|
||||
# Generate summary asynchronously in the background
|
||||
asyncio.create_task(
|
||||
self._generate_summary_async(messages, chat_id, body, __user__)
|
||||
)
|
||||
else:
|
||||
if self.valves.debug_mode:
|
||||
print(
|
||||
f"[Outlet] Compression threshold not reached ({len(messages)} < {self.valves.compression_threshold})"
|
||||
)
|
||||
|
||||
if self.valves.debug_mode:
|
||||
print(f"{'='*60}\n")
|
||||
|
||||
return body
|
||||
|
||||
async def _generate_summary_async(
|
||||
self, messages: list, chat_id: str, body: dict, user_data: Optional[dict]
|
||||
):
|
||||
"""
|
||||
Generates a summary asynchronously in the background.
|
||||
"""
|
||||
try:
|
||||
if self.valves.debug_mode:
|
||||
print(f"\n[🤖 Async Summary Task] Starting...")
|
||||
|
||||
# Messages to summarize: exclude kept initial and final messages
|
||||
if self.valves.keep_last > 0:
|
||||
messages_to_summarize = messages[
|
||||
self.valves.keep_first : -self.valves.keep_last
|
||||
]
|
||||
else:
|
||||
messages_to_summarize = messages[self.valves.keep_first :]
|
||||
|
||||
if len(messages_to_summarize) == 0:
|
||||
if self.valves.debug_mode:
|
||||
print(f"[🤖 Async Summary Task] No messages to summarize, skipping.")
|
||||
return
|
||||
|
||||
if self.valves.debug_mode:
|
||||
print(f"[🤖 Async Summary Task] Preparing to summarize {len(messages_to_summarize)} messages.")
|
||||
print(
|
||||
f"[🤖 Async Summary Task] Protecting: First {self.valves.keep_first} + Last {self.valves.keep_last} messages."
|
||||
)
|
||||
|
||||
# Build conversation history text
|
||||
conversation_text = self._format_messages_for_summary(messages_to_summarize)
|
||||
|
||||
# Call LLM to generate summary
|
||||
summary = await self._call_summary_llm(conversation_text, body, user_data)
|
||||
|
||||
# [Optimization] Save summary in a background thread to avoid blocking the event loop.
|
||||
if self.valves.debug_mode:
|
||||
print("[Optimization] Saving summary in a background thread to avoid blocking the event loop.")
|
||||
await asyncio.to_thread(self._save_summary, chat_id, summary, body)
|
||||
|
||||
if self.valves.debug_mode:
|
||||
print(f"[🤖 Async Summary Task] ✅ Complete! Summary length: {len(summary)} characters.")
|
||||
print(f"[🤖 Async Summary Task] Summary preview: {summary[:150]}...")
|
||||
|
||||
except Exception as e:
|
||||
print(f"[🤖 Async Summary Task] ❌ Error: {str(e)}")
|
||||
import traceback
|
||||
|
||||
traceback.print_exc()
|
||||
# Save a simple placeholder even on failure
|
||||
# Use len(messages) here: messages_to_summarize may be unbound if the error occurred before it was assigned
fallback_summary = (
f"[Historical Conversation Summary] Contains content from approximately {len(messages)} messages."
)
|
||||
|
||||
# [Optimization] Save summary in a background thread to avoid blocking the event loop.
|
||||
if self.valves.debug_mode:
|
||||
print("[Optimization] Saving summary in a background thread to avoid blocking the event loop.")
|
||||
await asyncio.to_thread(self._save_summary, chat_id, fallback_summary, body)
|
||||
|
||||
def _format_messages_for_summary(self, messages: list) -> str:
|
||||
"""Formats messages for summarization."""
|
||||
formatted = []
|
||||
for i, msg in enumerate(messages, 1):
|
||||
role = msg.get("role", "unknown")
|
||||
content = msg.get("content", "")
|
||||
|
||||
# Handle multimodal content
|
||||
if isinstance(content, list):
|
||||
text_parts = []
|
||||
for part in content:
|
||||
if isinstance(part, dict) and part.get("type") == "text":
|
||||
text_parts.append(part.get("text", ""))
|
||||
content = " ".join(text_parts)
|
||||
|
||||
# Handle role name
|
||||
role_name = {"user": "User", "assistant": "Assistant"}.get(role, role)
|
||||
|
||||
# Limit length of each message to avoid excessive length
|
||||
if len(content) > 500:
|
||||
content = content[:500] + "..."
|
||||
|
||||
formatted.append(f"[{i}] {role_name}: {content}")
|
||||
|
||||
return "\n\n".join(formatted)
|
||||
|
||||
async def _call_summary_llm(
|
||||
self, conversation_text: str, body: dict, user_data: dict
|
||||
) -> str:
|
||||
"""
|
||||
Calls the LLM to generate a summary using Open WebUI's built-in method.
|
||||
"""
|
||||
if self.valves.debug_mode:
|
||||
print(f"[🤖 LLM Call] Using Open WebUI's built-in method.")
|
||||
|
||||
# Build summary prompt
|
||||
summary_prompt = f"""
|
||||
You are a professional conversation context compression assistant. Your task is to perform a high-fidelity compression of the [Conversation Content] below, producing a concise summary that can be used directly as context for subsequent conversation. Strictly adhere to the following requirements:
|
||||
|
||||
MUST RETAIN: Topics/goals, user intent, key facts and data, important parameters and constraints, deadlines, decisions/conclusions, action items and their status, and technical details like code/commands (code must be preserved as is).
|
||||
REMOVE: Greetings, politeness, repetitive statements, off-topic chatter, and procedural details (unless essential). For information that has been overturned or is outdated, please mark it as "Obsolete: <explanation>" when retaining.
|
||||
CONFLICT RESOLUTION: If there are contradictions or multiple revisions, retain the latest consistent conclusion and list unresolved or conflicting points under "Points to Clarify".
|
||||
STRUCTURE AND TONE: Output in structured bullet points. Be logical, objective, and concise. Summarize from a third-person perspective. Use code blocks to preserve technical/code snippets verbatim.
|
||||
OUTPUT LENGTH: Strictly limit the summary content to within {int(self.valves.max_summary_tokens * 3)} characters. Prioritize key information; if space is insufficient, trim details rather than core conclusions.
|
||||
FORMATTING: Output only the summary text. Do not add any extra explanations, execution logs, or generation processes. You must use the following headings (if a section has no content, write "None"):
|
||||
Core Theme:
|
||||
Key Information:
|
||||
... (List 3-6 key points)
|
||||
Decisions/Conclusions:
|
||||
Action Items (with owner/deadline if any):
|
||||
Relevant Roles/Preferences:
|
||||
Risks/Dependencies/Assumptions:
|
||||
Points to Clarify:
|
||||
Compression Ratio: Original ~X words → Summary ~Y words (estimate)
|
||||
Conversation Content:
|
||||
{conversation_text}
|
||||
|
||||
Please directly output the compressed summary that meets the above requirements (summary text only).
|
||||
"""
|
||||
# Determine the model to use
|
||||
model = self.valves.summary_model or body.get("model", "")
|
||||
|
||||
if self.valves.debug_mode:
|
||||
print(f"[🤖 LLM Call] Model: {model}")
|
||||
|
||||
# Build payload
|
||||
payload = {
|
||||
"model": model,
|
||||
"messages": [{"role": "user", "content": summary_prompt}],
|
||||
"stream": False,
|
||||
"max_tokens": self.valves.max_summary_tokens,
|
||||
"temperature": self.valves.summary_temperature,
|
||||
}
|
||||
|
||||
try:
|
||||
# Get user object
|
||||
user_id = user_data.get("id") if user_data else None
|
||||
if not user_id:
|
||||
raise ValueError("Could not get user ID")
|
||||
|
||||
# [Optimization] Get user object in a background thread to avoid blocking the event loop.
|
||||
if self.valves.debug_mode:
|
||||
print("[Optimization] Getting user object in a background thread to avoid blocking the event loop.")
|
||||
user = await asyncio.to_thread(Users.get_user_by_id, user_id)
|
||||
|
||||
if not user:
|
||||
raise ValueError(f"Could not find user: {user_id}")
|
||||
|
||||
if self.valves.debug_mode:
|
||||
print(f"[🤖 LLM Call] User: {user.email}")
|
||||
print(f"[🤖 LLM Call] Sending request...")
|
||||
|
||||
# Create Request object
|
||||
request = Request(scope={"type": "http", "app": webui_app})
|
||||
|
||||
# Call generate_chat_completion
|
||||
response = await generate_chat_completion(request, payload, user)
|
||||
|
||||
if not response or "choices" not in response or not response["choices"]:
|
||||
raise ValueError("LLM response is not in the correct format or is empty")
|
||||
|
||||
summary = response["choices"][0]["message"]["content"].strip()
|
||||
|
||||
if self.valves.debug_mode:
|
||||
print(f"[🤖 LLM Call] ✅ Successfully received summary.")
|
||||
|
||||
return summary
|
||||
|
||||
except Exception as e:
|
||||
error_message = f"An error occurred while calling the LLM ({model}) to generate a summary: {str(e)}"
|
||||
if not self.valves.summary_model:
|
||||
error_message += (
|
||||
"\n[Hint] You did not specify a summary_model, so the filter attempted to use the current conversation's model. "
|
||||
"If this is a pipeline (Pipe) model or an incompatible model, please specify a compatible summary model (e.g., 'gemini-2.5-flash') in the configuration."
|
||||
)
|
||||
|
||||
if self.valves.debug_mode:
|
||||
print(f"[🤖 LLM Call] ❌ {error_message}")
|
||||
|
||||
raise Exception(error_message)
|
||||
@@ -0,0 +1,77 @@
|
||||
# 异步上下文压缩过滤器
|
||||
|
||||
**作者:** [Fu-Jie](https://github.com/Fu-Jie) | **版本:** 1.0.0 | **许可证:** MIT
|
||||
|
||||
> **重要提示**:为了确保所有过滤器的可维护性和易用性,每个过滤器都应附带清晰、完整的文档,以确保其功能、配置和使用方法得到充分说明。
|
||||
|
||||
本过滤器通过智能摘要和消息压缩技术,在保持对话连贯性的同时,显著降低长对话的Token消耗。
|
||||
|
||||
---
|
||||
|
||||
## 核心特性
|
||||
|
||||
- ✅ **自动压缩**: 基于消息数量阈值自动触发上下文压缩。
|
||||
- ✅ **异步摘要**: 在后台生成摘要,不阻塞当前对话的响应。
|
||||
- ✅ **持久化存储**: 支持 PostgreSQL 和 SQLite 数据库,确保摘要在服务重启后不丢失。
|
||||
- ✅ **灵活保留策略**: 可自由配置保留对话头部和尾部的消息数量,确保关键信息和上下文的连贯性。
|
||||
- ✅ **智能注入**: 将生成的历史摘要智能地注入到新的上下文中。
|
||||
|
||||
---
|
||||
|
||||
## 安装与配置
|
||||
|
||||
### 1. 环境变量
|
||||
|
||||
本插件的运行依赖于数据库,您**必须**在 Open WebUI 的环境变量中配置 `DATABASE_URL`。
|
||||
|
||||
- **PostgreSQL 示例**:
|
||||
```
|
||||
DATABASE_URL=postgresql://user:password@host:5432/openwebui
|
||||
```
|
||||
- **SQLite 示例**:
|
||||
```
|
||||
DATABASE_URL=sqlite:///path/to/your/data/webui.db
|
||||
```
|
||||
|
||||
### 2. 过滤器顺序
|
||||
|
||||
建议将此过滤器的优先级设置得相对较高(数值较小),以确保它在其他可能修改消息内容的过滤器之前运行。一个典型的顺序可能是:
|
||||
|
||||
1. **前置过滤器 (priority < 10)**
|
||||
- 例如:注入系统级提示的过滤器。
|
||||
2. **本压缩过滤器 (priority = 10)**
|
||||
3. **后置过滤器 (priority > 10)**
|
||||
- 例如:对最终输出进行格式化的过滤器。
|
||||
|
||||
---
|
||||
|
||||
## 配置参数
|
||||
|
||||
您可以在过滤器的设置中调整以下参数:
|
||||
|
||||
| 参数 | 默认值 | 描述 |
|
||||
| :--- | :--- | :--- |
|
||||
| `priority` | `10` | 过滤器执行顺序,数值越小越先执行。 |
|
||||
| `compression_threshold` | `15` | 当总消息数达到此值时,将在后台触发摘要生成。 |
|
||||
| `keep_first` | `1` | 始终保留对话开始的 N 条消息。第一条消息通常包含重要的系统提示。 |
|
||||
| `keep_last` | `6` | 始终保留对话末尾的 N 条消息,以确保上下文连贯。 |
|
||||
| `summary_model` | `None` | 用于生成摘要的模型。**强烈建议**配置一个快速、经济的兼容模型(如 `gemini-2.5-flash`)。如果留空,将尝试使用当前对话的模型,但这可能因模型不兼容(如 Pipe 模型)而失败。 |
|
||||
| `max_summary_tokens` | `4000` | 生成摘要时允许的最大 Token 数。 |
|
||||
| `summary_temperature` | `0.3` | 控制摘要生成的随机性,较低的值结果更稳定。 |
|
||||
| `debug_mode` | `true` | 是否在日志中打印详细的调试信息。生产环境建议设为 `false`。 |
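上表参数与过滤器内部的 Valves 配置模型一一对应。下面是一个最小示意(假设采用 pydantic 定义,字段名与默认值取自上表,并非插件源码的逐字复制):

```python
from typing import Optional
from pydantic import BaseModel, Field

class Valves(BaseModel):
    priority: int = Field(default=10, description="过滤器执行顺序,数值越小越先执行")
    compression_threshold: int = Field(default=15, description="触发摘要生成的消息数阈值")
    keep_first: int = Field(default=1, description="始终保留对话开头的 N 条消息")
    keep_last: int = Field(default=6, description="始终保留对话末尾的 N 条消息")
    summary_model: Optional[str] = Field(default=None, description="用于生成摘要的模型 ID")
    max_summary_tokens: int = Field(default=4000, description="摘要允许的最大 Token 数")
    summary_temperature: float = Field(default=0.3, description="摘要生成温度")
    debug_mode: bool = Field(default=True, description="是否打印详细调试日志")

valves = Valves()
print(valves.keep_first, valves.keep_last)  # 1 6
```

在 OpenWebUI 的过滤器设置界面中修改的值,最终就是覆盖这些字段的默认值。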
|
||||
|
||||
---
|
||||
|
||||
## 故障排除
|
||||
|
||||
- **问题:数据库连接失败**
|
||||
- **解决**:请确认 `DATABASE_URL` 环境变量已正确设置,并且数据库服务运行正常。
|
||||
|
||||
- **问题:摘要未生成**
|
||||
- **解决**:检查 `compression_threshold` 是否已达到,并确认 `summary_model` 配置正确。查看日志以获取详细错误。
|
||||
|
||||
- **问题:初始的系统提示丢失**
|
||||
- **解决**:确保 `keep_first` 的值大于 0,以保留包含重要信息的初始消息。
|
||||
|
||||
- **问题:压缩效果不明显**
|
||||
- **解决**:尝试适当提高 `compression_threshold`,或减少 `keep_first` / `keep_last` 的值。
|
||||
662
plugins/filters/async-context-compression/工作流程指南.md
Normal file
@@ -0,0 +1,662 @@
|
||||
# 异步上下文压缩过滤器 - 工作流程指南
|
||||
|
||||
## 📋 目录
|
||||
1. [概述](#概述)
|
||||
2. [系统架构](#系统架构)
|
||||
3. [工作流程详解](#工作流程详解)
|
||||
4. [Token 计数机制](#token-计数机制)
|
||||
5. [递归摘要机制](#递归摘要机制)
|
||||
6. [配置指南](#配置指南)
|
||||
7. [最佳实践](#最佳实践)
|
||||
|
||||
---
|
||||
|
||||
## 概述
|
||||
|
||||
异步上下文压缩过滤器是一个高性能的消息压缩插件,通过以下方式降低长对话的 Token 消耗:
|
||||
|
||||
- **智能摘要**:将历史消息压缩成高保真摘要
|
||||
- **递归更新**:新摘要合并旧摘要,保证历史连贯性
|
||||
- **异步处理**:后台生成摘要,不阻塞用户响应
|
||||
- **灵活配置**:支持全局和模型特定的阈值配置
|
||||
|
||||
### 核心指标
|
||||
- **压缩率**:可达 65% 以上(取决于对话长度)
|
||||
- **响应时间**:inlet 阶段 <10ms(无计算开销)
|
||||
- **摘要质量**:高保真递归摘要,保留关键信息
|
||||
|
||||
---
|
||||
|
||||
## 系统架构
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────┐
|
||||
│ 用户请求流程 │
|
||||
└────────────────┬────────────────────────────────────┘
|
||||
│
|
||||
┌────────────▼──────────────┐
|
||||
│ inlet(请求前处理) │
|
||||
│ ├─ 加载摘要记录 │
|
||||
│ ├─ 注入摘要到首条消息 │
|
||||
│ └─ 返回压缩消息列表 │ ◄─ 快速返回 (<10ms)
|
||||
└────────────┬──────────────┘
|
||||
│
|
||||
┌────────────▼──────────────┐
|
||||
│ LLM 处理消息 │
|
||||
│ ├─ 调用语言模型 │
|
||||
│ └─ 生成回复 │
|
||||
└────────────┬──────────────┘
|
||||
│
|
||||
┌────────────▼──────────────┐
|
||||
│ outlet(响应后处理) │
|
||||
│ ├─ 启动后台异步任务 │
|
||||
│ └─ 立即返回(不阻塞) │ ◄─ 返回响应给用户
|
||||
└────────────┬──────────────┘
|
||||
│
|
||||
┌────────────▼──────────────┐
|
||||
│ 后台处理(asyncio 任务) │
|
||||
│ ├─ 计算 Token 数 │
|
||||
│ ├─ 检查压缩阈值 │
|
||||
│ ├─ 生成递归摘要 │
|
||||
│ └─ 保存到数据库 │
|
||||
└────────────┬──────────────┘
|
||||
│
|
||||
┌────────────▼──────────────┐
|
||||
│ 数据库持久化存储 │
|
||||
│ ├─ 摘要内容 │
|
||||
│ ├─ 压缩进度 │
|
||||
│ └─ 时间戳 │
|
||||
└────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 工作流程详解
|
||||
|
||||
### 1️⃣ inlet 阶段:消息注入与压缩视图构建
|
||||
|
||||
**目标**:快速应用已有摘要,构建压缩消息视图
|
||||
|
||||
**流程**:
|
||||
|
||||
```
|
||||
输入:所有消息列表
|
||||
│
|
||||
├─► 从数据库加载摘要记录
|
||||
│ │
|
||||
│ ├─► 找到 ✓ ─────┐
|
||||
│ └─► 未找到 ───┐ │
|
||||
│ │ │
|
||||
├──────────────────┴─┼─► 存在摘要?
|
||||
│ │
|
||||
│ ┌───▼───┐
|
||||
│ │ 是 │ 否
|
||||
│ └───┬───┴───┐
|
||||
│ │ │
|
||||
│ ┌───────────▼─┐ ┌─▼─────────┐
|
||||
│ │ 构建压缩视图 │ │ 使用原始 │
|
||||
│ │ [H] + [T] │ │ 消息列表 │
|
||||
│ └───────┬─────┘ └─┬────────┘
|
||||
│ │ │
|
||||
│ ┌───────────┴──────────┘
|
||||
│ │
|
||||
│ └─► 组合消息:
|
||||
│ • 头部(keep_first)
|
||||
│ • 摘要注入到首条
|
||||
│ • 尾部(keep_last)
|
||||
│
|
||||
└─────► 返回压缩消息列表
|
||||
⏱️ 耗时 <10ms
|
||||
```
|
||||
|
||||
**关键参数**:
|
||||
- `keep_first`:保留前 N 条消息(默认 1)
|
||||
- `keep_last`:保留后 N 条消息(默认 6)
|
||||
- 摘要注入位置:首条消息的内容前
|
||||
|
||||
**示例**:
|
||||
```python
|
||||
# 原始:20 条消息
|
||||
消息1: [系统提示]
|
||||
消息2-14: [历史对话]
|
||||
消息15-20: [最近对话]
|
||||
|
||||
# inlet 后(存在摘要):7 条消息
|
||||
消息1: [系统提示 + 【历史摘要】...] ◄─ 摘要已注入
|
||||
消息15-20: [最近对话] ◄─ 保留后6条
|
||||
```
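上面的压缩视图构建过程可以用几行 Python 勾勒(示意实现:函数名 `build_compressed_view` 为说明而设,摘要以纯文本拼接到首条消息内容之前):

```python
def build_compressed_view(messages, summary, keep_first=1, keep_last=6):
    """按 [头部(含摘要)] + [尾部] 组合压缩后的消息视图(示意)。"""
    if not summary or len(messages) <= keep_first + keep_last:
        return messages  # 无摘要或消息太少:直接使用原始列表
    head = [m.copy() for m in messages[:keep_first]]
    if head:
        # 将摘要注入首条消息内容之前
        head[0]["content"] = (
            f"【历史对话摘要】\n{summary}\n\n---\n以下是最近的对话:\n\n"
            + head[0]["content"]
        )
    else:
        # keep_first=0 时,为摘要单独创建一条 system 消息
        head = [{"role": "system", "content": f"【历史对话摘要】\n{summary}"}]
    tail = messages[-keep_last:] if keep_last > 0 else []
    return head + tail

msgs = [{"role": "system", "content": "系统提示"}] + [
    {"role": "user", "content": f"消息{i}"} for i in range(2, 21)
]
view = build_compressed_view(msgs, "早期讨论了 X 与 Y")
print(len(view))  # 20 条消息压缩为 7 条
```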
|
||||
|
||||
---
|
||||
|
||||
### 2️⃣ outlet 阶段:后台异步处理
|
||||
|
||||
**目标**:计算 Token 数、检查阈值、生成摘要(不阻塞响应)
|
||||
|
||||
**流程**:
|
||||
|
||||
```
|
||||
LLM 响应完成
|
||||
│
|
||||
└─► outlet 处理
|
||||
│
|
||||
└─► 启动后台异步任务(asyncio.create_task)
|
||||
│
|
||||
├─► 立即返回给用户 ✓
|
||||
│ (不等待后台任务完成)
|
||||
│
|
||||
└─► 后台执行 _check_and_generate_summary_async
|
||||
│
|
||||
├─► 在后台线程中计算 Token 数
|
||||
│ (await asyncio.to_thread)
|
||||
│
|
||||
├─► 获取模型阈值配置
|
||||
│ • 优先使用 model_thresholds 中的配置
|
||||
│ • 回退到全局 compression_threshold_tokens
|
||||
│
|
||||
├─► 检查是否触发压缩
|
||||
│ if current_tokens >= threshold:
|
||||
│
|
||||
└─► 触发摘要生成流程
|
||||
```
|
||||
|
||||
**时序图**:
|
||||
```
|
||||
时间线:
|
||||
│
|
||||
├─ T0: LLM 响应完成
|
||||
│
|
||||
├─ T1: outlet 被调用
|
||||
│ └─► 启动后台任务
|
||||
│ └─► 立即返回 ✓
|
||||
│
|
||||
├─ T2: 用户收到响应 ✓✓✓
|
||||
│
|
||||
└─ T3-T10: 后台任务执行
|
||||
├─ 计算 Token
|
||||
├─ 检查阈值
|
||||
├─ 调用 LLM 生成摘要
|
||||
└─ 保存到数据库
|
||||
```
|
||||
|
||||
**关键特性**:
|
||||
- ✅ 用户响应不受影响
|
||||
- ✅ Token 计算不阻塞请求
|
||||
- ✅ 摘要生成异步进行
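“立即返回、后台执行”的关键是 `asyncio.create_task`:任务被挂到事件循环上,outlet 不等待其完成。下面是一个可运行的最小示意(`heavy_background_work` 为占位函数,代表真实的 Token 计算与摘要生成):

```python
import asyncio

async def heavy_background_work(tag):
    # 占位:代表真实的 Token 计数、摘要生成与入库
    await asyncio.sleep(0.05)
    print(f"[后台] {tag} 完成")

async def outlet(body):
    # 挂起后台任务后立即返回,不 await 它
    asyncio.create_task(heavy_background_work("摘要任务"))
    return body

async def main():
    result = await outlet({"messages": []})
    print("outlet 已返回:", result)  # 用户此刻已拿到响应
    await asyncio.sleep(0.1)         # 仅为演示:给后台任务留出完成时间

asyncio.run(main())
```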
|
||||
|
||||
---
|
||||
|
||||
### 3️⃣ Token 计数与阈值检查
|
||||
|
||||
**工作流程**:
|
||||
|
||||
```
|
||||
后台线程执行 _check_and_generate_summary_async
|
||||
│
|
||||
├─► Step 1: 计算当前 Token 总数
|
||||
│ │
|
||||
│ ├─ 遍历所有消息
|
||||
│ ├─ 处理多模态内容(提取文本部分)
|
||||
│ ├─ 使用 o200k_base 编码计数
|
||||
│ └─ 返回 total_tokens
|
||||
│
|
||||
├─► Step 2: 获取模型特定阈值
|
||||
│ │
|
||||
│ ├─ 模型 ID: gpt-4
|
||||
│ ├─ 查询 model_thresholds
|
||||
│ │
|
||||
│ ├─ 存在配置?
|
||||
│ │ ├─ 是 ✓ 使用该配置
|
||||
│ │ └─ 否 ✓ 使用全局参数
|
||||
│ │
|
||||
│ ├─ compression_threshold_tokens(默认 64000)
|
||||
│ └─ max_context_tokens(默认 128000)
|
||||
│
|
||||
└─► Step 3: 检查是否触发压缩
|
||||
│
|
||||
if current_tokens >= compression_threshold_tokens:
|
||||
│ └─► 触发摘要生成
|
||||
│
|
||||
else:
|
||||
└─► 无需压缩,任务结束
|
||||
```
|
||||
|
||||
**Token 计数细节**:
|
||||
|
||||
```python
try:
    import tiktoken
    _ENCODING = tiktoken.get_encoding("o200k_base")  # 统一使用 o200k_base 编码
except ImportError:
    _ENCODING = None

def _count_tokens(text: str) -> int:
    if _ENCODING is not None:
        return len(_ENCODING.encode(text))
    # 回退:字符估算(1 token ≈ 4 字符)
    return len(text) // 4
```
|
||||
|
||||
**模型阈值优先级**:
|
||||
```
|
||||
优先级 1: model_thresholds["gpt-4"]
|
||||
优先级 2: model_thresholds["gemini-2.5-flash"]
|
||||
优先级 3: 全局 compression_threshold_tokens
|
||||
```
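阈值回退逻辑可以概括为“模型特定配置覆盖全局默认值”。示意如下(`get_thresholds` 为说明而设的函数名):

```python
DEFAULTS = {"compression_threshold_tokens": 64000, "max_context_tokens": 128000}

def get_thresholds(model_id, model_thresholds):
    """模型特定配置覆盖全局默认值;未配置的模型回退到 DEFAULTS。"""
    merged = dict(DEFAULTS)
    merged.update(model_thresholds.get(model_id, {}))
    return merged

model_thresholds = {
    "gpt-4": {"compression_threshold_tokens": 8000, "max_context_tokens": 32000},
}
print(get_thresholds("gpt-4", model_thresholds)["compression_threshold_tokens"])      # 8000
print(get_thresholds("llama-70b", model_thresholds)["compression_threshold_tokens"])  # 64000
```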
|
||||
|
||||
---
|
||||
|
||||
### 4️⃣ 递归摘要生成
|
||||
|
||||
**核心机制**:将旧摘要与新消息合并,生成更新的摘要
|
||||
|
||||
**工作流程**:
|
||||
|
||||
```
|
||||
触发 _generate_summary_async
|
||||
│
|
||||
├─► Step 1: 加载旧摘要
|
||||
│ │
|
||||
│ ├─ 从数据库查询
|
||||
│ ├─ 获取 previous_summary
|
||||
│ └─ 获取 compressed_message_count(上次压缩进度)
|
||||
│
|
||||
├─► Step 2: 确定待压缩消息范围
|
||||
│ │
|
||||
│ ├─ start_index = max(compressed_count, keep_first)
|
||||
│ ├─ end_index = len(messages) - keep_last
|
||||
│ │
|
||||
│ ├─ 提取 messages[start_index:end_index]
|
||||
│ └─ 这是【新增对话】部分
|
||||
│
|
||||
├─► Step 3: 构建 LLM 提示词
|
||||
│ │
|
||||
│ ├─ 【已有摘要】= previous_summary
|
||||
│ ├─ 【新增对话】= 格式化的新消息
|
||||
│ │
|
||||
│ └─ 提示词模板:
|
||||
│ "将【已有摘要】和【新增对话】合并..."
|
||||
│
|
||||
├─► Step 4: 调用 LLM 生成摘要
|
||||
│ │
|
||||
│ ├─ 模型选择:summary_model(若配置)或当前模型
|
||||
│ ├─ 参数:
|
||||
│ │ • max_tokens = max_summary_tokens(默认 4000)
|
||||
│ │ • temperature = summary_temperature(默认 0.3)
|
||||
│ │ • stream = False
|
||||
│ │
|
||||
│ └─ 返回 new_summary
|
||||
│
|
||||
├─► Step 5: 保存摘要到数据库
|
||||
│ │
|
||||
│ ├─ 更新 chat_summary 表
|
||||
│ ├─ summary = new_summary
|
||||
│ ├─ compressed_message_count = end_index
|
||||
│ └─ updated_at = now()
|
||||
│
|
||||
└─► Step 6: 记录日志
|
||||
└─ 摘要长度、压缩进度、耗时等
|
||||
```
|
||||
|
||||
**递归摘要示例**:
|
||||
|
||||
```
|
||||
第一轮压缩:
|
||||
旧摘要: 无
|
||||
新消息: 消息2-14(13条)
|
||||
生成: Summary_V1
|
||||
|
||||
保存: compressed_message_count = 14
|
||||
|
||||
第二轮压缩:
|
||||
旧摘要: Summary_V1
|
||||
新消息: 消息15-28(从14开始)
|
||||
生成: Summary_V2 = LLM(Summary_V1 + 新消息14-28)
|
||||
|
||||
保存: compressed_message_count = 28
|
||||
|
||||
结果:
|
||||
✓ 早期信息得以保留(通过 Summary_V1)
|
||||
✓ 新信息与旧摘要融合
|
||||
✓ 历史连贯性维护
|
||||
```
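递归摘要循环可以抽象成如下骨架(示意:`summarize` 在真实实现中是一次 LLM 调用,这里用字符串拼接占位,以便演示 `compressed_message_count` 的进度推进):

```python
def summarize(previous_summary, new_messages):
    # 占位:真实实现中由 LLM 将旧摘要与新消息合并
    base = previous_summary + " + " if previous_summary else ""
    return base + f"{len(new_messages)}条新消息的摘要"

def compress_round(messages, state, keep_first=1, keep_last=6):
    """执行一轮递归压缩;state 记录 summary 与 compressed_message_count。"""
    start = max(state["compressed_message_count"], keep_first)
    end = len(messages) - keep_last
    if end <= start:
        return state  # 没有新的可压缩消息
    state["summary"] = summarize(state["summary"], messages[start:end])
    state["compressed_message_count"] = end  # 下次压缩从这里继续
    return state

state = {"summary": "", "compressed_message_count": 0}
msgs = [f"m{i}" for i in range(1, 21)]       # 第一轮:20 条消息
state = compress_round(msgs, state)          # 压缩 m2..m14
msgs += [f"m{i}" for i in range(21, 31)]     # 对话继续到 30 条
state = compress_round(msgs, state)          # 只压缩新增的 m15..m24
print(state["compressed_message_count"])  # 24
```

两轮之后,早期信息仍通过第一轮摘要保留在最新摘要中,这正是上图强调的连贯性来源。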
|
||||
|
||||
---
|
||||
|
||||
## Token 计数机制
|
||||
|
||||
### 编码方案
|
||||
|
||||
```
|
||||
┌─────────────────────────────────┐
|
||||
│ _count_tokens(text) │
|
||||
├─────────────────────────────────┤
|
||||
│ 1. tiktoken 可用? │
|
||||
│ ├─ 是 ✓ │
|
||||
│ │ └─ use o200k_base │
|
||||
│ │ (最新模型适配) │
|
||||
│ │ │
|
||||
│ └─ 否 ✓ │
|
||||
│ └─ 字符估算 │
|
||||
│ (1 token ≈ 4 chars) │
|
||||
└─────────────────────────────────┘
|
||||
```
|
||||
|
||||
### 多模态内容处理
|
||||
|
||||
```python
|
||||
# 消息结构
|
||||
message = {
|
||||
"role": "user",
|
||||
"content": [
|
||||
{"type": "text", "text": "描述图片..."},
|
||||
{"type": "image_url", "image_url": {...}},
|
||||
{"type": "text", "text": "更多描述..."}
|
||||
]
|
||||
}
|
||||
|
||||
# Token 计数
|
||||
提取所有 text 部分 → 合并 → 计数
|
||||
图片部分被忽略(不消耗文本 token)
|
||||
```
|
||||
|
||||
### 计数流程
|
||||
|
||||
```
|
||||
_calculate_messages_tokens(messages, model)
|
||||
│
|
||||
├─► 遍历每条消息
|
||||
│ │
|
||||
│ ├─ content 是列表?
|
||||
│ │ ├─ 是 ✓ 提取所有文本部分
|
||||
│ │ └─ 否 ✓ 直接使用
|
||||
│ │
|
||||
│ └─ _count_tokens(content)
|
||||
│
|
||||
└─► 累加所有 Token 数
|
||||
```
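把多模态文本提取与 Token 计数合起来,整个计数流程大致如下(示意:为保证可独立运行,这里只演示无 tiktoken 时的字符估算分支):

```python
def count_tokens(text):
    # 回退估算:1 token ≈ 4 字符(有 tiktoken 时应改用 o200k_base 编码)
    return len(text) // 4

def message_text(content):
    """提取消息文本;列表形式(多模态)只保留 text 部分,忽略图片。"""
    if isinstance(content, list):
        return " ".join(
            p.get("text", "") for p in content
            if isinstance(p, dict) and p.get("type") == "text"
        )
    return content or ""

def calculate_messages_tokens(messages):
    return sum(count_tokens(message_text(m.get("content"))) for m in messages)

msgs = [
    {"role": "user", "content": "a" * 40},
    {"role": "user", "content": [
        {"type": "text", "text": "b" * 20},
        {"type": "image_url", "image_url": {"url": "..."}},
    ]},
]
print(calculate_messages_tokens(msgs))  # 15
```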
|
||||
|
||||
---
|
||||
|
||||
## 递归摘要机制
|
||||
|
||||
### 保证历史连贯性的核心原理
|
||||
|
||||
```
|
||||
传统压缩方式(有问题):
|
||||
时间线:
|
||||
消息1-50 ─► 生成摘要1 ─► 保留 [摘要1 + 消息45-50]
|
||||
│
|
||||
消息51-100 ─► 生成摘要2 ─► 保留 [摘要2 + 消息95-100]
|
||||
└─► ❌ 摘要1 丢失!早期信息无法追溯
|
||||
|
||||
递归摘要方式(本实现):
|
||||
时间线:
|
||||
消息1-50 ──► 生成摘要1 ──► 保存
|
||||
│
|
||||
摘要1 + 消息51-100 ──► 生成摘要2 ──► 保存
|
||||
└─► ✓ 摘要1 信息融入摘要2
|
||||
✓ 历史信息连贯保存
|
||||
```
|
||||
|
||||
### 工作机制
|
||||
|
||||
```
|
||||
inlet 阶段:
|
||||
摘要库查询
|
||||
│
|
||||
├─ previous_summary(已有摘要)
|
||||
└─ compressed_message_count(压缩进度)
|
||||
|
||||
outlet 阶段:
|
||||
如果 current_tokens >= threshold:
|
||||
│
|
||||
├─ 新消息范围:
|
||||
│ [compressed_message_count : len(messages) - keep_last]
|
||||
│
|
||||
└─ LLM 处理:
|
||||
Input: previous_summary + 新消息
|
||||
Output: 更新的摘要(含早期信息 + 新信息)
|
||||
|
||||
保存进度:
|
||||
└─ compressed_message_count = end_index
|
||||
(下次压缩从这里开始)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 配置指南
|
||||
|
||||
### 全局配置
|
||||
|
||||
```python
|
||||
Valves(
|
||||
# Token 阈值
|
||||
compression_threshold_tokens=64000, # 触发压缩
|
||||
max_context_tokens=128000, # 硬性上限
|
||||
|
||||
# 消息保留策略
|
||||
keep_first=1, # 保留首条(系统提示)
|
||||
keep_last=6, # 保留末6条(最近对话)
|
||||
|
||||
# 摘要模型
|
||||
summary_model="gemini-2.5-flash", # 快速经济
|
||||
|
||||
# 摘要参数
|
||||
max_summary_tokens=4000,
|
||||
summary_temperature=0.3,
|
||||
)
|
||||
```
|
||||
|
||||
### 模型特定配置
|
||||
|
||||
```python
|
||||
model_thresholds = {
|
||||
"gpt-4": {
|
||||
"compression_threshold_tokens": 8000,
|
||||
"max_context_tokens": 32000
|
||||
},
|
||||
"gemini-2.5-flash": {
|
||||
"compression_threshold_tokens": 10000,
|
||||
"max_context_tokens": 40000
|
||||
},
|
||||
"llama-70b": {
|
||||
"compression_threshold_tokens": 20000,
|
||||
"max_context_tokens": 80000
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 配置选择建议
|
||||
|
||||
```
|
||||
场景1:长对话成本优化
|
||||
compression_threshold_tokens: 32000 ◄─ 更早触发
|
||||
keep_last: 4 ◄─ 保留少一些
|
||||
|
||||
场景2:质量优先
|
||||
compression_threshold_tokens: 100000 ◄─ 晚触发
|
||||
keep_last: 10 ◄─ 保留多一些
|
||||
max_summary_tokens: 8000 ◄─ 更详细摘要
|
||||
|
||||
场景3:平衡方案(推荐)
|
||||
compression_threshold_tokens: 64000 ◄─ 默认
|
||||
keep_last: 6 ◄─ 默认
|
||||
summary_model: "gemini-2.5-flash" ◄─ 快速经济
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 最佳实践
|
||||
|
||||
### 1️⃣ 摘要模型选择
|
||||
|
||||
```
|
||||
推荐模型:
|
||||
✅ gemini-2.5-flash 快速、经济、质量好
|
||||
✅ deepseek-v3 成本低、速度快
|
||||
✅ gpt-4o-mini 通用、质量稳定
|
||||
|
||||
避免:
|
||||
❌ 流水线(Pipe)模型 可能不支持标准 API
|
||||
❌ 本地模型 容易超时、影响体验
|
||||
```
|
||||
|
||||
### 2️⃣ 阈值调优
|
||||
|
||||
```
|
||||
Token 计数验证:
|
||||
1. 启用 debug_mode
|
||||
2. 观察实际 Token 数
|
||||
3. 根据需要调整阈值
|
||||
|
||||
# 日志示例
|
||||
[🔍 后台计算] Token 数: 45320
|
||||
[🔍 后台计算] 未触发压缩阈值 (Token: 45320 < 64000)
|
||||
```
|
||||
|
||||
### 3️⃣ 消息保留策略
|
||||
|
||||
```
|
||||
keep_first 配置:
|
||||
通常值: 1(保留系统提示)
|
||||
某些场景: 0(系统提示在摘要中)
|
||||
|
||||
keep_last 配置:
|
||||
通常值: 6(保留最近对话)
|
||||
长对话: 8-10(更多最近对话)
|
||||
短对话: 3-4(节省 Token)
|
||||
```
|
||||
|
||||
### 4️⃣ 监控与维护
|
||||
|
||||
```
|
||||
关键指标:
|
||||
• 摘要生成耗时
|
||||
• Token 节省率
|
||||
• 摘要质量(通过对话体验)
|
||||
|
||||
数据库维护:
|
||||
# 定期清理过期摘要
|
||||
DELETE FROM chat_summary
|
||||
WHERE updated_at < NOW() - INTERVAL '30 days'
|
||||
|
||||
# 统计压缩效果
|
||||
SELECT
|
||||
COUNT(*) as total_summaries,
|
||||
AVG(compressed_message_count) as avg_compressed
|
||||
FROM chat_summary
|
||||
```
|
||||
|
||||
### 5️⃣ 故障排除
|
||||
|
||||
```
|
||||
问题:摘要未生成
|
||||
检查项:
|
||||
1. Token 数是否达到阈值?
|
||||
→ debug_mode 查看日志
|
||||
2. summary_model 是否配置正确?
|
||||
→ 确保模型存在且可用
|
||||
3. 数据库连接是否正常?
|
||||
→ 检查 DATABASE_URL
|
||||
|
||||
问题:inlet 响应变慢
|
||||
检查项:
|
||||
1. keep_first/keep_last 是否过大?
|
||||
2. 摘要数据是否过大?
|
||||
3. 消息数是否过多?
|
||||
|
||||
问题:摘要质量下降
|
||||
调整方案:
|
||||
1. 增加 max_summary_tokens
|
||||
2. 降低 summary_temperature(更确定性)
|
||||
3. 更换摘要模型
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 性能参考
|
||||
|
||||
### 时间开销
|
||||
|
||||
```
|
||||
inlet 阶段:
|
||||
├─ 数据库查询: 1-2ms
|
||||
├─ 摘要注入: 2-3ms
|
||||
└─ 总计: <10ms ✓ (不影响用户体验)
|
||||
|
||||
outlet 阶段:
|
||||
├─ 启动后台任务: <1ms
|
||||
└─ 立即返回: ✓ (无等待)
|
||||
|
||||
后台处理(不阻塞用户):
|
||||
├─ Token 计数: 10-50ms
|
||||
├─ LLM 调用: 1-5 秒
|
||||
├─ 数据库保存: 1-2ms
|
||||
└─ 总计: 1-6 秒 (后台进行)
|
||||
```
|
||||
|
||||
### Token 节省示例
|
||||
|
||||
```
|
||||
场景:20 条消息对话
|
||||
|
||||
未压缩:
|
||||
总消息: 20 条
|
||||
预估 Token: 8000 个
|
||||
|
||||
压缩后(keep_first=1, keep_last=6):
|
||||
头部消息: 1 条 (1600 Token)
|
||||
摘要: ~800 Token (嵌入在头部)
|
||||
尾部消息: 6 条 (3200 Token)
|
||||
总计: 7 条有效输入 (~5600 Token)
|
||||
|
||||
节省:8000 - 5600 = 2400 Token (30% 节省)
|
||||
|
||||
随对话变长,节省比例可达 65% 以上
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 数据流图
|
||||
|
||||
```
|
||||
用户消息
|
||||
↓
|
||||
[inlet] 摘要注入器
|
||||
├─ 数据库 ← 查询摘要
|
||||
├─ 摘要注入到首条消息
|
||||
└─ 返回压缩消息列表
|
||||
↓
|
||||
LLM 处理
|
||||
├─ 调用语言模型
|
||||
├─ 生成响应
|
||||
└─ 返回给用户 ✓✓✓
|
||||
↓
|
||||
[outlet] 后台处理(asyncio 任务)
|
||||
├─ 计算 Token 数
|
||||
├─ 检查阈值
|
||||
├─ [if 需要] 调用 LLM 生成摘要
|
||||
│ ├─ 加载旧摘要
|
||||
│ ├─ 提取新消息
|
||||
│ ├─ 构建提示词
|
||||
│ └─ 调用 LLM
|
||||
├─ 保存新摘要到数据库
|
||||
└─ 记录日志
|
||||
↓
|
||||
数据库持久化
|
||||
└─ chat_summary 表更新
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 总结
|
||||
|
||||
| 阶段 | 职责 | 耗时 | 特点 |
|
||||
|------|------|------|------|
|
||||
| **inlet** | 摘要注入 | <10ms | 快速、无计算 |
|
||||
| **LLM** | 生成回复 | 变量 | 正常流程 |
|
||||
| **outlet** | 启动后台 | <1ms | 不阻塞响应 |
|
||||
| **后台处理** | Token 计算、摘要生成、数据保存 | 1-6s | 异步执行 |
|
||||
|
||||
**核心优势**:
|
||||
- ✅ 用户响应不受影响
|
||||
- ✅ Token 消耗显著降低
|
||||
- ✅ 历史信息连贯保存
|
||||
- ✅ 灵活的配置选项
|
||||
1100
plugins/filters/async-context-compression/异步上下文压缩.py
Normal file
File diff suppressed because it is too large
45
plugins/filters/async-context-compression/异步上下文压缩优化.md
Normal file
@@ -0,0 +1,45 @@
|
||||
需求文档:异步上下文压缩插件优化 (Async Context Compression Optimization)
|
||||
1. 核心目标
将现有的基于消息数量的压缩逻辑升级为基于 Token 数量的压缩逻辑,并引入递归摘要机制,以更精准地控制上下文窗口,提高摘要质量,并防止历史信息丢失。
|
||||
|
||||
2. 功能需求
|
||||
|
||||
Token 计数与阈值控制
|
||||
引入 tiktoken: 使用 tiktoken 库进行精确的 Token 计数。如果环境不支持,则回退到字符估算 (1 token ≈ 4 chars)。
|
||||
新配置参数 (Valves):
|
||||
compression_threshold_tokens (默认: 64000): 当上下文总 Token 数超过此值时,触发压缩(生成摘要)。
|
||||
max_context_tokens (默认: 128000): 上下文的硬性上限。如果超过此值,强制移除最早的消息(保留受保护消息除外)。
|
||||
model_thresholds (字典): 支持针对不同模型 ID 配置不同的阈值。例如:{'gpt-4': {'compression_threshold_tokens': 8000, ...}}。
|
||||
废弃旧参数: compression_threshold (基于消息数) 将被标记为废弃,优先使用 Token 阈值。
|
||||
递归摘要 (Recursive Summarization)
|
||||
机制: 在生成新摘要时,必须读取并包含上一次的摘要。
|
||||
逻辑: 新摘要 = LLM(上一次摘要 + 新产生的对话消息)。
|
||||
目的: 防止随着对话进行,最早期的摘要信息被丢弃,确保长期记忆的连续性。
|
||||
消息保护与修剪策略
|
||||
保护机制: keep_first (保留头部 N 条) 和 keep_last (保留尾部 N 条) 的消息绝对不参与压缩,也不被移除。
|
||||
修剪逻辑: 当触发 max_context_tokens 限制时,优先移除 keep_first 之后、keep_last 之前的最早消息。
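该修剪策略的示意实现如下(假设 count 为单条消息的 Token 计数函数;保护区之外最早的消息先被移除):

```python
def trim_to_limit(messages, count, max_tokens, keep_first=1, keep_last=6):
    """超出 max_tokens 时,从 keep_first 之后开始移除最早的可修剪消息。"""
    msgs = list(messages)
    while sum(count(m) for m in msgs) > max_tokens and len(msgs) > keep_first + keep_last:
        del msgs[keep_first]  # 移除保护区之外最早的一条
    return msgs

msgs = [{"content": "x" * 100} for _ in range(10)]  # 每条约 25 token
trimmed = trim_to_limit(msgs, lambda m: len(m["content"]) // 4,
                        max_tokens=150, keep_first=1, keep_last=2)
print(len(trimmed))  # 6
```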
|
||||
优化的提示词 (Prompt Engineering)
|
||||
目标: 去除无用信息(寒暄、重复),保留关键信号(事实、代码、决策)。
|
||||
指令:
|
||||
提炼与净化: 明确要求移除噪音。
|
||||
关键保留: 强调代码片段必须逐字保留。
|
||||
合并与更新: 明确指示将新信息合并到旧摘要中。
|
||||
语言一致性: 输出语言必须与对话语言保持一致。
|
||||
3. 实现细节
|
||||
|
||||
文件: async_context_compression.py
类: Filter
|
||||
关键方法:
|
||||
_count_tokens(text): 实现 Token 计数。
|
||||
_calculate_messages_tokens(messages): 计算消息列表总 Token。
|
||||
_generate_summary_async(...): 修改为加载旧摘要,并传入 LLM。
_call_summary_llm(...): 更新 Prompt,接受 previous_summary 和 new_messages。
inlet(...):
使用 compression_threshold_tokens 判断是否注入摘要。
实现 max_context_tokens 的强制修剪逻辑。
outlet(...): 使用 compression_threshold_tokens 判断是否触发后台摘要任务。
|
||||
@@ -0,0 +1,572 @@
|
||||
"""
|
||||
title: Context & Model Enhancement Filter
|
||||
author: Fu-Jie
|
||||
author_url: https://github.com/Fu-Jie
|
||||
funding_url: https://github.com/Fu-Jie/awesome-openwebui
|
||||
version: 0.2
|
||||
|
||||
description:
|
||||
一个功能全面的 Filter 插件,用于增强请求上下文和优化模型功能。提供四大核心功能:
|
||||
|
||||
1. 环境变量注入:在每条用户消息前自动注入用户环境变量(用户名、时间、时区、语言等)
|
||||
- 支持纯文本、图片、多模态消息
|
||||
- 幂等性设计,避免重复注入
|
||||
- 注入成功时发送前端状态提示
|
||||
|
||||
2. Web Search 功能改进:为特定模型优化 Web 搜索功能
|
||||
- 为阿里云通义千问系列、DeepSeek、Gemini 等模型添加搜索能力
|
||||
- 自动识别模型并追加 "-search" 后缀
|
||||
- 管理功能开关,防止冲突
|
||||
- 启用时发送搜索能力状态提示
|
||||
|
||||
3. 模型适配与上下文注入:为特定模型注入 chat_id 等上下文信息
|
||||
- 支持 cfchatqwen、webgemini 等模型的特殊处理
|
||||
- 动态模型重定向
|
||||
- 智能化的模型识别和适配
|
||||
|
||||
4. 智能内容规范化:生产级的内容清洗与修复系统
|
||||
- 智能修复损坏的代码块(前缀、后缀、缩进)
|
||||
- 规范化 LaTeX 公式格式(行内/块级)
|
||||
- 优化思维链标签(</thought>)格式
|
||||
- 自动闭合未结束的代码块
|
||||
- 智能列表格式修复
|
||||
- 清理冗余的 XML 标签
|
||||
- 可配置的规则系统
|
||||
|
||||
features:
|
||||
- 自动化环境变量管理
|
||||
- 智能模型功能适配
|
||||
- 异步状态反馈
|
||||
- 幂等性保证
|
||||
- 多模型支持
|
||||
- 智能内容清洗与规范化
|
||||
"""
|
||||
|
||||
from pydantic import BaseModel, Field
|
||||
from typing import Optional, List, Callable
|
||||
import re
|
||||
import logging
|
||||
from dataclasses import dataclass, field
|
||||
|
||||
|
||||
# 配置日志
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
@dataclass
|
||||
class NormalizerConfig:
|
||||
"""规范化配置类,用于动态启用/禁用特定规则"""
|
||||
enable_escape_fix: bool = True # 修复转义字符
|
||||
enable_thought_tag_fix: bool = True # 修复思考链标签
|
||||
enable_code_block_fix: bool = True # 修复代码块格式
|
||||
enable_latex_fix: bool = True # 修复 LaTeX 公式格式
|
||||
enable_list_fix: bool = False # 修复列表换行
|
||||
enable_unclosed_block_fix: bool = True # 修复未闭合代码块
|
||||
    enable_fullwidth_symbol_fix: bool = False  # Convert full-width symbols inside code blocks
    enable_xml_tag_cleanup: bool = True  # Clean up leftover XML tags

    # Custom cleaner functions (for advanced extensions)
    custom_cleaners: List[Callable[[str], str]] = field(default_factory=list)


class ContentNormalizer:
    """LLM output content normalizer: production-grade implementation."""

    # --- 1. Pre-compiled regular expressions (performance optimization) ---
    _PATTERNS = {
        # Code-block prefix: a ``` that is neither at the start of input nor preceded by a newline
        'code_block_prefix': re.compile(r'(?<!^)(?<!\n)(```)', re.MULTILINE),

        # Code-block suffix: ```lang followed by whitespace and a non-newline character.
        # Matches "```python code" but not "```python" or "```python\n".
        'code_block_suffix': re.compile(r'(```[\w\+\-\.]*)[ \t]+([^\n\r])'),

        # Code-block indentation: leading whitespace before ```
        'code_block_indent': re.compile(r'^[ \t]+(```)', re.MULTILINE),

        # Chain-of-thought tag: </thought> optionally followed by spaces or newlines
        'thought_tag': re.compile(r'</thought>[ \t]*\n*'),

        # LaTeX display formula: \[ ... \]
        'latex_bracket_block': re.compile(r'\\\[(.+?)\\\]', re.DOTALL),
        # LaTeX inline formula: \( ... \)
        'latex_paren_inline': re.compile(r'\\\((.+?)\\\)'),

        # List item: a non-newline character directly before "N. " (e.g. "Text1. Item")
        'list_item': re.compile(r'([^\n])(\d+\. )'),

        # Leftover XML tags (e.g. Claude artifacts)
        'xml_artifacts': re.compile(r'</?(?:antArtifact|antThinking|artifact)[^>]*>', re.IGNORECASE),
    }

    def __init__(self, config: Optional[NormalizerConfig] = None):
        self.config = config or NormalizerConfig()
        self.applied_fixes = []

    def normalize(self, content: str) -> str:
        """Main entry point: apply all normalization rules in order."""
        self.applied_fixes = []
        if not content:
            return content

        try:
            # 1. Escape-character fix (must run first, or it interferes with later regexes)
            if self.config.enable_escape_fix:
                original = content
                content = self._fix_escape_characters(content)
                if content != original:
                    self.applied_fixes.append("修复转义字符")

            # 2. Chain-of-thought tag normalization
            if self.config.enable_thought_tag_fix:
                original = content
                content = self._fix_thought_tags(content)
                if content != original:
                    self.applied_fixes.append("规范化思考链")

            # 3. Code-block format fixes
            if self.config.enable_code_block_fix:
                original = content
                content = self._fix_code_blocks(content)
                if content != original:
                    self.applied_fixes.append("修复代码块格式")

            # 4. LaTeX formula normalization
            if self.config.enable_latex_fix:
                original = content
                content = self._fix_latex_formulas(content)
                if content != original:
                    self.applied_fixes.append("规范化 LaTeX 公式")

            # 5. List format fixes
            if self.config.enable_list_fix:
                original = content
                content = self._fix_list_formatting(content)
                if content != original:
                    self.applied_fixes.append("修复列表格式")

            # 6. Detect and close unclosed code blocks
            if self.config.enable_unclosed_block_fix:
                original = content
                content = self._fix_unclosed_code_blocks(content)
                if content != original:
                    self.applied_fixes.append("闭合未结束代码块")

            # 7. Full-width to half-width symbols (inside code blocks only)
            if self.config.enable_fullwidth_symbol_fix:
                original = content
                content = self._fix_fullwidth_symbols_in_code(content)
                if content != original:
                    self.applied_fixes.append("全角符号转半角")

            # 8. Leftover XML tag cleanup
            if self.config.enable_xml_tag_cleanup:
                original = content
                content = self._cleanup_xml_tags(content)
                if content != original:
                    self.applied_fixes.append("清理 XML 标签")

            # 9. Run custom cleaner functions
            for cleaner in self.config.custom_cleaners:
                original = content
                content = cleaner(content)
                if content != original:
                    self.applied_fixes.append("执行自定义清理")

            return content

        except Exception as e:
            # Production safety net: if cleaning fails, return the original content
            # instead of blocking the service.
            logger.error(f"内容规范化失败: {e}", exc_info=True)
            return content

    def _fix_escape_characters(self, content: str) -> str:
        """Fix over-escaped characters."""
        # Note: handle the specific escape sequences first, then the generic double backslash.
        content = content.replace("\\r\\n", "\n")
        content = content.replace("\\n", "\n")
        content = content.replace("\\t", "\t")
        # Fix over-escaped backslashes (e.g. the path C:\\Users)
        content = content.replace("\\\\", "\\")
        return content

    def _fix_thought_tags(self, content: str) -> str:
        """Normalize </thought> tags so each is followed by exactly one blank line."""
        return self._PATTERNS['thought_tag'].sub("</thought>\n\n", content)

    def _fix_code_blocks(self, content: str) -> str:
        """Fix code-block formatting (own line, trailing newline, de-indentation)."""
        # C: remove indentation before code fences (must run first, or it affects the checks below)
        content = self._PATTERNS['code_block_indent'].sub(r"\1", content)
        # A: ensure there is a newline before ```
        content = self._PATTERNS['code_block_prefix'].sub(r"\n\1", content)
        # B: ensure there is a newline after the ```lang marker
        content = self._PATTERNS['code_block_suffix'].sub(r"\1\n\2", content)
        return content

    def _fix_latex_formulas(self, content: str) -> str:
        r"""Normalize LaTeX formulas: \[ -> $$ (display), \( -> $ (inline)."""
        content = self._PATTERNS['latex_bracket_block'].sub(r"$$\1$$", content)
        content = self._PATTERNS['latex_paren_inline'].sub(r"$\1$", content)
        return content

    def _fix_list_formatting(self, content: str) -> str:
        """Fix list items missing a leading newline (e.g. 'text1. item' -> 'text\\n1. item')."""
        return self._PATTERNS['list_item'].sub(r"\1\n\2", content)

    def _fix_unclosed_code_blocks(self, content: str) -> str:
        """Detect and close unclosed code blocks."""
        if content.count("```") % 2 != 0:
            logger.warning("检测到未闭合的代码块,自动补全")
            content += "\n```"
        return content

    def _fix_fullwidth_symbols_in_code(self, content: str) -> str:
        """Convert full-width symbols to half-width inside code blocks only (surgical fix)."""
        # Mapping of commonly misused full-width symbols
        FULLWIDTH_MAP = {
            ',': ',', '。': '.', '(': '(', ')': ')',
            '【': '[', '】': ']', ';': ';', ':': ':',
            '?': '?', '!': '!', '"': '"', '"': '"',
            ''': "'", ''': "'",
        }

        parts = content.split("```")
        # Code-block contents sit at indices 1, 3, 5... (the odd positions)
        for i in range(1, len(parts), 2):
            for full, half in FULLWIDTH_MAP.items():
                parts[i] = parts[i].replace(full, half)

        return "```".join(parts)

    def _cleanup_xml_tags(self, content: str) -> str:
        """Remove extraneous XML tags."""
        return self._PATTERNS['xml_artifacts'].sub("", content)
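
The split-on-``` trick used by `_fix_fullwidth_symbols_in_code` (and implicitly by the fence-parity check) can be sketched standalone: after `str.split("```")`, the segments at odd indices are exactly the fenced contents. A minimal, self-contained illustration; the function name and the reduced symbol map are hypothetical, not part of the filter:

```python
def halfwidth_in_code(text: str) -> str:
    # Reduced, illustrative symbol map (the real filter uses a larger one).
    mapping = {',': ',', '(': '(', ')': ')'}
    parts = text.split("```")
    # Odd-indexed segments are the fenced (code) contents; touch only those.
    for i in range(1, len(parts), 2):
        for full, half in mapping.items():
            parts[i] = parts[i].replace(full, half)
    return "```".join(parts)
```

Prose outside the fences is left untouched; only symbols inside fenced segments are converted.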


class Filter:
    class Valves(BaseModel):
        priority: int = Field(
            default=0, description="Priority level for the filter operations."
        )

    def __init__(self):
        # Indicates custom file handling logic. This flag helps disengage default routines
        # in favor of custom implementations, informing the WebUI to defer file-related
        # operations to designated methods within this class.
        # Alternatively, you can remove the files directly from the body in the inlet hook.
        # self.file_handler = True

        # Initialize 'valves' with specific configurations. Using a 'Valves' instance helps
        # encapsulate settings, ensuring they are managed cohesively and not confused with
        # operational flags like 'file_handler'.
        self.valves = self.Valves()

    def inlet(
        self,
        body: dict,
        __user__: Optional[dict] = None,
        __metadata__: Optional[dict] = None,
        __model__: Optional[dict] = None,
        __event_emitter__=None,
    ) -> dict:
        # Modify or validate the request body before it is processed by the chat
        # completion API. This is the pre-processor for the API, where checks on the
        # input can be performed and the request can be modified before it is sent on.
        messages = body.get("messages", [])
        self.insert_user_env_info(__metadata__, messages, __event_emitter__)
        # if "测试系统提示词" in str(messages):
        #     messages.insert(0, {"role": "system", "content": "你是一个大数学家"})
        self.change_web_search(body, __user__, __event_emitter__)
        body = self.inlet_chat_id(__model__, __metadata__, body)

        return body

    def inlet_chat_id(self, model: dict, metadata: dict, body: dict):
        if "openai" in model:
            base_model_id = model["openai"]["id"]
        else:
            base_model_id = model.get("info", {}).get("base_model_id")

        base_model = model["id"] if base_model_id is None else base_model_id
        if base_model.startswith("cfchatqwen"):
            body["chat_id"] = metadata["chat_id"]

        if base_model.startswith("webgemini"):
            body["chat_id"] = metadata["chat_id"]
            if not model["id"].startswith("webgemini"):
                body["custom_model_id"] = model["id"]

        return body

    def change_web_search(self, body, __user__, __event_emitter__=None):
        """
        Optimize web search for specific models.

        Behavior:
        - Detects whether web search is enabled
        - Enables the model's built-in search capability for models that support it
        - Disables the default web_search toggle to avoid conflicts
        - Emits a status notice when the model's built-in search capability is used

        Args:
            body: request body dict
            __user__: user information
            __event_emitter__: emitter function for sending frontend events
        """
        features = body.get("features", {})
        web_search_enabled = (
            features.get("web_search", False) if isinstance(features, dict) else False
        )
        if isinstance(__user__, (list, tuple)):
            user_email = __user__[0].get("email", "用户") if __user__[0] else "用户"
        elif isinstance(__user__, dict):
            user_email = __user__.get("email", "用户")
        else:
            user_email = "用户"
        model_name = body.get("model") or ""

        search_enabled_for_model = False
        if web_search_enabled:
            if model_name in ["qwen-max-latest", "qwen-max", "qwen-plus-latest"]:
                body.setdefault("enable_search", True)
                features["web_search"] = False
                search_enabled_for_model = True
            if "search" in model_name or "搜索" in model_name:
                features["web_search"] = False
            if model_name.startswith("cfdeepseek-deepseek") and not model_name.endswith(
                "search"
            ):
                body["model"] = body["model"] + "-search"
                features["web_search"] = False
                search_enabled_for_model = True
            if model_name.startswith("cfchatqwen") and not model_name.endswith(
                "search"
            ):
                body["model"] = body["model"] + "-search"
                features["web_search"] = False
                search_enabled_for_model = True
            if model_name.startswith("gemini-2.5") and "search" not in model_name:
                body["model"] = body["model"] + "-search"
                features["web_search"] = False
                search_enabled_for_model = True
            if user_email == "yi204o@qq.com":
                features["web_search"] = False

        # If the model's built-in search capability was enabled, emit a status notice
        if search_enabled_for_model and __event_emitter__:
            import asyncio

            try:
                asyncio.create_task(
                    self._emit_search_status(__event_emitter__, model_name)
                )
            except RuntimeError:
                pass

    def insert_user_env_info(
        self, __metadata__, messages, __event_emitter__=None, model_match_tags=None
    ):
        """
        Inject environment-variable information into the first user message.

        Features:
        - Always prepends a Markdown description of the environment variables to the
          user message content
        - Supports multiple message types: plain text, images, and mixed text/image
        - Idempotent: if the environment info already exists, it is updated with the
          latest data rather than added again
        - After a successful injection, emits an "injected" status notice to the
          frontend via the event emitter

        Args:
            __metadata__: metadata dict containing the environment variables
            messages: message list
            __event_emitter__: emitter function for sending frontend events
            model_match_tags: model matching tags (reserved; currently unused)
        """
        variables = __metadata__.get("variables", {})
        if not messages or messages[0]["role"] != "user":
            return

        env_injected = False
        if variables:
            # Build the Markdown text for the environment variables
            variable_markdown = (
                "## 用户环境变量\n"
                "以下信息为用户的环境变量,可用于为用户提供更个性化的服务或满足特定需求时作为参考:\n"
                f"- **用户姓名**:{variables.get('{{USER_NAME}}', '')}\n"
                f"- **当前日期时间**:{variables.get('{{CURRENT_DATETIME}}', '')}\n"
                f"- **当前星期**:{variables.get('{{CURRENT_WEEKDAY}}', '')}\n"
                f"- **当前时区**:{variables.get('{{CURRENT_TIMEZONE}}', '')}\n"
                f"- **用户语言**:{variables.get('{{USER_LANGUAGE}}', '')}\n"
            )

            content = messages[0]["content"]
            # Pattern matching the previously injected environment-variable section
            env_var_pattern = r"(## 用户环境变量\n以下信息为用户的环境变量,可用于为用户提供更个性化的服务或满足特定需求时作为参考:\n.*?用户语言.*?\n)"
            # Handle the different content types
            if isinstance(content, list):  # multimodal content (may mix images and text)
                # Find the first text-type part
                text_index = -1
                for i, part in enumerate(content):
                    if isinstance(part, dict) and part.get("type") == "text":
                        text_index = i
                        break

                if text_index >= 0:
                    # Text content exists; check whether the env info is already present
                    text_part = content[text_index]
                    text_content = text_part.get("text", "")

                    if re.search(env_var_pattern, text_content, flags=re.DOTALL):
                        # Already present: update with the latest data
                        text_part["text"] = re.sub(
                            env_var_pattern,
                            variable_markdown,
                            text_content,
                            flags=re.DOTALL,
                        )
                    else:
                        # Not present: prepend it
                        text_part["text"] = f"{variable_markdown}\n{text_content}"

                    content[text_index] = text_part
                else:
                    # No text content (e.g. images only): add a new text part
                    content.insert(
                        0, {"type": "text", "text": f"{variable_markdown}\n"}
                    )

                messages[0]["content"] = content
                env_injected = True

            elif isinstance(content, str):  # plain text content
                # Check whether the env info is already present
                if re.search(env_var_pattern, content, flags=re.DOTALL):
                    # Already present: update with the latest data
                    messages[0]["content"] = re.sub(
                        env_var_pattern, variable_markdown, content, flags=re.DOTALL
                    )
                else:
                    # Not present: prepend it
                    messages[0]["content"] = f"{variable_markdown}\n{content}"
                env_injected = True

            else:  # any other content type
                # Convert to string and handle it the same way
                str_content = str(content)
                # Check whether the env info is already present
                if re.search(env_var_pattern, str_content, flags=re.DOTALL):
                    # Already present: update with the latest data
                    messages[0]["content"] = re.sub(
                        env_var_pattern, variable_markdown, str_content, flags=re.DOTALL
                    )
                else:
                    # Not present: prepend it
                    messages[0]["content"] = f"{variable_markdown}\n{str_content}"
                env_injected = True

        # After a successful injection, send a status notice to the user
        if env_injected and __event_emitter__:
            import asyncio

            try:
                # Inside a running event loop, schedule the coroutine as a task
                asyncio.create_task(self._emit_env_status(__event_emitter__))
            except RuntimeError:
                # Not inside an event loop; skip the notice
                pass

    async def _emit_env_status(self, __event_emitter__):
        """
        Send a status notice to the frontend that the environment variables were injected.
        """
        try:
            await __event_emitter__(
                {
                    "type": "status",
                    "data": {
                        "description": "✓ 用户环境变量已注入成功",
                        "done": True,
                    },
                }
            )
        except Exception as e:
            print(f"发送状态提示时出错: {e}")

    async def _emit_search_status(self, __event_emitter__, model_name):
        """
        Send a status notice to the frontend that the model's search capability was enabled.
        """
        try:
            await __event_emitter__(
                {
                    "type": "status",
                    "data": {
                        "description": f"🔍 已为 {model_name} 启用搜索能力",
                        "done": True,
                    },
                }
            )
        except Exception as e:
            print(f"发送搜索状态提示时出错: {e}")

    async def _emit_normalization_status(self, __event_emitter__, applied_fixes: List[str] = None):
        """
        Send a status notice that content normalization has completed.
        """
        description = "✓ 内容已自动规范化"
        if applied_fixes:
            description += f":{', '.join(applied_fixes)}"

        try:
            await __event_emitter__(
                {
                    "type": "status",
                    "data": {
                        "description": description,
                        "done": True,
                    },
                }
            )
        except Exception as e:
            print(f"发送规范化状态提示时出错: {e}")

    def _contains_html(self, content: str) -> bool:
        """
        Check whether the content contains HTML tags.
        """
        # Match common HTML tags
        pattern = r"<\s*/?\s*(?:html|head|body|div|span|p|br|hr|ul|ol|li|table|thead|tbody|tfoot|tr|td|th|img|a|b|i|strong|em|code|pre|blockquote|h[1-6]|script|style|form|input|button|label|select|option|iframe|link|meta|title)\b"
        return bool(re.search(pattern, content, re.IGNORECASE))

    def outlet(self, body: dict, __user__: Optional[dict] = None, __event_emitter__=None) -> dict:
        """
        Process the outgoing response body by modifying the content of the last
        assistant message, using ContentNormalizer for full content normalization.
        """
        if "messages" in body and body["messages"]:
            last = body["messages"][-1]
            content = last.get("content", "") or ""

            if last.get("role") == "assistant" and isinstance(content, str):
                # Skip normalization for HTML content to avoid mangling its formatting
                if self._contains_html(content):
                    return body

                # Initialize the normalizer
                normalizer = ContentNormalizer()

                # Run normalization
                new_content = normalizer.normalize(content)

                # Update the content
                if new_content != content:
                    last["content"] = new_content
                    # If the content changed, send a status notice
                    if __event_emitter__:
                        import asyncio

                        try:
                            # Pass along applied_fixes
                            asyncio.create_task(
                                self._emit_normalization_status(
                                    __event_emitter__, normalizer.applied_fixes
                                )
                            )
                        except RuntimeError:
                            # Not inside an event loop; ignore
                            pass

        return body
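
The idempotent "update-or-prepend" pattern that `insert_user_env_info` applies to the env-variable block reduces to a few lines. A self-contained sketch under stated assumptions; `upsert_block` is a hypothetical name, not a helper in the filter:

```python
import re


def upsert_block(text: str, block: str, pattern: str) -> str:
    # Replace an existing match of `pattern` with `block`; otherwise prepend `block`.
    # Calling this repeatedly with fresh data keeps exactly one copy of the block.
    if re.search(pattern, text, flags=re.DOTALL):
        return re.sub(pattern, block, text, flags=re.DOTALL)
    return f"{block}\n{text}"
```

Note that `re.sub` treats backslashes in the replacement specially, which is safe here only because the injected block contains none.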
File diff suppressed because it is too large

212 plugins/filters/multi_model_context_merger.py Normal file
@@ -0,0 +1,212 @@
import asyncio
from typing import List, Optional, Dict
from pydantic import BaseModel, Field
from fastapi import Request

from open_webui.models.chats import Chats


class Filter:
    class Valves(BaseModel):
        # Prefix for the injected system message
        CONTEXT_PREFIX: str = Field(
            default="下面是多个匿名AI模型给出的回答,使用<response>标签包裹:\n\n",
            description="Prefix for the injected system message containing the raw merged context.",
        )

    def __init__(self):
        self.valves = self.Valves()
        self.toggle = True
        self.type = "filter"
        self.name = "合并回答"
        self.description = "在用户提问时,自动注入之前多个模型回答的上下文。"

    async def inlet(
        self,
        body: Dict,
        __user__: Dict,
        __metadata__: Dict,
        __request__: Request,
        __event_emitter__,
    ):
        """
        Entry point of the filter. It checks whether the previous turn was a
        multi-model response and, if so, formats those responses directly and
        injects the formatted context into the current request as a system message.
        """
        print(f"*********** Filter '{self.name}' triggered ***********")
        chat_id = __metadata__.get("chat_id")
        if not chat_id:
            print(
                f"DEBUG: Filter '{self.name}' skipped: chat_id not found in metadata."
            )
            return body

        print(f"DEBUG: Chat ID found: {chat_id}")

        # 1. Fetch the full chat history from the database
        try:
            chat = await asyncio.to_thread(Chats.get_chat_by_id, chat_id)

            if (
                not chat
                or not hasattr(chat, "chat")
                or not chat.chat.get("history")
                or not chat.chat.get("history").get("messages")
            ):
                print(
                    f"DEBUG: Filter '{self.name}' skipped: Chat history not found or empty for chat_id: {chat_id}"
                )
                return body

            messages_map = chat.chat["history"]["messages"]
            print(
                f"DEBUG: Successfully loaded {len(messages_map)} messages from history."
            )

            # Count the number of user messages in the history
            user_message_count = sum(
                1 for msg in messages_map.values() if msg.get("role") == "user"
            )

            # If there are fewer than 2 user messages, there's no previous turn to merge.
            if user_message_count < 2:
                print(
                    f"DEBUG: Filter '{self.name}' skipped: Not enough user messages in history to have a previous turn (found {user_message_count}, required >= 2)."
                )
                return body

        except Exception as e:
            print(
                f"ERROR: Filter '{self.name}' failed to get chat history from DB: {e}"
            )
            return body

        # This filter rebuilds the entire chat history to consolidate all multi-response turns.

        # 1. Get all messages from history and sort by timestamp
        all_messages = list(messages_map.values())
        all_messages.sort(key=lambda x: x.get("timestamp", 0))

        # 2. Pre-group all assistant messages by their parentId for efficient lookup
        assistant_groups = {}
        for msg in all_messages:
            if msg.get("role") == "assistant":
                parent_id = msg.get("parentId")
                if parent_id:
                    if parent_id not in assistant_groups:
                        assistant_groups[parent_id] = []
                    assistant_groups[parent_id].append(msg)

        final_messages = []
        processed_parent_ids = set()

        # 3. Iterate through the sorted historical messages to build the final, clean list
        for msg in all_messages:
            msg_id = msg.get("id")
            role = msg.get("role")
            parent_id = msg.get("parentId")

            if role == "user":
                # Add user messages directly
                final_messages.append(msg)

            elif role == "assistant":
                # If this assistant's parent group has already been processed, skip it
                if parent_id in processed_parent_ids:
                    continue

                # Process the group of siblings for this parent_id
                if parent_id in assistant_groups:
                    siblings = assistant_groups[parent_id]

                    # Only perform a merge if there are multiple siblings
                    if len(siblings) > 1:
                        print(
                            f"DEBUG: Found a group of {len(siblings)} siblings for parent_id {parent_id}. Merging..."
                        )

                        # --- MERGE LOGIC ---
                        merged_content = None
                        merged_message_id = None
                        # Sort siblings by timestamp before processing
                        siblings.sort(key=lambda s: s.get("timestamp", 0))
                        merged_message_timestamp = siblings[0].get("timestamp", 0)

                        # Case A: Check for system pre-merged content (merged.status: true and content not empty)
                        merged_content_msg = next(
                            (
                                s
                                for s in siblings
                                if s.get("merged", {}).get("status")
                                and s.get("merged", {}).get("content")
                            ),
                            None,
                        )

                        if merged_content_msg:
                            merged_content = merged_content_msg["merged"]["content"]
                            merged_message_id = merged_content_msg["id"]
                            merged_message_timestamp = merged_content_msg.get(
                                "timestamp", merged_message_timestamp
                            )
                            print(
                                f"DEBUG: Using pre-merged content from message ID: {merged_message_id}"
                            )
                        else:
                            # Case B: Manually merge content
                            combined_content = []
                            first_sibling_id = None
                            counter = 0

                            for s in siblings:
                                if not first_sibling_id:
                                    first_sibling_id = s["id"]

                                content = s.get("content", "")
                                if (
                                    content
                                    and content
                                    != "The requested model is not supported."
                                ):
                                    response_id = chr(ord("a") + counter)
                                    combined_content.append(
                                        f'<response id="{response_id}">\n{content}\n</response>'
                                    )
                                    counter += 1

                            if combined_content:
                                merged_content = "\n\n".join(combined_content)
                                merged_message_id = first_sibling_id or parent_id

                        if merged_content:
                            merged_message = {
                                "id": merged_message_id,
                                "parentId": parent_id,
                                "role": "assistant",
                                "content": f"{self.valves.CONTEXT_PREFIX}{merged_content}",
                                "timestamp": merged_message_timestamp,
                            }
                            final_messages.append(merged_message)
                    else:
                        # If there's only one sibling, add it directly
                        final_messages.append(siblings[0])

                    # Mark this group as processed
                    processed_parent_ids.add(parent_id)

        # 4. The new user message from the current request is not in the historical
        # messages_map, so append it to the newly constructed message list.
        if body.get("messages"):
            new_user_message_from_body = body["messages"][-1]
            # Ensure we don't add a historical message that might be in the body for context
            if new_user_message_from_body.get("id") not in messages_map:
                final_messages.append(new_user_message_from_body)

        # 5. Replace the original message list with the new, cleaned-up list
        body["messages"] = final_messages
        print(
            f"DEBUG: Rebuilt message history with {len(final_messages)} messages, consolidating all multi-response turns."
        )

        print(f"*********** Filter '{self.name}' finished successfully ***********")
        return body
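
The sibling-merge format used in "Case B" above can be shown in isolation: each non-empty answer is wrapped in a `<response>` tag whose `id` is a sequential letter. A self-contained sketch; `merge_siblings` is a hypothetical standalone helper, not part of the Filter:

```python
def merge_siblings(siblings):
    # Wrap each non-empty sibling answer (sorted by timestamp) in a
    # <response id="a">...</response> tag; ids advance only for included answers.
    combined = []
    counter = 0
    for s in sorted(siblings, key=lambda s: s.get("timestamp", 0)):
        content = s.get("content", "")
        if content:
            combined.append(
                f'<response id="{chr(ord("a") + counter)}">\n{content}\n</response>'
            )
            counter += 1
    return "\n\n".join(combined)
```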