docs(async-context-compression): add community post drafts
This commit is contained in:
270
plugins/filters/async-context-compression/community_post.md
Normal file
270
plugins/filters/async-context-compression/community_post.md
Normal file
@@ -0,0 +1,270 @@
|
|||||||
|
[](https://openwebui.com/posts/async_context_compression_b1655bc8)
|
||||||
|
|
||||||
|
# Async Context Compression: A Production-Scale Working-Memory Filter for OpenWebUI
|
||||||
|
|
||||||
|
Long chats do not just get expensive. They also get fragile.
|
||||||
|
|
||||||
|
Once a conversation grows large enough, you usually have to choose between two bad options:
|
||||||
|
|
||||||
|
- keep the full history and pay a heavy context cost
|
||||||
|
- trim aggressively and risk losing continuity, tool state, or important prior decisions
|
||||||
|
|
||||||
|
`Async Context Compression` is built to avoid that tradeoff.
|
||||||
|
|
||||||
|
It is not a simple “summarize old messages” utility. It is a structure-aware, async, database-backed working-memory system for OpenWebUI that can compress long conversations while preserving conversational continuity, tool-calling integrity, and now, as of `v1.5.0`, referenced-chat context injection as well.
|
||||||
|
|
||||||
|
This plugin has now reached the point where it feels complete enough to be described as a serious, high-capability filter rather than a small convenience add-on.
|
||||||
|
|
||||||
|
**[📖 Full README](https://github.com/Fu-Jie/openwebui-extensions/blob/main/plugins/filters/async-context-compression/README.md)**
|
||||||
|
**[📝 v1.5.0 Release Notes](https://github.com/Fu-Jie/openwebui-extensions/blob/main/plugins/filters/async-context-compression/v1.5.0.md)**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Why This Plugin Exists
|
||||||
|
|
||||||
|
OpenWebUI conversations often contain much more than plain chat:
|
||||||
|
|
||||||
|
- long-running planning threads
|
||||||
|
- coding sessions with repeated tool use
|
||||||
|
- model-specific context limits
|
||||||
|
- multimodal messages
|
||||||
|
- external referenced chats
|
||||||
|
- custom models with different context windows
|
||||||
|
|
||||||
|
A naive compression strategy is not enough in those environments.
|
||||||
|
|
||||||
|
If a filter only drops earlier messages based on length, it can:
|
||||||
|
|
||||||
|
- break native tool-calling chains
|
||||||
|
- lose critical task state
|
||||||
|
- destroy continuity in old chats
|
||||||
|
- make debugging impossible
|
||||||
|
- hide important provider-side failures
|
||||||
|
|
||||||
|
`Async Context Compression` is designed around a stronger premise:
|
||||||
|
|
||||||
|
> compress history without treating conversation structure as disposable
|
||||||
|
|
||||||
|
That means it tries to preserve what actually matters for the next turn:
|
||||||
|
|
||||||
|
- the current goal
|
||||||
|
- durable user preferences
|
||||||
|
- recent progress
|
||||||
|
- tool outputs that still matter
|
||||||
|
- error state
|
||||||
|
- summary continuity
|
||||||
|
- referenced context from other chats
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What Makes It Different
|
||||||
|
|
||||||
|
This plugin now combines several capabilities that are usually split across separate systems:
|
||||||
|
|
||||||
|
### 1. Asynchronous working-memory generation
|
||||||
|
|
||||||
|
The current reply is not blocked while the plugin generates a new summary in the background.
|
||||||
|
|
||||||
|
### 2. Persistent summary storage
|
||||||
|
|
||||||
|
Summaries are stored in OpenWebUI's shared database and reused across turns, instead of being regenerated from scratch every time.
|
||||||
|
|
||||||
|
### 3. Structure-aware trimming
|
||||||
|
|
||||||
|
The filter respects atomic message boundaries so native tool-calling history is not corrupted by compression.
|
||||||
|
|
||||||
|
### 4. External chat reference summarization
|
||||||
|
|
||||||
|
New in `v1.5.0`: referenced chats can now be reused as cached summaries, injected directly if small enough, or summarized before injection if too large.
|
||||||
|
|
||||||
|
### 5. Mixed-script token estimation
|
||||||
|
|
||||||
|
The plugin now uses a much stronger multilingual token estimation path before falling back to exact counting, which helps reduce unnecessary expensive token calculations while staying much closer to real usage.
|
||||||
|
|
||||||
|
### 6. Real failure visibility
|
||||||
|
|
||||||
|
Important background summary failures are surfaced to the browser console and status messages instead of disappearing silently.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Workflow Overview
|
||||||
|
|
||||||
|
This is the current high-level flow:
|
||||||
|
|
||||||
|
```mermaid
|
||||||
|
flowchart TD
|
||||||
|
A[Request enters inlet] --> B[Normalize tool IDs and optionally trim large tool outputs]
|
||||||
|
B --> C{Referenced chats attached?}
|
||||||
|
C -- No --> D[Load current chat summary if available]
|
||||||
|
C -- Yes --> E[Inspect each referenced chat]
|
||||||
|
|
||||||
|
E --> F{Existing cached summary?}
|
||||||
|
F -- Yes --> G[Reuse cached summary]
|
||||||
|
F -- No --> H{Fits direct budget?}
|
||||||
|
H -- Yes --> I[Inject full referenced chat text]
|
||||||
|
H -- No --> J[Prepare referenced-chat summary input]
|
||||||
|
|
||||||
|
J --> K{Referenced-chat summary call succeeds?}
|
||||||
|
K -- Yes --> L[Inject generated referenced summary]
|
||||||
|
K -- No --> M[Fallback to direct contextual injection]
|
||||||
|
|
||||||
|
G --> D
|
||||||
|
I --> D
|
||||||
|
L --> D
|
||||||
|
M --> D
|
||||||
|
|
||||||
|
D --> N[Build current-chat Head + Summary + Tail]
|
||||||
|
N --> O{Over max_context_tokens?}
|
||||||
|
O -- Yes --> P[Trim oldest atomic groups]
|
||||||
|
O -- No --> Q[Send final context to the model]
|
||||||
|
P --> Q
|
||||||
|
|
||||||
|
Q --> R[Model returns the reply]
|
||||||
|
R --> S[Outlet rebuilds the full history]
|
||||||
|
S --> T{Reached compression threshold?}
|
||||||
|
T -- No --> U[Finish]
|
||||||
|
T -- Yes --> V[Fit summary input to the summary model context]
|
||||||
|
|
||||||
|
V --> W{Background summary call succeeds?}
|
||||||
|
W -- Yes --> X[Save new chat summary and update status]
|
||||||
|
W -- No --> Y[Force browser-console error and show status hint]
|
||||||
|
```
|
||||||
|
|
||||||
|
This is why I consider the plugin “powerful” now: it is no longer solving a single problem. It is coordinating context reduction, summary persistence, tool safety, referenced-chat handling, and model-budget control inside one filter.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## New in v1.5.0
|
||||||
|
|
||||||
|
This release is important because it turns the plugin from “long-chat compression with strong tool safety” into something closer to a reusable context-management layer.
|
||||||
|
|
||||||
|
### External chat reference summaries
|
||||||
|
|
||||||
|
This is a new feature in `v1.5.0`, not just a small adjustment.
|
||||||
|
|
||||||
|
When a user references another chat:
|
||||||
|
|
||||||
|
- the plugin can reuse an existing cached summary
|
||||||
|
- inject the full referenced chat if it is small enough
|
||||||
|
- or generate a summary first if the referenced chat is too large
|
||||||
|
|
||||||
|
That means the filter can now carry relevant context across chats, not just across turns inside the same chat.
|
||||||
|
|
||||||
|
### Fast multilingual token estimation
|
||||||
|
|
||||||
|
Also new in `v1.5.0`.
|
||||||
|
|
||||||
|
The plugin no longer relies on a rough one-size-fits-all character ratio. It now estimates token usage with mixed-script heuristics that behave much better for:
|
||||||
|
|
||||||
|
- English
|
||||||
|
- Chinese
|
||||||
|
- Japanese
|
||||||
|
- Korean
|
||||||
|
- Cyrillic
|
||||||
|
- Arabic
|
||||||
|
- Thai
|
||||||
|
- mixed-language conversations
|
||||||
|
|
||||||
|
This matters because the plugin makes context decisions constantly. Better estimation means fewer unnecessary exact counts and fewer bad preflight assumptions.
|
||||||
|
|
||||||
|
### Stronger final-prompt budgeting
|
||||||
|
|
||||||
|
The summary path now fits the **real final summary request**, not just an intermediate estimate. That includes:
|
||||||
|
|
||||||
|
- prompt wrapper
|
||||||
|
- formatted conversation text
|
||||||
|
- previous summary
|
||||||
|
- reserved output budget
|
||||||
|
- safety margin
|
||||||
|
|
||||||
|
This directly improves reliability in the large old-chat cases that are hardest to handle.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Why It Feels Complete Now
|
||||||
|
|
||||||
|
I would describe the current plugin as “feature-complete for the main problem space,” because it now covers the major operational surfaces that matter in real usage:
|
||||||
|
|
||||||
|
- long plain-chat conversations
|
||||||
|
- multi-step coding threads
|
||||||
|
- native tool-calling conversations
|
||||||
|
- persistent summaries
|
||||||
|
- custom model thresholds
|
||||||
|
- background async generation
|
||||||
|
- external chat references
|
||||||
|
- multilingual token estimation
|
||||||
|
- failure surfacing for debugging
|
||||||
|
|
||||||
|
That does not mean it is finished forever. It means the plugin has crossed the line from a narrow experimental filter into a robust context-management system with enough breadth to support demanding OpenWebUI usage patterns.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Scale and Engineering Depth
|
||||||
|
|
||||||
|
For people who care about implementation depth, this plugin is not small anymore.
|
||||||
|
|
||||||
|
Current code size:
|
||||||
|
|
||||||
|
- main plugin: **4,573 lines**
|
||||||
|
- focused test file: **1,037 lines**
|
||||||
|
- combined visible implementation + regression coverage: **5,610 lines**
|
||||||
|
|
||||||
|
Line count is not a quality metric by itself, but at this scale it does say something real:
|
||||||
|
|
||||||
|
- the plugin has grown well beyond a toy filter
|
||||||
|
- the behavior surface is large enough to require explicit regression testing
|
||||||
|
- the plugin now encodes a lot of edge-case handling that only shows up after repeated real-world usage
|
||||||
|
|
||||||
|
In other words: this is no longer “just summarize old messages.” It is a fairly serious stateful filter.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Practical Benefits
|
||||||
|
|
||||||
|
If you use OpenWebUI heavily, the value is straightforward:
|
||||||
|
|
||||||
|
- lower token consumption in long chats
|
||||||
|
- better continuity across long-running sessions
|
||||||
|
- safer native tool-calling history
|
||||||
|
- fewer broken conversations after compression
|
||||||
|
- more stable summary generation on large histories
|
||||||
|
- better visibility when the provider rejects a summary request
|
||||||
|
- useful reuse of context from referenced chats
|
||||||
|
|
||||||
|
This plugin is especially valuable if you:
|
||||||
|
|
||||||
|
- regularly work in long coding chats
|
||||||
|
- use models with strict context budgets
|
||||||
|
- rely on native tool calling
|
||||||
|
- revisit old project chats
|
||||||
|
- want summaries to behave like working memory, not like lossy notes
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Installation
|
||||||
|
|
||||||
|
- OpenWebUI Community: <https://openwebui.com/posts/async_context_compression_b1655bc8>
|
||||||
|
- Source: <https://github.com/Fu-Jie/openwebui-extensions/tree/main/plugins/filters/async-context-compression>
|
||||||
|
|
||||||
|
If you want the full valve list, deployment notes, and troubleshooting details, the README is the best reference.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Final Note
|
||||||
|
|
||||||
|
Do I think this plugin is powerful?
|
||||||
|
|
||||||
|
Yes, genuinely.
|
||||||
|
|
||||||
|
Not because it is large, but because it now solves the right combination of problems at once:
|
||||||
|
|
||||||
|
- cost control
|
||||||
|
- continuity
|
||||||
|
- structural safety
|
||||||
|
- async persistence
|
||||||
|
- cross-chat reuse
|
||||||
|
- operational debuggability
|
||||||
|
|
||||||
|
That combination is what makes it feel strong.
|
||||||
|
|
||||||
|
If you have been looking for a serious long-conversation memory/compression filter for OpenWebUI, `Async Context Compression` is now in that category.
|
||||||
282
plugins/filters/async-context-compression/community_post_CN.md
Normal file
282
plugins/filters/async-context-compression/community_post_CN.md
Normal file
@@ -0,0 +1,282 @@
|
|||||||
|
[](https://openwebui.com/posts/async_context_compression_b1655bc8)
|
||||||
|
|
||||||
|
# Async Context Compression:一个面向生产场景的 OpenWebUI 工作记忆过滤器
|
||||||
|
|
||||||
|
长对话的问题,从来不只是“贵”。
|
||||||
|
|
||||||
|
当聊天足够长时,通常只剩下两个都不太好的选择:
|
||||||
|
|
||||||
|
- 保留完整历史,继续承担很高的上下文成本
|
||||||
|
- 粗暴裁剪旧消息,但冒着丢失上下文、工具状态和关键决策的风险
|
||||||
|
|
||||||
|
`Async Context Compression` 的目标,就是尽量避免这个二选一。
|
||||||
|
|
||||||
|
它不是一个简单的“把老消息总结一下”的小工具,而是一个带有结构感知、异步摘要、数据库持久化能力的 OpenWebUI 工作记忆系统。它的任务不是单纯缩短上下文,而是在压缩长对话的同时,尽量保留:
|
||||||
|
|
||||||
|
- 对话连续性
|
||||||
|
- 工具调用状态完整性
|
||||||
|
- 历史摘要进度
|
||||||
|
- 跨聊天引用上下文
|
||||||
|
- 出错时的可诊断性
|
||||||
|
|
||||||
|
到 `v1.5.0` 这个阶段,我认为它已经不再只是一个“方便的小过滤器”,而是一个足够完整、足够强、也足够有工程深度的上下文管理插件。
|
||||||
|
|
||||||
|
**[📖 完整 README](https://github.com/Fu-Jie/openwebui-extensions/blob/main/plugins/filters/async-context-compression/README_CN.md)**
|
||||||
|
**[📝 v1.5.0 发布说明](https://github.com/Fu-Jie/openwebui-extensions/blob/main/plugins/filters/async-context-compression/v1.5.0_CN.md)**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 为什么会有这个插件
|
||||||
|
|
||||||
|
OpenWebUI 里的真实对话,通常并不只是“用户问一句,模型答一句”。
|
||||||
|
|
||||||
|
它常常还包含:
|
||||||
|
|
||||||
|
- 很长的项目型对话
|
||||||
|
- 多轮编码与调试
|
||||||
|
- 原生工具调用
|
||||||
|
- 多模态消息
|
||||||
|
- 不同模型上下文窗口差异
|
||||||
|
- 其他聊天的引用上下文
|
||||||
|
|
||||||
|
在这种环境里,单纯靠“按长度裁掉旧消息”其实不够。
|
||||||
|
|
||||||
|
如果一个过滤器只会按长度或索引裁剪消息,它很容易:
|
||||||
|
|
||||||
|
- 把原生 tool-calling 历史裁坏
|
||||||
|
- 丢掉仍然会影响下一轮回复的关键信息
|
||||||
|
- 在老聊天里破坏连续性
|
||||||
|
- 出问题时几乎无法排查
|
||||||
|
- 把上游 provider 报错伪装成模糊的内部错误
|
||||||
|
|
||||||
|
`Async Context Compression` 的核心思路更强一些:
|
||||||
|
|
||||||
|
> 可以压缩历史,但不能把“对话结构”当成无关紧要的东西一起压掉
|
||||||
|
|
||||||
|
它真正想保留的是下一轮最需要的状态:
|
||||||
|
|
||||||
|
- 当前目标
|
||||||
|
- 持久偏好
|
||||||
|
- 最近进展
|
||||||
|
- 仍然有效的工具结果
|
||||||
|
- 错误状态
|
||||||
|
- 已有摘要的连续性
|
||||||
|
- 来自其他聊天的相关上下文
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 它和普通摘要插件有什么不同
|
||||||
|
|
||||||
|
现在这个插件,实际上已经把几个通常要分散在不同系统里的能力组合到了一起:
|
||||||
|
|
||||||
|
### 1. 异步工作记忆生成
|
||||||
|
|
||||||
|
用户当前这次回复不会被后台摘要阻塞。
|
||||||
|
|
||||||
|
### 2. 持久化摘要存储
|
||||||
|
|
||||||
|
摘要会写入 OpenWebUI 共享数据库,并在后续轮次中复用,而不是每次都从头重算。
|
||||||
|
|
||||||
|
### 3. 结构感知裁剪
|
||||||
|
|
||||||
|
裁剪逻辑会尊重原子消息边界,避免把原生 tool-calling 历史裁坏。
|
||||||
|
|
||||||
|
### 4. 外部聊天引用摘要
|
||||||
|
|
||||||
|
这是 `v1.5.0` 新增的重要能力:被引用聊天现在可以直接复用缓存摘要、在小体量时直接注入、或者在过大时先生成摘要再注入。
|
||||||
|
|
||||||
|
### 5. 多语言 Token 预估
|
||||||
|
|
||||||
|
插件现在具备更强的多脚本文本 Token 预估逻辑,在很多情况下可以减少不必要的精确计数,同时明显比旧的粗略字符比值更贴近真实用量。
|
||||||
|
|
||||||
|
### 6. 失败可见性
|
||||||
|
|
||||||
|
关键的后台摘要失败现在会出现在浏览器控制台和状态提示里,不再悄悄消失。
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 工作流总览
|
||||||
|
|
||||||
|
下面是当前的高层流程:
|
||||||
|
|
||||||
|
```mermaid
|
||||||
|
flowchart TD
|
||||||
|
A[Request enters inlet] --> B[Normalize tool IDs and optionally trim large tool outputs]
|
||||||
|
B --> C{Referenced chats attached?}
|
||||||
|
C -- No --> D[Load current chat summary if available]
|
||||||
|
C -- Yes --> E[Inspect each referenced chat]
|
||||||
|
|
||||||
|
E --> F{Existing cached summary?}
|
||||||
|
F -- Yes --> G[Reuse cached summary]
|
||||||
|
F -- No --> H{Fits direct budget?}
|
||||||
|
H -- Yes --> I[Inject full referenced chat text]
|
||||||
|
H -- No --> J[Prepare referenced-chat summary input]
|
||||||
|
|
||||||
|
J --> K{Referenced-chat summary call succeeds?}
|
||||||
|
K -- Yes --> L[Inject generated referenced summary]
|
||||||
|
K -- No --> M[Fallback to direct contextual injection]
|
||||||
|
|
||||||
|
G --> D
|
||||||
|
I --> D
|
||||||
|
L --> D
|
||||||
|
M --> D
|
||||||
|
|
||||||
|
D --> N[Build current-chat Head + Summary + Tail]
|
||||||
|
N --> O{Over max_context_tokens?}
|
||||||
|
O -- Yes --> P[Trim oldest atomic groups]
|
||||||
|
O -- No --> Q[Send final context to the model]
|
||||||
|
P --> Q
|
||||||
|
|
||||||
|
Q --> R[Model returns the reply]
|
||||||
|
R --> S[Outlet rebuilds the full history]
|
||||||
|
S --> T{Reached compression threshold?}
|
||||||
|
T -- No --> U[Finish]
|
||||||
|
T -- Yes --> V[Fit summary input to the summary model context]
|
||||||
|
|
||||||
|
V --> W{Background summary call succeeds?}
|
||||||
|
W -- Yes --> X[Save new chat summary and update status]
|
||||||
|
W -- No --> Y[Force browser-console error and show status hint]
|
||||||
|
```
|
||||||
|
|
||||||
|
这也是为什么我会觉得它现在“强”:它已经不再只解决一个问题,而是在一个过滤器里同时协调:
|
||||||
|
|
||||||
|
- 上下文压缩
|
||||||
|
- 历史摘要复用
|
||||||
|
- 工具调用安全性
|
||||||
|
- 被引用聊天上下文
|
||||||
|
- 模型预算控制
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## v1.5.0 为什么重要
|
||||||
|
|
||||||
|
这个版本的重要性在于,它把插件从“长对话压缩器”推进成了一个更接近“上下文管理层”的东西。
|
||||||
|
|
||||||
|
### 外部聊天引用摘要
|
||||||
|
|
||||||
|
这是 `v1.5.0` 的新功能,不是小修小补。
|
||||||
|
|
||||||
|
当用户引用另一个聊天时,插件现在可以:
|
||||||
|
|
||||||
|
- 直接复用已有缓存摘要
|
||||||
|
- 如果聊天足够小,直接把完整内容注入
|
||||||
|
- 如果聊天太大,先生成摘要再注入
|
||||||
|
|
||||||
|
这意味着它现在不仅能跨“轮次”保留上下文,也能开始跨“聊天”携带相关上下文。
|
||||||
|
|
||||||
|
### 快速多语言 Token 预估
|
||||||
|
|
||||||
|
这同样是 `v1.5.0` 的新能力。
|
||||||
|
|
||||||
|
插件不再依赖简单粗暴的统一字符比值,而是改用更适合混合语言文本的估算方式,尤其对下面这些场景更有意义:
|
||||||
|
|
||||||
|
- 英文
|
||||||
|
- 中文
|
||||||
|
- 日文
|
||||||
|
- 韩文
|
||||||
|
- 西里尔字符
|
||||||
|
- 阿拉伯语
|
||||||
|
- 泰语
|
||||||
|
- 中英混合或多语言混合对话
|
||||||
|
|
||||||
|
这很重要,因为上下文管理类插件会不断做预算判断。预估更准,就意味着更少无意义的精确计算,也更不容易在预检阶段做出错误判断。
|
||||||
|
|
||||||
|
### 更强的最终请求预算控制
|
||||||
|
|
||||||
|
现在的摘要路径会去拟合“真实最终 summary request”,而不是只看一个中间估算值。它会把这些内容都算进去:
|
||||||
|
|
||||||
|
- prompt 包装
|
||||||
|
- 格式化后的对话文本
|
||||||
|
- previous summary
|
||||||
|
- 预留输出预算
|
||||||
|
- 安全余量
|
||||||
|
|
||||||
|
这对老聊天、大聊天和最难处理的边界情况特别关键。
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 为什么我觉得它现在已经足够完整
|
||||||
|
|
||||||
|
如果把“问题空间”列出来,我会说这个插件现在对主要场景已经覆盖得比较完整了:
|
||||||
|
|
||||||
|
- 很长的普通聊天
|
||||||
|
- 多轮编码与调试对话
|
||||||
|
- 原生工具调用
|
||||||
|
- 历史摘要持久化
|
||||||
|
- 自定义模型阈值
|
||||||
|
- 异步后台摘要
|
||||||
|
- 外部聊天引用
|
||||||
|
- 多语言 Token 预估
|
||||||
|
- 调试可见性
|
||||||
|
|
||||||
|
这并不代表它永远不会再迭代,而是说它已经越过了“窄功能实验品”的阶段,进入了一个更像“通用上下文管理系统”的形态。
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 代码规模与工程深度
|
||||||
|
|
||||||
|
如果你关心实现深度,这个插件现在已经不小了。
|
||||||
|
|
||||||
|
当前代码规模:
|
||||||
|
|
||||||
|
- 主插件文件:**4,573 行**
|
||||||
|
- 聚焦测试文件:**1,037 行**
|
||||||
|
- 可见实现 + 回归测试合计:**5,610 行**
|
||||||
|
|
||||||
|
代码行数本身不等于质量,但在这个量级上,它至少说明了几件真实的事:
|
||||||
|
|
||||||
|
- 这已经不是一个玩具级过滤器
|
||||||
|
- 这个插件的行为面足够大,必须靠专门回归测试兜住
|
||||||
|
- 它已经积累了很多只有在真实使用中才会暴露出来的边界处理逻辑
|
||||||
|
|
||||||
|
也就是说,它现在做的事情,已经明显不是“把老消息总结一下”那么简单。
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 实际价值
|
||||||
|
|
||||||
|
如果你是 OpenWebUI 的重度用户,这个插件的价值其实很直接:
|
||||||
|
|
||||||
|
- 长聊天更省 Token
|
||||||
|
- 长会话连续性更好
|
||||||
|
- 原生 tool-calling 更安全
|
||||||
|
- 压缩后更不容易把会话搞坏
|
||||||
|
- 大历史摘要生成更稳定
|
||||||
|
- provider 拒绝摘要请求时更容易看到真错误
|
||||||
|
- 能复用其他聊天里的有效上下文
|
||||||
|
|
||||||
|
尤其适合这些用户:
|
||||||
|
|
||||||
|
- 经常做长时间编码聊天
|
||||||
|
- 使用上下文窗口比较紧的模型
|
||||||
|
- 依赖原生工具调用
|
||||||
|
- 经常回看旧项目聊天
|
||||||
|
- 希望摘要更像“工作记忆”而不是“丢失细节的简要笔记”
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 安装
|
||||||
|
|
||||||
|
- OpenWebUI 社区:<https://openwebui.com/posts/async_context_compression_b1655bc8>
|
||||||
|
- 源码目录:<https://github.com/Fu-Jie/openwebui-extensions/tree/main/plugins/filters/async-context-compression>
|
||||||
|
|
||||||
|
如果你想看完整的 valves、部署说明和故障排查,README 仍然是最完整的参考入口。
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 最后一句
|
||||||
|
|
||||||
|
你问我这个插件是不是强大。
|
||||||
|
|
||||||
|
我的答案是:**是,确实强,而且现在已经不是“看起来强”,而是“问题空间覆盖得比较完整”的那种强。**
|
||||||
|
|
||||||
|
不是因为它代码多,而是因为它现在同时解决的是一组真正相关的问题:
|
||||||
|
|
||||||
|
- 成本控制
|
||||||
|
- 连续性
|
||||||
|
- 结构安全
|
||||||
|
- 异步持久化
|
||||||
|
- 跨聊天上下文复用
|
||||||
|
- 出错时的可诊断性
|
||||||
|
|
||||||
|
正是这几个东西一起成立,才让它现在像一个真正成熟的长对话上下文管理插件。
|
||||||
Reference in New Issue
Block a user