feat(async-context-compression): release v1.2.1 with smart config & optimizations
This release introduces significant improvements to configuration flexibility, performance, and stability.
**Key Changes:**
* **Smart Configuration:**
* Added `summary_model_max_context` to allow independent context limits for the summary model (e.g., using `gemini-flash` with 1M context to summarize `gpt-4` history).
* Implemented auto-detection of base model settings for custom models, ensuring correct threshold application.
* **Performance & Refactoring:**
* Optimized `model_thresholds` parsing with caching to reduce overhead.
* Refactored `inlet` and `outlet` logic to remove redundant code and improve maintainability.
* Replaced all `print` statements with proper `logging` calls for better production monitoring.
* **Bug Fixes & Modernization:**
* Fixed `datetime.utcnow()` deprecation warnings by switching to timezone-aware `datetime.now(timezone.utc)`.
* Corrected type annotations and improved error handling for `JSONResponse` objects from LLM backends.
* Removed hard truncation in summary generation to allow full context usage.
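The modernization items above (`logging` instead of `print`, timezone-aware timestamps) boil down to a pattern like the following sketch; the function and logger names are illustrative, not the plugin's actual API:

```python
import logging
from datetime import datetime, timezone

logger = logging.getLogger("async_context_compression")  # hypothetical logger name

def utc_timestamp() -> str:
    # datetime.utcnow() is deprecated since Python 3.12 and returns a naive
    # datetime; datetime.now(timezone.utc) is timezone-aware and unambiguous.
    return datetime.now(timezone.utc).isoformat()

def log_compression(saved_tokens: int) -> None:
    # logging calls (unlike print) respect level filtering and handler
    # routing, which is what makes them usable for production monitoring.
    logger.info("Compression saved %d tokens at %s", saved_tokens, utc_timestamp())
```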
**Files Updated:**
* Plugin source code (English & Chinese)
* Documentation and READMEs
* Version bumped to 1.2.1
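The cached `model_thresholds` parsing mentioned above can be sketched with `functools.lru_cache`; the JSON field format shown here is an assumption for illustration, not the plugin's actual schema:

```python
import json
from functools import lru_cache

@lru_cache(maxsize=32)
def parse_model_thresholds(raw: str) -> dict:
    # Cache keyed on the raw config string: parsing happens once per
    # distinct value instead of on every request, and a changed config
    # string simply misses the cache and is parsed fresh.
    return {model: int(limit) for model, limit in json.loads(raw).items()}

thresholds = parse_model_thresholds('{"gpt-4": 8192, "gemini-flash": 1000000}')
```

Because repeated calls return the same cached dict object, callers should treat the result as read-only.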
@@ -1,7 +1,7 @@
 # Async Context Compression

 <span class="category-badge filter">Filter</span>
-<span class="version-badge">v1.2.0</span>
+<span class="version-badge">v1.2.1</span>

 Reduces token consumption in long conversations through intelligent summarization while maintaining conversational coherence.
@@ -38,6 +38,8 @@ This is especially useful for:
 - :material-format-align-justify: **Structure-Aware Trimming**: Preserves document structure
 - :material-content-cut: **Native Tool Output Trimming**: Trims verbose tool outputs (Note: Non-native tool outputs are not fully injected into context)
 - :material-chart-bar: **Detailed Token Logging**: Granular token breakdown
+- :material-account-search: **Smart Model Matching**: Inherit config from base models
+- :material-image-off: **Multimodal Support**: Images are preserved but tokens are **NOT** calculated

 ---
@@ -73,6 +75,7 @@ graph TD
 | `keep_first` | integer | `1` | Always keep the first N messages |
 | `keep_last` | integer | `6` | Always keep the last N messages |
 | `summary_model` | string | `None` | Model to use for summarization |
+| `summary_model_max_context` | integer | `0` | Max context tokens for summary model |
 | `max_summary_tokens` | integer | `16384` | Maximum tokens for the summary |
 | `enable_tool_output_trimming` | boolean | `false` | Enable trimming of large tool outputs |
@@ -1,7 +1,7 @@
 # Async Context Compression(异步上下文压缩)

 <span class="category-badge filter">Filter</span>
-<span class="version-badge">v1.2.0</span>
+<span class="version-badge">v1.2.1</span>

 Reduces token consumption in long conversations through intelligent summarization while keeping the conversation coherent.
@@ -38,6 +38,8 @@ The Async Context Compression filter helps manage token usage in long conversations by:
 - :material-format-align-justify: **Structure-Aware Trimming**: Intelligent trimming that preserves document structure
 - :material-content-cut: **Native Tool Output Trimming**: Automatically trims verbose tool outputs (Note: non-native tool-call outputs are not fully injected into context)
 - :material-chart-bar: **Detailed Token Logging**: Fine-grained token statistics
+- :material-account-search: **Smart Model Matching**: Custom models automatically inherit base model configuration
+- :material-image-off: **Multimodal Support**: Image content is preserved but its tokens are **NOT** counted

 ---
@@ -73,6 +75,7 @@ graph TD
 | `keep_first` | integer | `1` | Always keep the first N messages |
 | `keep_last` | integer | `6` | Always keep the last N messages |
 | `summary_model` | string | `None` | Model used for summarization |
+| `summary_model_max_context` | integer | `0` | Max context tokens for the summary model |
 | `max_summary_tokens` | integer | `16384` | Maximum tokens for the summary |
 | `enable_tool_output_trimming` | boolean | `false` | Enable trimming of long tool outputs |
@@ -22,7 +22,7 @@ Filters act as middleware in the message pipeline:

 Reduces token consumption in long conversations through intelligent summarization while maintaining coherence.

-**Version:** 1.1.3
+**Version:** 1.2.1

 [:octicons-arrow-right-24: Documentation](async-context-compression.md)
@@ -22,7 +22,7 @@ Filters act as middleware in the message pipeline:

 Reduces token consumption in long conversations through intelligent summarization while maintaining coherence.

-**Version:** 1.1.3
+**Version:** 1.2.1

 [:octicons-arrow-right-24: Documentation](async-context-compression.md)