Fu-Jie_openwebui-extensions/plugins/debug/byok-infinite-session-research/data-flow-analysis.md

# Data Flow Analysis: How the SDK Learns About User-Configured Data

## Current Data Flow: OpenWebUI → Pipe → SDK

```
┌─────────────────────┐
│    OpenWebUI UI     │
│ (user selects model)│
└──────────┬──────────┘
           │
           ├─ body.model = "gpt-4"
           ├─ body.messages = [...]
           ├─ __metadata__.base_model_id = ?
           ├─ __metadata__.custom_fields = ?
           └─ __user__.settings = ?
           │
┌──────────▼──────────┐
│  Pipe (github-      │
│   copilot-sdk.py)   │
│                     │
│ 1. Extract model    │
│    info             │
│ 2. Apply Valves     │
│    config           │
│ 3. Create SDK       │
│    session          │
└──────────┬──────────┘
           │
           ├─ SessionConfig {
           │    model: real_model_id
           │    provider: ProviderConfig (if BYOK)
           │    infinite_sessions: {...}
           │    system_message: {...}
           │    ...
           │  }
           │
┌──────────▼──────────┐
│  Copilot SDK        │
│ (create_session)    │
│                     │
│ Returns: ModelInfo {│
│   capabilities {    │
│     limits {        │
│       max_context_  │
│       window_tokens │
│     }               │
│   }                 │
│ }                   │
└─────────────────────┘
```

## Key Problem: Three Current Bottlenecks

### Bottleneck 1: Entry Points for User Data

Currently supported input paths:

1. **Valves config (global + per-user)**

   ```python
   # Global settings (admin)
   Valves.BYOK_BASE_URL = "https://api.openai.com/v1"
   Valves.BYOK_API_KEY = "sk-..."

   # Per-user overrides
   UserValves.BYOK_API_KEY = "sk-..."  # the user's own key
   UserValves.BYOK_BASE_URL = "..."
   ```

   Problem: there is no way to set a context window size for a specific BYOK model.

2. **metadata (from OpenWebUI)**

   ```python
   __metadata__ = {
       "base_model_id": "...",
       "custom_fields": {...},  # ← may carry extra information
       "tool_ids": [...],
   }
   ```

   Problem: it is unclear whether OpenWebUI supports passing a model's context window through metadata.

3. **body (from the chat request)**

   ```python
   body = {
       "model": "gpt-4",
       "messages": [...],
       "temperature": 0.7,
       # ← can custom fields be added here?
   }
   ```
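The merge order of these inputs can be sketched as a tiny standalone helper. This is illustrative only — `resolve_byok_config` and its dict-shaped valves are assumptions for the sketch, not the plugin's actual API:

```python
# Sketch: per-user valves override global valves; an empty value means
# "not overridden". (Hypothetical helper, not the plugin's real code.)
def resolve_byok_config(valves: dict, user_valves: dict) -> dict:
    """Merge global and per-user BYOK settings; user values win when set."""
    merged = dict(valves)
    for key, value in user_valves.items():
        if value:  # empty string / None means the user did not override it
            merged[key] = value
    return merged

config = resolve_byok_config(
    {"BYOK_BASE_URL": "https://api.openai.com/v1", "BYOK_API_KEY": "sk-admin"},
    {"BYOK_API_KEY": "sk-user", "BYOK_BASE_URL": ""},
)
# config["BYOK_API_KEY"] is the user's key; the base URL stays global
```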
    

### Bottleneck 2: Model Identification and Storage

Current code (line 5905+):

```python
# Parse the model the user selected
request_model = body.get("model", "")  # e.g., "gpt-4"
real_model_id = request_model

# Determine the actual model ID
base_model_id = _container_get(__metadata__, "base_model_id", "")

if base_model_id:
    resolved_id = base_model_id  # use the ID from metadata
else:
    resolved_id = request_model  # use the ID the user selected
```
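The resolution logic can be exercised as a standalone function. This is a sketch: the real code uses the plugin's `_container_get` helper, approximated here with `dict.get`:

```python
def resolve_model_id(body: dict, metadata: dict) -> str:
    """Prefer base_model_id from metadata; otherwise the requested model."""
    request_model = body.get("model", "")
    base_model_id = metadata.get("base_model_id", "")
    return base_model_id or request_model

resolve_model_id({"model": "gpt-4"}, {"base_model_id": "gpt-4-turbo"})  # → "gpt-4-turbo"
resolve_model_id({"model": "gpt-4"}, {})                                # → "gpt-4"
```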

Problems:

- No "model metadata cache" is maintained.
- Repeated requests for the same model re-identify it every time.
- The context window size cannot be persisted for a specific model.

### Bottleneck 3: Building the SDK Session Config

Current implementation (lines 5058-5100):

```python
def _build_session_config(
    self,
    real_model_id,      # ← model ID
    system_prompt_content,
    is_streaming=True,
    is_admin=False,
    # ... other parameters
):
    # Creates the infinite session regardless of the model
    if self.valves.INFINITE_SESSION:
        infinite_session_config = InfiniteSessionConfig(
            enabled=True,
            background_compaction_threshold=self.valves.COMPACTION_THRESHOLD,  # 0.80
            buffer_exhaustion_threshold=self.valves.BUFFER_THRESHOLD,          # 0.95
        )

    # ❌ The model's actual context window size is never queried here
    # ❌ Compaction thresholds cannot be adjusted to the model's real limits
```
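To make the problem concrete: the same fixed ratios imply very different token budgets depending on the model's real window. A small arithmetic sketch (the function name is illustrative):

```python
def trigger_points(context_window: int, compaction: float = 0.80, buffer: float = 0.95):
    """Token counts at which compaction and buffer exhaustion would trigger."""
    return int(context_window * compaction), int(context_window * buffer)

trigger_points(8192)    # → (6553, 7782): a gpt-4-class window
trigger_points(200000)  # → (160000, 190000): a claude-3-class window
```

Without knowing which window applies, the 0.80/0.95 valves cannot be mapped to actual token counts.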

## Solution: Three Data-Flow Improvements

### Step 1: Add a Model-Metadata Config

Add a model-metadata mapping to Valves:

```python
class Valves(BaseModel):
    # ... existing settings ...

    # New: model context-window mapping (JSON)
    MODEL_CONTEXT_WINDOWS: str = Field(
        default="{}",  # JSON string
        description='Model context window mapping (JSON). Example: {"gpt-4": 8192, "gpt-4-turbo": 128000, "claude-3": 200000}'
    )

    # New: BYOK model-specific settings (JSON)
    BYOK_MODEL_CONFIG: str = Field(
        default="{}",  # JSON string
        description='BYOK-specific model configuration (JSON). Example: {"gpt-4": {"context_window": 8192, "enable_compression": true}}'
    )
```

Usage:

```python
# Set in Valves:
# MODEL_CONTEXT_WINDOWS = '{"gpt-4": 8192, "claude-3-5-sonnet": 200000}'

# Parsed in the Pipe:
def _get_model_context_window(self, model_id: str) -> Optional[int]:
    """Look up the model's context window size from the config."""
    try:
        config = json.loads(self.valves.MODEL_CONTEXT_WINDOWS or "{}")
        return config.get(model_id)
    except json.JSONDecodeError:
        return None
```
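The same lookup can be written as a standalone function so it is testable without the Pipe (the free function `get_model_context_window` is a hypothetical mirror of the method above):

```python
import json
from typing import Optional

def get_model_context_window(mapping_json: str, model_id: str) -> Optional[int]:
    """Parse the MODEL_CONTEXT_WINDOWS JSON string and look up one model."""
    try:
        config = json.loads(mapping_json or "{}")
    except json.JSONDecodeError:
        return None  # malformed JSON: treat the window as unknown
    value = config.get(model_id)
    return value if isinstance(value, int) else None

get_model_context_window('{"gpt-4": 8192, "claude-3-5-sonnet": 200000}', "gpt-4")  # → 8192
get_model_context_window('not json', "gpt-4")                                      # → None
```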

### Step 2: Maintain a Model-Info Cache

Maintain a model-info cache inside the Pipe:

```python
class Pipe:
    def __init__(self):
        # ... existing code ...
        self._model_info_cache = {}      # model_id -> ModelInfo
        self._context_window_cache = {}  # model_id -> context_window_tokens

    def _cache_model_info(self, model_id: str, model_info: ModelInfo):
        """Cache the model info returned by the SDK."""
        self._model_info_cache[model_id] = model_info
        if model_info.capabilities and model_info.capabilities.limits:
            self._context_window_cache[model_id] = (
                model_info.capabilities.limits.max_context_window_tokens
            )

    def _get_context_window(self, model_id: str) -> Optional[int]:
        """Resolve the context window (priority: SDK > Valves config > default)."""
        # 1. Prefer the SDK cache (most reliable)
        if model_id in self._context_window_cache:
            return self._context_window_cache[model_id]

        # 2. Fall back to the Valves config
        context_window = self._get_model_context_window(model_id)
        if context_window:
            return context_window

        # 3. Default (unknown)
        return None
```
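The priority chain reduces to a few lines once the two sources are plain dicts. A sketch (`resolve_context_window` is a hypothetical stand-in for `_get_context_window`):

```python
from typing import Optional

def resolve_context_window(
    sdk_cache: dict, valves_mapping: dict, model_id: str
) -> Optional[int]:
    """Priority chain: SDK-reported value > Valves mapping > None (unknown)."""
    if model_id in sdk_cache:
        return sdk_cache[model_id]       # 1. most reliable: the SDK told us
    return valves_mapping.get(model_id)  # 2. admin config, else 3. None

resolve_context_window({"gpt-4": 8000}, {"gpt-4": 8192}, "gpt-4")  # → 8000 (SDK wins)
resolve_context_window({}, {"gpt-4": 8192}, "gpt-4")               # → 8192
resolve_context_window({}, {}, "gpt-4")                            # → None
```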

### Step 3: Use the Real Context Window to Tune the Compaction Strategy

Modify `_build_session_config`:

```python
async def _build_session_config(  # async: the body awaits the debug-log emitter
    self,
    real_model_id,
    # ... other parameters ...
    **kwargs
):
    # Look up the model's real context window size
    actual_context_window = self._get_context_window(real_model_id)

    # Only enable compaction for models with a known context window
    infinite_session_config = None
    if self.valves.INFINITE_SESSION and actual_context_window:
        # The compaction thresholds now have a concrete meaning
        infinite_session_config = InfiniteSessionConfig(
            enabled=True,
            # 80% of the actual context window
            background_compaction_threshold=self.valves.COMPACTION_THRESHOLD,
            # 95% of the actual context window
            buffer_exhaustion_threshold=self.valves.BUFFER_THRESHOLD,
        )

        await self._emit_debug_log(
            f"Infinite Session: model_context={actual_context_window} tokens, "
            f"compaction_triggers_at={int(actual_context_window * self.valves.COMPACTION_THRESHOLD)}, "
            f"buffer_triggers_at={int(actual_context_window * self.valves.BUFFER_THRESHOLD)}",
            __event_call__,
        )
    elif self.valves.INFINITE_SESSION and not actual_context_window:
        logger.warning(
            f"Infinite Session: Unknown context window for {real_model_id}, "
            f"compression disabled. Set MODEL_CONTEXT_WINDOWS in Valves to enable."
        )
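The gating decision itself is simple enough to isolate and test. A sketch (`should_enable_compaction` is an illustrative name, not a method in the plugin):

```python
from typing import Optional

def should_enable_compaction(valve_on: bool, context_window: Optional[int]) -> bool:
    """Compaction only makes sense when the real window size is known."""
    return bool(valve_on and context_window)

should_enable_compaction(True, 8192)   # → True
should_enable_compaction(True, None)   # → False: unknown window, stay safe
should_enable_compaction(False, 8192)  # → False: the valve is off
```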

## Concrete Configuration Examples

### Example 1: Configure Context Windows for BYOK Models

Valves setting (stored as a JSON string):

```
MODEL_CONTEXT_WINDOWS = {
  "gpt-4": 8192,
  "gpt-4-turbo": 128000,
  "gpt-4o": 128000,
  "claude-3": 200000,
  "claude-3.5-sonnet": 200000,
  "llama-2-70b": 4096
}
```

Effect:

- The Pipe knows "gpt-4" has an 8192-token context window.
- Compaction triggers at ~6553 tokens (80%).
- The buffer blocks at ~7782 tokens (95%).

### Example 2: Enable/Disable Compression for Specific BYOK Models

Valves setting (stored as a JSON string):

```
BYOK_MODEL_CONFIG = {
  "gpt-4": {
    "context_window": 8192,
    "enable_infinite_session": true,
    "compaction_threshold": 0.75
  },
  "llama-2-70b": {
    "context_window": 4096,
    "enable_infinite_session": false  # compression disabled
  }
}
```

Pipe logic:

```python
# Check the model-specific compression setting
def _get_compression_enabled(self, model_id: str) -> bool:
    try:
        config = json.loads(self.valves.BYOK_MODEL_CONFIG or "{}")
        model_config = config.get(model_id, {})
        return model_config.get("enable_infinite_session", self.valves.INFINITE_SESSION)
    except json.JSONDecodeError:
        return self.valves.INFINITE_SESSION
```
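The fallback behavior can be demonstrated standalone (the free function `compression_enabled` is a hypothetical mirror of the method above, with the valve passed in as `global_default`):

```python
import json

def compression_enabled(byok_model_config: str, model_id: str, global_default: bool) -> bool:
    """Per-model enable_infinite_session flag, falling back to the global valve."""
    try:
        config = json.loads(byok_model_config or "{}")
    except json.JSONDecodeError:
        return global_default
    return config.get(model_id, {}).get("enable_infinite_session", global_default)

cfg = '{"llama-2-70b": {"enable_infinite_session": false}}'
compression_enabled(cfg, "llama-2-70b", True)  # → False: explicit per-model override
compression_enabled(cfg, "gpt-4", True)        # → True: falls back to the global valve
```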

## Summary: How the SDK Learns About User-Configured Data

| Source | Mechanism | Updated | Example |
|--------|-----------|---------|---------|
| Valves | Global configuration | Set in advance by the admin | `MODEL_CONTEXT_WINDOWS` JSON |
| SDK | Returned at session creation | On every session creation | `model_info.capabilities.limits` |
| Cache | Pipe-local storage | Cached after first fetch | `_context_window_cache` |
| metadata | Passed by OpenWebUI | Attached to every request | `base_model_id`, custom fields |

Flow:

1. The user configures `MODEL_CONTEXT_WINDOWS` in Valves.
2. The Pipe reads the `model_info` returned by the SDK at session creation.
3. The Pipe caches the context window size.
4. The Pipe adjusts the infinite-session thresholds to the real window size.
5. The SDK applies the correct compaction strategy.

With this in place, the SDK has full knowledge of the user-configured data, without any changes to the SDK itself.