feat(openwebui-skills-manager): enhance auto-discovery and structural refactoring

- Enable default overwrite installation policy for overlapping skills
- Support deep recursive GitHub trees discovery mechanism to resolve #58
- Refactor internal architecture to fully decouple stateless helper logic
- READMEs and docs synced (v0.3.0)
2026-03-08 18:21:21 +08:00
parent 55a9c6ffb5
commit d29c24ba4a
30 changed files with 5417 additions and 598 deletions
--- a/plugins/debug/byok-infinite-session-research/analysis.md
+++ b/plugins/debug/byok-infinite-session-research/analysis.md
@@ -0,0 +1,206 @@
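+# Compatibility Study: BYOK Mode and Infinite Sessions (Automatic Context Compaction)
+
+**Date**: 2026-03-08  
+**Scope**: Copilot SDK v0.1.30 + OpenWebUI Extensions Pipe v0.10.0
+
+## Research Question
+Should automatic context compaction (Infinite Sessions) be supported in BYOK (Bring Your Own Key) mode?  
+User report: BYOK mode was not supposed to trigger compaction, yet compaction unexpectedly worked when the model name matched one of Copilot's built-in models.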
+
+---
+
+## Key Findings
+
+### 1. SDK layer (copilot-sdk/python/copilot/types.py)
+
+**`InfiniteSessionConfig` definition** (lines 453-470):
+```python
+class InfiniteSessionConfig(TypedDict, total=False):
+    """
+    Configuration for infinite sessions with automatic context compaction
+    and workspace persistence.
+    """
+    enabled: bool
+    background_compaction_threshold: float  # 0.0-1.0, default: 0.80
+    buffer_exhaustion_threshold: float      # 0.0-1.0, default: 0.95
+```
+
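+**`SessionConfig` structure** (line 475+):
+- `provider: ProviderConfig` - used for BYOK configuration
+- `infinite_sessions: InfiniteSessionConfig` - context compaction configuration
+- **Key point**: these two settings are **fully independent**; neither depends on the other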
+
+### 2. OpenWebUI Pipe layer (github_copilot_sdk.py)
+
+**Infinite Session initialization** (lines 5063-5069):
+```python
+infinite_session_config = None
+if self.valves.INFINITE_SESSION:  # default: True
+    infinite_session_config = InfiniteSessionConfig(
+        enabled=True,
+        background_compaction_threshold=self.valves.COMPACTION_THRESHOLD,
+        buffer_exhaustion_threshold=self.valves.BUFFER_THRESHOLD,
+    )
+```
+
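+**Key issues**:
+- ✗ Nothing checks `is_byok_model`
+- ✗ The same infinite session configuration is applied to official and BYOK models alike
+- ✓ By contrast, `reasoning_effort` is correctly disabled in BYOK mode (lines 6329-6331)
+
+### 3. Model identification logic (line 6199+)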
+
+```python
+if m_info and "source" in m_info:
+    is_byok_model = m_info["source"] == "byok"
+else:
+    is_byok_model = not has_multiplier and byok_active
+```
+
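+A model is identified as BYOK based on:
+1. The `source` field in the model metadata
+2. Otherwise, the absence of a multiplier tag (such as "4x" or "0.5x") combined with a globally active BYOK configuration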
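
The two rules above can be sketched as a standalone function. This is only an illustration: `detect_byok` and its parameter names are hypothetical, and the only logic taken from the snippet is the `source` check and the `not has_multiplier and byok_active` fallback.

```python
from typing import Mapping, Optional


def detect_byok(
    m_info: Optional[Mapping[str, object]],
    has_multiplier: bool,
    byok_active: bool,
) -> bool:
    """Return True when the model should be treated as BYOK (hypothetical helper)."""
    # Preferred path: trust the explicit `source` field when metadata exists.
    if m_info and "source" in m_info:
        return m_info["source"] == "byok"
    # Fallback: no multiplier tag, and BYOK is globally active.
    return (not has_multiplier) and byok_active


explicit = detect_byok({"source": "byok"}, has_multiplier=True, byok_active=False)
inferred = detect_byok(None, has_multiplier=False, byok_active=True)
official = detect_byok(None, has_multiplier=True, byok_active=True)
```

The fallback branch is the fragile one: once metadata is missing, the outcome depends entirely on the multiplier tag and the global BYOK flag.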
+
+---
+
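+## Technical Feasibility Analysis
+
+### ✅ Infinite Sessions are technically feasible in BYOK mode:
+
+1. **SDK support**: the Copilot SDK accepts an infinite session configuration under any provider (official, BYOK, Azure, etc.)
+2. **Independent configuration**: `provider` and `infinite_sessions` are separate fields in `SessionConfig`
+3. **No documented restriction**: the SDK documentation does not say that BYOK mode is incompatible with infinite sessions
+4. **Test coverage**: the SDK has separate BYOK tests and infinite-sessions tests, but no test that combines the two
+
+### ⚠️ However, there are design problems:
+
+#### Problem 1: Unexpected automatic enablement
+- BYOK mode is typically used for **precise control** over one's own API usage
+- Automatic compaction can cause **unexpected extra requests** and higher API costs
+- There is no explicit warning or documentation stating that BYOK sessions are compacted too
+
+#### Problem 2: No mode-specific configuration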
+```python
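+# Current implementation - one size fits all
+if self.valves.INFINITE_SESSION:
+    # Applied to official and BYOK models alike
+    pass
+
+# What it should be - mode-aware
+if self.valves.INFINITE_SESSION and not is_byok_model:
+    # Enable for official models only
+    pass
+# or
+if self.valves.INFINITE_SESSION_BYOK and is_byok_model:
+    # BYOK-specific configuration
+    pass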
+```
+
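+#### Problem 3: Uncertain compaction quality
+- BYOK models may be self-hosted or open-source models
+- Context compaction is performed by the Copilot CLI, so its quality depends on the CLI version
+- There is no standardized way to evaluate compaction results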
+
+---
+
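+## Root Cause of the Reported Behavior
+
+The user said: "BYOK mode was not supposed to trigger compaction, but the model I used happened to share its name with a built-in Copilot model, and compaction was triggered unexpectedly."
+
+**Analysis**:
+1. In the OpenWebUI Pipe, the infinite session configuration is **enabled globally** (`INFINITE_SESSION=True`)
+2. When model metadata is missing, the identification logic falls back to inferring from the model name and whether BYOK is active
+3. If the user's BYOK model happens to be named "gpt-4", "claude-3-5-sonnet", or similar, it may be misidentified
+4. Alternatively, the user simply did not realize that infinite sessions are also enabled in BYOK mode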
+
+---
+
+## Proposed Options
+
+### Option 1: Conservative (recommended)
+**Disable automatic compaction in BYOK mode**
+
+```python
+infinite_session_config = None
+# Enable only for standard official models, not for BYOK
+if self.valves.INFINITE_SESSION and not is_byok_model:
+    infinite_session_config = InfiniteSessionConfig(
+        enabled=True,
+        background_compaction_threshold=self.valves.COMPACTION_THRESHOLD,
+        buffer_exhaustion_threshold=self.valves.BUFFER_THRESHOLD,
+    )
+```
+
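+**Pros**:
+- Respects BYOK users' intent to control costs
+- Lowers the risk of unexpected API usage
+- Consistent with how `reasoning_effort` is disabled for BYOK
+
+**Cons**: restricts functionality for BYOK users
+
+### Option 2: Flexible
+**Add a separate BYOK compaction setting**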
+
+```python
+class Valves(BaseModel):
+    INFINITE_SESSION: bool = Field(
+        default=True,
+        description="Enable Infinite Sessions for standard Copilot models"
+    )
+    INFINITE_SESSION_BYOK: bool = Field(
+        default=False,
+        description="Enable Infinite Sessions for BYOK models (advanced users only)"
+    )
+
+# Usage logic
+if (self.valves.INFINITE_SESSION and not is_byok_model) or \
+   (self.valves.INFINITE_SESSION_BYOK and is_byok_model):
+    infinite_session_config = InfiniteSessionConfig(...)
+```
+
+**Pros**:
+- Gives BYOK users full control
+- Preserves backward compatibility
+- Lets advanced users opt in
+
+**Cons**: adds configuration complexity
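
A minimal sketch of the Option 2 decision as a pure function, which makes it easy to unit-test outside the Pipe. The function name `should_enable_infinite_session` is hypothetical; the two flags mirror the proposed `INFINITE_SESSION` and `INFINITE_SESSION_BYOK` valves.

```python
def should_enable_infinite_session(
    infinite_session: bool,
    infinite_session_byok: bool,
    is_byok_model: bool,
) -> bool:
    """Pick the valve that governs this request's model type (hypothetical helper)."""
    if is_byok_model:
        return infinite_session_byok
    return infinite_session


# With the proposed defaults (True for official models, False for BYOK):
official_default = should_enable_infinite_session(True, False, is_byok_model=False)
byok_default = should_enable_infinite_session(True, False, is_byok_model=True)
byok_opt_in = should_enable_infinite_session(True, True, is_byok_model=True)
```

This is logically the same as the `(A and not byok) or (B and byok)` condition above, just written so each model type reads its own switch.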
+
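+### Option 3: Warning + documentation
+**Keep the current implementation but document the behavior**
+
+- State clearly in the README that infinite sessions are enabled for all provider types
+- Add a hint to the Valve description: "Applies to both standard Copilot and BYOK models"
+- Mention compaction costs explicitly in the BYOK configuration section
+
+**Pros**: low implementation effort; keeps users informed
+
+**Cons**: does not help users who already have it enabled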
+
+---
+
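+## Recommended Implementation
+
+**Priority**: High  
+**Recommended option**: **Option 1 (conservative)** or **Option 2 (flexible)**
+
+For Option 1: change the condition at line 5063  
+For Option 2: add the `INFINITE_SESSION_BYOK` setting and update the initialization logic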
+
+---
+
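+## Relevant Code Locations
+
+| File | Lines | Description |
+|-----|------|------|
+| `github_copilot_sdk.py` | 364-366 | `INFINITE_SESSION` Valve definition |
+| `github_copilot_sdk.py` | 5063-5069 | Infinite session initialization |
+| `github_copilot_sdk.py` | 6199-6220 | `is_byok_model` detection logic |
+| `github_copilot_sdk.py` | 6329-6331 | `reasoning_effort` BYOK handling (for reference) |
+
+---
+
+## Conclusion
+
+**Compatibility of BYOK mode with Infinite Sessions**:
+- ✅ Fully feasible technically
+- ⚠️ But the design intent is unclear
+- ✗ The current implementation may be unfriendly to BYOK users
+
+**Recommendation**: implement Option 1 or Option 2 to give BYOK mode finer-grained control.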
--- a/plugins/debug/byok-infinite-session-research/client-architecture.md
+++ b/plugins/debug/byok-infinite-session-research/client-architecture.md
@@ -0,0 +1,295 @@
+# Analysis: Passing In and Managing the Client
+
+## Current Client Management Architecture
+
+```
+┌────────────────────────────────────────┐
+│ Pipe Instance (github_copilot_sdk.py)  │
+│                                        │
+│  _shared_clients = {                   │
+│    "token_hash_1": CopilotClient(...), │  ← cached by GitHub token
+│    "token_hash_2": CopilotClient(...), │
+│  }                                     │
+└────────────────────────────────────────┘
+         │
+         │ await _get_client(token)
+         │
+         ▼
+┌────────────────────────────────────────┐
+│  CopilotClient Instance                │
+│                                        │
+│  [only needs GitHub token config]      │
+│                                        │
+│  config {                              │
+│    github_token: "ghp_...",            │
+│    cli_path: "...",                    │
+│    config_dir: "...",                  │
+│    env: {...},                         │
+│    cwd: "..."                          │
+│  }                                     │
+└────────────────────────────────────────┘
+         │
+         │ create_session(session_config)
+         │
+         ▼
+┌────────────────────────────────────────┐
+│  Session (per-session configuration)   │
+│                                        │
+│  session_config {                      │
+│    model: "real_model_id",             │
+│    provider: {                         │ ← ⭐ BYOK config lives here
+│      type: "openai",                   │
+│      base_url: "https://api.openai...",│
+│      api_key: "sk-...",                │
+│      ...                               │
+│    },                                  │
+│    infinite_sessions: {...},           │
+│    system_message: {...},              │
+│    ...                                 │
+│  }                                     │
+└────────────────────────────────────────┘
+```
+
+---
+
+## Current Flow (actual code locations)
+
+### Step 1: Get or create the client (line 6208)
+```python
+# In _pipe_impl
+client = await self._get_client(token)
+```
+
+### Step 2: The `_get_client` function (lines 5523-5561)
+```python
+async def _get_client(self, token: str) -> Any:
+    """Get or create the persistent CopilotClient from the pool based on token."""
+    if not token:
+        raise ValueError("GitHub Token is required to initialize CopilotClient")
+    
+    token_hash = hashlib.md5(token.encode()).hexdigest()
+    
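+    # Check whether a client is already cached
+    client = self.__class__._shared_clients.get(token_hash)
+    if client is not None:  # the real code also checks that the client is in a healthy state
+        return client  # ← reuse the existing client
+    
+    # Otherwise create a new client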
+    client_config = self._build_client_config(user_id=None, chat_id=None)
+    client_config["github_token"] = token
+    new_client = CopilotClient(client_config)
+    await new_client.start()
+    self.__class__._shared_clients[token_hash] = new_client
+    return new_client
+```
+
+### Step 3: Pass the provider when creating the session (lines 6253-6270)
+```python
+# In _pipe_impl, BYOK section
+if is_byok_model:
+    provider_config = {
+        "type": byok_type,          # "openai" or "anthropic"
+        "wire_api": byok_wire_api,
+        "base_url": byok_base_url,
+        "api_key": byok_api_key or None,
+        "bearer_token": byok_bearer_token or None,
+    }
+
+# Then pass it in the session config
+session = await client.create_session(config={
+    "model": real_model_id,
+    "provider": provider_config,  # ← the provider is passed to the session here
+    ...
+})
+```
+
+---
+
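+## Key Point: Two Architectural Layers
+
+| Layer | Purpose | Configuration | Caching |
+|------|------|---------|---------|
+| **CopilotClient** | CLI and low-level runtime logic | GitHub token, CLI path, environment variables | Cached globally by `token_hash` |
+| **Session** | An individual conversation | Model, Provider (BYOK), Tools, System Prompt | Not cached (created fresh each time) |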
+
+---
+
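+## Current Problems
+
+### Problem 1: The client is cached globally, but the provider is per-session
+```python
+# ❓ What if a user wants a different client for each BYOK model?
+# Not possible today, because clients are cached globally by token
+
+# Example:
+# Client A: OpenAI API key (token_hash_1)
+# Client B: Anthropic API key (token_hash_2)
+
+# But the Pipe has only one GH_TOKEN, so there can only be one client
+```
+
+### Problem 2: Provider and client are different things
+```python
+# CopilotClient = the GitHub Copilot SDK client
+# ProviderConfig = API configuration for OpenAI, Anthropic, etc.
+
+# Users may conflate the two:
+# "How do I pass in a BYOK client and provider?"
+# → In practice only the provider can be passed to the session; the client is global
+```
+
+### Problem 3: Mixing BYOK models is not clearly handled
+```python
+# Suppose a user wants, within the same Pipe:
+# - Model A to use the OpenAI API
+# - Model B to use the Anthropic API
+# - Model C to use their own local LLM
+
+# The current code relies on a single global BYOK configuration,
+# so models cannot be configured individually
+```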
+
+---
+
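+## Improvement Options
+
+### Option A: Keep the current architecture and change only the provider mapping
+
+**Idea**: the client stays global (keyed by GH_TOKEN), while the provider configuration is selected dynamically per model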
+
+```python
+# Add to Valves
+class Valves(BaseModel):
+    # ... existing settings ...
+    
+    # New: model-to-provider mapping (JSON)
+    MODEL_PROVIDER_MAP: str = Field(
+        default="{}",
+        description='Map model IDs to BYOK providers (JSON). Example: '
+                    '{"gpt-4": {"type": "openai", "base_url": "...", "api_key": "..."}, '
+                    '"claude-3": {"type": "anthropic", "base_url": "...", "api_key": "..."}}'
+    )
+
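+# Helper method on the Pipe, called from _pipe_impl
+def _get_provider_config(self, model_id: str, byok_active: bool) -> Optional[dict]:
+    """Get provider config for a specific model"""
+    if not byok_active:
+        return None
+    
+    try:
+        model_map = json.loads(self.valves.MODEL_PROVIDER_MAP or "{}")
+        return model_map.get(model_id)
+    except (json.JSONDecodeError, TypeError, AttributeError):
+        # Malformed JSON, or a top-level value that is not an object
+        return None
+
+# At the call site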
+provider_config = self._get_provider_config(real_model_id, byok_active) or {
+    "type": byok_type,
+    "base_url": byok_base_url,
+    "api_key": byok_api_key,
+    ...
+}
+```
+
+**Pros**: minimal change; reuses the existing client architecture  
+**Cons**: multiple BYOK models still share one client (as long as GH_TOKEN is the same)
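
A runnable sketch of the per-model lookup with a fallback to the global BYOK settings. The name `provider_for_model` and the sample map are illustrative assumptions, not code from the Pipe.

```python
import json
from typing import Optional


def provider_for_model(
    model_id: str,
    provider_map_json: str,
    default_provider: Optional[dict],
) -> Optional[dict]:
    """Return the per-model provider entry, else the global default (hypothetical helper)."""
    try:
        mapping = json.loads(provider_map_json or "{}")
    except (json.JSONDecodeError, TypeError):
        mapping = {}
    if not isinstance(mapping, dict):
        mapping = {}
    entry = mapping.get(model_id)
    return entry if isinstance(entry, dict) else default_provider


GLOBAL = {"type": "openai", "base_url": "https://api.openai.com/v1"}
MAP = '{"claude-3": {"type": "anthropic", "base_url": "https://api.anthropic.com/v1"}}'

mapped = provider_for_model("claude-3", MAP, GLOBAL)
fallback = provider_for_model("gpt-4", MAP, GLOBAL)
broken = provider_for_model("gpt-4", "{not json", GLOBAL)
```

Treating malformed JSON the same as an empty map keeps a typo in the valve from breaking every request; whether to also log a warning in that case is a design choice left open here.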
+
+---
+
+### Option B: Create separate clients for different BYOK providers
+
+**Idea**: extend `_get_client` to cache multiple clients keyed by `provider_type`
+
+```python
+async def _get_or_create_client(
+    self, 
+    token: str,
+    provider_type: str = "github"  # "github", "openai", "anthropic"
+) -> Any:
+    """Get or create client based on token and provider type"""
+    
+    if provider_type == "github" or not provider_type:
+        # Existing logic
+        token_hash = hashlib.md5(token.encode()).hexdigest()
+    else:
+        # Create a separate client for each BYOK provider
+        composite_key = f"{token}:{provider_type}"
+        token_hash = hashlib.md5(composite_key.encode()).hexdigest()
+    
+    # Fetch from the cache or create
+    ...
+```
+
+**Pros**: isolates clients across BYOK providers  
+**Cons**: more complex; requires more changes
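
The composite cache key on its own, as a small sketch. `client_cache_key` is a hypothetical name, and SHA-256 is swapped in for the MD5 used above; the point is only that the GitHub case keeps its existing key while each BYOK provider type gets a distinct one.

```python
import hashlib


def client_cache_key(token: str, provider_type: str = "github") -> str:
    """Build the cache key for a (token, provider type) pair (hypothetical helper)."""
    if not provider_type or provider_type == "github":
        raw = token
    else:
        raw = f"{token}:{provider_type}"
    return hashlib.sha256(raw.encode()).hexdigest()


github_key = client_cache_key("ghp_example")
openai_key = client_cache_key("ghp_example", "openai")
anthropic_key = client_cache_key("ghp_example", "anthropic")
```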
+
+---
+
+## Suggested Roadmap
+
+**Priority 1 (high): Option A - model-to-provider mapping**
+
+Add a Valves setting:
+```python
+MODEL_PROVIDER_MAP: str = Field(
+    default="{}",
+    description='Map specific models to their BYOK providers (JSON format)'
+)
+```
+
+Usage (open-source models such as `llama-2` are typically served through an OpenAI-compatible API, hence `"type": "openai"`):
+```
+{
+  "gpt-4": {
+    "type": "openai",
+    "base_url": "https://api.openai.com/v1",
+    "api_key": "sk-..."
+  },
+  "claude-3": {
+    "type": "anthropic",
+    "base_url": "https://api.anthropic.com/v1",
+    "api_key": "ant-..."
+  },
+  "llama-2": {
+    "type": "openai",
+    "base_url": "http://localhost:8000/v1",
+    "api_key": "sk-local"
+  }
+}
+```
+
+**Priority 2 (medium): take `provider_config` into account in `_build_session_config`**
+
+Change the infinite session initialization to branch on `provider_config`:
+```python
+def _build_session_config(..., provider_config=None):
+    # A BYOK provider needs special handling for infinite sessions
+    infinite_session_config = None
+    if self.valves.INFINITE_SESSION and provider_config is None:
+        # Enable compaction for official Copilot models only
+        infinite_session_config = InfiniteSessionConfig(...)
+```
+
+**Priority 3 (low): Option B - multi-client cache (long-term improvement)**
+
+Only needed if clients for different BYOK providers must be fully isolated.
+
+---
+
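+## Summary: If You Want to Pass In a BYOK Client
+
+**Current state**:
+- `CopilotClient` is cached globally by GH_TOKEN
+- The provider configuration is set dynamically at the `SessionConfig` level
+- One client can create many sessions, each with a different provider
+
+**After the improvement**:
+- Add the `MODEL_PROVIDER_MAP` setting
+- For each request, select the provider configuration that matches the model
+- A single client can serve different models through different providers
+
+**What you need to do**:
+1. Configure `MODEL_PROVIDER_MAP` in Valves
+2. Read the mapping during model selection
+3. Create the session with the matching `provider_config`
+
+No changes to the client creation logic are needed!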
--- a/plugins/debug/byok-infinite-session-research/data-flow-analysis.md
+++ b/plugins/debug/byok-infinite-session-research/data-flow-analysis.md
@@ -0,0 +1,324 @@
+# Data Flow Analysis: How the SDK Learns About User-Defined Data
+
+## Current Data Flow (OpenWebUI → Pipe → SDK)
+
+```
+┌─────────────────────┐
+│   OpenWebUI UI      │
+│ (user picks model)  │
+└──────────┬──────────┘
+           │
+           ├─ body.model = "gpt-4"
+           ├─ body.messages = [...]
+           ├─ __metadata__.base_model_id = ?
+           ├─ __metadata__.custom_fields = ?
+           └─ __user__.settings = ?
+           │
+┌──────────▼──────────┐
+│  Pipe (github_      │
+│   copilot_sdk.py)   │
+│                     │
+│ 1. Extract model    │
+│ 2. Apply Valves     │
+│ 3. Create session   │
+└──────────┬──────────┘
+           │
+           ├─ SessionConfig {
+           │    model: real_model_id
+           │    provider: ProviderConfig (if BYOK)
+           │    infinite_sessions: {...}
+           │    system_message: {...}
+           │    ...
+           │  }
+           │
+┌──────────▼──────────┐
+│  Copilot SDK        │
+│ (create_session)    │
+│                     │
+│ Returns: ModelInfo {│
+│   capabilities {    │
+│    limits {         │
+│      max_context_   │
+│      window_tokens  │
+│    }                │
+│   }                 │
+│ }                   │
+└─────────────────────┘
+```
+
+---
+
+## Key Issue: Three Current Bottlenecks
+
+### Bottleneck 1: Entry points for user data
+
+**Currently supported inputs:**
+
+1. **Valves configuration (global + per-user)**
+   ```python
+   # Global settings (admin)
+   Valves.BYOK_BASE_URL = "https://api.openai.com/v1"
+   Valves.BYOK_API_KEY = "sk-..."
+   
+   # Per-user overrides
+   UserValves.BYOK_API_KEY = "sk-..."  # the user's own key
+   UserValves.BYOK_BASE_URL = "..."
+   ```
+   
+   **Problem**: there is no way to set the context window size for a specific BYOK model
+
+2. **`__metadata__` (from OpenWebUI)**
+   ```python
+   __metadata__ = {
+       "base_model_id": "...",
+       "custom_fields": {...},  # ← may carry extra information
+       "tool_ids": [...],
+   }
+   ```
+   
+   **Problem**: it is unclear whether OpenWebUI can pass a model's context window through metadata
+
+3. **`body` (from the chat request)**
+   ```python
+   body = {
+       "model": "gpt-4",
+       "messages": [...],
+       "temperature": 0.7,
+       # ← can custom fields be added here?
+   }
+   ```
+
+---
+
+### Bottleneck 2: Identifying and storing model information
+
+**Current code** (line 5905+):
+```python
+# Resolve the model chosen by the user
+request_model = body.get("model", "")  # e.g., "gpt-4"
+real_model_id = request_model
+
+# Determine the actual model ID
+base_model_id = _container_get(__metadata__, "base_model_id", "")
+
+if base_model_id:
+    resolved_id = base_model_id  # use the ID from metadata
+else:
+    resolved_id = request_model   # use the ID chosen by the user
+```
+
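+**Problems**:
+- ❌ No "model metadata cache" is maintained
+- ❌ Repeated requests for the same model are re-resolved every time
+- ❌ The context window size cannot be persisted per model
+
+---
+
+### Bottleneck 3: Building the SDK session configuration
+
+**Current implementation** (lines 5058-5100):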
+```python
+def _build_session_config(
+    self,
+    real_model_id,      # ← model ID
+    system_prompt_content,
+    is_streaming=True,
+    is_admin=False,
+    # ... other parameters
+):
+    # Creates the infinite session unconditionally
+    if self.valves.INFINITE_SESSION:
+        infinite_session_config = InfiniteSessionConfig(
+            enabled=True,
+            background_compaction_threshold=self.valves.COMPACTION_THRESHOLD,  # 0.80
+            buffer_exhaustion_threshold=self.valves.BUFFER_THRESHOLD,          # 0.95
+        )
+    
+    # ❌ The model's actual context window size is never looked up here
+    # ❌ The compaction thresholds cannot be adjusted to the model's real limits
+```
+
+---
+
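+## Solution: Three Data-Flow Improvements
+
+### Step 1: Add model metadata configuration (priority: high)
+
+Add a **model metadata mapping** to Valves:
+
+```python
+class Valves(BaseModel):
+    # ... existing settings ...
+    
+    # New: model context window mapping (JSON)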
+    MODEL_CONTEXT_WINDOWS: str = Field(
+        default="{}",  # JSON string
+        description='Model context window mapping (JSON). Example: {"gpt-4": 8192, "gpt-4-turbo": 128000, "claude-3": 200000}'
+    )
+    
+    # New: BYOK model-specific settings (JSON)
+    BYOK_MODEL_CONFIG: str = Field(
+        default="{}",  # JSON string
+        description='BYOK-specific model configuration (JSON). Example: {"gpt-4": {"context_window": 8192, "enable_infinite_session": true}}'
+    )
+```
+
+**Usage**:
+```python
+# Set in Valves
+MODEL_CONTEXT_WINDOWS = '{"gpt-4": 8192, "claude-3-5-sonnet": 200000}'
+
+# Parsed in the Pipe
+def _get_model_context_window(self, model_id: str) -> Optional[int]:
+    """Look up the model's context window size in the configuration"""
+    try:
+        config = json.loads(self.valves.MODEL_CONTEXT_WINDOWS or "{}")
+        return config.get(model_id)
+    except (json.JSONDecodeError, TypeError, AttributeError):
+        return None
+```
+
+### Step 2: Build a model information cache (priority: medium)
+
+Maintain a model information cache in the Pipe:
+
+```python
+class Pipe:
+    def __init__(self):
+        # ... existing code ...
+        self._model_info_cache = {}  # model_id -> ModelInfo
+        self._context_window_cache = {}  # model_id -> context_window_tokens
+
+    def _cache_model_info(self, model_id: str, model_info: ModelInfo):
+        """Cache the model information returned by the SDK"""
+        self._model_info_cache[model_id] = model_info
+        if model_info.capabilities and model_info.capabilities.limits:
+            self._context_window_cache[model_id] = (
+                model_info.capabilities.limits.max_context_window_tokens
+            )
+
+    def _get_context_window(self, model_id: str) -> Optional[int]:
+        """Get the model's context window size (priority: SDK > Valves configuration > default)"""
+        # 1. Prefer the SDK cache (most reliable); a cached None falls through to the config
+        cached = self._context_window_cache.get(model_id)
+        if cached:
+            return cached
+        
+        # 2. Then fall back to the Valves configuration
+        context_window = self._get_model_context_window(model_id)
+        if context_window:
+            return context_window
+        
+        # 3. Default (unknown)
+        return None
+```
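
The lookup order can be exercised in isolation with a small stand-in class. `ContextWindowResolver` and its method names are invented for this sketch; it assumes the SDK-reported value should win over the configured one, as the docstring above states.

```python
import json
from typing import Dict, Optional


class ContextWindowResolver:
    """Resolve a model's context window: SDK-reported value first, then config."""

    def __init__(self, configured_json: str = "{}") -> None:
        self._from_sdk: Dict[str, int] = {}
        try:
            parsed = json.loads(configured_json or "{}")
        except (json.JSONDecodeError, TypeError):
            parsed = {}
        self._configured = parsed if isinstance(parsed, dict) else {}

    def record_sdk_value(self, model_id: str, tokens: Optional[int]) -> None:
        # Ignore missing or non-positive values so they never shadow the config.
        if tokens is not None and tokens > 0:
            self._from_sdk[model_id] = tokens

    def get(self, model_id: str) -> Optional[int]:
        if model_id in self._from_sdk:
            return self._from_sdk[model_id]
        value = self._configured.get(model_id)
        return value if isinstance(value, int) and value > 0 else None


resolver = ContextWindowResolver('{"gpt-4": 8192, "llama-2-70b": 4096}')
before_sdk = resolver.get("gpt-4")
resolver.record_sdk_value("gpt-4", 128000)
after_sdk = resolver.get("gpt-4")
unknown = resolver.get("mystery-model")
```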
+
+### Step 3: Use the real context window to tune the compaction strategy (priority: medium)
+
+Update `_build_session_config`:
+
+```python
+async def _build_session_config(  # async because _emit_debug_log is awaited below
+    self,
+    real_model_id,
+    # ... other parameters (__event_call__ is assumed to be among them) ...
+    **kwargs
+):
+    # Look up the model's real context window size
+    actual_context_window = self._get_context_window(real_model_id)
+    
+    # Enable compaction only for models with a known context window
+    infinite_session_config = None
+    if self.valves.INFINITE_SESSION and actual_context_window:
+        # The compaction thresholds now have a concrete meaning
+        infinite_session_config = InfiniteSessionConfig(
+            enabled=True,
+            # 80% of actual context window
+            background_compaction_threshold=self.valves.COMPACTION_THRESHOLD,
+            # 95% of actual context window
+            buffer_exhaustion_threshold=self.valves.BUFFER_THRESHOLD,
+        )
+        
+        await self._emit_debug_log(
+            f"Infinite Session: model_context={actual_context_window}tokens, "
+            f"compaction_triggers_at={int(actual_context_window * self.valves.COMPACTION_THRESHOLD)}, "
+            f"buffer_triggers_at={int(actual_context_window * self.valves.BUFFER_THRESHOLD)}",
+            __event_call__,
+        )
+    elif self.valves.INFINITE_SESSION and not actual_context_window:
+        logger.warning(
+            f"Infinite Session: Unknown context window for {real_model_id}, "
+            f"compression disabled. Set MODEL_CONTEXT_WINDOWS in Valves to enable."
+        )
+
+---
+
+## Concrete Configuration Examples
+
+### Example 1: A user configures the context window for BYOK models
+
+**Valves setting**:
+```
+MODEL_CONTEXT_WINDOWS = {
+  "gpt-4": 8192,
+  "gpt-4-turbo": 128000,
+  "gpt-4o": 128000,
+  "claude-3": 200000,
+  "claude-3.5-sonnet": 200000,
+  "llama-2-70b": 4096
+}
+```
+
+**Effect**:
+- The Pipe knows that "gpt-4" has a context window of 8192 tokens
+- Background compaction triggers at ~6553 tokens (80%)
+- Requests block at ~7782 tokens (95%) until compaction finishes
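
The arithmetic behind these two figures, as a short sketch. It assumes the thresholds are fractions of the model's context window, which these notes propose but have not yet confirmed against the SDK; `trigger_points` is a hypothetical helper.

```python
def trigger_points(context_window: int, compaction: float = 0.80, buffer: float = 0.95):
    """Convert fractional thresholds into absolute token counts (rounded down)."""
    return int(context_window * compaction), int(context_window * buffer)


# 8192 * 0.80 = 6553.6 and 8192 * 0.95 = 7782.4, both truncated by int().
compaction_at, buffer_at = trigger_points(8192)
```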
+
+### Example 2: Enable or disable compaction for specific BYOK models
+
+**Valves setting** (compaction is disabled for `llama-2-70b`):
+```
+BYOK_MODEL_CONFIG = {
+  "gpt-4": {
+    "context_window": 8192,
+    "enable_infinite_session": true,
+    "compaction_threshold": 0.75
+  },
+  "llama-2-70b": {
+    "context_window": 4096,
+    "enable_infinite_session": false
+  }
+}
+```
+
+**Pipe logic**:
+```python
+# Check the model-specific compaction setting
+def _get_compression_enabled(self, model_id: str) -> bool:
+    try:
+        config = json.loads(self.valves.BYOK_MODEL_CONFIG or "{}")
+        model_config = config.get(model_id, {})
+        return model_config.get("enable_infinite_session", self.valves.INFINITE_SESSION)
+    except (json.JSONDecodeError, TypeError, AttributeError):
+        return self.valves.INFINITE_SESSION
+```
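
A standalone sketch that resolves both the on/off switch and the per-model `compaction_threshold` from a `BYOK_MODEL_CONFIG`-style JSON string, falling back to the global values. The function name and signature are assumptions made for illustration.

```python
import json
from typing import Tuple


def compaction_settings(
    model_id: str,
    byok_model_config_json: str,
    global_enabled: bool,
    global_threshold: float,
) -> Tuple[bool, float]:
    """Return (enabled, compaction_threshold) for a model, falling back to globals."""
    try:
        config = json.loads(byok_model_config_json or "{}")
    except (json.JSONDecodeError, TypeError):
        config = {}
    entry = config.get(model_id, {}) if isinstance(config, dict) else {}
    if not isinstance(entry, dict):
        entry = {}
    enabled = bool(entry.get("enable_infinite_session", global_enabled))
    threshold = float(entry.get("compaction_threshold", global_threshold))
    return enabled, threshold


CONFIG = json.dumps({
    "gpt-4": {"context_window": 8192, "enable_infinite_session": True, "compaction_threshold": 0.75},
    "llama-2-70b": {"context_window": 4096, "enable_infinite_session": False},
})

gpt4 = compaction_settings("gpt-4", CONFIG, global_enabled=True, global_threshold=0.80)
llama = compaction_settings("llama-2-70b", CONFIG, global_enabled=True, global_threshold=0.80)
other = compaction_settings("unlisted", CONFIG, global_enabled=True, global_threshold=0.80)
```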
+
+---
+
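+## Summary: How the SDK Learns About User-Defined Data
+
+| Source | Mechanism | Updated | Example |
+|------|------|------|------|
+| **Valves** | Global configuration | Set in advance by an admin | `MODEL_CONTEXT_WINDOWS` JSON |
+| **SDK** | Returned when the session is created | On every session creation | `model_info.capabilities.limits` |
+| **Cache** | Stored locally in the Pipe | Cached after the first lookup | `_context_window_cache` |
+| **`__metadata__`** | Passed by OpenWebUI | Sent with every request | `base_model_id`, custom fields |
+
+**Flow**:
+1. The user configures `MODEL_CONTEXT_WINDOWS` in Valves
+2. The Pipe obtains the `model_info` returned by the SDK when the session is created
+3. The Pipe caches the context window size
+4. The Pipe adjusts the infinite session thresholds to the real window size
+5. The SDK applies the correct compaction strategy
+
+This way the **SDK has full knowledge of the user-defined data** without any change to the SDK itself.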
--- a/plugins/debug/byok-infinite-session-research/sdk-context-limits.md
+++ b/plugins/debug/byok-infinite-session-research/sdk-context-limits.md
@@ -0,0 +1,163 @@
+# Context Limit Information in the SDK
+
+## SDK Type Definitions
+
+### 1. ModelLimits (copilot-sdk/python/copilot/types.py, lines 761-789)
+
+```python
+@dataclass
+class ModelLimits:
+    """Model limits"""
+    
+    max_prompt_tokens: int | None = None           # maximum prompt tokens
+    max_context_window_tokens: int | None = None   # maximum context window tokens
+    vision: ModelVisionLimits | None = None        # vision-related limits
+```
+
+### 2. ModelCapabilities (lines 817-843)
+
+```python
+@dataclass
+class ModelCapabilities:
+    """Model capabilities and limits"""
+    
+    supports: ModelSupports      # supported features (vision, reasoning_effort, etc.)
+    limits: ModelLimits          # context and token limits
+```
+
+### 3. ModelInfo (lines 889-949)
+
+```python
+@dataclass
+class ModelInfo:
+    """Information about an available model"""
+    
+    id: str
+    name: str
+    capabilities: ModelCapabilities  # ← contains the limits information
+    policy: ModelPolicy | None = None
+    billing: ModelBilling | None = None
+    supported_reasoning_efforts: list[str] | None = None
+    default_reasoning_effort: str | None = None
+```
+
+---
+
+## Key Findings
+
+### ✅ What the SDK provides
+- `model.capabilities.limits.max_context_window_tokens` - the model's context window size
+- `model.capabilities.limits.max_prompt_tokens` - maximum prompt tokens
+
+### ❌ The problem in the OpenWebUI Pipe
+**The Pipe currently makes no use of this information!**
+
+Searching `github_copilot_sdk.py` for `max_context_window`, `capabilities`, `limits`, and similar terms returns no matches.
+
+---
+
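+## What Does This Mean for BYOK?
+
+### Problem 1: The context limit of BYOK models is unknown
+```python
+# Where do a BYOK model's capabilities come from?
+if is_byok_model:
+    # ❓ Is no capability information returned for BYOK models?
+    # ❓ How do we learn its max_context_window_tokens?
+    pass
+```
+
+### Problem 2: The Infinite Session thresholds are hard-coded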
+```python
+COMPACTION_THRESHOLD: float = Field(
+    default=0.80,  # trigger background compaction at 80%
+    description="Background compaction threshold (0.0-1.0)"
+)
+BUFFER_THRESHOLD: float = Field(
+    default=0.95,  # block at 95% until compaction completes
+    description="Buffer exhaustion threshold (0.0-1.0)"
+)
+
+# But 0.80 and 0.95 are percentages of what?
+# - Of the model's max_context_window_tokens?
+# - Or of some fixed value?
+# - A BYOK model's context window may be completely different!
+```
+
+---
+
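+## Directions for Improvement
+
+### Option A: Use the model limits provided by the SDK
+```python
+# When fetching model info, keep the capabilities
+self._model_capabilities = model_info.capabilities
+
+# When initializing the infinite session, use the actual context window
+if model_info.capabilities.limits.max_context_window_tokens:
+    actual_context_window = model_info.capabilities.limits.max_context_window_tokens
+    
+    # Tune the compaction thresholds dynamically instead of using fixed values
+    compaction_threshold = self.valves.COMPACTION_THRESHOLD
+    buffer_threshold = self.valves.BUFFER_THRESHOLD
+    # These now have a concrete meaning: percentages of the model's actual context window
+```
+
+### Option B: Explicit configuration for BYOK models
+If BYOK models do not expose capabilities, the user has to set the value manually: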
+
+```python
+class Valves(BaseModel):
+    # ... existing config ...
+    
+    BYOK_CONTEXT_WINDOW: int = Field(
+        default=0,  # 0 means auto-detect or compaction disabled
+        description="Manual context window size for BYOK models (tokens). 0=auto-detect or disabled"
+    )
+    
+    BYOK_INFINITE_SESSION: bool = Field(
+        default=False,
+        description="Enable infinite sessions for BYOK models (requires BYOK_CONTEXT_WINDOW > 0)"
+    )
+```
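
One possible reading of how these two valves could combine, sketched as a pure function. The precedence shown (manual value first, then a detected value, otherwise no compaction) is an assumption; the notes leave open what "auto-detect" would actually use, and `byok_compaction_window` is a hypothetical name.

```python
from typing import Optional


def byok_compaction_window(
    byok_infinite_session: bool,
    byok_context_window: int,
    detected_window: Optional[int] = None,
) -> Optional[int]:
    """Return the context window to compact against, or None to skip compaction."""
    if not byok_infinite_session:
        return None
    if byok_context_window > 0:
        return byok_context_window  # an explicit manual value wins
    # 0 means "rely on detection"; with nothing detected, compaction stays off.
    return detected_window if detected_window else None


disabled = byok_compaction_window(False, 8192)
manual = byok_compaction_window(True, 8192)
auto = byok_compaction_window(True, 0, detected_window=32768)
nothing_known = byok_compaction_window(True, 0)
```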
+
+### Option C: Learn from session feedback (most reliable)
+```python
+# When an infinite session compaction completes, read the actual context window usage
+# (requires feedback from the SDK or CLI)
+```
+
+---
+
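+## Suggested Implementation Roadmap
+
+**Priority 1 (must-have)**: check whether capabilities are available in BYOK mode
+```python
+# Test code
+if is_byok_model:
+    # Send a test request and see whether model capabilities come back in the response
+    session = await client.create_session(config=session_config)
+    # Does the session include model info?
+    # Is session.model_capabilities accessible?
+```
+
+**Priority 2 (important)**: if BYOK exposes no capabilities, add manual configuration
+```python
+# Add a context_window field to the BYOK configuration
+BYOK_CONTEXT_WINDOW: int = Field(default=0)
+```
+
+**Priority 3 (long-term)**: use the real context window to tune the compaction strategy
+```python
+# Use actual token counts rather than bare percentages
+```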
+
+---
+
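+## Open Questions
+
+1. [ ] Can capabilities be obtained for a BYOK model after `create_session`?
+2. [ ] If so, is the `max_context_window_tokens` value accurate?
+3. [ ] If not, must the user provide it manually?
+4. [ ] Do the current 0.80/0.95 thresholds suit every model?
+5. [ ] How much do context windows differ between BYOK providers (OpenAI vs Anthropic)?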