GitHub Copilot SDK Integration Workflow

Author: Fu-Jie
Version: 0.2.3
Last Updated: 2026-01-27


Table of Contents

  1. Architecture Overview
  2. Request Processing Flow
  3. Session Management
  4. Streaming Response Handling
  5. Event Processing Mechanism
  6. Tool Execution Flow
  7. System Prompt Extraction
  8. Configuration Parameters
  9. Key Functions Reference
  10. Troubleshooting Guide
  11. Performance Optimization
  12. Security Considerations
  13. Future Enhancements

Architecture Overview

Component Diagram

┌─────────────────────────────────────────────────────────────┐
│                       OpenWebUI                              │
│  ┌───────────────────────────────────────────────────────┐  │
│  │              Pipe Interface (Entry Point)             │  │
│  └─────────────────────┬─────────────────────────────────┘  │
│                        │                                     │
│                        ▼                                     │
│  ┌───────────────────────────────────────────────────────┐  │
│  │           _pipe_impl (Main Logic)                     │  │
│  │  ┌──────────────────────────────────────────────────┐ │  │
│  │  │ 1. Environment Setup (_setup_env)               │ │  │
│  │  │ 2. Model Selection (request_model parsing)      │ │  │
│  │  │ 3. Chat Context Extraction                       │ │  │
│  │  │ 4. System Prompt Extraction                      │ │  │
│  │  │ 5. Session Management (create/resume)            │ │  │
│  │  │ 6. Streaming/Non-streaming Response              │ │  │
│  │  └──────────────────────────────────────────────────┘ │  │
│  └─────────────────────┬─────────────────────────────────┘  │
│                        │                                     │
│                        ▼                                     │
│  ┌───────────────────────────────────────────────────────┐  │
│  │           GitHub Copilot Client                       │  │
│  │  ┌──────────────────────────────────────────────────┐ │  │
│  │  │ • CopilotClient (SDK instance)                   │ │  │
│  │  │ • Session (conversation context)                 │ │  │
│  │  │ • Event Stream (async events)                    │ │  │
│  │  └──────────────────────────────────────────────────┘ │  │
│  └─────────────────────┬─────────────────────────────────┘  │
│                        │                                     │
└────────────────────────┼─────────────────────────────────────┘
                         ▼
              ┌──────────────────────┐
              │  Copilot CLI Process │
              │  (Backend Agent)     │
              └──────────────────────┘

Key Components

  1. Pipe Interface: OpenWebUI's standard entry point
  2. Environment Manager: CLI setup, token validation, environment variables
  3. Session Manager: Persistent conversation state with automatic compaction
  4. Event Processor: Asynchronous streaming event handler
  5. Tool System: Custom tool registration and execution
  6. Debug Logger: Frontend console logging for troubleshooting

Request Processing Flow

Complete Request Lifecycle

graph TD
    A[OpenWebUI Request] --> B[pipe Entry Point]
    B --> C[_pipe_impl]
    C --> D{Setup Environment}
    D --> E[Parse Model ID]
    E --> F[Extract Chat Context]
    F --> G[Extract System Prompt]
    G --> H{Session Exists?}
    H -->|Yes| I[Resume Session]
    H -->|No| J[Create New Session]
    I --> K[Initialize Tools]
    J --> K
    K --> L[Process Images]
    L --> M{Streaming Mode?}
    M -->|Yes| N[stream_response]
    M -->|No| O[send_and_wait]
    N --> P[Async Event Stream]
    O --> Q[Direct Response]
    P --> R[Return to OpenWebUI]
    Q --> R

Step-by-Step Breakdown

1. Environment Setup (_setup_env)

def _setup_env(self, __event_call__=None):
    """
    Priority:
    1. Check VALVES.CLI_PATH
    2. Search system PATH
    3. Auto-install via curl (if not found)
    4. Set GH_TOKEN environment variables
    """

Actions:

  • Locate Copilot CLI binary
  • Set COPILOT_CLI_PATH environment variable
  • Configure GH_TOKEN for authentication
  • Apply custom environment variables
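
The lookup order and the resulting environment could be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: `resolve_cli_path` and `build_env` are hypothetical helper names, the curl-based auto-install step is left out, and the binary name `copilot` is only inferred from the default CLI_PATH.

```python
import os
import shutil
from typing import Dict, Optional


def resolve_cli_path(configured_path: str, binary_name: str = "copilot") -> Optional[str]:
    """Return the configured path if it is a file, else whatever PATH offers."""
    if configured_path and os.path.isfile(configured_path):
        return configured_path
    # shutil.which returns None when the binary is not on PATH.
    return shutil.which(binary_name)


def build_env(cli_path: Optional[str], token: str, custom_vars: Dict[str, str]) -> Dict[str, str]:
    """Collect the variables that would then be written into os.environ."""
    env: Dict[str, str] = {}
    if cli_path:
        env["COPILOT_CLI_PATH"] = cli_path
    if token:
        # Both variable names are populated with the same token.
        env["GH_TOKEN"] = token
        env["GITHUB_TOKEN"] = token
    # Custom variables are applied last, so they can override the defaults.
    env.update({str(k): str(v) for k, v in custom_vars.items()})
    return env
```

Returning a plain dict keeps the sketch free of side effects; the caller would apply it with `os.environ.update(...)`.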

2. Model Selection

# Input: body["model"] = "copilotsdk-claude-sonnet-4.5"
request_model = body.get("model", "")
if request_model.startswith(f"{self.id}-"):
    real_model_id = request_model[len(f"{self.id}-"):]  # "claude-sonnet-4.5"
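
The snippet above leaves `real_model_id` unset when the prefix is missing. A self-contained sketch of the same parsing with an explicit fallback might look like this; `parse_model_id` is a hypothetical name, and falling back to a default model is an assumption rather than confirmed plugin behavior.

```python
def parse_model_id(request_model: str, pipe_id: str, default_model: str) -> str:
    """Strip the '<pipe_id>-' prefix that OpenWebUI adds to pipe models."""
    prefix = f"{pipe_id}-"
    if request_model.startswith(prefix):
        return request_model[len(prefix):]
    # Assumed fallback: keep an unprefixed ID as-is, use the default when empty.
    return request_model or default_model
```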

3. Chat Context Extraction (_get_chat_context)

# Priority order for chat_id:
# 1. __metadata__ (most reliable)
# 2. body["chat_id"]
# 3. body["metadata"]["chat_id"]
chat_ctx = self._get_chat_context(body, __metadata__, __event_call__)
chat_id = chat_ctx.get("chat_id")
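
The priority order in the comments above can be illustrated with a small standalone function. This is a sketch under assumptions: `find_chat_id` is a hypothetical name, and the source labels it returns are illustrative, not the strings the plugin logs.

```python
from typing import Any, Dict, Optional, Tuple


def find_chat_id(body: Dict[str, Any], metadata: Optional[Dict[str, Any]]) -> Tuple[Optional[str], str]:
    """Return (chat_id, source) using the first non-empty candidate."""
    candidates = [
        ("__metadata__", (metadata or {}).get("chat_id")),
        ("body", body.get("chat_id")),
        ("body.metadata", (body.get("metadata") or {}).get("chat_id")),
    ]
    for source, value in candidates:
        # Ignore missing, non-string, and whitespace-only values.
        if isinstance(value, str) and value.strip():
            return value.strip(), source
    return None, "none"
```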

4. System Prompt Extraction (_extract_system_prompt)

Multi-source fallback strategy:

  1. metadata.model.params.system
  2. Model database lookup (by model_id)
  3. body.params.system
  4. Messages with role="system"

5. Session Creation/Resumption

New Session:

session_config = SessionConfig(
    session_id=chat_id,
    model=real_model_id,
    streaming=is_streaming,
    tools=custom_tools,
    system_message={"mode": "append", "content": system_prompt_content},
    infinite_sessions=InfiniteSessionConfig(
        enabled=True,
        background_compaction_threshold=0.8,
        buffer_exhaustion_threshold=0.95
    )
)
session = await client.create_session(config=session_config)

Resume Session:

try:
    session = await client.resume_session(chat_id)
    # Session state preserved: history, tools, workspace
except Exception:
    # Fall back to creating a new session from session_config
    session = await client.create_session(config=session_config)

Session Management

Infinite Sessions Architecture

┌─────────────────────────────────────────────────────────┐
│              Session Lifecycle                          │
│                                                         │
│  ┌──────────┐  create   ┌──────────┐  resume  ┌───────┴───┐
│  │ Chat ID  │─────────▶ │ Session  │ ◀────────│  OpenWebUI │
│  └──────────┘           │  State   │          └───────────┘
│                         └─────┬────┘                       │
│                               │                            │
│                               ▼                            │
│  ┌─────────────────────────────────────────────────────┐  │
│  │          Context Window Management                  │  │
│  │  ┌──────────────────────────────────────────────┐  │  │
│  │  │ Messages [user, assistant, tool_results...]  │  │  │
│  │  │ Token Usage: ████████████░░░░ (80%)          │  │  │
│  │  └──────────────────────────────────────────────┘  │  │
│  │                      │                              │  │
│  │                      ▼                              │  │
│  │  ┌──────────────────────────────────────────────┐  │  │
│  │  │  Threshold Reached (0.8)                     │  │  │
│  │  │  → Background Compaction Triggered           │  │  │
│  │  └──────────────────────────────────────────────┘  │  │
│  │                      │                              │  │
│  │                      ▼                              │  │
│  │  ┌──────────────────────────────────────────────┐  │  │
│  │  │  Compacted Summary + Recent Messages         │  │  │
│  │  │  Token Usage: ██████░░░░░░░░░░░ (40%)        │  │  │
│  │  └──────────────────────────────────────────────┘  │  │
│  └─────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘

Configuration Parameters

InfiniteSessionConfig(
    enabled=True,                              # Enable infinite sessions
    background_compaction_threshold=0.8,       # Start compaction at 80% token usage
    buffer_exhaustion_threshold=0.95           # Emergency threshold at 95%
)

Behavior:

  • < 80%: Normal operation, no compaction
  • 80-95%: Background compaction (summarize older messages)
  • > 95%: Force compaction before next request
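
To make the three bands concrete, the following sketch classifies a token-usage ratio against the two thresholds. It only models the behavior listed above; the actual decision is made inside the SDK, and `compaction_action` with its return labels is a hypothetical helper.

```python
def compaction_action(usage: float, background: float = 0.8, exhaustion: float = 0.95) -> str:
    """Map a context usage ratio (0.0 to 1.0) to the expected compaction behavior."""
    if not 0.0 <= background <= exhaustion <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= background <= exhaustion <= 1")
    if usage > exhaustion:
        return "force"        # compaction must finish before the next request
    if usage >= background:
        return "background"   # older messages are summarized asynchronously
    return "none"             # normal operation
```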

Streaming Response Handling

Event-Driven Architecture

async def stream_response(
    self, client, session, send_payload, init_message: str = "", __event_call__=None
) -> AsyncGenerator:
    """
    Asynchronous event processing with queue-based buffering.
    
    Flow:
    1. Start async send task
    2. Register event handler
    3. Process events via queue
    4. Yield chunks to OpenWebUI
    5. Clean up resources
    """

Event Processing Pipeline

┌────────────────────────────────────────────────────────────┐
│              Copilot SDK Event Stream                      │
└────────────────────┬───────────────────────────────────────┘
                     │
                     ▼
        ┌────────────────────────┐
        │  Event Handler         │
        │  (Sync Callback)       │
        └────────┬───────────────┘
                 │
                 ▼
        ┌────────────────────────┐
        │  Async Queue           │
        │  (Thread-safe)         │
        └────────┬───────────────┘
                 │
                 ▼
        ┌────────────────────────┐
        │  Consumer Loop         │
        │  (async for)           │
        └────────┬───────────────┘
                 │
                 ▼
        ┌────────────────────────┐
        │  yield to OpenWebUI    │
        └────────────────────────┘
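
The pipeline above can be sketched with `asyncio.Queue` and an async generator. This is a simplified illustration, not the plugin's `stream_response`: `stream_chunks` and `fake_events` are hypothetical names, a sentinel object marks the end of the stream, and timeouts and error events are omitted. Note that `asyncio.Queue` is not thread-safe, so the sketch assumes the handler runs on the event loop thread; a callback arriving from another thread would need `loop.call_soon_threadsafe`.

```python
import asyncio

_DONE = object()  # sentinel that marks the end of the stream


async def stream_chunks(emit_events):
    """Bridge a sync event callback to an async generator via a queue."""
    queue: asyncio.Queue = asyncio.Queue()

    def handler(chunk):
        queue.put_nowait(chunk)  # non-blocking; called from the loop thread

    producer = asyncio.create_task(emit_events(handler))
    # When the producer finishes (or is cancelled), unblock the consumer.
    producer.add_done_callback(lambda _task: queue.put_nowait(_DONE))
    try:
        while True:
            item = await queue.get()
            if item is _DONE:
                break
            yield item
    finally:
        producer.cancel()  # no-op if the producer already finished


async def fake_events(handler):
    """Stand-in for the SDK event stream."""
    for part in ("Hel", "lo"):
        await asyncio.sleep(0)
        handler(part)


async def collect():
    return [chunk async for chunk in stream_chunks(fake_events)]
```

Running `asyncio.run(collect())` drains the fake stream in order. The `finally` block also covers the case where the consumer stops early, which is how generator exit can double as resource cleanup.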

State Management During Streaming

state = {
    "thinking_started": False,   # <think> tags opened
    "content_sent": False        # Main content has started
}
active_tools = {}  # Track concurrent tool executions

State Transitions:

  1. assistant.reasoning_delta arrives → thinking_started = True → Output: <think>\n{reasoning}
  2. assistant.message_delta arrives → Close </think> if open → content_sent = True → Output: {content}
  3. tool.execution_start → Output tool indicator (inside <think> before content starts, as a blockquote afterwards)
  4. session.complete → Finalize stream
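
The first two transitions can be expressed as a small pure function over the state dict. This is a sketch only: `render_delta` is a hypothetical name, the exact whitespace around the closing tag is an assumption, and dropping reasoning that arrives after content has started is a simplification chosen for the example.

```python
def render_delta(state: dict, event_type: str, text: str) -> str:
    """Return the text to yield for one delta event, updating state in place."""
    out = ""
    if event_type == "assistant.reasoning_delta":
        if state["content_sent"]:
            return ""  # assumption: late reasoning is not shown
        if not state["thinking_started"]:
            state["thinking_started"] = True
            out += "<think>\n"
        out += text
    elif event_type == "assistant.message_delta":
        if state["thinking_started"] and not state["content_sent"]:
            out += "\n</think>\n\n"  # close the reasoning block exactly once
        state["content_sent"] = True
        out += text
    return out
```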

Event Processing Mechanism

Event Type Reference

Following official SDK patterns (from copilot.SessionEventType):

| Event Type | Description | Key Data Fields | Handler Action |
|---|---|---|---|
| assistant.message_delta | Main content streaming | delta_content | Yield text chunk |
| assistant.reasoning_delta | Chain-of-thought | delta_content | Wrap in <think> tags |
| tool.execution_start | Tool call initiated | name, tool_call_id | Display tool indicator |
| tool.execution_complete | Tool finished | result.content | Show completion status |
| session.compaction_start | Context compaction begins | - | Log debug info |
| session.compaction_complete | Compaction done | - | Log debug info |
| session.error | Error occurred | error, message | Emit error notification |

Event Handler Implementation

def handler(event):
    """Process streaming events following official SDK patterns."""
    event_type = get_event_type(event)  # Handle enum/string types
    
    # Extract data using safe_get_data_attr (handles dict/object)
    if event_type == "assistant.message_delta":
        delta = safe_get_data_attr(event, "delta_content")
        if delta:
            queue.put_nowait(delta)  # Non-blocking enqueue from the sync callback

Official SDK Pattern Compliance

def safe_get_data_attr(event, attr: str, default=None):
    """
    Official pattern: event.data.delta_content
    Handles both dict and object access patterns.
    """
    if not hasattr(event, "data") or event.data is None:
        return default
    
    data = event.data
    
    # Dict access (JSON-like)
    if isinstance(data, dict):
        return data.get(attr, default)
    
    # Object attribute (Python SDK)
    return getattr(data, attr, default)

Tool Execution Flow

Tool Registration

import random

# 1. Define tool at module level
# RandomNumberParams (not shown) is the parameter model with min and max fields
@define_tool(description="Generate a random integer within a specified range.")
async def generate_random_number(params: RandomNumberParams) -> str:
    number = random.randint(params.min, params.max)  # Inclusive on both ends
    return f"Generated random number: {number}"

# 2. Register in _initialize_custom_tools
def _initialize_custom_tools(self):
    if not self.valves.ENABLE_TOOLS:
        return []
    
    all_tools = {
        "generate_random_number": generate_random_number,
    }
    
    # Filter based on AVAILABLE_TOOLS valve
    if self.valves.AVAILABLE_TOOLS == "all":
        return list(all_tools.values())
    
    enabled = [t.strip() for t in self.valves.AVAILABLE_TOOLS.split(",")]
    return [all_tools[name] for name in enabled if name in all_tools]

Tool Execution Timeline

User Message: "Generate a random number between 1 and 100"
     │
     ▼
Model Decision: Use tool `generate_random_number`
     │
     ▼
Event: tool.execution_start
     │  → Display: "🔧 Running Tool: generate_random_number"
     ▼
Tool Function Execution (async)
     │
     ▼
Event: tool.execution_complete
     │  → Result: "Generated random number: 42"
     │  → Display: "✅ Tool Completed: 42"
     ▼
Model generates response using tool result
     │
     ▼
Event: assistant.message_delta
     │  → "I generated the number 42 for you."
     ▼
Stream Complete

Visual Indicators

Before Content:

<think>
Running Tool: generate_random_number...
Tool `generate_random_number` Completed. Result: 42
</think>

I generated the number 42 for you.

After Content Started:

The number is

> 🔧 **Running Tool**: `generate_random_number`

> ✅ **Tool Completed**: 42

actually 42.
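
The two display styles could be selected with a helper like the one below. It is a sketch based on the sample output above: `format_tool_event` is a hypothetical name, the surrounding newlines are assumptions, and opening or closing the <think> block is left to the caller.

```python
def format_tool_event(kind: str, name: str, content_sent: bool, result: str = "") -> str:
    """Format a tool indicator; kind is 'start' or 'complete'."""
    if content_sent:
        # Main content already started: use a Markdown blockquote.
        if kind == "start":
            return f"\n\n> 🔧 **Running Tool**: `{name}`\n\n"
        return f"> ✅ **Tool Completed**: {result}\n\n"
    # Still inside the <think> block: plain text lines.
    if kind == "start":
        return f"Running Tool: {name}...\n"
    return f"Tool `{name}` Completed. Result: {result}\n"
```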

System Prompt Extraction

Multi-Source Priority System

async def _extract_system_prompt(self, body, messages, request_model, real_model_id, __event_call__=None):
    """
    Priority order:
    1. metadata.model.params.system (highest)
    2. Model database lookup
    3. body.params.system
    4. messages[role="system"] (fallback)
    """

Source 1: Metadata Model Params

# OpenWebUI injects model configuration
metadata = body.get("metadata", {})
meta_model = metadata.get("model", {})
meta_params = meta_model.get("params", {})
system_prompt = meta_params.get("system")  # Priority 1

Source 2: Model Database

from open_webui.models.models import Models

# Try multiple model ID variations
model_ids_to_try = [
    request_model,                    # "copilotsdk-claude-sonnet-4.5"
    request_model.removeprefix(...),  # "claude-sonnet-4.5"
    real_model_id,                    # From valves
]

for mid in model_ids_to_try:
    model_record = Models.get_model_by_id(mid)
    if model_record and hasattr(model_record, "params"):
        system_prompt = model_record.params.get("system")
        if system_prompt:
            break

Source 3: Body Params

body_params = body.get("params", {})
system_prompt = body_params.get("system")

Source 4: System Message

for msg in messages:
    if msg.get("role") == "system":
        system_prompt = self._extract_text_from_content(msg.get("content"))
        break
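
Chained together, the four sources might look like the sketch below. It is not the plugin's implementation: `extract_system_prompt` and `_text_of` are hypothetical names, the database lookup is injected as an optional callable so the example runs without OpenWebUI, and the returned source labels are illustrative.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


def _text_of(content: Any) -> str:
    """Flatten string or multimodal list content to plain text."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = [p.get("text", "") for p in content
                 if isinstance(p, dict) and p.get("type") == "text"]
        return "\n".join(parts)
    return ""


def extract_system_prompt(
    body: Dict[str, Any],
    messages: List[Dict[str, Any]],
    lookup_model_params: Optional[Callable[[], Optional[Dict[str, Any]]]] = None,
) -> Tuple[Optional[str], str]:
    """Return (prompt, source) from the first source that yields a value."""
    meta_params = ((body.get("metadata") or {}).get("model") or {}).get("params") or {}
    if meta_params.get("system"):
        return meta_params["system"], "metadata.model.params"
    if lookup_model_params is not None:
        db_params = lookup_model_params() or {}
        if db_params.get("system"):
            return db_params["system"], "model_db"
    body_params = body.get("params") or {}
    if body_params.get("system"):
        return body_params["system"], "body.params"
    for msg in messages:
        if msg.get("role") == "system":
            text = _text_of(msg.get("content"))
            if text:
                return text, "messages"
    return None, "none"
```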

Configuration in SessionConfig

system_message_config = {
    "mode": "append",           # Append to the SDK's default system prompt instead of replacing it
    "content": system_prompt_content
}

session_config = SessionConfig(
    system_message=system_message_config,
    # ... other params
)

Configuration Parameters

Valve Definitions

| Parameter | Type | Default | Description |
|---|---|---|---|
| GH_TOKEN | str | "" | GitHub Fine-grained Token (requires 'Copilot Requests' permission) |
| MODEL_ID | str | "claude-sonnet-4.5" | Default model when dynamic fetching fails |
| CLI_PATH | str | "/usr/local/bin/copilot" | Path to Copilot CLI binary |
| DEBUG | bool | False | Enable frontend console debug logging |
| LOG_LEVEL | str | "error" | CLI log level: none, error, warning, info, debug, all |
| SHOW_THINKING | bool | True | Display model reasoning in <think> tags |
| SHOW_WORKSPACE_INFO | bool | True | Show session workspace path in debug mode |
| EXCLUDE_KEYWORDS | str | "" | Comma-separated keywords to exclude models |
| WORKSPACE_DIR | str | "" | Restricted workspace directory (empty = process cwd) |
| INFINITE_SESSION | bool | True | Enable automatic context compaction |
| COMPACTION_THRESHOLD | float | 0.8 | Background compaction at 80% token usage |
| BUFFER_THRESHOLD | float | 0.95 | Emergency threshold at 95% |
| TIMEOUT | int | 300 | Stream chunk timeout (seconds) |
| CUSTOM_ENV_VARS | str | "" | JSON string of custom environment variables |
| ENABLE_TOOLS | bool | False | Enable custom tool system |
| AVAILABLE_TOOLS | str | "all" | Available tools: "all" or comma-separated list |

Environment Variables

# Set by _setup_env
export COPILOT_CLI_PATH="/usr/local/bin/copilot"
export GH_TOKEN="github_pat_xxxxxxxxxxxxxxxxxxxx"
export GITHUB_TOKEN="github_pat_xxxxxxxxxxxxxxxxxxxx"

# Custom variables (from CUSTOM_ENV_VARS valve)
export CUSTOM_VAR_1="value1"
export CUSTOM_VAR_2="value2"
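
Because CUSTOM_ENV_VARS is a JSON string, it has to be parsed and validated before its entries can be exported. A defensive sketch, assuming that invalid input should be ignored rather than raise (`parse_custom_env_vars` is a hypothetical helper, not a confirmed plugin function):

```python
import json
from typing import Dict


def parse_custom_env_vars(raw: str) -> Dict[str, str]:
    """Parse a JSON object string into a flat str -> str mapping."""
    if not raw or not raw.strip():
        return {}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {}  # assumption: malformed JSON is ignored
    if not isinstance(data, dict):
        return {}  # only a top-level JSON object makes sense here
    # Environment variables must be strings, so coerce keys and values.
    return {str(key): str(value) for key, value in data.items()}
```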

Key Functions Reference

Entry Points

pipe(body, __metadata__, __event_emitter__, __event_call__)

  • Purpose: OpenWebUI stable entry point
  • Returns: Delegates to _pipe_impl

_pipe_impl(body, __metadata__, __event_emitter__, __event_call__)

  • Purpose: Main request processing logic
  • Flow: Setup → Extract → Session → Response
  • Returns: str (non-streaming) or AsyncGenerator (streaming)

pipes()

  • Purpose: Dynamic model list fetching
  • Returns: List of available models with multiplier info
  • Caching: Uses _model_cache to avoid repeated API calls

Session Management

_build_session_config(chat_id, real_model_id, custom_tools, system_prompt_content, is_streaming)

  • Purpose: Construct SessionConfig object
  • Returns: SessionConfig with infinite sessions and tools

_get_chat_context(body, __metadata__, __event_call__)

  • Purpose: Extract chat_id with priority fallback
  • Returns: {"chat_id": str}

Streaming

stream_response(client, session, send_payload, init_message, __event_call__)

  • Purpose: Async streaming event processor
  • Yields: Text chunks to OpenWebUI
  • Resources: Auto-cleanup client and session

handler(event)

  • Purpose: Sync event callback (inside stream_response)
  • Action: Parse event → Enqueue chunks → Update state

Helpers

_emit_debug_log(message, __event_call__)

  • Purpose: Send debug logs to frontend console
  • Condition: Only when DEBUG=True

_setup_env(__event_call__)

  • Purpose: Locate CLI, set environment variables
  • Side Effects: Modifies os.environ

_extract_system_prompt(body, messages, request_model, real_model_id, __event_call__)

  • Purpose: Multi-source system prompt extraction
  • Returns: (system_prompt_content, source_name)

_process_images(messages, __event_call__)

  • Purpose: Extract text and images from multimodal messages
  • Returns: (text_content, attachments_list)

_initialize_custom_tools()

  • Purpose: Register and filter custom tools
  • Returns: List of tool functions

Utility Functions

get_event_type(event) -> str

  • Purpose: Extract event type string from enum/string
  • Handles: SessionEventType enum → .value extraction
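
A possible shape for this helper is sketched below. It assumes the event exposes its type on a `type` attribute and uses a stand-in enum (`FakeEventType`) instead of the SDK's SessionEventType so the example runs on its own; the "unknown" fallback is also an assumption.

```python
from enum import Enum
from types import SimpleNamespace


class FakeEventType(Enum):
    """Stand-in for the SDK enum, used only in this sketch."""
    MESSAGE_DELTA = "assistant.message_delta"


def get_event_type(event) -> str:
    """Normalize an enum or string event type to a plain string."""
    raw = getattr(event, "type", None)
    if raw is None:
        return "unknown"
    if isinstance(raw, Enum):
        return str(raw.value)
    return str(raw)


enum_event = SimpleNamespace(type=FakeEventType.MESSAGE_DELTA)
string_event = SimpleNamespace(type="session.error")
```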

safe_get_data_attr(event, attr: str, default=None)

  • Purpose: Safe attribute extraction from event.data
  • Handles: Both dict access and object attribute access

Troubleshooting Guide

Enable Debug Mode

# In OpenWebUI Valves UI:
DEBUG = True
SHOW_WORKSPACE_INFO = True
LOG_LEVEL = "debug"

Debug Output Location

Frontend Console:

// Open browser DevTools (F12)
// Look for logs with prefix: [Copilot Pipe]
console.debug("[Copilot Pipe] Extracted ChatID: abc123 (Source: __metadata__)")

Backend Logs:

# Python logging output
logger.debug(f"[Copilot Pipe] Session resumed: {chat_id}")

Common Issues

1. Session Not Resuming

Symptom: A new session is created on every request
Causes:

  • chat_id not extracted correctly
  • Session expired on Copilot side
  • INFINITE_SESSION=False (sessions not persistent)

Solution:

# Check debug logs for:
"Extracted ChatID: <id> (Source: ...)"
"Session <id> not found (...), creating new."

2. System Prompt Not Applied

Symptom: Model ignores configured system prompt
Causes:

  • System prompt not found in any of the four sources
  • Session resumed (system prompt only set on creation)

Solution:

# Check debug logs for:
"Extracted system prompt from <source> (length: X)"
"Configured system message (mode: append)"

3. Tools Not Available

Symptom: Model can't use custom tools
Causes:

  • ENABLE_TOOLS=False
  • Tool not registered in _initialize_custom_tools
  • Wrong AVAILABLE_TOOLS filter

Solution:

# Check debug logs for:
"Enabled X custom tools: ['tool1', 'tool2']"

Performance Optimization

Model List Caching

# First request: Fetch from API
models = await client.list_models()
self._model_cache = [...]  # Cache result

# Subsequent requests: Use cache
if self._model_cache:
    return self._model_cache
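
A self-contained version of this caching pattern, with keyword exclusion added, might look like the following. It is a sketch: `ModelCatalog` is a hypothetical class, the fetch function is injected in place of the real client call, and models are represented as plain ID strings.

```python
import asyncio
from typing import Awaitable, Callable, List


class ModelCatalog:
    """Fetch the model list once, filter it, and serve later calls from cache."""

    def __init__(self, fetch_models: Callable[[], Awaitable[List[str]]], exclude_keywords: str = ""):
        self._fetch_models = fetch_models
        self._exclude = [k.strip().lower() for k in exclude_keywords.split(",") if k.strip()]
        self._model_cache: List[str] = []

    async def list_models(self) -> List[str]:
        if self._model_cache:
            return self._model_cache  # cache hit: no API call
        models = await self._fetch_models()
        self._model_cache = [
            m for m in models
            if not any(keyword in m.lower() for keyword in self._exclude)
        ]
        return self._model_cache
```

As in the snippet above, an empty result is not cached, so a failed or empty fetch is retried on the next call.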

Session Persistence

Impact: Eliminates redundant model initialization on every request

# Without session:
# Each request: Initialize model → Load context → Generate → Discard

# With session (chat_id):
# First request: Initialize model → Load context → Generate → Save
# Later: Resume → Generate (no re-initialization)

Streaming vs Non-streaming

Streaming:

  • Lower perceived latency (first token faster)
  • Better UX for long responses
  • Resource cleanup via generator exit

Non-streaming:

  • Simpler error handling
  • Atomic response (no partial output)
  • Use for short responses

Security Considerations

Token Protection

# ❌ Never log tokens
logger.debug(f"Token: {self.valves.GH_TOKEN}")  # DON'T DO THIS

# ✅ Mask sensitive data
logger.debug(f"Token configured: {'*' * 10}")
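
When some indication of which token is configured is useful for debugging, a masking helper can expose only a short suffix. The sketch below is one possible approach; `mask_token` and its placeholder strings are hypothetical, and very short values are masked entirely so nothing meaningful leaks.

```python
def mask_token(token: str, visible: int = 4) -> str:
    """Return a log-safe representation that shows at most the last few characters."""
    if not token:
        return "<not set>"
    if len(token) <= visible * 2:
        return "*" * 8  # too short to reveal any part safely
    return "*" * 8 + token[-visible:]
```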

Workspace Isolation

# Set WORKSPACE_DIR to restrict file access
WORKSPACE_DIR = "/safe/sandbox/path"

# Copilot CLI respects this directory
client_config["cwd"] = WORKSPACE_DIR
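
Setting the working directory alone does not validate individual paths. If the plugin or a custom tool handles file paths itself, a containment check along these lines could complement WORKSPACE_DIR. This is a sketch for POSIX-style paths; `is_within_workspace` is a hypothetical helper and not part of the SDK.

```python
import os


def is_within_workspace(workspace_dir: str, candidate: str) -> bool:
    """Return True if candidate resolves to a path inside workspace_dir."""
    root = os.path.realpath(workspace_dir)
    # Relative candidates are resolved against the workspace root;
    # absolute candidates replace it, which os.path.join handles.
    target = os.path.realpath(os.path.join(root, candidate))
    # commonpath compares whole path components, so "/safe/sandbox2"
    # is not mistaken for a child of "/safe/sandbox".
    return os.path.commonpath([root, target]) == root
```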

Input Validation

import re

# Validate chat_id format (fullmatch also rejects a trailing newline)
if chat_id and not re.fullmatch(r'[a-zA-Z0-9_-]+', chat_id):
    logger.warning(f"Invalid chat_id format: {chat_id!r}")
    chat_id = None

Future Enhancements

Planned Features

  1. Multi-Session Management: Support multiple parallel sessions per user
  2. Session Analytics: Track token usage, compaction frequency
  3. Tool Result Caching: Avoid redundant tool calls
  4. Custom Event Filters: User-configurable event handling
  5. Workspace Templates: Pre-configured workspace environments
  6. Streaming Abort: Graceful cancellation of long-running requests

API Evolution

Monitoring Copilot SDK updates for:

  • New event types (e.g., assistant.function_call)
  • Enhanced tool capabilities
  • Improved session serialization


License: MIT
Maintainer: Fu-Jie (@Fu-Jie)