🧭 Agents Stability & Friendliness Guide
This guide focuses on how to improve reliability and user experience of agents in github_copilot_sdk.py.
1) Goals
- Reduce avoidable failures (timeouts, tool-call dead ends, invalid outputs).
- Keep responses predictable under stress (large context, unstable upstream, partial tool failures).
- Make interaction friendly (clear progress, clarification before risky actions, graceful fallback).
- Preserve backwards compatibility while introducing stronger defaults.
2) Stability model (4 layers)
Layer A — Input safety
- Validate essential runtime context early (user/chat/model/tool availability).
- Use strict parsing for JSON-like user/task config (never trust raw free text).
- Add guardrails for unsupported mode combinations (e.g., no tools + tool-required task).
Implementation hints
- Add a preflight validator before create_session.
- Return fast-fail structured errors with recovery actions.
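A minimal sketch of such a preflight check, assuming the runtime context arrives as a plain dict. The field names (user_id, chat_id, model, tools) and the PreflightError shape are hypothetical and not taken from github_copilot_sdk.py.

```python
from dataclasses import dataclass


@dataclass
class PreflightError:
    """Structured fast-fail error (hypothetical shape, not from the SDK)."""
    code: str
    message: str
    recovery: str


def preflight(context: dict, task_requires_tools: bool = False) -> list:
    """Return a list of problems; an empty list means the session may be created."""
    errors = []
    # Essential runtime context must be present and non-empty.
    for key in ("user_id", "chat_id", "model"):
        if not context.get(key):
            errors.append(PreflightError(
                code=f"missing_{key}",
                message=f"Required context field '{key}' is missing.",
                recovery=f"Provide '{key}' and retry.",
            ))
    # Guard against an unsupported combination: tool-required task, no tools.
    if task_requires_tools and not context.get("tools"):
        errors.append(PreflightError(
            code="tools_required",
            message="The task needs tools, but none are available.",
            recovery="Enable at least one tool or rephrase the task.",
        ))
    return errors
```

Each error carries its own recovery action, so the caller can show it to the user without extra mapping.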
Layer B — Session safety
- Use profile-driven defaults (model, reasoning_effort, infinite_sessions thresholds).
- Auto-fallback to a safe profile when an unknown profile is requested.
- Isolate each chat in a deterministic workspace path.
Implementation hints
- Add AGENT_PROFILE + fallback to default.
- Keep infinite_sessions enabled by default for long tasks.
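The profile fallback and the deterministic workspace path could look roughly like this. The AgentProfile fields, the placeholder model name, the effort values, and the hashing scheme are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Tuple


@dataclass(frozen=True)
class AgentProfile:
    model: str
    reasoning_effort: str
    infinite_sessions: bool = True  # kept on by default for long tasks


# Hypothetical profile table; "example-model" is a placeholder, not a real model id.
PROFILES = {
    "default": AgentProfile(model="example-model", reasoning_effort="medium"),
    "builder": AgentProfile(model="example-model", reasoning_effort="high"),
    "analyst": AgentProfile(model="example-model", reasoning_effort="high"),
    "reviewer": AgentProfile(model="example-model", reasoning_effort="low"),
}


def resolve_profile(name: Optional[str]) -> Tuple[str, AgentProfile]:
    """Return (resolved_name, profile); unknown names fall back to 'default'."""
    key = (name or "").strip().lower()
    if key in PROFILES:
        return key, PROFILES[key]
    return "default", PROFILES["default"]


def workspace_for(base: Path, chat_id: str) -> Path:
    """Map a chat id to a stable, filesystem-safe directory under base."""
    digest = hashlib.sha256(chat_id.encode("utf-8")).hexdigest()[:16]
    return base / digest
```

Hashing the chat id keeps the path deterministic while avoiding unsafe characters in directory names.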
Layer C — Tool-call safety
- Add on_pre_tool_use to validate and sanitize args.
- Add denylist/allowlist checks for dangerous operations.
- Add timeout budget per tool class (file/network/shell).
Implementation hints
- Keep current on_post_tool_use behavior.
- Extend hooks gradually: on_pre_tool_use first, then on_error_occurred.
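One way to sketch the check that an on_pre_tool_use hook might delegate to. This is a standalone function, not the SDK's hook signature; the tool classes, timeout budgets, and deny patterns are illustrative assumptions.

```python
import re

# Hypothetical per-class timeout budgets in seconds; tune for your deployment.
TIMEOUT_BUDGET_SECONDS = {"file": 10, "network": 30, "shell": 60}

# Hypothetical denylist for shell commands; deliberately small for illustration.
DENY_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/"),  # recursive delete of an absolute path
    re.compile(r"\bmkfs\b"),        # filesystem formatting
]


def check_tool_call(tool_class: str, args: dict) -> dict:
    """Validate and sanitize a tool call before it runs.

    Returns a decision dict with keys: allow, reason, timeout, args.
    """
    if tool_class not in TIMEOUT_BUDGET_SECONDS:
        return {"allow": False, "reason": f"unknown tool class: {tool_class}",
                "timeout": None, "args": args}

    # Sanitize: trim surrounding whitespace from string arguments.
    clean = {k: v.strip() if isinstance(v, str) else v for k, v in args.items()}

    if tool_class == "shell":
        command = str(clean.get("command", ""))
        for pattern in DENY_PATTERNS:
            if pattern.search(command):
                return {"allow": False, "reason": "command matches denylist",
                        "timeout": None, "args": clean}

    return {"allow": True, "reason": "ok",
            "timeout": TIMEOUT_BUDGET_SECONDS[tool_class], "args": clean}
```

Returning a decision instead of raising lets the caller choose between asking the user and skipping the tool.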
Layer D — Recovery safety
- Retry only idempotent operations with capped attempts.
- Distinguish recoverable vs non-recoverable failures.
- Add deterministic fallback path (summary answer + explicit limitation).
Implementation hints
- Retry policy table by event type.
- Emit "what succeeded / what failed / what to do next" blocks.
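A retry policy table and a capped retry loop could be sketched as follows. The AgentError class, the attempt limits, and the rule that only network and provider errors are retryable are assumptions, not existing behavior.

```python
import time

# Hypothetical policy table keyed by error kind.
RETRY_POLICY = {
    "network": {"retryable": True, "max_attempts": 3},
    "provider": {"retryable": True, "max_attempts": 2},
    "tool": {"retryable": False, "max_attempts": 1},
    "validation": {"retryable": False, "max_attempts": 1},
}


class AgentError(Exception):
    """Error tagged with a kind so the policy table can classify it."""

    def __init__(self, kind: str, message: str) -> None:
        super().__init__(message)
        self.kind = kind


def run_with_retry(operation, idempotent: bool, base_delay: float = 0.0):
    """Run operation(); retry only idempotent calls, within the policy cap."""
    attempt = 0
    while True:
        attempt += 1
        try:
            return operation()
        except AgentError as exc:
            policy = RETRY_POLICY.get(
                exc.kind, {"retryable": False, "max_attempts": 1})
            can_retry = (idempotent and policy["retryable"]
                         and attempt < policy["max_attempts"])
            if not can_retry:
                raise  # non-recoverable: let the fallback path handle it
            time.sleep(base_delay * attempt)  # simple linear backoff
```

Unknown error kinds default to non-retryable, which keeps the failure mode conservative.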
3) Friendliness model (UX contract)
A. Clarification first for ambiguity
Use on_user_input_request for:
- Missing constraints (scope, target path, output format)
- High-risk actions (delete/migrate/overwrite)
- Contradictory instructions
Rule: ask once with concise choices; avoid repeated back-and-forth.
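The ask-once rule can be illustrated with a small resolver that accepts a single reply and otherwise falls back to a default. The Clarification type and resolve_answer helper are hypothetical; the real on_user_input_request handler may pass different data.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Clarification:
    question: str
    choices: Tuple[str, ...]
    default: str  # used when the reply is missing or unrecognized


def resolve_answer(request: Clarification, reply: Optional[str]) -> str:
    """Resolve one reply to a choice; never re-prompt, fall back to the default."""
    if reply:
        text = reply.strip()
        # Accept a 1-based choice number.
        if text.isdecimal() and 1 <= int(text) <= len(request.choices):
            return request.choices[int(text) - 1]
        # Accept a case-insensitive choice label.
        for choice in request.choices:
            if choice.lower() == text.lower():
                return choice
    return request.default
```

For high-risk actions the default should be the non-destructive choice, so an unanswered question never triggers a delete or overwrite.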
B. Progress visibility
Emit status in major phases:
- Context check
- Planning/analysis
- Tool execution
- Verification
- Final result
Rule: no silent waits > 8 seconds.
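A sketch of a phase tracker that enforces the silence limit, assuming the caller supplies an emit callback and calls heartbeat() regularly while waiting. StatusTracker is a hypothetical helper, not part of the SDK.

```python
import time

PHASES = ("Context check", "Planning/analysis", "Tool execution",
          "Verification", "Final result")
MAX_SILENCE_SECONDS = 8.0


class StatusTracker:
    """Emit phase changes and a 'still working' note after long silence."""

    def __init__(self, emit, clock=time.monotonic) -> None:
        self._emit = emit
        self._clock = clock
        self._last = clock()
        self._phase = None

    def enter(self, phase: str) -> None:
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase}")
        self._phase = phase
        self._send(phase)

    def heartbeat(self) -> bool:
        """Call periodically; returns True if a status line was emitted."""
        silent_for = self._clock() - self._last
        if self._phase and silent_for >= MAX_SILENCE_SECONDS:
            self._send(f"{self._phase} (still working)")
            return True
        return False

    def _send(self, text: str) -> None:
        self._emit(text)
        self._last = self._clock()
```

The injectable clock makes the silence rule testable without real waiting.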
C. Friendly failure style
Every failure should include:
- what failed
- why (short)
- what was already done
- next recommended action
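The four required parts map naturally onto a small template function. The labels and the format_failure name below are illustrative choices.

```python
def format_failure(what_failed: str, why: str, done: list, next_action: str) -> str:
    """Render a failure message containing all four required parts."""
    done_text = "; ".join(done) if done else "nothing yet"
    return "\n".join([
        f"What failed: {what_failed}",
        f"Why: {why}",
        f"Already done: {done_text}",
        f"Next step: {next_action}",
    ])
```

For example, format_failure("file upload", "network timeout", ["parsed input"], "retry in a minute") yields four labeled lines the user can act on.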
D. Output readability
Standardize final response blocks:
- Outcome
- Changes
- Validation
- Limitations
- Next Step
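A simple quality gate can verify that a final response contains every standard block. The sketch assumes each block starts with its name on its own line, optionally with leading "#" characters or a trailing colon; that convention is an assumption.

```python
REQUIRED_SECTIONS = ("Outcome", "Changes", "Validation", "Limitations", "Next Step")


def missing_sections(response: str) -> list:
    """Return the required section names that do not appear as heading lines."""
    headings = set()
    for line in response.splitlines():
        # Normalize "## Outcome:", "Outcome:" and "Outcome" to "outcome".
        name = line.strip().lstrip("#").strip().rstrip(":").strip().lower()
        headings.add(name)
    return [s for s in REQUIRED_SECTIONS if s.lower() not in headings]
```

An empty result means the response passes; otherwise the missing names can be fed back into a repair prompt.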
4) High-value features to add (priority)
P0 (immediate)
- on_user_input_request handler with default answer strategy
- on_pre_tool_use for argument checks + risk gates
- Structured progress events (phase-based)
P1 (short-term)
- Error taxonomy + retry policy (network, provider, tool, validation)
- Profile-based session factory with safe fallback
- Auto quality gate for final output sections
P2 (mid-term)
- Transport flexibility (cli_url, use_stdio, port) for deployment resilience
- Azure provider path completion
- Foreground session lifecycle support for advanced multi-session control
5) Suggested valves for stability/friendliness
- AGENT_PROFILE: default | builder | analyst | reviewer
- ENABLE_USER_INPUT_REQUEST: bool
- DEFAULT_USER_INPUT_ANSWER: str
- TOOL_CALL_TIMEOUT_SECONDS: int
- MAX_RETRY_ATTEMPTS: int
- ENABLE_SAFE_TOOL_GUARD: bool
- ENABLE_PHASE_STATUS_EVENTS: bool
- ENABLE_FRIENDLY_FAILURE_TEMPLATE: bool
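The valves above could be declared roughly as below. The sketch uses a standard-library dataclass to stay self-contained; the existing code may use a different base class, and every default value shown is an assumption.

```python
from dataclasses import dataclass

ALLOWED_PROFILES = ("default", "builder", "analyst", "reviewer")


@dataclass
class Valves:
    AGENT_PROFILE: str = "default"
    ENABLE_USER_INPUT_REQUEST: bool = False
    DEFAULT_USER_INPUT_ANSWER: str = ""
    TOOL_CALL_TIMEOUT_SECONDS: int = 60
    MAX_RETRY_ATTEMPTS: int = 2
    ENABLE_SAFE_TOOL_GUARD: bool = True
    ENABLE_PHASE_STATUS_EVENTS: bool = True
    ENABLE_FRIENDLY_FAILURE_TEMPLATE: bool = True

    def __post_init__(self) -> None:
        # Unknown profile names fall back to the safe default instead of failing.
        if self.AGENT_PROFILE not in ALLOWED_PROFILES:
            self.AGENT_PROFILE = "default"
        # Clamp numeric valves to sane lower bounds.
        self.TOOL_CALL_TIMEOUT_SECONDS = max(1, int(self.TOOL_CALL_TIMEOUT_SECONDS))
        self.MAX_RETRY_ATTEMPTS = max(0, int(self.MAX_RETRY_ATTEMPTS))
```

Normalizing in one place means a bad admin setting degrades to a safe value rather than crashing the session.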
6) Failure playbooks (practical)
Playbook A — Provider timeout
- Retry once if request is idempotent.
- Downgrade reasoning effort if timeout persists.
- Return concise fallback and preserve partial result.
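Playbook A can be sketched as an ordered list of attempts. The effort levels ("high", "medium", "low"), the call(effort) signature, and the choice to skip all re-sends for non-idempotent requests are assumptions for illustration.

```python
EFFORT_ORDER = ("high", "medium", "low")  # assumed effort levels, strongest first


def run_provider_call(call, effort: str, idempotent: bool, partial: str = "") -> dict:
    """Try call(effort); on timeout retry once, then downgrade, then fall back."""
    attempts = [effort]
    if idempotent:
        attempts.append(effort)  # one retry at the same effort
        if effort in EFFORT_ORDER:
            # Then each lower effort level, in order.
            attempts.extend(EFFORT_ORDER[EFFORT_ORDER.index(effort) + 1:])
    for level in attempts:
        try:
            return {"status": "ok", "effort": level, "text": call(level)}
        except TimeoutError:
            continue
    # Every attempt timed out: keep whatever partial result exists.
    return {"status": "fallback", "effort": None,
            "text": partial or "No result before the timeout."}
```

The returned status lets the caller decide whether to render a normal answer or the friendly failure template.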
Playbook B — Tool argument mismatch
- Block execution in on_pre_tool_use.
- Ask user one clarification question if recoverable.
- Otherwise skip tool and explain impact.
Playbook C — Large output overflow
- Save large output to workspace file.
- Return file path + short summary.
- Avoid flooding chat with huge payload.
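Playbook C might be implemented along these lines. The 4000-character limit, the file name, and the preview length are arbitrary illustrative values.

```python
from pathlib import Path

MAX_INLINE_CHARS = 4000  # assumed threshold for what counts as "large"


def deliver_output(text: str, workspace: Path, name: str = "output.txt",
                   limit: int = MAX_INLINE_CHARS) -> str:
    """Return text inline if small; otherwise save it and return path + preview."""
    if len(text) <= limit:
        return text
    workspace.mkdir(parents=True, exist_ok=True)
    target = workspace / name
    target.write_text(text, encoding="utf-8")
    preview = text[:200]
    return (f"Output was {len(text)} characters; saved to {target}.\n"
            f"Preview:\n{preview}")
```

Writing into the chat's own workspace directory keeps large artifacts isolated per chat.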
Playbook D — Conflicting user instructions
- Surface conflict explicitly.
- Offer 2-3 fixed choices.
- Continue only after user selection.
7) Metrics to track
- Session success rate
- Tool-call success rate
- Average recovery rate after first failure
- Clarification rate vs hallucination rate
- Mean time to first useful output
- User follow-up dissatisfaction signals (e.g., “not what I asked”)
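Most of these metrics reduce to ratios of event counts, which a small counter-based helper can capture. The event names used in the usage note are hypothetical.

```python
from collections import Counter
from typing import Optional


class Metrics:
    """Count named events and derive rates from pairs of counters."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, event: str, amount: int = 1) -> None:
        self.counts[event] += amount

    def rate(self, numerator: str, denominator: str) -> Optional[float]:
        """Return numerator/denominator, or None when nothing was counted yet."""
        total = self.counts[denominator]
        if total == 0:
            return None
        return self.counts[numerator] / total
```

For instance, recording "session_total" on every session and "session_success" on each success gives the session success rate via rate("session_success", "session_total").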
8) Minimal rollout plan
- Add on_user_input_request + on_pre_tool_use (feature-gated).
- Add phase status events and friendly failure template.
- Add retry policy + error taxonomy.
- Add profile fallback and deployment transport options.
- Observe metrics for 1-2 weeks, then tighten defaults.
9) Quick acceptance checklist
- Agent asks clarification only when necessary.
- No long silent period without status updates.
- Failures always include next actionable step.
- Unknown profile/provider config does not crash session.
- Large outputs are safely redirected to file.
- Final response follows a stable structure.