Files

fujie c6279240b9 fix(pipes): fix mcp tool filtering and force-enable autonomous web search

- Fix issue where mcp tool filtering logic (function_name_filter_list) in admin backend caused all tools to be hidden due to ID prefix mismatch
- Force enable web_search tool for Copilot Agent regardless of UI toggles, providing full autonomy for search-related intents
- Updated README and version to v0.9.1

2026-03-04 00:11:28 +08:00

5.4 KiB

Raw Blame History

🧭 Agents Stability & Friendliness Guide

This guide focuses on how to improve reliability and user experience of agents in github_copilot_sdk.py.

1) Goals

Reduce avoidable failures (timeouts, tool-call dead ends, invalid outputs).
Keep responses predictable under stress (large context, unstable upstream, partial tool failures).
Make interaction friendly (clear progress, clarification before risky actions, graceful fallback).
Preserve backwards compatibility while introducing stronger defaults.

2) Stability model (4 layers)

Layer A — Input safety

Validate essential runtime context early (user/chat/model/tool availability).
Use strict parsing for JSON-like user/task config (never trust raw free text).
Add guardrails for unsupported mode combinations (e.g., no tools + tool-required task).

Implementation hints

Add preflight validator before create_session.
Return fast-fail structured errors with recovery actions.

Layer B — Session safety

Use profile-driven defaults (model, reasoning_effort, infinite_sessions thresholds).
Auto-fallback to safe profile when unknown profile is requested.
Isolate each chat in a deterministic workspace path.

Implementation hints

Add AGENT_PROFILE + fallback to default.
Keep infinite_sessions enabled by default for long tasks.

Layer C — Tool-call safety

Add on_pre_tool_use to validate and sanitize args.
Add denylist/allowlist checks for dangerous operations.
Add timeout budget per tool class (file/network/shell).

Implementation hints

Keep current on_post_tool_use behavior.
Extend hooks gradually: on_pre_tool_use first, then on_error_occurred.

Layer D — Recovery safety

Retry only idempotent operations with capped attempts.
Distinguish recoverable vs non-recoverable failures.
Add deterministic fallback path (summary answer + explicit limitation).

Implementation hints

Retry policy table by event type.
Emit "what succeeded / what failed / what to do next" blocks.

3) Friendliness model (UX contract)

A. Clarification first for ambiguity

Use on_user_input_request for:

Missing constraints (scope, target path, output format)
High-risk actions (delete/migrate/overwrite)
Contradictory instructions

Rule: ask once with concise choices; avoid repeated back-and-forth.

B. Progress visibility

Emit status in major phases:

Context check
Planning/analysis
Tool execution
Verification
Final result

Rule: no silent waits > 8 seconds.

C. Friendly failure style

Every failure should include:

what failed
why (short)
what was already done
next recommended action

D. Output readability

Standardize final response blocks:

Outcome
Changes
Validation
Limitations
Next Step

4) High-value features to add (priority)

P0 (immediate)

on_user_input_request handler with default answer strategy
on_pre_tool_use for argument checks + risk gates
Structured progress events (phase-based)

P1 (short-term)

Error taxonomy + retry policy (network, provider, tool, validation)
Profile-based session factory with safe fallback
Auto quality gate for final output sections

P2 (mid-term)

Transport flexibility (cli_url, use_stdio, port) for deployment resilience
Azure provider path completion
Foreground session lifecycle support for advanced multi-session control

5) Suggested valves for stability/friendliness

AGENT_PROFILE: default | builder | analyst | reviewer
ENABLE_USER_INPUT_REQUEST: bool
DEFAULT_USER_INPUT_ANSWER: str
TOOL_CALL_TIMEOUT_SECONDS: int
MAX_RETRY_ATTEMPTS: int
ENABLE_SAFE_TOOL_GUARD: bool
ENABLE_PHASE_STATUS_EVENTS: bool
ENABLE_FRIENDLY_FAILURE_TEMPLATE: bool

6) Failure playbooks (practical)

Playbook A — Provider timeout

Retry once if request is idempotent.
Downgrade reasoning effort if timeout persists.
Return concise fallback and preserve partial result.

Playbook B — Tool argument mismatch

Block execution in on_pre_tool_use.
Ask user one clarification question if recoverable.
Otherwise skip tool and explain impact.

Playbook C — Large output overflow

Save large output to workspace file.
Return file path + short summary.
Avoid flooding chat with huge payload.

Playbook D — Conflicting user instructions

Surface conflict explicitly.
Offer 2-3 fixed choices.
Continue only after user selection.

7) Metrics to track

Session success rate
Tool-call success rate
Average recovery rate after first failure
Clarification rate vs hallucination rate
Mean time to first useful output
User follow-up dissatisfaction signals (e.g., “not what I asked”)

8) Minimal rollout plan

Add on_user_input_request + on_pre_tool_use (feature-gated).
Add phase status events and friendly failure template.
Add retry policy + error taxonomy.
Add profile fallback and deployment transport options.
Observe metrics for 1-2 weeks, then tighten defaults.

9) Quick acceptance checklist

Agent asks clarification only when necessary.
No long silent period without status updates.
Failures always include next actionable step.
Unknown profile/provider config does not crash session.
Large outputs are safely redirected to file.
Final response follows a stable structure.

5.4 KiB Raw Blame History

🧭 Agents Stability & Friendliness Guide

1) Goals

2) Stability model (4 layers)

Layer A — Input safety

Layer B — Session safety

Layer C — Tool-call safety

Layer D — Recovery safety

3) Friendliness model (UX contract)

A. Clarification first for ambiguity

B. Progress visibility

C. Friendly failure style

D. Output readability

4) High-value features to add (priority)

P0 (immediate)

P1 (short-term)

P2 (mid-term)

5) Suggested valves for stability/friendliness

6) Failure playbooks (practical)

Playbook A — Provider timeout

Playbook B — Tool argument mismatch

Playbook C — Large output overflow

Playbook D — Conflicting user instructions

7) Metrics to track

8) Minimal rollout plan

9) Quick acceptance checklist

5.4 KiB

Raw Blame History