fix(pipes): fix mcp tool filtering and force-enable autonomous web search
- Fix issue where mcp tool filtering logic (function_name_filter_list) in admin backend caused all tools to be hidden due to ID prefix mismatch - Force enable web_search tool for Copilot Agent regardless of UI toggles, providing full autonomy for search-related intents - Updated README and version to v0.9.1
This commit is contained in:
@@ -0,0 +1,192 @@
|
||||
# 🧭 Agents Stability & Friendliness Guide
|
||||
|
||||
This guide focuses on how to improve **reliability** and **user experience** of agents in `github_copilot_sdk.py`.
|
||||
|
||||
---
|
||||
|
||||
## 1) Goals
|
||||
|
||||
- Reduce avoidable failures (timeouts, tool-call dead ends, invalid outputs).
|
||||
- Keep responses predictable under stress (large context, unstable upstream, partial tool failures).
|
||||
- Make interaction friendly (clear progress, clarification before risky actions, graceful fallback).
|
||||
- Preserve backwards compatibility while introducing stronger defaults.
|
||||
|
||||
---
|
||||
|
||||
## 2) Stability model (4 layers)
|
||||
|
||||
## Layer A — Input safety
|
||||
|
||||
- Validate essential runtime context early (user/chat/model/tool availability).
|
||||
- Use strict parsing for JSON-like user/task config (never trust raw free text).
|
||||
- Add guardrails for unsupported mode combinations (e.g., no tools + tool-required task).
|
||||
|
||||
**Implementation hints**
|
||||
- Add preflight validator before `create_session`.
|
||||
- Return fast-fail structured errors with recovery actions.
|
||||
|
||||
## Layer B — Session safety
|
||||
|
||||
- Use profile-driven defaults (`model`, `reasoning_effort`, `infinite_sessions` thresholds).
|
||||
- Auto-fallback to safe profile when unknown profile is requested.
|
||||
- Isolate each chat in a deterministic workspace path.
|
||||
|
||||
**Implementation hints**
|
||||
- Add `AGENT_PROFILE` + fallback to `default`.
|
||||
- Keep `infinite_sessions` enabled by default for long tasks.
|
||||
|
||||
## Layer C — Tool-call safety
|
||||
|
||||
- Add `on_pre_tool_use` to validate and sanitize args.
|
||||
- Add denylist/allowlist checks for dangerous operations.
|
||||
- Add timeout budget per tool class (file/network/shell).
|
||||
|
||||
**Implementation hints**
|
||||
- Keep current `on_post_tool_use` behavior.
|
||||
- Extend hooks gradually: `on_pre_tool_use` first, then `on_error_occurred`.
|
||||
|
||||
## Layer D — Recovery safety
|
||||
|
||||
- Retry only idempotent operations with capped attempts.
|
||||
- Distinguish recoverable vs non-recoverable failures.
|
||||
- Add deterministic fallback path (summary answer + explicit limitation).
|
||||
|
||||
**Implementation hints**
|
||||
- Retry policy table by event type.
|
||||
- Emit "what succeeded / what failed / what to do next" blocks.
|
||||
|
||||
---
|
||||
|
||||
## 3) Friendliness model (UX contract)
|
||||
|
||||
## A. Clarification first for ambiguity
|
||||
|
||||
Use `on_user_input_request` for:
|
||||
- Missing constraints (scope, target path, output format)
|
||||
- High-risk actions (delete/migrate/overwrite)
|
||||
- Contradictory instructions
|
||||
|
||||
**Rule**: ask once with concise choices; avoid repeated back-and-forth.
|
||||
|
||||
## B. Progress visibility
|
||||
|
||||
Emit status in major phases:
|
||||
1. Context check
|
||||
2. Planning/analysis
|
||||
3. Tool execution
|
||||
4. Verification
|
||||
5. Final result
|
||||
|
||||
**Rule**: no silent waits > 8 seconds.
|
||||
|
||||
## C. Friendly failure style
|
||||
|
||||
Every failure should include:
|
||||
- what failed
|
||||
- why (short)
|
||||
- what was already done
|
||||
- next recommended action
|
||||
|
||||
## D. Output readability
|
||||
|
||||
Standardize final response blocks:
|
||||
- `Outcome`
|
||||
- `Changes`
|
||||
- `Validation`
|
||||
- `Limitations`
|
||||
- `Next Step`
|
||||
|
||||
---
|
||||
|
||||
## 4) High-value features to add (priority)
|
||||
|
||||
## P0 (immediate)
|
||||
|
||||
1. `on_user_input_request` handler with default answer strategy
|
||||
2. `on_pre_tool_use` for argument checks + risk gates
|
||||
3. Structured progress events (phase-based)
|
||||
|
||||
## P1 (short-term)
|
||||
|
||||
4. Error taxonomy + retry policy (`network`, `provider`, `tool`, `validation`)
|
||||
5. Profile-based session factory with safe fallback
|
||||
6. Auto quality gate for final output sections
|
||||
|
||||
## P2 (mid-term)
|
||||
|
||||
7. Transport flexibility (`cli_url`, `use_stdio`, `port`) for deployment resilience
|
||||
8. Azure provider path completion
|
||||
9. Foreground session lifecycle support for advanced multi-session control
|
||||
|
||||
---
|
||||
|
||||
## 5) Suggested valves for stability/friendliness
|
||||
|
||||
- `AGENT_PROFILE`: `default | builder | analyst | reviewer`
|
||||
- `ENABLE_USER_INPUT_REQUEST`: `bool`
|
||||
- `DEFAULT_USER_INPUT_ANSWER`: `str`
|
||||
- `TOOL_CALL_TIMEOUT_SECONDS`: `int`
|
||||
- `MAX_RETRY_ATTEMPTS`: `int`
|
||||
- `ENABLE_SAFE_TOOL_GUARD`: `bool`
|
||||
- `ENABLE_PHASE_STATUS_EVENTS`: `bool`
|
||||
- `ENABLE_FRIENDLY_FAILURE_TEMPLATE`: `bool`
|
||||
|
||||
---
|
||||
|
||||
## 6) Failure playbooks (practical)
|
||||
|
||||
## Playbook A — Provider timeout
|
||||
|
||||
- Retry once if request is idempotent.
|
||||
- Downgrade reasoning effort if timeout persists.
|
||||
- Return concise fallback and preserve partial result.
|
||||
|
||||
## Playbook B — Tool argument mismatch
|
||||
|
||||
- Block execution in `on_pre_tool_use`.
|
||||
- Ask user one clarification question if recoverable.
|
||||
- Otherwise skip tool and explain impact.
|
||||
|
||||
## Playbook C — Large output overflow
|
||||
|
||||
- Save large output to workspace file.
|
||||
- Return file path + short summary.
|
||||
- Avoid flooding chat with huge payload.
|
||||
|
||||
## Playbook D — Conflicting user instructions
|
||||
|
||||
- Surface conflict explicitly.
|
||||
- Offer 2-3 fixed choices.
|
||||
- Continue only after user selection.
|
||||
|
||||
---
|
||||
|
||||
## 7) Metrics to track
|
||||
|
||||
- Session success rate
|
||||
- Tool-call success rate
|
||||
- Average recovery rate after first failure
|
||||
- Clarification rate vs hallucination rate
|
||||
- Mean time to first useful output
|
||||
- User follow-up dissatisfaction signals (e.g., “not what I asked”)
|
||||
|
||||
---
|
||||
|
||||
## 8) Minimal rollout plan
|
||||
|
||||
1. Add `on_user_input_request` + `on_pre_tool_use` (feature-gated).
|
||||
2. Add phase status events and friendly failure template.
|
||||
3. Add retry policy + error taxonomy.
|
||||
4. Add profile fallback and deployment transport options.
|
||||
5. Observe metrics for 1-2 weeks, then tighten defaults.
|
||||
|
||||
---
|
||||
|
||||
## 9) Quick acceptance checklist
|
||||
|
||||
- Agent asks clarification only when necessary.
|
||||
- No long silent period without status updates.
|
||||
- Failures always include next actionable step.
|
||||
- Unknown profile/provider config does not crash session.
|
||||
- Large outputs are safely redirected to file.
|
||||
- Final response follows a stable structure.
|
||||
Reference in New Issue
Block a user