# Auto-Discovery and Deduplication Guide
## Feature Overview
The OpenWebUI Skills Manager Tool now automatically discovers and installs all skills from GitHub repositories, with built-in duplicate handling.
## Features Added
### 1. **Automatic Repo Root Detection** 🎯
When you provide a GitHub repository root URL (without `/tree/`), the system automatically converts it to discovery mode.
#### Examples
```
Input: https://github.com/nicobailon/visual-explainer
Auto-converted to: https://github.com/nicobailon/visual-explainer/tree/main
Discovers all skill subdirectories
```
### 2. **Automatic Skill Discovery** 🔍
Once a tree URL is detected, the tool automatically:
- Queries the GitHub API to list all subdirectories
- Creates skill installation URLs for each subdirectory
- Attempts to fetch `SKILL.md` or `README.md` from each subdirectory
- Installs all discovered skills in batch mode
#### Supported URL Formats
```
✓ https://github.com/owner/repo → Auto-detected as repo root
✓ https://github.com/owner/repo/ → With trailing slash
✓ https://github.com/owner/repo/tree/main → Existing tree format
✓ https://github.com/owner/repo/tree/main/skills → Nested skill directory
```
### 3. **Duplicate URL Removal** 🔄
When installing multiple skills, the system automatically:
- Detects duplicate URLs
- Removes duplicates while preserving order
- Notifies user how many duplicates were removed
- Skips processing duplicate URLs
#### Example
```
Input URLs (5 total):
- https://github.com/user/repo/tree/main/skill1
- https://github.com/user/repo/tree/main/skill1 ← Duplicate
- https://github.com/user/repo/tree/main/skill2
- https://github.com/user/repo/tree/main/skill2 ← Duplicate
- https://github.com/user/repo/tree/main/skill3
Processing:
- Unique URLs: 3
- Duplicates Removed: 2
- Status: "Removed 2 duplicate URL(s) from batch"
```
### 4. **Duplicate Skill Name Detection** ⚠️
If multiple URLs result in the same skill name during batch installation:
- System detects the duplicate installation
- Logs warning with details
- Notifies user of the conflict
- Shows which action was taken (installed/updated)
#### Example Scenario
```
Skill A: skill1.zip → creates skill "report-generator"
Skill B: skill2.zip → creates skill "report-generator" ← Same name!
Warning: "Duplicate skill name 'report-generator' - installed multiple times"
Note: The latest install may have overwritten the earlier one
(depending on ALLOW_OVERWRITE_ON_CREATE setting)
```
## Usage Examples
### Example 1: Simple Repo Root
```
User Input:
"Install skills from https://github.com/nicobailon/visual-explainer"
System Response:
"Detected GitHub repo root: https://github.com/nicobailon/visual-explainer.
Auto-converting to discovery mode..."
"Discovering skills in https://github.com/nicobailon/visual-explainer/tree/main..."
"Installing 5 skill(s)..."
```
### Example 2: With Nested Skills Directory
```
User Input:
"Install all skills from https://github.com/anthropics/skills"
System Response:
"Detected GitHub repo root: https://github.com/anthropics/skills.
Auto-converting to discovery mode..."
"Discovering skills in https://github.com/anthropics/skills/tree/main..."
"Installing 12 skill(s)..."
```
### Example 3: Duplicate Handling
```
User Input (batch):
[
"https://github.com/user/repo/tree/main/skill-a",
"https://github.com/user/repo/tree/main/skill-a", ← Duplicate
"https://github.com/user/repo/tree/main/skill-b"
]
System Response:
"Removed 1 duplicate URL(s) from batch."
"Installing 2 skill(s)..."
Result:
- Batch install completed: 2 succeeded, 0 failed
```
## Implementation Details
### Detection Logic
**Repo root detection** uses regex pattern:
```python
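import re

# Pattern for a bare repository root URL (the variable name is illustrative).
REPO_ROOT_PATTERN = re.compile(r"^https://github\.com/([^/]+)/([^/]+)/?$")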
# Matches:
# https://github.com/owner/repo ✓
# https://github.com/owner/repo/ ✓
# Does NOT match:
# https://github.com/owner/repo/tree/main ✗
# https://github.com/owner/repo/blob/main/file.md ✗
```
### Normalization
Detected repo root URLs are converted with:
```
https://github.com/{owner}/{repo} → https://github.com/{owner}/{repo}/tree/main
```
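The `main` branch is tried first. Note that the GitHub contents API does not itself redirect an unknown `ref` to another branch, so any fallback to `master` has to happen in the tool; if discovery fails for a repository whose default branch is `master`, pass an explicit `/tree/master` URL.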
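Detection and normalization can be combined into one small helper. The following is a minimal sketch, not the tool's actual code: the function name `normalize_repo_root` and its `branch` parameter are hypothetical, and only the regex comes from this guide.

```python
import re
from typing import Optional

# Same pattern as the repo root detection regex in this guide.
REPO_ROOT_RE = re.compile(r"^https://github\.com/([^/]+)/([^/]+)/?$")


def normalize_repo_root(url: str, branch: str = "main") -> Optional[str]:
    """Return a tree URL for a repo root URL, or None for any other URL.

    Hypothetical helper: the name and the `branch` parameter are assumptions.
    """
    match = REPO_ROOT_RE.match(url)
    if match is None:
        # Not a bare repo root (e.g. already a tree or blob URL).
        return None
    owner, repo = match.groups()
    return f"https://github.com/{owner}/{repo}/tree/{branch}"
```

Returning `None` for non-root URLs lets the caller treat the input as a normal URL, which matches the fallback described under Error Handling.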
### Discovery Process
1. Parse tree URL with regex to extract owner, repo, branch, and path
2. Query GitHub API: `/repos/{owner}/{repo}/contents{path}?ref={branch}`
3. Filter for directories (skip hidden directories starting with `.`)
4. For each subdirectory, create a tree URL pointing to it
5. Return list of discovered tree URLs for batch installation
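The steps above can be sketched without any network access. This is an illustration under stated assumptions, not the tool's implementation: `TREE_URL_RE`, `contents_api_url`, and `discover_skill_urls` are hypothetical names, the regex assumes branch names contain no `/`, and `entries` stands for the already-decoded JSON list returned by the GitHub contents endpoint (each item carries a `name` and a `type` such as `"dir"` or `"file"`).

```python
import re
from typing import Dict, List, Optional

# Assumption: the branch name contains no "/" characters.
TREE_URL_RE = re.compile(
    r"^https://github\.com/([^/]+)/([^/]+)/tree/([^/]+)(/.*)?$"
)


def contents_api_url(tree_url: str) -> Optional[str]:
    """Steps 1-2: parse a tree URL and build the contents API URL."""
    match = TREE_URL_RE.match(tree_url.rstrip("/"))
    if match is None:
        return None
    owner, repo, branch, path = match.groups()
    path = path or ""  # empty when the tree URL points at the branch root
    return (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/contents{path}?ref={branch}"
    )


def discover_skill_urls(tree_url: str, entries: List[Dict]) -> List[str]:
    """Steps 3-5: keep non-hidden directories and build a tree URL for each."""
    base = tree_url.rstrip("/")
    return [
        f"{base}/{entry['name']}"
        for entry in entries
        if entry.get("type") == "dir" and not entry["name"].startswith(".")
    ]
```

Fetching `contents_api_url(...)` and decoding the response is left out on purpose; the returned list would then be passed to `discover_skill_urls` as `entries`.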
### Deduplication Strategy
```python
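# input_urls: the list of URLs supplied for the batch
seen_urls = set()
unique_urls = []
duplicates_removed = 0
for url in input_urls:
    if url not in seen_urls:
        # First occurrence: keep it, in its original position
        unique_urls.append(url)
        seen_urls.add(url)
    else:
        duplicates_removed += 1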
```
- Preserves URL order
- O(n) time complexity
- Low memory overhead
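For reference, the same order-preserving behavior can be written more compactly with `dict.fromkeys`, since dictionaries keep insertion order in Python 3.7+. This is an alternative sketch, not the tool's code; `dedupe_urls` is a hypothetical name, and the status string copies the message format shown earlier in this guide.

```python
from typing import List, Tuple


def dedupe_urls(urls: List[str]) -> Tuple[List[str], int]:
    """Return (unique URLs in first-seen order, number of duplicates removed)."""
    unique = list(dict.fromkeys(urls))  # dict keys are unique and ordered
    return unique, len(urls) - len(unique)


batch = [
    "https://github.com/user/repo/tree/main/skill1",
    "https://github.com/user/repo/tree/main/skill1",
    "https://github.com/user/repo/tree/main/skill2",
    "https://github.com/user/repo/tree/main/skill2",
    "https://github.com/user/repo/tree/main/skill3",
]
unique, removed = dedupe_urls(batch)
status = f"Removed {removed} duplicate URL(s) from batch"
```

Both versions compare URLs as exact strings, so two spellings of the same location (for example, with and without a trailing slash) count as different URLs.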
### Duplicate Name Tracking
During batch installation:
```python
installed_names = {}  # {lowercase_name: url}
for skill in results:
    # Assumed result shape: {"name": str, "url": str, "success": bool}
    if skill["success"]:
        name_lower = skill["name"].lower()
        if name_lower in installed_names:
            # Duplicate detected: report the URL that first installed this name
            warn_user(name_lower, installed_names[name_lower])
        else:
            installed_names[name_lower] = skill["url"]
```
## Configuration
No new Valve parameters are required. Existing settings continue to work:
| Parameter | Impact |
|-----------|--------|
| `ALLOW_OVERWRITE_ON_CREATE` | Controls whether duplicate skill names result in updates or errors |
| `TRUSTED_DOMAINS` | Still enforced for all discovered URLs |
| `INSTALL_FETCH_TIMEOUT` | Applies to each GitHub API discovery call |
| `SHOW_STATUS` | Shows all discovery and deduplication messages |
## API Changes
### install_skill() Method
**New Behavior:**
- Automatically converts repo root URLs to tree format
- Auto-discovers all skill subdirectories for tree URLs
- Deduplicates URL list before batch processing
- Tracks duplicate skill names during installation
**Parameters:** (unchanged)
- `url`: Can now be repo root (e.g., `https://github.com/owner/repo`)
- `name`: Ignored in batch/auto-discovery mode
- `overwrite`: Controls behavior on skill name conflicts
- Other parameters remain the same
**Return Value:** (unchanged)
- Single skill: Returns installation metadata
- Batch install: Returns batch summary with success/failure counts
## Error Handling
### Discovery Failures
- If repo root normalization fails → treated as normal URL
- If the tree discovery API call fails → logs a warning and falls back to a single-file install attempt
- If no SKILL.md or README.md found → specific error for that URL
### Batch Failures
- Duplicate URL removal → notifies user but continues
- Individual skill failures → logs error, continues with next skill
- Final summary shows succeeded/failed counts
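The continue-on-failure behavior can be sketched as a loop that records each outcome instead of aborting. This is a minimal illustration under assumptions: `install_batch` and the `install_one` callback are hypothetical names, the result dictionary is an invented shape, and only the summary wording follows the batch output shown in the usage examples.

```python
from typing import Callable, Dict, List


def install_batch(urls: List[str], install_one: Callable[[str], Dict]) -> Dict:
    """Install each URL in turn; a failure is recorded and the loop continues."""
    installed: List[Dict] = []
    errors: List[Dict] = []
    for url in urls:
        try:
            installed.append(install_one(url))
        except Exception as exc:  # keep going with the next skill
            errors.append({"url": url, "error": str(exc)})
    return {
        "succeeded": len(installed),
        "failed": len(errors),
        "errors": errors,
        "summary": (
            f"Batch install completed: {len(installed)} succeeded, "
            f"{len(errors)} failed"
        ),
    }
```

Collecting errors per URL keeps one broken skill from hiding the results of the others, and the final counts come directly from the two lists.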
## Telemetry & Logging
All operations emit status updates:
- ✓ "Detected GitHub repo root: ..."
- ✓ "Removed {count} duplicate URL(s) from batch"
- ⚠️ "Warning: Duplicate skill name '{name}'"
- ✗ "Installation failed for {url}: {reason}"
Check OpenWebUI logs for detailed error traces.
## Testing
Run the included test suite:
```bash
python3 docs/test_auto_discovery.py
```
Test coverage:
- ✓ Repo root URL detection (6 cases)
- ✓ URL normalization for discovery (4 cases)
- ✓ Duplicate removal logic (3 scenarios)
- ✓ Total: 13/13 test cases passing
## Backward Compatibility
**Fully backward compatible.**
- Existing tree URLs work as before
- Existing blob/raw URLs function unchanged
- Existing batch installations unaffected
- New features are automatic (no user action required)
- No breaking changes to API
## Future Enhancements
Possible future improvements:
1. Support for GitLab, Gitea, and other Git platforms
2. Smart branch detection (main → master fallback)
3. Skill filtering by name pattern during auto-discovery
4. Batch installation with conflict resolution strategies
5. Caching of discovery results to reduce API calls