diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index fa0b4e5..c3bd62e 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -35,38 +35,71 @@ plugins/actions/export_to_docx/
 
 所有插件 README 必须遵循以下统一结构顺序:
 
-1. **标题 (Title)**: 插件名称
-2. **元数据 (Metadata)**: 作者、版本、许可证、项目链接 (一行显示)
-   - 格式: `**Author:** [Name](Link) | **Version:** x.x.x | **Project:** [Link](Link)`
-3. **描述 (Description)**: 简短的功能介绍
+1. **标题 (Title)**: 插件名称,带 Emoji 图标
+2. **元数据 (Metadata)**: 作者、版本、项目链接 (一行显示)
+   - 格式: `**Author:** [Fu-Jie](https://github.com/Fu-Jie) | **Version:** x.x.x | **Project:** [Awesome OpenWebUI](https://github.com/Fu-Jie/awesome-openwebui)`
+   - **注意**: Author 和 Project 为固定值,仅需更新 Version 版本号
+3. **描述 (Description)**: 一句话功能介绍
 4. **最新更新 (What's New)**: **必须**放在描述之后,显著展示最新版本的变更点
-5. **核心特性 (Key Features)**
-6. **使用方法 (Usage)**
-7. **配置参数 (Configuration/Valves)**
-8. **其他 (Others)**: 故障排除、示例等
+5. **核心特性 (Key Features)**: 使用 Emoji + 粗体标题 + 描述格式
+6. **使用方法 (How to Use)**: 按步骤说明
+7. **配置参数 (Configuration/Valves)**: 使用表格格式,包含参数名、默认值、描述
+8. **其他 (Others)**: 支持的模板类型、语法示例、故障排除等
 
-示例 (Example):
+完整示例 (Full Example):
 
 ```markdown
 # 📊 Smart Plugin
 
 **Author:** [Fu-Jie](https://github.com/Fu-Jie) | **Version:** 1.0.0 | **Project:** [Awesome OpenWebUI](https://github.com/Fu-Jie/awesome-openwebui)
 
-A powerful plugin for OpenWebUI.
+A one-sentence description of this plugin.
 
 ## 🔥 What's New in v1.0.0
 
-- Feature A
-- Feature B
+- ✨ **Feature Name**: Brief description of the feature.
+- 🔧 **Configuration Change**: What changed in settings.
+- 🐛 **Bug Fix**: What was fixed.
 
-## ✨ Features
-...
+## ✨ Key Features
+
+- 🚀 **Feature A**: Description of feature A.
+- 🎨 **Feature B**: Description of feature B.
+- 📥 **Feature C**: Description of feature C.
+
+## 🚀 How to Use
+
+1. **Install**: Search for "Plugin Name" in the Open WebUI Community and install.
+2. **Trigger**: Enter your text in the chat, then click the **Action Button**.
+3. **Result**: View the generated result.
+
+## ⚙️ Configuration (Valves)
+
+| Parameter | Default | Description |
+| :--- | :--- | :--- |
+| **Show Status (SHOW_STATUS)** | `True` | Whether to show status updates. |
+| **Model ID (MODEL_ID)** | `Empty` | LLM model for processing. |
+| **Output Mode (OUTPUT_MODE)** | `image` | `image` for static, `html` for interactive. |
+
+## 🛠️ Supported Types (Optional)
+
+| Category | Type Name | Use Case |
+| :--- | :--- | :--- |
+| **Category A** | `type-a`, `type-b` | Use case description |
+
+## 📝 Advanced Example (Optional)
+
+\`\`\`syntax
+example code or syntax here
+\`\`\`
 ```
 
 ### 文档内容要求 (Content Requirements)
 
-- **新增功能**: 必须在 "What's New" 章节中明确列出。
+- **新增功能**: 必须在 "What's New" 章节中明确列出,使用 Emoji + 粗体标题格式。
 - **双语**: 必须提供 `README.md` (英文) 和 `README_CN.md` (中文)。
+- **表格对齐**: 配置参数表格使用左对齐 `:---`。
+- **Emoji 规范**: 标题使用合适的 Emoji 增强可读性。
 
 ### 官方文档 (Official Documentation)
 
@@ -508,7 +541,164 @@ Base = declarative_base()
 
 ---
 
-## 🔧 代码规范 (Code Style)
+## 📂 文件存储访问规范 (File Storage Access)
+
+OpenWebUI 支持多种文件存储后端(本地磁盘、S3/MinIO 对象存储等)。插件在访问用户上传的文件或生成的图片时,必须实现多级回退机制以兼容所有存储配置。
+
+### 存储类型检测 (Storage Type Detection)
+
+通过 `Files.get_file_by_id()` 获取的文件对象,其 `path` 属性决定了存储位置:
+
+| Path 格式 | 存储类型 | 访问方式 |
+|-----------|----------|----------|
+| `s3://bucket/key` | S3/MinIO 对象存储 | boto3 直连或 API 回调 |
+| `/app/backend/data/...` | Docker 卷存储 | 本地文件系统读取 |
+| `./uploads/...` | 本地相对路径 | 本地文件系统读取 |
+| `gs://bucket/key` | Google Cloud Storage | API 回调 |
+
+### 多级回退机制 (Multi-level Fallback)
+
+推荐实现以下优先级的文件获取策略:
+
+```python
+def _get_file_content(self, file_id: str, max_bytes: int) -> Optional[bytes]:
+    """获取文件内容,支持多种存储后端"""
+    file_obj = Files.get_file_by_id(file_id)
+    if not file_obj:
+        return None
+
+    # 1️⃣ 数据库直接存储 (小文件)
+    data_field = getattr(file_obj, "data", None)
+    if isinstance(data_field, dict):
+        if "bytes" in data_field:
+            return data_field["bytes"]
+        if "base64" in data_field:
+            return base64.b64decode(data_field["base64"])
+
+    # 2️⃣ S3 直连 (对象存储 - 最快)
+    s3_path = getattr(file_obj, "path", None)
+    if isinstance(s3_path, str) and s3_path.startswith("s3://"):
+        data = self._read_from_s3(s3_path, max_bytes)
+        if data:
+            return data
+
+    # 3️⃣ 本地文件系统 (磁盘存储)
+    for attr in ("path", "file_path"):
+        path = getattr(file_obj, attr, None)
+        if path and not path.startswith(("s3://", "gs://", "http")):
+            # 尝试多个常见路径
+            for base in ["", "./data", "/app/backend/data"]:
+                full_path = Path(base) / path if base else Path(path)
+                if full_path.exists():
+                    return full_path.read_bytes()[:max_bytes]
+
+    # 4️⃣ 公共 URL 下载
+    url = getattr(file_obj, "url", None)
+    if url and url.startswith("http"):
+        return self._download_from_url(url, max_bytes)
+
+    # 5️⃣ 内部 API 回调 (通用兜底方案)
+    if self._api_base_url:
+        api_url = f"{self._api_base_url}/api/v1/files/{file_id}/content"
+        return self._download_from_api(api_url, self._api_token, max_bytes)
+
+    return None
+```
+
+### S3 直连实现 (S3 Direct Access)
+
+当检测到 `s3://` 路径时,使用 `boto3` 直接访问对象存储,读取以下环境变量:
+
+| 环境变量 | 说明 | 示例 |
+|----------|------|------|
+| `S3_ENDPOINT_URL` | S3 兼容服务端点 | `https://minio.example.com` |
+| `S3_ACCESS_KEY_ID` | 访问密钥 ID | `minioadmin` |
+| `S3_SECRET_ACCESS_KEY` | 访问密钥 | `minioadmin` |
+| `S3_ADDRESSING_STYLE` | 寻址样式 | `auto`, `path`, `virtual` |
+
+```python
+# S3 直连示例
+import boto3
+from botocore.config import Config as BotoConfig
+import os
+
+def _read_from_s3(self, s3_path: str, max_bytes: int) -> Optional[bytes]:
+    """从 S3 直接读取文件 (比 API 回调更快)"""
+    if not s3_path.startswith("s3://"):
+        return None
+
+    # 解析 s3://bucket/key
+    parts = s3_path[5:].split("/", 1)
+    bucket, key = parts[0], parts[1]
+
+    # 从环境变量读取配置
+    endpoint = os.environ.get("S3_ENDPOINT_URL")
+    access_key = os.environ.get("S3_ACCESS_KEY_ID")
+    secret_key = os.environ.get("S3_SECRET_ACCESS_KEY")
+
+    if not all([endpoint, access_key, secret_key]):
+        return None  # 回退到 API 方式
+
+    s3_client = boto3.client(
+        "s3",
+        endpoint_url=endpoint,
+        aws_access_key_id=access_key,
+        aws_secret_access_key=secret_key,
+        config=BotoConfig(s3={"addressing_style": os.environ.get("S3_ADDRESSING_STYLE", "auto")})
+    )
+
+    response = s3_client.get_object(Bucket=bucket, Key=key)
+    return response["Body"].read(max_bytes)
+```
+
+### API 回调实现 (API Fallback)
+
+当其他方式失败时,通过 OpenWebUI 内部 API 获取文件:
+
+```python
+def _download_from_api(self, api_url: str, token: str, max_bytes: int) -> Optional[bytes]:
+    """通过 OpenWebUI API 获取文件内容"""
+    import urllib.request
+
+    headers = {"User-Agent": "OpenWebUI-Plugin"}
+    if token:
+        headers["Authorization"] = token
+
+    req = urllib.request.Request(api_url, headers=headers)
+    with urllib.request.urlopen(req, timeout=15) as response:
+        if 200 <= response.status < 300:
+            return response.read(max_bytes)
+    return None
+```
+
+### 获取 API 上下文 (API Context Extraction)
+
+在 `action()` 方法中捕获请求上下文,用于 API 回调:
+
+```python
+async def action(self, body: dict, __request__=None, ...):
+    # 从请求对象获取 API 凭证
+    if __request__:
+        self._api_token = __request__.headers.get("Authorization")
+        self._api_base_url = str(__request__.base_url).rstrip("/")
+    else:
+        # 从环境变量获取端口作为备用
+        port = os.environ.get("PORT") or "8080"
+        self._api_base_url = f"http://localhost:{port}"
+        self._api_token = None
+```
+
+### 性能对比 (Performance Comparison)
+
+| 方式 | 网络跳数 | 适用场景 |
+|------|----------|----------|
+| S3 直连 | 1 (插件 → S3) | 对象存储,最快 |
+| 本地文件 | 0 | 磁盘存储,最快 |
+| API 回调 | 2 (插件 → OpenWebUI → S3/磁盘) | 通用兜底 |
+
+### 参考实现 (Reference Implementation)
+
+- `plugins/actions/export_to_docx/export_to_word.py` - `_image_bytes_from_owui_file_id` 方法
 
 ### Python 规范
 
diff --git a/docs/plugins/actions/export-to-word.md b/docs/plugins/actions/export-to-word.md
index 1350eb9..66cacdb 100644
--- a/docs/plugins/actions/export-to-word.md
+++ b/docs/plugins/actions/export-to-word.md
@@ -1,7 +1,7 @@
 # Export to Word
 
 Action
-v0.4.1
+v0.4.2
 
 Export conversation to Word (.docx) with **syntax highlighting**, **native math equations**, **Mermaid diagrams**, **citations**, and **enhanced table formatting**.
 
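Review note on the `_read_from_s3` and `_get_file_content` examples added to `.github/copilot-instructions.md`: the documented S3 example unpacks `parts[1]` without checking that the path contains a key, so a path such as `s3://bucket` would raise `IndexError`. The documented local read also slices `read_bytes()[:max_bytes]`, which loads the whole file and silently truncates it. The plugin's own implementation in `export_to_word.py` guards both cases (`len(parts) < 2`, and `read(max_bytes + 1)` followed by a size check). The sketch below shows one way the documentation example could mirror that behaviour. The helper names `parse_s3_path` and `read_limited` are hypothetical and used only for illustration; they are not part of the plugin or of Open WebUI.

```python
from pathlib import Path
from typing import Optional, Tuple


def parse_s3_path(s3_path: str) -> Optional[Tuple[str, str]]:
    """Split 's3://bucket/key' into (bucket, key); return None if malformed."""
    prefix = "s3://"
    if not s3_path.startswith(prefix):
        return None
    # partition() never raises: a missing "/" simply yields an empty key.
    bucket, _, key = s3_path[len(prefix):].partition("/")
    if not bucket or not key:
        return None
    return bucket, key


def read_limited(path: Path, max_bytes: int) -> Optional[bytes]:
    """Read a local file, rejecting (not truncating) anything over max_bytes."""
    try:
        with path.open("rb") as fh:
            # Read one extra byte so an oversized file can be detected.
            data = fh.read(max_bytes + 1)
    except OSError:
        return None
    return data if len(data) <= max_bytes else None
```

Returning `None` for an oversized file, rather than a truncated prefix, seems the safer choice for images: a cut-off PNG would likely fail later when embedded in the document, whereas `None` lets the caller move on to the next fallback level.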
diff --git a/docs/plugins/actions/export-to-word.zh.md b/docs/plugins/actions/export-to-word.zh.md index e9ffe07..8d0b7cc 100644 --- a/docs/plugins/actions/export-to-word.zh.md +++ b/docs/plugins/actions/export-to-word.zh.md @@ -1,7 +1,7 @@ # Export to Word(导出为 Word) Action -v0.4.1 +v0.4.2 将当前对话导出为完美格式的 Word 文档,支持**代码语法高亮**、**原生数学公式**、**Mermaid 图表**、**引用资料**以及**增强表格**渲染。 diff --git a/docs/plugins/actions/index.md b/docs/plugins/actions/index.md index ee452c6..3ee89fa 100644 --- a/docs/plugins/actions/index.md +++ b/docs/plugins/actions/index.md @@ -63,7 +63,7 @@ Actions are interactive plugins that: Export the current conversation to a formatted Word doc with **syntax highlighting**, **native math equations**, **Mermaid diagrams**, **citations**, and **enhanced table formatting**. - **Version:** 0.4.1 + **Version:** 0.4.2 [:octicons-arrow-right-24: Documentation](export-to-word.md) diff --git a/docs/plugins/actions/index.zh.md b/docs/plugins/actions/index.zh.md index 66c932f..4b145a1 100644 --- a/docs/plugins/actions/index.zh.md +++ b/docs/plugins/actions/index.zh.md @@ -63,7 +63,7 @@ Actions 是交互式插件,能够: 将当前对话导出为完美格式的 Word 文档,支持**代码语法高亮**、**原生数学公式**、**Mermaid 图表**、**引用资料**以及**增强表格**渲染。 - **版本:** 0.4.1 + **版本:** 0.4.2 [:octicons-arrow-right-24: 查看文档](export-to-word.md) diff --git a/plugins/actions/export_to_docx/README.md b/plugins/actions/export_to_docx/README.md index eaed66a..a115952 100644 --- a/plugins/actions/export_to_docx/README.md +++ b/plugins/actions/export_to_docx/README.md @@ -1,130 +1,88 @@ -# Export to Word +# 📝 Export to Word (Enhanced) + +**Author:** [Fu-Jie](https://github.com/Fu-Jie) | **Version:** 0.4.2 | **Project:** [Awesome OpenWebUI](https://github.com/Fu-Jie/awesome-openwebui) Export conversation to Word (.docx) with **syntax highlighting**, **native math equations**, **Mermaid diagrams**, **citations**, and **enhanced table formatting**. 
-## Features +## 🔥 What's New in v0.4.2 -- **One-Click Export**: Adds an "Export to Word" action button to the chat. -- **Markdown Conversion**: Converts Markdown syntax to Word formatting (headings, bold, italic, code, tables, lists). -- **Syntax Highlighting**: Code blocks are highlighted with Pygments (supports 500+ languages). -- **Native Math Equations**: LaTeX math (`$$...$$`, `\[...\]`, `$...$`, `\(...\)`) converted to editable Word equations. -- **Mermaid Diagrams**: Mermaid flowcharts and sequence diagrams rendered as images in the document. -- **Citations & References**: Auto-generates a References section from OpenWebUI sources with clickable citation links. -- **Reasoning Stripping**: Automatically removes AI thinking blocks (``, ``) from exports. -- **Enhanced Tables**: Smart column widths, column alignment (`:---`, `---:`, `:---:`), header row repeat across pages. -- **Blockquote Support**: Markdown blockquotes are rendered with left border and gray styling. -- **Multi-language Support**: Properly handles both Chinese and English text. -- **Smarter Filenames**: Configurable title source (Chat Title, AI Generated, or Markdown Title). +- ✨ **S3 Object Storage Support**: Direct access to images stored in S3/MinIO via boto3, bypassing API layer for faster exports. +- 🔧 **Multi-level File Fallback**: 6-level fallback mechanism for file retrieval (DB → S3 → Local → URL → API → Attributes). +- 🛡️ **Improved Error Handling**: Better logging and error messages for file retrieval failures. -## Configuration +## ✨ Key Features -You can configure the following settings via the **Valves** button in the plugin settings: +- 🚀 **One-Click Export**: Adds an "Export to Word" action button to the chat. +- 📄 **Markdown Conversion**: Full Markdown syntax support (headings, bold, italic, code, tables, lists). +- 🎨 **Syntax Highlighting**: Code blocks highlighted with Pygments (500+ languages). 
+- 🔢 **Native Math Equations**: LaTeX math (`$$...$$`, `\[...\]`, `$...$`) converted to editable Word equations. +- 📊 **Mermaid Diagrams**: Flowcharts and sequence diagrams rendered as images. +- 📚 **Citations & References**: Auto-generates References section with clickable citation links. +- 🧹 **Reasoning Stripping**: Automatically removes AI thinking blocks (``, ``). +- 📋 **Enhanced Tables**: Smart column widths, alignment, header row repeat across pages. +- 💬 **Blockquote Support**: Markdown blockquotes with left border and gray styling. +- 🌐 **Multi-language Support**: Proper handling of Chinese and English text. -- **TITLE_SOURCE**: Choose how the document title/filename is generated. - - `chat_title`: Use the conversation title (default). - - `ai_generated`: Use AI to generate a short title based on the content. - - `markdown_title`: Extract the first h1/h2 heading from the Markdown content. -- **MAX_EMBED_IMAGE_MB**: Maximum image size to embed into DOCX (MB). Default: `20`. -- **UI_LANGUAGE**: User interface language, supports `en` (English) and `zh` (Chinese). Default: `en`. -- **FONT_LATIN**: Font name for Latin characters. Default: `Times New Roman`. -- **FONT_ASIAN**: Font name for Asian characters. Default: `SimSun`. -- **FONT_CODE**: Font name for code blocks. Default: `Consolas`. -- **TABLE_HEADER_COLOR**: Table header background color (Hex without #). Default: `F2F2F2`. -- **TABLE_ZEBRA_COLOR**: Table alternating row background color (Hex without #). Default: `FBFBFB`. -- **MERMAID_JS_URL**: URL for the Mermaid.js library. -- **MERMAID_JSZIP_URL**: URL for the JSZip library (required for DOCX manipulation). -- **MERMAID_PNG_SCALE**: Scale factor for Mermaid PNG generation (Resolution). Default: `3.0`. -- **MERMAID_DISPLAY_SCALE**: Scale factor for Mermaid visual size in Word. Default: `1.0`. -- **MERMAID_OPTIMIZE_LAYOUT**: Automatically convert LR (Left-Right) flowcharts to TD (Top-Down). Default: `False`. 
-- **MERMAID_BACKGROUND**: Background color for Mermaid diagrams (e.g., `white`, `transparent`). Default: `transparent`. -- **MERMAID_CAPTIONS_ENABLE**: Enable/disable figure captions for Mermaid diagrams. Default: `True`. -- **MERMAID_CAPTION_STYLE**: Paragraph style name for Mermaid captions. Default: `Caption`. -- **MERMAID_CAPTION_PREFIX**: Caption prefix label (e.g., 'Figure'). Empty = auto-detect based on language. -- **MATH_ENABLE**: Enable LaTeX math block conversion (`\[...\]` and `$$...$$`). Default: `True`. -- **MATH_INLINE_DOLLAR_ENABLE**: Enable inline `$ ... $` math conversion. Default: `True`. +## 🚀 How to Use -## Supported Markdown Syntax +1. **Install**: Search for "Export to Word" in the Open WebUI Community and install. +2. **Trigger**: In any chat, click the "Export to Word" action button. +3. **Download**: The .docx file will be automatically downloaded. -| Syntax | Word Result | -| :---------------------------------- | :------------------------------------ | -| `# Heading 1` to `###### Heading 6` | Heading levels 1-6 | -| `**bold**` or `__bold__` | Bold text | -| `*italic*` or `_italic_` | Italic text | -| `***bold italic***` | Bold + Italic | -| `` `inline code` `` | Monospace with gray background | -| ` ``` code block ``` ` | **Syntax highlighted** code block | -| `> blockquote` | Left-bordered gray italic text | -| `[link](url)` | Blue underlined link text | -| `~~strikethrough~~` | Strikethrough text | -| `- item` or `* item` | Bullet list | -| `1. item` | Numbered list | -| Markdown tables | **Enhanced table** with smart widths | -| `---` or `***` | Horizontal rule | -| `$$LaTeX$$` or `\[LaTeX\]` | **Native Word equation** (display) | -| `$LaTeX$` or `\(LaTeX\)` | **Native Word equation** (inline) | -| ` ```mermaid ... 
``` ` | **Mermaid diagram** as image | -| `[1]` citation markers | **Clickable links** to References | +## ⚙️ Configuration (Valves) -## Usage +| Parameter | Default | Description | +| :--- | :--- | :--- | +| **Title Source (TITLE_SOURCE)** | `chat_title` | `chat_title`, `ai_generated`, or `markdown_title` | +| **Max Image Size (MAX_EMBED_IMAGE_MB)** | `20` | Maximum image size to embed (MB) | +| **UI Language (UI_LANGUAGE)** | `en` | `en` (English) or `zh` (Chinese) | +| **Latin Font (FONT_LATIN)** | `Times New Roman` | Font for Latin characters | +| **Asian Font (FONT_ASIAN)** | `SimSun` | Font for Asian characters | +| **Code Font (FONT_CODE)** | `Consolas` | Font for code blocks | +| **Table Header Color** | `F2F2F2` | Header background color (hex) | +| **Table Zebra Color** | `FBFBFB` | Alternating row color (hex) | +| **Mermaid PNG Scale** | `3.0` | Resolution multiplier for Mermaid images | +| **Math Enable** | `True` | Enable LaTeX math conversion | -1. Install the plugin. -2. In any chat, click the "Export to Word" button. -3. The .docx file will be automatically downloaded to your device. +## 🛠️ Supported Markdown Syntax -## Requirements +| Syntax | Word Result | +| :--- | :--- | +| `# Heading 1` to `###### Heading 6` | Heading levels 1-6 | +| `**bold**` or `__bold__` | Bold text | +| `*italic*` or `_italic_` | Italic text | +| `` `inline code` `` | Monospace with gray background | +| ` ``` code block ``` ` | **Syntax highlighted** code block | +| `> blockquote` | Left-bordered gray italic text | +| `[link](url)` | Blue underlined link | +| `~~strikethrough~~` | Strikethrough text | +| `- item` or `* item` | Bullet list | +| `1. item` | Numbered list | +| Markdown tables | **Enhanced table** with smart widths | +| `$$LaTeX$$` or `\[LaTeX\]` | **Native Word equation** (display) | +| `$LaTeX$` or `\(LaTeX\)` | **Native Word equation** (inline) | +| ` ```mermaid ... 
``` ` | **Mermaid diagram** as image | +| `[1]` citation markers | **Clickable links** to References | + +## 📦 Requirements - `python-docx==1.1.2` - Word document generation - `Pygments>=2.15.0` - Syntax highlighting - `latex2mathml` - LaTeX to MathML conversion - `mathml2omml` - MathML to Office Math (OMML) conversion -All dependencies are declared in the plugin docstring. +## 📝 Changelog -## Font Configuration +### v0.4.2 +- **S3 Object Storage**: Direct S3/MinIO access via boto3 for faster image retrieval. +- **6-Level Fallback**: Robust file retrieval: DB → S3 → Local → URL → API → Attributes. +- **Better Logging**: Improved error messages for debugging file access issues. -- **English Text**: Times New Roman -- **Chinese Text**: SimSun (宋体) for body, SimHei (黑体) for headings -- **Code**: Consolas - -## Changelog +### v0.4.1 +- **Chinese Parameter Names**: Localized configuration names for Chinese version. ### v0.4.0 - -- **Multi-language Support**: Added UI language switching (English/Chinese) with localized messages. -- **Font & Style Configuration**: Customizable fonts for Latin/Asian text and code, plus table colors. -- **Mermaid Enhancements**: - - Hybrid client-side rendering (SVG+PNG) for better clarity and compatibility. - - Configurable background color, fixing issues in dark mode. - - Added error boundaries to prevent export failures on render errors. -- **Performance**: Real-time progress updates for large document exports. -- **Bug Fixes**: - - Fixed parsing errors in Markdown tables containing code blocks or links. - - Fixed parsing issues with underscores (`_`), asterisks (`*`), and tildes (`~`) used as long separators. - - Enhanced error handling for image embedding. - -### v0.3.0 - -- **Mermaid Diagrams**: Native support for rendering Mermaid diagrams as images in Word. -- **Native Math**: Converts LaTeX equations to native Office MathML for editable equations. -- **Citations**: Automatic bibliography generation and citation linking. 
-- **Reasoning Removal**: Option to strip `` blocks from the output. -- **Table Enhancements**: Improved table formatting with smart column widths. - -### v0.2.0 -- Added native math equation support (LaTeX → OMML) -- Added Mermaid diagram rendering -- Added citations and references section generation -- Added automatic reasoning block stripping -- Enhanced table formatting with smart column widths and alignment - -### v0.1.1 -- Initial release with basic Markdown to Word conversion - -## Author - -Fu-Jie -GitHub: [Fu-Jie/awesome-openwebui](https://github.com/Fu-Jie/awesome-openwebui) - -## License - -MIT License +- **Multi-language Support**: UI language switching (English/Chinese). +- **Font & Style Configuration**: Customizable fonts and table colors. +- **Mermaid Enhancements**: Hybrid SVG+PNG rendering, background color config. +- **Performance**: Real-time progress updates for large exports. diff --git a/plugins/actions/export_to_docx/README_CN.md b/plugins/actions/export_to_docx/README_CN.md index 1e7a4ae..4a8ceea 100644 --- a/plugins/actions/export_to_docx/README_CN.md +++ b/plugins/actions/export_to_docx/README_CN.md @@ -1,134 +1,88 @@ -# 导出为 Word +# 📝 导出为 Word (增强版) + +**Author:** [Fu-Jie](https://github.com/Fu-Jie) | **Version:** 0.4.2 | **Project:** [Awesome OpenWebUI](https://github.com/Fu-Jie/awesome-openwebui) 将对话导出为 Word (.docx),支持**代码语法高亮**、**原生数学公式**、**Mermaid 图表**、**引用参考**和**增强表格格式**。 -## 功能特点 +## 🔥 v0.4.2 更新内容 -- **一键导出**:在聊天界面添加"导出为 Word"动作按钮。 -- **Markdown 转换**:将 Markdown 语法转换为 Word 格式(标题、粗体、斜体、代码、表格、列表)。 -- **代码语法高亮**:使用 Pygments 库为代码块添加语法高亮(支持 500+ 种语言)。 -- **原生数学公式**:LaTeX 公式(`$$...$$`、`\[...\]`、`$...$`、`\(...\)`)转换为可编辑的 Word 公式。 -- **Mermaid 图表**:Mermaid 流程图和时序图渲染为文档中的图片。 -- **引用与参考**:自动从 OpenWebUI 来源生成参考资料章节,支持可点击的引用链接。 -- **移除思考过程**:自动移除 AI 思考块(``、``)。 -- **增强表格**:智能列宽、列对齐(`:---`、`---:`、`:---:`)、表头跨页重复。 -- **引用块支持**:Markdown 引用块渲染为带左侧边框的灰色斜体样式。 -- **多语言支持**:正确处理中文和英文文本,无乱码问题。 -- **智能文件名**:可配置标题来源(对话标题、AI 生成或 Markdown 标题)。 +- ✨ **S3 
对象存储支持**: 通过 boto3 直连 S3/MinIO,绕过 API 层,导出速度更快。 +- 🔧 **多级文件回退**: 6 级文件获取机制(数据库 → S3 → 本地 → URL → API → 属性)。 +- 🛡️ **错误处理优化**: 更完善的日志记录和错误提示,便于调试文件访问问题。 -## 配置 +## ✨ 核心特性 -您可以通过插件设置中的 **Valves** 按钮配置以下选项: +- 🚀 **一键导出**: 在聊天界面添加"导出为 Word"动作按钮。 +- 📄 **Markdown 转换**: 完整支持 Markdown 语法(标题、粗体、斜体、代码、表格、列表)。 +- 🎨 **代码语法高亮**: 使用 Pygments 库高亮代码块(支持 500+ 种语言)。 +- 🔢 **原生数学公式**: LaTeX 公式(`$$...$$`、`\[...\]`、`$...$`)转换为可编辑的 Word 公式。 +- 📊 **Mermaid 图表**: 流程图和时序图渲染为文档中的图片。 +- 📚 **引用与参考**: 自动生成参考资料章节,支持可点击的引用链接。 +- 🧹 **移除思考过程**: 自动移除 AI 思考块(``、``)。 +- 📋 **增强表格**: 智能列宽、对齐、表头跨页重复。 +- 💬 **引用块支持**: Markdown 引用块渲染为带左侧边框的灰色斜体样式。 +- 🌐 **多语言支持**: 正确处理中文和英文文本。 -- **文档标题来源**:选择文档标题/文件名的生成方式。 - - `chat_title`:使用对话标题(默认)。 - - `ai_generated`:使用 AI 根据内容生成简短标题。 - - `markdown_title`:从 Markdown 内容中提取第一个一级或二级标题。 -- **最大嵌入图片大小MB**:嵌入图片的最大大小 (MB)。默认:`20`。 -- **界面语言**:界面语言,支持 `en` (英语) 和 `zh` (中文)。默认:`zh`。 -- **英文字体**:英文字体名称。默认:`Calibri`。 -- **中文字体**:中文字体名称。默认:`SimSun`。 -- **代码字体**:代码字体名称。默认:`Consolas`。 -- **表头背景色**:表头背景色(十六进制,不带#)。默认:`F2F2F2`。 -- **表格隔行背景色**:表格隔行背景色(十六进制,不带#)。默认:`FBFBFB`。 -- **Mermaid_JS地址**:Mermaid.js 库的 URL。 -- **JSZip库地址**:JSZip 库的 URL(用于 DOCX 操作)。 -- **Mermaid_PNG缩放比例**:Mermaid PNG 生成缩放比例(分辨率)。默认:`3.0`。 -- **Mermaid显示比例**:Mermaid 在 Word 中的显示比例(视觉大小)。默认:`1.0`。 -- **Mermaid布局优化**:自动将 LR(左右)流程图转换为 TD(上下)。默认:`False`。 -- **Mermaid背景色**:Mermaid 图表背景色(如 `white`, `transparent`)。默认:`transparent`。 -- **启用Mermaid图注**:启用/禁用 Mermaid 图表的图注。默认:`True`。 -- **Mermaid图注样式**:Mermaid 图注的段落样式名称。默认:`Caption`。 -- **Mermaid图注前缀**:图注前缀(如 '图')。留空则根据语言自动检测。 -- **启用数学公式**:启用 LaTeX 数学公式块转换(`\[...\]` 和 `$$...$$`)。默认:`True`。 -- **启用行内公式**:启用行内 `$ ... $` 数学公式转换。默认:`True`。 +## 🚀 使用方法 -## 支持的 Markdown 语法 +1. **安装**: 在 Open WebUI 社区搜索 "导出为 Word" 并安装。 +2. **触发**: 在任意对话中,点击"导出为 Word"动作按钮。 +3. 
**下载**: .docx 文件将自动下载到你的设备。 -| 语法 | Word 效果 | -| :---------------------------- | :-------------------------------- | -| `# 标题1` 到 `###### 标题6` | 标题级别 1-6 | -| `**粗体**` 或 `__粗体__` | 粗体文本 | -| `*斜体*` 或 `_斜体_` | 斜体文本 | -| `***粗斜体***` | 粗体 + 斜体 | -| `` `行内代码` `` | 等宽字体 + 灰色背景 | -| ` ``` 代码块 ``` ` | **语法高亮**的代码块 | -| `> 引用文本` | 带左侧边框的灰色斜体文本 | -| `[链接](url)` | 蓝色下划线链接文本 | -| `~~删除线~~` | 删除线文本 | -| `- 项目` 或 `* 项目` | 无序列表 | -| `1. 项目` | 有序列表 | -| Markdown 表格 | **增强表格**(智能列宽) | -| `---` 或 `***` | 水平分割线 | -| `$$LaTeX$$` 或 `\[LaTeX\]` | **原生 Word 公式**(块级) | -| `$LaTeX$` 或 `\(LaTeX\)` | **原生 Word 公式**(行内) | -| ` ```mermaid ... ``` ` | **Mermaid 图表**(图片形式) | -| `[1]` 引用标记 | **可点击链接**到参考资料 | +## ⚙️ 配置参数 (Valves) -## 使用方法 +| 参数 | 默认值 | 说明 | +| :--- | :--- | :--- | +| **文档标题来源** | `chat_title` | `chat_title`(对话标题)、`ai_generated`(AI 生成)、`markdown_title`(Markdown 标题)| +| **最大嵌入图片大小MB** | `20` | 嵌入图片的最大大小 (MB) | +| **界面语言** | `zh` | `en`(英语)或 `zh`(中文)| +| **英文字体** | `Calibri` | 英文字体名称 | +| **中文字体** | `SimSun` | 中文字体名称 | +| **代码字体** | `Consolas` | 代码块字体名称 | +| **表头背景色** | `F2F2F2` | 表头背景色(十六进制)| +| **表格隔行背景色** | `FBFBFB` | 表格隔行背景色(十六进制)| +| **Mermaid_PNG缩放比例** | `3.0` | Mermaid 图片分辨率倍数 | +| **启用数学公式** | `True` | 启用 LaTeX 公式转换 | -1. 安装插件。 -2. 在任意对话中,点击"导出为 Word"按钮。 -3. .docx 文件将自动下载到你的设备。 +## 🛠️ 支持的 Markdown 语法 -## 依赖 +| 语法 | Word 效果 | +| :--- | :--- | +| `# 标题1` 到 `###### 标题6` | 标题级别 1-6 | +| `**粗体**` 或 `__粗体__` | 粗体文本 | +| `*斜体*` 或 `_斜体_` | 斜体文本 | +| `` `行内代码` `` | 等宽字体 + 灰色背景 | +| ` ``` 代码块 ``` ` | **语法高亮**的代码块 | +| `> 引用文本` | 带左侧边框的灰色斜体文本 | +| `[链接](url)` | 蓝色下划线链接文本 | +| `~~删除线~~` | 删除线文本 | +| `- 项目` 或 `* 项目` | 无序列表 | +| `1. 项目` | 有序列表 | +| Markdown 表格 | **增强表格**(智能列宽)| +| `$$LaTeX$$` 或 `\[LaTeX\]` | **原生 Word 公式**(块级)| +| `$LaTeX$` 或 `\(LaTeX\)` | **原生 Word 公式**(行内)| +| ` ```mermaid ... 
``` ` | **Mermaid 图表**(图片形式)| +| `[1]` 引用标记 | **可点击链接**到参考资料 | + +## 📦 依赖 - `python-docx==1.1.2` - Word 文档生成 - `Pygments>=2.15.0` - 语法高亮 - `latex2mathml` - LaTeX 转 MathML - `mathml2omml` - MathML 转 Office Math (OMML) -所有依赖已在插件文档字符串中声明。 +## 📝 更新日志 -## 字体配置 - -- **英文文本**:Times New Roman -- **中文文本**:宋体(正文)、黑体(标题) -- **代码**:Consolas - -## 更新日志 +### v0.4.2 +- **S3 对象存储**: 通过 boto3 直连 S3/MinIO,图片获取速度更快。 +- **6 级回退机制**: 稳健的文件获取:数据库 → S3 → 本地 → URL → API → 属性。 +- **日志优化**: 改进错误提示,便于调试文件访问问题。 ### v0.4.1 - -- **中文参数名**: 将插件配置项名称和描述全部汉化,提升中文用户体验。 +- **中文参数名**: 配置项名称和描述全部汉化。 ### v0.4.0 - -- **多语言支持**: 新增界面语言切换(中文/英文),提示信息更友好。 +- **多语言支持**: 界面语言切换(中文/英文)。 - **字体与样式配置**: 支持自定义中英文字体、代码字体以及表格颜色。 -- **Mermaid 增强**: - - 客户端混合渲染(SVG+PNG),提高清晰度与兼容性。 - - 支持背景色配置,修复深色模式下的显示问题。 - - 增加错误边界,渲染失败时显示提示而非中断导出。 +- **Mermaid 增强**: 混合 SVG+PNG 渲染,支持背景色配置。 - **性能优化**: 导出大型文档时提供实时进度反馈。 -- **Bug 修复**: - - 修复 Markdown 表格中包含代码块或链接时的解析错误。 - - 修复下划线(`_`)、星号(`*`)、波浪号(`~`)作为长分隔符时的解析问题。 - - 增强图片嵌入的错误处理。 - -### v0.3.0 - -- **Mermaid 图表**: 原生支持将 Mermaid 图表渲染为 Word 中的图片。 -- **原生公式**: 将 LaTeX 公式转换为原生 Office MathML,支持在 Word 中编辑。 -- **引用参考**: 自动生成参考文献列表并链接引用。 -- **移除推理**: 选项支持从输出中移除 `` 推理块。 -- **表格增强**: 改进表格格式,支持智能列宽。 - -### v0.2.0 -- 新增原生数学公式支持(LaTeX → OMML) -- 新增 Mermaid 图表渲染 -- 新增引用与参考资料章节生成 -- 新增自动移除 AI 思考块 -- 增强表格格式(智能列宽、对齐) - -### v0.1.1 -- 初始版本,支持基本 Markdown 转 Word - -## 作者 - -Fu-Jie -GitHub: [Fu-Jie/awesome-openwebui](https://github.com/Fu-Jie/awesome-openwebui) - -## 许可证 - -MIT License diff --git a/plugins/actions/export_to_docx/export_to_word.py b/plugins/actions/export_to_docx/export_to_word.py index 2a98df9..7b606af 100644 --- a/plugins/actions/export_to_docx/export_to_word.py +++ b/plugins/actions/export_to_docx/export_to_word.py @@ -3,7 +3,7 @@ title: Export to Word (Enhanced) author: Fu-Jie author_url: https://github.com/Fu-Jie funding_url: https://github.com/Fu-Jie/awesome-openwebui -version: 0.4.1 +version: 0.4.2 icon_url: 
data:image/svg+xml;base64,PHN2ZwogIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIKICB3aWR0aD0iMjQiCiAgaGVpZ2h0PSIyNCIKICB2aWV3Qm94PSIwIDAgMjQgMjQiCiAgZmlsbD0ibm9uZSIKICBzdHJva2U9ImN1cnJlbnRDb2xvciIKICBzdHJva2Utd2lkdGg9IjIiCiAgc3Ryb2tlLWxpbmVjYXA9InJvdW5kIgogIHN0cm9rZS1saW5lam9pbj0icm91bmQiCj4KICA8cGF0aCBkPSJNNiAyMmEyIDIgMCAwIDEtMi0yVjRhMiAyIDAgMCAxIDItMmg4YTIuNCAyLjQgMCAwIDEgMS43MDQuNzA2bDMuNTg4IDMuNTg4QTIuNCAyLjQgMCAwIDEgMjAgOHYxMmEyIDIgMCAwIDEtMiAyeiIgLz4KICA8cGF0aCBkPSJNMTQgMnY1YTEgMSAwIDAgMCAxIDFoNSIgLz4KICA8cGF0aCBkPSJNMTAgOUg4IiAvPgogIDxwYXRoIGQ9Ik0xNiAxM0g4IiAvPgogIDxwYXRoIGQ9Ik0xNiAxN0g4IiAvPgo8L3N2Zz4K requirements: python-docx, Pygments, latex2mathml, mathml2omml description: Export current conversation from Markdown to Word (.docx) with Mermaid diagrams rendered client-side (Mermaid.js, SVG+PNG), LaTeX math, real hyperlinks, improved tables, syntax highlighting, and blockquote support. @@ -65,6 +65,16 @@ try: except Exception: LATEX_MATH_AVAILABLE = False +# boto3 for S3 direct access (faster than API fallback) +try: + import boto3 + from botocore.config import Config as BotoConfig + import os + + BOTO3_AVAILABLE = True +except ImportError: + BOTO3_AVAILABLE = False + logging.basicConfig( level=logging.INFO, @@ -290,6 +300,8 @@ class Action: self._bookmark_id_counter: int = 1 self._active_doc: Optional[Document] = None self._user_lang: str = "en" # Will be set per-request + self._api_token: Optional[str] = None + self._api_base_url: Optional[str] = None def _get_lang_key(self, user_language: str) -> str: """Convert user language code to i18n key (e.g., 'zh-CN' -> 'zh', 'en-US' -> 'en').""" @@ -349,6 +361,22 @@ class Action: # Get user language from Valves configuration self._user_lang = self._get_lang_key(self.valves.UI_LANGUAGE) + # Extract API connection info for file fetching (S3/Object Storage support) + def _get_default_base_url() -> str: + port = os.environ.get("PORT") or "8080" + return f"http://localhost:{port}" + + if __request__: + try: + 
self._api_token = __request__.headers.get("Authorization") + self._api_base_url = str(__request__.base_url).rstrip("/") + except Exception: + self._api_token = None + self._api_base_url = _get_default_base_url() + else: + self._api_token = None + self._api_base_url = _get_default_base_url() + if __event_emitter__: last_assistant_message = body["messages"][-1] @@ -1075,19 +1103,85 @@ class Action: b64 = m.group("b64") or "" return self._decode_base64_limited(b64, max_bytes) + def _read_from_s3(self, s3_path: str, max_bytes: int) -> Optional[bytes]: + """Read file directly from S3 using environment variables for credentials.""" + if not BOTO3_AVAILABLE: + return None + + # Parse s3://bucket/key + if not s3_path.startswith("s3://"): + return None + + path_without_prefix = s3_path[5:] # Remove 's3://' + parts = path_without_prefix.split("/", 1) + if len(parts) < 2: + return None + + bucket = parts[0] + key = parts[1] + + # Read S3 config from environment variables + endpoint_url = os.environ.get("S3_ENDPOINT_URL") + access_key = os.environ.get("S3_ACCESS_KEY_ID") + secret_key = os.environ.get("S3_SECRET_ACCESS_KEY") + addressing_style = os.environ.get("S3_ADDRESSING_STYLE", "auto") + + if not all([endpoint_url, access_key, secret_key]): + logger.debug( + "S3 environment variables not fully configured, skipping S3 direct download." 
+            )
+            return None
+
+        try:
+            s3_config = BotoConfig(
+                s3={"addressing_style": addressing_style},
+                connect_timeout=5,
+                read_timeout=15,
+            )
+            s3_client = boto3.client(
+                "s3",
+                endpoint_url=endpoint_url,
+                aws_access_key_id=access_key,
+                aws_secret_access_key=secret_key,
+                config=s3_config,
+            )
+
+            response = s3_client.get_object(Bucket=bucket, Key=key)
+            body = response["Body"]
+            data = body.read(max_bytes + 1)
+            body.close()
+
+            if len(data) > max_bytes:
+                return None
+
+            return data
+        except Exception as e:
+            logger.warning(f"S3 direct download failed for {s3_path}: {e}")
+            return None
+
     def _image_bytes_from_owui_file_id(
         self, file_id: str, max_bytes: int
     ) -> Optional[bytes]:
-        if not file_id or Files is None:
-            return None
-        try:
-            file_obj = Files.get_file_by_id(file_id)
-        except Exception:
-            return None
-        if not file_obj:
+        if not file_id:
             return None

-        # Common patterns across Open WebUI versions / storage backends.
+        if Files is None:
+            logger.error(
+                "Files model is not available (import failed). Cannot retrieve file content."
+            )
+            return None
+
+        try:
+            file_obj = Files.get_file_by_id(file_id)
+        except Exception as e:
+            logger.error(f"Files.get_file_by_id({file_id}) failed: {e}")
+            return None
+
+        if not file_obj:
+            logger.warning(f"File {file_id} not found in database.")
+            return None
+
+        # 1. Try data field (DB stored)
         data_field = getattr(file_obj, "data", None)
         if isinstance(data_field, dict):
             blob_value = data_field.get("bytes")
@@ -1099,19 +1193,119 @@ class Action:
                 if isinstance(inline, str) and inline.strip():
                     return self._decode_base64_limited(inline, max_bytes)

+        # 2. Try S3 direct download (fastest for object storage)
+        s3_path = getattr(file_obj, "path", None)
+        if isinstance(s3_path, str) and s3_path.startswith("s3://"):
+            s3_data = self._read_from_s3(s3_path, max_bytes)
+            if s3_data is not None:
+                return s3_data
+
+        # 3. Try file paths (Disk stored)
+        # We try multiple path variations to be robust against CWD differences (e.g. Docker vs Local)
         for attr in ("path", "file_path", "absolute_path"):
             candidate = getattr(file_obj, attr, None)
             if isinstance(candidate, str) and candidate.strip():
-                raw = self._read_file_bytes_limited(Path(candidate), max_bytes)
+                # Skip obviously non-local paths (S3, GCS, HTTP)
+                if re.match(r"^(s3://|gs://|https?://)", candidate, re.IGNORECASE):
+                    logger.debug(f"Skipping local read for non-local path: {candidate}")
+                    continue
+
+                p = Path(candidate)
+
+                # Attempt 1: As-is (Absolute or relative to CWD)
+                raw = self._read_file_bytes_limited(p, max_bytes)
                 if raw is not None:
                     return raw

+                # Attempt 2: Relative to ./data (Common in OpenWebUI)
+                if not p.is_absolute():
+                    try:
+                        raw = self._read_file_bytes_limited(
+                            Path("./data") / p, max_bytes
+                        )
+                        if raw is not None:
+                            return raw
+                    except Exception:
+                        pass
+
+                    # Attempt 3: Relative to /app/backend/data (Docker default)
+                    try:
+                        raw = self._read_file_bytes_limited(
+                            Path("/app/backend/data") / p, max_bytes
+                        )
+                        if raw is not None:
+                            return raw
+                    except Exception:
+                        pass
+
+        # 4. Try URL (Object Storage / S3 Public URL)
+        urls_to_try = []
+        url_attr = getattr(file_obj, "url", None)
+        if isinstance(url_attr, str) and url_attr:
+            urls_to_try.append(url_attr)
+
+        if isinstance(data_field, dict):
+            url_data = data_field.get("url")
+            if isinstance(url_data, str) and url_data:
+                urls_to_try.append(url_data)
+
+        if urls_to_try:
+            import urllib.request
+
+            for url in urls_to_try:
+                if not url.startswith(("http://", "https://")):
+                    continue
+                try:
+                    logger.info(
+                        f"Attempting to download file {file_id} from URL: {url}"
+                    )
+                    # Use a timeout to avoid hanging
+                    req = urllib.request.Request(
+                        url, headers={"User-Agent": "OpenWebUI-Export-Plugin"}
+                    )
+                    with urllib.request.urlopen(req, timeout=15) as response:
+                        if 200 <= response.status < 300:
+                            data = response.read(max_bytes + 1)
+                            if len(data) <= max_bytes:
+                                return data
+                            else:
+                                logger.warning(
+                                    f"File {file_id} from URL is too large (> {max_bytes} bytes)"
+                                )
+                except Exception as e:
+                    logger.warning(f"Failed to download {file_id} from {url}: {e}")
+
+        # 5. Try fetching via Local API (Last resort for S3/Object Storage without direct URL)
+        # If we have the API token and base URL, we can try to fetch the content through the backend API.
+        if self._api_base_url:
+            api_url = f"{self._api_base_url}/api/v1/files/{file_id}/content"
+            try:
+                import urllib.request
+
+                headers = {"User-Agent": "OpenWebUI-Export-Plugin"}
+                if self._api_token:
+                    headers["Authorization"] = self._api_token
+
+                req = urllib.request.Request(api_url, headers=headers)
+                with urllib.request.urlopen(req, timeout=15) as response:
+                    if 200 <= response.status < 300:
+                        data = response.read(max_bytes + 1)
+                        if len(data) <= max_bytes:
+                            return data
+            except Exception:
+                # API fetch failed, just fall through to the next method
+                pass
+
+        # 6. Try direct content attributes (last ditch)
         for attr in ("content", "blob", "data"):
             raw = getattr(file_obj, attr, None)
             if isinstance(raw, (bytes, bytearray)):
                 b = bytes(raw)
                 return b if len(b) <= max_bytes else None

+        logger.warning(
+            f"File {file_id} found but no content accessible. Attributes: {dir(file_obj)}"
+        )
         return None

     def _add_image_placeholder(self, paragraph, alt: str, reason: str):
diff --git a/plugins/actions/export_to_docx/export_to_word_cn.py b/plugins/actions/export_to_docx/export_to_word_cn.py
index af90a3c..0205a82 100644
--- a/plugins/actions/export_to_docx/export_to_word_cn.py
+++ b/plugins/actions/export_to_docx/export_to_word_cn.py
@@ -3,7 +3,7 @@ title: 导出为 Word (增强版)
 author: Fu-Jie
 author_url: https://github.com/Fu-Jie
 funding_url: https://github.com/Fu-Jie/awesome-openwebui
-version: 0.4.1
+version: 0.4.2
 icon_url: data:image/svg+xml;base64,PHN2ZwogIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIKICB3aWR0aD0iMjQiCiAgaGVpZ2h0PSIyNCIKICB2aWV3Qm94PSIwIDAgMjQgMjQiCiAgZmlsbD0ibm9uZSIKICBzdHJva2U9ImN1cnJlbnRDb2xvciIKICBzdHJva2Utd2lkdGg9IjIiCiAgc3Ryb2tlLWxpbmVjYXA9InJvdW5kIgogIHN0cm9rZS1saW5lam9pbj0icm91bmQiCj4KICA8cGF0aCBkPSJNNiAyMmEyIDIgMCAwIDEtMi0yVjRhMiAyIDAgMCAxIDItMmg4YTIuNCAyLjQgMCAwIDEgMS43MDQuNzA2bDMuNTg4IDMuNTg4QTIuNCAyLjQgMCAwIDEgMjAgOHYxMmEyIDIgMCAwIDEtMiAyeiIgLz4KICA8cGF0aCBkPSJNMTQgMnY1YTEgMSAwIDAgMCAxIDFoNSIgLz4KICA8cGF0aCBkPSJNMTAgOUg4IiAvPgogIDxwYXRoIGQ9Ik0xNiAxM0g4IiAvPgogIDxwYXRoIGQ9Ik0xNiAxN0g4IiAvPgo8L3N2Zz4K
 requirements: python-docx, Pygments, latex2mathml, mathml2omml
 description: 将对话导出为 Word (.docx),支持 Mermaid 图表 (客户端渲染 SVG+PNG)、LaTeX 数学公式、真实超链接、增强表格格式、代码高亮和引用块。
@@ -65,6 +65,16 @@ try:
 except Exception:
     LATEX_MATH_AVAILABLE = False

+# boto3 for S3 direct access (faster than API fallback)
+try:
+    import boto3
+    from botocore.config import Config as BotoConfig
+    import os
+
+    BOTO3_AVAILABLE = True
+except ImportError:
+    BOTO3_AVAILABLE = False
+

 logging.basicConfig(
     level=logging.INFO,
@@ -290,6 +300,8 @@ class Action:
         self._bookmark_id_counter: int = 1
         self._active_doc: Optional[Document] = None
         self._user_lang: str = "en"  # Will be set per-request
+        self._api_token: Optional[str] = None
+        self._api_base_url: Optional[str] = None

     def _get_lang_key(self, user_language: str) -> str:
         """Convert user language code to i18n key (e.g., 'zh-CN' -> 'zh', 'en-US' -> 'en')."""
@@ -347,6 +359,22 @@ class Action:
         # Get user language from Valves configuration
         self._user_lang = self._get_lang_key(self.valves.界面语言)

+        # Extract API connection info for file fetching (S3/Object Storage support)
+        def _get_default_base_url() -> str:
+            port = os.environ.get("PORT") or "8080"
+            return f"http://localhost:{port}"
+
+        if __request__:
+            try:
+                self._api_token = __request__.headers.get("Authorization")
+                self._api_base_url = str(__request__.base_url).rstrip("/")
+            except Exception:
+                self._api_token = None
+                self._api_base_url = _get_default_base_url()
+        else:
+            self._api_token = None
+            self._api_base_url = _get_default_base_url()
+
         if __event_emitter__:
             last_assistant_message = body["messages"][-1]

@@ -1073,19 +1101,85 @@ class Action:
         b64 = m.group("b64") or ""
         return self._decode_base64_limited(b64, max_bytes)

+    def _read_from_s3(self, s3_path: str, max_bytes: int) -> Optional[bytes]:
+        """Read file directly from S3 using environment variables for credentials."""
+        if not BOTO3_AVAILABLE:
+            return None
+
+        # Parse s3://bucket/key
+        if not s3_path.startswith("s3://"):
+            return None
+
+        path_without_prefix = s3_path[5:]  # Remove 's3://'
+        parts = path_without_prefix.split("/", 1)
+        if len(parts) < 2:
+            return None
+
+        bucket = parts[0]
+        key = parts[1]
+
+        # Read S3 config from environment variables
+        endpoint_url = os.environ.get("S3_ENDPOINT_URL")
+        access_key = os.environ.get("S3_ACCESS_KEY_ID")
+        secret_key = os.environ.get("S3_SECRET_ACCESS_KEY")
+        addressing_style = os.environ.get("S3_ADDRESSING_STYLE", "auto")
+
+        if not all([endpoint_url, access_key, secret_key]):
+            logger.debug(
+                "S3 environment variables not fully configured, skipping S3 direct download."
+            )
+            return None
+
+        try:
+            s3_config = BotoConfig(
+                s3={"addressing_style": addressing_style},
+                connect_timeout=5,
+                read_timeout=15,
+            )
+            s3_client = boto3.client(
+                "s3",
+                endpoint_url=endpoint_url,
+                aws_access_key_id=access_key,
+                aws_secret_access_key=secret_key,
+                config=s3_config,
+            )
+
+            response = s3_client.get_object(Bucket=bucket, Key=key)
+            body = response["Body"]
+            data = body.read(max_bytes + 1)
+            body.close()
+
+            if len(data) > max_bytes:
+                return None
+
+            return data
+        except Exception as e:
+            logger.warning(f"S3 direct download failed for {s3_path}: {e}")
+            return None
+
     def _image_bytes_from_owui_file_id(
         self, file_id: str, max_bytes: int
     ) -> Optional[bytes]:
-        if not file_id or Files is None:
-            return None
-        try:
-            file_obj = Files.get_file_by_id(file_id)
-        except Exception:
-            return None
-        if not file_obj:
+        if not file_id:
             return None

-        # Common patterns across Open WebUI versions / storage backends.
+        if Files is None:
+            logger.error(
+                "Files model is not available (import failed). Cannot retrieve file content."
+            )
+            return None
+
+        try:
+            file_obj = Files.get_file_by_id(file_id)
+        except Exception as e:
+            logger.error(f"Files.get_file_by_id({file_id}) failed: {e}")
+            return None
+
+        if not file_obj:
+            logger.warning(f"File {file_id} not found in database.")
+            return None
+
+        # 1. Try data field (DB stored)
         data_field = getattr(file_obj, "data", None)
         if isinstance(data_field, dict):
             blob_value = data_field.get("bytes")
@@ -1097,19 +1191,119 @@ class Action:
                 if isinstance(inline, str) and inline.strip():
                     return self._decode_base64_limited(inline, max_bytes)

+        # 2. Try S3 direct download (fastest for object storage)
+        s3_path = getattr(file_obj, "path", None)
+        if isinstance(s3_path, str) and s3_path.startswith("s3://"):
+            s3_data = self._read_from_s3(s3_path, max_bytes)
+            if s3_data is not None:
+                return s3_data
+
+        # 3. Try file paths (Disk stored)
+        # We try multiple path variations to be robust against CWD differences (e.g. Docker vs Local)
         for attr in ("path", "file_path", "absolute_path"):
             candidate = getattr(file_obj, attr, None)
             if isinstance(candidate, str) and candidate.strip():
-                raw = self._read_file_bytes_limited(Path(candidate), max_bytes)
+                # Skip obviously non-local paths (S3, GCS, HTTP)
+                if re.match(r"^(s3://|gs://|https?://)", candidate, re.IGNORECASE):
+                    logger.debug(f"Skipping local read for non-local path: {candidate}")
+                    continue
+
+                p = Path(candidate)
+
+                # Attempt 1: As-is (Absolute or relative to CWD)
+                raw = self._read_file_bytes_limited(p, max_bytes)
                 if raw is not None:
                     return raw

+                # Attempt 2: Relative to ./data (Common in OpenWebUI)
+                if not p.is_absolute():
+                    try:
+                        raw = self._read_file_bytes_limited(
+                            Path("./data") / p, max_bytes
+                        )
+                        if raw is not None:
+                            return raw
+                    except Exception:
+                        pass
+
+                    # Attempt 3: Relative to /app/backend/data (Docker default)
+                    try:
+                        raw = self._read_file_bytes_limited(
+                            Path("/app/backend/data") / p, max_bytes
+                        )
+                        if raw is not None:
+                            return raw
+                    except Exception:
+                        pass
+
+        # 4. Try URL (Object Storage / S3 Public URL)
+        urls_to_try = []
+        url_attr = getattr(file_obj, "url", None)
+        if isinstance(url_attr, str) and url_attr:
+            urls_to_try.append(url_attr)
+
+        if isinstance(data_field, dict):
+            url_data = data_field.get("url")
+            if isinstance(url_data, str) and url_data:
+                urls_to_try.append(url_data)
+
+        if urls_to_try:
+            import urllib.request
+
+            for url in urls_to_try:
+                if not url.startswith(("http://", "https://")):
+                    continue
+                try:
+                    logger.info(
+                        f"Attempting to download file {file_id} from URL: {url}"
+                    )
+                    # Use a timeout to avoid hanging
+                    req = urllib.request.Request(
+                        url, headers={"User-Agent": "OpenWebUI-Export-Plugin"}
+                    )
+                    with urllib.request.urlopen(req, timeout=15) as response:
+                        if 200 <= response.status < 300:
+                            data = response.read(max_bytes + 1)
+                            if len(data) <= max_bytes:
+                                return data
+                            else:
+                                logger.warning(
+                                    f"File {file_id} from URL is too large (> {max_bytes} bytes)"
+                                )
+                except Exception as e:
+                    logger.warning(f"Failed to download {file_id} from {url}: {e}")
+
+        # 5. Try fetching via Local API (Last resort for S3/Object Storage without direct URL)
+        # If we have the API token and base URL, we can try to fetch the content through the backend API.
+        if self._api_base_url:
+            api_url = f"{self._api_base_url}/api/v1/files/{file_id}/content"
+            try:
+                import urllib.request
+
+                headers = {"User-Agent": "OpenWebUI-Export-Plugin"}
+                if self._api_token:
+                    headers["Authorization"] = self._api_token
+
+                req = urllib.request.Request(api_url, headers=headers)
+                with urllib.request.urlopen(req, timeout=15) as response:
+                    if 200 <= response.status < 300:
+                        data = response.read(max_bytes + 1)
+                        if len(data) <= max_bytes:
+                            return data
+            except Exception:
+                # API fetch failed, just fall through to the next method
+                pass
+
+        # 6. Try direct content attributes (last ditch)
         for attr in ("content", "blob", "data"):
             raw = getattr(file_obj, attr, None)
             if isinstance(raw, (bytes, bytearray)):
                 b = bytes(raw)
                 return b if len(b) <= max_bytes else None

+        logger.warning(
+            f"File {file_id} found but no content accessible. Attributes: {dir(file_obj)}"
+        )
         return None

     def _add_image_placeholder(self, paragraph, alt: str, reason: str):
diff --git a/plugins/actions/infographic/README.md b/plugins/actions/infographic/README.md
index a03cf59..51799bc 100644
--- a/plugins/actions/infographic/README.md
+++ b/plugins/actions/infographic/README.md
@@ -1,6 +1,6 @@
 # 📊 Smart Infographic (AntV)

-**Author:** [jeff](https://github.com/Fu-Jie) | **Version:** 1.4.0 | **Project:** [Awesome OpenWebUI](https://github.com/Fu-Jie/awesome-openwebui)
+**Author:** [Fu-Jie](https://github.com/Fu-Jie) | **Version:** 1.4.0 | **Project:** [Awesome OpenWebUI](https://github.com/Fu-Jie/awesome-openwebui)

 An Open WebUI plugin powered by the AntV Infographic engine. It transforms long text into professional, beautiful infographics with a single click.