docs: sync markdown_normalizer 1.2.2

fujie
2026-01-17 18:52:30 +08:00
parent e51d87ae80
commit 3b11537b5e
6 changed files with 381 additions and 156 deletions


@@ -1,11 +1,12 @@
 # Markdown Normalizer Filter
-A production-grade content normalizer filter for Open WebUI that fixes common Markdown formatting issues in LLM outputs. It ensures that code blocks, LaTeX formulas, Mermaid diagrams, and other Markdown elements are rendered correctly.
+A content normalizer filter for Open WebUI that fixes common Markdown formatting issues in LLM outputs. It ensures that code blocks, LaTeX formulas, Mermaid diagrams, and other Markdown elements are rendered correctly.
 ## Features
 * **Details Tag Normalization**: Ensures proper spacing for `<details>` tags (used for thought chains). Adds a blank line after `</details>` and ensures a newline after self-closing `<details />` tags to prevent rendering issues.
-* **Mermaid Syntax Fix**: Automatically fixes common Mermaid syntax errors, such as unquoted node labels (including multi-line labels and citations) and unclosed subgraphs, ensuring diagrams render correctly.
+* **Emphasis Spacing Fix**: Fixes extra spaces inside emphasis markers (e.g., `** text **` -> `**text**`) which can cause rendering failures. Includes safeguards to protect math expressions (e.g., `2 * 3 * 4`) and list variables.
+* **Mermaid Syntax Fix**: Automatically fixes common Mermaid syntax errors, such as unquoted node labels (including multi-line labels and citations) and unclosed subgraphs. **New in v1.1.2**: Comprehensive protection for edge labels (text on connecting lines) across all link types (solid, dotted, thick).
 * **Frontend Console Debugging**: Supports printing structured debug logs directly to the browser console (F12) for easier troubleshooting.
 * **Code Block Formatting**: Fixes broken code block prefixes, suffixes, and indentation.
 * **LaTeX Normalization**: Standardizes LaTeX formula delimiters (`\[` -> `$$`, `\(` -> `$`).
@@ -40,9 +41,46 @@ A production-grade content normalizer filter for Open WebUI that fixes common Ma
 * `enable_heading_fix`: Fix missing space in headings.
 * `enable_table_fix`: Fix missing closing pipe in tables.
 * `enable_xml_tag_cleanup`: Cleanup leftover XML tags.
+* `enable_emphasis_spacing_fix`: Fix extra spaces in emphasis (default: True).
 * `show_status`: Show status notification when fixes are applied.
 * `show_debug_log`: Print debug logs to browser console.
+## Troubleshooting ❓
+* **Submit an Issue**: If you encounter any problems, please submit an issue on GitHub: [Awesome OpenWebUI Issues](https://github.com/Fu-Jie/awesome-openwebui/issues)
+## Changelog
+### v1.2.2
+* **Version Bump**: Documentation and metadata updated for the latest release.
+### v1.2.1
+* **Emphasis Spacing Fix**: Added a new fix for extra spaces inside emphasis markers (e.g., `** text **` -> `**text**`).
+    * Uses a recursive approach to handle nested emphasis (e.g., `**bold _italic _**`).
+    * Includes safeguards to prevent modifying math expressions (e.g., `2 * 3 * 4`) or list variables.
+    * Controlled by the `enable_emphasis_spacing_fix` valve (default: True).
+### v1.2.0
+* **Details Tag Support**: Added normalization for `<details>` tags.
+    * Ensures a blank line is added after `</details>` closing tags to separate thought content from the main response.
+    * Ensures a newline is added after self-closing `<details ... />` tags to prevent them from interfering with subsequent Markdown headings (e.g., fixing `<details/>#Heading`).
+    * Includes safeguard to prevent modification of `<details>` tags inside code blocks.
+### v1.1.2
+* **Mermaid Edge Label Protection**: Implemented comprehensive protection for edge labels (text on connecting lines) to prevent them from being incorrectly modified. Now supports all Mermaid link types including solid (`--`), dotted (`-.`), and thick (`==`) lines with or without arrows.
+* **Bug Fixes**: Fixed an issue where lines without arrows (e.g., `A -- text --- B`) were not correctly protected.
+### v1.1.0
+* **Mermaid Fix Refinement**: Improved regex to handle nested parentheses in node labels (e.g., `ID("Label (text)")`) and avoided matching connection labels.
+* **HTML Safeguard Optimization**: Refined `_contains_html` to allow common tags like `<br/>`, `<b>`, `<i>`, etc., ensuring Mermaid diagrams with these tags are still normalized.
+* **Full-width Symbol Cleanup**: Fixed duplicate keys and incorrect quote mapping in `FULLWIDTH_MAP`.
+* **Bug Fixes**: Fixed missing `Dict` import in Python files.
 ## License
 MIT
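The `<details>` spacing rules described in the v1.2.0 notes can be shown with a short sketch. This is not the filter's actual code. The name `normalize_details` and both regular expressions are hypothetical. The sketch borrows the approach `_fix_emphasis_spacing` uses later in this commit: it splits the text on triple backticks and edits only the even-indexed segments. That keeps `<details>` tags inside fenced code untouched, provided the fences are balanced.

```python
import re

# Hypothetical patterns for illustration; the filter's own regexes may differ.
# "</details>" followed (after optional spaces and at most one newline) by text:
# rewrite so that exactly one blank line separates the tag from that text.
DETAILS_CLOSE = re.compile(r"</details>[ \t]*\n?(?=\S)")
# A self-closing "<details ... />" glued to the following text: add a newline.
DETAILS_SELF_CLOSING = re.compile(r"(<details\b[^>]*/>)(?=\S)")


def normalize_details(content: str) -> str:
    """Space out <details> tags while leaving fenced code blocks alone."""
    parts = content.split("```")
    # Even indices lie outside ``` fences; odd indices are fenced code.
    for i in range(0, len(parts), 2):
        part = DETAILS_CLOSE.sub("</details>\n\n", parts[i])
        parts[i] = DETAILS_SELF_CLOSING.sub(r"\1\n", part)
    return "```".join(parts)


if __name__ == "__main__":
    print(repr(normalize_details("<details>\nthinking\n</details>Answer")))
    print(repr(normalize_details("<details/>#Heading")))
```

The sketch leaves a tag alone if a blank line already follows it or if it sits at the very end of the message.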


@@ -1,11 +1,12 @@
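 # Markdown Formatting Filter (Markdown Normalizer)
-This is a production-grade content formatting filter for Open WebUI, designed to fix common Markdown formatting issues in LLM output. It ensures that code blocks, LaTeX formulas, Mermaid diagrams, and other Markdown elements are rendered correctly.
+This is a content formatting filter for Open WebUI, designed to fix common Markdown formatting issues in LLM output. It ensures that code blocks, LaTeX formulas, Mermaid diagrams, and other Markdown elements are rendered correctly.
 ## Features
 * **Details Tag Normalization**: Ensures that `<details>` tags (commonly used for thought chains) have correct spacing. Adds a blank line after `</details>` and a newline after self-closing `<details />` tags to prevent rendering issues.
-* **Mermaid Syntax Fix**: Automatically fixes common Mermaid syntax errors, such as unquoted node labels (including multi-line labels and citation markers) and unclosed subgraphs, ensuring diagrams render correctly.
+* **Emphasis Spacing Fix**: Fixes extra spaces inside emphasis markers (e.g. `** text **` -> `**text**`), which cause Markdown rendering failures. Includes safeguards against accidentally modifying math expressions (e.g. `2 * 3 * 4`) or list variables.
+* **Mermaid Syntax Fix**: Automatically fixes common Mermaid syntax errors, such as unquoted node labels (including multi-line labels and citation markers) and unclosed subgraphs. **New in v1.1.2**: Comprehensive protection for all types of edge labels (solid, dotted, thick) so they are not modified by mistake.
 * **Frontend Console Debugging**: Supports printing structured debug logs directly to the browser console (F12) for easier troubleshooting.
 * **Code Block Formatting**: Fixes broken code block prefixes, suffixes, and indentation.
 * **LaTeX Normalization**: Standardizes LaTeX formula delimiters (`\[` -> `$$`, `\(` -> `$`).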
@@ -40,9 +41,46 @@
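 * `enable_heading_fix`: Fixes missing spaces in headings.
 * `enable_table_fix`: Fixes missing closing pipes in tables.
 * `enable_xml_tag_cleanup`: Cleans up leftover XML tags.
+* `enable_emphasis_spacing_fix`: Fixes extra spaces in emphasis syntax (default: True).
 * `show_status`: Shows a status notification when fixes are applied.
 * `show_debug_log`: Prints debug logs to the browser console.
+## Troubleshooting ❓
+* **Submit an Issue**: If you run into any problems, please submit an issue on GitHub: [Awesome OpenWebUI Issues](https://github.com/Fu-Jie/awesome-openwebui/issues)
+## Changelog
+### v1.2.2
+* **Version Update**: Documentation and metadata synced to the latest version.
+### v1.2.1
+* **Emphasis Spacing Fix**: Added a fix for extra spaces inside emphasis markers (e.g. `** text **` -> `**text**`).
+    * Uses a recursive approach to handle nested emphasis (e.g. `**bold _italic _**`).
+    * Includes safeguards against accidentally modifying math expressions (e.g. `2 * 3 * 4`) or list variables.
+    * Controlled by the `enable_emphasis_spacing_fix` switch (default: on).
+### v1.2.0
+* **Details Tag Support**: Added normalization support for `<details>` tags.
+    * Ensures a blank line is added after `</details>` closing tags, separating the thought content from the main text.
+    * Ensures a newline is added after self-closing `<details ... />` tags, preventing them from interfering with subsequent Markdown headings (e.g. fixing `<details/>#Heading`).
+    * Includes a safeguard against modifying `<details>` tags inside code blocks.
+### v1.1.2
+* **Mermaid Edge Label Protection**: Implemented comprehensive edge-label protection so that text on connecting lines is not modified by mistake. All Mermaid link types are now supported, including solid (`--`), dotted (`-.`), and thick (`==`) lines, with or without arrows.
+* **Bug Fix**: Fixed an issue where links without arrows (e.g. `A -- text --- B`) were not correctly protected.
+### v1.1.0
+* **Mermaid Fix Optimization**: Improved the regular expression to handle nested parentheses in node labels (e.g. `ID("Label (text)")`) and to avoid mismatching text on connecting lines.
+* **HTML Safeguard Optimization**: Refined the `_contains_html` check to allow common tags such as `<br/>`, `<b>`, and `<i>`, so that Mermaid diagrams containing these tags are still normalized.
+* **Full-width Symbol Cleanup**: Fixed duplicate key names and incorrect quote mappings in `FULLWIDTH_MAP`.
+* **Bug Fix**: Fixed the missing `Dict` type import in the Python file.
 ## License
 MIT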
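The LaTeX normalization feature rewrites `\[ ... \]` as `$$ ... $$` and `\( ... \)` as `$ ... $`. Below is a minimal sketch of that rewrite. It assumes each opening delimiter pairs with the nearest closing one and that fenced code must be skipped. The names `normalize_latex`, `DISPLAY_MATH` and `INLINE_MATH` are hypothetical. The filter's real patterns are not part of this commit and may handle more edge cases, such as escaped delimiters.

```python
import re

# Hypothetical patterns, not the filter's own. Non-greedy bodies make each
# opening delimiter pair with the nearest closing one.
DISPLAY_MATH = re.compile(r"\\\[(.+?)\\\]", re.DOTALL)  # \[ ... \], may span lines
INLINE_MATH = re.compile(r"\\\((.+?)\\\)")  # \( ... \), single line


def normalize_latex(content: str) -> str:
    """Convert \\[ \\] and \\( \\) delimiters to $$ and $ outside code fences."""
    parts = content.split("```")
    for i in range(0, len(parts), 2):  # even indices are outside fences
        # A callable replacement returns the math body verbatim.
        part = DISPLAY_MATH.sub(lambda m: "$$" + m.group(1) + "$$", parts[i])
        parts[i] = INLINE_MATH.sub(lambda m: "$" + m.group(1) + "$", part)
    return "```".join(parts)
```

For example, `normalize_latex(r"Area: \(\pi r^2\)")` returns `Area: $\pi r^2$`. A `\(x\)` inside a fenced block is left unchanged.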


@@ -1,12 +1,13 @@
 # Markdown Normalizer Filter
-**Author:** [Fu-Jie](https://github.com/Fu-Jie/awesome-openwebui) | **Version:** 1.2.0 | **Project:** [Awesome OpenWebUI](https://github.com/Fu-Jie/awesome-openwebui) | **License:** MIT
+**Author:** [Fu-Jie](https://github.com/Fu-Jie/awesome-openwebui) | **Version:** 1.2.2 | **Project:** [Awesome OpenWebUI](https://github.com/Fu-Jie/awesome-openwebui) | **License:** MIT
 A content normalizer filter for Open WebUI that fixes common Markdown formatting issues in LLM outputs. It ensures that code blocks, LaTeX formulas, Mermaid diagrams, and other Markdown elements are rendered correctly.
 ## Features
 * **Details Tag Normalization**: Ensures proper spacing for `<details>` tags (used for thought chains). Adds a blank line after `</details>` and ensures a newline after self-closing `<details />` tags to prevent rendering issues.
+* **Emphasis Spacing Fix**: Fixes extra spaces inside emphasis markers (e.g., `** text **` -> `**text**`) which can cause rendering failures. Includes safeguards to protect math expressions (e.g., `2 * 3 * 4`) and list variables.
 * **Mermaid Syntax Fix**: Automatically fixes common Mermaid syntax errors, such as unquoted node labels (including multi-line labels and citations) and unclosed subgraphs. **New in v1.1.2**: Comprehensive protection for edge labels (text on connecting lines) across all link types (solid, dotted, thick).
 * **Frontend Console Debugging**: Supports printing structured debug logs directly to the browser console (F12) for easier troubleshooting.
 * **Code Block Formatting**: Fixes broken code block prefixes, suffixes, and indentation.
@@ -42,28 +43,42 @@ A content normalizer filter for Open WebUI that fixes common Markdown formatting
 * `enable_heading_fix`: Fix missing space in headings.
 * `enable_table_fix`: Fix missing closing pipe in tables.
 * `enable_xml_tag_cleanup`: Cleanup leftover XML tags.
+* `enable_emphasis_spacing_fix`: Fix extra spaces in emphasis (default: True).
 * `show_status`: Show status notification when fixes are applied.
 * `show_debug_log`: Print debug logs to browser console.
 ## Troubleshooting ❓
-- **Submit an Issue**: If you encounter any problems, please submit an issue on GitHub: [Awesome OpenWebUI Issues](https://github.com/Fu-Jie/awesome-openwebui/issues)
+* **Submit an Issue**: If you encounter any problems, please submit an issue on GitHub: [Awesome OpenWebUI Issues](https://github.com/Fu-Jie/awesome-openwebui/issues)
 ## Changelog
+### v1.2.2
+* **Version Bump**: Documentation and metadata updated for the latest release.
+### v1.2.1
+* **Emphasis Spacing Fix**: Added a new fix for extra spaces inside emphasis markers (e.g., `** text **` -> `**text**`).
+    * Uses a recursive approach to handle nested emphasis (e.g., `**bold _italic _**`).
+    * Includes safeguards to prevent modifying math expressions (e.g., `2 * 3 * 4`) or list variables.
+    * Controlled by the `enable_emphasis_spacing_fix` valve (default: True).
 ### v1.2.0
 * **Details Tag Support**: Added normalization for `<details>` tags.
     * Ensures a blank line is added after `</details>` closing tags to separate thought content from the main response.
     * Ensures a newline is added after self-closing `<details ... />` tags to prevent them from interfering with subsequent Markdown headings (e.g., fixing `<details/>#Heading`).
     * Includes safeguard to prevent modification of `<details>` tags inside code blocks.
 ### v1.1.2
 * **Mermaid Edge Label Protection**: Implemented comprehensive protection for edge labels (text on connecting lines) to prevent them from being incorrectly modified. Now supports all Mermaid link types including solid (`--`), dotted (`-.`), and thick (`==`) lines with or without arrows.
 * **Bug Fixes**: Fixed an issue where lines without arrows (e.g., `A -- text --- B`) were not correctly protected.
 ### v1.1.0
 * **Mermaid Fix Refinement**: Improved regex to handle nested parentheses in node labels (e.g., `ID("Label (text)")`) and avoided matching connection labels.
 * **HTML Safeguard Optimization**: Refined `_contains_html` to allow common tags like `<br/>`, `<b>`, `<i>`, etc., ensuring Mermaid diagrams with these tags are still normalized.
 * **Full-width Symbol Cleanup**: Fixed duplicate keys and incorrect quote mapping in `FULLWIDTH_MAP`.
 * **Bug Fixes**: Fixed missing `Dict` import in Python files.
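The v1.1.0 entry mentions duplicate keys in `FULLWIDTH_MAP`. In a Python dict literal, a repeated key is not an error; the last value silently wins. So `{"a": 1, "a": 2}` evaluates to `{"a": 2}`, which makes such mistakes easy to miss. This commit does not show the map's actual contents or where the filter applies it. The entries below are hypothetical and only illustrate one way to apply such a map: a single `str.translate` pass built with `str.maketrans`.

```python
# Hypothetical entries; the real FULLWIDTH_MAP is not shown in this commit.
FULLWIDTH_MAP = {
    "\uff08": "(",  # FULLWIDTH LEFT PARENTHESIS
    "\uff09": ")",  # FULLWIDTH RIGHT PARENTHESIS
    "\uff0c": ",",  # FULLWIDTH COMMA
    "\uff1a": ":",  # FULLWIDTH COLON
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
}

# str.maketrans accepts a dict whose keys are single characters.
_TRANSLATION = str.maketrans(FULLWIDTH_MAP)


def replace_fullwidth(text: str) -> str:
    """Replace every mapped symbol with its ASCII counterpart in one pass."""
    return text.translate(_TRANSLATION)
```

A translation table maps each source character to exactly one replacement. Every mapping therefore lives in a single place, which makes a conflicting duplicate easier to spot than a chain of `str.replace` calls.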


@@ -1,12 +1,13 @@
 # Markdown Formatting Filter (Markdown Normalizer)
-**Author:** [Fu-Jie](https://github.com/Fu-Jie/awesome-openwebui) | **Version:** 1.2.0 | **Project:** [Awesome OpenWebUI](https://github.com/Fu-Jie/awesome-openwebui) | **License:** MIT
+**Author:** [Fu-Jie](https://github.com/Fu-Jie/awesome-openwebui) | **Version:** 1.2.2 | **Project:** [Awesome OpenWebUI](https://github.com/Fu-Jie/awesome-openwebui) | **License:** MIT
 This is a content formatting filter for Open WebUI, designed to fix common Markdown formatting issues in LLM output. It ensures that code blocks, LaTeX formulas, Mermaid diagrams, and other Markdown elements are rendered correctly.
 ## Features
 * **Details Tag Normalization**: Ensures that `<details>` tags (commonly used for thought chains) have correct spacing. Adds a blank line after `</details>` and a newline after self-closing `<details />` tags to prevent rendering issues.
+* **Emphasis Spacing Fix**: Fixes extra spaces inside emphasis markers (e.g. `** text **` -> `**text**`), which cause Markdown rendering failures. Includes safeguards against accidentally modifying math expressions (e.g. `2 * 3 * 4`) or list variables.
 * **Mermaid Syntax Fix**: Automatically fixes common Mermaid syntax errors, such as unquoted node labels (including multi-line labels and citation markers) and unclosed subgraphs. **New in v1.1.2**: Comprehensive protection for all types of edge labels (solid, dotted, thick) so they are not modified by mistake.
 * **Frontend Console Debugging**: Supports printing structured debug logs directly to the browser console (F12) for easier troubleshooting.
 * **Code Block Formatting**: Fixes broken code block prefixes, suffixes, and indentation.
@@ -42,28 +43,42 @@
 * `enable_heading_fix`: Fixes missing spaces in headings.
 * `enable_table_fix`: Fixes missing closing pipes in tables.
 * `enable_xml_tag_cleanup`: Cleans up leftover XML tags.
+* `enable_emphasis_spacing_fix`: Fixes extra spaces in emphasis syntax (default: True).
 * `show_status`: Shows a status notification when fixes are applied.
 * `show_debug_log`: Prints debug logs to the browser console.
 ## Troubleshooting ❓
-- **Submit an Issue**: If you run into any problems, please submit an issue on GitHub: [Awesome OpenWebUI Issues](https://github.com/Fu-Jie/awesome-openwebui/issues)
+* **Submit an Issue**: If you run into any problems, please submit an issue on GitHub: [Awesome OpenWebUI Issues](https://github.com/Fu-Jie/awesome-openwebui/issues)
 ## Changelog
+### v1.2.2
+* **Version Update**: Documentation and metadata synced to the latest version.
+### v1.2.1
+* **Emphasis Spacing Fix**: Added a fix for extra spaces inside emphasis markers (e.g. `** text **` -> `**text**`).
+    * Uses a recursive approach to handle nested emphasis (e.g. `**bold _italic _**`).
+    * Includes safeguards against accidentally modifying math expressions (e.g. `2 * 3 * 4`) or list variables.
+    * Controlled by the `enable_emphasis_spacing_fix` switch (default: on).
 ### v1.2.0
 * **Details Tag Support**: Added normalization support for `<details>` tags.
     * Ensures a blank line is added after `</details>` closing tags, separating the thought content from the main text.
     * Ensures a newline is added after self-closing `<details ... />` tags, preventing them from interfering with subsequent Markdown headings (e.g. fixing `<details/>#Heading`).
     * Includes a safeguard against modifying `<details>` tags inside code blocks.
 ### v1.1.2
 * **Mermaid Edge Label Protection**: Implemented comprehensive edge-label protection so that text on connecting lines is not modified by mistake. All Mermaid link types are now supported, including solid (`--`), dotted (`-.`), and thick (`==`) lines, with or without arrows.
 * **Bug Fix**: Fixed an issue where links without arrows (e.g. `A -- text --- B`) were not correctly protected.
 ### v1.1.0
 * **Mermaid Fix Optimization**: Improved the regular expression to handle nested parentheses in node labels (e.g. `ID("Label (text)")`) and to avoid mismatching text on connecting lines.
 * **HTML Safeguard Optimization**: Refined the `_contains_html` check to allow common tags such as `<br/>`, `<b>`, and `<i>`, so that Mermaid diagrams containing these tags are still normalized.
 * **Full-width Symbol Cleanup**: Fixed duplicate key names and incorrect quote mappings in `FULLWIDTH_MAP`.
 * **Bug Fix**: Fixed the missing `Dict` type import in the Python file.


@@ -3,7 +3,7 @@ title: Markdown Normalizer
 author: Fu-Jie
 author_url: https://github.com/Fu-Jie/awesome-openwebui
 funding_url: https://github.com/open-webui
-version: 1.2.0
+version: 1.2.2
 openwebui_id: baaa8732-9348-40b7-8359-7e009660e23c
 description: A content normalizer filter that fixes common Markdown formatting issues in LLM outputs, such as broken code blocks, LaTeX formulas, and list formatting.
 """
@@ -43,6 +43,7 @@ class NormalizerConfig:
     )
     enable_table_fix: bool = True  # Fix missing closing pipe in tables
     enable_xml_tag_cleanup: bool = True  # Cleanup leftover XML tags
+    enable_emphasis_spacing_fix: bool = True  # Fix spaces inside **emphasis**
     # Custom cleaner functions (for advanced extension)
     custom_cleaners: List[Callable[[str], str]] = field(default_factory=list)
@@ -53,8 +54,8 @@ class ContentNormalizer:
     # --- 1. Pre-compiled Regex Patterns (Performance Optimization) ---
     _PATTERNS = {
-        # Code block prefix: if ``` is not at start of line or file
-        "code_block_prefix": re.compile(r"(?<!^)(?<!\n)(```)", re.MULTILINE),
+        # Code block prefix: if ``` is not at start of line (ignoring whitespace)
+        "code_block_prefix": re.compile(r"(\S[ \t]*)(```)"),
         # Code block suffix: ```lang followed by non-whitespace (no newline)
         "code_block_suffix": re.compile(r"(```[\w\+\-\.]*)[ \t]+([^\n\r])"),
         # Code block indent: whitespace at start of line + ```
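The new `code_block_prefix` pattern has two groups. Group 1 is the non-space character (plus any trailing spaces or tabs) before the fence, and group 2 is the fence itself. The `_fix_code_blocks` hunk further down still calls `.sub(r"\n\1", content)`. If that line is as shown in this diff view, it would now drop the fence and keep only the preceding character. The sketch below uses both patterns exactly as they appear in this commit. Its replacement templates re-emit both groups; those templates, and the name `fix_fences`, are assumptions.

```python
import re

# Both patterns as they appear in this commit's _PATTERNS dict.
CODE_BLOCK_PREFIX = re.compile(r"(\S[ \t]*)(```)")
CODE_BLOCK_SUFFIX = re.compile(r"(```[\w\+\-\.]*)[ \t]+([^\n\r])")


def fix_fences(content: str) -> str:
    # Text glued to a fence ("Here:```python"): keep the text (group 1),
    # insert a newline, then re-emit the fence (group 2).
    content = CODE_BLOCK_PREFIX.sub(r"\1\n\2", content)
    # Code glued to the opening fence ("```python print(1)"): drop the spaces
    # and move the first code character (group 2) onto its own line.
    content = CODE_BLOCK_SUFFIX.sub(r"\1\n\2", content)
    return content


# With the one-group template instead, "Here:```python" becomes "Here\n:python".
print(repr(fix_fences("Here:```python print(1)\n```")))
```

A fence that already starts its own line, even an indented one, is preceded by a newline or whitespace rather than `\S`, so the prefix pattern leaves it alone.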
@@ -108,6 +109,13 @@ class ContentNormalizer:
         "heading_space": re.compile(r"^(#+)([^ \n#])", re.MULTILINE),
         # Table: | col1 | col2 -> | col1 | col2 |
         "table_pipe": re.compile(r"^(\|.*[^|\n])$", re.MULTILINE),
+        # Emphasis spacing: ** text ** -> **text**
+        # Matches emphasis blocks within a single line. We use a recursive approach
+        # in _fix_emphasis_spacing to handle nesting and spaces correctly.
+        # NOTE: We use [^\n] instead of . to prevent cross-line matching.
+        "emphasis_spacing": re.compile(
+            r"(?<!\*|_)(\*{1,3}|_)(?P<inner>[^\n]*?)(\1)(?!\*|_)"
+        ),
     }
     def __init__(self, config: Optional[NormalizerConfig] = None):
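The `heading_space` and `table_pipe` patterns shown in this hunk can be tried on their own. The regexes below are copied from the commit. The replacement templates (`\1 \2` and `\1 |`) and the function names are assumptions, because the methods that use these patterns are not part of the diff. Unlike the filter may do, this sketch does not skip fenced code, so a Python comment such as `#note` inside a fence would also get a space. A `#hashtag` at the start of a line is likewise turned into a heading.

```python
import re

# Patterns as they appear in this commit's _PATTERNS dict.
HEADING_SPACE = re.compile(r"^(#+)([^ \n#])", re.MULTILINE)
TABLE_PIPE = re.compile(r"^(\|.*[^|\n])$", re.MULTILINE)


def fix_headings(content: str) -> str:
    # "#Title" -> "# Title"; lines that already have the space do not match.
    return HEADING_SPACE.sub(r"\1 \2", content)


def fix_tables(content: str) -> str:
    # "| a | b" -> "| a | b |"; rows that already end with "|" do not match.
    return TABLE_PIPE.sub(r"\1 |", content)


print(fix_headings("#Title\n## Ok"))
print(fix_tables("| a | b\n| c | d |"))
```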
@@ -207,6 +215,13 @@ class ContentNormalizer:
         if content != original:
             self.applied_fixes.append("Cleanup XML Tags")
+        # 12. Emphasis spacing fix
+        if self.config.enable_emphasis_spacing_fix:
+            original = content
+            content = self._fix_emphasis_spacing(content)
+            if content != original:
+                self.applied_fixes.append("Fix Emphasis Spacing")
         # 9. Custom cleaners
         for cleaner in self.config.custom_cleaners:
             original = content
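Every step in this pipeline follows the same bookkeeping. It remembers the input, runs the fix, and appends a name to `self.applied_fixes` only if the text changed. The new emphasis step is placed before the custom cleaners. The sketch below factors that pattern into a helper. `SketchConfig`, `SketchNormalizer`, `_run` and `strip_trailing_spaces` are hypothetical names. Resetting `applied_fixes` at the start of each call is an assumption, not something the hunk shows.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


def strip_trailing_spaces(text: str) -> str:
    """Stand-in fix used only for this sketch."""
    return "\n".join(line.rstrip() for line in text.split("\n"))


@dataclass
class SketchConfig:
    # Hypothetical flag standing in for the filter's enable_* switches.
    enable_trailing_space_fix: bool = True
    # Same shape as NormalizerConfig.custom_cleaners in this commit.
    custom_cleaners: List[Callable[[str], str]] = field(default_factory=list)


class SketchNormalizer:
    def __init__(self, config: Optional[SketchConfig] = None):
        self.config = config or SketchConfig()
        self.applied_fixes: List[str] = []

    def _run(self, name: str, fix: Callable[[str], str], content: str) -> str:
        # Record the fix name only if the step actually changed the text.
        original = content
        content = fix(content)
        if content != original:
            self.applied_fixes.append(name)
        return content

    def normalize(self, content: str) -> str:
        self.applied_fixes = []
        if self.config.enable_trailing_space_fix:
            content = self._run("Strip Trailing Spaces", strip_trailing_spaces, content)
        # Custom cleaners run last, after every built-in step.
        for cleaner in self.config.custom_cleaners:
            content = self._run("Custom Cleaner", cleaner, content)
        return content
```

Usage: `SketchNormalizer(SketchConfig(custom_cleaners=[str.upper])).normalize("a  \nb")` returns `"A\nB"` and records both step names. A step that changes nothing leaves `applied_fixes` empty, which is what keeps status notifications quiet on clean output.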
@@ -283,8 +298,6 @@ class ContentNormalizer:
     def _fix_code_blocks(self, content: str) -> str:
         """Fix code block formatting (prefixes, suffixes, indentation)"""
-        # Remove indentation before code blocks
-        content = self._PATTERNS["code_block_indent"].sub(r"\1", content)
         # Ensure newline before ```
         content = self._PATTERNS["code_block_prefix"].sub(r"\n\1", content)
         # Ensure newline after ```lang
@@ -443,6 +456,47 @@ class ContentNormalizer:
         """Remove leftover XML tags"""
         return self._PATTERNS["xml_artifacts"].sub("", content)
+    def _fix_emphasis_spacing(self, content: str) -> str:
+        """Fix spaces inside **emphasis** or _emphasis_
+        Example: ** text ** -> **text**, **text ** -> **text**, ** text** -> **text**
+        """
+        def replacer(match):
+            symbol = match.group(1)
+            inner = match.group("inner")
+            # Recursive step: Fix emphasis spacing INSIDE the current block first
+            # This ensures that ** _ italic _ ** becomes ** _italic_ ** before we strip outer spaces.
+            inner = self._PATTERNS["emphasis_spacing"].sub(replacer, inner)
+            # If no leading/trailing whitespace, nothing to fix at this level
+            stripped_inner = inner.strip()
+            if stripped_inner == inner:
+                return f"{symbol}{inner}{symbol}"
+            # Safeguard: If inner content is just whitespace, don't touch it
+            if not stripped_inner:
+                return match.group(0)
+            # Safeguard: If it looks like a math expression or list of variables (e.g. " * 3 * " or " _ b _ ")
+            # If the symbol is surrounded by spaces in the original text, it's likely an operator.
+            if inner.startswith(" ") and inner.endswith(" "):
+                # If it's single '*' or '_', and both sides have spaces, it's almost certainly an operator.
+                if symbol in ["*", "_"]:
+                    return match.group(0)
+            return f"{symbol}{stripped_inner}{symbol}"
+        parts = content.split("```")
+        for i in range(0, len(parts), 2):  # Even indices are markdown text
+            # We use a while loop to handle overlapping or multiple occurrences at the top level
+            while True:
+                new_part = self._PATTERNS["emphasis_spacing"].sub(replacer, parts[i])
+                if new_part == parts[i]:
+                    break
+                parts[i] = new_part
+        return "```".join(parts)
 class Filter:
     class Valves(BaseModel):
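To try the emphasis fix outside Open WebUI, the method above can be lifted into a standalone function. The sketch keeps the commit's regex and decision order. Only the names `EMPHASIS_SPACING` and `fix_emphasis_spacing` are new, and the two nested safeguard checks are merged into one equivalent condition. The repeat-until-stable loop terminates because every change removes whitespace.

```python
import re

# Regex copied from the "emphasis_spacing" entry in this commit.
EMPHASIS_SPACING = re.compile(r"(?<!\*|_)(\*{1,3}|_)(?P<inner>[^\n]*?)(\1)(?!\*|_)")


def fix_emphasis_spacing(content: str) -> str:
    def replacer(match) -> str:
        symbol = match.group(1)
        # Fix nested emphasis first, then trim spaces at this level.
        inner = EMPHASIS_SPACING.sub(replacer, match.group("inner"))
        stripped = inner.strip()
        if stripped == inner:
            return f"{symbol}{inner}{symbol}"
        if not stripped:
            return match.group(0)  # whitespace only: leave untouched
        # A single * or _ with spaces on both sides reads as an operator (2 * 3 * 4).
        if inner.startswith(" ") and inner.endswith(" ") and symbol in ("*", "_"):
            return match.group(0)
        return f"{symbol}{stripped}{symbol}"

    parts = content.split("```")
    for i in range(0, len(parts), 2):  # even indices are outside code fences
        while True:  # repeat until a pass changes nothing
            new_part = EMPHASIS_SPACING.sub(replacer, parts[i])
            if new_part == parts[i]:
                break
            parts[i] = new_part
    return "```".join(parts)
```

Tracing this logic shows two limitations that the code in the hunk shares. First, `** _ italic _ **` becomes `**_ italic _**`, not the `**_italic_**` the code comment suggests, because the operator safeguard catches the inner `_ italic _`. Second, a list item such as `* item **bold**` also matches, with the bullet acting as the opening `*`, and comes out as `*item **bold**`.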
@@ -494,6 +548,10 @@ class Filter:
         enable_xml_tag_cleanup: bool = Field(
             default=True, description="Cleanup leftover XML tags"
         )
+        enable_emphasis_spacing_fix: bool = Field(
+            default=True,
+            description="Fix spaces inside **emphasis** (e.g. ** text ** -> **text**)",
+        )
         show_status: bool = Field(
             default=True, description="Show status notification when fixes are applied"
         )
@@ -637,6 +695,7 @@ class Filter:
             enable_heading_fix=self.valves.enable_heading_fix,
             enable_table_fix=self.valves.enable_table_fix,
             enable_xml_tag_cleanup=self.valves.enable_xml_tag_cleanup,
+            enable_emphasis_spacing_fix=self.valves.enable_emphasis_spacing_fix,
         )
         normalizer = ContentNormalizer(config)


@@ -3,7 +3,7 @@ title: Markdown 格式修复器 (Markdown Normalizer)
 author: Fu-Jie
 author_url: https://github.com/Fu-Jie/awesome-openwebui
 funding_url: https://github.com/open-webui
-version: 1.2.0
+version: 1.2.2
 description: 内容规范化过滤器,修复 LLM 输出中常见的 Markdown 格式问题如损坏的代码块、LaTeX 公式、Mermaid 图表和列表格式。
 """
@@ -35,6 +35,7 @@ class NormalizerConfig:
     enable_heading_fix: bool = True  # Fix missing space in headings (#Header -> # Header)
     enable_table_fix: bool = True  # Fix missing closing pipe in tables
     enable_xml_tag_cleanup: bool = True  # Clean up leftover XML tags
+    enable_emphasis_spacing_fix: bool = True  # Fix extra spaces in **emphasized content**
     # Custom cleaner functions (for advanced extension)
     custom_cleaners: List[Callable[[str], str]] = field(default_factory=list)
@@ -45,8 +46,8 @@ class ContentNormalizer:
     # --- 1. Pre-compiled Regex Patterns (Performance Optimization) ---
     _PATTERNS = {
-        # Code block prefix: if ``` is not at start of line or file
-        "code_block_prefix": re.compile(r"(?<!^)(?<!\n)(```)", re.MULTILINE),
+        # Code block prefix: if ``` is not at start of line (ignoring whitespace)
+        "code_block_prefix": re.compile(r"(\S[ \t]*)(```)"),
         # Code block suffix: ```lang followed by non-whitespace (no newline)
         "code_block_suffix": re.compile(r"(```[\w\+\-\.]*)[ \t]+([^\n\r])"),
         # Code block indent: whitespace at start of line + ```
@@ -100,6 +101,13 @@ class ContentNormalizer:
         "heading_space": re.compile(r"^(#+)([^ \n#])", re.MULTILINE),
         # Table: | col1 | col2 -> | col1 | col2 |
         "table_pipe": re.compile(r"^(\|.*[^|\n])$", re.MULTILINE),
+        # Emphasis spacing: ** text ** -> **text**
+        # Matches emphasis blocks within a single line. We use a recursive approach
+        # in _fix_emphasis_spacing to handle nesting and spaces correctly.
+        # NOTE: We use [^\n] instead of . to prevent cross-line matching.
+        "emphasis_spacing": re.compile(
+            r"(?<!\*|_)(\*{1,3}|_)(?P<inner>[^\n]*?)(\1)(?!\*|_)"
+        ),
     }
     def __init__(self, config: Optional[NormalizerConfig] = None):
@@ -199,6 +207,13 @@ class ContentNormalizer:
         if content != original:
             self.applied_fixes.append("Cleanup XML Tags")
+        # 12. Emphasis spacing fix
+        if self.config.enable_emphasis_spacing_fix:
+            original = content
+            content = self._fix_emphasis_spacing(content)
+            if content != original:
+                self.applied_fixes.append("Fix Emphasis Spacing")
         # 9. Custom cleaners
         for cleaner in self.config.custom_cleaners:
             original = content
@@ -257,8 +272,6 @@ class ContentNormalizer:
     def _fix_code_blocks(self, content: str) -> str:
         """Fix code block formatting (prefixes, suffixes, indentation)"""
-        # Remove indentation before code blocks
-        content = self._PATTERNS["code_block_indent"].sub(r"\1", content)
         # Ensure newline before ```
         content = self._PATTERNS["code_block_prefix"].sub(r"\n\1", content)
         # Ensure newline after ```lang
@@ -422,6 +435,47 @@ class ContentNormalizer:
         """Remove leftover XML tags"""
         return self._PATTERNS["xml_artifacts"].sub("", content)
+    def _fix_emphasis_spacing(self, content: str) -> str:
+        """Fix spaces inside **emphasis** or _emphasis_
+        Example: ** text ** -> **text**, **text ** -> **text**, ** text** -> **text**
+        """
+        def replacer(match):
+            symbol = match.group(1)
+            inner = match.group("inner")
+            # Recursive step: Fix emphasis spacing INSIDE the current block first
+            # This ensures that ** _ italic _ ** becomes ** _italic_ ** before we strip outer spaces.
+            inner = self._PATTERNS["emphasis_spacing"].sub(replacer, inner)
+            # If no leading/trailing whitespace, nothing to fix at this level
+            stripped_inner = inner.strip()
+            if stripped_inner == inner:
+                return f"{symbol}{inner}{symbol}"
+            # Safeguard: If inner content is just whitespace, don't touch it
+            if not stripped_inner:
+                return match.group(0)
+            # Safeguard: If it looks like a math expression or list of variables (e.g. " * 3 * " or " _ b _ ")
+            # If the symbol is surrounded by spaces in the original text, it's likely an operator.
+            if inner.startswith(" ") and inner.endswith(" "):
+                # If it's single '*' or '_', and both sides have spaces, it's almost certainly an operator.
+                if symbol in ["*", "_"]:
+                    return match.group(0)
+            return f"{symbol}{stripped_inner}{symbol}"
+        parts = content.split("```")
+        for i in range(0, len(parts), 2):  # Even indices are markdown text
+            # We use a while loop to handle overlapping or multiple occurrences at the top level
+            while True:
+                new_part = self._PATTERNS["emphasis_spacing"].sub(replacer, parts[i])
+                if new_part == parts[i]:
+                    break
+                parts[i] = new_part
+        return "```".join(parts)
 class Filter:
     class Valves(BaseModel):
@@ -469,6 +523,10 @@ class Filter:
         enable_xml_tag_cleanup: bool = Field(
             default=True, description="清理残留的 XML 标签"  # "Clean up leftover XML tags"
         )
+        enable_emphasis_spacing_fix: bool = Field(
+            default=True,
+            description="修复强调语法中的多余空格 (例如 ** 文本 ** -> **文本**)",  # "Fix extra spaces in emphasis syntax (e.g. ** text ** -> **text**)"
+        )
         show_status: bool = Field(default=True, description="应用修复时显示状态通知")  # "Show a status notification when fixes are applied"
         show_debug_log: bool = Field(
             default=True, description="在浏览器控制台打印调试日志 (F12)"  # "Print debug logs to the browser console (F12)"
@@ -540,6 +598,7 @@ class Filter:
             "Fix Headings": "标题格式",  # "Heading format"
             "Fix Tables": "表格格式",  # "Table format"
             "Cleanup XML Tags": "XML清理",  # "XML cleanup"
+            "Fix Emphasis Spacing": "强调空格",  # "Emphasis spacing"
             "Custom Cleaner": "自定义清理",  # "Custom cleanup"
         }
         translated_fixes = [fix_map.get(fix, fix) for fix in applied_fixes]
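In the Chinese filter, `fix_map` turns the English names recorded in `applied_fixes` into Chinese labels for the status notification. It falls back to the English name via `fix_map.get(fix, fix)`. That fallback is why the new `"Fix Emphasis Spacing"` entry matters: without it, the English name would appear in the Chinese status message. The sketch below reproduces only the entries visible in this hunk. `FIX_MAP` and `translate_fixes` are hypothetical stand-ins for the local `fix_map` and the list comprehension.

```python
from typing import Dict, List

# The entries visible in this hunk of the Chinese filter (Chinese display labels).
FIX_MAP: Dict[str, str] = {
    "Fix Headings": "标题格式",  # "heading format"
    "Fix Tables": "表格格式",  # "table format"
    "Cleanup XML Tags": "XML清理",  # "XML cleanup"
    "Fix Emphasis Spacing": "强调空格",  # "emphasis spacing"
    "Custom Cleaner": "自定义清理",  # "custom cleanup"
}


def translate_fixes(applied_fixes: List[str]) -> List[str]:
    # Names without an entry fall back to the English name, as in the hunk.
    return [FIX_MAP.get(fix, fix) for fix in applied_fixes]
```

For example, `", ".join(translate_fixes(["Fix Headings", "Custom Cleaner"]))` gives `标题格式, 自定义清理`. How the filter actually formats its status text is not shown in this commit.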
@@ -626,6 +685,7 @@ class Filter:
             enable_heading_fix=self.valves.enable_heading_fix,
             enable_table_fix=self.valves.enable_table_fix,
             enable_xml_tag_cleanup=self.valves.enable_xml_tag_cleanup,
+            enable_emphasis_spacing_fix=self.valves.enable_emphasis_spacing_fix,
         )
         normalizer = ContentNormalizer(config)