feat(markdown_normalizer): add details tag normalization and update documentation
@@ -1,12 +1,12 @@
 # Markdown Normalizer Filter
 
-**Author:** [Fu-Jie](https://github.com/Fu-Jie/awesome-openwebui)
-**Version:** 1.1.2
+**Author:** [Fu-Jie](https://github.com/Fu-Jie/awesome-openwebui) | **Version:** 1.2.0 | **Project:** [Awesome OpenWebUI](https://github.com/Fu-Jie/awesome-openwebui) | **License:** MIT
 
 A content normalizer filter for Open WebUI that fixes common Markdown formatting issues in LLM outputs. It ensures that code blocks, LaTeX formulas, Mermaid diagrams, and other Markdown elements are rendered correctly.
 
 ## Features
 
+* **Details Tag Normalization**: Ensures proper spacing for `<details>` tags (used for thought chains). Adds a blank line after `</details>` and ensures a newline after self-closing `<details />` tags to prevent rendering issues.
 * **Mermaid Syntax Fix**: Automatically fixes common Mermaid syntax errors, such as unquoted node labels (including multi-line labels and citations) and unclosed subgraphs. **New in v1.1.2**: Comprehensive protection for edge labels (text on connecting lines) across all link types (solid, dotted, thick).
 * **Frontend Console Debugging**: Supports printing structured debug logs directly to the browser console (F12) for easier troubleshooting.
 * **Code Block Formatting**: Fixes broken code block prefixes, suffixes, and indentation.
@@ -32,6 +32,7 @@ A content normalizer filter for Open WebUI that fixes common Markdown formatting
 * `priority`: Filter priority (default: 50).
 * `enable_escape_fix`: Fix excessive escape characters.
 * `enable_thought_tag_fix`: Normalize thought tags.
+* `enable_details_tag_fix`: Normalize details tags (default: True).
 * `enable_code_block_fix`: Fix code block formatting.
 * `enable_latex_fix`: Normalize LaTeX formulas.
 * `enable_list_fix`: Fix list item newlines (Experimental).
@@ -44,8 +45,18 @@ A content normalizer filter for Open WebUI that fixes common Markdown formatting
 * `show_status`: Show status notification when fixes are applied.
 * `show_debug_log`: Print debug logs to browser console.
 
+## Troubleshooting ❓
+
+- **Submit an Issue**: If you encounter any problems, please submit an issue on GitHub: [Awesome OpenWebUI Issues](https://github.com/Fu-Jie/awesome-openwebui/issues)
+
 ## Changelog
 
+### v1.2.0
+* **Details Tag Support**: Added normalization for `<details>` tags.
+  * Ensures a blank line is added after `</details>` closing tags to separate thought content from the main response.
+  * Ensures a newline is added after self-closing `<details ... />` tags to prevent them from interfering with subsequent Markdown headings (e.g., fixing `<details/>#Heading`).
+  * Includes safeguard to prevent modification of `<details>` tags inside code blocks.
+
 ### v1.1.2
 * **Mermaid Edge Label Protection**: Implemented comprehensive protection for edge labels (text on connecting lines) to prevent them from being incorrectly modified. Now supports all Mermaid link types including solid (`--`), dotted (`-.`), and thick (`==`) lines with or without arrows.
 * **Bug Fixes**: Fixed an issue where lines without arrows (e.g., `A -- text --- B`) were not correctly protected.
@@ -56,6 +67,3 @@ A content normalizer filter for Open WebUI that fixes common Markdown formatting
 * **Full-width Symbol Cleanup**: Fixed duplicate keys and incorrect quote mapping in `FULLWIDTH_MAP`.
 * **Bug Fixes**: Fixed missing `Dict` import in Python files.
 
-## License
-
-MIT
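@@ -1,12 +1,12 @@
 # Markdown Formatting Filter (Markdown Normalizer)
 
-**Author:** [Fu-Jie](https://github.com/Fu-Jie/awesome-openwebui)
-**Version:** 1.1.2
+**Author:** [Fu-Jie](https://github.com/Fu-Jie/awesome-openwebui) | **Version:** 1.2.0 | **Project:** [Awesome OpenWebUI](https://github.com/Fu-Jie/awesome-openwebui) | **License:** MIT
 
 This is a content formatting filter for Open WebUI that fixes common Markdown formatting issues in LLM output. It ensures that code blocks, LaTeX formulas, Mermaid diagrams, and other Markdown elements render correctly.
 
 ## Features
 
+* **Details Tag Normalization**: Ensures `<details>` tags (commonly used for chains of thought) are spaced correctly. Adds a blank line after `</details>` and a newline after self-closing `<details />` tags to prevent rendering issues.
 * **Mermaid Syntax Fix**: Automatically fixes common Mermaid syntax errors, such as unquoted node labels (including multi-line labels and citation markers) and unclosed subgraphs. **New in v1.1.2**: Comprehensive protection for edge labels on all link types (solid, dotted, thick) so that they are not modified by mistake.
 * **Frontend Console Debugging**: Supports printing structured debug logs directly to the browser console (F12) for easier troubleshooting.
 * **Code Block Formatting**: Fixes broken code block prefixes, suffixes, and indentation.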
@@ -32,6 +32,7 @@
 * `priority`: Filter priority (default: 50).
 * `enable_escape_fix`: Fix excessive escape characters.
 * `enable_thought_tag_fix`: Normalize thought tags.
+* `enable_details_tag_fix`: Normalize details tags (default: True).
 * `enable_code_block_fix`: Fix code block formatting.
 * `enable_latex_fix`: Normalize LaTeX formulas.
 * `enable_list_fix`: Fix list item newlines (experimental).
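@@ -44,8 +45,18 @@
 * `show_status`: Show a status notification when fixes are applied.
 * `show_debug_log`: Print debug logs to the browser console.
 
+## Troubleshooting ❓
+
+- **Submit an Issue**: If you run into any problems, please submit an issue on GitHub: [Awesome OpenWebUI Issues](https://github.com/Fu-Jie/awesome-openwebui/issues)
+
 ## Changelog
 
+### v1.2.0
+* **Details Tag Support**: Added normalization support for `<details>` tags.
+  * Ensures a blank line is added after the closing `</details>` tag, separating thought content from the main response.
+  * Ensures a newline is added after self-closing `<details ... />` tags so that they do not interfere with a following Markdown heading (for example, fixing `<details/>#Heading`).
+  * Includes a safeguard that prevents modification of `<details>` tags inside code blocks.
+
 ### v1.1.2
 * **Mermaid Edge Label Protection**: Implemented comprehensive edge label protection so that text on connecting lines is not modified by mistake. All Mermaid link types are now supported, including solid (`--`), dotted (`-.`), and thick (`==`) lines, with or without arrows.
 * **Bug Fixes**: Fixed an issue where links without arrows (e.g., `A -- text --- B`) were not protected correctly.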
@@ -56,6 +67,3 @@
 * **Full-width Symbol Cleanup**: Fixed duplicate keys and incorrect quote mappings in `FULLWIDTH_MAP`.
 * **Bug Fixes**: Fixed a missing `Dict` type import in the Python files.
 
-## License
-
-MIT
@@ -3,7 +3,7 @@ title: Markdown Normalizer
 author: Fu-Jie
 author_url: https://github.com/Fu-Jie/awesome-openwebui
 funding_url: https://github.com/open-webui
-version: 1.1.2
+version: 1.2.0
 openwebui_id: baaa8732-9348-40b7-8359-7e009660e23c
 description: A content normalizer filter that fixes common Markdown formatting issues in LLM outputs, such as broken code blocks, LaTeX formulas, and list formatting.
 """
@@ -29,6 +29,7 @@ class NormalizerConfig:
         False  # Apply escape fix inside code blocks (default: False for safety)
     )
     enable_thought_tag_fix: bool = True  # Normalize thought tags
+    enable_details_tag_fix: bool = True  # Normalize <details> tags (like thought tags)
     enable_code_block_fix: bool = True  # Fix code block formatting
     enable_latex_fix: bool = True  # Fix LaTeX formula formatting
     enable_list_fix: bool = (
@@ -63,6 +64,12 @@ class ContentNormalizer:
             r"</(thought|think|thinking)>[ \t]*\n*", re.IGNORECASE
         ),
         "thought_start": re.compile(r"<(thought|think|thinking)>", re.IGNORECASE),
+        # Details tag: </details> followed by optional whitespace/newlines
+        "details_end": re.compile(r"</details>[ \t]*\n*", re.IGNORECASE),
+        # Self-closing details tag: <details ... /> followed by optional whitespace (but NOT already having newline)
+        "details_self_closing": re.compile(
+            r"(<details[^>]*/\s*>)(?!\n)", re.IGNORECASE
+        ),
         # LaTeX block: \[ ... \]
         "latex_bracket_block": re.compile(r"\\\[(.+?)\\\]", re.DOTALL),
         # LaTeX inline: \( ... \)
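The two patterns added in this hunk can be tried in isolation. The sketch below copies the regular expressions from the diff into module-level constants; the names `DETAILS_END`, `DETAILS_SELF_CLOSING`, `after_closing`, and `after_self_closing` are illustrative and are not part of the plugin. It shows that the closing-tag pattern swallows any trailing spaces and newlines before writing exactly two newlines, and that the negative lookahead `(?!\n)` leaves a self-closing tag alone when a newline already follows it.

```python
import re

# Patterns copied from the diff; the constant names are illustrative only.
DETAILS_END = re.compile(r"</details>[ \t]*\n*", re.IGNORECASE)
DETAILS_SELF_CLOSING = re.compile(r"(<details[^>]*/\s*>)(?!\n)", re.IGNORECASE)


def after_closing(text: str) -> str:
    """Replace </details> plus any trailing spaces/newlines with </details> and two newlines."""
    return DETAILS_END.sub("</details>\n\n", text)


def after_self_closing(text: str) -> str:
    """Append a newline to <details ... /> unless a newline already follows it."""
    return DETAILS_SELF_CLOSING.sub(r"\1\n", text)


if __name__ == "__main__":
    print(repr(after_closing("</details>Next")))
    # '</details>\n\nNext'
    print(repr(after_closing("</details>  \n\n\n\nNext")))
    # '</details>\n\nNext'
    print(repr(after_self_closing('<details id="x"/>#Heading')))
    # '<details id="x"/>\n#Heading'
    print(repr(after_self_closing("<details/>\nText")))
    # '<details/>\nText'
```

Two side effects follow from the patterns as written: because the replacement string is a literal, an upper-case `</DETAILS>` comes back lower-cased, and an ordinary opening `<details>` tag is never matched by the self-closing pattern because it has no `/` before the `>`.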
@@ -130,7 +137,14 @@ class ContentNormalizer:
             if content != original:
                 self.applied_fixes.append("Normalize Thought Tags")
 
-        # 3. Code block formatting fix
+        # 3. Details tag normalization (must be before heading fix)
+        if self.config.enable_details_tag_fix:
+            original = content
+            content = self._fix_details_tags(content)
+            if content != original:
+                self.applied_fixes.append("Normalize Details Tags")
+
+        # 4. Code block formatting fix
         if self.config.enable_code_block_fix:
             original = content
             content = self._fix_code_blocks(content)
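Each numbered step in this method follows the same shape: check a flag, run one fix, compare the result with the input, and record the fix name if anything changed. A minimal table-driven sketch of that shape is given below. `run_steps`, `Step`, `toy_details_fix`, and `STEPS` are hypothetical names used for illustration; the plugin itself spells each step out inline as shown in the diff.

```python
import re
from typing import Callable, List, Tuple

# (display name, enabled flag, fix function); list order is execution order.
Step = Tuple[str, bool, Callable[[str], str]]


def run_steps(content: str, steps: List[Step]) -> Tuple[str, List[str]]:
    """Run enabled fixes in order and report which ones changed the text."""
    applied: List[str] = []
    for name, enabled, fix in steps:
        if not enabled:
            continue
        original = content
        content = fix(content)
        if content != original:
            applied.append(name)
    return content, applied


def toy_details_fix(text: str) -> str:
    """Stand-in for the details step, using the closing-tag regex from the diff."""
    return re.sub(r"</details>[ \t]*\n*", "</details>\n\n", text, flags=re.IGNORECASE)


STEPS: List[Step] = [
    ("Normalize Details Tags", True, toy_details_fix),
    ("No-op Step", True, lambda text: text),      # enabled, but changes nothing
    ("Disabled Step", False, str.upper),          # skipped because its flag is off
]

if __name__ == "__main__":
    fixed, applied = run_steps("</details>Next", STEPS)
    print(repr(fixed), applied)
```

Because the list order is the execution order, a constraint such as the diff's "must be before heading fix" comment would be expressed simply by where the entry sits in the list.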
@@ -249,6 +263,24 @@ class ContentNormalizer:
         # 2. Standardize end tag and ensure newlines: </think> -> </thought>\n\n
         return self._PATTERNS["thought_end"].sub("</thought>\n\n", content)
 
+    def _fix_details_tags(self, content: str) -> str:
+        """Normalize <details> tags: ensure proper spacing after closing tags
+
+        Handles two cases:
+        1. </details> followed by content -> ensure double newline
+        2. <details .../> (self-closing) followed by content -> ensure newline
+
+        Note: Only applies outside of code blocks to avoid breaking code examples.
+        """
+        parts = content.split("```")
+        for i in range(0, len(parts), 2):  # Even indices are markdown text
+            # 1. Ensure double newline after </details>
+            parts[i] = self._PATTERNS["details_end"].sub("</details>\n\n", parts[i])
+            # 2. Ensure newline after self-closing <details ... />
+            parts[i] = self._PATTERNS["details_self_closing"].sub(r"\1\n", parts[i])
+
+        return "```".join(parts)
+
     def _fix_code_blocks(self, content: str) -> str:
         """Fix code block formatting (prefixes, suffixes, indentation)"""
         # Remove indentation before code blocks
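The code-block safeguard in `_fix_details_tags` rests on one observation: after splitting on the triple-backtick marker, the even-indexed pieces lie outside fences and the odd-indexed pieces lie inside them. The sketch below isolates that technique with an arbitrary transform; `apply_outside_fences` is a hypothetical helper name, not a function in the plugin.

```python
from typing import Callable

FENCE = "`" * 3  # the triple-backtick marker the diff splits on


def apply_outside_fences(content: str, transform: Callable[[str], str]) -> str:
    """Apply `transform` only to text that is not between triple-backtick markers."""
    parts = content.split(FENCE)
    for i in range(0, len(parts), 2):  # even indices: text outside fences
        parts[i] = transform(parts[i])
    return FENCE.join(parts)


if __name__ == "__main__":
    sample = "intro " + FENCE + "code" + FENCE + " outro"
    print(apply_outside_fences(sample, str.upper))

    # An unclosed fence leaves the tail at an odd index, so it is treated as code.
    print(apply_outside_fences("intro " + FENCE + "tail", str.upper))
```

The approach is deliberately simple and has limits that follow directly from the split: only triple backticks are recognized, so a block fenced with `~~~` is treated as ordinary text, and an unclosed fence protects everything after it.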
@@ -428,6 +460,10 @@ class Filter:
         enable_thought_tag_fix: bool = Field(
             default=True, description="Normalize </thought> tags"
         )
+        enable_details_tag_fix: bool = Field(
+            default=True,
+            description="Normalize <details> tags (add blank line after </details> and handle self-closing tags)",
+        )
         enable_code_block_fix: bool = Field(
             default=True,
             description="Fix code block formatting (indentation, newlines)",
@@ -591,6 +627,7 @@ class Filter:
                 enable_escape_fix=self.valves.enable_escape_fix,
                 enable_escape_fix_in_code_blocks=self.valves.enable_escape_fix_in_code_blocks,
                 enable_thought_tag_fix=self.valves.enable_thought_tag_fix,
+                enable_details_tag_fix=self.valves.enable_details_tag_fix,
                 enable_code_block_fix=self.valves.enable_code_block_fix,
                 enable_latex_fix=self.valves.enable_latex_fix,
                 enable_list_fix=self.valves.enable_list_fix,
@@ -3,7 +3,7 @@ title: Markdown Format Fixer (Markdown Normalizer)
 author: Fu-Jie
 author_url: https://github.com/Fu-Jie/awesome-openwebui
 funding_url: https://github.com/open-webui
-version: 1.1.2
+version: 1.2.0
 description: A content normalization filter that fixes common Markdown formatting issues in LLM output, such as broken code blocks, LaTeX formulas, Mermaid diagrams, and list formatting.
 """
 
@@ -25,6 +25,7 @@ class NormalizerConfig:
 
     enable_escape_fix: bool = True  # Fix excessive escape characters
     enable_thought_tag_fix: bool = True  # Normalize chain-of-thought tags
+    enable_details_tag_fix: bool = True  # Normalize <details> tags (similar to chain-of-thought tags)
     enable_code_block_fix: bool = True  # Fix code block formatting
     enable_latex_fix: bool = True  # Fix LaTeX formula formatting
     enable_list_fix: bool = False  # Fix list item newlines (off by default because it can be too aggressive)
@@ -55,6 +56,12 @@ class ContentNormalizer:
             r"</(thought|think|thinking)>[ \t]*\n*", re.IGNORECASE
         ),
         "thought_start": re.compile(r"<(thought|think|thinking)>", re.IGNORECASE),
+        # Details tag: </details> followed by optional whitespace/newlines
+        "details_end": re.compile(r"</details>[ \t]*\n*", re.IGNORECASE),
+        # Self-closing details tag: <details ... /> followed by optional whitespace (but NOT already having newline)
+        "details_self_closing": re.compile(
+            r"(<details[^>]*/\s*>)(?!\n)", re.IGNORECASE
+        ),
         # LaTeX block: \[ ... \]
         "latex_bracket_block": re.compile(r"\\\[(.+?)\\\]", re.DOTALL),
         # LaTeX inline: \( ... \)
@@ -122,7 +129,14 @@ class ContentNormalizer:
             if content != original:
                 self.applied_fixes.append("Normalize Thought Tags")
 
-        # 3. Code block formatting fix
+        # 3. Details tag normalization (must be before heading fix)
+        if self.config.enable_details_tag_fix:
+            original = content
+            content = self._fix_details_tags(content)
+            if content != original:
+                self.applied_fixes.append("Normalize Details Tags")
+
+        # 4. Code block formatting fix
         if self.config.enable_code_block_fix:
             original = content
             content = self._fix_code_blocks(content)
@@ -223,6 +237,24 @@ class ContentNormalizer:
         # 2. Standardize end tag and ensure newlines: </think> -> </thought>\n\n
         return self._PATTERNS["thought_end"].sub("</thought>\n\n", content)
 
+    def _fix_details_tags(self, content: str) -> str:
+        """Normalize <details> tags: ensure correct spacing after closing tags
+
+        Handles two cases:
+        1. </details> followed by content -> ensure a double newline
+        2. <details .../> (self-closing) followed by content -> ensure a newline
+
+        Note: applied only outside code blocks, to avoid breaking code examples.
+        """
+        parts = content.split("```")
+        for i in range(0, len(parts), 2):  # Even indices are Markdown text
+            # 1. Ensure a double newline after </details>
+            parts[i] = self._PATTERNS["details_end"].sub("</details>\n\n", parts[i])
+            # 2. Ensure a newline after self-closing <details ... />
+            parts[i] = self._PATTERNS["details_self_closing"].sub(r"\1\n", parts[i])
+
+        return "```".join(parts)
+
     def _fix_code_blocks(self, content: str) -> str:
         """Fix code block formatting (prefixes, suffixes, indentation)"""
         # Remove indentation before code blocks
@@ -403,6 +435,10 @@ class Filter:
         enable_thought_tag_fix: bool = Field(
             default=True, description="Normalize chain-of-thought tags (<think> -> <thought>)"
         )
+        enable_details_tag_fix: bool = Field(
+            default=True,
+            description="Normalize <details> tags (add a blank line after </details>, handle self-closing tags)",
+        )
         enable_code_block_fix: bool = Field(
             default=True,
             description="Fix code block formatting (indentation, newlines)",
@@ -494,6 +530,7 @@ class Filter:
         fix_map = {
             "Fix Escape Chars": "转义字符",
             "Normalize Thought Tags": "思维标签",
+            "Normalize Details Tags": "Details标签",
             "Fix Code Blocks": "代码块",
             "Normalize LaTeX": "LaTeX公式",
             "Fix List Format": "列表格式",
@@ -579,6 +616,7 @@ class Filter:
             config = NormalizerConfig(
                 enable_escape_fix=self.valves.enable_escape_fix,
                 enable_thought_tag_fix=self.valves.enable_thought_tag_fix,
+                enable_details_tag_fix=self.valves.enable_details_tag_fix,
                 enable_code_block_fix=self.valves.enable_code_block_fix,
                 enable_latex_fix=self.valves.enable_latex_fix,
                 enable_list_fix=self.valves.enable_list_fix,
@@ -14,6 +14,7 @@ class TestMarkdownNormalizer(unittest.TestCase):
         self.config = NormalizerConfig(
             enable_escape_fix=True,
             enable_thought_tag_fix=True,
+            enable_details_tag_fix=True,
             enable_code_block_fix=True,
             enable_latex_fix=True,
             enable_list_fix=True,
@@ -21,6 +22,7 @@ class TestMarkdownNormalizer(unittest.TestCase):
             enable_fullwidth_symbol_fix=True,
             enable_mermaid_fix=True,
             enable_xml_tag_cleanup=True,
+            enable_heading_fix=True,
         )
         self.normalizer = ContentNormalizer(self.config)
 
@@ -42,6 +44,32 @@ class TestMarkdownNormalizer(unittest.TestCase):
             self.normalizer.normalize(input_text_deepseek), expected_deepseek
         )
 
+    def test_details_tag_fix(self):
+        # Case 1: </details> followed by content without blank line
+        input_text = (
+            "<details><summary>Thought</summary>\n> Thinking\n</details>Next paragraph"
+        )
+        expected = "<details><summary>Thought</summary>\n> Thinking\n</details>\n\nNext paragraph"
+        self.assertEqual(self.normalizer.normalize(input_text), expected)
+
+        # Case 2: Self-closing <details /> followed by heading
+        input_text_self_closing = '<details id="__DETAIL_0__"/>#Heading'
+        result = self.normalizer.normalize(input_text_self_closing)
+        self.assertIn("# Heading", result)  # Heading should be fixed
+        self.assertIn(
+            '<details id="__DETAIL_0__"/>\n', result
+        )  # Should have newline after
+
+        # Case 3: </details> already has proper spacing (should not add extra)
+        input_already_good = "</details>\n\nNext"
+        self.assertEqual(
+            self.normalizer.normalize(input_already_good), input_already_good
+        )
+
+        # Case 4: Details tag inside code block (should NOT be modified)
+        input_code_block = "```html\n<details>\n</details>\n```"
+        self.assertEqual(self.normalizer.normalize(input_code_block), input_code_block)
+
     def test_code_block_fix(self):
         # Case 1: Indentation
         self.assertEqual(self.normalizer._fix_code_blocks(" ```python"), "```python")
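Case 3 above checks that already-correct spacing is left alone. A closely related property is idempotence: running the fix a second time should change nothing. The sketch below checks that property against a standalone re-implementation of the details step. `fix_details`, `SAMPLES`, and `DetailsIdempotenceTest` are hypothetical names, and the function mirrors the diff's logic rather than importing the plugin, so it says nothing about the other normalization steps.

```python
import re
import unittest

FENCE = "`" * 3  # the triple-backtick marker

# Patterns copied from the diff; constant names are illustrative only.
DETAILS_END = re.compile(r"</details>[ \t]*\n*", re.IGNORECASE)
DETAILS_SELF_CLOSING = re.compile(r"(<details[^>]*/\s*>)(?!\n)", re.IGNORECASE)


def fix_details(content: str) -> str:
    """Standalone mirror of the details step: only text outside fences is changed."""
    parts = content.split(FENCE)
    for i in range(0, len(parts), 2):
        parts[i] = DETAILS_END.sub("</details>\n\n", parts[i])
        parts[i] = DETAILS_SELF_CLOSING.sub(r"\1\n", parts[i])
    return FENCE.join(parts)


SAMPLES = [
    "<details><summary>Thought</summary>\n> Thinking\n</details>Next paragraph",
    '<details id="__DETAIL_0__"/>#Heading',
    "</details>\n\nNext",
    FENCE + "html\n<details>\n</details>\n" + FENCE,
    "</details><details/>tail",
]


class DetailsIdempotenceTest(unittest.TestCase):
    def test_second_pass_is_a_no_op(self):
        for sample in SAMPLES:
            with self.subTest(sample=sample):
                once = fix_details(sample)
                self.assertEqual(fix_details(once), once)
```

Saved as a module, the class can be run with `python -m unittest`; a failure would point at a pattern whose replacement text can be matched again by the same pattern.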
plugins/filters/markdown_normalizer/test_side_effects.py (new file, 37 lines)
@@ -0,0 +1,37 @@
+from markdown_normalizer import ContentNormalizer, NormalizerConfig
+
+
+def test_side_effects():
+    normalizer = ContentNormalizer(NormalizerConfig(enable_details_tag_fix=True))
+
+    # Scenario 1: HTML code block
+    code_block = """```html
+<details>
+<summary>Click</summary>
+Content
+</details>
+```"""
+
+    # Scenario 2: Python string
+    python_code = """```python
+html = "</details>"
+print(html)
+```"""
+
+    print("--- Scenario 1: HTML Code Block ---")
+    res1 = normalizer.normalize(code_block)
+    print(repr(res1))
+    if "</details>\n\n" in res1 and "```" in res1:
+        print("WARNING: Modified inside HTML code block")
+
+    print("\n--- Scenario 2: Python String ---")
+    res2 = normalizer.normalize(python_code)
+    print(repr(res2))
+    if 'html = "</details>\n\n"' in res2:
+        print("CRITICAL: Broke Python string literal")
+    else:
+        print("OK")
+
+
+if __name__ == "__main__":
+    test_side_effects()