feat: update markdown normalizer to v1.1.2 with comprehensive mermaid edge label protection
@@ -1,13 +1,13 @@
 # Markdown Normalizer Filter
 
-**Author:** [Fu-Jie](https://github.com/Fu-Jie)
-**Version:** 1.1.0
+**Author:** [Fu-Jie](https://github.com/Fu-Jie/awesome-openwebui)
+**Version:** 1.1.2
 
 A content normalizer filter for Open WebUI that fixes common Markdown formatting issues in LLM outputs. It ensures that code blocks, LaTeX formulas, Mermaid diagrams, and other Markdown elements are rendered correctly.
 
 ## Features
 
-* **Mermaid Syntax Fix**: Automatically fixes common Mermaid syntax errors, such as unquoted node labels (including multi-line labels and citations) and unclosed subgraphs, ensuring diagrams render correctly.
+* **Mermaid Syntax Fix**: Automatically fixes common Mermaid syntax errors, such as unquoted node labels (including multi-line labels and citations) and unclosed subgraphs. **New in v1.1.2**: Comprehensive protection for edge labels (text on connecting lines) across all link types (solid, dotted, thick).
 * **Frontend Console Debugging**: Supports printing structured debug logs directly to the browser console (F12) for easier troubleshooting.
 * **Code Block Formatting**: Fixes broken code block prefixes, suffixes, and indentation.
 * **LaTeX Normalization**: Standardizes LaTeX formula delimiters (`\[` -> `$$`, `\(` -> `$`).
@@ -46,6 +46,10 @@ A content normalizer filter for Open WebUI that fixes common Markdown formatting
 
 ## Changelog
 
+### v1.1.2
+* **Mermaid Edge Label Protection**: Implemented comprehensive protection for edge labels (text on connecting lines) to prevent them from being incorrectly modified. Now supports all Mermaid link types including solid (`--`), dotted (`-.`), and thick (`==`) lines with or without arrows.
+* **Bug Fixes**: Fixed an issue where lines without arrows (e.g., `A -- text --- B`) were not correctly protected.
+
 ### v1.1.0
 * **Mermaid Fix Refinement**: Improved regex to handle nested parentheses in node labels (e.g., `ID("Label (text)")`) and avoided matching connection labels.
 * **HTML Safeguard Optimization**: Refined `_contains_html` to allow common tags like `<br/>`, `<b>`, `<i>`, etc., ensuring Mermaid diagrams with these tags are still normalized.
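The v1.1.2 entry claims coverage for solid, dotted, thick, and arrowless links. One way to check that claim is to run the `edge_label_pattern` regex from the `_fix_mermaid_syntax` hunk of this commit against each link style named in the changelog and code comments. The sketch below does only that; `label_of` and `EXAMPLES` are hypothetical names, not part of the filter.

```python
import re

# Copied verbatim from the edge-label protection hunk in this commit.
EDGE_LABEL_PATTERN = re.compile(
    r"(--|-\.|\=\=)\s+(.+?)\s+(--+[>ox]?|--+\|>|\.-[>ox]?|=+[>ox]?)"
)

# One example per link style listed in the changelog and code comments.
EXAMPLES = [
    "A -- text --> B",
    "A -- text --o B",
    "A -- text --x B",
    "A -. text .-> B",
    "A -. text .-o B",
    "A == text ==> B",
    "A == text ==o B",
    "A -- text --- B",  # arrowless link, the v1.1.2 bug fix
    "A --> B",          # plain link with no label
]


def label_of(line: str):
    """Return (start, label, end) for the first edge label in line, or None."""
    m = EDGE_LABEL_PATTERN.search(line)
    return m.groups() if m else None


for line in EXAMPLES:
    print(f"{line:<18} -> {label_of(line)}")
```

Every labelled example yields a `(start, "text", end)` tuple and the plain `A --> B` link yields `None`. One detail the check surfaces: the `--+\|>` alternative can never be the branch that matches, because `--+[>ox]?` is tried first, already succeeds on the leading dashes, and nothing follows the third group in the pattern.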
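@@ -1,13 +1,13 @@
 # Markdown Formatting Filter (Markdown Normalizer)
 
-**Author:** [Fu-Jie](https://github.com/Fu-Jie)
-**Version:** 1.1.0
+**Author:** [Fu-Jie](https://github.com/Fu-Jie/awesome-openwebui)
+**Version:** 1.1.2
 
 This is a content formatting filter for Open WebUI, designed to fix common Markdown formatting issues in LLM output. It ensures that code blocks, LaTeX formulas, Mermaid diagrams, and other Markdown elements render correctly.
 
 ## Features
 
-* **Mermaid Syntax Fix**: Automatically fixes common Mermaid syntax errors, such as unquoted node labels (including multi-line labels and citation markers) and unclosed subgraphs, ensuring diagrams render correctly.
+* **Mermaid Syntax Fix**: Automatically fixes common Mermaid syntax errors, such as unquoted node labels (including multi-line labels and citation markers) and unclosed subgraphs. **New in v1.1.2**: Comprehensive protection for all kinds of edge labels (solid, dotted, and thick lines) so they are not modified by mistake.
 * **Frontend Console Debugging**: Supports printing structured debug logs directly to the browser console (F12) for easier troubleshooting.
 * **Code Block Formatting**: Fixes broken code block prefixes, suffixes, and indentation.
 * **LaTeX Normalization**: Standardizes LaTeX formula delimiters (`\[` -> `$$`, `\(` -> `$`).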
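@@ -46,6 +46,10 @@
 
 ## Changelog
 
+### v1.1.2
+* **Mermaid Edge Label Protection**: Implemented a comprehensive edge label protection mechanism to prevent text on connecting lines from being modified by mistake. All Mermaid link types are now supported, including solid (`--`), dotted (`-.`), and thick (`==`) lines, with or without arrows.
+* **Bug Fixes**: Fixed an issue where links without arrows (e.g., `A -- text --- B`) were not correctly protected.
+
 ### v1.1.0
 * **Mermaid Fix Refinement**: Improved the regex to handle nested parentheses in node labels (e.g., `ID("Label (text)")`) and to avoid matching text on connecting lines.
 * **HTML Safeguard Optimization**: Refined the `_contains_html` check to allow common tags such as `<br/>`, `<b>`, and `<i>`, ensuring Mermaid diagrams containing these tags are still normalized.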
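The v1.1.0 entry says `_contains_html` now tolerates common inline tags so that Mermaid blocks using `<br/>`, `<b>`, or `<i>` are still normalized. The filter's implementation is not part of this diff, so the following is only a minimal allowlist-based sketch of that idea; the tag set, the regex, and the name `contains_html` are assumptions.

```python
import re

# Hypothetical allowlist of inline tags that should not block normalization.
ALLOWED_TAGS = {"br", "b", "i", "strong", "em", "u", "sub", "sup"}

# Matches <tag ...>, </tag>, or <tag/> and captures the tag name.
# A bare "<" followed by a space (e.g. "a < b") is not treated as a tag.
TAG_RE = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")


def contains_html(content: str) -> bool:
    """Return True only if content contains a tag outside ALLOWED_TAGS."""
    return any(
        m.group(1).lower() not in ALLOWED_TAGS for m in TAG_RE.finditer(content)
    )


print(contains_html('A["line one<br/>line two"]'))
print(contains_html("<div>block</div>"))
```

Keeping the check an allowlist means an unfamiliar tag still disables normalization, which is the conservative way for it to fail.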
@@ -3,7 +3,7 @@ title: Markdown Normalizer
 author: Fu-Jie
 author_url: https://github.com/Fu-Jie/awesome-openwebui
 funding_url: https://github.com/open-webui
-version: 1.1.0
+version: 1.1.2
 description: A content normalizer filter that fixes common Markdown formatting issues in LLM outputs, such as broken code blocks, LaTeX formulas, and list formatting.
 """
 
@@ -11,7 +11,6 @@ from pydantic import BaseModel, Field
 from typing import Optional, List, Callable, Dict
 import re
 import logging
-import logging
 import asyncio
 import json
 from dataclasses import dataclass, field
@@ -25,6 +24,9 @@ class NormalizerConfig:
     """Configuration class for enabling/disabling specific normalization rules"""
 
     enable_escape_fix: bool = True # Fix excessive escape characters
+    enable_escape_fix_in_code_blocks: bool = (
+        False # Apply escape fix inside code blocks (default: False for safety)
+    )
     enable_thought_tag_fix: bool = True # Normalize thought tags
     enable_code_block_fix: bool = True # Fix code block formatting
     enable_latex_fix: bool = True # Fix LaTeX formula formatting
@@ -214,12 +216,30 @@ class ContentNormalizer:
         return content
 
     def _fix_escape_characters(self, content: str) -> str:
-        """Fix excessive escape characters"""
-        content = content.replace("\\r\\n", "\n")
-        content = content.replace("\\n", "\n")
-        content = content.replace("\\t", "\t")
-        content = content.replace("\\\\", "\\")
-        return content
+        """Fix excessive escape characters
+
+        If enable_escape_fix_in_code_blocks is False (default), this method will only
+        fix escape characters outside of code blocks to avoid breaking valid code
+        examples (e.g., JSON strings with \\n, regex patterns, etc.).
+        """
+        if self.config.enable_escape_fix_in_code_blocks:
+            # Apply globally (original behavior)
+            content = content.replace("\\r\\n", "\n")
+            content = content.replace("\\n", "\n")
+            content = content.replace("\\t", "\t")
+            content = content.replace("\\\\", "\\")
+            return content
+        else:
+            # Apply only outside code blocks (safe mode)
+            parts = content.split("```")
+            for i in range(
+                0, len(parts), 2
+            ): # Even indices are markdown text (not code)
+                parts[i] = parts[i].replace("\\r\\n", "\n")
+                parts[i] = parts[i].replace("\\n", "\n")
+                parts[i] = parts[i].replace("\\t", "\t")
+                parts[i] = parts[i].replace("\\\\", "\\")
+            return "```".join(parts)
 
     def _fix_thought_tags(self, content: str) -> str:
         """Normalize thought tags: unify naming and fix spacing"""
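The safe-mode branch relies on one property of `str.split`: after splitting on the fence marker, even indices hold prose and odd indices hold fenced code, as long as every fence is closed. A standalone sketch of the same approach makes that easy to test; the name `fix_escapes_outside_code` is hypothetical.

```python
FENCE = "`" * 3  # three backticks, built this way so the example embeds cleanly in Markdown


def fix_escapes_outside_code(content: str) -> str:
    """Turn literal \\r\\n, \\n, \\t and \\\\ into real characters, but only in prose."""
    parts = content.split(FENCE)
    # Even indices are text outside fences; odd indices are fenced code.
    for i in range(0, len(parts), 2):
        parts[i] = (
            parts[i]
            .replace("\\r\\n", "\n")
            .replace("\\n", "\n")
            .replace("\\t", "\t")
            .replace("\\\\", "\\")
        )
    return FENCE.join(parts)


json_block = FENCE + 'json\n{"msg": "a\\nb"}\n' + FENCE
print(fix_escapes_outside_code("Intro\\n" + json_block))
```

The literal `\n` after "Intro" becomes a real line break, while the `\n` inside the JSON string is left alone. An unclosed fence flips the parity for everything after it, so text following a truncated code block is treated as code and left unchanged, which is the safe direction for this fix.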
@@ -239,7 +259,7 @@ class ContentNormalizer:
         return content
 
     def _fix_latex_formulas(self, content: str) -> str:
-        """Normalize LaTeX formulas: \[ -> $$ (block), \( -> $ (inline)"""
+        r"""Normalize LaTeX formulas: \[ -> $$ (block), \( -> $ (inline)"""
         content = self._PATTERNS["latex_bracket_block"].sub(r"$$\1$$", content)
         content = self._PATTERNS["latex_paren_inline"].sub(r"$\1$", content)
         return content
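Switching the docstring to a raw string (`r"""..."""`) matters because `\[` and `\(` are invalid escape sequences in an ordinary string literal, which Python reports as a `SyntaxWarning` since 3.12 (a `DeprecationWarning` before that). The two patterns used here, `latex_bracket_block` and `latex_paren_inline`, are defined elsewhere in the file and are not shown in this diff; the sketch below assumes plausible definitions for them.

```python
import re

# Assumed definitions; the diff only shows the pattern names, not their bodies.
LATEX_BRACKET_BLOCK = re.compile(r"\\\[(.+?)\\\]", re.DOTALL)  # \[ ... \], may span lines
LATEX_PAREN_INLINE = re.compile(r"\\\((.+?)\\\)")  # \( ... \) on one line


def fix_latex_formulas(content: str) -> str:
    r"""Normalize LaTeX delimiters: \[ -> $$ (block), \( -> $ (inline)."""
    # In a re.sub replacement string, "$" is literal and \1 is the captured formula.
    content = LATEX_BRACKET_BLOCK.sub(r"$$\1$$", content)
    content = LATEX_PAREN_INLINE.sub(r"$\1$", content)
    return content


print(fix_latex_formulas(r"Euler: \[ e^{i\pi} + 1 = 0 \] and inline \(a+b\)."))
```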
@@ -267,6 +287,8 @@ class ContentNormalizer:
             "：": ":",
             "？": "?",
             "！": "!",
+            "＂": '"', # U+FF02 FULLWIDTH QUOTATION MARK
+            "＇": "'", # U+FF07 FULLWIDTH APOSTROPHE
             "“": '"',
             "”": '"',
             "‘": "'",
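The two new keys are full-width characters that look almost identical to their ASCII replacements, and Unicode NFKC normalization folds them into exactly those replacements (U+FF02 becomes `"`), so a table written with literal characters is easy to corrupt when copied. A sketch of the same kind of table written with `\uXXXX` escapes and applied in one pass with `str.translate` avoids that ambiguity; the name `FULL_WIDTH_MAP`, the extra U+2019 entry, and the use of `str.translate` are assumptions, not the filter's code.

```python
# Hypothetical subset of a full-width -> ASCII punctuation table, written with
# escapes so the keys cannot be confused with (or normalized into) their values.
FULL_WIDTH_MAP = {
    "\uff1a": ":",  # FULLWIDTH COLON
    "\uff1f": "?",  # FULLWIDTH QUESTION MARK
    "\uff01": "!",  # FULLWIDTH EXCLAMATION MARK
    "\uff02": '"',  # FULLWIDTH QUOTATION MARK
    "\uff07": "'",  # FULLWIDTH APOSTROPHE
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
}

# str.maketrans accepts a dict whose keys are single characters.
TABLE = str.maketrans(FULL_WIDTH_MAP)


def to_ascii_punct(text: str) -> str:
    """Replace every mapped character in a single pass."""
    return text.translate(TABLE)


print(to_ascii_punct("\uff02hi\uff02\uff01"))
```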
@@ -319,8 +341,38 @@ class ContentNormalizer:
             # Check if it's a mermaid block
             lang_line = parts[i].split("\n", 1)[0].strip().lower()
             if "mermaid" in lang_line:
-                # Apply the comprehensive regex fix
-                parts[i] = self._PATTERNS["mermaid_node"].sub(replacer, parts[i])
+                # Protect edge labels (text between link start and arrow) from being modified
+                # by temporarily replacing them with placeholders.
+                # Covers all Mermaid link types:
+                # - Solid line: A -- text --> B, A -- text --o B, A -- text --x B
+                # - Dotted line: A -. text .-> B, A -. text .-o B
+                # - Thick line: A == text ==> B, A == text ==o B
+                # - No arrow: A -- text --- B
+                edge_labels = []
+
+                def protect_edge_label(m):
+                    start = m.group(1) # Link start: --, -., or ==
+                    label = m.group(2) # Text content
+                    arrow = m.group(3) # Arrow/end pattern
+                    edge_labels.append((start, label, arrow))
+                    return f"___EDGE_LABEL_{len(edge_labels)-1}___"
+
+                # Comprehensive edge label pattern for all Mermaid link types
+                edge_label_pattern = (
+                    r"(--|-\.|\=\=)\s+(.+?)\s+(--+[>ox]?|--+\|>|\.-[>ox]?|=+[>ox]?)"
+                )
+                protected = re.sub(edge_label_pattern, protect_edge_label, parts[i])
+
+                # Apply the comprehensive regex fix to protected content
+                fixed = self._PATTERNS["mermaid_node"].sub(replacer, protected)
+
+                # Restore edge labels
+                for idx, (start, label, arrow) in enumerate(edge_labels):
+                    fixed = fixed.replace(
+                        f"___EDGE_LABEL_{idx}___", f"{start} {label} {arrow}"
+                    )
+
+                parts[i] = fixed
 
                 # Auto-close subgraphs
                 subgraph_count = len(
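The protect, fix, restore sequence can be exercised in isolation. The sketch below reuses the commit's `edge_label_pattern` and placeholder format, but swaps in a hypothetical toy node fixer (`toy_node_fix`, which quotes unquoted `ID(label)` shapes) for the real `_PATTERNS["mermaid_node"]`, whose definition is not in this diff. It shows an edge label such as `call(x)` surviving untouched while real nodes are still quoted.

```python
import re

# Copied from the commit: link start, label text, link end.
EDGE_LABEL_PATTERN = re.compile(
    r"(--|-\.|\=\=)\s+(.+?)\s+(--+[>ox]?|--+\|>|\.-[>ox]?|=+[>ox]?)"
)

# Hypothetical stand-in for _PATTERNS["mermaid_node"]: quote unquoted ID(label) shapes.
TOY_NODE_PATTERN = re.compile(r'\b(\w+)\(([^)"]+)\)')


def toy_node_fix(text: str) -> str:
    return TOY_NODE_PATTERN.sub(r'\1("\2")', text)


def fix_with_protection(block: str) -> str:
    edge_labels = []

    def protect(m):
        # Remember (start, label, arrow) and leave a unique placeholder behind.
        edge_labels.append(m.groups())
        return f"___EDGE_LABEL_{len(edge_labels) - 1}___"

    protected = EDGE_LABEL_PATTERN.sub(protect, block)
    fixed = toy_node_fix(protected)
    # Put the labels back, joined with single spaces as the commit does.
    for idx, (start, label, arrow) in enumerate(edge_labels):
        fixed = fixed.replace(f"___EDGE_LABEL_{idx}___", f"{start} {label} {arrow}")
    return fixed


SAMPLE = "A(Start) -- call(x) --> B(End)"
print(toy_node_fix(SAMPLE))         # label corrupted: call("x")
print(fix_with_protection(SAMPLE))  # label preserved: call(x)
```

Two properties of the commit's approach carry over to this sketch. Restoring with `f"{start} {label} {arrow}"` collapses any run of whitespace around the label to a single space. The match can also begin inside a longer link: in `A --- B --> C` the pattern matches `-- B -->`, so node `B` is hidden from the node fixer as if it were a label.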
@@ -368,6 +420,10 @@ class Filter:
         enable_escape_fix: bool = Field(
             default=True, description="Fix excessive escape characters (\\n, \\t, etc.)"
         )
+        enable_escape_fix_in_code_blocks: bool = Field(
+            default=False,
+            description="Apply escape fix inside code blocks (⚠️ Warning: May break valid code like JSON strings or regex patterns. Default: False for safety)",
+        )
         enable_thought_tag_fix: bool = Field(
             default=True, description="Normalize </thought> tags"
         )
@@ -532,6 +588,7 @@ class Filter:
         # Configure normalizer based on valves
         config = NormalizerConfig(
             enable_escape_fix=self.valves.enable_escape_fix,
+            enable_escape_fix_in_code_blocks=self.valves.enable_escape_fix_in_code_blocks,
             enable_thought_tag_fix=self.valves.enable_thought_tag_fix,
             enable_code_block_fix=self.valves.enable_code_block_fix,
             enable_latex_fix=self.valves.enable_latex_fix,
@@ -3,7 +3,7 @@ title: Markdown Format Fixer (Markdown Normalizer)
 author: Fu-Jie
 author_url: https://github.com/Fu-Jie/awesome-openwebui
 funding_url: https://github.com/open-webui
-version: 1.1.0
+version: 1.1.2
 description: A content normalizer filter that fixes common Markdown formatting issues in LLM outputs, such as broken code blocks, LaTeX formulas, Mermaid diagrams, and list formatting.
 """
 
@@ -314,8 +314,38 @@ class ContentNormalizer:
             # Check if it's a mermaid block
             lang_line = parts[i].split("\n", 1)[0].strip().lower()
             if "mermaid" in lang_line:
-                # Apply the comprehensive regex fix
-                parts[i] = self._PATTERNS["mermaid_node"].sub(replacer, parts[i])
+                # Protect edge labels (text between link start and arrow) from being modified
+                # by temporarily replacing them with placeholders.
+                # Covers all Mermaid link types:
+                # - Solid line: A -- text --> B, A -- text --o B, A -- text --x B
+                # - Dotted line: A -. text .-> B, A -. text .-o B
+                # - Thick line: A == text ==> B, A == text ==o B
+                # - No arrow: A -- text --- B
+                edge_labels = []
+
+                def protect_edge_label(m):
+                    start = m.group(1) # Link start: --, -., or ==
+                    label = m.group(2) # Text content
+                    arrow = m.group(3) # Arrow/end pattern
+                    edge_labels.append((start, label, arrow))
+                    return f"___EDGE_LABEL_{len(edge_labels)-1}___"
+
+                # Comprehensive edge label pattern for all Mermaid link types
+                edge_label_pattern = (
+                    r"(--|-\.|\=\=)\s+(.+?)\s+(--+[>ox]?|--+\|>|\.-[>ox]?|=+[>ox]?)"
+                )
+                protected = re.sub(edge_label_pattern, protect_edge_label, parts[i])
+
+                # Apply the comprehensive regex fix to protected content
+                fixed = self._PATTERNS["mermaid_node"].sub(replacer, protected)
+
+                # Restore edge labels
+                for idx, (start, label, arrow) in enumerate(edge_labels):
+                    fixed = fixed.replace(
+                        f"___EDGE_LABEL_{idx}___", f"{start} {label} {arrow}"
+                    )
+
+                parts[i] = fixed
 
                 # Auto-close subgraphs
                 # Count 'subgraph' and 'end' (case-insensitive)
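The hunk stops just as the auto-close step begins: its comment says `subgraph` and `end` lines are counted case-insensitively. The rest of that code is outside the diff, so the following is one plausible way to do it, sketched under the assumption that each `subgraph` and `end` keyword starts its own line; `auto_close_subgraphs` and both regex names are hypothetical.

```python
import re

# Assumes each keyword is the first token on its line (leading indentation allowed).
SUBGRAPH_RE = re.compile(r"^[ \t]*subgraph\b", re.IGNORECASE | re.MULTILINE)
END_RE = re.compile(r"^[ \t]*end[ \t]*$", re.IGNORECASE | re.MULTILINE)


def auto_close_subgraphs(block: str) -> str:
    """Append one 'end' line for every 'subgraph' that is never closed."""
    missing = len(SUBGRAPH_RE.findall(block)) - len(END_RE.findall(block))
    if missing <= 0:
        return block
    return block.rstrip("\n") + "\n" + "\n".join(["end"] * missing) + "\n"


print(auto_close_subgraphs("graph TD\nsubgraph One\n  A --> B\n"))
```

Counting alone cannot tell where a missing `end` belongs, so this sketch simply appends the closers at the end of the block.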
@@ -491,15 +521,6 @@ class Filter:
         except Exception as e:
             print(f"Error emitting status: {e}")
 
-    async def _emit_debug_log(
-        self,
-        __event_emitter__,
-        applied_fixes: List[str],
-        original: str,
-        normalized: str,
-    ):
-        """Emit debug log to browser console via JS execution"""
-
     async def _emit_debug_log(
         self,
         __event_call__,