Fix Mermaid syntax normalization: preserve quoted strings and prevent false positives
This commit is contained in:
162
plugins/filters/markdown_normalizer/FEATURES_CN.md
Normal file
162
plugins/filters/markdown_normalizer/FEATURES_CN.md
Normal file
@@ -0,0 +1,162 @@
|
||||
# Markdown Normalizer 功能详解
|
||||
|
||||
本插件旨在修复 LLM 输出中常见的 Markdown 格式问题,确保在 Open WebUI 中完美渲染。以下是支持的修复功能列表及示例。
|
||||
|
||||
## 1. 代码块修复 (Code Block Fixes)
|
||||
|
||||
### 1.1 去除代码块缩进
|
||||
LLM 有时会在代码块前添加空格缩进,导致渲染失效。本插件会自动移除这些缩进。
|
||||
|
||||
**Before:**
|
||||
```python
|
||||
print("hello")
|
||||
```
|
||||
|
||||
**After:**
|
||||
```python
|
||||
print("hello")
|
||||
```
|
||||
|
||||
### 1.2 补全代码块前后换行
|
||||
代码块标记 ` ``` ` 必须独占一行。如果 LLM 将其与文本混在一行,插件会自动修复。
|
||||
|
||||
**Before:**
|
||||
Here is code:```python
|
||||
print("hello")```
|
||||
|
||||
**After:**
|
||||
Here is code:
|
||||
```python
|
||||
print("hello")
|
||||
```
|
||||
|
||||
### 1.3 修复语言标识符后的换行
|
||||
有时 LLM 会忘记在语言标识符(如 `python`)后换行。
|
||||
|
||||
**Before:**
|
||||
```python print("hello")
|
||||
```
|
||||
|
||||
**After:**
|
||||
```python
|
||||
print("hello")
|
||||
```
|
||||
|
||||
### 1.4 自动闭合代码块
|
||||
如果输出被截断或 LLM 忘记闭合代码块,插件会自动添加结尾的 ` ``` `。
|
||||
|
||||
**Before:**
|
||||
```python
|
||||
print("unfinished code...")
|
||||
|
||||
**After:**
|
||||
```python
|
||||
print("unfinished code...")
|
||||
```
|
||||
|
||||
## 2. LaTeX 公式规范化 (LaTeX Normalization)
|
||||
|
||||
Open WebUI 使用 MathJax/KaTeX 渲染公式,通常需要 `$$` 或 `$` 包裹。本插件会将常见的 LaTeX 括号语法转换为标准格式。
|
||||
|
||||
**Before:**
|
||||
块级公式:\[ E = mc^2 \]
|
||||
行内公式:\( a^2 + b^2 = c^2 \)
|
||||
|
||||
**After:**
|
||||
块级公式:$$ E = mc^2 $$
|
||||
行内公式:$ a^2 + b^2 = c^2 $
|
||||
|
||||
## 3. 转义字符清理 (Escape Character Fix)
|
||||
|
||||
修复过度转义的字符,这常见于某些 API 返回的原始字符串中。
|
||||
|
||||
**Before:**
|
||||
Line 1\\nLine 2\\tTabbed
|
||||
|
||||
**After:**
|
||||
Line 1
|
||||
Line 2 Tabbed
|
||||
|
||||
## 4. 思维链标签规范化 (Thought Tag Fix)
|
||||
**功能**:
|
||||
1. 确保 `</thought>` 标签后有足够的空行,防止思维链内容与正文粘连。
|
||||
2. **标准化标签**: 将 `<think>` (DeepSeek 等模型常用) 或 `<thinking>` 统一转换为 Open WebUI 标准的 `<thought>` 标签,以便正确触发 UI 的折叠功能。
|
||||
|
||||
**默认**: 开启 (`enable_thought_tag_fix = True`)
|
||||
|
||||
**示例**:
|
||||
* **Before**: `<think>Thinking...</think>Response starts here.`
|
||||
* **After**:
|
||||
```xml
|
||||
<thought>Thinking...</thought>
|
||||
|
||||
Response starts here.
|
||||
```
|
||||
|
||||
## 5. 列表格式修复 (List Formatting Fix)
|
||||
|
||||
*默认关闭,需在设置中开启*
|
||||
|
||||
修复列表项缺少换行的问题。
|
||||
|
||||
**Before:**
|
||||
Header1. Item 1
|
||||
|
||||
**After:**
|
||||
Header
|
||||
1. Item 1
|
||||
|
||||
## 6. 全角符号转半角 (Full-width Symbol Fix)
|
||||
|
||||
*默认关闭,需在设置中开启*
|
||||
|
||||
仅在**代码块内部**将全角符号转换为半角符号,防止代码因符号问题无法运行。
|
||||
|
||||
**Before:**
|
||||
```python
|
||||
if x == 1:
|
||||
print("hello")
|
||||
```
|
||||
|
||||
**After:**
|
||||
```python
|
||||
if x == 1:
|
||||
print("hello")
|
||||
```
|
||||
|
||||
## 7. Mermaid 语法修复 (Mermaid Syntax Fix)
|
||||
**功能**: 修复 Mermaid 图表中常见的语法错误,特别是未加引号的标签包含特殊字符的情况。
|
||||
**默认**: 开启 (`enable_mermaid_fix = True`)
|
||||
**示例**:
|
||||
* **Before**:
|
||||
```mermaid
|
||||
graph TD
|
||||
A[Label with (parens)] --> B(Label with [brackets])
|
||||
```
|
||||
* **After**:
|
||||
```mermaid
|
||||
graph TD
|
||||
A["Label with (parens)"] --> B("Label with [brackets]")
|
||||
```
|
||||
|
||||
## 8. XML 标签清理 (XML Cleanup)
|
||||
|
||||
移除 LLM 输出中残留的无用 XML 标签(如 Claude 的 artifact 标签)。
|
||||
|
||||
**Before:**
|
||||
Here is the result <antArtifact>hidden metadata</antArtifact>.
|
||||
|
||||
**After:**
|
||||
## 9. 标题格式修复 (Heading Format Fix)
|
||||
**功能**: 修复标题标记 `#` 后缺少空格的问题。
|
||||
**默认**: 开启 (`enable_heading_fix = True`)
|
||||
**示例**:
|
||||
* **Before**: `#Heading 1`
|
||||
* **After**: `# Heading 1`
|
||||
|
||||
## 10. 表格格式修复 (Table Format Fix)
|
||||
**功能**: 修复表格行末尾缺少管道符 `|` 的问题。
|
||||
**默认**: 开启 (`enable_table_fix = True`)
|
||||
**示例**:
|
||||
* **Before**: `| Col 1 | Col 2`
|
||||
* **After**: `| Col 1 | Col 2 |`
|
||||
@@ -74,6 +74,7 @@ class ContentNormalizer:
|
||||
# Fix "reverse optimization": Must precisely match shape delimiters to avoid breaking structure
|
||||
# Priority: Longer delimiters match first
|
||||
"mermaid_node": re.compile(
|
||||
r'("[^"\\]*(?:\\.[^"\\]*)*")|' # Match quoted strings first (Group 1)
|
||||
r"(\w+)\s*(?:"
|
||||
r"(\(\(\()(?![\"])(.*?)(?<![\"])(\)\)\))|" # (((...))) Double Circle
|
||||
r"(\(\()(?![\"])(.*?)(?<![\"])(\)\))|" # ((...)) Circle
|
||||
@@ -281,14 +282,18 @@ class ContentNormalizer:
|
||||
"""Fix common Mermaid syntax errors while preserving node shapes"""
|
||||
|
||||
def replacer(match):
|
||||
# Group 1 is ID
|
||||
id_str = match.group(1)
|
||||
# Group 1 is Quoted String (if matched)
|
||||
if match.group(1):
|
||||
return match.group(1)
|
||||
|
||||
# Group 2 is ID
|
||||
id_str = match.group(2)
|
||||
|
||||
# Find matching shape group
|
||||
# Groups start at index 2, each shape has 3 groups (Open, Content, Close)
|
||||
# We iterate to find the non-None one
|
||||
# Groups start at index 3 (in match.group terms) or index 2 (in match.groups() tuple)
|
||||
# Tuple: (String, ID, Open1, Content1, Close1, ...)
|
||||
groups = match.groups()
|
||||
for i in range(1, len(groups), 3):
|
||||
for i in range(2, len(groups), 3):
|
||||
if groups[i] is not None:
|
||||
open_char = groups[i]
|
||||
content = groups[i + 1]
|
||||
|
||||
@@ -69,6 +69,7 @@ class ContentNormalizer:
|
||||
# 修复"反向优化"问题:必须精确匹配各种形状的定界符,避免破坏形状结构
|
||||
# 优先级:长定界符优先匹配
|
||||
"mermaid_node": re.compile(
|
||||
r'("[^"\\]*(?:\\.[^"\\]*)*")|' # Match quoted strings first (Group 1)
|
||||
r"(\w+)\s*(?:"
|
||||
r"(\(\(\()(?![\"])(.*?)(?<![\"])(\)\)\))|" # (((...))) Double Circle
|
||||
r"(\(\()(?![\"])(.*?)(?<![\"])(\)\))|" # ((...)) Circle
|
||||
@@ -276,14 +277,18 @@ class ContentNormalizer:
|
||||
"""修复常见的 Mermaid 语法错误,同时保留节点形状"""
|
||||
|
||||
def replacer(match):
|
||||
# Group 1 是 ID
|
||||
id_str = match.group(1)
|
||||
# Group 1 is Quoted String (if matched)
|
||||
if match.group(1):
|
||||
return match.group(1)
|
||||
|
||||
# 查找匹配的形状组
|
||||
# 组从索引 2 开始,每个形状有 3 个组 (Open, Content, Close)
|
||||
# 我们遍历找到非 None 的那一组
|
||||
# Group 2 is ID
|
||||
id_str = match.group(2)
|
||||
|
||||
# Find matching shape group
|
||||
# Groups start at index 3 (in match.group terms) or index 2 (in match.groups() tuple)
|
||||
# Tuple: (String, ID, Open1, Content1, Close1, ...)
|
||||
groups = match.groups()
|
||||
for i in range(1, len(groups), 3):
|
||||
for i in range(2, len(groups), 3):
|
||||
if groups[i] is not None:
|
||||
open_char = groups[i]
|
||||
content = groups[i + 1]
|
||||
|
||||
Reference in New Issue
Block a user