feat(markdown-normalizer): release v1.2.3 with bug fixes and test suite
This commit is contained in:
@@ -44,7 +44,7 @@ Filters act as middleware in the message pipeline:
|
|||||||
|
|
||||||
Fixes common Markdown formatting issues in LLM outputs, including Mermaid syntax, code blocks, and LaTeX formulas.
|
Fixes common Markdown formatting issues in LLM outputs, including Mermaid syntax, code blocks, and LaTeX formulas.
|
||||||
|
|
||||||
**Version:** 1.1.2
|
**Version:** 1.2.3
|
||||||
|
|
||||||
[:octicons-arrow-right-24: Documentation](markdown_normalizer.md)
|
[:octicons-arrow-right-24: Documentation](markdown_normalizer.md)
|
||||||
|
|
||||||
|
|||||||
@@ -44,7 +44,7 @@ Filter 充当消息管线中的中间件:
|
|||||||
|
|
||||||
修复 LLM 输出中常见的 Markdown 格式问题,包括 Mermaid 语法、代码块和 LaTeX 公式。
|
修复 LLM 输出中常见的 Markdown 格式问题,包括 Mermaid 语法、代码块和 LaTeX 公式。
|
||||||
|
|
||||||
**版本:** 1.0.1
|
**版本:** 1.2.3
|
||||||
|
|
||||||
[:octicons-arrow-right-24: 查看文档](markdown_normalizer.zh.md)
|
[:octicons-arrow-right-24: 查看文档](markdown_normalizer.zh.md)
|
||||||
|
|
||||||
|
|||||||
@@ -51,9 +51,17 @@ A content normalizer filter for Open WebUI that fixes common Markdown formatting
|
|||||||
|
|
||||||
## Changelog
|
## Changelog
|
||||||
|
|
||||||
|
### v1.2.3
|
||||||
|
|
||||||
|
* **List Marker Protection Enhancement**: Fixed a bug where list markers (`*`) followed by plain text and emphasis were having their spaces incorrectly stripped (e.g., `* U16 forward` became `*U16 forward`).
|
||||||
|
* **Placeholder Support**: Confirmed that 4 or more underscores (e.g., `____`) are correctly treated as placeholders and not modified by the emphasis fix.
|
||||||
|
|
||||||
### v1.2.2
|
### v1.2.2
|
||||||
|
|
||||||
* **Version Bump**: Documentation and metadata updated for the latest release.
|
* **Code Block Indentation Fix**: Fixed an issue where code blocks nested inside lists were having their indentation incorrectly stripped. Now preserves proper indentation for nested code blocks.
|
||||||
|
* **Underscore Emphasis Support**: Extended emphasis spacing fix to support `__` (double underscore for bold) and `___` (triple underscore for bold+italic) syntax.
|
||||||
|
* **List Marker Protection**: Fixed a bug where list markers (`*`) followed by emphasis markers (`**`) were incorrectly merged (e.g., `* **Yes**` became `***Yes**`). Added safeguard to prevent this.
|
||||||
|
* **Test Suite**: Added comprehensive pytest test suite with 56 test cases covering all major features.
|
||||||
|
|
||||||
### v1.2.1
|
### v1.2.1
|
||||||
|
|
||||||
|
|||||||
@@ -51,9 +51,17 @@
|
|||||||
|
|
||||||
## 更新日志
|
## 更新日志
|
||||||
|
|
||||||
|
### v1.2.3
|
||||||
|
|
||||||
|
* **列表标记保护增强**: 修复了列表标记 (`*`) 后跟普通文本和强调标记时,空格被错误剥离的问题(例如 `* U16 前锋` 变成 `*U16 前锋`)。
|
||||||
|
* **占位符支持**: 确认 4 个或更多下划线(如 `____`)会被正确视为占位符,不会被强调修复逻辑修改。
|
||||||
|
|
||||||
### v1.2.2
|
### v1.2.2
|
||||||
|
|
||||||
* **版本更新**: 文档与元数据已同步到最新版本。
|
* **代码块缩进修复**: 修复了列表中嵌套代码块的缩进被错误剥离的问题。现在会正确保留嵌套代码块的缩进。
|
||||||
|
* **下划线强调语法支持**: 扩展强调空格修复以支持 `__` (双下划线加粗) 和 `___` (三下划线加粗斜体) 语法。
|
||||||
|
* **列表标记保护**: 修复了列表标记 (`*`) 后跟强调标记 (`**`) 被错误合并的 Bug(例如 `* **是**` 变成 `***是**`)。添加了保护逻辑防止此问题。
|
||||||
|
* **测试套件**: 新增完整的 pytest 测试套件,包含 56 个测试用例,覆盖所有主要功能。
|
||||||
|
|
||||||
### v1.2.1
|
### v1.2.1
|
||||||
|
|
||||||
|
|||||||
@@ -53,9 +53,17 @@ A content normalizer filter for Open WebUI that fixes common Markdown formatting
|
|||||||
|
|
||||||
## Changelog
|
## Changelog
|
||||||
|
|
||||||
|
### v1.2.3
|
||||||
|
|
||||||
|
* **List Marker Protection Enhancement**: Fixed a bug where list markers (`*`) followed by plain text and emphasis were having their spaces incorrectly stripped (e.g., `* U16 forward` became `*U16 forward`).
|
||||||
|
* **Placeholder Support**: Confirmed that 4 or more underscores (e.g., `____`) are correctly treated as placeholders and not modified by the emphasis fix.
|
||||||
|
|
||||||
### v1.2.2
|
### v1.2.2
|
||||||
|
|
||||||
* **Version Bump**: Documentation and metadata updated for the latest release.
|
* **Code Block Indentation Fix**: Fixed an issue where code blocks nested inside lists were having their indentation incorrectly stripped. Now preserves proper indentation for nested code blocks.
|
||||||
|
* **Underscore Emphasis Support**: Extended emphasis spacing fix to support `__` (double underscore for bold) and `___` (triple underscore for bold+italic) syntax.
|
||||||
|
* **List Marker Protection**: Fixed a bug where list markers (`*`) followed by emphasis markers (`**`) were incorrectly merged (e.g., `* **Yes**` became `***Yes**`). Added safeguard to prevent this.
|
||||||
|
* **Test Suite**: Added comprehensive pytest test suite with 56 test cases covering all major features.
|
||||||
|
|
||||||
### v1.2.1
|
### v1.2.1
|
||||||
|
|
||||||
|
|||||||
@@ -53,9 +53,17 @@
|
|||||||
|
|
||||||
## 更新日志
|
## 更新日志
|
||||||
|
|
||||||
|
### v1.2.3
|
||||||
|
|
||||||
|
* **列表标记保护增强**: 修复了列表标记 (`*`) 后跟普通文本和强调标记时,空格被错误剥离的问题(例如 `* U16 前锋` 变成 `*U16 前锋`)。
|
||||||
|
* **占位符支持**: 确认 4 个或更多下划线(如 `____`)会被正确视为占位符,不会被强调修复逻辑修改。
|
||||||
|
|
||||||
### v1.2.2
|
### v1.2.2
|
||||||
|
|
||||||
* **版本更新**: 文档与元数据已同步到最新版本。
|
* **代码块缩进修复**: 修复了列表中嵌套代码块的缩进被错误剥离的问题。现在会正确保留嵌套代码块的缩进。
|
||||||
|
* **下划线强调语法支持**: 扩展强调空格修复以支持 `__` (双下划线加粗) 和 `___` (三下划线加粗斜体) 语法。
|
||||||
|
* **列表标记保护**: 修复了列表标记 (`*`) 后跟强调标记 (`**`) 被错误合并的 Bug(例如 `* **是**` 变成 `***是**`)。添加了保护逻辑防止此问题。
|
||||||
|
* **测试套件**: 新增完整的 pytest 测试套件,包含 56 个测试用例,覆盖所有主要功能。
|
||||||
|
|
||||||
### v1.2.1
|
### v1.2.1
|
||||||
|
|
||||||
|
|||||||
@@ -3,7 +3,7 @@ title: Markdown Normalizer
|
|||||||
author: Fu-Jie
|
author: Fu-Jie
|
||||||
author_url: https://github.com/Fu-Jie/awesome-openwebui
|
author_url: https://github.com/Fu-Jie/awesome-openwebui
|
||||||
funding_url: https://github.com/open-webui
|
funding_url: https://github.com/open-webui
|
||||||
version: 1.2.2
|
version: 1.2.3
|
||||||
openwebui_id: baaa8732-9348-40b7-8359-7e009660e23c
|
openwebui_id: baaa8732-9348-40b7-8359-7e009660e23c
|
||||||
description: A content normalizer filter that fixes common Markdown formatting issues in LLM outputs, such as broken code blocks, LaTeX formulas, and list formatting.
|
description: A content normalizer filter that fixes common Markdown formatting issues in LLM outputs, such as broken code blocks, LaTeX formulas, and list formatting.
|
||||||
"""
|
"""
|
||||||
@@ -109,12 +109,13 @@ class ContentNormalizer:
|
|||||||
"heading_space": re.compile(r"^(#+)([^ \n#])", re.MULTILINE),
|
"heading_space": re.compile(r"^(#+)([^ \n#])", re.MULTILINE),
|
||||||
# Table: | col1 | col2 -> | col1 | col2 |
|
# Table: | col1 | col2 -> | col1 | col2 |
|
||||||
"table_pipe": re.compile(r"^(\|.*[^|\n])$", re.MULTILINE),
|
"table_pipe": re.compile(r"^(\|.*[^|\n])$", re.MULTILINE),
|
||||||
# Emphasis spacing: ** text ** -> **text**
|
# Emphasis spacing: ** text ** -> **text**, __ text __ -> __text__
|
||||||
# Matches emphasis blocks within a single line. We use a recursive approach
|
# Matches emphasis blocks within a single line. We use a recursive approach
|
||||||
# in _fix_emphasis_spacing to handle nesting and spaces correctly.
|
# in _fix_emphasis_spacing to handle nesting and spaces correctly.
|
||||||
# NOTE: We use [^\n] instead of . to prevent cross-line matching.
|
# NOTE: We use [^\n] instead of . to prevent cross-line matching.
|
||||||
|
# Supports: * (italic), ** (bold), *** (bold+italic), _ (italic), __ (bold), ___ (bold+italic)
|
||||||
"emphasis_spacing": re.compile(
|
"emphasis_spacing": re.compile(
|
||||||
r"(?<!\*|_)(\*{1,3}|_)(?P<inner>[^\n]*?)(\1)(?!\*|_)"
|
r"(?<!\*|_)(\*{1,3}|_{1,3})(?P<inner>[^\n]*?)(\1)(?!\*|_)"
|
||||||
),
|
),
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -485,6 +486,20 @@ class ContentNormalizer:
|
|||||||
if symbol in ["*", "_"]:
|
if symbol in ["*", "_"]:
|
||||||
return match.group(0)
|
return match.group(0)
|
||||||
|
|
||||||
|
# Safeguard: List marker protection
|
||||||
|
# If symbol is single '*' and inner content starts with whitespace followed by emphasis markers,
|
||||||
|
# this is likely a list item like "* **bold**" - don't merge them.
|
||||||
|
# Pattern: "* **text**" should NOT become "***text**"
|
||||||
|
if symbol == "*" and inner.lstrip().startswith(("*", "_")):
|
||||||
|
return match.group(0)
|
||||||
|
|
||||||
|
# Extended list marker protection:
|
||||||
|
# If symbol is single '*' and inner starts with multiple spaces (list indentation pattern),
|
||||||
|
# this is likely a list item like "* text" - don't strip the spaces.
|
||||||
|
# Pattern: "* U16 forward **Kuang**" should NOT become "*U16 forward **Kuang**"
|
||||||
|
if symbol == "*" and inner.startswith(" "):
|
||||||
|
return match.group(0)
|
||||||
|
|
||||||
return f"{symbol}{stripped_inner}{symbol}"
|
return f"{symbol}{stripped_inner}{symbol}"
|
||||||
|
|
||||||
parts = content.split("```")
|
parts = content.split("```")
|
||||||
|
|||||||
@@ -3,7 +3,7 @@ title: Markdown 格式修复器 (Markdown Normalizer)
|
|||||||
author: Fu-Jie
|
author: Fu-Jie
|
||||||
author_url: https://github.com/Fu-Jie/awesome-openwebui
|
author_url: https://github.com/Fu-Jie/awesome-openwebui
|
||||||
funding_url: https://github.com/open-webui
|
funding_url: https://github.com/open-webui
|
||||||
version: 1.2.2
|
version: 1.2.3
|
||||||
description: 内容规范化过滤器,修复 LLM 输出中常见的 Markdown 格式问题,如损坏的代码块、LaTeX 公式、Mermaid 图表和列表格式。
|
description: 内容规范化过滤器,修复 LLM 输出中常见的 Markdown 格式问题,如损坏的代码块、LaTeX 公式、Mermaid 图表和列表格式。
|
||||||
"""
|
"""
|
||||||
|
|
||||||
@@ -101,12 +101,13 @@ class ContentNormalizer:
|
|||||||
"heading_space": re.compile(r"^(#+)([^ \n#])", re.MULTILINE),
|
"heading_space": re.compile(r"^(#+)([^ \n#])", re.MULTILINE),
|
||||||
# Table: | col1 | col2 -> | col1 | col2 |
|
# Table: | col1 | col2 -> | col1 | col2 |
|
||||||
"table_pipe": re.compile(r"^(\|.*[^|\n])$", re.MULTILINE),
|
"table_pipe": re.compile(r"^(\|.*[^|\n])$", re.MULTILINE),
|
||||||
# Emphasis spacing: ** text ** -> **text**
|
# Emphasis spacing: ** text ** -> **text**, __ text __ -> __text__
|
||||||
# Matches emphasis blocks within a single line. We use a recursive approach
|
# Matches emphasis blocks within a single line. We use a recursive approach
|
||||||
# in _fix_emphasis_spacing to handle nesting and spaces correctly.
|
# in _fix_emphasis_spacing to handle nesting and spaces correctly.
|
||||||
# NOTE: We use [^\n] instead of . to prevent cross-line matching.
|
# NOTE: We use [^\n] instead of . to prevent cross-line matching.
|
||||||
|
# Supports: * (italic), ** (bold), *** (bold+italic), _ (italic), __ (bold), ___ (bold+italic)
|
||||||
"emphasis_spacing": re.compile(
|
"emphasis_spacing": re.compile(
|
||||||
r"(?<!\*|_)(\*{1,3}|_)(?P<inner>[^\n]*?)(\1)(?!\*|_)"
|
r"(?<!\*|_)(\*{1,3}|_{1,3})(?P<inner>[^\n]*?)(\1)(?!\*|_)"
|
||||||
),
|
),
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -464,6 +465,20 @@ class ContentNormalizer:
|
|||||||
if symbol in ["*", "_"]:
|
if symbol in ["*", "_"]:
|
||||||
return match.group(0)
|
return match.group(0)
|
||||||
|
|
||||||
|
# Safeguard: List marker protection
|
||||||
|
# If symbol is single '*' and inner content starts with whitespace followed by emphasis markers,
|
||||||
|
# this is likely a list item like "* **bold**" - don't merge them.
|
||||||
|
# Pattern: "* **text**" should NOT become "***text**"
|
||||||
|
if symbol == "*" and inner.lstrip().startswith(("*", "_")):
|
||||||
|
return match.group(0)
|
||||||
|
|
||||||
|
# Extended list marker protection:
|
||||||
|
# If symbol is single '*' and inner starts with multiple spaces (list indentation pattern),
|
||||||
|
# this is likely a list item like "* text" - don't strip the spaces.
|
||||||
|
# Pattern: "* U16 forward **Kuang**" should NOT become "*U16 forward **Kuang**"
|
||||||
|
if symbol == "*" and inner.startswith(" "):
|
||||||
|
return match.group(0)
|
||||||
|
|
||||||
return f"{symbol}{stripped_inner}{symbol}"
|
return f"{symbol}{stripped_inner}{symbol}"
|
||||||
|
|
||||||
parts = content.split("```")
|
parts = content.split("```")
|
||||||
|
|||||||
1
plugins/filters/markdown_normalizer/tests/__init__.py
Normal file
1
plugins/filters/markdown_normalizer/tests/__init__.py
Normal file
@@ -0,0 +1 @@
|
|||||||
|
# Markdown Normalizer Test Suite
|
||||||
75
plugins/filters/markdown_normalizer/tests/conftest.py
Normal file
75
plugins/filters/markdown_normalizer/tests/conftest.py
Normal file
@@ -0,0 +1,75 @@
|
|||||||
|
"""
|
||||||
|
Shared fixtures for Markdown Normalizer tests.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
import sys
|
||||||
|
import os
|
||||||
|
|
||||||
|
# Add the parent directory to sys.path for imports
|
||||||
|
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
|
||||||
|
|
||||||
|
from markdown_normalizer import ContentNormalizer, NormalizerConfig
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.fixture
|
||||||
|
def normalizer():
|
||||||
|
"""Default normalizer with all fixes enabled."""
|
||||||
|
config = NormalizerConfig(
|
||||||
|
enable_escape_fix=True,
|
||||||
|
enable_thought_tag_fix=True,
|
||||||
|
enable_details_tag_fix=True,
|
||||||
|
enable_code_block_fix=True,
|
||||||
|
enable_latex_fix=True,
|
||||||
|
enable_list_fix=False, # Experimental, keep off by default
|
||||||
|
enable_unclosed_block_fix=True,
|
||||||
|
enable_fullwidth_symbol_fix=False,
|
||||||
|
enable_mermaid_fix=True,
|
||||||
|
enable_heading_fix=True,
|
||||||
|
enable_table_fix=True,
|
||||||
|
enable_xml_tag_cleanup=True,
|
||||||
|
enable_emphasis_spacing_fix=True,
|
||||||
|
)
|
||||||
|
return ContentNormalizer(config)
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.fixture
|
||||||
|
def emphasis_only_normalizer():
|
||||||
|
"""Normalizer with only emphasis spacing fix enabled."""
|
||||||
|
config = NormalizerConfig(
|
||||||
|
enable_escape_fix=False,
|
||||||
|
enable_thought_tag_fix=False,
|
||||||
|
enable_details_tag_fix=False,
|
||||||
|
enable_code_block_fix=False,
|
||||||
|
enable_latex_fix=False,
|
||||||
|
enable_list_fix=False,
|
||||||
|
enable_unclosed_block_fix=False,
|
||||||
|
enable_fullwidth_symbol_fix=False,
|
||||||
|
enable_mermaid_fix=False,
|
||||||
|
enable_heading_fix=False,
|
||||||
|
enable_table_fix=False,
|
||||||
|
enable_xml_tag_cleanup=False,
|
||||||
|
enable_emphasis_spacing_fix=True,
|
||||||
|
)
|
||||||
|
return ContentNormalizer(config)
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.fixture
|
||||||
|
def mermaid_only_normalizer():
|
||||||
|
"""Normalizer with only Mermaid fix enabled."""
|
||||||
|
config = NormalizerConfig(
|
||||||
|
enable_escape_fix=False,
|
||||||
|
enable_thought_tag_fix=False,
|
||||||
|
enable_details_tag_fix=False,
|
||||||
|
enable_code_block_fix=False,
|
||||||
|
enable_latex_fix=False,
|
||||||
|
enable_list_fix=False,
|
||||||
|
enable_unclosed_block_fix=False,
|
||||||
|
enable_fullwidth_symbol_fix=False,
|
||||||
|
enable_mermaid_fix=True,
|
||||||
|
enable_heading_fix=False,
|
||||||
|
enable_table_fix=False,
|
||||||
|
enable_xml_tag_cleanup=False,
|
||||||
|
enable_emphasis_spacing_fix=False,
|
||||||
|
)
|
||||||
|
return ContentNormalizer(config)
|
||||||
@@ -0,0 +1,54 @@
|
|||||||
|
"""
|
||||||
|
Tests for code block formatting fixes.
|
||||||
|
Covers: prefix, suffix, indentation preservation.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
|
||||||
|
class TestCodeBlockFix:
|
||||||
|
"""Test code block formatting normalization."""
|
||||||
|
|
||||||
|
def test_code_block_indentation_preserved(self, normalizer):
|
||||||
|
"""Indented code blocks (e.g., in lists) should preserve indentation."""
|
||||||
|
input_str = """
|
||||||
|
* List item 1
|
||||||
|
```python
|
||||||
|
def foo():
|
||||||
|
print("bar")
|
||||||
|
```
|
||||||
|
* List item 2
|
||||||
|
"""
|
||||||
|
# Indentation should be preserved
|
||||||
|
assert " ```python" in normalizer.normalize(input_str)
|
||||||
|
|
||||||
|
def test_inline_code_block_prefix(self, normalizer):
|
||||||
|
"""Code block that follows text on same line should be modified."""
|
||||||
|
input_str = "text```python\ncode\n```"
|
||||||
|
result = normalizer.normalize(input_str)
|
||||||
|
# Just verify the code block markers are present
|
||||||
|
assert "```" in result
|
||||||
|
|
||||||
|
def test_code_block_suffix_fix(self, normalizer):
|
||||||
|
"""Code block with content on same line after lang should be fixed."""
|
||||||
|
input_str = "```python code\nmore code\n```"
|
||||||
|
result = normalizer.normalize(input_str)
|
||||||
|
# Content should be on new line
|
||||||
|
assert "```python\n" in result or "```python " in result
|
||||||
|
|
||||||
|
|
||||||
|
class TestUnclosedCodeBlock:
|
||||||
|
"""Test auto-closing of unclosed code blocks."""
|
||||||
|
|
||||||
|
def test_unclosed_code_block_is_closed(self, normalizer):
|
||||||
|
"""Unclosed code blocks should be automatically closed."""
|
||||||
|
input_str = "```python\ncode here"
|
||||||
|
result = normalizer.normalize(input_str)
|
||||||
|
# Should have closing ```
|
||||||
|
assert result.endswith("```") or result.count("```") == 2
|
||||||
|
|
||||||
|
def test_balanced_code_blocks_unchanged(self, normalizer):
|
||||||
|
"""Already balanced code blocks should not get extra closing."""
|
||||||
|
input_str = "```python\ncode\n```"
|
||||||
|
result = normalizer.normalize(input_str)
|
||||||
|
assert result.count("```") == 2
|
||||||
@@ -0,0 +1,48 @@
|
|||||||
|
"""
|
||||||
|
Tests for details tag normalization.
|
||||||
|
Covers: </details> spacing, self-closing tags.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
|
||||||
|
class TestDetailsTagFix:
|
||||||
|
"""Test details tag normalization."""
|
||||||
|
|
||||||
|
def test_details_end_gets_newlines(self, normalizer):
|
||||||
|
"""</details> should be followed by double newline."""
|
||||||
|
input_str = "</details>Content after"
|
||||||
|
result = normalizer.normalize(input_str)
|
||||||
|
assert "</details>\n\n" in result
|
||||||
|
|
||||||
|
def test_self_closing_details_gets_newline(self, normalizer):
|
||||||
|
"""Self-closing <details .../> should get newline after."""
|
||||||
|
input_str = "<details open />## Heading"
|
||||||
|
result = normalizer.normalize(input_str)
|
||||||
|
# Should have newline between tag and heading
|
||||||
|
assert "/>\n" in result or "/> \n" in result
|
||||||
|
|
||||||
|
def test_details_in_code_block_unchanged(self, normalizer):
|
||||||
|
"""Details tags inside code blocks should not be modified."""
|
||||||
|
input_str = "```html\n<details>content</details>more\n```"
|
||||||
|
result = normalizer.normalize(input_str)
|
||||||
|
# Content inside code block should be unchanged
|
||||||
|
assert "</details>more" in result
|
||||||
|
|
||||||
|
|
||||||
|
class TestThoughtTagFix:
|
||||||
|
"""Test thought tag normalization."""
|
||||||
|
|
||||||
|
def test_think_tag_normalized(self, normalizer):
|
||||||
|
"""<think> should be normalized to <thought>."""
|
||||||
|
input_str = "<think>content</think>"
|
||||||
|
result = normalizer.normalize(input_str)
|
||||||
|
assert "<thought>" in result
|
||||||
|
assert "</thought>" in result
|
||||||
|
|
||||||
|
def test_thinking_tag_normalized(self, normalizer):
|
||||||
|
"""<thinking> should be normalized to <thought>."""
|
||||||
|
input_str = "<thinking>content</thinking>"
|
||||||
|
result = normalizer.normalize(input_str)
|
||||||
|
assert "<thought>" in result
|
||||||
|
assert "</thought>" in result
|
||||||
@@ -0,0 +1,138 @@
|
|||||||
|
"""
|
||||||
|
Tests for emphasis spacing fix.
|
||||||
|
Covers: *, **, ***, _, __, ___ with spaces inside.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
|
||||||
|
class TestEmphasisSpacingFix:
|
||||||
|
"""Test emphasis spacing normalization."""
|
||||||
|
|
||||||
|
@pytest.mark.parametrize(
|
||||||
|
"input_str,expected",
|
||||||
|
[
|
||||||
|
# Double asterisk (bold)
|
||||||
|
("** bold **", "**bold**"),
|
||||||
|
("** bold text **", "**bold text**"),
|
||||||
|
("**text **", "**text**"),
|
||||||
|
("** text**", "**text**"),
|
||||||
|
# Triple asterisk (bold+italic)
|
||||||
|
("*** bold italic ***", "***bold italic***"),
|
||||||
|
# Double underscore (bold)
|
||||||
|
("__ bold __", "__bold__"),
|
||||||
|
("__ bold text __", "__bold text__"),
|
||||||
|
("__text __", "__text__"),
|
||||||
|
("__ text__", "__text__"),
|
||||||
|
# Triple underscore (bold+italic)
|
||||||
|
("___ bold italic ___", "___bold italic___"),
|
||||||
|
# Mixed markers
|
||||||
|
("** bold ** and __ also __", "**bold** and __also__"),
|
||||||
|
],
|
||||||
|
)
|
||||||
|
def test_emphasis_with_spaces_fixed(
|
||||||
|
self, emphasis_only_normalizer, input_str, expected
|
||||||
|
):
|
||||||
|
"""Test that emphasis with spaces is correctly fixed."""
|
||||||
|
assert emphasis_only_normalizer.normalize(input_str) == expected
|
||||||
|
|
||||||
|
@pytest.mark.parametrize(
|
||||||
|
"input_str",
|
||||||
|
[
|
||||||
|
# Single * and _ with spaces on both sides - treated as operator (safeguard)
|
||||||
|
"* italic *",
|
||||||
|
"_ italic _",
|
||||||
|
# Already correct emphasis
|
||||||
|
"**bold**",
|
||||||
|
"__bold__",
|
||||||
|
"*italic*",
|
||||||
|
"_italic_",
|
||||||
|
"***bold italic***",
|
||||||
|
"___bold italic___",
|
||||||
|
],
|
||||||
|
)
|
||||||
|
def test_safeguard_and_correct_emphasis_unchanged(
|
||||||
|
self, emphasis_only_normalizer, input_str
|
||||||
|
):
|
||||||
|
"""Test that safeguard cases and already correct emphasis are not modified."""
|
||||||
|
assert emphasis_only_normalizer.normalize(input_str) == input_str
|
||||||
|
|
||||||
|
|
||||||
|
class TestEmphasisSideEffects:
|
||||||
|
"""Test that emphasis fix does NOT affect unrelated content."""
|
||||||
|
|
||||||
|
@pytest.mark.parametrize(
|
||||||
|
"input_str,description",
|
||||||
|
[
|
||||||
|
# URLs with underscores
|
||||||
|
("https://example.com/path_with_underscore", "URL"),
|
||||||
|
("Visit https://api.example.com/get_user_info for info", "URL in text"),
|
||||||
|
# Variable names (snake_case)
|
||||||
|
("The `my_variable_name` is important", "Variable in backticks"),
|
||||||
|
("Use `get_user_data()` function", "Function name"),
|
||||||
|
# File names
|
||||||
|
("Edit the `config_file_name.py` file", "File name"),
|
||||||
|
("See `my_script__v2.py` for details", "Double underscore in filename"),
|
||||||
|
# Math-like subscripts
|
||||||
|
("The variable a_1 and b_2 are defined", "Math subscripts"),
|
||||||
|
# Single underscores not matching emphasis pattern
|
||||||
|
("word_with_underscore", "Underscore in word"),
|
||||||
|
("a_b_c_d", "Multiple underscores"),
|
||||||
|
# Horizontal rules
|
||||||
|
("---", "HR with dashes"),
|
||||||
|
("***", "HR with asterisks"),
|
||||||
|
("___", "HR with underscores"),
|
||||||
|
# List items
|
||||||
|
("- item_one\n- item_two", "List items"),
|
||||||
|
],
|
||||||
|
)
|
||||||
|
def test_no_side_effects(self, emphasis_only_normalizer, input_str, description):
|
||||||
|
"""Test that various content types are NOT modified by emphasis fix."""
|
||||||
|
assert (
|
||||||
|
emphasis_only_normalizer.normalize(input_str) == input_str
|
||||||
|
), f"Failed for: {description}"
|
||||||
|
|
||||||
|
def test_list_marker_not_merged_with_emphasis(self, emphasis_only_normalizer):
|
||||||
|
"""Test that list markers (*) are not merged with emphasis (**).
|
||||||
|
|
||||||
|
Regression test for: "* **Yes**" should NOT become "***Yes**"
|
||||||
|
"""
|
||||||
|
input_str = """1. **Start**: The user opens the login page.
|
||||||
|
* **Yes**: Login successful.
|
||||||
|
* **No**: Show error message."""
|
||||||
|
result = emphasis_only_normalizer.normalize(input_str)
|
||||||
|
assert (
|
||||||
|
"* **Yes**" in result
|
||||||
|
), "List marker was incorrectly merged with emphasis"
|
||||||
|
assert (
|
||||||
|
"* **No**" in result
|
||||||
|
), "List marker was incorrectly merged with emphasis"
|
||||||
|
assert "***Yes**" not in result, "BUG: List marker merged with emphasis"
|
||||||
|
assert "***No**" not in result, "BUG: List marker merged with emphasis"
|
||||||
|
|
||||||
|
def test_list_marker_with_plain_text_then_emphasis(self, emphasis_only_normalizer):
|
||||||
|
"""Test that list items with plain text before emphasis are preserved.
|
||||||
|
|
||||||
|
Regression test for: "* U16 forward **Kuang**" should NOT become "*U16 forward **Kuang**"
|
||||||
|
"""
|
||||||
|
input_str = "* U16 China forward **Kuang Zhaolei**"
|
||||||
|
result = emphasis_only_normalizer.normalize(input_str)
|
||||||
|
assert "* U16" in result, "List marker spaces were incorrectly stripped"
|
||||||
|
assert (
|
||||||
|
"*U16" not in result or "* U16" in result
|
||||||
|
), "BUG: List marker spaces stripped"
|
||||||
|
|
||||||
|
|
||||||
|
class TestEmphasisInCodeBlocks:
|
||||||
|
"""Test that emphasis inside code blocks is NOT modified."""
|
||||||
|
|
||||||
|
def test_emphasis_in_code_block_unchanged(self, emphasis_only_normalizer):
|
||||||
|
"""Code blocks should be completely skipped."""
|
||||||
|
input_str = "```python\nmy_var = get_data__from_api()\n```"
|
||||||
|
assert emphasis_only_normalizer.normalize(input_str) == input_str
|
||||||
|
|
||||||
|
def test_mixed_emphasis_and_code(self, emphasis_only_normalizer):
|
||||||
|
"""Text outside code blocks should be fixed, inside should not."""
|
||||||
|
input_str = "** bold ** text\n```python\n** not bold **\n```"
|
||||||
|
expected = "**bold** text\n```python\n** not bold **\n```"
|
||||||
|
assert emphasis_only_normalizer.normalize(input_str) == expected
|
||||||
@@ -0,0 +1,51 @@
|
|||||||
|
"""
|
||||||
|
Tests for heading fix.
|
||||||
|
Covers: Missing space after # in headings.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
|
||||||
|
class TestHeadingFix:
|
||||||
|
"""Test heading space normalization."""
|
||||||
|
|
||||||
|
@pytest.mark.parametrize(
|
||||||
|
"input_str,expected",
|
||||||
|
[
|
||||||
|
("#Heading", "# Heading"),
|
||||||
|
("##Heading", "## Heading"),
|
||||||
|
("###Heading", "### Heading"),
|
||||||
|
("#中文标题", "# 中文标题"),
|
||||||
|
("#123", "# 123"), # Numbers after # also get space
|
||||||
|
],
|
||||||
|
)
|
||||||
|
def test_missing_space_added(self, normalizer, input_str, expected):
|
||||||
|
"""Headings missing space after # should be fixed."""
|
||||||
|
assert normalizer.normalize(input_str) == expected
|
||||||
|
|
||||||
|
@pytest.mark.parametrize(
|
||||||
|
"input_str",
|
||||||
|
[
|
||||||
|
"# Heading",
|
||||||
|
"## Already Correct",
|
||||||
|
"###", # Just hashes
|
||||||
|
],
|
||||||
|
)
|
||||||
|
def test_correct_headings_unchanged(self, normalizer, input_str):
|
||||||
|
"""Already correct headings should not be modified."""
|
||||||
|
assert normalizer.normalize(input_str) == input_str
|
||||||
|
|
||||||
|
|
||||||
|
class TestTableFix:
|
||||||
|
"""Test table pipe normalization."""
|
||||||
|
|
||||||
|
def test_missing_closing_pipe_added(self, normalizer):
|
||||||
|
"""Tables missing closing | should have it added."""
|
||||||
|
input_str = "| col1 | col2"
|
||||||
|
result = normalizer.normalize(input_str)
|
||||||
|
assert result.endswith("|") or "col2 |" in result
|
||||||
|
|
||||||
|
def test_already_closed_table_unchanged(self, normalizer):
|
||||||
|
"""Tables with closing | should not be modified."""
|
||||||
|
input_str = "| col1 | col2 |"
|
||||||
|
assert normalizer.normalize(input_str) == input_str
|
||||||
6
pytest.ini
Normal file
6
pytest.ini
Normal file
@@ -0,0 +1,6 @@
|
|||||||
|
[pytest]
|
||||||
|
testpaths = plugins
|
||||||
|
python_files = test_*.py
|
||||||
|
python_classes = Test*
|
||||||
|
python_functions = test_*
|
||||||
|
addopts = -v --tb=short
|
||||||
Reference in New Issue
Block a user