Files

fujie 7efb64b16b feat(async-context-compression): release v1.4.0 with structure-aware grouping and session locking

- Introduced Atomic Message Grouping to prevent tool-calling corruption (Issue #56)
- Implemented Tail Boundary Alignment for deterministic context truncation
- Added per-chat asynchronous session locking to prevent duplicate background tasks
- Enhanced summarization traceability with message IDs and names
- Synchronized version and changelog across all documentation files
- Optimized release-prep skill to remove redundant H1 titles

Closes #56

2026-03-09 20:50:24 +08:00

4.7 KiB

Raw Blame History

Fix: OpenAI API Error "messages with role 'tool' must be a response to a preceding message with 'tool_calls'"

Problem Description

In the async-context-compression filter, chat history can be trimmed or summarized when the conversation grows. If the retained tail starts in the middle of a native tool-calling sequence, the next request may begin with a tool message whose triggering assistant message is no longer present.

That produces the OpenAI API error: "messages with role 'tool' must be a response to a preceding message with 'tool_calls'"

Root Cause

History compression boundaries were not fully aware of atomic tool-call chains. A valid chain may include:

An assistant message with tool_calls
One or more tool messages
An optional assistant follow-up that consumes the tool results

If truncation happens inside that chain, the request sent to the model becomes invalid.

Solution: Atomic Boundary Alignment

The fix groups tool-call sequences into atomic units and aligns trim boundaries to those groups.

1. `_get_atomic_groups()`

This helper groups message indices into units that must be kept or dropped together. It explicitly recognizes native tool-calling patterns such as:

assistant(tool_calls)
tool
assistant follow-up response

Conceptually, it treats the whole sequence as one atomic block instead of independent messages.

def _get_atomic_groups(self, messages: List[Dict]) -> List[List[int]]:
    groups = []
    current_group = []

    for i, msg in enumerate(messages):
        role = msg.get("role")
        has_tool_calls = bool(msg.get("tool_calls"))

        if role == "assistant" and has_tool_calls:
            if current_group:
                groups.append(current_group)
            current_group = [i]
        elif role == "tool":
            if not current_group:
                groups.append([i])
            else:
                current_group.append(i)
        elif (
            role == "assistant"
            and current_group
            and messages[current_group[-1]].get("role") == "tool"
        ):
            current_group.append(i)
            groups.append(current_group)
            current_group = []
        else:
            if current_group:
                groups.append(current_group)
                current_group = []
            groups.append([i])

    if current_group:
        groups.append(current_group)

    return groups

2. `_align_tail_start_to_atomic_boundary()`

This helper checks whether a proposed trim point falls inside one of those atomic groups. If it does, the start index is moved backward to the beginning of that group.

def _align_tail_start_to_atomic_boundary(
    self, messages: List[Dict], raw_start_index: int, protected_prefix: int
) -> int:
    aligned_start = max(raw_start_index, protected_prefix)

    if aligned_start <= protected_prefix or aligned_start >= len(messages):
        return aligned_start

    trimmable = messages[protected_prefix:]
    local_start = aligned_start - protected_prefix

    for group in self._get_atomic_groups(trimmable):
        group_start = group[0]
        group_end = group[-1] + 1

        if local_start == group_start:
            return aligned_start

        if group_start < local_start < group_end:
            return protected_prefix + group_start

    return aligned_start

3. Applied to Tail Retention and Summary Progress

The aligned boundary is now used when rebuilding the retained tail and when calculating how much history can be summarized safely.

Example from the current implementation:

raw_start_index = max(compressed_count, effective_keep_first)
start_index = self._align_tail_start_to_atomic_boundary(
    messages, raw_start_index, effective_keep_first
)
tail_messages = messages[start_index:]

And during summary progress calculation:

raw_target_compressed_count = max(0, len(messages) - self.valves.keep_last)
target_compressed_count = self._align_tail_start_to_atomic_boundary(
    messages, raw_target_compressed_count, effective_keep_first
)

Verification Results

First compression boundary: When history first crosses the compression threshold, the retained tail no longer starts inside a tool-call block.
Complex sessions: Real-world testing with 30+ messages, multiple tool calls, and failed calls remained stable during background summarization.
Regression behavior: The filter now prefers a valid boundary even if that means retaining slightly more context than a naive raw slice would allow.

Conclusion

The fix prevents orphaned tool messages by making history trimming and summary progress aware of atomic tool-call groups. This eliminates the 400 error during long conversations and background compression.

4.7 KiB Raw Blame History