@bold84 bold84 commented Aug 25, 2025

🎯 Overview

This PR implements a comprehensive XML tool call processing system that enables both GLM-4.5 and Qwen3-coder models to work seamlessly with TabbyAPI's tool calling functionality. These models generate tool calls in different XML formats, but TabbyAPI expects OpenAI's JSON format. This implementation bridges that gap with a generic, extensible solution.

🚀 Key Features

  • Multi-Model Support: Supports both GLM-4.5 and Qwen3-coder XML formats
  • Format Conversion: Automatically converts XML tool calls to OpenAI JSON format
  • Extensible Architecture: Generic XML processor system that can be extended for other XML-based models
  • Zero Breaking Changes: Existing JSON tool calling functionality remains completely unchanged
  • Robust Error Handling: Gracefully handles malformed XML and edge cases
  • Production Ready: Comprehensive test coverage with 28 passing tests

📋 What's Changed

Core Implementation

  • endpoints/OAI/utils/xml_tool_processors.py

    • BaseXMLToolCallProcessor: Abstract base class for extensible XML processing
    • GLM45ToolCallProcessor: GLM-4.5 specific XML parser
    • Qwen3CoderToolCallProcessor: Qwen3-coder specific nested XML parser (supports "qwen3-coder" only)
    • XMLToolCallProcessorFactory: Factory pattern for creating appropriate processors
  • common/templating.py

    • Added XML-specific metadata fields: tool_call_format, xml_processor_type, tool_start, tool_end, stop_strings
    • Extended extract_metadata() to handle complete XML configuration
    • Maintains backward compatibility with existing templates
  • endpoints/OAI/utils/tools.py

    • Added from_xml() method for XML-specific processing
    • Added from_text() method for automatic format detection and routing
    • Fixed import issues with Tool class
  • templates/tool_calls/qwen3-coder-tabbyapi.jinja

    • Complete TabbyAPI-compatible template with full XML metadata
    • Includes stop_strings, tool_start, tool_end, tool_call_format, xml_processor_type

Quality Assurance

  • tests/test_xml_tool_calls.py

    • 28 comprehensive test cases covering all functionality
    • Tests for GLM-4.5 format: single/multiple tool calls, JSON arguments, error handling
    • Tests for Qwen3-coder format: nested XML, multi-line parameters, attribute parsing
    • Factory tests with proper processor restrictions
    • All tests passing
  • docs/XML-Tool-Calling-Implementation.md

    • Complete implementation guide covering both GLM-4.5 and Qwen3-coder
    • Architecture explanation and troubleshooting section
    • Usage examples for both model types

🔧 Technical Details

Supported XML Formats

GLM-4.5 Format:

<tool_call>get_weather
<arg_key>city</arg_key>
<arg_value>Beijing</arg_value>
<arg_key>units</arg_key>
<arg_value>metric</arg_value>
</tool_call>

Qwen3-coder Format:

<tool_call>
<function=get_weather>
<parameter=city>
Beijing
</parameter>
<parameter=units>
metric
</parameter>
</function>
</tool_call>
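The nested Qwen3-coder block above uses attribute-style tags (`<function=name>`, `<parameter=name>`) rather than paired key/value elements. A minimal sketch of how such a block can be parsed with regexes (illustrative only; the PR's `Qwen3CoderToolCallProcessor` also handles malformed input and multiple calls):

```python
import re

def parse_qwen3_coder(text: str) -> tuple[str, dict]:
    """Extract the function name and parameters from a Qwen3-coder style
    nested XML tool call. Simplified sketch, not the PR's implementation."""
    name_match = re.search(r"<function=([^>]+)>", text)
    if name_match is None:
        raise ValueError("no <function=...> tag found")
    # Parameter values may span multiple lines, hence re.DOTALL
    params = {
        m.group(1): m.group(2).strip()
        for m in re.finditer(
            r"<parameter=([^>]+)>(.*?)</parameter>", text, re.DOTALL
        )
    }
    return name_match.group(1), params
```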

Both convert to OpenAI's JSON format:

{
  "id": "call_123456789",
  "type": "function",
  "function": {
    "name": "get_weather",
    "arguments": "{\"city\": \"Beijing\", \"units\": \"metric\"}"
  }
}
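For the GLM-4.5 dialect, the conversion into the JSON structure above is essentially a matter of pairing `<arg_key>`/`<arg_value>` tags and serializing them. A rough sketch under that assumption (hypothetical helper, not the PR's actual `GLM45ToolCallProcessor`):

```python
import json
import re
import uuid

def glm45_to_openai(text: str) -> dict:
    """Convert one GLM-4.5 <tool_call> block into an OpenAI-format tool
    call dict. Minimal sketch; real code must handle malformed XML."""
    block = re.search(r"<tool_call>(.*?)</tool_call>", text, re.DOTALL)
    if block is None:
        raise ValueError("no <tool_call> block found")
    body = block.group(1)
    # The function name is the text before the first <arg_key> tag
    name = body.split("<arg_key>", 1)[0].strip()
    keys = re.findall(r"<arg_key>(.*?)</arg_key>", body, re.DOTALL)
    values = re.findall(r"<arg_value>(.*?)</arg_value>", body, re.DOTALL)
    arguments = {k.strip(): v.strip() for k, v in zip(keys, values)}
    return {
        "id": f"call_{uuid.uuid4().hex[:9]}",
        "type": "function",
        "function": {"name": name, "arguments": json.dumps(arguments)},
    }
```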

Architecture

  • Factory Pattern: XMLToolCallProcessorFactory creates appropriate processors
  • Abstract Base Class: BaseXMLToolCallProcessor enables easy extension for other models
  • Automatic Detection: System automatically detects XML vs JSON format and routes accordingly
  • Type Safety: Full integration with TabbyAPI's existing Pydantic models
  • Strict Separation: Each processor only supports its specific model type
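The automatic-detection step can be as simple as checking for the template's configured `tool_start` tag. A naive sketch of the routing idea (the PR's `from_text()` presumably does something richer before dispatching to `from_xml()` or the existing JSON path):

```python
def detect_tool_call_format(text: str, tool_start: str = "<tool_call>") -> str:
    """Route a completion to XML or JSON handling. Hypothetical sketch:
    assume XML whenever the configured tool_start tag is present."""
    return "xml" if tool_start in text else "json"
```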

Processor Restrictions

  • GLM-4.5: Only supports "glm45" processor type
  • Qwen3-coder: Only supports "qwen3-coder" processor type
  • Regular Qwen3: Does NOT use XML processing (different tool calling format)
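This strict one-to-one mapping is naturally expressed as a registry-based factory. The class names below mirror the PR's, but the bodies are illustrative stubs, not the actual implementation:

```python
from abc import ABC, abstractmethod

class BaseXMLToolCallProcessor(ABC):
    """Hypothetical shape of the PR's abstract base class."""

    @abstractmethod
    def extract_tool_calls(self, text: str) -> list:
        """Parse model output into OpenAI-format tool call dicts."""

class GLM45ToolCallProcessor(BaseXMLToolCallProcessor):
    def extract_tool_calls(self, text: str) -> list:
        return []  # parsing elided; see the format examples above

class Qwen3CoderToolCallProcessor(BaseXMLToolCallProcessor):
    def extract_tool_calls(self, text: str) -> list:
        return []  # parsing elided

class XMLToolCallProcessorFactory:
    # Strict mapping: aliases like "glm-4.5" or "qwen3" are rejected
    _registry = {
        "glm45": GLM45ToolCallProcessor,
        "qwen3-coder": Qwen3CoderToolCallProcessor,
    }

    @classmethod
    def create(cls, processor_type: str) -> BaseXMLToolCallProcessor:
        try:
            return cls._registry[processor_type]()
        except KeyError:
            raise ValueError(f"No XML processor for type: {processor_type}")
```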

🧪 Testing

# Run the XML tool call tests
python -m pytest tests/test_xml_tool_calls.py -v

# Results: 28 passed in 0.04s ✅

Test coverage includes:

  • 9 GLM-4.5 processor tests
  • 10 Qwen3-coder processor tests
  • 6 factory pattern tests
  • 3 base class functionality tests
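For illustration, a self-contained test in the same spirit as the suite (this is not code from the PR; it only checks that multiple GLM-4.5 blocks in one completion are both recovered):

```python
import re

def test_multiple_glm45_tool_calls():
    """Two <tool_call> blocks separated by prose should both be found."""
    output = (
        "<tool_call>get_weather\n"
        "<arg_key>city</arg_key>\n"
        "<arg_value>Beijing</arg_value>\n"
        "</tool_call>\n"
        "Some text between calls.\n"
        "<tool_call>get_time\n"
        "<arg_key>tz</arg_key>\n"
        "<arg_value>UTC</arg_value>\n"
        "</tool_call>"
    )
    blocks = re.findall(r"<tool_call>(.*?)</tool_call>", output, re.DOTALL)
    names = [b.split("<arg_key>", 1)[0].strip() for b in blocks]
    assert names == ["get_weather", "get_time"]
```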

📖 Usage

GLM-4.5 Models

model:
  model_name: "path/to/glm-4.5-model"
  prompt_template: "tool_calls/glm-4p5-chat-template-tabbyapi"

Qwen3-coder Models

model:
  model_name: "path/to/qwen3-coder-model"
  prompt_template: "tool_calls/qwen3-coder-tabbyapi"

🔄 Backward Compatibility

  • ✅ Existing JSON tool calling functionality unchanged
  • ✅ All existing templates continue to work
  • ✅ No breaking changes to existing APIs
  • ✅ Graceful fallback for unsupported formats

🎉 Benefits

  1. Multi-Model Support: Users can now use both GLM-4.5 and Qwen3-coder models with tool calling
  2. Future-Proof: Architecture supports adding other XML-based models easily
  3. Robust: Handles edge cases, malformed XML, and multi-line parameters gracefully
  4. Well-Tested: Comprehensive test suite with 28 tests ensures reliability
  5. Documented: Complete documentation for users and developers
  6. Strict Separation: Clear boundaries between different model formats prevent conflicts

🔍 Files Changed

  • common/templating.py - Enhanced template metadata system with complete XML support
  • endpoints/OAI/utils/tools.py - Added XML processing capabilities with format detection
  • endpoints/OAI/utils/xml_tool_processors.py - New XML processor system with GLM-4.5 and Qwen3-coder support
  • templates/tool_calls/qwen3-coder-tabbyapi.jinja - New TabbyAPI-compatible Qwen3-coder template
  • tests/test_xml_tool_calls.py - Comprehensive test suite covering both formats
  • docs/XML-Tool-Calling-Implementation.md - Updated documentation covering both model types

This implementation successfully resolves XML tool calling compatibility for both GLM-4.5 and Qwen3-coder models while providing a solid foundation for supporting additional XML-based models in the future.

bold84 added 2 commits August 25, 2025 17:53
- Add BaseXMLToolCallProcessor abstract class for extensible XML processing
- Implement GLM45ToolCallProcessor for GLM-4.5 specific XML format
- Add XMLToolCallProcessorFactory with factory pattern for processor creation
- Extend TemplateMetadata with XML support fields (tool_call_format, xml_processor_type)
- Enhance ToolCallProcessor with XML routing and from_xml() method
- Fix Tool class import issues in tools.py
- Add comprehensive test suite with 16 passing tests
- Add complete implementation documentation

This enables GLM-4.5 models to work with TabbyAPI's tool calling system by
converting their XML format to OpenAI JSON format seamlessly. The system
is backward compatible and provides a foundation for other XML-based models.
bold84 commented Aug 25, 2025

GLM 4.5 chat template

{#- TabbyAPI-compatible GLM 4.5 template with XML tool call processing -#}
{#- XML Tool Call Processing Configuration -#}
{%- set stop_strings = ["<|user|>", "<|assistant|>", "<|observation|>", "<|system|>"] -%}
{%- set tool_start = "<tool_call>" -%}
{%- set tool_end = "</tool_call>" -%}
{%- set tool_call_format = "xml" -%}
{%- set xml_processor_type = "glm45" -%}

[gMASK]<sop>
{%- if tools -%}
<|system|>
# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{% for tool in tools %}
{{ tool | tojson }}
{% endfor %}
</tools>

For each function call, output the function name and arguments within the following XML format:
<tool_call>{function-name}
<arg_key>{arg-key-1}</arg_key>
<arg_value>{arg-value-1}</arg_value>
<arg_key>{arg-key-2}</arg_key>
<arg_value>{arg-value-2}</arg_value>
...
</tool_call>{%- endif -%}
{%- macro visible_text(content) -%}
    {%- if content is string -%}
        {{- content }}
    {%- elif content is iterable and content is not mapping -%}
        {%- for item in content -%}
            {%- if item is mapping and item.type == 'text' -%}
                {{- item.text }}
            {%- elif item is string -%}
                {{- item }}
            {%- endif -%}
        {%- endfor -%}
    {%- else -%}
        {{- content }}
    {%- endif -%}
{%- endmacro -%}
{%- set ns = namespace(last_user_index=-1) %}
{%- for m in messages %}
    {%- if m.role == 'user' %}
        {% set ns.last_user_index = loop.index0 -%}
    {%- endif %}
{%- endfor %}
{% for m in messages %}
{%- if m.role == 'user' -%}<|user|>
{{ visible_text(m.content) }}
{{- '/nothink' if (enable_thinking is defined and not enable_thinking and not visible_text(m.content).endswith("/nothink")) else '' -}}
{%- elif m.role == 'assistant' -%}
<|assistant|>
{%- set reasoning_content = '' %}
{%- set content = visible_text(m.content) %}
{%- if m.reasoning_content is string %}
    {%- set reasoning_content = m.reasoning_content %}
{%- else %}
    {%- if '</think>' in content %}
        {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
        {%- set content = content.split('</think>')[-1].lstrip('\n') %}
    {%- endif %}
{%- endif %}
{%- if loop.index0 > ns.last_user_index and reasoning_content -%}
{{ '\n<think>' + reasoning_content.strip() +  '</think>'}}
{%- else -%}
{{ '\n<think></think>' }}
{%- endif -%}
{%- if content.strip() -%}
{{ '\n' + content.strip() }}
{%- endif -%}
{# When rendering the assistant’s tool_calls, support both string and mapping #}
{%- if m.role == 'assistant' and m.tool_calls -%}
{%- for tc in m.tool_calls if tc.type == 'function' -%}
<tool_call>{{ tc.function.name }}
{%- set _raw_args = tc.function.arguments %}
{%- if _raw_args is mapping -%}
  {%- for k, v in _raw_args.items() -%}
<arg_key>{{ k }}</arg_key>
<arg_value>{{ v | tojson if v is not string else v }}</arg_value>
  {%- endfor -%}
{%- elif _raw_args is string -%}
<arg_key>__raw__</arg_key>
<arg_value>{{ _raw_args }}</arg_value>
{%- endif -%}
</tool_call>
{%- endfor -%}
{%- endif -%}
{%- elif m.role == 'tool' -%}
{%- if m.content is string -%}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
    {{- '<|observation|>' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- m.content }}
{{- '\n</tool_response>' }}
{%- else -%}
<|observation|>{% for tr in m.content %}

<tool_response>
{{ tr.output if tr.output is defined else tr }}
</tool_response>{% endfor -%}
{% endif -%}
{%- elif m.role == 'system' -%}
<|system|>
{{ visible_text(m.content) }}
{%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
    <|assistant|>{{- '\n<think></think>' if (enable_thinking is defined and not enable_thinking) else '' -}}
{%- endif -%}

bold84 commented Aug 25, 2025

There are a few other models that will need something like this: Qwen3-Coder, Seed OSS, etc.

- Add Qwen3CoderToolCallProcessor for nested XML format parsing
- Support function=name and parameter=name attribute-based parsing
- Handle multi-line parameter values in Qwen3-coder format
- Create qwen3-coder-tabbyapi.jinja template with complete XML metadata
- Add comprehensive test suite with 10 Qwen3-coder specific tests
- Restrict GLM45ToolCallProcessor to only 'glm45' (remove glm-4.5, glm4 aliases)
- Restrict Qwen3CoderToolCallProcessor to only 'qwen3-coder' (not qwen3)
- Rename documentation to XML-Tool-Calling-Implementation.md
- Update documentation to cover both GLM-4.5 and Qwen3-coder formats
- All 28 tests passing with proper processor restrictions

This enables Qwen3-coder models to work with TabbyAPI's tool calling system
while maintaining strict separation between different model formats.
bold84 commented Aug 26, 2025

Qwen3-Coder template

{#- TabbyAPI-compatible Qwen3-coder template with XML tool call processing -#}
{#- XML Tool Call Processing Configuration -#}
{%- set stop_strings = ["<|im_end|>"] -%}
{%- set tool_start = "<tool_call>" -%}
{%- set tool_end = "</tool_call>" -%}
{%- set tool_call_format = "xml" -%}
{%- set xml_processor_type = "qwen3-coder" -%}

{% macro render_extra_keys(json_dict, handled_keys) %}
    {%- if json_dict is mapping %}
        {%- for json_key in json_dict if json_key not in handled_keys %}
            {%- if json_dict[json_key] is mapping or (json_dict[json_key] is sequence and json_dict[json_key] is not string) %}
                {{- '\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | tojson | safe) ~ '</' ~ json_key ~ '>' }}
            {%- else %}
                {{-'\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | string) ~ '</' ~ json_key ~ '>' }}
            {%- endif %}
        {%- endfor %}
    {%- endif %}
{% endmacro %}
{%- if messages[0]["role"] == "system" %}
    {%- set system_message = messages[0]["content"] %}
    {%- set loop_messages = messages[1:] %}
{%- else %}
    {%- set loop_messages = messages %}
{%- endif %}
{%- if not tools is defined %}
    {%- set tools = [] %}
{%- endif %}
{%- if system_message is defined %}
    {{- "<|im_start|>system\n" + system_message }}
{%- else %}
    {%- if tools is iterable and tools | length > 0 %}
        {{- "<|im_start|>system\nYou are Qwen, a helpful AI assistant that can interact with a computer to solve tasks." }}
    {%- endif %}
{%- endif %}
{%- if tools is iterable and tools | length > 0 %}
    {{- "\n\n# Tools\n\nYou have access to the following functions:\n\n" }}
    {{- "<tools>" }}
    {%- for tool in tools %}
        {%- if tool.function is defined %}
            {%- set tool = tool.function %}
        {%- endif %}
        {{- "\n<function>\n<name>" ~ tool.name ~ "</name>" }}
        {%- if tool.description is defined %}
            {{- '\n<description>' ~ (tool.description | trim) ~ '</description>' }}
        {%- endif %}
        {{- '\n<parameters>' }}
        {%- if tool.parameters is defined and tool.parameters is mapping and tool.parameters.properties is defined and tool.parameters.properties is mapping %}
            {%- for param_name, param_fields in tool.parameters.properties|items %}
                {{- '\n<parameter>' }}
                {{- '\n<name>' ~ param_name ~ '</name>' }}
                {%- if param_fields.type is defined %}
                    {{- '\n<type>' ~ (param_fields.type | string) ~ '</type>' }}
                {%- endif %}
                {%- if param_fields.description is defined %}
                    {{- '\n<description>' ~ (param_fields.description | trim) ~ '</description>' }}
                {%- endif %}
                {%- set handled_keys = ['name', 'type', 'description'] %}
                {{- render_extra_keys(param_fields, handled_keys) }}
                {{- '\n</parameter>' }}
            {%- endfor %}
        {%- endif %}
        {% set handled_keys = ['type', 'properties'] %}
        {{- render_extra_keys(tool.parameters, handled_keys) }}
        {{- '\n</parameters>' }}
        {%- set handled_keys = ['type', 'name', 'description', 'parameters'] %}
        {{- render_extra_keys(tool, handled_keys) }}
        {{- '\n</function>' }}
    {%- endfor %}
    {{- "\n</tools>" }}
    {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
{%- endif %}
{%- if system_message is defined %}
    {{- '<|im_end|>\n' }}
{%- else %}
    {%- if tools is iterable and tools | length > 0 %}
        {{- '<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{%- for message in loop_messages %}
    {%- if message.role == "assistant" and message.tool_calls is defined and message.tool_calls is iterable and message.tool_calls | length > 0 %}
        {{- '<|im_start|>' + message.role }}
        {%- if message.content is defined and message.content is string and message.content | trim | length > 0 %}
            {{- '\n' + message.content | trim + '\n' }}
        {%- endif %}
        {%- for tool_call in message.tool_calls %}
            {%- if tool_call.function is defined %}
                {%- set tool_call = tool_call.function %}
            {%- endif %}
            {{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
            {%- if tool_call.arguments is defined %}
                {%- for args_name, args_value in tool_call.arguments|items %}
                    {{- '<parameter=' + args_name + '>\n' }}
                    {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
                    {{- args_value }}
                    {{- '\n</parameter>\n' }}
                {%- endfor %}
            {%- endif %}
            {{- '</function>\n</tool_call>' }}
        {%- endfor %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "user" or message.role == "system" or message.role == "assistant" %}
        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "tool" %}
        {%- if loop.previtem and loop.previtem.role != "tool" %}
            {{- '<|im_start|>user\n' }}
        {%- endif %}
        {{- '<tool_response>\n' }}
        {{- message.content }}
        {{- '\n</tool_response>\n' }}
        {%- if not loop.last and loop.nextitem.role != "tool" %}
            {{- '<|im_end|>\n' }}
        {%- elif loop.last %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- else %}
        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>\n' }}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
{%- endif %}

@bold84 bold84 changed the title feat: Add XML tool call processing system for GLM-4.5 models feat: Add XML tool call processing system for GLM-4.5 and Qwen3-Coder models Aug 26, 2025
@smb1982x

In 'endpoints/OAI/utils/chat_completion.py'

You're missing the tools parameter for _create_response:

def _create_response( request_id: str, generations: List[dict], model_name: Optional[str] ):

Should be:

def _create_response( request_id: str, generations: List[dict], model_name: Optional[str], tools: Optional[List] = None ):

Or else you will get this error:

2025-08-27 04:25:46.120 ERROR: Traceback (most recent call last):
2025-08-27 04:25:46.120 ERROR:   File "/opt/ai/tabbyAPI/endpoints/OAI/utils/chat_completion.py", line 460, in generate_chat_completion
2025-08-27 04:25:46.120 ERROR:     response = _create_response(
2025-08-27 04:25:46.120 ERROR:     ^^^^^^^^^^^^^^^^^
2025-08-27 04:25:46.120 ERROR: TypeError: _create_response() takes 3 positional arguments but 4 were given
2025-08-27 04:25:46.122 ERROR: Sent to request: Chat completion

bold84 commented Aug 26, 2025

In 'endpoints/OAI/utils/chat_completion.py'

You're missing the tools parameter for _create_response:

def _create_response( request_id: str, generations: List[dict], model_name: Optional[str] ):

Should be:

def _create_response( request_id: str, generations: List[dict], model_name: Optional[str], tools: Optional[List] = None ):

Or else you will get this error:

2025-08-27 04:25:46.120 ERROR: Traceback (most recent call last):
2025-08-27 04:25:46.120 ERROR:   File "/opt/ai/tabbyAPI/endpoints/OAI/utils/chat_completion.py", line 460, in generate_chat_completion
2025-08-27 04:25:46.120 ERROR:     response = _create_response(
2025-08-27 04:25:46.120 ERROR:     ^^^^^^^^^^^^^^^^^
2025-08-27 04:25:46.120 ERROR: TypeError: _create_response() takes 3 positional arguments but 4 were given
2025-08-27 04:25:46.122 ERROR: Sent to request: Chat completion

Thank you for catching that. Fixed.

wypiki commented Sep 3, 2025

Tried this today with MikeRoz/GLM-4.5-exl3 (q4, revised version) and OpenWebUI's native tool calling (non-native works). Spent a few hours debugging but couldn't figure out how to make it use the tools in OpenWebUI.
It just outputs that it's going to use the tool, and then nothing happens.
The only combination which worked previously was LMStudio and Gemma3 27B.
It uses the right template as I can see in the logs.
Here is what GPT5 suggested:

  • Verified the PR’s test_xml_tool_calls.py – all tests pass.
  • Adjusted chat_completion.py so that tool calls are parsed via ToolCallProcessor.from_text when tool_call_format="xml".
  • Ensured that in the second pass (generate_tool_calls) we removed tool_start from stop strings and added tool_end so the model can generate the full <tool_call>...</tool_call> block.
  • Added logic to set a response_prefix="<tool_call>" so the output always starts correctly.
  • Disabled adding precursor text for XML mode to prevent stray prose before the <tool_call>.
  • Added finish_reason="tool_calls" when mapping back the tool call generation.
  • Finally, patched _create_stream_chunk so that tool call deltas include role="assistant", since some clients (e.g. OpenWebUI) ignore deltas without a role.

None of that helped unfortunately... OpenWebUI still does not display the tool call result after the second generation pass — I see the model generating tokens, but the client never shows the function call output.

Any ideas?

smb1982x commented Oct 9, 2025

I've been doing some clean-room experiments with GLM and tool calling, and I've found it will randomly stop obeying the Jinja template and revert to its built-in knowledge. This usually cascades into it getting confused and then sort of giving up. It also sometimes gets confused, either after a while or immediately, when it receives tool call results back in JSON format; at that point it's a crap shoot whether it recovers or not.

I'm having a lot more luck today. I started fresh and ported the vLLM GLM XML parser into TabbyAPI, and beyond that I added something as a test that seems to keep GLM from getting confused or ignoring tool call results (which otherwise cascades into it spamming XML tool calls over and over): my new parser is bidirectional, so when the JSON tool call results hit TabbyAPI, I convert them back into XML tool results before GLM receives them. It seems to be going well, but this model is a little unhinged when it gets confused, so I'll keep tinkering, and if I get consistent results I'll try to retrofit bits into your solution if possible.
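The bidirectional idea described above might be sketched roughly like this (a hypothetical helper, not the actual ported parser): tool results arriving in JSON are re-wrapped in GLM's own <tool_response> dialect before being fed back to the model.

```python
import json

def tool_result_to_glm_xml(result) -> str:
    """Re-wrap a JSON tool result in GLM-style <tool_response> tags so the
    model sees results in the same XML dialect it emits. Sketch only."""
    if isinstance(result, str):
        payload = result
    else:
        payload = json.dumps(result, indent=2, ensure_ascii=False)
    return f"<tool_response>\n{payload}\n</tool_response>"
```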

wypiki commented Oct 9, 2025

That's great! Native tool calling would make GLM 4.6 much more useful! It's one of the reasons I had already considered switching to vLLM, but VRAM is still limited...
