
   📄 Arxiv   |   🤗 Hugging Face  

| Resource | Link |
| --- | --- |
| 🤗 MTSQL-R1 (4B) | MTSQL-R1 (4B) (will release after internal review) |
| 🤗 MTSQL-R1 (1.7B) | MTSQL-R1 (1.7B) (will release after internal review) |
| 🤗 Dataset | CoSQL-Long-Horizon-SFT-RL-Data (will release after internal review) |
| 🤗 Dataset | SParC-Long-Horizon-SFT-RL-Data (will release after internal review) |
| Code for SFT | Will release after internal review |
| Code for RL | Will release after internal review |

Python CUDA 12.4

🚀 MTSQL-R1: Towards Long-Horizon Multi-Turn Text-to-SQL via Agentic Training

🌟 Highlights

| Category | Feature | Description |
| --- | --- | --- |
| Text-to-SQL | 🎯 Long-Turn and Extra-Hard SQL | Excellent at solving long-turn and extra-hard SQL questions |
| Text-to-SQL | 🔄 Long-Horizon Formulation with Environment Feedback | Leverages environment feedback through database execution and explicit memory verification to guide SQL generation and error correction |
| LLM Training | 🎓 Two-Stage Training Framework | 1) Tool-integrated, high-quality SFT dataset construction via self-taught sampling; warm-start SFT. 2) Curriculum RL training with multi-level (outcome and dense process) reward design |
| LLM Training | 🔁 Multi-Turn End-to-End RL Training | Enables end-to-end training across multiple turns with database and memory to enhance coherence |

📖 Introduction

Short-horizon Text-to-SQL translates each question directly into SQL, which leads to execution errors and coherence-related errors.

Our approach enables:

  • Environment-based verification: The model interacts dynamically with two components: (i) a database for execution feedback and (ii) a long-term dialogue memory for explicit coherence checking, used to verify intermediate SQL outputs.

  • Self-correction: Based on verification feedback, the model iteratively refines its generated SQL queries to achieve consistent, executable outputs across multiple turns.

  • End-to-end learning: The agent autonomously learns actions (Propose, Execute, Verify, and Self-Correct) to generate better SQL.
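
The Propose-Execute-Verify-Self-Correct loop above can be sketched as follows. This is an illustrative sketch only: `propose_sql` stands in for the LLM call, and the feedback strings and memory format are assumptions, not the released implementation.

```python
import sqlite3


def agentic_sql_loop(question, db_path, memory, propose_sql, max_turns=4):
    """Propose -> Execute -> Verify -> Self-Correct loop (illustrative sketch).

    propose_sql(question, memory, feedback) is a hypothetical stand-in for
    the model; db_path points at a SQLite database used for execution feedback.
    """
    feedback = None
    sql = None
    for _ in range(max_turns):
        sql = propose_sql(question, memory, feedback)           # Propose
        try:
            with sqlite3.connect(db_path) as conn:
                rows = conn.execute(sql).fetchall()             # Execute
        except sqlite3.Error as exc:
            feedback = f"execution error: {exc}"                # Self-correct next turn
            continue
        if not rows:
            feedback = "query executed but returned no rows"    # null-return case
            continue
        memory.append((question, sql))                          # commit to dialogue memory
        return sql, rows
    return sql, None
```

On an execution error or an empty result, the error message is fed back as context for the next proposal, which is the self-correction behavior described above.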

⚙️ Configuration

verl == 0.4.1

LLaMA-Factory == 0.9.3
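
A minimal environment setup consistent with the pins above might look like this; the PyPI package names `verl` and `llamafactory` are assumptions, and the official release may prescribe different install steps:

```shell
pip install verl==0.4.1
pip install llamafactory==0.9.3
```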

🔄 Training Framework

Stage1: Self-Taught Warm-Start SFT

  • Step1: Random Sampling with high temperature for generating natural reasoning trajectories
  • Step2: Difficulty-Aware Reject Sampling
  • Step3: SFT Model with Tool-Integrated Multi-Turn Trajectories and Loss Masking
  • Step4: Update Dataset, Model and repeat
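
Steps 1 and 2 of Stage 1 can be sketched as below. The temperature, number of samples `k`, and the pass-rate filter are illustrative assumptions rather than the paper's exact settings:

```python
import random


def reject_sample(examples, generate, is_correct, k=8, temperature=1.0):
    """Difficulty-aware rejection sampling (illustrative sketch).

    For each example, draw k high-temperature trajectories, keep only correct
    ones, and use the pass rate as a difficulty signal: items every sample
    solves are too easy, items no sample solves are unusable, and the
    hard-but-solvable middle is kept for SFT.
    """
    sft_data = []
    for ex in examples:
        trajectories = [generate(ex, temperature=temperature) for _ in range(k)]
        correct = [t for t in trajectories if is_correct(ex, t)]
        pass_rate = len(correct) / k
        if 0 < pass_rate < 1:  # keep hard-but-solvable examples only
            sft_data.append((ex, random.choice(correct), pass_rate))
    return sft_data
```

The retained `pass_rate` can then drive the curriculum partition used later in RL training.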

Stage2: End-to-End Long-Horizon Reinforcement Learning

  • Step1: Curriculum Data Partition by difficulty
  • Step2: Outcome and Process Reward Design
  • Step3: Multi-Turn RL with Loss Masking
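
The multi-level reward of Step 2 combines a sparse outcome signal with dense per-turn process credit. The sketch below is a plausible shape for that combination; the specific components (`valid_call`, `executed`, `verified`) and all weights are illustrative assumptions, not the paper's reward function:

```python
def multilevel_reward(turns, execution_match, w_process=0.2):
    """Outcome + dense process reward (illustrative sketch).

    turns: list of per-turn flag dicts, e.g.
        {"valid_call": True, "executed": True, "verified": False}
    execution_match: whether the final SQL matches the gold execution result
        (the sparse outcome reward).
    """
    outcome = 1.0 if execution_match else 0.0
    process = 0.0
    for t in turns:
        process += 0.4 * t.get("valid_call", False)  # well-formed tool call
        process += 0.3 * t.get("executed", False)    # SQL ran without error
        process += 0.3 * t.get("verified", False)    # passed memory/coherence check
    process /= max(len(turns), 1)                    # average over turns
    return outcome + w_process * process
```

The dense process term gives partial credit on hard examples where the outcome reward alone would be zero, which is the motivation for the process reward reported in the findings below.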

📈 Training Dynamics

The dynamics of Reward Score and Response Length During Training:

The dynamics of test score across different training checkpoints:

📊 Experiment Results

Overall Experiment Results

Key Findings and Takeaways:

  • Warm-start SFT and RL both provide performance gains.
  • Small LLMs (1.7B/4B) struggle to follow long-horizon function-calling instructions.
  • Conventional SFT attains good Exact Match but weaker logical consistency (Execution Match), while long-horizon training achieves better Execution Match.
  • Long-horizon reasoning yields larger gains on multi-turn dialogues and complex questions.
  • Long-horizon RL substantially improves out-of-domain performance.
  • The dense process reward helps the model learn from harder examples, further boosting performance compared with sparse outcome-only rewards.
  • Stronger function calling, verification, and self-correction correlate with better SQL performance.
  • With long-horizon actions and training, the agent learns to resolve execution failures (even null-return cases, which we call the aha-moment of Text-to-SQL) and coherence errors.

Performance over different difficulties and turns

The evolution of different long-horizon abilities and the related Execution Match performance for the 4B and 1.7B models

🙏 Acknowledgements

We would like to express our gratitude to the open-source community for their valuable contributions:

......etc

📫 Contact

For any issues or discussion, please contact [email protected]. Thanks!

About

Official Code for MTSQL-R1: Towards Long-Horizon Multi-Turn Text-to-SQL via Agentic Training
