YRL-AIDA/QueryPlanGenerator

Pre-trained Language Model for Table Question Answering via Query Plan Generation

(Figure: Training Pipeline Overview)

Overview

Tabular data is ubiquitous across scientific, business, and industrial domains, yet automatic interpretation of tables remains a significant challenge due to their structural heterogeneity and lack of explicit semantics. Table Question Answering (TQA) aims to enable machines to answer natural language questions over tabular data, but current methods often suffer from low accuracy and poor numerical reasoning.

TableQA-BART introduces a novel approach: instead of directly generating or executing SQL queries, our method pre-trains a BART-based language model to generate computational graphs (query plans) analogous to SQL execution plans in relational databases. This paradigm shift reduces computational complexity and minimizes errors from implicit computations, leading to more robust and semantically consistent TQA systems.


Key Features

  • Query Plan Generation: The model learns to generate linearized computational graphs (query plans) from SQL queries and tables, rather than executing SQL directly (a linearization sketch follows this list).

  • Pre-training on Large-Scale Data: Pre-trained on 3.8 million SQL–table pairs, then fine-tuned on the WikiSQL dataset (80,000+ examples).

  • Superior Performance: Achieves 95.1% denotation accuracy on the WikiSQL test set, outperforming the TAPEX baseline.

  • Generalizability: The pre-trained model can be fine-tuned for various TQA datasets (e.g., WTQ, SQuALL, FeTaQA, TabFact).
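
The exact table serialization used by this repository is not documented in the README; the snippet below is a minimal sketch of what a linearized table $T'$ can look like, assuming a TAPEX-style flattening, and the function name linearize_table is purely illustrative.

```python
def linearize_table(header, rows):
    """Flatten a table into one token sequence (TAPEX-style serialization, assumed).

    Example output:
        "col : name | year row 1 : alice | 2001 row 2 : bob | 2003"
    """
    parts = ["col : " + " | ".join(header)]
    for i, row in enumerate(rows, start=1):
        parts.append(f"row {i} : " + " | ".join(str(cell) for cell in row))
    return " ".join(parts)


if __name__ == "__main__":
    t_prime = linearize_table(["name", "year"], [["alice", 2001], ["bob", 2003]])
    print(t_prime)
```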


Pre-training Task

The pre-training pipeline consists of the following steps:

  1. Input Preparation: Concatenate an executable SQL query $Q$ with a linearized table $T'$ and feed the result to the model.

  2. Query Plan Extraction: Execute the SQL query $Q$ with the EXPLAIN prefix on the source table $T$ to obtain the execution plan $P$.

  3. Plan Linearization: Transform the execution plan $P$ into a linearized plan-graph $P'$ (see the sketch after this list).

  4. Supervised Training: Train the model to generate a plan ($MP$) that matches the linearized plan-graph $P'$.
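
The README does not specify which database engine or plan format the pre-training data uses; the sketch below illustrates steps 2 and 3 with sqlite3, whose EXPLAIN QUERY PLAN statement returns one row per plan node (id, parent, notused, detail). Treating those rows as a tree and serializing it depth-first is an assumption about the shape of $P'$, not the repository's exact procedure.

```python
import sqlite3
from collections import defaultdict


def extract_plan(conn, sql):
    """Step 2: run EXPLAIN QUERY PLAN, returning rows of (id, parent, notused, detail)."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()


def linearize_plan(rows):
    """Step 3: serialize the plan tree depth-first into a flat string (illustrative format)."""
    children = defaultdict(list)
    for node_id, parent, _, detail in rows:
        children[parent].append((node_id, detail))

    def walk(parent):
        out = []
        for node_id, detail in children[parent]:
            out.append(" ".join(["[", detail] + walk(node_id) + ["]"]))
        return out

    return " ".join(walk(0))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (name TEXT, year INT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", [("alice", 2001), ("bob", 2003)])
    sql = "SELECT name FROM t WHERE year > 2002"
    print(linearize_plan(extract_plan(conn, sql)))  # linearized plan-graph P'
```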


Model Architecture

  • Base Model: BART-large

  • Input: Concatenated SQL query and linearized table

  • Output: Linearized query plan (plan-graph)

  • Tokenizer: Standard BART tokenizer (a minimal input/target preparation sketch follows)
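
A minimal sketch of preparing one pre-training example for this architecture, assuming the public facebook/bart-large checkpoint from Hugging Face Transformers; the repository's actual training script, special tokens, and hyperparameters are not shown in this README.

```python
from transformers import BartForConditionalGeneration, BartTokenizer

# Assumption: the standard public BART-large checkpoint and its stock tokenizer.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

sql = "SELECT name FROM t WHERE year > 2002"
t_prime = "col : name | year row 1 : alice | 2001 row 2 : bob | 2003"  # linearized table T'
p_prime = "[ SCAN t ]"                                                 # linearized plan-graph P'

# Encoder input: executable SQL query concatenated with the linearized table.
inputs = tokenizer(sql + " " + t_prime, return_tensors="pt", truncation=True)
# Decoder target: the linearized query plan the model is trained to generate.
labels = tokenizer(text_target=p_prime, return_tensors="pt", truncation=True).input_ids

loss = model(**inputs, labels=labels).loss  # standard seq2seq cross-entropy
loss.backward()
```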


Results

Denotation accuracy (DA, %) on the WikiSQL dev and test sets (the metric itself is sketched after the table):

Model                                          DA dev   DA test
TAPEX sql-executor (upper SQL)                   54.8      61.1
TAPEX sql-executor                               40.4      41.0
Query Plan Generator (upper SQL)                 37.1      63.4
Query Plan Generator                             37.4      68.1
Fine-tuned Query Plan Generator (upper SQL)      94.1      94.2
Fine-tuned Query Plan Generator                  95.0      95.1
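
Denotation accuracy counts a prediction as correct when the answer it denotes matches the gold answer, regardless of the surface form of the generated plan or query. A minimal sketch, assuming answers are compared as unordered multisets of normalized cell values (the repository's exact comparison rules are not documented here):

```python
from collections import Counter


def denotation_accuracy(predicted, gold):
    """Fraction of examples whose predicted answer matches the gold answer.

    predicted, gold: lists of answers, each answer being a list of cell values.
    Comparing answers as unordered multisets of lowercased strings is an assumption.
    """
    def normalize(answer):
        return Counter(str(v).strip().lower() for v in answer)

    correct = sum(normalize(p) == normalize(g) for p, g in zip(predicted, gold))
    return correct / len(gold)


# Example: 2 of 3 predictions denote the correct answer -> DA = 0.667.
print(denotation_accuracy([["Alice"], ["Bob"], ["7"]],
                          [["alice"], ["Carol"], ["7"]]))
```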

Applications

  • Intelligent Table-Based QA Systems

  • Semantic Data Exploration

  • Business Intelligence Automation

  • Scientific Data Analysis


Reference

Zhong, Victor, Caiming Xiong, and Richard Socher. "Seq2SQL: Generating structured queries from natural language using reinforcement learning." arXiv preprint arXiv:1709.00103 (2017).

Lewis, Mike, et al. "BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension." arXiv preprint arXiv:1910.13461 (2019).

Liu, Qian, et al. "TAPEX: Table pre-training via learning a neural SQL executor." arXiv preprint arXiv:2107.07653 (2021).
