Pinafore/qb2nq

qb2nq

qb2nq (QuizBowl 2 Natural Questions transformation) is a project that transforms complex trivia questions from the Quizbowl (QB) dataset into simpler questions in the style of the Natural Questions (NQ) dataset, to improve question-answering (QA) performance.

Execution steps

To run our code, clone the repository, change into its directory, and run the following commands.

Running make prereqs downloads the required datasets and installs the required packages:

make prereqs

Running make clean deletes all intermediate and final results generated by the program:

make clean

The first step is to examine how different answers are referred to in the dataset.

make intermediate_results/lat_frequency.json
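The README does not say how lat_frequency.json is computed. Reading "lat" as lexical answer type (an assumption), a minimal sketch counts the noun that follows "this" or "these" in each question, since quizbowl clues often name the answer type that way (e.g. "This river forms..."). The regex and the lat_frequency() name are hypothetical, not the project's code.

```python
import re
from collections import Counter

# Hypothetical approximation of a lexical answer type (LAT): the word that
# follows "this" or "these", e.g. "this river" -> "river".
LAT_PATTERN = re.compile(r"\b(?:this|these)\s+([a-z]+)", re.IGNORECASE)


def lat_frequency(questions):
    """Count candidate answer-type words across a list of question strings."""
    counts = Counter()
    for text in questions:
        for match in LAT_PATTERN.finditer(text):
            counts[match.group(1).lower()] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "This river forms the Bujagali and Murchison Falls.",
        "For 10 points, name this river that ends in Khartoum.",
        "This author wrote a novel about this river.",
    ]
    print(lat_frequency(sample).most_common(2))
```

A simple "this X" pattern will also pick up non-answer phrases such as "this event" in a clue about something else, so the counts are only a rough signal.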

Next, we transform the questions from the QB format so that they resemble NQ questions.

make intermediate_results/nq_like_questions.json

This step is slow (it could probably be parallelized), so you may want to try it on a small number of questions first. The following command transforms only 100 questions:

python3 transform_question.py --limit=100
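The internals of transform_question.py are not shown in this README. As a hypothetical illustration of what a --limit flag like this typically does, the sketch below parses the flag with argparse and uses itertools.islice to stop after the first N questions. transform() is a placeholder, not the project's transformation.

```python
import argparse
from itertools import islice


def transform(question):
    # Hypothetical stand-in for the real QB-to-NQ transformation.
    return question.rstrip(".") + "?"


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of a --limit flag")
    parser.add_argument("--limit", type=int, default=None,
                        help="transform only the first N questions (default: all)")
    return parser


def transform_all(questions, limit=None):
    # islice(iterable, None) yields every item, so limit=None means "no limit".
    return [transform(q) for q in islice(questions, limit)]


if __name__ == "__main__":
    args = build_parser().parse_args(["--limit=100"])
    questions = (f"Question {i}." for i in range(1000))
    print(len(transform_all(questions, args.limit)))
```

Because islice consumes the input lazily, questions past the limit are never read or transformed, which is what makes a small --limit useful for quick tests.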

Example: Original QB Question (Elicitation)

Original QB elicitation: 
This river forms the Bujagali and Murchison Falls in its "Victoria" incarnation, and it also contains a segment named after Lake Albert.
One of its deltas forms the Sudd wetland region, and the Jonglei canal was proposed to reroute part of it around the Sudd.
Its headstreams include the Luvironza, and the Owen Falls Dam used it for hydroelectric power until 2006.
Called the Bahr al Jabal upon entering Sudan and joining the Bahr el Ghazal at Lake No, it originates near Jinja in Lake Victoria.
For 10 points, name this river that ends in Khartoum and unites with its blue counterpart as part of the longest river in the world.

Answer: White Nile

Heuristic 1: Split via Conjunctions

This river forms the Bujagali and Murchison Falls in its “Victoria” incarnation.

It contains a segment named after Lake Albert.
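A minimal sketch of this kind of conjunction split, assuming (hypothetically) that a clause boundary is a ", and" followed by a subject pronoun. That rule keeps noun-phrase conjunctions such as "Bujagali and Murchison" intact and drops a filler "also" from the new clause. split_conjunctions() is an illustrative name, not the project's function.

```python
import re

# Split only at ", and" when a subject pronoun follows (lookahead keeps it).
SPLIT_AT = re.compile(r",\s+and\s+(?=(?:it|he|she|they)\b)")
# Drop a filler "also" that directly follows the pronoun in the new clause.
DROP_ALSO = re.compile(r"^(it|he|she|they)\s+also\s+")


def split_conjunctions(sentence):
    """Split a sentence into independent clauses at ', and <pronoun>'."""
    clauses = []
    for part in SPLIT_AT.split(sentence.strip()):
        part = DROP_ALSO.sub(r"\1 ", part.strip())
        part = part.rstrip(".") + "."           # every clause ends with a period
        clauses.append(part[0].upper() + part[1:])  # capitalize the new sentence
    return clauses


if __name__ == "__main__":
    s = ('This river forms the Bujagali and Murchison Falls in its "Victoria" '
         'incarnation, and it also contains a segment named after Lake Albert.')
    for clause in split_conjunctions(s):
        print(clause)
```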

Heuristic 2: Conversion of sentences without wh-words

Which river forms the Bujagali and Murchison Falls in its "Victoria" incarnation?

Which river contains a segment named after Lake Albert?
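A minimal sketch of this conversion, assuming the answer type (here "river") is already known, for example from the LAT step. It replaces a leading "This <type>" or the pronoun "It" with "Which <type>" and ends the sentence with a question mark; sentences that already start with a wh-word only get their punctuation fixed. to_wh_question() and its rules are hypothetical.

```python
import re

WH_START = re.compile(r"^(?:who|what|which|when|where|why|how)\b", re.IGNORECASE)


def to_wh_question(statement, lat):
    """Turn a declarative clue into a 'Which <lat> ...?' question, or None."""
    text = statement.strip().rstrip(".?")
    if WH_START.match(text):
        return text + "?"  # already a wh-question
    # "This <lat>" or a bare "It" as the subject; \b keeps "Its" from matching.
    subject = re.compile(rf"^(?:this\s+{re.escape(lat)}|it)\b", re.IGNORECASE)
    if not subject.match(text):
        return None  # no recognizable subject to replace
    return subject.sub(lambda m: f"Which {lat}", text, count=1) + "?"


if __name__ == "__main__":
    print(to_wh_question("It contains a segment named after Lake Albert.", "river"))
```

Returning None for unmatched sentences (such as "Its headstreams include...") lets a caller skip clues the rule cannot safely rewrite instead of producing ungrammatical questions.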

Heuristic 3: Conversion from Imperative to Interrogative

Which river unites with its blue counterpart as part of the longest river in the world?
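A minimal sketch of the imperative-to-interrogative step, assuming the giveaway follows the common "For 10 points, name this <type> that ..." template. The predicate is split naively on " and " into one question per verb phrase, which yields the question above as its second output. The regex and imperative_to_questions() are hypothetical.

```python
import re

# Hypothetical template for quizbowl giveaways such as
# "For 10 points, name this river that ends in Khartoum and ...".
IMPERATIVE = re.compile(
    r"^for (?:10|ten) points,\s*(?:name|identify|give) this (\w+)\s+"
    r"(?:that|which|who)\s+(.+?)\.?$",
    re.IGNORECASE,
)


def imperative_to_questions(sentence):
    """Rewrite an imperative giveaway as 'Which <type> ...?' questions."""
    match = IMPERATIVE.match(sentence.strip())
    if match is None:
        return []
    lat, predicate = match.group(1), match.group(2)
    # Naive split of coordinated verb phrases on " and ".
    return [f"Which {lat} {vp.strip()}?" for vp in predicate.split(" and ")]


if __name__ == "__main__":
    s = ("For 10 points, name this river that ends in Khartoum and unites with "
         "its blue counterpart as part of the longest river in the world.")
    for q in imperative_to_questions(s):
        print(q)
```

The naive " and " split would also break noun phrases like "Bujagali and Murchison Falls", so a real implementation would likely need a parser to split only coordinated verb phrases.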

After that, we run a classifier to distinguish QB questions from NQ questions.

make intermediate_results/logistic_regression_weight_dict_Qb_NQ.txt
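A minimal sketch of how a weight dictionary like the one below can be applied, assuming binary bigram features over lowercased tokens padded with START and STOP (the key format "START what", "filmed STOP" suggests this, but the actual tokenizer is not documented). The "length" and "ablength" features are omitted because their definitions are not given, and which class counts as positive is not stated here. bigram_features() and score() are hypothetical names.

```python
import math
import re


def bigram_features(question):
    """Binary bigram features padded with START/STOP, e.g. "START what"."""
    tokens = ["START"] + re.findall(r"[a-z0-9']+", question.lower()) + ["STOP"]
    return {f"{a} {b}": 1.0 for a, b in zip(tokens, tokens[1:])}


def score(question, weights):
    """Logistic-regression probability of the positive class (bigrams + BIAS only)."""
    z = weights.get("BIAS", 0.0)
    for name, value in bigram_features(question).items():
        z += weights.get(name, 0.0) * value  # unseen bigrams contribute nothing
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    weights = {"BIAS": -5.441712535065695, "START what": 0.6402255772376372,
               "what man": 0.8408500019504009}
    print(bigram_features("What man wrote this?"))
    print(score("What man wrote this?", weights))
```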

This can reveal problems in the transformation process. For instance, consider this set of feature weights:

{"length": 2.1673546575675235, "ablength": 0.0, "START the": -0.1784736506085197, "START what": 0.6402255772376372, "START when": -0.5080459895220035, "START where": -1.074377737159208, "START who": -0.9162552042417396, "after the": 0.45851831536207965, "as the": 0.40412419433014296, "battle of": 0.1995002103612401, "black history": -0.24677112851309396, "by the": 0.3487353258571355, "did the": -0.6921127080342989, "filmed STOP": -0.23510293344821942, "guadeloupe in": 0.0827023037431746, "in his": 0.5877131237947829, "in the": -0.2604679092872282, "in what": 0.0827023037431746, "india STOP": -0.4548609004189222, "is the": -0.5464633523104327, "keep guadeloupe": 0.0827023037431746, "life of": 0.2261374967836842, "my heart": -0.35833224553475324, "not to": 0.0827023037431746, "of india": -0.30017948289961044, "of the": -0.36922395137105696, "of this": 0.23468026711909745, "on the": 0.11904041390054465, "part of": 0.10193534107347967, "ruler of": 0.5134096172530246, "series STOP": 0.3476276322692837, "the british": -0.07502072137405887, "the first": 0.13955619294729724, "the life": 0.2261374967836842, "the most": -0.045238416997391326, "the world": -0.2296368046047485, "this event": 0.23468026711909745, "to keep": 0.0827023037431746, "was new": -0.4769144853770957, "was partly": 0.20061171775689368, "was the": -0.5019693682834611, "what author": 0.4538406936817783, "what is": -0.13603736395825397, "what man": 0.8408500019504009, "what was": -0.6029497385045198, "what what": 0.8078417803364502, "when did": -0.5080459895220035, "where is": -0.1354608211486916, "where was": -0.23510293344821942, "which can": 0.2556495617992419, "which is": -0.01747670564064363, "who wrote": -0.3121178156168799, "BIAS": -5.441712535065695}

This suggests that our NQ-like questions start with "the" too often and still contain the phrase "of this".
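To read a weight dictionary like this one, it helps to sort the features by weight: the largest positive and largest negative weights are the features most indicative of each class. A minimal sketch, assuming the .txt output holds a single JSON object like the one above; top_features() is a hypothetical helper.

```python
import json
from pathlib import Path


def top_features(weights, k=5):
    """Return the k most positive and k most negative non-bias weights."""
    items = [(name, w) for name, w in weights.items() if name != "BIAS"]
    items.sort(key=lambda item: item[1])  # ascending by weight
    return items[-k:][::-1], items[:k]


if __name__ == "__main__":
    path = Path("intermediate_results/logistic_regression_weight_dict_Qb_NQ.txt")
    if path.exists():
        # Assumes the file contains one JSON object of feature -> weight.
        positive, negative = top_features(json.loads(path.read_text()))
        print("most positive:", positive)
        print("most negative:", negative)
```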

The original QANTA resources can be found at https://sites.google.com/view/qanta/resources?authuser=0

The data can be found at https://drive.google.com/drive/u/1/folders/1mebfGC5AakYHdmRLUf718oAsfEU8tcYM

Project Team Members:

Saptarashmi Bandyopadhyay

Hao Zou

Chenqi Zhu

Shraman Pal

Abhranil Chandra

Rohith Banda
