Skip to content

Data Query Tutorial

indhumeyyappan edited this page Sep 7, 2017 · 3 revisions

Data Query tool is a simple JSON based filtering tool to help retrieve data that has been collected using PACO for an experiment. The various features that the tool supports are

  • Filter - Used to retrieve only some records that match the filter conditions.
  • Order - Used to order the records returned
  • Limit - Used to limit to a certain number of records. This feature also supports pagination.
  • Group - To group data based on certain variables, and retrieve aggregate numbers for that group.
    Eg: Get the number of people who answered for a variable/question 'Name'. Here number of people is the aggregate number and the grouping of data is based on whether the participant has answered the question 'Name' or not.
  • Projection - Two kinds of projections are supported. Paco Event JSON format and the Explicit Projection JSON format. Examples of these projection types can be found here .

Each of the above mentioned features are bundled and made configurable using a single JSON request. Roughly each feature is configured using each of the JSON request component.

Request JSON:

This is composed of the following 6 components. Except for the Query->Criteria and Query->Values all the other components are optional. When a value for the particular request component is not specified, the data query tool might set default values.

  1. Query -> Criteria
  2. Query -> Values
  3. Order
  4. Group
  5. Limit
  6. Select
{
"query": 
    {
     "criteria": "",
     "values":[]
    },
"order":"",
"group":"",
"limit":"",
"select":[] 
}

Response JSON:

The response JSON can be of two kinds namely Paco Event JsonFormat and Explicit Projection JSON Format . The Data Query tool looks at the value for the Select component of the request JSON to determine the kind of response format. If the value of 'Select' component is '*', then the Data Query tool sends all the events in the PacoEvent JSON Format. If the value of 'Select' component is anything else, then the Data Query tool sends all the events in the Explicit Projection JSON Format.

Paco Event JSON Format: This format gives most of the information that has been captured for each event.

	{
    		"id": 612992,
		"experimentId": 47154748,
    		"who": "[email protected]",
    		"when": 1477612800000,
    		"appId": "Android",
    		"pacoVersion": "4.2.23",
    		"experimentName": "GetByQuery Test",
		"experimentGroupName": "New Group",
    		"responseTime": 1477587600000,
    		"experimentVersion": 13,
    		"timezone": "-07:00",
    		"actionTriggerId": 0,
    		"actionTriggerSpecId": 0,
    		"actionId": 0,
    		"joined": true,
    		"sortDate": "2016-10-28",
    		"missedSignal": false,
    		"emptyResponse": false,
    		"responses": [
        		{
            			"name": "color",
            			"answer": "red"
        		},
        		{
            			"name": "Form Duration",
            			"answer": "11"
        		}
    		]
	}

Explicit Projection JSON Format: This format helps the researcher decide the format in which they would want the response.

Eg: To view only the experiment name and experiment version for all users

	{
 	   "experiment_name": "GetByQuery Test",
    	   "experiment_version": 16,
    	   "who": "[email protected]"
	}


The intent and purpose of this Data Query tool is in very close resemblance to the select query in SQL. Hence the request format of this tool is loosely based on the select SQL Query clauses. We can see how each of the features of this tool map to the request component in the request JSON which is then mapped to the SQL select query clauses here .

Query Criteria and Query Values should always be present. Experiment Id is one of the conditions that is mandated on any kind of Data Query request. So, let us start from a very minimal example and try to filter the data with respect to one particular experiment, whose unique id is 123.

  1. Identify keyword associated with the filtering conditions by referring to the Data Format. Since we need for particular experiment, we need an id that uniquely identifies an experiment, so the keyword we need is experiment_id

2. Write down the single condition or the combination of conditions. Since we know we have to retrieve all events with a particular experiment_id, the single condition will be experiment_id = 123
3. Replace actual values in the above step with '?' So, now the condition becomes experiment_id = ?
4. Add it to the skeletonRequest So, now the request skeleton JSON looks like this
{
"query": 
    {
     "criteria": "experiment_id=?",
     "values":[]
    },
"order":"",
"group":"",
"limit":"",
"select":[] 
}

Default: No.
Optional: No.
Other Examples:
experiment_id=? and who=? experiment_id=? and response_time>? experiment_id=? and who in (?,?,?)


Let us continue with the very minimal example and try to filter the data with respect to one particular experiment, whose unique id is 123.
Make sure, the steps in Query -> Criteria is performed before you proceed with Query -> Values . The value for each of the ? in Query->Criteria, is specified in this part Query->Values.
1. Identify the Data Type of the keyword with '?' specified in Query->Criteria. In this example the data type of the keyword experiment_id is number
2. If it is a number we should specify it without double quotes. If it is a string, we should specify the value with double quotes. Date format should be in the format "2017/03/29 06:42:44" and with double quotes as well. Since the data type for experiment_id is number, now we have 123
3. Add it to the skeletonRequest So, now the request skeleton JSON looks like this
{
"query": 
    {
     "criteria": "experiment_id=?",
     "values":[123]
    },
"order":"",
"group":"",
"limit":"",
"select":[] 
}

Default: No.
Optional: No.
Other Examples:
"query": { "criteria": "experiment_id=? and who=?", "values":[123,"[email protected]"] } "query": { "criteria": "experiment_id=? and response_time>?", "values":[123,"[email protected]"] } "query": { "criteria": "experiment_id=? and who in (?,?,?)", "values":[123,"[email protected]","[email protected]","[email protected]"] }


Let us try to filter the data with respect to one particular experiment, whose unique id is 123 and then order it by each participants response time, earliest first
To determine how to represent the filtering conditions refer to Query -> Criteria and Query -> Values 1. Identify the key word associated with ordering, which is the participants response time by referring to Data Format Since we need response time of each participant, keyword we need would be response_time
2. Identify if we need to order it in ascending or the descending order. If it is ascending, we add 'asc' next to keyword and if it is descending, we add 'desc' next to keyword.
Here since we need earliest first, we need to add response_time asc
3. Add it to the skeletonRequest So, now the request skeleton JSON looks like this
{
"query": 
    {
     "criteria": "experiment_id=?",
     "values":[123]
    },
"order":"response_time asc",
"group":"",
"limit":"",
"select":[] 
}

Default: Asc, when there is some keyword specified in the order component of the request
event_id desc, when there is no keyword specified in the order component and the group component of the request
no specific ordering, when there is no keyword specified in the order component, but the group component of the request has some value
Optional: Yes
Other examples:

  • "order":"who" -> Orders by the participant email for each event in ascending
  • "order":"who desc" -> Orders by the participant email for each event in descending
  • "order":"who desc, response_time asc" -> Orders by the participant email in descending, and within each email, it will order by response time ascending.

Let us try to get all distinct responses for a variable 'question1' along with the number of times that response occurred
To determine how to represent the filtering conditions refer to Query -> Criteria and Query -> Values
1. Identify the key words associated with grouping by referring to Data Format Since we need to group by people who have the same answers for a specific question, keywords we need would be answer
2. Identify aggregate function Here since we need count of all people who have answered same value for a specific question, the aggregate function we need is count(*)
3. Add it to the skeletonRequest So, now the request skeleton JSON looks like this
{
"query": 
    {
     "criteria": "experiment_id=? and text=?  ",
     "values":[123,"inputname"]
    },
"order":"",
"group":"answer",
"limit":"",
"select":["answer", count(*)] 
}

Default: No.
Optional: Yes.
Other Examples:

  • "group":"experiment_id, who" -> Group by the experiment id, and within each experiment group by the possible participant emails/who.

Let us try to filter the data with respect to one particular experiment, whose unique id is 123 and retrieve 100 records at a time
To determine how to represent the filtering conditions refer to Query -> Criteria and Query -> Values
1. Identify the number of records needed in each request. Here, we need 100. we need to add 100
2. Add it to the skeletonRequest So, now the request skeleton JSON looks like this
{
"query": 
    {
     "criteria": "experiment_id=?",
     "values":[123]
    },
"order":"",
"group":"",
"limit":"100",
"select":[] 
}

Default: No
Optional: Yes. If no limit is specified, It will load all records. If it is a huge experiment, we might end up in a request taking longer.
Other examples:

  • "limit" : "100" -> limit to the first 100 records
  • "limit" : "100, 0" -> limit to the first 100 records (0 based counting, record 0 to record 99 will be returned)
  • "limit" : "100, 100" -> limit to 100 records but starting from position 100 ( record 100 to record 199 will be returned)
  • "limit" : "100, 200" -> limit to 100 records but starting from position 200 ( record 200 to record 299 will be returned)
    Each output will correspond to a record. If there are 4 questions answered in an event, there will be four records returned.

When you need to retrieve all the information on an event, then choose the value '*' for the Select request component. If we need specific information on an event, then choose the list of keywords separated by comma "," as value for the Select request component. To determine how to represent the filtering conditions refer to Query -> Criteria and Query -> Values

Let us try to filter the data with respect to one particular experiment, whose unique id is 123. Let us get only the participant email and repsonse time

  1. Identify the key word associated with participants email and response time by referring to Data Format Since we need email and response time of each participant, keyword we need in Select would be who, response_time
  2. Identify if we need explicit projection or all values for an event. If specific columns, Here we need only participant email and response time, so the values we need to add for select are "who","response_time" If we need all data, value we need to add for select is *
  3. Add it to the skeletonRequest So, now the request skeleton JSON looks like this
{
"query": 
    {
     "criteria": "experiment_id=?",
     "values":[123]
    },
"order":"",
"group":"",
"limit":"",
"select":["who","response_time"] 
}

Default: Yes. '' will be set, when no value is set in request
Optional: Yes.
Other Examples: "select":["
"] "select":["who","response_time","who"]

Feature Data Query Request Component SQL Select Query
Filter Query->Criteria where clause or the search conditions
Query->Values where clause values of the search conditions
Order order order by clause
Limit limit limit clause
Group group group by clause
Projection select Select list
Description key word Data Type
The unique id we store for each of the events id number
The unique id used to collect this data experiment_id number
participant email (anonymized) who string
The date and time when the event reaches the paco server (in UTC zone) when string
Client used to record the data app_id string
Version of the paco client app paco_version string
Name of the experiment experiment_name string
Name of the experiment group group_name string
The date and time when the client recorded this data response_time string
The version of the experiment that has been used to record this event experiment_version string
The timezone of the client used to record this data (Responses from the web form may be inaccurate) client_timezone string
joined status of a participant in the experiment joined number
Variable name in the experiment text string
The answer from participant answer string

## Examples ### Custom Responses
SNo Example Scenario Request Response
1 Get first thousand participant emails {"select":["who"], "query": {"criteria": "experiment_id=? ","values":[6013856]}, "group":"who","limit":"1000,0"} { "who": "[email protected]", "who": "[email protected]" }
2 Get latest response time {"select":["response_time"], "query": {"criteria": "experiment_id=? ","values":[6013856]},"order": "response_time desc","limit":"1"} { "response_time": "2017-09-07 09:31:25.0" }
3 Get earliest response time {"select":["response_time"], "query": {"criteria": "experiment_id=? ","values":[6013856]},"order": "response_time asc","limit":"1"} { "response_time": "2017-08-07 09:31:25.0" }
4 Get joined events and time for all participants {"select":["who","response_time"], "query": {"criteria": "experiment_id=? and joined=? ","values":[6013856,1]},"order": "response_time desc"} { "response_time": "2017-09-07 14:20:06.0", "who": "[email protected]" }
5 Get all responses for 1 question for all participants {"select":["who","answer"], "query": {"criteria": "experiment_id=? and text=? ","values":[6013856,"inputname"]},"order": "response_time asc","limit":"1000"} { "answer": "a", "who": "[email protected]" },{ "answer": "1", "who": "[email protected]" },{ "answer": "7", "who": "[email protected]" }
6 Get all distinct responses for 1 question along with the number of times that response occurred {"select":["answer", count(*)], "query": {"criteria": "experiment_id=? and text=? ","values":[6013856,"inputname"]},"group":"answer","limit":"1000"} { "answer": "2", "count(*)": 1 },{ "answer": "8", "count(*)": 1 },{ "answer": "aa2", "count(*)": 1 }
7 Get all distinct questions {"select":["text"], "query": {"criteria": "experiment_id=?","values":[6013856]},"group":"text","limit":"1000"}
(or)
{"select":["distinct text"], "query": {"criteria": "experiment_id=?","values":[6013856]},"limit":"1000"}
{ "text": "input1" },{ "text": "input2" },{ "text": "input3" },{ "text": "input4" }
8 Get all probable answers for all question for all users {"select":["text", "answer"], "query": {"criteria": "experiment_id=?","values":[6013856]},"group":"text,answer","limit":"1000"} { "answer": "1", "text": "input1" },{ "answer": "7", "text": "input1" },{ "answer": "a", "text": "input1" }
9 Get number of responses for a specific question and specific answer and is on or after a particular date and time {"select":["count(*)","text","answer"], "query": {"criteria": "experiment_id=? and text=? and answer=? and response_time>?","values":[562949920,"input1","a","2017/03/29 06:42:44"]},"group":"text,answer"} { "answer": "a", "count(*)": 2, "text": "input1" }
10 Get number of total events so far {"select":["count(*)"], "query": {"criteria": "experiment_id=?","values":[6013856]},"limit":"1000"} { "count(*)": 11 }
11 Get total number of outputs so far {"select":["count(*)"], "query": {"criteria": "experiment_id=?","values":[6013856]},"order":"answer","limit":"1000"} { "count(*)": 34 }
12 Get the users and the number of events each of them have responded on a particular day - 2017/08/25 {"select":["who","count(*)"], "query": {"criteria": "experiment_id=? and response_time>? and response_time","values":[5629499534213120,"2017/08/25 00:00:00","2017/08/26 00:00:00"]},"group":"who"} { "count(*)": 2, "who": "[email protected]" }
Clone this wiki locally