Data Query Tutorial

Data Query tool is a simple JSON based filtering tool to help retrieve data that has been collected using PACO for an experiment. The various features that the tool supports are

Filter - Used to retrieve only some records that match the filter conditions.
Order - Used to order the records returned
Limit - Used to limit to a certain number of records. This feature also supports pagination.
Group - To group data based on certain variables, and retrieve aggregate numbers for that group.
Eg: Get the number of people who answered for a variable/question 'Name'. Here number of people is the aggregate number and the grouping of data is based on whether the participant has answered the question 'Name' or not.
Projection - Two kinds of projections are supported. Paco Event JSON format and the Explicit Projection JSON format. Examples of these projection types can be found here .

Each of the above mentioned features are bundled and made configurable using a single JSON request. Roughly each feature is configured using each of the JSON request component.

Request JSON:

This is composed of the following 6 components. Except for the Query->Criteria and Query->Values all the other components are optional. When a value for the particular request component is not specified, the data query tool might set default values.

Query -> Criteria
Query -> Values
Order
Group
Limit
Select

Request Skeleton

{
"query": 
    {
     "criteria": "",
     "values":[]
    },
"order":"",
"group":"",
"limit":"",
"select":[] 
}

Response JSON:

The response JSON can be of two kinds namely Paco Event JsonFormat and Explicit Projection JSON Format . The Data Query tool looks at the value for the Select component of the request JSON to determine the kind of response format. If the value of 'Select' component is '*', then the Data Query tool sends all the events in the PacoEvent JSON Format. If the value of 'Select' component is anything else, then the Data Query tool sends all the events in the Explicit Projection JSON Format.

Response Formats

Paco Event JSON Format: This format gives most of the information that has been captured for each event.

	{
    		"id": 612992,
		"experimentId": 47154748,
    		"who": "[email protected]",
    		"when": 1477612800000,
    		"appId": "Android",
    		"pacoVersion": "4.2.23",
    		"experimentName": "GetByQuery Test",
		"experimentGroupName": "New Group",
    		"responseTime": 1477587600000,
    		"experimentVersion": 13,
    		"timezone": "-07:00",
    		"actionTriggerId": 0,
    		"actionTriggerSpecId": 0,
    		"actionId": 0,
    		"joined": true,
    		"sortDate": "2016-10-28",
    		"missedSignal": false,
    		"emptyResponse": false,
    		"responses": [
        		{
            			"name": "color",
            			"answer": "red"
        		},
        		{
            			"name": "Form Duration",
            			"answer": "11"
        		}
    		]
	}

Explicit Projection JSON Format: This format helps the researcher decide the format in which they would want the response.

Eg: To view only the experiment name and experiment version for all users

	{
 	   "experiment_name": "GetByQuery Test",
    	   "experiment_version": 16,
    	   "who": "[email protected]"
	}

The intent and purpose of this Data Query tool is in very close resemblance to the select query in SQL. Hence the request format of this tool is loosely based on the select SQL Query clauses. We can see how each of the features of this tool map to the request component in the request JSON which is then mapped to the SQL select query clauses here .

Query->Criteria

Query Criteria and Query Values should always be present. Experiment Id is one of the conditions that is mandated on any kind of Data Query request. So, let us start from a very minimal example and try to filter the data with respect to one particular experiment, whose unique id is 123.

Identify keyword associated with the filtering conditions by referring to the Data Format. Since we need for particular experiment, we need an id that uniquely identifies an experiment, so the keyword we need is experiment_id

2. Write down the single condition or the combination of conditions. Since we know we have to retrieve all events with a particular experiment_id, the single condition will be experiment_id = 123
3. Replace actual values in the above step with '?' So, now the condition becomes experiment_id = ?
4. Add it to the skeletonRequest So, now the request skeleton JSON looks like this

{
"query": 
    {
     "criteria": "experiment_id=?",
     "values":[]
    },
"order":"",
"group":"",
"limit":"",
"select":[] 
}

Default: No.
Optional: No.
Other Examples:
experiment_id=? and who=? experiment_id=? and response_time>? experiment_id=? and who in (?,?,?)

Query->Values

Let us continue with the very minimal example and try to filter the data with respect to one particular experiment, whose unique id is 123.
Make sure, the steps in Query -> Criteria is performed before you proceed with Query -> Values . The value for each of the ? in Query->Criteria, is specified in this part Query->Values.
1. Identify the Data Type of the keyword with '?' specified in Query->Criteria. In this example the data type of the keyword experiment_id is number
2. If it is a number we should specify it without double quotes. If it is a string, we should specify the value with double quotes. Date format should be in the format "2017/03/29 06:42:44" and with double quotes as well. Since the data type for experiment_id is number, now we have 123
3. Add it to the skeletonRequest So, now the request skeleton JSON looks like this

{
"query": 
    {
     "criteria": "experiment_id=?",
     "values":[123]
    },
"order":"",
"group":"",
"limit":"",
"select":[] 
}

Default: No.
Optional: No.
Other Examples:
"query": { "criteria": "experiment_id=? and who=?", "values":[123,"[email protected]"] } "query": { "criteria": "experiment_id=? and response_time>?", "values":[123,"[email protected]"] } "query": { "criteria": "experiment_id=? and who in (?,?,?)", "values":[123,"[email protected]","[email protected]","[email protected]"] }

Order

Let us try to filter the data with respect to one particular experiment, whose unique id is 123 and then order it by each participants response time, earliest first
To determine how to represent the filtering conditions refer to Query -> Criteria and Query -> Values 1. Identify the key word associated with ordering, which is the participants response time by referring to Data Format Since we need response time of each participant, keyword we need would be response_time
2. Identify if we need to order it in ascending or the descending order. If it is ascending, we add 'asc' next to keyword and if it is descending, we add 'desc' next to keyword.
Here since we need earliest first, we need to add response_time asc
3. Add it to the skeletonRequest So, now the request skeleton JSON looks like this

{
"query": 
    {
     "criteria": "experiment_id=?",
     "values":[123]
    },
"order":"response_time asc",
"group":"",
"limit":"",
"select":[] 
}

Default: Asc, when there is some keyword specified in the order component of the request
event_id desc, when there is no keyword specified in the order component and the group component of the request
no specific ordering, when there is no keyword specified in the order component, but the group component of the request has some value
Optional: Yes
Other examples:

"order":"who" -> Orders by the participant email for each event in ascending
"order":"who desc" -> Orders by the participant email for each event in descending
"order":"who desc, response_time asc" -> Orders by the participant email in descending, and within each email, it will order by response time ascending.

Group

Let us try to get all distinct responses for a variable 'question1' along with the number of times that response occurred
To determine how to represent the filtering conditions refer to Query -> Criteria and Query -> Values
1. Identify the key words associated with grouping by referring to Data Format Since we need to group by people who have the same answers for a specific question, keywords we need would be answer
2. Identify aggregate function Here since we need count of all people who have answered same value for a specific question, the aggregate function we need is count(*)
3. Add it to the skeletonRequest So, now the request skeleton JSON looks like this

{
"query": 
    {
     "criteria": "experiment_id=? and text=?  ",
     "values":[123,"inputname"]
    },
"order":"",
"group":"answer",
"limit":"",
"select":["answer", count(*)] 
}

Default: No.
Optional: Yes.
Other Examples:

"group":"experiment_id, who" -> Group by the experiment id, and within each experiment group by the possible participant emails/who.

Limit

Let us try to filter the data with respect to one particular experiment, whose unique id is 123 and retrieve 100 records at a time
To determine how to represent the filtering conditions refer to Query -> Criteria and Query -> Values
1. Identify the number of records needed in each request. Here, we need 100. we need to add 100
2. Add it to the skeletonRequest So, now the request skeleton JSON looks like this

{
"query": 
    {
     "criteria": "experiment_id=?",
     "values":[123]
    },
"order":"",
"group":"",
"limit":"100",
"select":[] 
}

Default: No
Optional: Yes. If no limit is specified, It will load all records. If it is a huge experiment, we might end up in a request taking longer.
Other examples:

"limit" : "100" -> limit to the first 100 records
"limit" : "100, 0" -> limit to the first 100 records (0 based counting, record 0 to record 99 will be returned)
"limit" : "100, 100" -> limit to 100 records but starting from position 100 ( record 100 to record 199 will be returned)
"limit" : "100, 200" -> limit to 100 records but starting from position 200 ( record 200 to record 299 will be returned)
Each output will correspond to a record. If there are 4 questions answered in an event, there will be four records returned.

Select

When you need to retrieve all the information on an event, then choose the value '*' for the Select request component. If we need specific information on an event, then choose the list of keywords separated by comma "," as value for the Select request component. To determine how to represent the filtering conditions refer to Query -> Criteria and Query -> Values

Let us try to filter the data with respect to one particular experiment, whose unique id is 123. Let us get only the participant email and repsonse time

Identify the key word associated with participants email and response time by referring to Data Format Since we need email and response time of each participant, keyword we need in Select would be who, response_time
Identify if we need explicit projection or all values for an event. If specific columns, Here we need only participant email and response time, so the values we need to add for select are "who","response_time" If we need all data, value we need to add for select is *
Add it to the skeletonRequest So, now the request skeleton JSON looks like this

{
"query": 
    {
     "criteria": "experiment_id=?",
     "values":[123]
    },
"order":"",
"group":"",
"limit":"",
"select":["who","response_time"] 
}

Default: Yes. '' will be set, when no value is set in request
Optional: Yes.
Other Examples: "select":[""] "select":["who","response_time","who"]

Mapping of DataQuery Request and SQL Select Query

Feature	Data Query Request Component	SQL Select Query
Filter	Query->Criteria	where clause or the search conditions
Filter	Query->Values	where clause values of the search conditions
Order	order	order by clause
Limit	limit	limit clause
Group	group	group by clause
Projection	select	Select list

Data Format

Description	key word	Data Type
The unique id we store for each of the events	id	number
The unique id used to collect this data	experiment_id	number
participant email (anonymized)	who	string
The date and time when the event reaches the paco server (in UTC zone)	when	string
Client used to record the data	app_id	string
Version of the paco client app	paco_version	string
Name of the experiment	experiment_name	string
Name of the experiment group	group_name	string
The date and time when the client recorded this data	response_time	string
The version of the experiment that has been used to record this event	experiment_version	string
The timezone of the client used to record this data (Responses from the web form may be inaccurate)	client_timezone	string
joined status of a participant in the experiment	joined	number
Variable name in the experiment	text	string
The answer from participant	answer	string

## Examples ### Custom Responses

SNo	Example Scenario	Request	Response
1	Get first thousand participant emails	`{"select":["who"], "query": {"criteria": "experiment_id=? ","values":[6013856]}, "group":"who","limit":"1000,0"}`	`{ "who": "[email protected]", "who": "[email protected]" }`
2	Get latest response time	`{"select":["response_time"], "query": {"criteria": "experiment_id=? ","values":[6013856]},"order": "response_time desc","limit":"1"}`	`{ "response_time": "2017-09-07 09:31:25.0" }`
3	Get earliest response time	`{"select":["response_time"], "query": {"criteria": "experiment_id=? ","values":[6013856]},"order": "response_time asc","limit":"1"}`	`{ "response_time": "2017-08-07 09:31:25.0" }`
4	Get joined events and time for all participants	`{"select":["who","response_time"], "query": {"criteria": "experiment_id=? and joined=? ","values":[6013856,1]},"order": "response_time desc"}`	`{ "response_time": "2017-09-07 14:20:06.0", "who": "[email protected]" }`
5	Get all responses for 1 question for all participants	`{"select":["who","answer"], "query": {"criteria": "experiment_id=? and text=? ","values":[6013856,"inputname"]},"order": "response_time asc","limit":"1000"}`	`{ "answer": "a", "who": "[email protected]" },{ "answer": "1", "who": "[email protected]" },{ "answer": "7", "who": "[email protected]" }`
6	Get all distinct responses for 1 question along with the number of times that response occurred	`{"select":["answer", count(*)], "query": {"criteria": "experiment_id=? and text=? ","values":[6013856,"inputname"]},"group":"answer","limit":"1000"}`	`{ "answer": "2", "count()": 1 },{ "answer": "8", "count()": 1 },{ "answer": "aa2", "count(*)": 1 }`
7	Get all distinct questions	`{"select":["text"], "query": {"criteria": "experiment_id=?","values":[6013856]},"group":"text","limit":"1000"}` (or) `{"select":["distinct text"], "query": {"criteria": "experiment_id=?","values":[6013856]},"limit":"1000"}`	`{ "text": "input1" },{ "text": "input2" },{ "text": "input3" },{ "text": "input4" }`
8	Get all probable answers for all question for all users	`{"select":["text", "answer"], "query": {"criteria": "experiment_id=?","values":[6013856]},"group":"text,answer","limit":"1000"}`	`{ "answer": "1", "text": "input1" },{ "answer": "7", "text": "input1" },{ "answer": "a", "text": "input1" }`
9	Get number of responses for a specific question and specific answer and is on or after a particular date and time	`{"select":["count(*)","text","answer"], "query": {"criteria": "experiment_id=? and text=? and answer=? and response_time>?","values":[562949920,"input1","a","2017/03/29 06:42:44"]},"group":"text,answer"}`	`{ "answer": "a", "count(*)": 2, "text": "input1" }`
10	Get number of total events so far	`{"select":["count(*)"], "query": {"criteria": "experiment_id=?","values":[6013856]},"limit":"1000"}`	`{ "count(*)": 11 }`
11	Get total number of outputs so far	`{"select":["count(*)"], "query": {"criteria": "experiment_id=?","values":[6013856]},"order":"answer","limit":"1000"}`	`{ "count(*)": 34 }`
12	Get the users and the number of events each of them have responded on a particular day - 2017/08/25	`{"select":["who","count(*)"], "query": {"criteria": "experiment_id=? and response_time>? and response_time","values":[5629499534213120,"2017/08/25 00:00:00","2017/08/26 00:00:00"]},"group":"who"}`	`{ "count(*)": 2, "who": "[email protected]" }`

Data Query Tutorial

Request JSON:

Response JSON:

Paco Event JSON Format: This format gives most of the information that has been captured for each event.

Explicit Projection JSON Format: This format helps the researcher decide the format in which they would want the response.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally