
Commit 73baf3d

Author: Steve Baskauf
Merge pull request #14 from HeardLibrary/vanderbot_v1-5 (Vanderbot v1.5)
2 parents 54bd94c + dd90b0a, commit 73baf3d

12 files changed: +601 / -234 lines

vanderbot/README.md

Lines changed: 8 additions & 2 deletions
@@ -32,7 +32,7 @@ Here are some queries that can be run to explore the data:
 
 [Number of clinical trials at Vanderbilt by principal investigator](https://w.wiki/XKK)
 
-The current release is [v1.4](https://github.com/HeardLibrary/linked-data/releases/tag/v1.4).
+The current release is [v1.5](https://github.com/HeardLibrary/linked-data/releases/tag/v1.5).
 
 ## How it works

@@ -147,5 +147,11 @@ The changes made in this release were made following tests that used the `csv-me
 
 The first five scripts were not changed in this release.
 
+## Release v1.5 (2020-09-08)
+
+The major change to the code was to increase the number of table columns per date from one to three. Previously there was a single column for the date string, which did not allow for varying date precision. There is now an additional column for the Wikibase date precision number (e.g. 9 for year precision, 11 for precision to the day). The third column is for a date value node identifier, which can be either the actual node identifier from Wikidata (a hash of unknown origin) or a random UUID generated by one of the scripts in this suite. It identifies the node to which both the date value and the date precision are attached, effectively serving as a blank node. In the future, it may be replaced with the actual date node identifier.
+
+The other addition is a JavaScript script written by Jessie Baskauf that drives [this form](https://heardlibrary.github.io/digital-scholarship/script/wikidata/wikidata-csv2rdf-metadata.html), which can be used to generate a `csv-metadata.json` mapping schema. With such a mapping schema, any CSV file can be used as the source data for the **vb6_upload_wikidata.py** upload script.
+
 ----
-Revised 2020-08-28
+Revised 2020-09-08
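
For concreteness, here is a minimal sketch (not part of the commit) of how one of the suite's scripts might populate the three date columns described above. The column names come from the `csv-metadata.json` mapping shown below; the file name and cell values are hypothetical.

```python
# Sketch of the three-columns-per-date model from the v1.5 release notes.
# Column names follow csv-metadata.json; file name and values are hypothetical.
import csv
import uuid

row = {
    # Random UUID standing in for the date value node identifier
    # (acts like a blank node until the real Wikidata hash is known).
    'orcidReferenceValue_nodeId': str(uuid.uuid4()),
    # The date string itself.
    'orcidReferenceValue_val': '2020-09-08T00:00:00Z',
    # Wikibase precision number: 9 = year, 11 = precision to the day.
    'orcidReferenceValue_prec': 11
}

with open('dates_example.csv', 'w', newline='') as file_object:
    writer = csv.DictWriter(file_object, fieldnames=row.keys())
    writer.writeheader()
    writer.writerow(row)
```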

vanderbot/csv-metadata.json

Lines changed: 57 additions & 12 deletions
@@ -71,11 +71,26 @@
   "valueUrl": "http://www.wikidata.org/reference/{orcidReferenceHash}"
 },
 {
-  "titles": "orcidReferenceValue",
-  "name": "orcidReferenceValue",
-  "datatype": "dateTime",
+  "titles": "orcidReferenceValue_nodeId",
+  "name": "orcidReferenceValue_nodeId",
+  "datatype": "string",
   "aboutUrl": "http://www.wikidata.org/reference/{orcidReferenceHash}",
-  "propertyUrl": "http://www.wikidata.org/prop/reference/P813"
+  "propertyUrl": "http://www.wikidata.org/prop/reference/value/P813",
+  "valueUrl": "http://example.com/.well-known/genid/{orcidReferenceValue_nodeId}"
+},
+{
+  "titles": "orcidReferenceValue_val",
+  "name": "orcidReferenceValue_val",
+  "datatype": "dateTime",
+  "aboutUrl": "http://example.com/.well-known/genid/{orcidReferenceValue_nodeId}",
+  "propertyUrl": "http://wikiba.se/ontology#timeValue"
+},
+{
+  "titles": "orcidReferenceValue_prec",
+  "name": "orcidReferenceValue_prec",
+  "datatype": "integer",
+  "aboutUrl": "http://example.com/.well-known/genid/{orcidReferenceValue_nodeId}",
+  "propertyUrl": "http://wikiba.se/ontology#timePrecision"
 },
 {
   "titles": "employerStatementUuid",
@@ -110,11 +125,26 @@
   "valueUrl": "{+employerReferenceSourceUrl}"
 },
 {
-  "titles": "employerReferenceRetrieved",
-  "name": "employerReferenceRetrieved",
-  "datatype": "dateTime",
+  "titles": "employerReferenceRetrieved_nodeId",
+  "name": "employerReferenceRetrieved_nodeId",
+  "datatype": "string",
   "aboutUrl": "http://www.wikidata.org/reference/{employerReferenceHash}",
-  "propertyUrl": "http://www.wikidata.org/prop/reference/P813"
+  "propertyUrl": "http://www.wikidata.org/prop/reference/value/P813",
+  "valueUrl": "http://example.com/.well-known/genid/{employerReferenceRetrieved_nodeId}"
+},
+{
+  "titles": "employerReferenceRetrieved_val",
+  "name": "employerReferenceRetrieved_val",
+  "datatype": "dateTime",
+  "aboutUrl": "http://example.com/.well-known/genid/{employerReferenceRetrieved_nodeId}",
+  "propertyUrl": "http://wikiba.se/ontology#timeValue"
+},
+{
+  "titles": "employerReferenceRetrieved_prec",
+  "name": "employerReferenceRetrieved_prec",
+  "datatype": "integer",
+  "aboutUrl": "http://example.com/.well-known/genid/{employerReferenceRetrieved_nodeId}",
+  "propertyUrl": "http://wikiba.se/ontology#timePrecision"
 },
 {
   "titles": "affiliationStatementUuid",
@@ -149,11 +179,26 @@
   "valueUrl": "{+affiliationReferenceSourceUrl}"
 },
 {
-  "titles": "affiliationReferenceRetrieved",
-  "name": "affiliationReferenceRetrieved",
-  "datatype": "dateTime",
+  "titles": "affiliationReferenceRetrieved_nodeId",
+  "name": "affiliationReferenceRetrieved_nodeId",
+  "datatype": "string",
   "aboutUrl": "http://www.wikidata.org/reference/{affiliationReferenceHash}",
-  "propertyUrl": "http://www.wikidata.org/prop/reference/P813"
+  "propertyUrl": "http://www.wikidata.org/prop/reference/value/P813",
+  "valueUrl": "http://example.com/.well-known/genid/{affiliationReferenceRetrieved_nodeId}"
+},
+{
+  "titles": "affiliationReferenceRetrieved_val",
+  "name": "affiliationReferenceRetrieved_val",
+  "datatype": "dateTime",
+  "aboutUrl": "http://example.com/.well-known/genid/{affiliationReferenceRetrieved_nodeId}",
+  "propertyUrl": "http://wikiba.se/ontology#timeValue"
+},
+{
+  "titles": "affiliationReferenceRetrieved_prec",
+  "name": "affiliationReferenceRetrieved_prec",
+  "datatype": "integer",
+  "aboutUrl": "http://example.com/.well-known/genid/{affiliationReferenceRetrieved_nodeId}",
+  "propertyUrl": "http://wikiba.se/ontology#timePrecision"
 },
 {
   "titles": "instanceOfUuid",

vanderbot/generate_direct_props.ipynb

Lines changed: 61 additions & 67 deletions
@@ -2,7 +2,7 @@
 "cells": [
 {
 "cell_type": "code",
-"execution_count": 2,
+"execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -24,7 +24,7 @@
 "# Configuration data\n",
 "# ---------------\n",
 "\n",
-"graph_name = 'https://github.com/HeardLibrary/linked-data/blob/29e5d02aaf00cb890792d7dee73707603a506b3e/json_schema/bluffton_presidents.csv'\n",
+"graph_name = 'https://raw.githubusercontent.com/HeardLibrary/linked-data/54bd94c609e9c5af6c558cd926939ded67cba2ae/json_schema/bluffton_presidents.csv'\n",
 "accept_media_type = 'text/turtle'\n",
 "sparql_endpoint = \"https://sparql.vanderbilt.edu/sparql\"\n",
 "request_header_dictionary = {\n",
@@ -59,7 +59,7 @@
 "        exit()\n",
 "    return(cred)\n",
 "\n",
-"def retrieve_direct_statements(sparql_endpoint):\n",
+"def retrieve_direct_statements(sparql_endpoint, graph_name):\n",
 "    query = '''\n",
 "construct {?item ?directProp ?value.}\n",
 "from <''' + graph_name + '''>\n",
@@ -76,90 +76,84 @@
 "    r = requests.get(sparql_endpoint, params={'query' : query}, headers=request_header_dictionary)\n",
 "    return r.text\n",
 "\n",
+"def retrieve_time_statements(sparql_endpoint, graph_name, subject_type):\n",
+"    # Happily, each subject type (\"statement\", \"reference\", and \"qualifier\") contains 9 characters,\n",
+"    # so the string extraction is the same for all.\n",
+"    query = '''\n",
+"prefix wikibase: <http://wikiba.se/ontology#>\n",
+"construct {?subject ?directProp ?timeValue.}\n",
+"from <''' + graph_name + '''>\n",
+"where {\n",
+"  ?subject ?valueProperty ?value.\n",
+"  ?value wikibase:timeValue ?timeValue.\n",
+"  filter(substr(str(?valueProperty),1,45)=\"http://www.wikidata.org/prop/''' + subject_type + '''/value/\")\n",
+"  bind(substr(str(?valueProperty),46) as ?id)\n",
+"  bind(iri(concat(\"http://www.wikidata.org/prop/''' + subject_type + '''/\", ?id)) as ?directProp)\n",
+"  }\n",
+"'''\n",
+"    results = []\n",
+"    r = requests.get(sparql_endpoint, params={'query' : query}, headers=request_header_dictionary)\n",
+"    return r.text\n",
+"\n",
 "def perform_sparql_update(sparql_endpoint, pwd, update_command):\n",
 "    # SPARQL Update requires HTTP POST\n",
 "    hdr = {'Content-Type' : 'application/sparql-update'}\n",
 "    r = requests.post(sparql_endpoint, auth=('admin', pwd), headers=hdr, data = update_command)\n",
 "    print(str(r.status_code) + ' ' + r.url)\n",
-"    print(r.text)\n"
+"    print(r.text)\n",
+"\n",
+"def prep_and_update(sparql_endpoint, pwd, graph_name, graph_text):\n",
+"    # remove prefixes from response Turtle, which are not necessary since IRIs are unabbreviated\n",
+"    graph_text_list = graph_text.split('\\n')\n",
+"    # print(graph_text_list)\n",
+"    graph_text = ''\n",
+"    for line in graph_text_list:\n",
+"        try:\n",
+"            if line[0] != '@':\n",
+"                graph_text += line + '\\n'\n",
+"        except:\n",
+"            pass\n",
+"    #print()\n",
+"    #print(graph_text)\n",
+"\n",
+"    if len(graph_text) != 0: # don't perform an update if there aren't any triples to add\n",
+"        # Send SPARQL 1.1 UPDATE to endpoint to add the constructed triples into the graph\n",
+"        update_command = '''INSERT DATA\n",
+"        { GRAPH <''' + graph_name + '''> { \n",
+"        ''' + graph_text + '''\n",
+"        }}'''\n",
+"\n",
+"        #print(update_command)\n",
+"        perform_sparql_update(sparql_endpoint, pwd, update_command)\n",
+"    else:\n",
+"        print('no triples to write')"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": 3,
+"execution_count": null,
 "metadata": {},
-"outputs": [
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-"constructed triples retrieved\n"
-]
-}
-],
+"outputs": [],
 "source": [
 "# ---------------\n",
 "# Construct the direct property statements entailed by the Wikibase model and retrieve from endpoint \n",
 "# ---------------\n",
 "pwd = load_credential(filename, directory)\n",
 "\n",
-"graph_text = retrieve_direct_statements(sparql_endpoint)\n",
+"graph_text = retrieve_direct_statements(sparql_endpoint, graph_name)\n",
 "#print(graph_text)\n",
-"print('constructed triples retrieved')"
-]
-},
-{
-"cell_type": "code",
-"execution_count": 4,
-"metadata": {},
-"outputs": [],
-"source": [
-"# remove prefixes from response Turtle, which are not necessary since IRIs are unabbreviated\n",
-"graph_text_list = graph_text.split('\\n')\n",
-"# print(graph_text_list)\n",
-"graph_text = ''\n",
-"for line in graph_text_list:\n",
-"    try:\n",
-"        if line[0] != '@':\n",
-"            graph_text += line + '\\n'\n",
-"    except:\n",
-"        pass\n",
-"#print()\n",
-"#print(graph_text)"
-]
-},
-{
-"cell_type": "code",
-"execution_count": 5,
-"metadata": {},
-"outputs": [
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-"200 https://sparql.vanderbilt.edu/sparql\n",
-"<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\"><html><head><meta http-equiv=\"Content-Type\" content=\"text&#47;html;charset=UTF-8\"><title>blazegraph&trade; by SYSTAP</title\n",
-"></head\n",
-"><body<p>totalElapsed=0ms, elapsed=0ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p\n",
-"><hr><p>COMMIT: totalElapsed=251ms, commitTime=1598157003429, mutationCount=40</p\n",
-"></html\n",
-">\n",
-"\n",
-"done\n"
-]
-}
-],
-"source": [
-"# Send SPARQL 1.1 UPDATE to endpoint to add the constructed triples into the graph\n",
+"print('constructed direct triples retrieved')\n",
 "\n",
-"update_command = '''INSERT DATA\n",
-"{ GRAPH <''' + graph_name + '''> { \n",
-"''' + graph_text + '''\n",
-"}}'''\n",
+"prep_and_update(sparql_endpoint, pwd, graph_name, graph_text)\n",
+"print()\n",
 "\n",
-"#print(update_command)\n",
+"for subject_type in ['statement', 'reference', 'qualifier']:\n",
+"    graph_text = retrieve_time_statements(sparql_endpoint, graph_name, subject_type)\n",
+"    #print(graph_text)\n",
+"    print('constructed direct ' + subject_type + ' time triples retrieved')\n",
 "\n",
-"perform_sparql_update(sparql_endpoint, pwd, update_command)\n",
+"    prep_and_update(sparql_endpoint, pwd, graph_name, graph_text)\n",
+"    print()\n",
 "\n",
 "print()\n",
 "print('done')"
