Performs keyword search over RDF data by applying classic IR techniques to triple-based documents indexed in Elasticsearch (ES).
This project exposes a REST API for querying the indexes created here. The system's response contains a ranked list of both a) triples and b) entities.
Requires Java 8 (or later) and a running Elasticsearch (elasticsearch-6.8) instance.
Build and package the project with `mvn package`. The generated .war file inside the Elas4RDF-search/target
folder can be used to deploy the service on any server (e.g. apache-tomcat).
When deployed, the application expects an application.properties file with options describing the running ES instance: elastic.address=[string] and elastic.port=[int].
If the file is not found, the options default to localhost:9200.
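The fallback described above can be illustrated with a short sketch. This is not the application's own code (the service itself is Java); `parse_properties` and `load_elastic_options` are hypothetical helpers that read the two documented keys from properties-style text. Falling back per key when only one of the two is present is an assumption; the text above only states the default for a missing file.

```python
from pathlib import Path

# Documented defaults: localhost:9200
DEFAULTS = {"elastic.address": "localhost", "elastic.port": "9200"}


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            options[key.strip()] = value.strip()
    return options


def load_elastic_options(path="application.properties"):
    """Return (address, port), using the defaults if the file is absent.

    Assumption: a key missing from an existing file also falls back
    to its default value.
    """
    file = Path(path)
    found = parse_properties(file.read_text()) if file.is_file() else {}
    merged = {**DEFAULTS, **found}
    return merged["elastic.address"], int(merged["elastic.port"])
```

A file containing the two lines `elastic.address=localhost` and `elastic.port=9200` would therefore be equivalent to having no file at all.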
The application can serve different indexes (collections) by keeping a non-persistent state for each given configuration.
Initialize an index with a POST /datasets request whose body is a .json file with the following
syntax:
{
  "id": <dataset_identifier>,
  "index.name": <ES_index_name>,
  "index.fields": {
    <field_1> : <field_boost_1>,
    <field_2> : <field_boost_2>,
    ...
  }
}
On success, the response contains a JSON confirmation message.
Note that this .json file should be created automatically once the indexing process from here completes. An example
can be found in the src/resources/examples/ folder.
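As a rough client-side sketch, the dataset registration request could be assembled as follows. Several details are assumptions rather than documented behaviour: the deployment URL `http://localhost:8080/elas4rdf_rest`, the index name `example_index`, numeric boost values, and the helper name `build_dataset_request`. The request is only built here, not sent.

```python
import json
import urllib.request


def build_dataset_request(base_url, dataset_id, index_name, field_boosts):
    """Build (but do not send) a POST /datasets request."""
    payload = {
        "id": dataset_id,
        "index.name": index_name,
        "index.fields": field_boosts,  # field name -> boost
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/datasets",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_dataset_request(
    "http://localhost:8080/elas4rdf_rest",  # hypothetical deployment URL
    "dataset_id",
    "example_index",                        # hypothetical ES index name
    {"subjectKeywords": 1, "objectKeywords": 2},  # boosts assumed numeric
)

# To send it against a running service:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The `id` chosen here is the value later passed as the `id` parameter of high-level queries.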
Queries are issued with the GET method, and requests accept either a high-level or a low-level syntax.
- High-Level syntax -> GET /high-level
  required: id=[string], query=[string]
  optional: size=[int], offset=[int], type=[string]
  `id` is the dataset identifier registered through POST /datasets and `query` can include any free-text keywords. `size` corresponds to the number of returned triples, `offset` supports pagination (only for triples) and `type` corresponds to the answer return type: "triples", "entities" or "both" (default).
- Low-Level syntax -> GET /low-level
  required: body=[json], index=[string]
  optional: size=[int], type=[string]
  `body` is used for expressing more complicated queries through the ES Query DSL. The `index` parameter corresponds to the ES index name.
  e.g. a multi-match type query:
  body = {
    "query": {
      "multi_match": {
        "query": "beatles abbey road",
        "fields": ["subjectKeywords", "objectKeywords^2", "rdfs_comment_sub"],
        "type": "cross_fields"
      }
    }
  }
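A low-level request URL for the multi-match query above might be built like this. The sketch assumes that `body` travels as a URL-encoded query parameter (as the parameter list suggests), that the service is deployed under `http://localhost:8080/elas4rdf_rest`, and that the index is called `example_index`; `build_low_level_url` is a hypothetical helper, not part of the project.

```python
import json
from urllib.parse import urlencode

# The multi-match query from the example above, as a Python dict.
multi_match_query = {
    "query": {
        "multi_match": {
            "query": "beatles abbey road",
            "fields": ["subjectKeywords", "objectKeywords^2", "rdfs_comment_sub"],
            "type": "cross_fields",
        }
    }
}


def build_low_level_url(base_url, index, body, size=None, result_type=None):
    """Return a GET /low-level URL with the ES Query DSL body URL-encoded."""
    params = {"index": index, "body": json.dumps(body)}
    if size is not None:
        params["size"] = size
    if result_type is not None:
        params["type"] = result_type
    return base_url.rstrip("/") + "/low-level/?" + urlencode(params)


low_level_url = build_low_level_url(
    "http://localhost:8080/elas4rdf_rest",  # hypothetical deployment URL
    "example_index",                        # hypothetical ES index name
    multi_match_query,
    size=10,
)
```

Encoding the JSON through `urlencode` avoids hand-escaping the braces, quotes and `^` boost marker.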
Example: using curl, a high-level request can be expressed as:
`curl --header "Content-Type: application/json" --request GET '<host>:<port>/elas4rdf_rest/high-level/?id=dataset_id&query=the%20beatles'`
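The same high-level request can be assembled programmatically. The sketch below only builds the URL; `build_high_level_url` is a hypothetical helper and `http://localhost:8080` stands in for `<host>:<port>`.

```python
from urllib.parse import quote, urlencode


def build_high_level_url(base_url, dataset_id, query,
                         size=None, offset=None, result_type=None):
    """Return a GET /high-level URL; optional parameters are added only if set."""
    params = {"id": dataset_id, "query": query}
    if size is not None:
        params["size"] = size
    if offset is not None:
        params["offset"] = offset
    if result_type is not None:
        params["type"] = result_type  # "triples", "entities" or "both"
    # quote_via=quote encodes spaces as %20, matching the curl example
    return base_url.rstrip("/") + "/high-level/?" + urlencode(params, quote_via=quote)


url = build_high_level_url(
    "http://localhost:8080/elas4rdf_rest",  # hypothetical host and port
    "dataset_id",
    "the beatles",
)
paged_url = build_high_level_url(
    "http://localhost:8080/elas4rdf_rest",
    "dataset_id",
    "the beatles",
    size=10, offset=20, result_type="triples",
)
```

Here `paged_url` additionally sets `size`, `offset` and `type` to request a later page of triples only.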