Add precomputed metadata to your SPARQL endpoint to make it easier for humans and machines to query:
- SPARQL examples give a good idea of the capabilities and known use cases of your endpoint, as well as recommended query patterns. Most public endpoints already provide example queries; we propose exposing them directly in the SPARQL endpoint in a standard format.
- A lightweight classes schema provides an overview of all classes actually present in the endpoint and the predicates they use.
About the choice of standard used to define the classes schema
- Ontologies are not schemas: an ontology describes possible concepts, while the schema here describes the actual content of the endpoint.
- ShEx/SHACL shapes are too detailed: their constraints cannot be inferred automatically by querying the endpoint, and, more importantly, they cannot communicate counts of classes and predicates (e.g. "there are 200 unique countries in this endpoint").
- We chose the Vocabulary of Interlinked Datasets (VoID) description, which is also recommended by the Health Care and Life Sciences (HCLS) community profile draft.
Caution
Most endpoints are too large to compute the classes schema on the fly, and doing so repeatedly would be computationally (and ecologically) very expensive, so it needs to be precomputed.
Precompute the metadata as RDF using command line tools, and upload it either:
- Directly in the endpoint, usually in a named graph ending with /.well-known/sparql-examples
- Or in the endpoint's service description
Important
Putting the metadata in the endpoint has many advantages:
- The endpoint URL is all a user needs: any user or system can retrieve the metadata directly from the endpoint using its URL, with no need to know about and query an external service
- Cheap and efficient stack: no extra service or database to deploy and maintain
- Can be queried from any client using a standard HTTP request, with no need for ad hoc packages
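For example, any HTTP client can fetch the query examples directly from an endpoint that exposes them. A minimal sketch with curl, assuming a placeholder endpoint URL (the full retrieval query is given at the end of this page):
# Sketch: retrieve query examples over the standard SPARQL protocol.
# YOUR_SPARQL_ENDPOINT is a placeholder.
curl -G "https://YOUR_SPARQL_ENDPOINT/sparql" \
  -H "Accept: application/sparql-results+json" \
  --data-urlencode 'query=PREFIX sh: <http://www.w3.org/ns/shacl#>
SELECT ?sq ?query WHERE { ?sq a sh:SPARQLExecutable ; sh:select ?query . }'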
The following ontologies are used to define the metadata:
- SHACL ontology to represent SPARQL query examples
- VoID ontology to represent the classes schema
- VoID-ext, an extension to enable describing data properties in VoID (predicates that point to a value, like a string or int, instead of another node IRI)
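To make this concrete, here is a minimal sketch of what the VoID classes schema could look like. The class, property, and count values are purely illustrative; the structure matches what the queries at the end of this page expect:
@prefix void: <http://rdfs.org/ns/void#> .
@prefix void-ext: <http://ldf.fi/void-ext#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix up: <http://purl.uniprot.org/core/> .
# Illustrative class partition: up:Protein instances, with one object
# property pointing to another class and one data property with a datatype.
[] void:class up:Protein ;
  void:entities 42000 ;
  void:propertyPartition [
    void:property up:organism ;
    void:classPartition [ void:class up:Taxon ]
  ] , [
    void:property rdfs:label ;
    void-ext:datatypePartition [ void-ext:datatype xsd:string ]
  ] .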
Examples of systems relying on this metadata framework:
- A chat app to help users write SPARQL queries for all SIB endpoints: expasy.org/chat
- A query editor with context-aware autocomplete: sib-swiss.github.io/sparql-editor
Download the latest release of the two JAR files used to compile the metadata:
curl -s https://api.github.com/repos/sib-swiss/sparql-examples-utils/releases/latest \
| grep "browser_download_url" \
| grep "uber.jar" \
| cut -d '"' -f 4 \
| xargs curl -LO
curl -s https://api.github.com/repos/sib-swiss/void-generator/releases/latest \
| grep "browser_download_url" \
| grep "uber.jar" \
| cut -d '"' -f 4 \
| xargs curl -LO
mkdir -p data
Document query examples for your SPARQL endpoint in RDF Turtle files, and use the sparql-examples-utils.jar CLI to validate and compile all files (a sketch of one example file follows the list below).
- Fork this repository, or copy its examples/ folder in your project
- Edit the content of the examples/ folder:
  - Each subfolder corresponds to a different endpoint
  - Record prefixes in the prefixes.ttl files
  - Prefixes in examples/prefixes.ttl are common to all endpoints, while the prefixes.ttl file in each subfolder is specific to the corresponding endpoint
  - Copy an existing example and adapt it to get started
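For orientation, a single example file might look roughly like this. This is a sketch based on the properties used by the retrieval query at the end of this page; the IRI, comment, and query text are illustrative, and the exact conventions are documented in the sib-swiss/sparql-examples repository:
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
# Illustrative query example; IRI and query text are placeholders.
<https://YOUR_SPARQL_ENDPOINT/.well-known/sparql-examples/1>
  a sh:SPARQLExecutable , sh:SPARQLSelectExecutable ;
  rdfs:comment "Count all triples in the endpoint"@en ;
  sh:select "SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o . }" .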
Compile all query files for one endpoint into an RDF Turtle file including prefix declarations (stored in the data/ folder):
java -jar sparql-examples-utils-*.jar convert -i examples/ -p UniProt -f ttl > data/examples.ttl
Or compile all endpoints as JSON-LD to the standard output:
java -jar sparql-examples-utils-*.jar convert -i examples/ -p all -f jsonld
Tip
See the sib-swiss/sparql-examples repository for more details.
Run the void-generator for your endpoint:
java -jar void-generator-*.jar -r https://sparql.wikipathways.org/sparql \
-p https://sparql.wikipathways.org/sparql \
--void-file data/void-wikipathway.ttl \
--iri-of-void 'https://rdf.wikipathway.org/.well-known/void#' \
-g http://rdf.wikipathways.org/
Depending on the size and structure of your endpoint, generating the classes schema may take a long time (up to hours). You can optimize the process using options such as the following (a combined sketch follows the list):
- --filter-expression-to-exclude-classes-from-void: exclude non-meaningful classes (e.g. very sparse ontology classes). The variable should be ?clazz. Example excluding CHEBI classes:
  --filter-expression-to-exclude-classes-from-void "!STRSTARTS(STR(?clazz), 'http://purl.obolibrary.org/obo/CHEBI_')"
- --optimize-for: select triplestore-specific query optimizations (Virtuoso, QLever, or default SPARQL)
- --count-distinct-subjects false: skip counting distinct subjects
- --count-distinct-objects false: skip counting distinct objects
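Putting a few of these options together, a hypothetical invocation could look like this. The endpoint URL, file paths, and the exact value accepted by --optimize-for are assumptions; check the void-generator documentation for the precise syntax:
# Hypothetical sketch combining optimization options; option values
# and URLs are placeholders, not a verified invocation.
java -jar void-generator-*.jar \
  -r "https://YOUR_SPARQL_ENDPOINT/sparql" \
  --void-file data/void.ttl \
  --iri-of-void 'https://YOUR_SPARQL_ENDPOINT/.well-known/void#' \
  --optimize-for QLever \
  --count-distinct-subjects false \
  --count-distinct-objects false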
Example command optimized for a local Virtuoso endpoint using its JDBC connector:
# note: connects to localhost on the Virtuoso "isql" port (1111)
java -jar void-generator-*.jar \
--user dba \
--password dba \
--virtuoso-jdbc=jdbc:virtuoso://localhost:1111/charset=UTF-8 \
-r "https://YOUR_SPARQL_ENDPOINT/sparql" \
-s data/void-file-locally-stored.ttl \
-i "https://YOUR_SPARQL_ENDPOINT/.well-known/void"Tip
See the sib-swiss/void-generator repository for more details.
Once the RDF files are generated, upload them either to your endpoint or its service description.
Uploading to the endpoint is simpler: just load the compiled RDF into a named graph. But this mixes metadata and data, and some administrators might prefer a cleaner separation.
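For instance, if your triplestore exposes the SPARQL Graph Store HTTP Protocol, the compiled file could be loaded with a single request. This is a sketch: the /store path, graph IRI, and credentials are assumptions to adapt to your store:
# Sketch: load compiled metadata into a named graph via the SPARQL Graph
# Store HTTP Protocol. Path, graph IRI, and credentials are assumptions.
curl -X PUT \
  -u "USERNAME:PASSWORD" \
  -H "Content-Type: text/turtle" \
  --data-binary @data/examples.ttl \
  "https://YOUR_SPARQL_ENDPOINT/store?graph=https%3A%2F%2FYOUR_SPARQL_ENDPOINT%2F.well-known%2Fsparql-examples"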
Exposing metadata through the endpoint service description is a good alternative (as this aligns with its original intended purpose), but most triplestores do not allow editing it directly. In that case, you can precompute the service description RDF and serve it via a proxy rule when the endpoint is queried without a query parameter.
Tip
If you are looking for a system to automate uploading to SPARQL endpoints, we recommend looking into kgsteward.
Example SPARQL queries to retrieve this metadata
Retrieve query examples:
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX spex: <https://purl.expasy.org/sparql-examples/ontology#>
SELECT DISTINCT ?sq ?comment ?query
WHERE {
?sq a sh:SPARQLExecutable ;
rdfs:comment ?comment ;
sh:select|sh:ask|sh:construct|spex:describe ?query .
} ORDER BY ?sq
Classes schema without subject/object counts:
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX void-ext: <http://ldf.fi/void-ext#>
SELECT DISTINCT ?subjectClass ?prop ?objectClass ?objectDatatype
WHERE {
{
?cp void:class ?subjectClass ;
void:propertyPartition ?pp .
?pp void:property ?prop .
OPTIONAL {
{
?pp void:classPartition [ void:class ?objectClass ] .
} UNION {
?pp void-ext:datatypePartition [ void-ext:datatype ?objectDatatype ] .
}
}
} UNION {
?linkset void:subjectsTarget [ void:class ?subjectClass ] ;
void:linkPredicate ?prop ;
void:objectsTarget [ void:class ?objectClass ] .
}
}
Classes schema with subject/object counts:
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX void-ext: <http://ldf.fi/void-ext#>
SELECT DISTINCT ?subjectsCount ?subjectClass ?prop ?objectClass ?objectsCount ?objectDatatype
WHERE {
{
?cp void:class ?subjectClass ;
void:entities ?subjectsCount ;
void:propertyPartition ?pp .
?pp void:property ?prop .
OPTIONAL {
{
?pp void:classPartition [ void:class ?objectClass ; void:triples ?objectsCount ] .
} UNION {
?pp void-ext:datatypePartition [ void-ext:datatype ?objectDatatype ] .
}
}
} UNION {
?linkset void:subjectsTarget [ void:class ?subjectClass ; void:entities ?subjectsCount ] ;
void:linkPredicate ?prop ;
void:objectsTarget [ void:class ?objectClass ; void:entities ?objectsCount ] .
}
}