sib-swiss/sparql-guidelines

📖 Guidelines for SPARQL endpoints metadata

Add precomputed metadata to your SPARQL endpoint to make it easier to query by humans and machines:

  • SPARQL examples give a good idea of the capabilities and known use-cases of your endpoint, as well as recommended query patterns. Most public endpoints already provide example queries; we propose exposing them directly in the SPARQL endpoint in a standard format.
  • A lightweight classes schema provides an overview of all classes actually present in the endpoint and the predicates they use.

Caution

Most endpoints are too large to compute the classes schema on-the-fly, and doing so repeatedly would be computationally, and ecologically, very expensive, so it needs to be precomputed.

🧑‍🍳 How?

Precompute the metadata as RDF using command line tools, and upload it to the endpoint, either:

  • Directly in the endpoint, usually in a named graph ending with /.well-known/sparql-examples
  • Or in the endpoint's service description

Important

Putting the metadata in the endpoint has many advantages:

  • The endpoint URL is all a user needs: any user or system can retrieve the metadata directly from the endpoint using its URL, with no need to know about and query an external service
  • Cheap and efficient stack, no need to deploy and maintain an extra service and/or database
  • Can be queried from any client using a standard HTTP request, no need for ad hoc packages
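
To illustrate the last point, here is a minimal Python sketch that talks to an endpoint using only the standard library, following the SPARQL 1.1 Protocol (a `query` parameter sent over HTTP GET). The function names and the endpoint URL in the usage comment are illustrative assumptions, not part of these guidelines.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_request(endpoint_url: str, query: str) -> Request:
    """Build a SPARQL Protocol GET request that asks for JSON results.

    Assumes endpoint_url does not already contain a query string.
    """
    url = f"{endpoint_url}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(endpoint_url: str, query: str) -> dict:
    """Send the query and decode the JSON results (requires network access)."""
    with urlopen(build_request(endpoint_url, query), timeout=60) as response:
        return json.load(response)


# Hypothetical usage, assuming your endpoint lives at this URL:
# results = run_query("https://YOUR_SPARQL_ENDPOINT/sparql", "ASK { ?s ?p ?o }")
```

Any HTTP client in any language can do the same, which is the point of keeping the metadata inside the endpoint.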

The following ontologies are used to define the metadata:

  • SHACL ontology to represent SPARQL query examples
  • VoID ontology to represent the classes schema
  • VoID-ext, an extension to enable describing data properties in VoID (predicates that point to a value, like a string or int, instead of another node IRI)

💻 Apps using this framework

Examples of systems relying on this metadata framework:

🚀 Setup

Download the latest release of the 2 jar files used to compile metadata:

curl -s https://api.github.com/repos/sib-swiss/sparql-examples-utils/releases/latest \
  | grep "browser_download_url" \
  | grep "uber.jar" \
  | cut -d '"' -f 4 \
  | xargs curl -LO
curl -s https://api.github.com/repos/sib-swiss/void-generator/releases/latest \
  | grep "browser_download_url" \
  | grep "uber.jar" \
  | cut -d '"' -f 4 \
  | xargs curl -LO
mkdir -p data
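
The shell pipeline above picks the download URL by grepping the raw API response. As a hedged alternative, the same selection can be sketched in Python by reading the `assets` list that the GitHub releases API returns as JSON. The helper name `pick_asset_url` and the sample release below are hypothetical and only illustrate the idea.

```python
def pick_asset_url(release: dict, fragment: str = "uber.jar") -> str:
    """Return the first asset download URL that contains the given fragment."""
    for asset in release.get("assets", []):
        url = asset.get("browser_download_url", "")
        if fragment in url:
            return url
    raise LookupError(f"no release asset matching {fragment!r}")


# Hand-written sample shaped like a GitHub "latest release" response (not real data)
sample_release = {
    "assets": [
        {"browser_download_url": "https://example.org/releases/tool-sources.jar"},
        {"browser_download_url": "https://example.org/releases/tool-uber.jar"},
    ]
}

print(pick_asset_url(sample_release))
```

Parsing the JSON avoids depending on how the API response happens to be line-wrapped, at the cost of needing a Python interpreter.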

📑 Document examples

Document query examples for your SPARQL endpoint in RDF Turtle files, and use the sparql-examples-utils.jar CLI to validate and compile all files.

  1. Fork this repository, or copy its examples/ folder into your project
  2. Edit the content of the examples/ folder:
    1. Each subfolder corresponds to a different endpoint
    2. Record prefixes in the prefixes.ttl files
    3. Prefixes in examples/prefixes.ttl are common to all endpoints, while the prefixes.ttl file in each subfolder is specific to the corresponding endpoint
    4. Copy an existing example and adapt it to get started

Compile all query files for one endpoint into an RDF Turtle file that includes the prefix declarations (stored in the data/ folder):

java -jar sparql-examples-utils-*.jar convert -i examples/ -p UniProt -f ttl > data/examples.ttl

Or compile all endpoints as JSON-LD to the standard output:

java -jar sparql-examples-utils-*.jar convert -i examples/ -p all -f jsonld

Tip

See the sib-swiss/sparql-examples repository for more details.

🧮 Compute classes schema

Run the void-generator for your endpoint:

java -jar void-generator-*.jar -r https://sparql.wikipathways.org/sparql \
   -p https://sparql.wikipathways.org/sparql \
   --void-file data/void-wikipathway.ttl \
   --iri-of-void 'https://rdf.wikipathway.org/.well-known/void#' \
   -g http://rdf.wikipathways.org/

Depending on the size and structure of your endpoint, generating the classes schema may take a long time (up to hours). You can optimize the process using options such as:

  • --filter-expression-to-exclude-classes-from-void: exclude non-meaningful classes (e.g. very sparse ontology classes). The expression must refer to the class through the ?clazz variable.

    • Example excluding CHEBI classes:

      --filter-expression-to-exclude-classes-from-void "!STRSTARTS(STR(?clazz), 'http://purl.obolibrary.org/obo/CHEBI_')"
  • --optimize-for: select triplestore query optimizations (Virtuoso, QLever, or default SPARQL)

  • --count-distinct-subjects false: skip counting distinct subjects

  • --count-distinct-objects false: skip counting distinct objects

Example command optimized for a local Virtuoso endpoint using its JDBC connector:

# Note: the JDBC URL points to localhost and to Virtuoso's "isql-t" port (1111 here)
java -jar void-generator-*.jar \
    --user dba \
    --password dba \
    --virtuoso-jdbc=jdbc:virtuoso://localhost:1111/charset=UTF-8 \
    -r "https://YOUR_SPARQL_ENDPOINT/sparql" \
    -s data/void-file-locally-stored.ttl \
    -i "https://YOUR_SPARQL_ENDPOINT/.well-known/void"

Tip

See the sib-swiss/void-generator repository for more details.

📤 Upload the metadata to your endpoint

Once the RDF files are generated, upload them either to your endpoint or its service description.

Uploading to the endpoint is simpler: just load the compiled RDF into a named graph. However, this mixes metadata with data, and some administrators might prefer a cleaner separation.
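
One possible way to load a compiled file into a named graph is the SPARQL 1.1 Graph Store HTTP Protocol, sketched below in Python. This is an assumption about your setup: not every triplestore enables this protocol, the graph store path differs between products, and write access usually requires authentication. The function name and the URLs in the usage comment are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_graph_upload(graph_store_url: str, graph_iri: str, turtle: bytes) -> Request:
    """Build a Graph Store Protocol PUT that replaces graph_iri with the given Turtle."""
    url = f"{graph_store_url}?{urlencode({'graph': graph_iri})}"
    return Request(
        url,
        data=turtle,
        method="PUT",  # PUT replaces the graph content; POST would merge into it
        headers={"Content-Type": "text/turtle"},
    )


# Hypothetical usage (the graph store path is an assumption, check your triplestore):
# from pathlib import Path
# from urllib.request import urlopen
# request = build_graph_upload(
#     "https://YOUR_SPARQL_ENDPOINT/rdf-graph-store",
#     "https://YOUR_SPARQL_ENDPOINT/.well-known/sparql-examples",
#     Path("data/examples.ttl").read_bytes(),
# )
# urlopen(request)
```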

Exposing metadata through the endpoint service description is a good alternative (as this aligns with its original intended purpose), but most triplestores do not allow editing it directly. In that case, you could precompute the service description RDF, and serve it via a proxy rule when the endpoint is queried without a query parameter.
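
The proxy rule described above boils down to one decision: does the incoming request carry a query? A minimal Python sketch of that check follows. The function name is hypothetical, and in practice the rule would be written in your reverse proxy's own configuration language rather than in Python.

```python
from urllib.parse import parse_qs


def wants_service_description(method: str, query_string: str) -> bool:
    """Return True for a plain GET on the endpoint that has no `query` parameter."""
    if method.upper() != "GET":
        return False
    params = parse_qs(query_string, keep_blank_values=True)
    return "query" not in params


# A bare GET should receive the precomputed service description file,
# while anything carrying a query is forwarded to the triplestore.
print(wants_service_description("GET", ""))
print(wants_service_description("GET", "query=ASK%20%7B%7D"))
```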

Tip

If you are looking for a system to automate uploading to SPARQL endpoints, we recommend looking into kgsteward.

Retrieve the metadata

Example SPARQL queries to retrieve this metadata:

Retrieve query examples:

PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX spex: <https://purl.expasy.org/sparql-examples/ontology#>

SELECT DISTINCT ?sq ?comment ?query
WHERE {
    ?sq a sh:SPARQLExecutable ;
        rdfs:comment ?comment ;
        sh:select|sh:ask|sh:construct|spex:describe ?query .
} ORDER BY ?sq
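
When this query is sent with an `Accept: application/sparql-results+json` header, the response follows the W3C SPARQL JSON results format. The Python sketch below flattens such a response into one dictionary per example. The helper name and the sample payload are hand-written assumptions, not output from a real endpoint.

```python
def bindings_to_rows(results: dict) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    return [
        {var: term["value"] for var, term in binding.items()}
        for binding in results["results"]["bindings"]
    ]


# Hand-written sample in the SPARQL JSON results format (not real endpoint output)
sample = {
    "head": {"vars": ["sq", "comment", "query"]},
    "results": {"bindings": [{
        "sq": {"type": "uri", "value": "https://example.org/.well-known/sparql-examples/1"},
        "comment": {"type": "literal", "value": "Count all triples"},
        "query": {"type": "literal", "value": "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"},
    }]},
}

for row in bindings_to_rows(sample):
    print(f"{row['comment']}: {row['sq']}")
```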

Classes schema without subject/object counts:

PREFIX up: <http://purl.uniprot.org/core/>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX void-ext: <http://ldf.fi/void-ext#>

SELECT DISTINCT ?subjectClass ?prop ?objectClass ?objectDatatype
WHERE {
  {
    ?cp void:class ?subjectClass ;
        void:propertyPartition ?pp .
    ?pp void:property ?prop .
    OPTIONAL {
        {
            ?pp  void:classPartition [ void:class ?objectClass ] .
        	
        } UNION {
            ?pp void-ext:datatypePartition [ void-ext:datatype ?objectDatatype ] .
        }
    }
  } UNION {
    ?linkset void:subjectsTarget [ void:class ?subjectClass ] ;
      void:linkPredicate ?prop ;
      void:objectsTarget [ void:class ?objectClass ] .
  }
}
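
The rows returned by this query are flat (one row per class, predicate and target). A client will often want them grouped per class. The Python sketch below shows one way to do that, assuming the rows have already been turned into plain dictionaries keyed by the query's variable names. The function name and the sample rows are hypothetical.

```python
from collections import defaultdict


def group_schema(rows: list[dict[str, str]]) -> dict[str, dict[str, set[str]]]:
    """Map each subject class to its predicates and the classes or datatypes they reach."""
    schema: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for row in rows:
        targets = schema[row["subjectClass"]][row["prop"]]
        # A row carries an object class, a datatype, or neither (OPTIONAL did not match)
        target = row.get("objectClass") or row.get("objectDatatype")
        if target:
            targets.add(target)
    return schema


# Hand-written sample rows (not taken from a real endpoint)
sample_rows = [
    {"subjectClass": "ex:Protein", "prop": "ex:organism", "objectClass": "ex:Taxon"},
    {"subjectClass": "ex:Protein", "prop": "ex:mnemonic", "objectDatatype": "xsd:string"},
    {"subjectClass": "ex:Protein", "prop": "ex:sequence"},
]

for cls, props in group_schema(sample_rows).items():
    for prop, targets in props.items():
        print(cls, prop, sorted(targets))
```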

Classes schema with subject/object counts:

PREFIX up: <http://purl.uniprot.org/core/>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX void-ext: <http://ldf.fi/void-ext#>

SELECT DISTINCT ?subjectsCount ?subjectClass ?prop ?objectClass ?objectsCount ?objectDatatype
WHERE {
  {
    ?cp void:class ?subjectClass ;
        void:entities ?subjectsCount ;
        void:propertyPartition ?pp .
    ?pp void:property ?prop .
    OPTIONAL {
        {
            ?pp  void:classPartition [ void:class ?objectClass ; void:triples ?objectsCount ] .
        } UNION {
            ?pp void-ext:datatypePartition [ void-ext:datatype ?objectDatatype ] .
        }
    }
  } UNION {
    ?linkset void:subjectsTarget [ void:class ?subjectClass ; void:entities ?subjectsCount ] ;
      void:linkPredicate ?prop ;
      void:objectsTarget [ void:class ?objectClass ; void:entities ?objectsCount ] .
  }
}
