Experimental BARTOC Search engine with indexing pipeline and discovery interface
This application extracts JSKOS data with metadata about terminologies from the BARTOC knowledge organization systems registry (managed in jskos-server), transforms and enriches the data, and loads it into a Solr search index. The index is then made available via a search API and a discovery interface.
- a URL to download database dumps with JSKOS concept schemes (by default https://bartoc.org/data/dumps/latest.ndjson)
- a Solr search server instance with a schema configured as expected by BARTOC Search
- a jskos-server instance with the /voc/changes API endpoint (by default the BARTOC instance available at https://bartoc.org/api) for live updates
- either Docker to run from a Docker image, or Node.js >= 18 and Redis to run from sources
The repository contains a docker-compose.yml to start the application, Solr, and Redis with one command:
cd docker
docker-compose up --build
- The search app is available at http://localhost:3883.
- Solr Admin UI is at http://localhost:8983.
- Redis runs in the background at port 6379.
Ports are hard-coded, so no other service may be running on these ports.
A Docker image of the application is published on every push to the branches main and dev, and when a git tag starting with v (e.g., v1.0.0) is pushed. Commits are ignored if they only modify documentation, GitHub workflows, config, or meta files.
See docker-compose.yml in the docker directory for usage.
Tip: For Docker and most local development, configuration is handled automatically in the config/ directory. The default setup works out of the box.
Note:
- Choose one approach:
  - If you use Docker (recommended), do not create a .env file in the project root; Docker handles all configuration for you.
  - If you use npm run dev (without Docker), you must create a .env file in the project root to define your local settings (e.g., database URLs, Solr, Redis). The config/config.default.json file is primarily for Docker and CI setups and should not be edited for local development.
Uncomment and adjust values as needed for your environment. If you are running services via Docker, keep these lines commented out or remove the .env file entirely.
Start Redis and Solr from Docker images:
docker compose -f docker/docker-compose-backends.yml create --remove-orphans
docker compose -f docker/docker-compose-backends.yml start
Create a local config/config.json to refer to these backend services:
{
"redis": {
"host": "localhost",
"pingTimeout": 10000,
"pingRetries": 5,
"pingRetryDelay": 1000,
"port": 6379
},
"solr": {
"batchSize": 500,
"coreName": "terminologies",
"host": "localhost",
"pingTimeout": 10000,
"pingRetries": 5,
"pingRetryDelay": 1000,
"port": 8983,
"version": 8.1
}
}
And a local .env file:
CONFIG_FILE=./config/config.json
__VITE_ADDITIONAL_SERVER_ALLOWED_HOSTS=.coli-conc.gbv.de
BASE_URL=http://localhost:3883/
Then start bartoc-search for development:
npm run dev
- Install prerequisites:
  - Node.js >= 18
  - jskos-server instance (local or remote)
  - Solr instance with configured schema
  - Redis
- Clone and install dependencies
- Configure environment:
  - Create a .env file in the project root to define your local settings for the Redis and Solr services and the WebSocket host (see below for an example).
# Redis configuration
REDIS_HOST=127.0.0.1
REDIS_PORT=6379
# Solr configuration
SOLR_HOST=127.0.0.1
SOLR_PORT=8983
# WebSocket configuration: access to the jskos-server changes API
# - If you are running the Jskos server in a Docker container, you can use the
# container name as the host, e.g., `ws://jskos-server:3000`
# - If you are running the Jskos server on your local machine, you can use
# `ws://localhost:3000` or `ws://127.0.0.
WS_HOST=ws://jskos-server:3000
Uncomment and adjust values as needed for your environment. If you are running services via Docker, keep these lines related to both Solr and Redis commented out.
- Start the app:
npm run dev
- The app will attempt to connect to all services and retry if any are temporarily unavailable.
- If Redis or Solr are not running, background jobs and search will be disabled, but the app will still start.
- Docker issues: Make sure Docker Desktop or the Docker daemon is running.
- Port conflicts: Stop any other services using ports 3883, 3000, 8983, 6379, or 27017.
- Service not available: The app will log warnings if Solr or Redis are unavailable, but will keep running for development convenience.
- Configuration: See the config/ directory and the comments in config.default.json for all options.
You can customize the application settings via a configuration file. By default, this file resides at config/config.json. The path can be changed with the CONFIG_FILE environment variable. It has to be either absolute (i.e. starting with /) or relative to the config/ folder (i.e. it defaults to ./config.json). If the file exists and contains invalid JSON data, the application will refuse to start.
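The path rule described above can be illustrated with a small sketch. The helper name `resolveConfigPath` and the `configDir` argument are hypothetical and not taken from the code base; POSIX paths are assumed.

```typescript
import * as path from "node:path";

// Hypothetical helper: resolves the configuration file path as described above.
// `configDir` stands for the absolute path of the config/ folder (an assumption).
function resolveConfigPath(configFile: string | undefined, configDir: string): string {
  // Fall back to the documented default when CONFIG_FILE is not set.
  const file = configFile ?? "./config.json";
  // Absolute paths are used as-is; anything else is relative to config/.
  return path.posix.isAbsolute(file) ? file : path.posix.resolve(configDir, file);
}
```

Called as `resolveConfigPath(process.env.CONFIG_FILE, "/app/config")`, this would yield the default file inside the config folder whenever the variable is unset.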
Currently, there are only two environment variables:
- NODE_ENV - either development (default) or production
- CONFIG_FILE - alternate path to a configuration file, relative to the config/ folder; defaults to ./config.json.
You can either provide the environment variables in the command that starts the server, or in a .env file in the root folder.
All missing keys are filled in with defaults from config/config.default.json.
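As an illustration of how missing keys might be filled from defaults, here is a minimal sketch. It assumes a recursive (deep) merge of nested objects, which the text above does not confirm; `withDefaults` is a hypothetical name.

```typescript
type Json = { [key: string]: unknown };

function isPlainObject(value: unknown): value is Json {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical helper: returns `user` settings with every missing key taken
// from `defaults`. Nested objects are merged recursively (an assumption).
function withDefaults(defaults: Json, user: Json): Json {
  const result: Json = { ...defaults };
  for (const [key, value] of Object.entries(user)) {
    const base = result[key];
    result[key] =
      isPlainObject(base) && isPlainObject(value) ? withDefaults(base, value) : value;
  }
  return result;
}
```

With this approach a local config/config.json only needs to list the keys that differ from the defaults.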
The web interface is currently being developed. Feedback is welcome!
This service exposes three HTTP endpoints:
- GET / – web interface (HTML)
- GET /api/search – search API (JSON)
- GET /api/status – service status (JSON)
The HTTP response code should always be 200, except for the endpoint /api/search if there is an error with the Solr backend.
Returns the discovery interface in the form of an HTML page with an experimental Vue client.
...
Performs a search query against the Solr index, returning results along with query metrics.
All query parameters are optional.
| Name | Type | Description |
| --- | --- | --- |
| search | string | Search string |
| field | string | Specific field to search in |
| limit | integer | Number of results to return (default: 10) |
| sort | string | Field to sort by (e.g. relevance, created, modified, title) (default: relevance) |
| order | string | Sort direction: asc or desc (default: desc) |
Returns a JSON object, as in the following example for /api/search?search=Film&sort=modified&order=desc:
{
"responseHeader": {
"status": 0,
"QTime": 0,
"params": {
"q": "allfields:(\"Film\"^3)",
"defType": "lucene",
"start": "0",
"sort": "modified_dt desc",
"rows": "10",
"wt": "json"
}
},
"response": {
"numFound": 3684,
"start": 0,
"numFoundExact": true,
"docs": [
{
"alt_labels_ss": [
"Klassifikation för litteraturvetenskap",
"estetik",
"teatervetenskap",
"film- och televisionsforskning",
"Classification for comparative literature",
"aesthetics",
"theatre research",
"film and television studies"
],
"created_dt": "2015-04-17T14:19:00Z",
"ddc_ss": [
"7",
"790",
"80"
],
"id": "http://bartoc.org/en/node/1297",
"languages_ss": [
"en",
"fi",
"sv"
],
"modified_dt": "2025-07-14T14:00:00.900Z",
"publisher_id": "http://viaf.org/viaf/126520961",
"publisher_label": "Helsingin yliopisto, Kirjasto",
"subject_uri": [
"http://dewey.info/class/7/e23/",
"http://dewey.info/class/790/e23/",
"http://dewey.info/class/80/e23/"
],
"subject_notation": [
"7",
"790",
"80"
],
"subject_scheme": [
"http://bartoc.org/en/node/241",
"http://bartoc.org/en/node/241",
"http://bartoc.org/en/node/241"
],
"type_uri": [
"http://www.w3.org/2004/02/skos/core#ConceptScheme",
"http://w3id.org/nkos/nkostype#classification_schema"
],
"title_en": "Shelf Rating of Literary Research, Aesthetics, Theater Science, Film and Television Research",
"title_sort": "Shelf Rating of Literary Research, Aesthetics, Theater Science, Film and Television Research",
"description_en": "Subject-specific classification scheme used by the University of Helsinki Library for literary studies, aesthetics, theatre research, film and television studies.",
"title_und": "Kirjallisuudentutkimuksen, estetiikan, teatteritieteen, elokuva- ja televisiotutkimuksen hyllyluokitus",
"_version_": 1837876396177752000
},
// …more documents…
]
}
}
Field | Type | Description |
---|---|---|
responseHeader.status | integer | Solr execution status (0 = success). |
responseHeader.QTime | integer | Query execution time in milliseconds. |
responseHeader.params | object | Echoes back the parameters used for the query. |
response.numFound | integer | Total number of matching documents. |
response.start | integer | Offset into the result set. |
response.numFoundExact | boolean | Indicates if numFound is an exact count. |
response.docs | array | Array of document objects matching the query. |
└─ id | string | Unique document identifier (URI). |
└─ title_en | string | English title of the thesaurus or concept scheme. |
└─ title_sort | string | Title normalized for sorting. |
└─ title_und | string | Title in the "undefined" (und) language. |
└─ description_en | string | Short English description or abstract. |
└─ alt_labels_ss | array | Alternative labels (multilingual). |
└─ languages_ss | array | Languages available (ISO codes). |
└─ ddc_ss | array | Dewey Decimal Classification notations. |
└─ publisher_id | string | Identifier URI of the publishing organization. |
└─ publisher_label | string | Human-readable label of the publishing organization. |
└─ subject_uri | array | URIs of subject classifications. |
└─ subject_notation | array | Notation codes for subjects. |
└─ subject_scheme | array | URIs of subject schemes. |
└─ type_uri | array | URIs indicating the resource's SKOS/NKOS type(s). |
└─ created_dt | string | Creation timestamp (ISO-8601). |
└─ modified_dt | string | Last modification timestamp (ISO-8601). |
└─ _version_ | integer | Solr internal version number for optimistic concurrency. |
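A client for this endpoint could look roughly like the following sketch. The interfaces cover only a subset of the documented parameters and response fields; the helper names `buildSearchUrl` and `search` and the base URL are assumptions for illustration, and the global `fetch` requires Node.js >= 18.

```typescript
// Query parameters documented for GET /api/search (all optional).
interface SearchParams {
  search?: string;
  field?: string;
  limit?: number;
  sort?: string;
  order?: "asc" | "desc";
}

// Subset of the response fields listed in the table above.
interface SearchResponse {
  responseHeader: { status: number; QTime: number };
  response: {
    numFound: number;
    start: number;
    docs: Array<{ id: string; title_en?: string }>;
  };
}

// Hypothetical helper: builds the request URL, skipping unset parameters.
function buildSearchUrl(baseUrl: string, params: SearchParams = {}): string {
  const url = new URL("/api/search", baseUrl);
  for (const [name, value] of Object.entries(params)) {
    if (value !== undefined) url.searchParams.set(name, String(value));
  }
  return url.toString();
}

// Hypothetical helper: fetches one page of results and fails on HTTP errors,
// which the endpoint may return when the Solr backend is unavailable.
async function search(baseUrl: string, params: SearchParams): Promise<SearchResponse> {
  const res = await fetch(buildSearchUrl(baseUrl, params));
  if (!res.ok) throw new Error(`Search failed with HTTP ${res.status}`);
  return (await res.json()) as SearchResponse;
}
```

For example, `search("http://localhost:3883", { search: "Film", sort: "modified", order: "desc" })` would request the URL shown in the example above against a local instance.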
...
Returns a concise health check of the service, including environment and Solr index status.
Field | Type | Description |
---|---|---|
ok | boolean | Whether the application is running fine |
config.env | string | The environment the server is run in (e.g. development or production) |
config.serverVersion | string | Version number of the application |
config.title | string | A custom title of the BARTOC Search application instance |
solr.connected | boolean | Whether Solr responded to a basic stats query |
solr.indexedRecords | number | Total number of documents currently indexed in the Solr terminologies core |
solr.lastIndexedAt | string | ISO-8601 timestamp of the most recent update of a record in the index |
jskos.connected | boolean | Whether the WebSocket connection (JSKOS API) has been established for updates |
In case of an error, for instance a failed connection to Solr or to the jskos-server backend, the response field ok is set to false.
The response may temporarily include additional fields for debugging.
Example:
{
"ok": true,
"config": {
"env": "development",
"serverVersion": "0.1.0",
"title": "BARTOC Search (dev)"
},
"solr": {
"connected": true,
"indexedRecords": 3782,
"lastIndexedAt": "2025-07-13T10:28:27"
},
"jskosServer": {
"connected": true
}
}
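A monitoring script might evaluate this response as in the sketch below. Note that the field table lists the key `jskos` while the example uses `jskosServer`; the sketch accepts either one, which is an assumption and not confirmed behaviour. `disconnectedBackends` is a hypothetical helper.

```typescript
// Shape of the /api/status response, following the field table above.
interface StatusResponse {
  ok: boolean;
  config: { env: string; serverVersion: string; title: string };
  solr: { connected: boolean; indexedRecords?: number; lastIndexedAt?: string };
  jskos?: { connected: boolean };       // key used in the field table
  jskosServer?: { connected: boolean }; // key used in the example
}

// Hypothetical helper: lists the backends reported as disconnected.
function disconnectedBackends(status: StatusResponse): string[] {
  const problems: string[] = [];
  if (!status.solr.connected) problems.push("solr");
  const jskos = status.jskos ?? status.jskosServer;
  if (!jskos?.connected) problems.push("jskos-server");
  return problems;
}
```

An empty result together with `ok: true` would indicate that both backends are reachable.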
graph TD
Solr[(🔎 Solr Index)]
DB[(BARTOC database<br>jskos-server)]
Redis[(🧩 Redis)]
BullMQ[(📦 BullMQ Queue)]
subgraph search service [ ]
direction TB
Server[⚙️ Search service]
BullMQ[(📦 BullMQ Queue)]
Client[🖥️ Vue Client]
end
Client[🖥️ Vue Client]
User[👤 User]
Applications
%% FLOWS %%
DB -- "changes" --> Server
DB -- "full dump" --> Server
Server -->|update| Solr
Solr -->|search| Server
Server -- "Queue Jobs" --> BullMQ
BullMQ -- "Backed by" --> Redis
Client -- "Browser" --> User
Server -- "API" --> Applications
Server -- "API" --> Client
The application consists of the following components:
- jskos-server provides a real-time stream of vocabulary changes, which the Search service consumes via a WebSocket connection ("Watching Streams").
- Search service is the core backend, responsible for transforming and loading data into the Solr Index for search and discovery. It also manages background jobs using a BullMQ Queue.
- BullMQ Queue is used for job scheduling and processing, and is backed by a Redis instance for fast, reliable message handling.
- The Vue Client communicates with the Search service for user-facing search and discovery features.
- Users interact with the system through the browser, while external applications can directly access the API.
The application requires a jskos-server with the Changes API to get live updates. The API endpoint can be configured with the configuration key webSocket or with the WS_HOST environment variable (e.g. wss://coli-conc.gbv.de/dev-api/voc/changes for BARTOC production).
The backend service listens for vocabulary change events from the jskos-server using a WebSocket connection. This is handled in src/server/composables/useVocChanges.ts. See also the jskos-server repository for reference.
- Purpose: The WebSocket connection allows the backend to receive real-time notifications about vocabulary changes (create, update, delete) and enqueue them for processing in Solr.
- Configuration: You can override the WebSocket endpoint by setting WS_HOST in your environment (e.g., in your .env file) or via the webSocket field in /config/config.default.json.
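The step from a change notification to a queued job can be sketched as follows. The message shape (`type`, `id`, `document`) is an assumption and should be checked against the jskos-server documentation; `IndexJob` and `toIndexJob` are hypothetical names that do not come from useVocChanges.ts.

```typescript
// Hypothetical job description handed to the queue.
type IndexJob = { action: "upsert" | "delete"; uri: string };

// Parses one raw WebSocket message and maps it to a job. Returns null when
// the message is malformed or is not a usable change event.
function toIndexJob(raw: string): IndexJob | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // ignore messages that are not valid JSON
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  // Assumed message fields; verify against the jskos-server documentation.
  const event = parsed as { type?: string; id?: string; document?: { uri?: string } };
  const uri = event.document?.uri ?? event.id;
  if (!uri) return null;

  if (event.type === "delete") return { action: "delete", uri };
  if (event.type === "create" || event.type === "update") return { action: "upsert", uri };
  return null;
}
```

Treating create and update alike keeps the indexing step idempotent: in both cases the record is simply written to the index again.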
This section describes how to run the Solr service in a Dockerized environment.
- Environment Variables (.env)
- Docker Compose Setup
- Application Service Configuration
- Bootstrapping at Startup
- Solr Schema
- Troubleshooting
See docker-compose.yml in the docker directory for boilerplate.
When indexDataAtBoot is enabled, the app will automatically:
- Wait for Solr to be ready (with retries if needed)
- Download the latest NDJSON dump from BARTOC
- Parse and transform records on the fly
- Batch and index them into Solr
This is handled by connectToSolr() and bootstrapIndexSolr(); no manual steps are required.
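The parse and batch steps listed above can be sketched as below. This is not the code of bootstrapIndexSolr(): for brevity it parses a complete NDJSON string in memory rather than streaming it, and `parseNdjson` and `toBatches` are hypothetical names.

```typescript
// Parses NDJSON text (one JSON value per line), skipping blank lines.
function parseNdjson(text: string): unknown[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as unknown);
}

// Splits records into batches of at most `size` items, e.g. solr.batchSize.
function toBatches<T>(records: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("batch size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < records.length; i += size) {
    batches.push(records.slice(i, i + size));
  }
  return batches;
}
```

Each batch would then be sent to Solr in one update request, which keeps request sizes bounded regardless of the size of the dump.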
Read the documentation here.
- Core never appears / persistent 503
  - Increase MAX_RETRIES and/or RETRY_INTERVAL in connectToSolr().
  - Ensure terminologies-configset is correct and accessible by Solr.
- Indexing errors
  - Check network logs for POST /solr/terminologies/update?commit=true.
  - Inspect Solr logs under /var/solr/logs inside the container.
- Environment mismatch
  - Ensure config.solr.url points to http://bartoc-solr:8983/solr from within the app container.
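To show what the two retry settings control, here is a generic retry loop. It is a sketch and not the implementation of connectToSolr(); the parameters `maxRetries` and `intervalMs` merely play the role of MAX_RETRIES and RETRY_INTERVAL, whose actual units and defaults are not stated here.

```typescript
// Hypothetical retry helper: runs `task` once and then up to `maxRetries`
// more times, waiting `intervalMs` milliseconds between attempts.
async function retry<T>(
  task: () => Promise<T>,
  maxRetries: number,
  intervalMs: number
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < maxRetries) {
        await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
      }
    }
  }
  // All attempts failed: surface the most recent error to the caller.
  throw lastError;
}
```

Raising either value gives a slowly starting Solr core more time before the application gives up.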
Redis is used for fast, in-memory job queues and background processing (via BullMQ). If Redis is unavailable, background jobs are paused but the app continues to serve API and search requests. Connection settings are read from config or environment variables (localhost in development, redis in Docker). Jobs are retried automatically if Redis goes down temporarily.
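How the connection settings might be combined is sketched below, using the REDIS_HOST and REDIS_PORT variables from the .env example. That environment variables take precedence over config values is an assumption, and `redisOptions` is a hypothetical helper.

```typescript
interface RedisOptions {
  host: string;
  port: number;
}

// Hypothetical helper: environment variables win over config values, which
// win over the built-in defaults (this precedence is an assumption).
function redisOptions(
  env: Record<string, string | undefined>,
  config: Partial<RedisOptions> = {}
): RedisOptions {
  const port = Number(env.REDIS_PORT ?? config.port ?? 6379);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`Invalid Redis port: ${env.REDIS_PORT ?? config.port}`);
  }
  return { host: env.REDIS_HOST ?? config.host ?? "localhost", port };
}
```

The resulting object has the host and port shape commonly passed as connection options to a Redis client, for example via `redisOptions(process.env, config.redis)`.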
You can monitor and manage background jobs (queues, workers, job status) using the bull-board UI. This is recommended for development and debugging.
Replace myQueue with your actual BullMQ queue instance(s) and the board will be available at http://localhost:3883/admin/queues (or your app port).
Features:
- View, retry, or remove jobs
- Inspect job data and logs
- Monitor queue and worker status in real time
For more details, see the bull-board documentation.
- Provide a reliable pipeline to synchronize BARTOC database with a Solr index
- Enrich data before indexing to improve search
- Experiment with relevance ranking and faceted search
- Node.js + TypeScript
- Vite for build tooling
- Docker & Docker Compose for containerization
- Jest for unit and integration tests (?) -- no tests at the moment
The search application architecture was initialized using a combination of community-supported templates and official Vite SSR guidance:
- SSR Template: Bootstrapped from the create-vite-extra SSR Vue TS template, which provides a ready-to-use setup for server-side rendering with Vue 3 and TypeScript.
- Vite SSR Dev Server: Configured following the official Vite guide on setting up the SSR development server, enabling hot module replacement and middleware integration (see Vite SSR Guide).
- TypeScript strict mode enabled
- Use ESLint and Prettier (npm run lint)
- Tests must be provided for new features
MIT © 2025- Verbundzentrale des GBV (VZG)