112 changes: 112 additions & 0 deletions docs/event/metadata.md
@@ -0,0 +1,112 @@
---
sidebar_position: 2
---

# Metadata


## Fields

(🟦) Metadata fields marked with this symbol are *Opencast-managed*: they are read-only for users/external applications. All other fields can be freely changed, as long as validity checks pass.

When *duplicating* an event, all fields are copied 1:1 unless specified otherwise.

### General
- `id: ID` 🟦: unique among all events.
  - Can be chosen when creating an event, but cannot be changed afterwards.
  - If no ID is specified while creating an event, Opencast generates an unguessable ID.
  - When duplicating an event, the new event gets a new unguessable ID.
- `title: NonBlankString`: a short title that is the main label associated with this event for users. Plain text.
- `description: string?`: user-specified, human-readable description, potentially quite long.
  - TODO: Decide whether this is plain text, markdown or anything else. External apps displaying this need to know that. Some basic formatting options might be nice?
- `creators: NonBlankString[]`: The people mainly responsible for creating this video and/or presenting the talk which this video is a recording of. Should contain human-readable names and not usernames. Plain text. This is the main "who?"-information shown in the UIs; other fields in `extraMetadata` (e.g. `dct:contributor`) might be shown too, but less prominently.
- `language: LangCode?`: describes the (main) language of this event and its metadata. For example, the audio language and (if applicable) language of video content is more important than the language of available subtitles. Generally, assets can have their own language specified.
Member

So what if you have multiple audio tracks with different languages? If this can be specified at the asset level, why have this at the event level?

Member Author

Mhh interesting. So when e.g. Tobira wants to show a language for an event, it would be derived from the track properties... maybe? But how would that work for an uploader UI? I am assuming that we cannot (or do not want to) always auto-detect the language, so it is useful for the user/video manager to specify a language when uploading, right? But what if a video file with multiple tracks is uploaded? Mh... Does the user then need to specify the language per track? Or does an uploader only have a single "language" field, since multiple audio tracks are likely very rare?

- `series: SeriesID?`: optional ID of the series this event belongs to. Must be a valid series ID of an existing series at all times.
- `owner: Username`: TODO figure out details


Should we allow owner to also be a group rather than just an individual?

Member

Personally, I would like to only ever have a single individual (one person to "speak" to), but I see that others might want to have multiple users.

Member

In the past, people disagreed on the definition of the term "owner". Mainly: is this the person who legally owns this piece of media, or the person who is responsible for uploading it and has all the rights to manage the video? IMO it should be the latter, leaving copyright stuff to other fields.

Note: ILIAS and Moodle encode ownership in the ACLs with a specific owner role (ROLE_OWNER_{username} by default). This makes sense if the owner should have all access rights. On the other hand, you could not include the write action, which doesn't make sense... On the other other hand, if access rights are controlled by metadata, this feels wrong.

This is not written here, but I would not require the username to exist. Uploading from Moodle / ILIAS is possible without Opencast knowing that the user exists. And I don't want Moodle / ILIAS to create user records.

Member Author

As is obvious from the "TODO", I will still think about all of this, but here are a few points that are already clear in my head:

  • Yes, the copyright/legal stuff I would solve with other fields, most likely DC rightsHolder
  • The owner field will not affect access rights, only the ACL does that.
  • Yes the username does not need to exist. The check is simply not possible with the wide range of auth systems we want to support.

- `createdBy: Username` 🟦: username of the user that created this event.
  - Like `created`, this refers to the moment when the event is first added to the database, not necessarily when it is ingested.
  - This refers to the user with which the API request is authorized, which is potentially an API user rather than an actual human.
  - Technical field, not intended to be shown to normal users.
  - Set by Opencast and cannot be changed.
  - When duplicating an event, the new event has this field set to the duplicating user.
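
The `id` rules above can be sketched in a few lines of Python. This is only an illustration: the helper name `assign_event_id`, the use of `secrets.token_urlsafe` and the 16-byte size are assumptions, since the spec does not prescribe an ID format or generator.

```python
import secrets
from typing import Optional


def assign_event_id(requested_id: Optional[str], existing_ids: set[str]) -> str:
    """Return the ID for a new event (hypothetical helper, not an Opencast API)."""
    if requested_id is not None:
        # A caller-chosen ID is accepted only if it is unique among all events.
        if requested_id in existing_ids:
            raise ValueError(f"event ID already in use: {requested_id!r}")
        return requested_id
    # No ID given: generate an unguessable one from a CSPRNG.
    # 16 random bytes (128 bits) is an assumed size, not taken from the spec.
    while True:
        candidate = secrets.token_urlsafe(16)
        if candidate not in existing_ids:
            return candidate
```

Once assigned, the ID would never be changed again; the sketch deliberately offers no way to update it.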

### Time-data
- `startTime: DateTime?`: Actual real life datetime when the video recording started or will start, with timezone. If this is not applicable, for example because it's a short movie, this should be undefined. UIs should use this as primary date to show for a video and if unset, fall back to `created`.
- `endTime: DateTime?`: Like `startTime`, but when the video recording stopped. Due to cutting, recording pauses, etc., the `duration` is not necessarily `endTime - startTime`.
Comment on lines +34 to +35
Member

Why no longer differentiate between technical (e.g. with a buffer) and bibliographical (i.e. without buffer and shown in UIs) start and end times?

Member Author

@LukasKalbertodt Apr 1, 2025

I don't really see a point to having the "technical time" in the metadata? To me it seems like it's just an implementation detail of the scheduler service. But I might be wrong.

EDIT: Although, yeah maybe the scheduler service shouldn't have any event-specific data and it should be included in the data model? Otherwise some external service is interested in that scheduling information and then it has to talk to a specific service and that's how we ended up with this mess...

Member Author

@KatrinIhler wrote:

  • careful: start date is currently the bibliographic start date, not the scheduling start date
    • in general, how do we handle the data the scheduler currently has?

- `duration: Milliseconds` 🟦: duration of the event. As specified in ["assets"](./assets), this needs to always match the duration of all non-internal tracks.
Member

As written here, I disagree that this should be a requirement and I think we don't need this field on the event level.

Comment on lines +34 to +36


My understanding is that endTime would be the endTime used to schedule the recording.
But there could be a difference between the scheduled endTime and the real-life end time of the recording.
An event scheduled to end at, say, 16:00 might end early and they manually stop the recording on the recording device at, say, 15:47. I'm not sure if it's necessary to have the real-life end time as an event attribute, but certainly the duration of the tracks would be different to the duration of the event.

Is it even necessary to have the duration on the event level? From my point of view, the duration should always be endTime - startTime, so it would be redundant information.

- `modified: Timestamp` 🟦: Timestamp of when anything about this event was last changed.
  - More precisely: at any point in time since `modified`, all fields, assets, ACL and any other part of the event data model need to have the exact same value as they have at the present moment.
    Whenever anything about an event described in this data model changes, `modified` has to be set to `now()`.
  - Noteworthy case: when a series is deleted and the event's `series` field is set to `null`, the event's `modified` needs to change.
Member

Do we still want this configurable? Currently, you can configure whether deleting a non-empty series is forbidden, or allowed with the events left without a series. I would suggest disallowing the deletion of non-empty series and making this not configurable.

Member Author

Huh, interesting, that's fine by me. But I suspect that at least some people want an easy way to remove the series from a bunch of videos at the same time. But 🤷 I can also just remove this note from the spec, so that the spec does not directly say anything about this topic.

  - Opencast should try its best to not update `modified` when it's not necessary (e.g. when the title is set to the current value), but it is not a bug if `modified` is set to `now()` unnecessarily.
  - When duplicating an event, the new event has `modified = now()`, i.e. it is not copied.
- `created: Timestamp` 🟦: Timestamp of when the event was created in Opencast. It is set once when the event is first stored in Opencast's DB, and never changed again.
  - This also implies that a scheduled event's `created` date is when the scheduling took place, _not_ the time it is scheduled for (that would be `startTime`).
  - When duplicating an event, the new event has `created = now()`, i.e. it is not copied.
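
The duplication rules scattered across the fields above (`id`, `createdBy`, `modified`, `created`) can be summarized in a short Python sketch. The `Event` dataclass below is a simplified, hypothetical stand-in for the real data model, and `duplicate_event` and the ID format are assumptions for illustration only.

```python
import secrets
from dataclasses import dataclass, field, replace
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class Event:
    # Only a subset of the fields described above; types are simplified.
    id: str
    title: str
    creators: tuple[str, ...]
    created_by: str
    created: datetime
    modified: datetime
    series: Optional[str] = None
    extra_metadata: dict[str, Any] = field(default_factory=dict)


def duplicate_event(source: Event, duplicating_user: str) -> Event:
    """Copy all fields 1:1 except those the spec says are not copied."""
    now = datetime.now(timezone.utc)
    return replace(
        source,
        id=secrets.token_urlsafe(16),  # new unguessable ID (format is assumed)
        created_by=duplicating_user,   # set to the duplicating user
        created=now,                   # not copied
        modified=now,                  # not copied
        # Shallow copy so the two events do not share one dict object.
        extra_metadata=dict(source.extra_metadata),
    )
```

Everything not named explicitly in `replace(...)`, such as `title`, `creators` and `series`, is carried over unchanged, which matches the "copied 1:1 unless specified otherwise" rule.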


### Flags

- `isLive: bool` 🟦: TODO this is currently stored per track, figure out if that's useful


### Extra metadata
- `extraMetadata: Map<Label, Json>`: additional metadata that Opencast never interprets, but just stores and passes along.
  - The keys of this map consist of a _namespace_ and a _field name_, separated by `:`, i.e. `ns:name`. Both parts must consist of only `a-z`, `A-Z`, `0-9`, `-` and `_`.
  - The values of this map are arbitrary JSON values, i.e. Booleans, numbers, strings, arrays, maps or `null`. Items of arrays & maps can recursively be any JSON value as well.
  - Unlike the "extended metadata" before, using `extraMetadata` does work out of the box and does not incur any relevant performance overhead. Therefore, applications are encouraged to add useful data here, e.g. `studip:course-id`, `oc-studio:version` or `ethz:room-number`.
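
As a rough illustration of the key format, the following Python sketch checks a label against the `ns:name` rule. The names `is_valid_label` and `split_label` are hypothetical, and requiring both parts to be non-empty is an assumption that the text above does not state explicitly.

```python
import re

# Both parts may only contain a-z, A-Z, 0-9, "-" and "_", separated by one ":".
# Requiring each part to be non-empty is an assumption, not stated in the spec.
_LABEL_RE = re.compile(r"[A-Za-z0-9_-]+:[A-Za-z0-9_-]+")


def is_valid_label(key: str) -> bool:
    """Return True if `key` has the form `namespace:name`."""
    return _LABEL_RE.fullmatch(key) is not None


def split_label(key: str) -> tuple[str, str]:
    """Split a valid label into (namespace, name); raise ValueError otherwise."""
    if not is_valid_label(key):
        raise ValueError(f"invalid extraMetadata key: {key!r}")
    namespace, name = key.split(":")
    return namespace, name
```

With this check, the examples from the text (`studip:course-id`, `oc-studio:version`, `ethz:room-number`) are accepted, while keys without a namespace or with more than one `:` are rejected.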

#### Community documentation

There should be a community document for specifying rules, as well as collecting used fields and best practices around `extraMetadata`.
That way, common requirements are identified quickly and the community can converge towards a standard.

> Official namespaces:
> - **`oc`**: reserved.
> - **`ocx`**: OC eXperimental, for fields that might get promoted to core metadata fields in the future.
>   - `ocx:downloadable: bool`: a flag indicating whether users are allowed to download this video (i.e. tracks attached to this event). This can inform external apps whether to show a download button or to enable anti-download protection. The exact effects of this flag are deliberately unspecified; this merely states an *intent*.
>   - `ocx:listed: bool`: specifies whether this event should be considered "listed", meaning that users can find it via search. If it is `false`, users have to know the ID of the event (e.g. via a series or playlist) in order to access it.
>   - `ocx:explicitContent: bool`: specifies whether this event contains content that is considered "explicit", like swear words or whatnot. This is required for some integrations like iTunes.
> - **`dct`**: refers to the Dublin Core Terms specification, e.g. `dct:rightsHolder` refers to [the `rightsHolder` property](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/rightsHolder) of DC terms. Avoid setting fields that are already mapped from core fields, like `dct:title`, which is mapped to the OC core metadatum `title`.
>
> Community namespaces:
> - ...

Generally, anyone should be able to add to the "Community namespaces" section via pull request against this document:
its purpose is to document what is used, not for the OC committers to control what external apps do.
In other words: a PR against that section is *not* asking for permission.





## Dublin Core mapping

As you can see above, the event metadata is not literally a Dublin Core catalog anymore.
Dublin Core should be considered an *exchange format*, not a storage format!
Of course, standards make sense and therefore, many Opencast metadata fields correspond exactly to a Dublin Core field.
Opencast can offer an API that emits a Dublin Core catalog (DCC) for an event.
That DCC XML is created on the fly in the API handler.

The mapping is as follows:

- [`identifier`](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/identifier): `id`
- [`title`](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/title): `title`
- [`description`](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/description): `description`
- [`creator`](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/creator): `creators`
- [`language`](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/language): `language`
- [`isPartOf`](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/isPartOf): `series` (i.e. the ID)
- [`modified`](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/modified): `modified`
- [`extent`](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/extent): `duration` as ISO 8601 duration
- `date` or `temporal`???: TODO combination of `startTime` and `endTime`
- `created` or `dateSubmitted` or `issued`???: TODO `created`
- Additionally, all fields in `extraMetadata` with `dct` namespace are mapped directly to the corresponding property, e.g. `dct:license` is mapped to [`license`](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/license).
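
To make the mapping concrete, here is a minimal Python sketch that builds the Dublin Core terms for an event on the fly. It returns a plain dict rather than DCC XML, represents the event as a dict keyed by the field names above, and omits the date-related terms that are still TODO. The function names, and the policy that a core-mapped value wins over a duplicate `dct:*` entry, are assumptions.

```python
from typing import Any

# Core field -> DC term, as listed above. Date-related terms are omitted
# because the spec still marks them as TODO.
CORE_TO_DCT = {
    "id": "identifier",
    "title": "title",
    "description": "description",
    "creators": "creator",
    "language": "language",
    "series": "isPartOf",
    "modified": "modified",
}


def ms_to_iso8601_duration(ms: int) -> str:
    """Format milliseconds as an ISO 8601 duration (hours, minutes, seconds)."""
    total_seconds, millis = divmod(ms, 1000)
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    # Only emit a fractional part when there are leftover milliseconds.
    sec = f"{seconds}.{millis:03d}".rstrip("0") if millis else str(seconds)
    return f"PT{hours}H{minutes}M{sec}S"


def to_dublin_core(event: dict[str, Any]) -> dict[str, Any]:
    """Map an event (as a dict) to DC term -> value; unset fields are skipped."""
    dc: dict[str, Any] = {}
    for core_field, term in CORE_TO_DCT.items():
        value = event.get(core_field)
        if value is not None:
            dc[term] = value
    if event.get("duration") is not None:
        dc["extent"] = ms_to_iso8601_duration(event["duration"])
    for key, value in event.get("extraMetadata", {}).items():
        namespace, _, name = key.partition(":")
        if namespace == "dct":
            # Assumed policy: a core-mapped value wins over a duplicate dct:* key.
            dc.setdefault(name, value)
    return dc
```

Serializing the resulting dict to actual DCC XML would be a separate step in the API handler and is not shown here.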

TODO: Should we also define a mapping for OAI-PMH? Does that make sense?

---

:::danger[Open questions]

- ...
:::