TEI Header
The TEI header provides the metadata for the work in question, as well as the information needed to extract unique passages.
The format for the TEI header used for this CTS text follows the TEI header of the Open Greek and Latin (OGL) team at UVa.
Three main elements of the TEI header are described here
- File Description
- Encoding Description
- Profile Description
File Description
For the file description, our text uses the <titleStmt>, <publicationStmt>, and <sourceDesc> elements.
In the <titleStmt> element, we include the <title> of the original text under the name of the manuscript title given by Ximenez. For the <author> element, we provide both Ximenez and the original K'iche' authors referenced in the text by their positions, e.g. Nim Ch'okoj k'ut chuwach Kaweqib' (gran maestro de la palabra ante los Kaweq). Finally, for the <respStmt>, we provide the <orgName>, the name of the organization encoding the text (the Multepal team), and the names of those involved in encoding the text under the <persName> element.
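The file description above can be sketched roughly as follows. This is only an illustrative outline, not a copy of the project's file: the comments mark placeholders, the <resp> element and the contents of <publicationStmt> and <sourceDesc> are assumptions added so the fragment follows the usual TEI content model, and the use of one <author> element per author is also assumed.

```xml
<fileDesc>
  <titleStmt>
    <title><!-- placeholder: manuscript title given by Ximenez --></title>
    <!-- assumption: one <author> element per author -->
    <author>Ximenez</author>
    <author>Nim Ch'okoj k'ut chuwach Kaweqib' (gran maestro de la palabra ante los Kaweq)</author>
    <respStmt>
      <!-- assumption: TEI expects a <resp> describing the responsibility -->
      <resp><!-- placeholder: nature of the responsibility, e.g. encoding --></resp>
      <orgName>Multepal</orgName>
      <persName><!-- placeholder: name of an encoder --></persName>
    </respStmt>
  </titleStmt>
  <publicationStmt>
    <p><!-- placeholder: publication details --></p>
  </publicationStmt>
  <sourceDesc>
    <p><!-- placeholder: description of the source manuscript --></p>
  </sourceDesc>
</fileDesc>
```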
Encoding Description
This discussion of the encoding description will mention hierarchical or tree-like "levels of division." The term is used here to describe the structural divisions of the text. For instance, a text that is encoded with the preservation of paragraphs and lines has those two levels of division. These structural divisions are hierarchical in the sense that each paragraph belongs to a single text, and each line within a paragraph belongs to that paragraph.
The <encodingDesc> element is essential for a text to be CTS-compliant. The three main elements used are the <editorialDecl> element and two <refsDecl> elements. The <editorialDecl> explains any editorial decisions made in the encoding of the text. There is also a standalone paragraph element stating that "The following text is encoded in accordance with TEI standards and with the CTS/CITE architecture."
The first <refsDecl> element provides pointer patterns to extract the necessary level of division within the text. It must have an n attribute set equal to "CTS". Each level of division has its own <cRefPattern> element. Because our text is divided only by paragraphs, there is a single <cRefPattern> element. This element must have three attributes.
First, it has an n attribute set equal to the name of the division (currently "paragraph" for our text).
Secondly, it has a matchPattern attribute set equal to a regular expression with one capture group for each level of division used in the extraction. We only use one level of division, so our matchPattern attribute is set equal to "(\w+)" (if we were to extract lines as well, this would look like "(\w+).(\w+)").
Finally, it has a replacementPattern attribute set equal to an XPath statement that gives the path from the root node to a given level of division. To extract paragraphs, this attribute is set equal to "#xpath(/tei:TEI/tei:text/tei:body/tei:div/tei:div[@n='$1'])", where $1 stands for the value captured by matchPattern. Looking at the XML file containing our text, if you follow each of the elements named after a tei: prefix, you will see that the path arrives at the paragraph level of division. At the final tei:div, the [@n='$1'] predicate looks at the n attribute of each "paragraph," such that a request for paragraph 5 looks for a div at this level with an n attribute of 5.
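To make the three attributes concrete, here is a minimal sketch of how the first <refsDecl> and the body it points into might fit together. The attribute values are the ones quoted above; the surrounding skeleton, the bare outer <div>, and the placeholder comments are assumptions for illustration and do not reproduce the project's actual file.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!-- fileDesc and profileDesc omitted from this sketch -->
    <encodingDesc>
      <refsDecl n="CTS">
        <!-- one cRefPattern per level of division; here only "paragraph" -->
        <cRefPattern n="paragraph"
                     matchPattern="(\w+)"
                     replacementPattern="#xpath(/tei:TEI/tei:text/tei:body/tei:div/tei:div[@n='$1'])"/>
      </refsDecl>
    </encodingDesc>
  </teiHeader>
  <text>
    <body>
      <!-- outer div: any attributes it carries are omitted here -->
      <div>
        <!-- a request for paragraph 5 resolves to this inner div -->
        <div n="5">
          <!-- placeholder: text of paragraph 5 -->
        </div>
      </div>
    </body>
  </text>
</TEI>
```

Reading the XPath against this skeleton, each tei: step names one element on the way down (TEI, text, body, the outer div, the inner div), and the predicate on the last step selects the inner div whose n attribute equals the requested reference.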
The second <refsDecl> element has a single empty <refState> element, with a unit attribute set equal to the name of the level of division.
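As a rough sketch, the second <refsDecl> could look like the fragment below. The value "paragraph" is assumed to be the same division name used in the n attribute of the <cRefPattern> element.

```xml
<refsDecl>
  <!-- unit names the single level of division used in this text (assumed value) -->
  <refState unit="paragraph"/>
</refsDecl>
```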
Profile Description
The <profileDesc> element uses two elements. The first is the <langUsage> element, which describes the language of the text. It contains a <language> element whose ident attribute is set equal to the ISO 639 code of the language of the text. The second is the <creation> element, which contains information about the creation of the text, e.g. phrases describing its origin, such as the date and place of its composition.
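A profile description along these lines might be sketched as follows. The language code is an assumption: "quc" is the ISO 639-3 code for K'iche', but the project's file may use a different code or declare more than one language. The <creation> content is left as a placeholder.

```xml
<profileDesc>
  <langUsage>
    <!-- assumption: "quc" (ISO 639-3, K'iche') as the language of the text -->
    <language ident="quc">K'iche'</language>
  </langUsage>
  <creation>
    <!-- placeholder: date and place of composition -->
  </creation>
</profileDesc>
```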
Next: Main text