XML output #196

# Description

After running a recent eynollah (latest 0.5.0) on a set of newspaper pages, it produced PAGE 2019 XML files.

Parsing these files with lxml seems to work fine, but Python's standard library parsers (`xml.dom.minidom`, `xml.etree.ElementTree`) fail with strange errors like `not well-formed (invalid token)`. After some investigation, this seems to originate from the encoding declaration `<?xml version='1.0' encoding='utf8'?>`. The hyphen is missing (it should be `utf-8` instead of `utf8`), which seems to confuse the standard library parsers.
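Until this is changed, one possible workaround on the consumer side might be to patch the declaration before handing the data to the standard library parser. The following is only a minimal sketch, assuming the file content has already been read as bytes and does not start with a byte order mark. `normalize_declaration` and `parse_page` are hypothetical helper names, and the sample document is made up for illustration; it is not real eynollah output.

```python
import re
import xml.etree.ElementTree as ET

# Matches a leading XML declaration whose encoding is spelled "utf8".
# Group 1 keeps everything up to the opening quote, group 2 the closing quote.
_DECL_UTF8 = re.compile(
    rb"^(\s*<\?xml[^>]*?encoding\s*=\s*[\"'])utf8([\"'])",
    re.IGNORECASE,
)


def normalize_declaration(data: bytes) -> bytes:
    """Rewrite encoding='utf8' to encoding='utf-8' in the XML declaration only."""
    return _DECL_UTF8.sub(rb"\g<1>utf-8\g<2>", data, count=1)


def parse_page(data: bytes) -> ET.Element:
    """Parse XML bytes with the standard library after fixing the declaration."""
    return ET.fromstring(normalize_declaration(data))


# Made-up sample (an assumption, not actual eynollah output) with non-ASCII text.
SAMPLE = (
    "<?xml version='1.0' encoding='utf8'?>"
    "<PcGts><Unicode>Größe</Unicode></PcGts>"
).encode("utf-8")

root = parse_page(SAMPLE)
print(root.findtext("Unicode"))
```

Only the declaration at the very start of the data is touched, so any later occurrence of `utf8` inside the document content stays as it is.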

This is probably no big deal, but fixing it would ease usage for me, for example when evaluating the output.

(Some more details: [Parser tests](https://github.com/ulb-sachsen-anhalt/digital-eval/blob/main/tests/test_digital_object_ocr_page_eynollah.py))
