
XML output #196

@M3ssman

Description

After running on a set of newspaper pages, a recent eynollah (latest, 0.5.0) produced PAGE 2019 XML files.

Parsing these files with lxml works fine, but Python's standard library parsers (minidom, etree) fail with strange errors like "not well-formed (invalid token)". After some investigation, this seems to originate from the encoding declaration <?xml version='1.0' encoding='utf8'?>. A hyphen is missing (it should be utf-8 instead of utf8), which seems to confuse the standard library parsers.

Probably no big deal, but it would ease usage for me, for example when evaluating output.

(Some more details: Parser tests)
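
Until the declaration is fixed upstream, a possible workaround on the consumer side is to rewrite the encoding label before handing the bytes to the standard library parser. The following is only a minimal sketch: the helper name normalize_utf8_declaration and the sample document are made up for illustration, and it assumes the declaration sits at the very start of the file with no byte order mark in front of it.

```python
import re
import xml.etree.ElementTree as ET

# Matches encoding='utf8' (either quote style, any letter case) inside an
# XML declaration at the very start of the document.
_DECL_UTF8 = re.compile(rb"""^(<\?xml[^>]*?encoding=)(['"])utf8\2""", re.IGNORECASE)


def normalize_utf8_declaration(data: bytes) -> bytes:
    """Hypothetical helper: replace the label 'utf8' with 'utf-8'.

    Only the declaration is touched; the document body is left as-is.
    """
    return _DECL_UTF8.sub(rb"\g<1>\g<2>utf-8\g<2>", data, count=1)


# Made-up sample with non-ASCII text, declared the way described above.
sample = (
    "<?xml version='1.0' encoding='utf8'?>\n"
    "<TextEquiv><Unicode>Zeitung für Städte</Unicode></TextEquiv>"
).encode("utf-8")

root = ET.fromstring(normalize_utf8_declaration(sample))
print(root.findtext("Unicode"))
```

The substitution is limited to the declaration, so text content that happens to contain the string utf8 is not modified, and files that already declare utf-8 pass through unchanged.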
