Closed
Description
After running the recent eynollah release (0.5.0, the latest) on a set of newspaper pages, I got PAGE 2019 XML files.
If one parses these files with lxml, everything seems fine, but Python's standard-library parsers (minidom, etree) fail with strange errors like `not well-formed (invalid token)`. After some investigation, this seems to originate from the encoding declaration `<?xml version='1.0' encoding='utf8'?>`: there's a dash missing (it should be `utf-8` instead of `utf8`), which seems to confuse those standard-library parsers.
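A minimal reproduction sketch, assuming CPython's `xml.etree.ElementTree` (which, like `minidom`, is backed by expat). The sample document is hypothetical and only PAGE-like (not schema-valid): it carries the same declaration as the eynollah output plus one non-ASCII string, encoded to bytes as if read from disk. Parsing it as-is should reproduce the reported error; passing an `XMLParser` with an explicit `encoding` overrides the declaration and works around it on the reader side. The helper names `parse_strict` and `parse_forcing_utf8` are illustrative only.

```python
import xml.etree.ElementTree as ET

# PAGE 2019 namespace; the sample document below is hypothetical.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"

# Same declaration as in the eynollah output, plus non-ASCII text,
# encoded to bytes as if it had been read from disk.
SAMPLE = (
    "<?xml version='1.0' encoding='utf8'?>\n"
    f'<PcGts xmlns="{PAGE_NS}"><Page><TextEquiv><Unicode>Grüße</Unicode>'
    "</TextEquiv></Page></PcGts>"
).encode("utf-8")


def parse_strict(data: bytes) -> ET.Element:
    """Hypothetical helper: parse bytes, honoring the declared encoding."""
    return ET.fromstring(data)


def parse_forcing_utf8(data: bytes) -> ET.Element:
    """Hypothetical helper: parse bytes, overriding the declared encoding."""
    # An explicit parser encoding takes precedence over the <?xml ...?> line.
    return ET.fromstring(data, parser=ET.XMLParser(encoding="utf-8"))


if __name__ == "__main__":
    try:
        parse_strict(SAMPLE)
        print("strict parse: ok")
    except ET.ParseError as err:
        print(f"strict parse failed: {err}")

    root = parse_forcing_utf8(SAMPLE)
    text = root.find(f".//{{{PAGE_NS}}}Unicode").text
    print(f"UTF-8 override: {text}")
```

Presumably the error only shows up once a file contains non-ASCII bytes (umlauts and the like), which newspaper text nearly always does; a pure-ASCII file with the same declaration may parse fine. The override is only safe because the files do appear to be UTF-8 encoded.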
Probably no big deal, but fixing it would make the output easier to work with, for example when evaluating it.
(Some more details: Parser tests)
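On the producer side, the fix is presumably just to declare the canonical name. I have not checked how eynollah serializes its PAGE output; the sketch below assumes a serializer like Python's `ElementTree.write`, which copies the `encoding` argument verbatim into the XML declaration. Passing the codec alias `"utf8"` yields exactly the declaration quoted above, while `"utf-8"` yields the standard one. The `serialize` helper is hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def serialize(root: ET.Element, encoding: str) -> bytes:
    """Hypothetical helper: write *root* with an explicit XML declaration."""
    buf = io.BytesIO()
    # ElementTree copies `encoding` verbatim into <?xml ... encoding='...'?>.
    ET.ElementTree(root).write(buf, encoding=encoding, xml_declaration=True)
    return buf.getvalue()


if __name__ == "__main__":
    root = ET.Element("PcGts")
    ET.SubElement(root, "Unicode").text = "Grüße"

    # Codec alias: Python encodes fine, but the declaration names "utf8".
    print(serialize(root, "utf8").splitlines()[0])
    # Canonical name: the declaration names "utf-8", which expat recognizes.
    print(serialize(root, "utf-8").splitlines()[0])
```

Both calls produce identical UTF-8 bytes for the content; only the declaration differs, and only the canonical spelling is portable across XML parsers.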