Closed
Description
XML v0.3.5
julia> xml1 = "<si><t>Ends with non-breaking space 0x00A0 </t></si>"
"<si><t>Ends with non-breaking space 0x00A0 </t></si>"
julia> using XML
julia> d=XML.parse(XML.LazyNode, xml1)
LazyNode (depth=0) Document
julia> n=next(d)
LazyNode (depth=1) Element <si>
julia> n=next(n)
LazyNode (depth=2) Element <t>
julia> n=next(n)
LazyNode (depth=3) Text "Ends with non-breaking space 0x00A0\xc2"
julia>
The non-breaking space consists of two bytes: Char: ' ' → Code units: 2 → Bytes: UInt8[0xc2, 0xa0]
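The byte layout can be checked with Base functions alone. This is a minimal sketch; the variable names `nbsp` and `bytes` are only for illustration.

```julia
# U+00A0 (non-breaking space) takes two code units in UTF-8.
nbsp = '\u00a0'
bytes = collect(codeunits(string(nbsp)))

println(ncodeunits(nbsp))  # 2
println(bytes)             # UInt8[0xc2, 0xa0]
```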
The second of these (0xa0), when converted to a Char on its own, is U+00A0, which isspace recognises as a space. It is therefore stripped, and the previous byte (0xc2) incorrectly remains in the text.
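The effect can be reproduced outside XML.jl. The sketch below assumes whitespace is trimmed one byte at a time by converting each byte to a `Char`. The function `naive_rstrip_bytes` is hypothetical and written only to illustrate the mechanism; it is not XML.jl's actual code.

```julia
# Hypothetical byte-wise right-trim (an assumption, not XML.jl's code):
# each byte is converted to a Char and tested with isspace.
function naive_rstrip_bytes(data::Vector{UInt8})
    j = length(data)
    while j > 0 && isspace(Char(data[j]))
        j -= 1
    end
    return String(data[1:j])
end

s = "abc\u00a0"                       # ends with a non-breaking space
out = naive_rstrip_bytes(collect(codeunits(s)))

println(isspace(Char(0xa0)))          # true: the byte 0xa0 alone reads as U+00A0
println(isspace(Char(0xc2)))          # false: the byte 0xc2 alone reads as U+00C2
println(repr(out))                    # "abc\xc2"
println(isvalid(out))                 # false: dangling UTF-8 lead byte
```

Only the trailing continuation byte is removed, so the result ends with an orphaned lead byte and is no longer valid UTF-8.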
It is not clear to me whether this is an issue with XML.jl or with Base.isspace, but it is easily fixable in XML.jl.
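One possible direction, sketched under the assumption that trimming happens on raw bytes, is to treat only the four whitespace characters defined by the XML specification (space, tab, line feed, carriage return) as trimmable. All of them are single-byte ASCII, so a multi-byte sequence is never split. The names `is_xml_space` and `rstrip_xml_bytes` are hypothetical and not part of XML.jl; the fix actually adopted may differ.

```julia
# Hypothetical helpers (not XML.jl API): trim only XML whitespace bytes,
# i.e. 0x20 (space), 0x09 (tab), 0x0a (LF), 0x0d (CR).
is_xml_space(b::UInt8) = b == 0x20 || b == 0x09 || b == 0x0a || b == 0x0d

function rstrip_xml_bytes(data::Vector{UInt8})
    j = length(data)
    while j > 0 && is_xml_space(data[j])
        j -= 1
    end
    return String(data[1:j])
end

trimmed = rstrip_xml_bytes(collect(codeunits("abc\u00a0 \n")))

println(repr(trimmed))     # "abc\u00a0" (the non-breaking space is kept whole)
println(isvalid(trimmed))  # true
```

With this approach the non-breaking space stays in the text rather than being stripped, which matches the XML definition of whitespace but may or may not be the behaviour wanted here.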