Parsing XML with German specific characters

Hello,
I have an issue parsing XML with XMLParser when the XML contains German-specific characters, for example ü.

This code:
Code Block swift
import Foundation
let xml = "<Xml><Tag>Martin Hübner</Tag><Tag>Value</Tag></Xml>"
let data = xml.data(using: .utf8)!
let parser = XMLParser(data: data)
let parserDelegate = ParserDelegate()
parser.delegate = parserDelegate
parser.parse()
class ParserDelegate: NSObject, XMLParserDelegate {
  func parser(_ parser: XMLParser, foundCharacters string: String) {
    print("foundCharacters: \(string)")
  }
}

produces the following output in a Playground:
Code Block
foundCharacters: Martin H
foundCharacters: übner
foundCharacters: Value

It looks to me as if XMLParser cannot parse XML with German characters correctly, because with the following XML:
Code Block swift
let xml = "<Xml><Tag>Hello World</Tag><Tag>Value</Tag></Xml>"

the output is correct:
Code Block
foundCharacters: Hello World
foundCharacters: Value

Do you have any idea how to solve this issue?

Thank you
Answered by OOPer in 646147022

Do you have any idea how to solve this issue?

XMLParser may call parser(_:foundCharacters:) several times for the text of a single element, and this happens in many cases other than text containing German characters.

Code Block
import Foundation
let xml = "<Xml><Tag>An &apos;example&apos;</Tag><Tag>Value</Tag></Xml>"
let data = xml.data(using: .utf8)!
let parser = XMLParser(data: data)
let parserDelegate = ParserDelegate()
parser.delegate = parserDelegate
parser.parse()
class ParserDelegate: NSObject, XMLParserDelegate {
  func parser(_ parser: XMLParser, foundCharacters string: String) {
    print("foundCharacters: \(string)")
  }
}


Outputs:
Code Block
foundCharacters: An
foundCharacters: '
foundCharacters: example
foundCharacters: '
foundCharacters: Value


You need to write some logic to concatenate the pieces wherever you want a single string.


An example:
Code Block
import Foundation
let xml = "<Xml><Tag>Martin Hübner</Tag><Tag>Value</Tag></Xml>"
//let xml = "<Xml><Tag>An &apos;example&apos;</Tag><Tag>Value</Tag></Xml>"
let data = xml.data(using: .utf8)!
let parser = XMLParser(data: data)
let parserDelegate = ParserDelegate()
parser.delegate = parserDelegate
parser.parse()
class ParserDelegate: NSObject, XMLParserDelegate {
  // Non-nil only while the parser is inside a <Tag> element.
  var textForTag: String? = nil

  func parser(_ parser: XMLParser, foundCharacters string: String) {
    print("foundCharacters: \(string)")
    // Append the piece; this does nothing when textForTag is nil.
    textForTag? += string
  }

  func parser(_ parser: XMLParser, didStartElement elementName: String, namespaceURI: String?, qualifiedName qName: String?, attributes attributeDict: [String : String] = [:]) {
    if elementName == "Tag" {
      // Start collecting text for this element.
      textForTag = ""
    }
  }

  func parser(_ parser: XMLParser, didEndElement elementName: String, namespaceURI: String?, qualifiedName qName: String?) {
    if elementName == "Tag" {
      // The element is complete, so the accumulated text is the full value.
      print("textForTag=\(textForTag ?? "Something wrong")")
      textForTag = nil
    }
  }
}


Outputs:
Code Block
foundCharacters: Martin H
foundCharacters: übner
textForTag=Martin Hübner
foundCharacters: Value
textForTag=Value



I don't see any problem here. You can't treat each foundCharacters callback as a complete value; the pieces are always meant to be concatenated. You have to track the tags. When you find a start tag, start collecting characters, and keep appending them until you reach that start tag's end tag. If you find a new start tag before you get to the end tag, you've got a nested subtree, and you will have to collect the characters for the inner tag separately. Then, when you get back to the outer tag, you start a new text node (assuming you want to do everything correctly).
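
For nested tags, one way to do this is to keep a stack of text buffers, one per open element. The following is only a minimal sketch under some assumptions: TextCollector, buffers and results are hypothetical names made up for illustration, and the sample XML with an inner <b> element is invented. As a simplification, it appends the outer element's text before and after an inner element to the same buffer instead of keeping separate text nodes.

```swift
import Foundation

// Hypothetical delegate for illustration: collects the direct text of every
// element, keeping one buffer per open element on a stack.
final class TextCollector: NSObject, XMLParserDelegate {
    // One text buffer per currently open element; the innermost is last.
    private var buffers: [String] = []
    // (element name, text) pairs, recorded as each element closes.
    private(set) var results: [(name: String, text: String)] = []

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        // A new element opens: give it its own empty buffer.
        buffers.append("")
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        // May be called several times per element, so always append
        // to the buffer of the innermost open element.
        guard !buffers.isEmpty else { return }
        buffers[buffers.count - 1] += string
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        // The innermost element closes: its buffer now holds its full text.
        guard let text = buffers.popLast() else { return }
        results.append((name: elementName, text: text))
    }
}

// Invented sample input with one nested element.
let xml = "<Xml><Tag>Martin <b>Hübner</b></Tag><Tag>Value</Tag></Xml>"
let collector = TextCollector()
let parser = XMLParser(data: Data(xml.utf8))
parser.delegate = collector
parser.parse()

for (name, text) in collector.results {
    print("\(name): \(text)")
}
```

Elements are recorded in the order they close, so the inner element shows up before the one that contains it. If you need the text before and after an inner element as separate nodes, you would have to record a new entry at each didStartElement and didEndElement instead of appending to a single buffer.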