Scala SAX parser unable to process <!DOCTYPE XML>


I want to parse XML files from sources beyond my control (specifically Suunto sporttesters). When I load them using Scala's XML.load() they load fine, but I would prefer SAX (pull) parsing for better performance. The pull parser seems unhappy with the file header. See the following example:

    import scala.io.Source
    import scala.xml.pull.XMLEventReader

    // Sample document; the DOCTYPE sits on line 2, where the error is reported
    val text = """<?xml version="1.0" encoding="iso-8859-1"?>
    <!DOCTYPE xml>
    <movescount moveslinkversion="1.2.41.0" timezone="60" >
     <device sn="quest_2596420792" >
      <model info="device;int;r" >120</model>
      <name info="device;text;r" >quest</name>
      <fullname info="device;text;r" >suunto quest</fullname>
      <serialnumber info="device;int;r" >2596420792</serialnumber>
     </device>
    </movescount>"""

    val src = Source.fromString(text)

    // Print every pull-parser event
    for (ev <- new XMLEventReader(src)) {
      println(ev)
    }

This prints an error while parsing:

:2:14: whitespace expected

When I delete the line containing the DOCTYPE, or change it to <!DOCTYPE xml >, the error goes away and the file parses fine.

Is this a bug in the XML pull parser? If it is, is there a possible workaround? The XML comes from external sources well beyond my control.

After trying another parser (Aalto XML), I think the document is malformed beyond hope and one needs to fix it before feeding it to the parser. My workaround is to skip the DOCTYPE header, when present, by using a PushbackInputStream to transform the input stream.
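
A minimal sketch of that workaround follows. It is not the exact code I used: the names DoctypeSkipper, skipDoctype and PeekSize are made up for illustration, and it assumes that the prolog fits in the first 1024 bytes, that the encoding is ASCII-compatible, and that the DOCTYPE has no internal subset (no `[ ... ]` part containing `>`).

```scala
import java.io.{InputStream, PushbackInputStream}
import java.nio.charset.StandardCharsets.ISO_8859_1

// Hypothetical helper: drops a simple <!DOCTYPE ...> declaration from the start of a stream.
object DoctypeSkipper {
  private val Marker = "<!DOCTYPE"
  // Assumption: the XML declaration and the DOCTYPE both fit in this many bytes.
  private val PeekSize = 1024

  def skipDoctype(in: InputStream): InputStream = {
    val pb = new PushbackInputStream(in, PeekSize)
    val buf = new Array[Byte](PeekSize)

    // Fill the peek buffer; a single read() may return fewer bytes than requested.
    var n = 0
    var r = 0
    while (n < PeekSize && r >= 0) {
      r = pb.read(buf, n, PeekSize - n)
      if (r > 0) n += r
    }

    // ISO-8859-1 maps one byte to one char, so string indices equal byte offsets.
    val head = new String(buf, 0, n, ISO_8859_1)
    val start = head.indexOf(Marker)
    val end = if (start >= 0) head.indexOf('>', start) else -1

    if (start >= 0 && end >= 0) {
      // unread() makes the pushed bytes the next ones to be read, so push the
      // part after the DOCTYPE first and the part before it last.
      pb.unread(buf, end + 1, n - end - 1)
      pb.unread(buf, 0, start)
    } else {
      // No complete DOCTYPE in the peeked bytes: restore everything unchanged.
      pb.unread(buf, 0, n)
    }
    pb
  }
}
```

The returned stream can then be wrapped with something like Source.fromInputStream(DoctypeSkipper.skipDoctype(in), "ISO-8859-1") before it is handed to the pull parser. The line break that followed the DOCTYPE is left in place, so line numbers in later error messages still match the original file.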

