I was going to title this post “LibXML: how to fail right out of the box”, but then thought a more accurate description of the problem might be better. There is something I don’t understand about the open source community: its tolerance. I encountered this problem right out of the gate:
<?xml version="1.0" encoding="UTF-8"?>
<root_node>
  <elem1 attr1="val1" attr2="val2"/>
  <elem2 attr1="val1" attr2="val2"/>
  <elem3 attr="baz">
    <elem4/>
    <elem5>
      <elem6>Content for element 6</elem6>
    </elem5>
  </elem3>
</root_node>
The resulting root children:
Do tell me why whitespace and CRLFs are seen as empty nodes? Do tell me why this absurd behavior is tolerated as the default? I had to google for some indication of what’s going on. This fixes the problem (Ruby code):
require 'libxml'
doc = LibXML::XML::Document.file('foo.xml', :encoding => LibXML::XML::Encoding::UTF_8, :options => LibXML::XML::Parser::Options::NOBLANKS)
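For anyone who wants to see the effect without installing the native binaries, here is a minimal sketch using REXML, which ships with Ruby. REXML is a stand-in for illustration only; this is not the libxml-ruby API. The `XML_SOURCE` constant and the `describe_children` helper are names made up for this example, and `ignore_whitespace_nodes` is REXML's own context option, which plays roughly the role that NOBLANKS plays above.

```ruby
require 'rexml/document'

# The sample document from this post, with the line breaks between elements intact.
XML_SOURCE = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <root_node>
    <elem1 attr1="val1" attr2="val2"/>
    <elem2 attr1="val1" attr2="val2"/>
    <elem3 attr="baz">
      <elem4/>
      <elem5>
        <elem6>Content for element 6</elem6>
      </elem5>
    </elem3>
  </root_node>
XML

# Hypothetical helper: list the root's direct children, using the element
# name for elements and the label '#text' for text nodes.
def describe_children(doc)
  doc.root.children.map do |node|
    node.is_a?(REXML::Element) ? node.name : '#text'
  end
end

# Default parse: whitespace between tags is kept as text nodes.
default_children = describe_children(REXML::Document.new(XML_SOURCE))

# Same document, asking REXML to drop whitespace-only text nodes.
stripped_children = describe_children(
  REXML::Document.new(XML_SOURCE, { ignore_whitespace_nodes: :all })
)

puts default_children.inspect
puts stripped_children.inspect
```

With the default settings, the first list should contain a `#text` entry around each element; with the option set, only elem1, elem2 and elem3 should remain.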
Other than that, it looks like a decent enough package, though I haven’t explored it further.
The libxml-ruby gem
The libxml-ruby gem worked fine, but I did have to, as the documentation says, copy three binaries into one of the directories on the Windows path. Happily, the gem comes with precompiled binaries, which is a real help; kudos to the gem authors for providing the MinGW32 binaries.