LibXML — empty nodes (and the libxml-rails gem)

I was going to title this post “LibXML: how to fail right out of the box” but then thought a more accurate description of the problem might be better. There is something I don’t understand about the open source community: its tolerance for defaults like this one. I ran into this problem right out of the gate:

The XML:

<?xml version="1.0" encoding="UTF-8"?>
<root_node>
<elem1 attr1="val1" attr2="val2"/>
<elem2 attr1="val1" attr2="val2"/>
<elem3 attr="baz">
  <elem4/>
  <elem5>
    <elem6>Content for element 6</elem6>
  </elem5>
</elem3>
</root_node>

The resulting root children:

[Screenshot (libxml-1): the root node's children as returned by the parser]

Do tell me why whitespace and CRLFs are seen as empty text nodes? Do tell me why this absurd behavior is tolerated as the default? I had to google for some indication of what’s going on. Passing the NOBLANKS parser option fixes the problem (Ruby code):

require 'libxml'

doc = LibXML::XML::Document.file('foo.xml',
  :encoding => LibXML::XML::Encoding::UTF_8,
  :options  => LibXML::XML::Parser::Options::NOBLANKS)

Other than that, it looks like a decent enough package, though I haven’t explored it further.

The libxml-ruby gem

The libxml-ruby gem worked fine, but, as the documentation says, I did have to copy three binaries into one of the directories on the Windows PATH. Happily, the gem comes with the precompiled binaries; that’s a real help, and kudos to the gem authors for providing the MinGW32 binaries.
