libxml-ruby 0 Rubygems

Libxml bindings for Ruby.

= LibXML Ruby

== Overview The libxml gem provides Ruby language bindings for GNOME’s Libxml2 XML toolkit. It is free software, released under the MIT License.

We think libxml-ruby is the best XML library for Ruby because:

  • Speed - Its much faster than REXML and Hpricot
  • Features - It provides an amazing number of featues
  • Conformance - It passes all 1800+ tests from the OASIS XML Tests Suite

== Requirements libxml-ruby requires Ruby 1.8.4 or higher. It depends on libxml2 to function propoerly. libxml2, in turn, depends on:

  • libm (math routines: very standard)
  • libz (zlib)
  • libiconv

If you are running Linux or Unix you’ll need a C compiler so the extension can be compiled when it is installed. If you are running Windows, then install the x64-mingw32 gem or build it yourself using Devkit (http://rubyinstaller.org/add-ons/devkit/) or msys2 (https://msys2.github.io/).

== INSTALLATION The easiest way to install libxml-ruby is via Ruby Gems. To install:

gem install libxml-ruby

If you are running Windows, then install the libxml-ruby-x64-mingw32 gem. THe gem inncludes prebuilt extensions for Ruby 2.3. These extensions are built using MinGW64 and libxml2 version 2.9.3, iconv version 1.14 and zlib version 1.2.8. Note these binaries are available in the lib\libs directory. To use them, put them someplace on your path.

The gem also includes a Microsoft VC++ 2012 solution and XCode 5 project - these are very useful for debugging.

libxml-ruby’s source codes lives on Github at https://github.com/xml4r/libxml-ruby.

== Getting Started Using libxml is easy. First decide what parser you want to use:

  • Generally you’ll want to use the LibXML::XML::Parser which provides a tree based API.
  • For larger documents that don’t fit into memory, or if you prefer an input based API, use the LibXML::XML::Reader.
  • To parse HTML files use LibXML::XML::HTMLParser.
  • If you are masochistic, then use the LibXML::XML::SaxParser, which provides a callback API.

Once you have chosen a parser, choose a datasource. Libxml can parse files, strings, URIs and IO streams. For each data source you can specify an LibXML::XML::Encoding, a base uri and various parser options. For more information, refer the LibXML::XML::Parser.document, LibXML::XML::Parser.file, LibXML::XML::Parser.io or LibXML:::XML::Parser.string methods (the same methods are defined on all four parser classes).

== Advanced Functionality Beyond the basics of parsing and processing XML and HTML documents, libxml provides a wealth of additional functionality.

Most commonly, you’ll want to use its LibXML::XML::XPath support, which makes it easy to find data inside a XML document. Although not as popular, LibXML::XML::XPointer provides another API for finding data inside an XML document.

Often times you’ll need to validate data before processing it. For example, if you accept user generated content submitted over the Web, you’ll want to verify that it does not contain malicious code such as embedded scripts. This can be done using libxml’s powerful set of validators:

  • DTDs (LibXML::XML::Dtd)
  • Relax Schemas (LibXML::XML::RelaxNG)
  • XML Schema (LibXML::XML::Schema)

Finally, if you’d like to use XSL Transformations to process data, then install the libxslt gem which is available at https://github.com/xml4r/libxslt-ruby.

== Usage For information about using libxml-ruby please refer to its documentation at http://xml4r.github.com/libxml-ruby/rdoc/index.html. Some tutorials are also available at https://github.com/xml4r/libxml-ruby/wiki.

All libxml classes are in the LibXML::XML module. The easiest way to use libxml is to require ‘xml’. This will mixin the LibXML module into the global namespace, allowing you to write code like this:

require ‘xml’ document = XML::Document.new

However, when creating an application or library you plan to redistribute, it is best to not add the LibXML module to the global namespace, in which case you can either write your code like this:

require ‘libxml’ document = LibXML::XML::Document.new

Or you can utilize a namespace for your own work and include LibXML into it. For example:

require ‘libxml’

module MyApplication include LibXML

class MyClass
  def some_method
    document = XML::Document.new
  end
end

end

For simplicity’s sake, the documentation uses the xml module in its examples.

== Memory Management libxml-ruby automatically manages memory associated with the underlying libxml2 library. There is however one corner case that your code must handle. If a node is imported into a document, but not added to the document, a segmentation fault may occur on program termination.

# Do NOT do this require ‘xml’ doc1 = XML::Document.string(“test1”) doc2 = XML::Document.string(“test2”) node = doc2.import(doc1.root)

If doc2 is freed before the node a segmentation fault will occur since the node references the document. To avoid this, simply make sure to add the node to the document:

# DO this instead doc1 = XML::Document.string(“test1”) doc2 = XML::Document.string(“test2”) doc2.root << doc2.import(doc1.root)

Alternatively, you can call node.remove! to disassociate the node from doc2.

== Threading libxml-ruby fully supports native, background Ruby threads. This of course only applies to Ruby 1.9.x and higher since earlier versions of Ruby do not support native threads.

== Performance In addition to being feature rich and conformation, the main reason people use libxml-ruby is for performance. Here are the results of a couple simple benchmarks recently blogged about on the Web (you can find them in the benchmark directory of the libxml distribution).

From http://depixelate.com/2008/4/23/ruby-xml-parsing-benchmarks

           user     system      total        real

libxml 0.032000 0.000000 0.032000 ( 0.031000) Hpricot 0.640000 0.031000 0.671000 ( 0.890000) REXML 1.813000 0.047000 1.860000 ( 2.031000)

From https://svn.concord.org/svn/projects/trunk/common/ruby/xml_benchmarks/

           user     system      total        real

libxml 0.641000 0.031000 0.672000 ( 0.672000) hpricot 5.359000 0.062000 5.421000 ( 5.516000) rexml 22.859000 0.047000 22.906000 ( 23.203000)

== Documentation Documentation is available via rdoc, and is installed automatically with the gem.

libxml-ruby’s online documentation is generated using Hanna, which is a development gem dependency.

Note that older versions of Rdoc, which ship with Ruby 1.8.x, will report a number of errors. To avoid them, install Rdoc 2.1 or higher. Once you have installed the gem, you’ll have to disable the version of Rdoc that Ruby 1.8.x includes. An easy way to do that is rename the directory ruby/lib/ruby/1.8/rdoc to ruby/lib/ruby/1.8/rdoc_old.

== Support

If you have any questions about using libxml-ruby, please report them to Git Hub at https://github.com/xml4r/libxml-ruby/issues

== License See LICENSE for license information.

Related Repositories

roxml

roxml

ROXML is a module for binding Ruby classes to XML. It supports custom mapping and bidirectional marshalling between Ruby and XML using annotation-style class methods, via Nokogiri or LibXML. ...

ratom

ratom

A fast, libxml based, Ruby Atom library supporting the Syndication Format and the Publishing Protocol. ...

libxml-ruby

libxml-ruby

Libxml bindings for Ruby. ...

fastxml

fastxml

ruby libxml library targetting speed and ease of use. provides an hpricot-like interface to xml ...

libxml-ruby

libxml-ruby

This project has moved to http://github.com/xml4r/libxml-ruby, please add pull requests and issues to that project.. ...


Top Contributors

cfis webgago julp yeban nkriege hainesr jarl-dk nikitug tmm1 llimllib carpodaster dudleyf blackwinter stevenwilliamson raorn trans brettg cjheath dbussink abrasive mfn szajbus ZI660 CITguy szimek ender672 hagabaka ebihara99999 lorensr phiggins

Releases

-   v2.9.0 zip tar
-   v2.8.0 zip tar
-   v2.7.0 zip tar
-   v2.6.0 zip tar
-   v2.5.0 zip tar
-   v2.4.0 zip tar
-   init zip tar
-   Root_MAINT_0_3_8 zip tar
-   Root_DEV_0_4 zip tar
-   REL_8_2 zip tar
-   REL_2_3_4 zip tar
-   REL_2_3_1 zip tar
-   REL_2_3_0 zip tar
-   REL_2_2_2 zip tar
-   REL_2_2_1 zip tar
-   REL_2_1_2 zip tar
-   REL_2_1_1 zip tar
-   REL_2_1_0 zip tar
-   REL_2_0_7 zip tar
-   REL_2_0_6 zip tar
-   REL_2_0_5 zip tar
-   REL_2_0_4 zip tar
-   REL_2_0_3 zip tar
-   REL_2_0_2 zip tar
-   REL_2_0_1 zip tar
-   REL_2_0_0 zip tar
-   REL_1_1_3 zip tar
-   REL_1_1_2 zip tar
-   REL_1_1_1 zip tar
-   REL_1_1_0 zip tar