Ruby Tutorial – Downloading and Parsing XML

Everywhere I look now it’s Ruby, Ruby, Ruby and I think it’s about time I figure out what’s so great about this language. As a disclaimer, this tutorial will be pretty much the first thing I’ve ever done with Ruby, so the code might not be what a Ruby expert would consider the most efficient way to get things done. But I don’t care, we all gotta learn somewhere, and I decided to learn by using Ruby’s standard library to download and parse some XML.

I’ve already heard that no Ruby developer uses the standard library for stuff like this, but I’d rather start at the bottom before experimenting with RubyGems. All that aside, what I’m going to build today is a very basic Twitter reader, which we’ve done here at SOTC several times in several other languages. Twitter provides a very easy API and a fairly simple XML format, which makes it a good target for experimenting.

Downloading XML

The first thing we need to do is download some XML. After perusing Ruby’s Net::HTTP class, here’s the approach I decided to take:

require 'net/http'

# Build the feed's URI
uri = URI.parse('http://www.twitter.com/statuses/user_timeline.xml?screen_name=SwitchOnTheCode')

# Request the Twitter feed.
begin
  response = Net::HTTP.get_response(uri)
  if response.code != "200"
    raise
  end
rescue
  puts "Failed to request Twitter feed."
  exit 1
end

The very first thing we have to do is require the net/http library. There seem to be a few million ways to make simple requests, but I liked this one the best because of the error handling that’s available. get_response will raise an exception if something generally bad happens (no internet connection, twitter.com is down, etc.). Beyond that, Twitter’s web server can return several HTTP error codes, each indicating a different problem. The code for OK is 200, and response.code is a string, which is why the script compares it against "200". If the response contains any other code, something bad happened. In that case, I raise my own exception so both kinds of error are caught by the same rescue block and the script exits. If everything succeeds, we’ve downloaded our Twitter feed. It’s just that easy.
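As a rough sketch of the same idea, the status check can also be done against the response's class rather than its code string, since Net::HTTP returns a subclass of Net::HTTPSuccess for every 2xx response. The helper name ensure_success! is my own invention, not part of Net::HTTP, and the responses are built by hand so the example runs without a network connection.

```ruby
require 'net/http'

# Hypothetical helper (not part of Net::HTTP): returns the response when it is
# a 2xx success and raises a RuntimeError otherwise.
def ensure_success!(response)
  return response if response.is_a?(Net::HTTPSuccess)

  raise "Unexpected HTTP status: #{response.code} #{response.message}"
end

# Hand-built responses stand in for what get_response would return.
ok = Net::HTTPOK.new('1.1', '200', 'OK')
missing = Net::HTTPNotFound.new('1.1', '404', 'Not Found')

puts ensure_success!(ok).code

begin
  ensure_success!(missing)
rescue RuntimeError => e
  # Same rescue path the tutorial uses for connection failures.
  puts "Failed to request feed: #{e.message}"
end
```

Raising with a message costs nothing extra and makes it easier to tell a bad status code apart from a dropped connection when the rescue block fires.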

Parsing XML

I’m now going to modify the script to parse the XML returned by get_response.

require 'net/http'
require 'rexml/document'

# Build the feed's URI
uri = URI.parse('http://www.twitter.com/statuses/user_timeline.xml?screen_name=SwitchOnTheCode')

# Request the Twitter feed.
begin
  response = Net::HTTP.get_response(uri)
  if response.code != "200"
    raise
  end
rescue
  puts "Failed to request Twitter feed."
  exit 1
end

# Parse the response body; malformed XML raises an exception.
begin
  xml_doc = REXML::Document.new(response.body)
rescue
  puts "Received invalid XML."
  exit 1
end

# Parse the data and print it in a friendly way.
xml_doc.elements.each('statuses/status') do |status|
  puts "%s – %s\n\n" %
    [
      status.elements['created_at'].text,
      status.elements['text'].text
    ]
end

To parse the XML I’m going to use the REXML library. The first thing I do is create a new REXML::Document instance and supply it with the body (the XML) of our HTTP response. The XML is parsed immediately, and if an error occurs an exception will be raised and my script will exit.
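To see the parse failure in isolation, here is a minimal sketch that rescues REXML::ParseException specifically instead of using a bare rescue. The parse_xml helper is a name I made up for illustration. Keep in mind that REXML is fairly lenient, and exactly which inputs it rejects can vary between versions, so I only use a mismatched closing tag here.

```ruby
require 'rexml/document'

# Hypothetical helper: returns the parsed document, or nil when REXML
# rejects the input as malformed.
def parse_xml(source)
  REXML::Document.new(source)
rescue REXML::ParseException
  nil
end

good = parse_xml('<statuses><status/></statuses>')
bad  = parse_xml('<statuses><status></statuses>') # <status> is never closed

puts good.root.name
puts bad.nil?
```

Rescuing the specific exception class means an unrelated bug, such as a typo in a variable name, won't be silently reported as "invalid XML".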

If no error occurred, I then loop through every Twitter status and print the date of the tweet and the contents. I do this by defining a block that takes a status element, finds the created_at and text child elements and prints their text properties with a dash between them.
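The loop is easier to experiment with when it doesn't depend on a live feed, so here is a small sketch that runs the same XPath over an inline document. The sample XML and the format_statuses helper are made up for illustration. The XML only mimics the two child elements the script reads, not the full format Twitter returned.

```ruby
require 'rexml/document'

# Hypothetical helper: returns one "date - text" string per <status> element.
def format_statuses(xml)
  doc = REXML::Document.new(xml)
  lines = []
  doc.elements.each('statuses/status') do |status|
    # &.text avoids a NoMethodError if a child element is missing.
    created = status.elements['created_at']&.text
    text = status.elements['text']&.text
    lines << "#{created} - #{text}"
  end
  lines
end

# Made-up sample data in the shape the tutorial's script expects.
sample_xml = <<~XML
  <statuses>
    <status>
      <created_at>Tue Feb 22 20:29:16 +0000 2011</created_at>
      <text>First sample tweet</text>
    </status>
    <status>
      <created_at>Mon Feb 14 20:49:58 +0000 2011</created_at>
      <text>Second sample tweet</text>
    </status>
  </statuses>
XML

puts format_statuses(sample_xml)
```

Returning the formatted strings instead of printing inside the block keeps the parsing step separate from the output, which makes it simple to reuse or check the result.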

Believe it or not, that’s actually all there is to it. If we run this on Switch On The Code’s Twitter feed, we’d get something that looks like this:

Tue Feb 22 20:29:16 +0000 2011 – Redis 2.2 Released http://t.co/K5sJV0N via Redis #storedb

Tue Feb 22 19:50:55 +0000 2011 – Disqus: Scaling The World’s Largest Django Application http://t.co/1iJkWGh via @ontwiik #python #django #programming

Tue Feb 22 16:23:48 +0000 2011 – Kinect for Windows SDK to Arrive Spring 2011 http://sotc.me/65353 (via Microsoft) #kinect

Mon Feb 14 20:49:58 +0000 2011 – A Critical Deep Dive into the WPF Rendering System http://sotc.me/18327 (via Jer’s Hacks) #wpf

Sun Feb 13 23:00:44 +0000 2011 – 500,000 Strong http://sotc.me/32603

Sun Feb 13 22:56:18 +0000 2011 – jQuery Tutorial – Creating an Autocomplete Input Textbox http://sotc.me/29035 #jQuery #tutorial

Generally I found using Ruby to be an enjoyable experience. The issue I’m having currently is managing the sheer number of ways to do the same thing, especially when it comes to iterating over and printing the contents of the tweets. I found the documentation to be adequate, but had difficulty knowing when an exception would be raised versus when an error code or boolean would be returned. All of these things come with experience, which hopefully I’ll gain over time. My next challenge is to pick a UI framework and begin making graphical apps. If you’ve read the example code and have some better approaches, feel free to leave them below.
