153 Characters versus Slim Scrooge
Jason, in the comments on my last post, pointed out that I should look at Slim Scrooge. He's actually both right and wrong, and to me it's a question of context. For this application I've moved from a development context to a production context. In development, I have no problem using tools like this (I actually looked at Slim Scrooge recently), and I do. Regularly. But this app is now live, and I've been dealing with LOTS of memory issues. And I mean LOTS of them -- yes, there will be more posts on this topic. My concern with Slim Scrooge is simply that I don't understand it. Yes, it might magically make everything better, but it might not. I opted to go with what I firmly understood, even though it might have taken me more time than a plugin would. Had I noticed this four weeks earlier in the cycle, before we went live, I would have been all over Slim Scrooge. For now at least, it's :select to my personal rescue.
But thank you, Jason. Slim Scrooge seems to be an excellent option, and you were 100% right and on the money to point it out.
153 Characters to Save 700 Megabytes of RAM
I recently had to run a Rake task on a Rails app that generates a CSV file from a table containing 94,000-odd rows (94,142, to be pedantically specific). I normally run it overnight, but this time I was watching, and my laptop froze solid. After some minor cursing, once the machine became responsive again, I checked Activity Monitor and found this:

Yep. That's right -- a single Ruby process using 952 megs of RAM. Oy. Just to be safe, I confirmed that the Ruby process chewing up 952 megabytes was in fact the Rake task. Unfortunately, it was.
So this raises the question of what exactly the table looks like. Since this is a customer's application, I can't give the exact field names, but here are the datatypes of the columns:
+---------------------+
| Type |
+---------------------+
| bigint(20) |
| int(11) |
| varchar(255) |
| varchar(255) |
| varchar(255) |
| text |
| varchar(255) |
| varchar(255) |
| varchar(32) |
| int(10) unsigned |
| int(11) |
| int(11) |
| double |
| double |
| tinyint(3) unsigned |
| tinyint(3) unsigned |
| tinyint(3) unsigned |
| datetime |
| datetime |
| int(10) unsigned |
| int(11) |
| int(11) |
| double |
| double |
| double |
| double |
| int(10) unsigned |
| int(11) |
| int(11) |
| double |
| double |
| double |
| tinyint(1) |
| tinyint(1) |
| int(11) |
| int(11) |
| int(11) |
| float |
| float |
| int(11) |
| int(11) |
| int(11) |
| float |
| float |
| float |
| float |
| int(11) |
| int(11) |
| int(11) |
| float |
| float |
| float |
| tinyint(1) |
+---------------------+
Yes, it's a big table, but it's not monstrous. Two ways to reduce the memory this uses came to mind:
- Fetch the objects one by one by incrementing the id value. This would work, but it's slow. Also, the id isn't a plain AutoIncrement column but a BigInt supplied by an external data source, so the values aren't consecutive and stepping through them one at a time won't work well at all.
- Fetch less data
Being an old-school database person, I was dismayed by the prevalence of "SELECT * FROM table" when I first came into the Rails world. I've seen all too many times that overly large fetches carry a performance cost, but as I worked with Rails, I simply grew accustomed to it. An interesting blog post I found recently, Five ActiveRecord Tips, pointed out the :select parameter, which I had never seen before.
The idea behind :select is that you supply a SQL string listing the attributes of the objects you want to fetch (or the columns in the row, if you're me and old school). Say you want only the id, created_at and updated_at of each object in a table called apps. Your :select would look like this:
:select => "apps.id, " + "apps.created_at, " + "apps.updated_at "
And that will be injected into your query by ActiveRecord so that only those 3 attributes will be retrieved per object.
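Here is a minimal plain-Ruby sketch of the kind of statement this produces. The helper name `select_sql` is hypothetical, and ActiveRecord's real SQL generation handles far more (conditions, joins, quoting). The sketch only shows the idea: the column list takes the place of the `*`.

```ruby
# Hypothetical helper showing how a :select list changes the generated SELECT.
# With no columns it falls back to "*", mirroring the default "SELECT *".
def select_sql(table, columns = [], order = nil)
  column_list = columns.empty? ? "*" : columns.join(", ")
  sql = "SELECT #{column_list} FROM #{table}"
  sql += " ORDER BY #{order}" if order
  sql
end

puts select_sql("apps", [], "id ASC")
# SELECT * FROM apps ORDER BY id ASC
puts select_sql("apps", %w(apps.id apps.created_at apps.updated_at), "id ASC")
# SELECT apps.id, apps.created_at, apps.updated_at FROM apps ORDER BY id ASC
```

Because the list is built with join, you never get a stray comma after the last column.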
Now here is the before and after of the magic 153 characters that saved 731 megabytes:
Before:
@apps = App.find(:all, :order => 'id ASC')
After:
@apps = App.find(:all, :select => "apps.id, " +
"apps.developer_id, " +
"apps.display_name, " +
"apps.canvas_name, " +
"apps.url, " +
"apps.description, " +
"apps.api_key ", :order => 'id ASC')
Keep in mind that standard SQL syntax applies: separate the columns with commas, and don't put a comma after the last one. Spacing matters too, because the string fragments are concatenated exactly as written.
Since this is Ruby, we can make this a bit cleaner. Here's a first pass at that:
@apps = App.find(:all,
                 :select => %w(apps.id apps.developer_id apps.display_name apps.canvas_name apps.url apps.description apps.api_key).join(', '),
                 :order => 'id ASC')
Since this is a single-table query, we can drop the 'apps.' prefix entirely:
@apps = App.find(:all,
                 :select => %w(id developer_id display_name canvas_name url description api_key).join(', '),
                 :order => 'id ASC')
So here's our memory utilization after:
Now, I'd argue that no matter how hardcore a Ruby / Rails person you are, trading 153 characters for 700 megabytes is a hell of a savings.
Implementing Your Own Caching Layer
I recently had to deal with performance problems in a very large application with a considerable number of raw SQL queries (e.g. object.find_by_sql or object.paginate_by_sql). We can argue about whether using SQL directly in an ActiveRecord context is good, but some of these were complex enough (think sum operations, etc.) that I didn't want to rewrite them as ActiveRecord. And because the table is being changed constantly by a crawler, the MySQL query cache wasn't an option*: MySQL throws away a table's cached results whenever that table is modified.
So I started things off, as I so often do, by talking the issues over with a buddy. Oddly, he argued against using the built-in Rails caching tools and for doing it myself. That's unusual, to say the least. Normally he argues for the built-in frameworks, but he and I have had issues in the past around caching and, in particular, cache expiration. After that discussion, I settled on this conceptual approach:
- Use an ActiveRecord model to store the data
- Use created_at as a tool to manage the cache expiration
- Serialize the data after fetch to store it away
- Write a get_latest method inside the model to test whether or not to fetch the data from the cache or the source
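The four steps above fit together roughly like this. This is a plain-Ruby sketch in which an in-memory array stands in for the ActiveRecord table. `TinyQueryCache`, `CacheEntry`, `fetch` and the sample query are all hypothetical names, not the code used in the real app. The logic is the same, though: hash the query, look for a fresh entry by created_at, and otherwise run the query and store a serialized copy.

```ruby
require "digest/sha1"

# In-memory stand-in for the query_results_caches table (hypothetical: the
# real version is an ActiveRecord model whose rows live in MySQL).
CacheEntry = Struct.new(:q_hash, :data, :created_at)

class TinyQueryCache
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @rows = []
  end

  # Like get_latest: the newest entry for this key, or nil if it has expired.
  def get_latest(q_hash, now = Time.now)
    latest = @rows.select { |r| r.q_hash == q_hash }.max_by(&:created_at)
    latest if latest && latest.created_at.between?(now - @ttl, now)
  end

  # Serve from the cache while fresh; otherwise run the real query (the block),
  # serialize the result and store it with a created_at timestamp.
  def fetch(query, now = Time.now)
    q_hash = Digest::SHA1.hexdigest(query)
    cached = get_latest(q_hash, now)
    return Marshal.load(cached.data) if cached

    result = yield
    @rows << CacheEntry.new(q_hash, Marshal.dump(result), now)
    result
  end
end

# Usage: a 20-minute window, with explicit times instead of waiting around.
cache = TinyQueryCache.new(20 * 60)
t0 = Time.now
calls = 0
query = "SELECT developer_id, SUM(installs) FROM apps GROUP BY developer_id"
first  = cache.fetch(query, t0)           { calls += 1; [[1, 10], [2, 5]] }
second = cache.fetch(query, t0 + 60)      { calls += 1; [[1, 10], [2, 5]] } # cache hit
third  = cache.fetch(query, t0 + 30 * 60) { calls += 1; [[1, 10], [2, 5]] } # expired, re-run
puts calls # => 2
```

Expiry is never an explicit delete. An entry simply stops counting once its created_at falls outside the window, which keeps the invalidation logic to a single comparison.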
The first real problem came from needing to deal not just with straight ActiveRecord (AR) objects but also with the will_paginate collections that wrap them. Here's something brilliant about ActiveRecord. It turned out to be irrelevant for me, but it's brilliant:
serialize :data
If you put that at the top of your AR model file, then that attribute of the model will be automatically serialized IN and OUT of the database. Outstanding -- but it kept failing for me. Why? Because I had will_paginate collections over the top of the AR objects. Oy. With that not working at all, I turned to Google, and my research led me to a great Skorks article. It turns out you can serialize Ruby objects either as YAML (which is what serialize uses by default) or with Marshal. The benefit of Marshal is that it's binary, which means it's smokingly fast. Or at least as fast as anything in Rails is.
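To see the difference between the two formats, here is a small sketch that serializes the same plain data both ways. The hash contents are made up for illustration. YAML produces readable text, while Marshal produces a binary string. Both round-trip plain data, but Marshal can't dump everything (a Proc, for instance, raises a TypeError).

```ruby
require "yaml"

# Plain data standing in for a query result row (hypothetical values).
row = { "id" => 42, "display_name" => "Example App", "scores" => [1.5, 2.5] }

# YAML: human-readable text.
as_yaml = YAML.dump(row)
puts as_yaml

# Marshal: Ruby's compact binary format.
as_marshal = Marshal.dump(row)
p as_marshal.encoding == Encoding::BINARY # => true

# Both round-trip the same plain data back into an equal Hash.
p YAML.load(as_yaml) == row       # => true
p Marshal.load(as_marshal) == row # => true
```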
So I tried wrapping my data element like this:
Marshal.dump(res)
to store it (res was the result of the database operation), and this to load it back:
Marshal.load(cache_result.data)
And no matter what I did, it just plain failed. So I did my usual routine of walking away from the computer and pondering deeply, wandering the halls of my home looking contemplatively** around while I cogitate. That made me realize this: IT IS BINARY, BUTTHEAD!
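You can see the problem without a database. A Marshal dump is raw bytes, not text. Even a value as small as -1 produces a byte that is not valid UTF-8. A text column is meant for character data, so depending on its character set, bytes like these may be rejected or altered on the way in. A binary/BLOB column stores them untouched.

```ruby
# A Marshal dump is raw bytes: version header (4, 8), type tag 'i', payload.
blob = Marshal.dump(-1)
p blob.bytes # => [4, 8, 105, 250]

# 0xFA (250) can never appear in valid UTF-8, so this is not legal text.
p blob.dup.force_encoding("UTF-8").valid_encoding? # => false

# Stored byte-for-byte (as a BLOB column would), it loads back fine.
p Marshal.load(blob) # => -1
```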
The column I was storing it in wasn't set up for binary data, so this required a change to the migration and a db:migrate:redo. A quick dash back to the migration, and I ended up with this:
class CreateQueryResultsCaches < ActiveRecord::Migration
  def self.up
    create_table :query_results_caches do |t|
      t.string :q_hash
      t.text :q
      t.column :data, :binary, :limit => 10.megabyte
      t.timestamps
    end
    add_index :query_results_caches, :q_hash
  end

  def self.down
    remove_index :query_results_caches, :q_hash
    drop_table :query_results_caches
  end
end
Useful reference on creating blobs via migrations.
And that actually worked! If you noticed the q_hash column, you may be wondering what it is. My queries are long, so it's faster to hash the query and use that hash for the lookup than to look up on a query string that's 500 bytes or longer. Now there are only a few more bits to share.
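The property that makes this work is that a SHA1 hex digest is always 40 characters, however long the input is, and identical query text always yields the same digest. Here is a quick check. The query strings are made up for illustration.

```ruby
require "digest/sha1"

short_q = "SELECT * FROM apps"
long_q  = "SELECT apps.id, SUM(x) FROM apps " + ("WHERE 1 = 1 " * 50) # several hundred chars

short_key = Digest::SHA1.hexdigest(short_q)
long_key  = Digest::SHA1.hexdigest(long_q)

p short_key.length # => 40
p long_key.length  # => 40
# The same query text always maps to the same key, so lookups are repeatable.
p Digest::SHA1.hexdigest(short_q) == short_key # => true
```

Incidentally, hexdigest already returns a String, so the `.to_s` calls in the methods that follow are harmless but unnecessary.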
The routine which evaluates the cache result:
def self.get_latest(q_hash)
  latest = self.find(:first, :conditions => {:q_hash => q_hash}, :order => "created_at DESC")
  # if created within the last 20 minutes then return it; otherwise return nil so the caller runs the real query and stores the results
  if latest && latest.created_at.between?(20.minutes.ago, Time.now)
    return latest
  else
    nil
  end
end
The two methods on the QueryResultsCache model for fetching from the cache and/or populating it, with and without pagination:
def self.cache_it_or_create_it_by_sql_with_pagination(obj, q, page)
  if page
    q_hash = Digest::SHA1.hexdigest(q + page).to_s
  else
    q_hash = Digest::SHA1.hexdigest(q).to_s
  end
  cache_result = self.get_latest(q_hash)
  if cache_result.nil?
    res = obj.paginate_by_sql(q, :page => page, :per_page => 40)
    QueryResultsCache.create(:q_hash => q_hash, :data => Marshal.dump(res), :q => q)
  else
    res = Marshal.load(cache_result.data)
  end
  res
end
And...
def self.cache_it_or_create_it_by_sql(obj, q)
  q_hash = Digest::SHA1.hexdigest(q).to_s
  cache_result = self.get_latest(q_hash)
  if cache_result.nil?
    res = obj.find_by_sql(q)
    QueryResultsCache.create(:q_hash => q_hash, :data => Marshal.dump(res), :q => q)
  else
    res = Marshal.load(cache_result.data)
  end
  res
end
As a final note, here's an example of how this is used from a controller:
@apps = QueryResultsCache.cache_it_or_create_it_by_sql_with_pagination(App,q,params[:page])
Clearly there's more that can be done here, but when you find that the built-in Rails caching mechanisms aren't working for you --or-- you feel that stepping out of the framework will teach you something, implementing your own caching approach isn't all that difficult. Learn to use one of the serialization tools and you're off to the races!
*As an aside, I'd point out that the MySQL query cache just ain't all that great but that's another story for another day.
**Ok I went to the can.
In Praise of Rails Machine
It isn't often that you can pull a nearly 48-hour day deploying a new code base into production and come out of it with not only a smile but a blog post praising your hosting company. I just finished such an endeavor (ordeal??), and it was the typical sort of experience: constant server tuning, reboots to get around load issues, crawlers flooding your site and the fine-grained ActiveRecord optimizations that follow, etc.
Now the site in question is hosted at Rails Machine and, to paraphrase, I come not to bury them but to praise them. During the course of this project, Rails Machine has been:
- Nothing but professional
- They turned me on to their new Moonshine project, which makes server tuning magnificently easy (imagine setting your Passenger configuration options from within Ruby code)
- They didn't bat an eye at multiple reboots even late on a Saturday night
- When we had major problems, they replaced our server without being asked; all I had to do was point out that the issue had happened twice and *whammo* new server
And then, to add icing to a delicious cake, I just got this email:
Hi Scott -
I just wanted to check in and see if you were able to get this to work. Please let me know if you need any further help!
--Ahesan
Ahesan, you magnificent bastard! Thank you for sending this. I can't tell you how good it made me feel to get this at 3:24 am.
So here's my personal bottom line for hosting Rails apps. If you need quality, reliable hosting for Rails with outstanding support, then you need to run, not walk, and sign up with them. I won't tell you it's cheap -- it isn't -- but Rails Machine knows their stuff and does an outstanding job. Overall, I've been more impressed with Rails Machine than with any other Rails hosting company I've worked with.
Note 1: I've been a Rails Machine customer for over 3 years now and they've kept every one of the Rails apps I've been a lead developer on running like a champ but their customer service over this past weekend was well over the top. Thanks guys. Appreciated.
Note 2: Since writing this up in the bowels of a Sunday night 3 am debugging session, I've had follow-up from others at Rails Machine, including Josh and Will. Both took the time to go above and beyond, just as Ahesan did. Thanks, guys. Will in particular cobbled together an excellent suggestion showing how to use Moonshine to do something not explicitly supported yet.
Automatic Color Coding for script/console and irb
A friend recently turned me onto Awesome Print, a pp replacement:
Michael Dvorkin's awesome_print on GitHub
What Awesome Print does is automatically color code your script/console and irb output when you use it as a pp replacement. What gets even more trick though, nay awesome, is triggering it automatically so all output is color coded. Here's the magic.
Edit your .irbrc file in ~ and add the following lines:
begin
  require "ap"
  IRB::Irb.class_eval do
    def output_value
      ap @context.last_value
    end
  end
rescue LoadError => e
  puts "ap gem not found. Try typing 'gem install awesome_print' to get super-fancy output."
end
You may also need to add this line at the top of the file:
require 'rubygems'
if your .irbrc file does not already contain it. Mine did, which is why I omitted it above.
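The trick in that snippet is plain Ruby. IRB::Irb.class_eval reopens IRB's class and replaces its output_value method, so every result goes through ap instead of the default printer. Here is the same technique on a toy class. `FakeRepl` is hypothetical and stands in for IRB::Irb, so you can try it without touching IRB's internals.

```ruby
# Toy stand-in for IRB::Irb: it formats the last result with inspect.
class FakeRepl
  def initialize(last_value)
    @last_value = last_value
  end

  def output_value
    "=> #{@last_value.inspect}"
  end
end

before = FakeRepl.new([1, 2]).output_value # "=> [1, 2]"

# Reopen the class and swap the method, as the .irbrc snippet does.
FakeRepl.class_eval do
  def output_value
    "(fancy) #{@last_value.inspect}"
  end
end

after = FakeRepl.new([1, 2]).output_value # "(fancy) [1, 2]"
puts before, after
```

Since method lookup happens at call time, even objects created before the class_eval pick up the new method.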
Now the results of anything in script/console will have Awesome Print used automatically:
ActiveRecord vs. Core Data
With the release of ArrGeeBee and the development of my first iPad application*, I recently found myself delving into a number of Objective-C technologies, including Core Data. Coming from a Ruby / Rails background, I found this really, really interesting, since both Rails's ActiveRecord and Core Data are built around Object Relational Mappers (ORMs). Let's consider a simple model in Ruby called Event that is backed by ActiveRecord:
class Event < ActiveRecord::Base
end
Here's a simple example of using an ActiveRecord finder to return all entries, sorted by creation date descending (using the newer-style syntax):
events = Event.all(:order => "created_at DESC")
When you're learning a new platform, one of the first things you do is look at it through the lens of the old one, so my question was how to write this in Objective-C using its native ORM.
The basic structure of retrieving data from Core Data is bundled in a fetch request which specifies an entity from the Core Data stack (the Event), a predicate to limit the number of results returned (similar to the :conditions => {...} hash in ActiveRecord), and a sort descriptor to place them into some sort of order. Since my example above omitted any conditions, I'll do the same here and leave out the predicate for this request.
// Create the Fetch request and scope it to the entity we're retrieving (in this case, an event).
NSFetchRequest *request = [[NSFetchRequest alloc] init];
NSEntityDescription *entity = [NSEntityDescription entityForName:@"Event" inManagedObjectContext:managedObjectContext];
[request setEntity:entity];
// Create a sort descriptor to order these events by their creation date in descending order.
NSSortDescriptor *sortDescriptor = [[NSSortDescriptor alloc] initWithKey:@"creationDate" ascending:NO];
NSArray *sortDescriptors = [[NSArray alloc] initWithObjects:sortDescriptor, nil];
[request setSortDescriptors:sortDescriptors];
[sortDescriptors release];
[sortDescriptor release];
// Set up a pointer to hold an error, in case the fetch request fails.
NSError *error = nil;
NSMutableArray *mutableFetchResults = [[managedObjectContext executeFetchRequest:request error:&error] mutableCopy];
if (mutableFetchResults == nil) {
    // Some amount of error handling here (inspect error for the cause).
}
// Finally, assign the result set into a controller variable called eventsArray, then clean up the last of the objects we were holding in memory.
[self setEventsArray:mutableFetchResults];
[mutableFetchResults release];
[request release];
That's about a 1:20 ratio in lines of code. It's never all about lines of code, but this one example dramatically spelled out for me a key difference between Objective-C and Ruby, and showed me just how easy Ruby makes things. I think, all too often, we overlook the important role that metaprogramming has played in the Ruby language and in Ruby's most popular frameworks. Things like object introspection have freed us from worrying about a lot of the little details around everyday tasks, such as retrieving records from a data store.
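For comparison with the fetch request above, its three pieces (entity, predicate, sort descriptor) map neatly onto plain Ruby collection calls. The sketch below works on an in-memory array of hypothetical Event structs rather than a data store. It is only an analogy for how the parts fit together, not how ActiveRecord or Core Data execute queries.

```ruby
# Hypothetical in-memory records standing in for the Event entity.
Event = Struct.new(:name, :created_at)

events = [
  Event.new("launch", Time.utc(2010, 4, 1)),
  Event.new("beta",   Time.utc(2010, 3, 1)),
  Event.new("ship",   Time.utc(2010, 5, 1))
]

# Predicate: like NSPredicate or :conditions. This one keeps everything,
# matching the example above, which has no conditions.
predicate = ->(event) { true }

# Sort descriptor: creation date, descending (ascending:NO / "created_at DESC").
results = events.select(&predicate).sort_by(&:created_at).reverse

p results.map(&:name) # => ["ship", "launch", "beta"]
```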
Now, that said, I've got my first app in the App Store, I've bought my first iPad and I'm barreling down this path. Onward!
* Coming real soon now. Interested? Drop an email to info@alloycode.com and I'll drop you on a mailing list. All I can tell you about it for now is that ... (Actually skip it -- just sign up for the mailing list.)
Fun with RMagick
As an avid photographer, I spend a lot of time tweaking my images to share them online. I use Adobe Lightroom for all my cataloging needs, but when it comes time to display the pictures on the internet, I like to add a bit of copyright information to try and ward off the most casual attempts to reuse my images without giving credit.
This is the sort of task that RMagick is incredibly well-suited to handle. Plus, it indulges my geeky side to manipulate images using Ruby code. Here's a snippet from my copyright script:
require 'rubygems'
require "RMagick"
include Magick
# Picture Section
picture = Magick::Image.read(ARGV[0]).first
width,height = picture.columns, picture.rows
# Overlay Section
overlay = Magick::Image.new(width, 20) {
self.background_color = "rgba(0,0,0,0.6)"
}
# Combine them!
picture.composite!(overlay, SouthGravity, MultiplyCompositeOp)
# Text Section
copyright = Draw.new
copyright.fill('white')
copyright.fill_opacity(0.75)
copyright.pointsize(14)
copyright.font_family('Helvetica')
copyright.font_weight(LighterWeight)
copyright.font_style(Magick::ItalicStyle)
copyright.text_align(RightAlign)
# Place the copyright onto the composite image
copyright.text(width - 5,height - 5,"© 2009 Jared Haworth")
copyright.draw(picture)
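# Insert "-final" just before the last extension (a plain sub(/\./) would hit the first dot in the path)
ext = File.extname(ARGV[0])
out = ARGV[0].chomp(ext) + "-final" + ext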
puts "Writing #{out}"
picture.write(out)
The code is fairly straightforward:
- I start by loading the image and storing the width and height.
- Next, I create a black overlay, with 60% opacity, the width of the original image and 20 pixels high.
- I apply the overlay to the original picture, using SouthGravity to place it centered on the bottom of the image.
- Then, I create my drawing object, set the fill color, opacity, and font options.
- I use the drawing object to place the text on the composite image, setting the baseline for the text 5 pixels above the bottom of the image, and setting the end of the text (right justified) 5 pixels in from the right border of the image.
- Finally, I append the text "-final" to the filename, just before the extension, and save it back to the filesystem.
The end result looks like this:

The RMagick documentation is a great help and has tons of examples. I've used variations on this same code to create wet floor effects, drop shadows, and those neat Polaroid-style images with the curly borders. And since it's all Ruby code, you can use the File and Dir objects to iterate over an entire directory of images, making batch processing drop-dead simple.
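Here is a minimal sketch of that batch loop, leaving RMagick out so it runs anywhere. `final_name` and `batch_plan` are hypothetical helpers. They pair each JPEG in a directory with its "-final" output name, and the copyright steps from the script would run once per pair. File.extname splits at the last dot, so a path like ./shots/a.b.jpg still gets "-final" placed right before .jpg.

```ruby
# Hypothetical helper: "photos/IMG_0042.jpg" -> "photos/IMG_0042-final.jpg".
# File.extname finds the *last* extension, so dots elsewhere in the path are safe.
def final_name(path)
  ext = File.extname(path)
  path.chomp(ext) + "-final" + ext
end

# Pair every JPEG in a directory with its output name, skipping files that an
# earlier run already produced. The RMagick steps would run once per pair.
def batch_plan(dir)
  Dir.glob(File.join(dir, "*.jpg")).sort
     .reject { |path| path.end_with?("-final.jpg") }
     .map { |path| [path, final_name(path)] }
end

p final_name("./shots/a.b.jpg") # => "./shots/a.b-final.jpg"
```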
Rails 2.1 Now Available
Coinciding nicely with the conclusion of RailsConf 2008, it would appear that Rails 2.1 has been released.
The feature I'm most excited about? Native Timezone support. June 2008 marks my two year anniversary of working with Rails, and of all the projects I've taken on, the majority have required working with time zones. This is a most welcome change indeed.
I'd just like to add my voice to the chorus saying 'Great Job!' to the Rails core team and the 1,400 other contributors that have all added to Rails in the past six months.
IRb Command History in Time Machine
Of the 300 new features in Leopard (Mac OS X 10.5), Time Machine may be the one that excites me the most. I've already used it to repair applications after failed updates, pull back browser history items from two months ago, and restore emails that I had inadvertently flushed.
Yesterday, I found a new use for Time Machine: reviving IRb command history.
Over at eduFire, we've been working on converting user profile pictures from file system storage to S3-backed storage. Since we're using the attachment_fu plugin, it seemed that the easiest way would be to just create a new instance of the user profile picture and feed in the path to the file.
While using attachment_fu in a controller is beautifully simple, requiring just an uploaded_data parameter, manipulating files through script/console is a bit more difficult. I knew we had wrestled with this back in February, when we changed the default thumbnail resolutions, and had to reprocess everyone's photos. What I had failed to do back then was document how we actually achieved it.
I have my IRb history logging to ~/.irb_history, but with only 500 lines of scrollback, I knew that there was no chance an entry from three months ago had survived. My first attempt to view the file in Time Machine was unsuccessful, as I couldn't seem to figure out how to get Time Machine to display dot-files.
However, the Time Machine archive is browsable in Terminal; after finding the path to the relevant backup (which looked something like this: /Volumes/Time Machine Drive/Backups.backupdb/AlloyCode/2008-03-03-000650/Macintosh HD/Users/jared), I was able to open the .irb_history file in TextMate and find the command I had lost.
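If you'd rather not scroll through the whole file in an editor, a few lines of Ruby can pull out the matching commands with their line numbers. `grep_history` is a hypothetical helper, and the path in the usage comment is only illustrative. Point it at wherever the restored .irb_history lives.

```ruby
# Hypothetical helper: list numbered lines from a history file that match a
# pattern, e.g. a copy of .irb_history restored from a Time Machine backup.
def grep_history(path, pattern)
  File.foreach(path).each_with_index
      .select { |line, _index| line.match?(pattern) }
      .map { |line, index| "#{index + 1}: #{line.chomp}" }
end

# Example (path is illustrative):
# puts grep_history(File.expand_path("~/.irb_history"), /temp_path/)
```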
For those who might need to do a similar operation in the future, here's how to recreate a file attachment in script/console:
Avatar.create(:filename => filename, :content_type => content_type, :temp_path => temp_path)
In the above example, filename corresponds to the name to save the attachment as on the server, content_type is the MIME type, and temp_path is the full path to the actual asset.
Another Passenger on the mod_rails
A lot has been said about running Rails applications on shared hosting, most visibly in these two articles by David Heinemeier Hansson and Dallas Kashuba (on the DreamHost blog). For the past eight months, Alloy Code and Your Garage Online have been running on a single 256mb slice from Slicehost, and while it was possible to get the two sites to happily co-exist, I've always regarded it as something of a delicate house of cards, just waiting for a gust of wind or slammed door in a neighboring apartment to knock the whole stack down. Well, two weeks ago, I made a serious change in back-end configuration, and I couldn't be happier. Indulge me for a moment, because I feel that a little history is appropriate…
Initially, the slice was configured with Apache 2.2 acting as a proxy for two mongrel clusters. The resources of the slice itself allowed me to run two mongrels for the blog, and three for Your Garage Online. Since the majority of the blog's content is cached static pages, two mongrels seemed like plenty, and only suffered a slight delay in attempting to access the admin interface, or other dynamic content.
Then I came across Ezra's article on nginx, and I invested the better part of a weekend switching from a pure Apache/Mongrel setup, to something of a strange hybrid. Since nginx wouldn't honor SVN webdav connections properly, I had to keep an apache instance handy, but restricted to listening on a high-numbered port, with nginx forwarding requests intended for my SVN repositories back into Apache. Meanwhile, nginx had two other listeners set up, one forwarding Alloy Code traffic to the blog's mongrel cluster, and one forwarding Your Garage Online traffic to the other mongrel cluster.
A few months ago, I heard about Thin while listening to the Rails Envy Podcast, and the idea of using Unix sockets instead of TCP to forward the proxy requests really appealed to me. Unfortunately, even though I had upgraded my slice's copy of Ruby to 1.8.6, I was never able to get the Thin gem to install. Later, I discovered it was because my RubyGems installation was linked against the older 1.8.5 ruby binary. I never did find a good way to switch the Ruby version the gem command uses. Instead, I had to reinstall RubyGems from scratch, using the 1.8.6 binary to run the setup.rb file.
So, when Passenger came out, the configuration-tweaker in me was very excited to give it a try. Once again, I set aside the better part of a weekend to get the installation going. After sorting out the gems issue above, and recompiling Apache to include prefork support, I was ready to roll.
The installation instructions provided were enough to get me 90% of the way there. I had overlooked the fact that, eight months ago, I told Apache to only listen for connections on port 8010 (part of the nginx-svn debacle). And, it turns out that when running two rails applications on a single host, with a single IP address, I needed to provide a little extra context to make certain the static content was handled properly.
Since the hostname of the box itself is alloycode.com, the setup for the blog is as sparse as the Passenger sample:
<VirtualHost *:80>
    ServerName alloycode.com
    DocumentRoot /path/to/blog/public
</VirtualHost>
Setting up Your Garage Online was a little trickier. My first attempt was just to mirror the configuration above, with the proper ServerName and DocumentRoot settings. Unfortunately, while the Rails stack did load and process requests properly, none of the stylesheets, images, JavaScripts, or other static assets could be loaded. After carefully investigating each of the options in the Passenger documentation, I managed to put together this VirtualHost definition, which seemed to do the trick:
<VirtualHost *:80>
    ServerName yourgarageonline.com
    ServerAlias www.yourgarageonline.com
    DocumentRoot /path/to/ygo/public
    <Directory "/path/to/ygo/public">
        Options FollowSymLinks
        AllowOverride None
        Order allow,deny
        Allow from all
    </Directory>
    RailsBaseURI /
</VirtualHost>
And like that, it was as if a switch was flipped, and everything was working 100%. It's now been two weeks since I put Passenger on that slice, and I haven't had any outages or runaway memory problems, and I no longer have to worry about my web server and my mongrel cluster getting out of sync, resulting in dreaded 503 Service Unavailable errors.
Kudos to the Phusion folks, this is one incredible release.
Heroku Hijinks
Heroku, a Y Combinator funded startup, has been making headlines lately! Mentions on the official Ruby on Rails blog, the Ruby on Rails Podcast, and even an article on TechCrunch outlining some of the upcoming features. Personally, I've been using it since late last year, and I've found it to be immensely useful in a couple of situations.
Scenario 1: While interviewing for my current job, I wanted to show my interviewers the application Keith and I had developed for the Rails Rumble. Thanks to some trademark issues with a certain corporation whose devices have become synonymous with copiers, the document collaboration application didn't have an online presence. Rather than attempt to put it back online with my current shared hosting (already stretched to the limit by this blog and the YGO application) or secure new shared hosting, I prepared to zip up the entire application and upload it to my Heroku account.
- The first thing I had to do was unpack the gems used by the application, such as attachment_fu, into my /vendor directory, and make sure I added them to my config/environment.rb load path.
- Secondly, I removed all my existing log files... no need to push an additional 300mb of data online!
After zipping the app and uploading it, I just needed to click the 'migrate' button inside the Heroku editor window, and I was in business. By virtue of putting the app online in a private state, I could ensure that only my interviewers were able to see it, and I was able to take it back offline once the interview process had concluded.
Scenario 2: In doing some freelance work on a contract tracking app, I wanted to show my client a live preview of my progress and get some early feedback on the workflow. Again, rather than expose the whole application through my existing hosting, I created a zipped version of the application and added it to Heroku.
This time, I was able to show him the current 'snapshot' of the application, and instantly incorporate minor tweaks without even needing to reload a mongrel instance or re-upload any files. Small details, like the text on a form label or the size of an element could easily be adjusted on the fly, in accordance with the client's wishes; then, those changes were easily mirrored back down to my local copy to continue development.
All in all, I'm very impressed with Heroku. While I don't make extensive use of the in-browser editor, the few times I've had to make tweaks to a running instance, it's been very easy to do so right in the browser. Plus, having a browser based interface to my favorite ruby tool, script/console, has been incredibly useful.
Lastly, I think Heroku illustrates an excellent usage of Amazon Web Services as part of a business model, particularly in being able to make using EC2 easily available to the end user, along with an easy way to pass along the associated costs to that user. I'll be very excited to see where the future takes them.
Learning RSpec, Part II
UPDATED: April 22nd, 2010
I've been notified by one of my eagle-eyed readers that the solution originally posted at the end of this article is no longer correct. For whatever reason, RSpec no longer returns "Status" as one of the keys for the header.
The new, improved syntax is as follows:
it "should fail with invalid credentials" do
  get :index
  response.should_not be_success
  response.status.should =~ /401/
end
Original Entry Follows
In the first entry, one of the issues I addressed was stubbing a response to an HTTP-Basic Authentication scheme. To get my spec to pass, I stubbed the entire authenticate method which I had written. While this approach was successful, it left me with a hole in my code coverage:

Now that I knew my controllers' actions were being tested, I needed a good way to test the authenticate method itself, to ensure that failed requests were not being let through. A quick glance in the Rails source gave me some clues to implementation.
Since the credentials are sent (encoded) in the HTTP header, the first thing I needed to do was inject those credentials into the header before the request was processed. Rails core (using Test::Unit) uses the following convenience method in /actionpack/test/controller/http_authentication_test.rb:
def set_headers(value = @credentials, name = 'HTTP_AUTHORIZATION')
  @controller.request.env[name] = value
end
The encoded credentials are passed in and assigned to the request environment. Accomplishing this in RSpec wasn't much different. First, we construct a set of encoded credentials, and then we assign them to the environment variable 'HTTP_AUTHORIZATION'. Note that in the RSpec code below, we don't call @controller.request.env, but simply request.env.
before(:each) do
  @credentials = ActionController::HttpAuthentication::Basic.encode_credentials("david", "clearly_false")
  request.env['HTTP_AUTHORIZATION'] = @credentials
end
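The value that ends up in that header is simple. Under the HTTP Basic scheme, it's the word "Basic", a space, and the Base64 encoding of "user:password". Below is a plain-Ruby sketch of a value with that shape. `basic_credentials` is a hypothetical helper that illustrates the format. It is not the source of Rails' encode_credentials.

```ruby
# Sketch of an HTTP Basic Authorization header value:
# "Basic " followed by Base64("user:password").
def basic_credentials(user, password)
  "Basic " + ["#{user}:#{password}"].pack("m0") # "m0" = strict Base64, no newlines
end

header = basic_credentials("david", "clearly_false")
p header.start_with?("Basic ")            # => true
p header.sub("Basic ", "").unpack1("m0")  # => "david:clearly_false"
```

Decoding the header like this is also a handy sanity check when a spec unexpectedly lets bad credentials through.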
This allowed me to create a series of tests to check that a user providing invalid credentials was not being allowed into the protected areas of the site.
it "should fail with invalid credentials" do
  get :index
  response.should_not be_success
  response.headers["Status"].should =~ /401/
end
Two fairly simple expectations: when a resource is requested with invalid credentials, the response should not be a success, and the status code should match 401 (Unauthorized).
RSpec is definitely growing on me, and I've noticed some BDD disciplines starting to creep into my approach to testing with Test::Unit in my day job.
Learning RSpec
The end of 2007 has brought about some exciting changes for me, most notably a new job with Education Revolution. To try and ease the transition during a stressful holiday season, I gave myself a week off between jobs, which left me with a bit of unexpected time on my hands between Christmas and New Year's. I decided to make the most of my time off and try to learn RSpec.
RSpec is quickly becoming a darling among some of the visionaries in the Rails world. With the release of RSpec 1.1, which brings easy integration between Test::Unit-style testing and Behavior Driven Development using RSpec, it seemed prudent to try it out myself and see how it fits.
On the whole I really like the idea of specifying behaviors instead of assertions, especially when coupled with doing some design-driven development. As a developer, having both a clear set of expected behaviors and a set of slides showing the application in a near-finished state removes a lot of the guesswork that had been plaguing me on previous projects. Also, hearing Adam Williams and John Long talk at November's Raleigh.rb meetup about doing top-down testing (starting with integration tests, then drilling down to functional and unit tests merely to handle edge cases) has turned my opinion of integration tests on its head.
Having come from a metaprogramming-driven Test::Unit background using Mike Clark's TestRig framework, I found that RSpec felt like a lot more testing code. And indeed, the output of rake stats seems to support that, showing a 1:2.8 code-to-test ratio for my training project. Finding help with RSpec has been tricky, though. There are some really great contrived examples on the RSpec homepage, but I had trouble finding more information on how to deal with presenters, nested resources, and HTTP Basic authentication. I want to show off two solutions that I cobbled together from answers found online.
The first problem I encountered was dealing with the save! method in ActiveRecord. My controllers typically use save! coupled with a rescue clause for ActiveRecord::RecordInvalid, like so:
class WidgetsController < ApplicationController
  def create
    @widget = Widget.new(params[:widget])
    @widget.save!
    respond_to do |format|
      format.html { redirect_to @widget }
    end
  rescue ActiveRecord::RecordInvalid => e
    respond_to do |format|
      format.html { render :action => 'edit' }
    end
  end
end
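To make the difference between save and save! concrete without loading Rails, here is a sketch built on hypothetical stand-in classes: RecordInvalid and Widget imitate ActiveRecord::RecordInvalid and a model with one validation, and create mirrors the rescue flow of the action above, returning a small array instead of actually redirecting or rendering.

```ruby
# Hypothetical stand-ins for ActiveRecord, for illustration only.
class RecordInvalid < StandardError
  attr_reader :record

  def initialize(record)
    @record = record
    super("Validation failed")
  end
end

class Widget
  attr_reader :name

  def initialize(attrs = {})
    @name = attrs[:name]
  end

  def valid?
    !(name.nil? || name.empty?)
  end

  # save signals failure by returning false...
  def save
    valid?
  end

  # ...while save! raises, so the caller has to rescue.
  def save!
    raise RecordInvalid.new(self) unless valid?
    true
  end
end

# Mirrors the controller action: the happy path sits at the top and the
# rescue clause handles the invalid-record branch.
def create(params)
  widget = Widget.new(params[:widget] || {})
  widget.save!
  [:redirect_to, widget]
rescue RecordInvalid => e
  [:render, "edit", e.record]
end

p create(widget: { name: "Sprocket" }).first # prints :redirect_to
p create(widget: { name: "" }).first         # prints :render
```

Because save! never returns false, a spec that stubs save to return false never reaches the rescue branch, which is the gap the next example runs into.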
Most of the RSpec examples I had come across showed something like this (excerpted from Testing Controllers with RSpec):
def do_create
  post :create, :menu_item=>{:name=>"value"}
end

it "should save the menu item" do
  @menu_item.should_receive(:save).and_return(false)
  do_create
end
The problem is that save! doesn't return false on failure; it raises an exception. Fortunately, the answer was available on the RSpec-users mailing list. Now my widgets_controller_spec.rb contains the following:
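it "should fail to save the widget" do
  @widget.should_receive(:save!).and_raise(ActiveRecord::RecordInvalid.new(@widget))
  # Trigger the action so the expectation is exercised
  # (assumes a do_create helper, like the one above, that posts to :create).
  do_create
end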
The second major stumbling block I encountered was dealing with HTTP Basic authentication. The application I was building didn't require a huge, complicated account/password structure; it just needed a few protected pages available to a single administrator. Here is the filter in ApplicationController:
class ApplicationController < ActionController::Base
  before_filter :authenticate

  def authenticate
    authenticate_or_request_with_http_basic do |username, password|
      username == 'jared' && password == 'secret'
    end
  end
end
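Roughly speaking, authenticate_or_request_with_http_basic decodes the Authorization header, passes the username and password to the block, and challenges the client with a 401 if the block returns false. The sketch below reproduces that flow in plain Ruby. It is not Rails' source: the function name basic_auth_gate, the [status, headers] return value (with 200 standing in for "let the action run"), and the "Application" default realm are assumptions made for illustration.

```ruby
# Hypothetical sketch of the HTTP Basic check: decode the header, yield the
# credentials, and answer with a 401 challenge when they are rejected.
def basic_auth_gate(env, realm = "Application")
  scheme, encoded = env["HTTP_AUTHORIZATION"].to_s.split(" ", 2)
  if scheme == "Basic" && encoded
    # Split on the first colon only; passwords may contain colons.
    username, password = encoded.unpack1("m").split(":", 2)
    return [200, {}] if yield(username, password)
  end
  # Missing or wrong credentials: ask the browser to prompt for them.
  [401, { "WWW-Authenticate" => %(Basic realm="#{realm}") }]
end

check = proc { |username, password| username == "jared" && password == "secret" }
good = { "HTTP_AUTHORIZATION" => "Basic " + ["jared:secret"].pack("m0") }
bad  = { "HTTP_AUTHORIZATION" => "Basic " + ["david:clearly_false"].pack("m0") }

status, = basic_auth_gate(good, &check)
puts status # prints 200
status, headers = basic_auth_gate(bad, &check)
puts status # prints 401
puts headers["WWW-Authenticate"]
```

The 401 path is exactly what the "invalid credentials" spec earlier in this article asserts on.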
Suddenly, all my controller specs for actions lying behind that authenticate filter were failing. The fix was to stub out the method on the controller object that RSpec exposes in the spec:
describe WidgetsController do
  describe "with successful admin login" do
    before(:each) do
      controller.stub!(:authenticate).and_return(true)
    end
    ...
  end
end
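The idea behind that stub is that the method is replaced on one object only, so the real filter never runs for that controller instance. The plain-Ruby sketch below imitates the effect with define_singleton_method. It is not how RSpec implements stub!, and WidgetsControllerStandIn is a hypothetical class in which calling authenticate at the top of the action stands in for the before_filter.

```ruby
# Hypothetical stand-in for a controller protected by an authenticate filter.
class WidgetsControllerStandIn
  def authenticate
    # Stands in for the 401 challenge a request without credentials gets.
    raise "401 Unauthorized"
  end

  def index
    authenticate # plays the role of before_filter :authenticate
    :rendered_index
  end
end

controller = WidgetsControllerStandIn.new
# Replace authenticate on this one object only, which is the effect
# controller.stub!(:authenticate).and_return(true) has in the spec.
controller.define_singleton_method(:authenticate) { true }
p controller.index # prints :rendered_index

# Other instances keep the original, unstubbed behavior.
begin
  WidgetsControllerStandIn.new.index
rescue RuntimeError => e
  puts e.message # prints 401 Unauthorized
end
```

This is also why the stubbed specs say nothing about whether authenticate itself works, the gap the follow-up entry at the top of this article closes.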
Of course, I still need to come back and write an integration test which will address both the success and failure states of the authenticate method.
All in all, I'm very impressed with RSpec, and I can see why it's picking up such a following. I'm definitely going to play around with it further, but I'm not quite ready to say that I'm going to switch all my projects over. One important factor to consider is future code maintainability: the pool of talented Rails developers is small enough to begin with, and adding the requirement that a Rails developer also be versed in RSpec limits that result set even further.
Advanced YAML Fixtures Gotchas
I was very excited to use the new Advanced YAML Fixtures introduced in Rails 2.0, so I set about updating my code in an Edge Rails application I've been working on for the past year or so. There are a few caveats in using the new fixture styles, though.
The fixtures are introspective, so if you've overridden the default name of an association in your model, you'll need to use that new name in your fixture. For example, given the following model:
class Worker < ActiveRecord::Base
  belongs_to :status, :class_name => 'WorkerStatus', :foreign_key => 'worker_status_id'
  ...
end
The key in the workers.yml fixture file should be status:, not worker_status:.
The larger problem I encountered was in using a mix of IDs and fixture record names. I use a testing framework, TestRig, which was written by Mike Clark and Dave Thomas for the Pragmatic Studio. The TestRig framework takes the IDs of fixture records in the database and loads them accordingly. This falls apart with the new status: full_time style of declaring relationships in fixtures, because TestRig relies on a 'known' ID, while the new style generates a seemingly random ID from the label instead. This leads to a bunch of broken relationships throughout the test suite.
According to the API documentation, the generated ID is constant, so I could discover that ID and use it in the TestRig calls which require an integer key, but then I've essentially lost all benefit of using the Advanced YAML Fixtures, and added some crazy complexity to my tests as well.
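To see why the generated ID is constant yet not a "known" small integer, here is a sketch of a label-style fixture being resolved into a row. It assumes the CRC32-of-label scheme that recent versions of Rails use in ActiveRecord::FixtureSet.identify; the Rails 2.0 scheme may differ. The fixture_id and resolve helpers, the FOREIGN_KEYS map, and the jane fixture are hypothetical.

```ruby
require "yaml"
require "zlib"

# Assumed label-to-ID scheme: CRC32 of the label, modulo 2**30 - 1.
MAX_FIXTURE_ID = 2**30 - 1

def fixture_id(label)
  Zlib.crc32(label.to_s) % MAX_FIXTURE_ID
end

# A hypothetical workers.yml: it uses the association name (status),
# not the column name (worker_status_id).
WORKERS_YML = <<~YAML
  jane:
    name: Jane
    status: full_time
YAML

# Association name => foreign key, as declared in the Worker model.
FOREIGN_KEYS = { "status" => "worker_status_id" }.freeze

def resolve(fixtures)
  fixtures.to_h do |label, attrs|
    row = { "id" => fixture_id(label) }
    attrs.each do |key, value|
      if FOREIGN_KEYS.key?(key)
        row[FOREIGN_KEYS[key]] = fixture_id(value) # label becomes an ID
      else
        row[key] = value
      end
    end
    [label, row]
  end
end

p resolve(YAML.safe_load(WORKERS_YML))["jane"]
```

Under this assumption, a TestRig call that needs the integer key for full_time would have to compute fixture_id("full_time") instead of passing something like 1, which is exactly the extra complexity described above.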
De-ASCII Your Rails Logs
Recently, I've been delving into some filter-related problems in one of my Rails applications. This has required me to trap and review specific segments of my Rails log files. As part of my initial stack setup when I begin a new project, I install the Query Analyzer and Query Trace plugins.
The upside is that I get very detailed trace information for my application.
The downside is that when these files are viewed in TextMate or Console, they end up looking more like this:
Rendered events/_event (0.00036)
[4;35;1mSlot Load (0.000361)[0m [0mSELECT * FROM 'slots' WHERE (slots.event_id = 23) [0m
[35;2mvendor/plugins/query_analyzer/lib/query_analyzer.rb:38:in 'select'[0m
[35;2mlib/association_extensions/chronological.rb:9:in 'first'[0m
[35;2mapp/models/event.rb:109:in 'begin_at'[0m
[35;2mapp/views/events/_event.html.erb:6:in '_run_erb_47app47views47events47_event46html46erb'[0m
[35;2mapp/views/events/index.html.erb:39:in '_run_erb_47app47views47events47index46html46erb'[0m
Today's "stretch the brain" exercise centered around creating a TextMate command to clear out this unnecessary cruft. For starters, I cloned the Text bundle's "Remove Unprintable Characters in Document / Selection" command. It looks like the TextMate folks are using Perl to remove unprintable characters:
perl -pe 's/[^\t\n\x20-\xFF]|\x7F|\xC2[\x80-\x9F]//g'
I left all the settings unchanged:
Save: Nothing
Input: Selected Text or Document
Output: Replace Selected Text
Then I edited the Perl regular expression to match each of the three variants of ANSI color code in the log (for example [4;35;1m, [35;2m, and [0m):
perl -pe 's/\[\d;?(\d+)?;?(\d)?m//g'
Success! Now, with one keystroke, I can remove all that extra information from my log file, and get down to business.
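The same cleanup can be sketched in Ruby, the language the rest of the stack already uses. This version assumes the raw log holds the standard ANSI form, where each color code starts with an escape character (\e) that the editor doesn't display, so the pattern removes that byte as well. The strip_ansi name is hypothetical.

```ruby
# Matches ANSI color/style sequences such as "\e[4;35;1m", "\e[35;2m" and
# "\e[0m": ESC, "[", any run of digits and semicolons, then "m".
ANSI_COLOR = /\e\[[\d;]*m/

def strip_ansi(text)
  text.gsub(ANSI_COLOR, "")
end

line = "\e[4;35;1mSlot Load (0.000361)\e[0m \e[0mSELECT * FROM 'slots'\e[0m"
puts strip_ansi(line)
```

As a filter, the one-liner ruby -pe '$_.gsub!(/\e\[[\d;]*m/, "")' reads standard input and prints each cleaned line, so it should drop into a TextMate command with the same Input and Output settings as the Perl version.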
