Archive for the “Technology Guides” Category

I feel like I’m in the homestretch of my migration off of my current hosting provider, FullControl. Nothing against these guys; they’ve been an absolutely stellar service provider. I just don’t need the dedicated virtual server I was paying for with them. The short story is that it came down to rightsizing my hosting provider to align with my current needs. I’ll tell the somewhat longer version of the story in this blog post, though, since there are a couple of interesting side notes along the way.

I have three requirements of a hosting provider. Once a provider fulfills these three requirements, I’m looking to optimize on cost. My three requirements are:

  • Host WordPress blogs.
  • Provide Subversion source control services.
  • Support OSQA, which essentially means running Python and Django.

Both my ex-provider and my new provider met these three requirements – my ex-provider at the high-cost end and my new provider at the low-cost end. At both extremes of the price spectrum, they still have similar architectures – a single server that can host PHP, Python, and MySQL. Despite the fact that one is Windows and one is Linux, they’re both standard hosting stacks.

When I first started thinking about moving hosting providers, I considered some slightly more esoteric approaches, especially as they relate to blog hosting. I did a bit of probing and they all fell short in one area or another but are worth mentioning just due to the irregular architectures they embody.

  1. WordPress on Windows Azure. You can most certainly host WordPress on Windows Azure and SQL Azure. Zach Owens is an evangelist for Microsoft who is supporting this and blogs all about it. It sounds interesting, but I get the sense that this is just some sort of Microsoft pet project and the floor could drop out from under it at any time.
  2. WordPress on Amazon EC2 Micro Instances. I loved the ideas Ashley Schroder presented in his blog post on clustering WordPress on EC2 micro instances. His approach and experiences are worth reading about and will cause you to think about and investigate the EC2 spot instance pricing structure, if nothing else.
  3. BlogEngine on EC2 Micro Instances using SQL Azure. A radical extension of Ashley’s ideas onto the Microsoft platform: host BlogEngine.NET on EC2 Micro Instances and talk to SQL Azure on the back end. This fell apart because of BlogEngine’s architecture, which many posts indicate doesn’t scale out at all due to limitations in the DAL and caching layers.

The more I thought about it, the more I realized I just want a stack that works for my personal web apps. As exciting as the above options were, they sounded like massive black holes that would suck in my free time. I ultimately decided to go with a simple solution: the tried-and-proven Dreamhost (http://www.dreamhost.com), a Linux provider, for my needs. I get what I need for less than $10 US per month, and I can spin up Amazon EC2 spot instances when I need a throw-away playground. The move over was a lot easier than I expected, consisting of the following three steps:

  1. Export the WordPress content from my old provider’s WordPress site and import it into my new site.
  2. Flip over DNS to point at my WordPress blog on my new provider’s site. This included flipping over DNS for all of my binary content (e.g. images) that I host on Amazon S3 and redirect to with a CNAME entry from a beckshome subdomain (see the example record after this list).
  3. Flip the switch on the DNS routing for Google Apps after I noticed my beckshome.com email had dried up for a couple of days.
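For reference, the S3 piece of step 2 boils down to a single record in the DNS zone. The subdomain below is a made-up example rather than my actual record; note that S3 requires the bucket to be named after the full hostname for the CNAME to resolve:

media.beckshome.com. IN CNAME media.beckshome.com.s3.amazonaws.com.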


I’m sure if I had a nickel for each time a software project was impacted by introducing production-volume data into the testing life cycle either too late or, worse, not at all, I’d be a rich man and wouldn’t be writing this blog entry. When you think about it, it’s really no wonder that we find ourselves in this situation. Developers new to the craft have no experience to draw on in dealing with millions of rows of data. Experienced developers and DBAs often pass on war stories of hand-crafted scripts and the perils of migrating data from production to lower environments, further reinforcing the belief that emulating production volumes of data is work restricted to the gods of IT.

Taking this trend a step further, applications are often exposed only for the briefest of periods to data resembling production volumes during the test cycle. Even then, the data reflects yesterday’s production volumes and not next year’s volumes. Furthermore, testing of certain functions is restricted because those functions either deal with new data for which there is, as yet, no production data, or they interact with external systems whose test data is out of sync with the data of the system under test.

Does preparing production volume data really need to be this difficult?
No!

This entry deals with the use of Red Gate Software’s SQL Data Generator product to generate a large volume of data for a simple test database. Why would one choose to use a product such as SQL Data Generator to generate test data instead of using alternate methods such as copying production data or creating custom data generation scripts? There are several reasons:

  1. Obfuscating personally identifiable information (PII) in production data is a painful process (see point #2).
  2. Syncing data between systems is a very painful process, even more so if the obfuscated data from point #1 is used across systems.
  3. By using production data, you’re only testing for current capacity, not for planned capacity 3, 6, or 12 months from now.
  4. If the application, or elements of it, are new, you may not have any relevant data to test with at all.
  5. Writing custom data generation scripts is either (i) a one-off process that yields brittle scripts tied to a particular version of the schema or (ii) an exercise in reinventing the wheel, since commercial tools have already been built to do this.

Sample Data Model

Our data model is simple enough to be readily understandable while still presenting a couple of challenges that illustrate some of the features of the SQL Data Generator tool. The data model serves as the backend for an online travel site that collects and manages community-driven travel recommendations. Think of the book 1,000 Places to See Before You Die reimagined as a Web 2.0-ish site. Users can enter new tours / places, bundle similar or geographically close tours into tour packages, and provide user-specific tags for both tours and packages.

Generating Data - Data Model

Several characteristics of this data model are somewhat challenging and provide an opportunity to illustrate some of SQL Data Generator’s more advanced options. These characteristics are:

  • Both the package and tour ids are unique identifiers (GUIDs). They are referenced by the ContributionId column in the Tags table, but there is no foreign key constraint. That is, a ContributionId is a GUID that may match up with either a tour id or a package id.
  • The sequence numbers within the TourPackages table represent the visual display order of the tours within the package. The sequence numbers therefore cannot be random; they must cycle through each of the tours in a package without repeating within that package.
  • The data generated for the model has to follow statistical distributions representative of the production environment (see the sketch after this list), such as:
    • There should be 5 times as many tours as users, with the number of tours per user following a normal distribution between 1 and 10 tours.
    • The total number of packages should be 40% of the total number of tours. Tour distribution amongst the packages should be random.
    • Total tags should be 60% of total tours. The vast majority of these tags (almost 10-to-1) should be attributed to tours; the remainder are attributed to tour packages.
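To make the ratios and the sequencing rule concrete, here is a small Ruby sketch of the target populations (Ruby purely for brevity; the tool itself is configured through its GUI). The base user count, the tours-per-package range, and the exact tag split are illustrative assumptions, not output from SQL Data Generator:

require 'securerandom'

# Illustrative base population; real targets would come from capacity planning.
user_count    = 10_000
tour_count    = user_count * 5                 # 5x as many tours as users
package_count = (tour_count * 0.40).to_i       # packages are 40% of tours
tag_count     = (tour_count * 0.60).to_i       # tags are 60% of tours
tour_tags     = (tag_count * 10 / 11.0).to_i   # ~10-to-1 split favoring tours
package_tags  = tag_count - tour_tags

tours    = Array.new(tour_count)    { SecureRandom.uuid }
packages = Array.new(package_count) { SecureRandom.uuid }

# Sequence numbers cycle 1..n within each package and never repeat inside it.
tour_packages = packages.flat_map do |package_id|
  tours.sample(rand(1..5)).each_with_index.map do |tour_id, i|
    { :package_id => package_id, :tour_id => tour_id, :sequence => i + 1 }
  end
end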

Basic SQL Data Generator Capabilities

Creating a project with SQL Data Generator is as easy as selecting the database you wish to generate data into.

Generating Production Data - Project Configuration

Once the project is created, SQL Data Generator will infer information about the data based upon the column types and other characteristics. You can then review sample data and tweak the configuration options to meet your needs.

Generating Data - Column Generation Settings

Here, we specify valid values for the Tour table’s longitude column (for example, constraining the range to -180 through 180). Changes to the generator settings are immediately reflected in the sample data set, providing the opportunity to validate the impact of the changes.

Generating Data - Previewing Generated Data

Intermediate SQL Data Generator Capabilities

SQL Data Generator makes it easy to specify the mechanism that determines how many rows to generate. This enables the data to be generated in proportion to production ratios, as stipulated in our requirements.

Generating Data - Specifying Counts and Ratios

These same capabilities let us address the requirement around TourPackage sequence numbers by having SQL Data Generator generate the combinations within the TourId/PackageId composite key space.

Lastly, SQL Data Generator can use alternate generator sources, such as the output of SQL statements or direct input from a CSV file. In our case, this allows us to specify a SQL statement to pull the appropriate Ids from the Tour and Package tables for the reference Id values in the Tag table even though no explicit foreign key relationship is present.
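For the Tag table’s ContributionId column, for instance, the generator’s source might be a statement along these lines. The table names come from the data model above, but the exact column names and SQL are an illustrative assumption, not output from the tool:

SELECT TourId AS ContributionId FROM Tour
UNION ALL
SELECT PackageId AS ContributionId FROM Package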

Generating Data - Using a SQL Generator

Advanced SQL Data Generator Capabilities

Custom generators can be created for use with the SQL Data Generator. This enables domain specific data to be generated and the generators to be re-used across multiple projects. Custom generators are written in .NET code by implementing one of RedGate’s generator interfaces. Although this is not particularly difficult, it is beyond the scope of this post.

Generating Data

Once the generation options are specified in accordance with the requirements, the only thing left to do is generate the data. The data population action plan gives you an idea of what data will be going where.

Generating Data - Population Action Plan

Running the generation script against a local SQL Server Express installation on a small (one processor, 2 GB RAM) VMWare machine, SQL Data Generator was able to generate 420,000 records across 5 tables in less than 1 minute, yielding a total database size of about 400 MB.

Generating Data - Data Report

Other SQL Data Generator Capabilities

At this point, I’m hoping you’re at least starting to believe that data generation can be fast and easy. There are several other benefits of data generation with SQL Data Generator that weren’t covered here:

  • The project seamlessly incorporates changes to the underlying schema.
  • The SQL Data Generator project file (extension “.sqlgen”) can be version controlled alongside the scripts that create the database, providing the ability to create and fully populate current and historic versions of the database to align with application code changes.
  • If the seed numbers are not changed, the generated data is exactly the same across runs. If you need new or different data, change the seed number.



I was performing functional tests on my models that employed Attachment_Fu this morning and thought it would be worthwhile to share the code since it was a bit of a hassle pulling it together. Kudos to Mike Subelsky for his introduction to functional testing Attachment_Fu. It got me going in the right direction. What proved difficult once again was the multi-model controller. Once I got over that hump, I was on my way. As you can see from all the detail in the HTTP POST below, that was not an entirely easy task.

class ProductsControllerTest < Test::Unit::TestCase
  ...
  def test_create_with_user
    num_products = Product.count
    # Build multipart upload fixtures for the two attachment models
    imgdata = fixture_file_upload('/files/image.png', 'image/png')
    audiodata = fixture_file_upload('/files/sound.mp3', 'audio/mpeg')
    # Multi-model POST: product attributes plus the image and audio
    # attachments, with the session hash (a valid user) as the third argument
    post :create,
         {:product => {
            :name => "Widget",
            :description => "A small tool-like item",
            :weight => "3",
            :price => "19.99",
            :language_id => "1"
          },
          :image => {:uploaded_data => imgdata},
          :audio => {:uploaded_data => audiodata},
          :html => { :multipart => true }},
         {:user_id => users(:valid_active_user).id}
    assert_response :redirect
    assert_redirected_to :action => 'show'
    assert_equal num_products + 1, Product.count
  end
  ...
end


Continuing my Rails on Windows thread, I’m going to spend a bit of time on something that’s brought me both some substantial gains and some minor woes lately: running the Attachment_Fu plugin on Windows. I’ll start off with some general Attachment_Fu information and then get into some of the quirks, which are, as expected, mostly specific to the Windows environment.

Attachment_Fu On Windows

First, for those not in the know, Attachment_Fu is a Rails plugin that allows you to store binary data (e.g. images, video, documents) and associate it with other models in your Rails application. Metadata (content type, size, height, width) about the attachment is stored in a separate model. Attachment_Fu’s sweet spot is handling images. It can handle automatic image conversion and thumbnailing using a number of popular image processors such as ImageScience, RMagick, or MiniMagick. Although not provided, you can imagine that Attachment_Fu might be extended to handle other types of binary processing utilities, such as PDF converters or audio/video transcoding software. The other very cool thing about Attachment_Fu is that it provides support for pluggable persistence mechanisms. Out of the box, it allows for storage on the file system, as binary information in a database, or on Amazon’s S3 storage service.
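For a taste of what this looks like in practice, here is a minimal model declaration; the size limit, resize geometry, and thumbnail settings below are illustrative choices, not plugin defaults:

class Image < ActiveRecord::Base
  # Store files on the file system; :storage can also be :db_file or :s3
  has_attachment :content_type => :image,
                 :storage      => :file_system,
                 :max_size     => 5.megabytes,      # illustrative limit
                 :resize_to    => '640x480>',       # illustrative geometry
                 :thumbnails   => { :thumb => '100x100>' }

  # Adds validations for size, content type, and filename
  validates_as_attachment
end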

There is an abundance of information already written about Attachment_Fu, so to avoid reinventing the wheel, I’ll provide what I found to be the best sources of information to start with.

  • Mike Clark’s tutorial is the gold-standard introduction to using Attachment_Fu. The code is simple but rock solid. It covers using both the file system and S3 for storage and will get you up and running on Attachment_Fu in no time.
  • Some posts on the Attachment_Fu message board provide a solution to associating the attachment model with another model (i.e. making it an attachment to something). The posts provide both the controller and the view code for uploading the initial attachment and rendering it. Handling the attachment relationship in your MVC is going to be a fairly common requirement and most Attachment_Fu users will benefit from these posts.

For my part, I’m going to provide some controller source code for updating the attachment when you have a relationship with another model (an extension of the second item above) since this is one area that wasn’t covered well anywhere else and might save you some time in your travels. In the code below, my main model is the product and the image is the model where a photo and thumbnail are stored using Attachment_Fu.

class ProductController < ApplicationController

  def update
    @product = Product.find(params[:id])
    # Load up product categories for the view
    @all_categories = Category.find(:all, :order => "name_en")
    if @product.update_attributes(params[:product])
      unless params[:image][:uploaded_data].blank?
        # My product only has one image / thumbnail, so destroy 'each'
        # sleep 3 # See quirk no. 1 below
        @product.images.each { |img| img.destroy }
        # Build and save the replacement image from the uploaded data
        @image = @product.images.build(params[:image])
        @image.save
      end
      flash[:notice] = 'Product was successfully updated.'
      redirect_to :action => 'show', :id => @product
    else
      render :action => 'edit'
    end
  end

end

The links above, in combination with my snippet, should get you through creating an attachment and handling CRUD for an attachment and its parent model from a single view. Now comes the Windows quirkiness. Not knowing to expect these Attachment_Fu quirks and then having to root out the cause of the behavior took up a lot of time. It turns out that most of the quirks are documented in some way, shape, or form. I’ve pulled together a list of the quirks as well as some best-practice workarounds.

  • When running Attachment_Fu on Windows, the most commonly encountered problem is the “Size Is Not Included In List” validation error. It appears that no amount of fixing in the Ruby code is going to help here, since it appears to be a Windows file system issue. The workaround is really simple: just add a sleep x call before your attachment processing and things will be golden. The x (which denotes seconds) will vary based upon the size of the attachments you are processing; bigger attachments require more of a wait. Also, be sure to comment this code out in production since this is a Windows-only issue.

7/19/2007 Update – Rick suggested using RUBY_PLATFORM to determine whether the sleep should be invoked. I tested this and it worked as suggested.
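A minimal sketch of that suggestion; the three-second pause and the platform regex are assumptions to tune for your environment:

# Pause only on Windows builds of Ruby, where the file's size may not
# yet be visible to Attachment_Fu's validation (quirk no. 1 above)
sleep 3 if RUBY_PLATFORM =~ /mswin|mingw/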

  • When you invoke the destroy method on your attachment using Attachment_Fu on Windows, your model’s reference to the attachment will be deleted, but the physical attachments themselves will not be deleted if you have persisted them to the file system. If you look at the Attachment_Fu source code or your log files, you’ll see that Attachment_Fu assumes you are using a UNIX-based system and executes UNIX commands like rm to remove these files. These commands will obviously not work in a Windows environment, leaving you with a bunch of zombie files. This should not be a problem if you use the database or S3 persistence mechanisms, since these are independent of the operating system.

7/19/2007 Update – Rick corrected me. He is indeed calling the OS-safe FileUtils.rm in the file system backend. It still isn’t working though – at least on my machine.

  • My last Windows-specific quirk is actually an Internet Explorer issue. If your attachments are images, you may have problems uploading JPEGs using the default Attachment_Fu plugin. From what I’ve been able to determine, if you upload a JPEG from IE with a file extension of .JPEG, IE will set the MIME type to image/pjpeg for a progressive JPEG. However, if the image extension is simply .jpg, IE will set the MIME type to image/jpg. This MIME type, however, is not included in the default list of content types accepted by Attachment_Fu. My suggestion is to add this type to the list in the source code until Rick can get around to modifying the source.

7/19/2007 Update – The MIME type was added to the source. For reference, Rick suggested that this could have been done without changing the source simply by adding:

Technoweenie::AttachmentFu.content_types << 'image/jpg'

The last quirk for my post should be meaningful to all of those using Capistrano, the Rails deployment utility. Capistrano manages versions of the application for roll-forward / rollback by creating symlinks to previous versions of an application and deploying the most recent version of your entire application tree from your version control system (e.g. Subversion). However, since it’s very unlikely that you are storing all of your application’s attachments under version control, the attachments will be unlinked and no longer available when you deploy a new version of your application to production. To get around this issue, the solution proposed here creates a separate physical directory for the attachments outside of your application’s directory and then updates a symlink from your application’s attachment directory to that separate physical directory every time you deploy (see the sketch below).
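A hedged sketch of that solution as a Capistrano callback task (using the 1.x-era convention where a task named after_update_code runs automatically after each code update); the shared directory name and the public/attachments path are assumptions that depend on where your attachments live:

# config/deploy.rb -- after each code update, point the application's
# attachment directory at a shared location that survives deployments
task :after_update_code do
  run "ln -nfs #{shared_path}/attachments #{release_path}/public/attachments"
end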


I’ve been putting a good deal of time recently into converting GeoGlue from .NET to Rails. One of the things that I’m looking to get into the alpha release is the dynamic creation of podcasts. This is really nothing special since a podcast is little more than a special case of an RSS feed that points at external media files (i.e. audio or video).

Podcast Creation From Rails

I plan on covering the audio/video entry in an upcoming post about the nuances of the Attachment_Fu plugin on Windows. In this post, I’m going to just lay out the code for the podcast creation, since it is nothing more than a simple RXML file. I’ve sprinkled in comments liberally, but most of the code should be fairly self-explanatory to those familiar with Ruby and RSS feeds.

xml.instruct! :xml, :version => "1.0", :encoding => "UTF-8"
xml.rss('version' => '2.0') do
  xml.channel do
    xml.title @podcast.name
    # Self-referencing link
    xml.link url_for(:only_path => false)
    # Important --> RFC-822 compliant datetime
    xml.pubDate(Time.now.strftime("%a, %d %b %Y %H:%M:%S %Z"))
    xml.language "en-us"
    xml.ttl "40"
    # User who caused the feed to be generated
    xml.generator User.find(session[:user_id]).name
    xml.description @podcast.description
    # 'public_filename' is a method from the Attachment_Fu plugin
    xml.image do
      xml.url url_for(:controller => @podcast.images[0].public_filename, :only_path => false)
      xml.link url_for(:only_path => false)
      xml.title @podcast.name
      xml.width @podcast.images[0].width
      xml.height @podcast.images[0].height
    end
    @podcast.entries.each do |entry|
      xml.item do
        xml.title(entry.title)
        xml.link(url_for(:controller => entry.audios[0].public_filename, :only_path => false))
        # User who actually generated the media (i.e. audio)
        xml.author(entry.user.name)
        xml.category "Uncategorized"
        xml.guid(url_for(:controller => entry.audios[0].public_filename, :only_path => false))
        xml.description(entry.description)
        # Simplification; you should pull from updated_at/updated_on
        xml.pubDate(Time.now.strftime("%a, %d %b %Y %H:%M:%S %Z"))
        # The enclosure is very important!!
        # If you use Attachment_Fu, everything you need is included in the model
        xml.enclosure(:type => entry.audios[0].content_type,
                      :length => entry.audios[0].size.to_s,
                      :url => url_for(:controller => entry.audios[0].public_filename, :only_path => false))
      end
    end
  end
end

A couple of lessons learned from my experience. Firstly, Apple provides some good resources on generating podcasts. This is especially important since the iTunes crowd is a large and important contingent of the feed-consuming world. There are iTunes-specific tags (and a schema) available. These tags are not mandatory (I didn’t use them here), but they will help you produce a richer feed for consumption within iTunes. Secondly, since the RXML file is just another view, make sure to turn off any default layouts that you might have applied to your other views. I’ve included a snippet below to demonstrate how to do this. Check your version of Rails; mileage may vary with exempt_from_layout based upon your release.

class ApplicationController < ActionController::Base

  # Pick a unique cookie name to distinguish our session data from others
  session :session_key => '_trunk_session_id'
  layout 'default'
  exempt_from_layout :rxml
  ...

end

My final caveat is not to apply forms-based authentication to your podcast (RXML view). Either make the view public or, if you wish to protect it, do so using HTTP Basic authentication instead. If you’re using both forms-based and HTTP Basic authentication, you’ll probably need to sync the two by using a single LDAP repository. That’s fodder for a completely different post.
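For the HTTP Basic route, a minimal sketch might look like the following; this relies on authenticate_or_request_with_http_basic (available from Rails 2.0 onward), and the controller, action name, and credentials are placeholders:

class PodcastController < ApplicationController
  before_filter :basic_auth, :only => :feed

  private

  # Challenge feed readers with HTTP Basic instead of a login form
  def basic_auth
    authenticate_or_request_with_http_basic do |username, password|
      username == 'feeduser' && password == 'secret' # placeholder credentials
    end
  end
end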
