Archive for July, 2009

Facebook has long been one of the last unexplored realms of social networking for me. When trying to convince new recruits to join me on Twitter, I finally realized that so many of my friends, acquaintances, and colleagues were hooked on Facebook that I stood little chance of winning them over without a deeper understanding of where Facebook fits in the social networking mix. I turned to the book “Facebook Me!” by Dave Awl for a solid background in how Facebook might work best for me and to help me understand how to integrate Facebook with the rest of the Web 2.0 applications I use. My Amazon.com review of the book follows below.

Facebook Me

This book strikes the ideal balance between an introduction to Facebook and a reference manual for the more experienced user. The first few chapters will prove a bit superfluous to all but the greenest of newbies. After that, you can count on some pretty solid information on using the breadth of Facebook’s communication features to enhance your online social life. Several elements of the book appealed to me in particular:

  • Very visual and, for the most part (ca. July 2009), up-to-date with respect to the latest enhancements to the Facebook user interface
  • Offers pragmatic advice on using Facebook’s features without overhyping those, such as messaging, for which there are clearly other capable media.
  • Provides a balanced view of Facebook’s features and of how to integrate alternative mechanisms with Facebook to augment the out-of-the-box offering.

I can’t emphasize enough the importance of the last two points to my assessment of the book. It showed me how to integrate other Web 2.0 technologies that I’m very happy with, e.g. Flickr for photos and Twitter for status updates, into Facebook. This integration allows me to enjoy what I believe to be the best of what Facebook has to offer (a huge social network of people you already know) alongside the dramatically more sophisticated, open, and evolved media and messaging capabilities of other platforms.

For the new to intermediate Facebook user, this may be the only book they’ll ever need. More dedicated and fanatical Facebook users might find that it doesn’t go deep enough. I find myself somewhere in between. I’ve caught on to Facebook pretty quickly, but I still don’t plan on using the majority of the features outlined in this book. That’s why the book is a solid 4 stars for me. Were I a bit more into Facebook and a bit less into other Web 2.0 technologies, I could see this being a 4.5 or 5 star book.

Comments No Comments »

When ScottGu puts the time into creating a mini-tutorial for a new technology, it’s usually something worth investigating. After seeing his tutorial / overview of the new IIS Search Engine Optimization Toolkit, I decided I ought to give it a look. With the new blog running WordPress on IIS, this seemed especially timely and relevant.

As Scott mentions in his blog, a prerequisite to getting the IIS SEO Toolkit up and running is installing the Microsoft Web Platform Installer. I was surprised how smoothly this installation went. When it’s complete, you’ll have a new icon on your desktop and a new “Management” section within the IIS admin tool. The Installer looks like a great tool, although I’m sure that some (myself included) will be leery about Microsoft installing server-related software on their machines.

IIS SEO - New Admin Features

I followed ScottGu’s recommendations for installing and running the tool. After running it against both Scott’s site and my own blog and performing some follow-up analysis, there were several things that I felt warranted a bit of further explanation:

  1. The scan of my blog took a lot longer to run: on the order of 8 minutes versus the 13 seconds Scott quotes. My suspicion is that, as your site’s link depth increases and you point to more external media, the scan takes longer to run and pseudo-index it all. In short, the IIS SEO Toolkit performs a full spidering of your web site, and the time to do so will vary with the size and complexity of your site.
  2. Scott mentioned, but didn’t go into much detail on, the robots exclusion and sitemap / site indexing tools. I had hoped there would be a bit more automation after the initial site analysis was run, but was disappointed to find that this is not the case. These tools look to be little more than editors slapped on top of those files (see the short sample after this list).
  3. On the positive side, this tool can do a lot more than was covered in ScottGu’s brief post. In short, the analysis provides four information groupings: violations, content, performance, and links. Of these, ScottGu only covers the first, violations. I offer some more information on the other capabilities and features below.
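
For reference, the robots exclusion file is a short plain-text file and the sitemap is a simple XML file, so an editor really is all the toolkit needs to provide here. A minimal robots.txt might look like the sketch below; the path and sitemap URL are illustrative placeholders, not taken from my actual site:

    # Keep crawlers out of the WordPress admin area, allow everything else
    User-agent: *
    Disallow: /wp-admin/

    # Tell crawlers where the XML sitemap lives
    Sitemap: http://www.example.com/sitemap.xml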

Site Analysis Trending Capabilities

The IIS SEO Toolkit stores historical analysis metadata and details, which effectively gives you the ability to analyze and trend your site’s SEO and other critical metrics over time. You can see below how my site changed between two different analysis runs.

IIS SEO Analysis History

Content Summary

The content summary offers an abundance of information on content types, hosts, links, files, titles, and keywords. This information is useful for SEO and other site maintenance activities. The image below illustrates one example of the content summary: the pages with broken links report.

IIS SEO Content Summary

Performance Summary

The performance summary section provides information on slow pages, pages with a large number of resources, and page performance metrics by content type and directory. These statistics require a bit of interpretation. The image below shows performance by content type; this report allows further investigation into why some content types categorically take longer to render than others.

IIS SEO - Performance Summary

Query Capabilities

All of the canned reports in the IIS SEO Toolkit are backed by a query engine, and the ability to query the data directly is provided through a simple query builder. As of this release, queries appear to be restricted to a single analysis run. It would be nice if, in the future, queries could span multiple analysis runs and provide a longitudinal picture of a site’s evolution.

IIS SEO - Query Capabilities

Comments 1 Comment »

I’m sure that if I had a nickel for every time a software project was impacted by introducing production-volume data into the testing life cycle either too late or, even worse, not at all, I’d be a rich man and wouldn’t be writing this blog entry. When you think about it, it’s really no wonder that we find ourselves in this situation. Developers new to the craft have no experience to draw on in dealing with millions of rows of data. Experienced developers and DBAs often pass on war stories of hand-crafted scripts and the perils of migrating data from production to lower environments, further reinforcing the belief that emulating production volumes of data is work reserved for the gods of IT.

Taking this trend a step further, applications are often exposed to data resembling production volumes only for the briefest of periods during the test cycle. Even then, the data reflects yesterday’s production volumes, not next year’s. Furthermore, testing of certain other functions is restricted because those functions either deal with new data for which no production data yet exists, or they interact with external systems whose test data is out of sync with the data of the system under test.

Does preparing production volume data really need to be this difficult?
No!

This entry deals with using Red Gate Software’s SQL Data Generator to generate a large volume of data for a simple test database. Why choose a product such as SQL Data Generator to generate test data instead of alternate methods such as copying production data or writing custom data generation scripts? There are several reasons:

  1. Obfuscating personally identifiable information (PII) in production data is a painful process. See point #2.
  2. Syncing data between systems is a very painful process, even more so if the obfuscated data from step #1 is used across systems.
  3. By using production data, you’re only testing for current capacity, not for planned capacity 3, 6, or 12 months from now.
  4. If the application, or elements of it, are new, you may not have any relevant data to test with at all.
  5. Writing custom data generation scripts is either (i) a one-off process that yields brittle scripts tied to a particular version of the schema or (ii) an exercise in reinventing the wheel, since commercial tools have already been built to do this.

Sample Data Model

Our data model is simple enough to be readily understandable while still presenting a couple of challenges that illustrate some of the features of the SQL Data Generator tool. The data model serves as the backend for an online travel site that collects and manages community-driven travel recommendations. Think of the book 1000 Places to See Before You Die recast as a Web 2.0-ish site. Users can enter new tours / places, bundle similar or geographically close tours into tour packages, and provide user-specific tags for both tours and packages.

Generating Data - Data Model
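
To make the discussion that follows a bit more concrete, here is a rough T-SQL sketch of the schema. The column names and types are my shorthand for the diagram above rather than exact DDL:

    -- Community members who contribute tours, packages, and tags
    CREATE TABLE Users (
        UserId   INT IDENTITY PRIMARY KEY,
        UserName NVARCHAR(50) NOT NULL
    );

    -- Individual tours / places, keyed by a GUID
    CREATE TABLE Tour (
        TourId    UNIQUEIDENTIFIER PRIMARY KEY DEFAULT NEWID(),
        UserId    INT NOT NULL REFERENCES Users(UserId),
        Name      NVARCHAR(100) NOT NULL,
        Latitude  DECIMAL(9,6),
        Longitude DECIMAL(9,6)
    );

    -- Bundles of related tours, also keyed by a GUID
    CREATE TABLE Package (
        PackageId UNIQUEIDENTIFIER PRIMARY KEY DEFAULT NEWID(),
        UserId    INT NOT NULL REFERENCES Users(UserId),
        Name      NVARCHAR(100) NOT NULL
    );

    -- Join table; SequenceNumber is the visual display order of a tour within its package
    CREATE TABLE TourPackages (
        PackageId      UNIQUEIDENTIFIER NOT NULL REFERENCES Package(PackageId),
        TourId         UNIQUEIDENTIFIER NOT NULL REFERENCES Tour(TourId),
        SequenceNumber INT NOT NULL,
        PRIMARY KEY (PackageId, TourId)
    );

    -- User-assigned tags; ContributionId holds either a TourId or a PackageId,
    -- so no foreign key constraint can be declared against it
    CREATE TABLE Tags (
        TagId          INT IDENTITY PRIMARY KEY,
        UserId         INT NOT NULL REFERENCES Users(UserId),
        ContributionId UNIQUEIDENTIFIER NOT NULL,
        TagText        NVARCHAR(50) NOT NULL
    );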

There are several characteristics of this data model that are somewhat challenging and provide an opportunity to illustrate some of SQL Data Generator’s more advanced options. These characteristics are:

  • Both the package and tour ids are unique identifiers (GUIDs). They are referenced by the ContributionId in the Tags table, but there is no foreign key constraint. That is, a ContributionId is a GUID which may match up with either a tour id or a package id.
  • The sequence numbers within the TourPackages table represent the visual display order of the tours within the package. Therefore the sequence numbers cannot be random and must cycle through each of the tours in a package without repeating within that package.
  • The data generated for the model has to follow statistical distributions representative of the production environment, such as:
    • There should be 5 times as many tours as users, with the number of tours per user following a normal distribution between 1 and 10 tours.
    • The total number of packages should be 40% of the total number of tours, with tours distributed randomly amongst the packages.
    • Total tags should be 60% of total tours. The vast majority of these tags (almost 10-to-1) should be attributed to tours; the remainder are attributed to tour packages.

Basic SQL Data Generator Capabilities

Creating a project with SQL Data Generator is as easy as selecting the database you wish to generate data into.

Generating Production Data - Project Configuration

Once the project is created, SQL Data Generator will infer information about the data based upon the column types and other characteristics. You can then review sample data and tweak the configuration options to meet your needs.

Generating Data - Column Generation Settings

Below, valid values are specified for the Tour table’s longitude column. Changes to the generator settings are immediately reflected in the sample data set, providing the opportunity to validate the impact of the changes.

Generating Data - Previewing Generated Data

Intermediate SQL Data Generator Capabilities

SQL Data Generator makes it easy to specify how many rows to generate for each table, which enables the data to be generated in proportion to production ratios, as stipulated in our requirements.

Generating Data - Specifying Counts and Ratios

These same capabilities allow us to address the requirement around TourPackages sequence numbers by letting SQL Data Generator handle the generation of combinations within the TourId / PackageId composite key space.

Lastly, SQL Data Generator can use alternate generator sources, such as the output of SQL statements or direct input from a CSV file. In our case, this allows us to specify a SQL statement to pull the appropriate ids from the Tour and Package tables for the reference id values in the Tags table, even though no explicit foreign key relationship is present.
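
Assuming the schema sketched earlier, the source query can be as simple as a union of the two id columns; SQL Data Generator then draws the Tags table’s ContributionId values from this result set:

    -- Pool of GUIDs that a tag may legitimately reference:
    -- every tour id plus every package id
    SELECT TourId AS ContributionId FROM Tour
    UNION ALL
    SELECT PackageId FROM Package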

Generating Data - Using a SQL Generator

Advanced SQL Data Generator Capabilities

Custom generators can also be created for use with SQL Data Generator. This enables domain-specific data to be generated and the generators to be re-used across multiple projects. Custom generators are written in .NET code by implementing one of Red Gate’s generator interfaces. Although this is not particularly difficult, it is beyond the scope of this post.

Generating Data

Once the generation options are specified in accordance with the requirements, the only thing left to do is generate the data. The data population action plan gives you an idea of what data will be going where.

Generating Data - Population Action Plan

Running the generation script against a local SQL Server Express installation on a small (one processor, 2 GB RAM) VMWare machine, SQL Data Generator was able to generate 420,000 records across 5 tables in less than 1 minute, yielding a total database size of about 400 MB.

Generating Data - Data Report

Other SQL Data Generator Capabilities

At this point, I’m hoping you’re at least starting to believe that data generation can be fast and easy. There are several other benefits of data generation with SQL Data Generator that weren’t covered here:

  • The project seamlessly incorporates changes to the underlying schema.
  • The SQL Data Generator project file (extension “.sqlgen”) can be version controlled in conjunction with the scripts to create the database, providing the ability to create and fully populate current and historic versions of the database to align with application code changes.
  • If the seed numbers are not changed, the data generated is exactly the same across generations. If you need new / different data, change the seed number.

Related Links

Comments 2 Comments »

I had long planned the move from the .NET-based DasBlog blogging engine to WordPress but just couldn’t seem to make the time to complete it. I finally pulled the trigger and cut over to WordPress a couple of weeks ago. The process was not nearly as painful as I had imagined, and I’m now beginning to reap the rewards of working on a blogging platform that’s more broadly integrated into the Web ecosystem. This blog entry is a collection of the key technical takeaways from my migration. Hopefully they will be helpful to other people looking to migrate to WordPress, especially on the Microsoft IIS platform.

WordPress on IIS 7

  • Getting WordPress Up and Running on IIS is Very Easy – I was surprised how easy it was to get WordPress running on IIS 7. The entire process took me no longer than 30 minutes to complete once I had the correct guidance in place. The items that were of the utmost help to me here were as follows:

  • There’s Help Porting Content Into WordPress From Other Blog Engines – This was very welcome news, as porting everything by hand would have been intolerably tedious. Porting blog content is a two-step process: exporting the existing entries from DasBlog into the BlogML format and then importing that BlogML content into WordPress.

  • Don’t Forget About Mapping the URLs of Your Entries so That All Your Links Don’t Break – Maintaining external consistency is critical to followers of your blog. They care little that you migrated onto new software; Google cares even less. Don’t make people think about this. Do the work for them and map your legacy URLs to the new URLs in WordPress so that the change is transparent to everyone but you. Most of the guidance I could find on the web around mapping WordPress URLs dealt with Apache mod_rewrite. Fortunately, IIS 7 provides an extension called “URL Rewrite” that can rewrite incoming URL requests (see the sample rule after this list). This article and the links within it provide everything you need to understand URL Rewrite and get the job done.

  • Take The Opportunity To Leverage the Cloud – Although the text ports fairly well using BlogML as a bridge, the other binary content (images, documents, etc.) needs to be moved over manually. You can just copy the DasBlog contents folder over to maintain URL continuity, or you can get a bit more ambitious. I chose to leverage Amazon’s S3 file storage service to store all of my binary content so that I don’t have to worry about backing it up or moving it ever again. I took the opportunity to set up S3 virtual hosting so that, with a bit of DNS trickery, my blog’s binary contents are all served from http://s3.beckshome.com.

  • Identify and Engage the Necessary WordPress Widgets – One of the key features of the WordPress blogging engine is its extensibility and the vast array of freely available themes and plugins that you can use to add valuable functionality to your blog. A top 10 or 20 list of plugins would warrant another post entirely and there are a multitude of these lists already out there. Instead, I’ll recommend a series of plugins that I found to be absolutely necessary to replace content or functions I had available under DasBlog and which I considered “table stakes” for the move over to WordPress.
    • Flickr Badge Widget – I replaced separate DasBlog pages for my photos and videos with a single Flickr Flash Badge that links to my Flickr account.
    • Kimli Flash Embed – I have a screencast I did on Microsoft Virtual Earth a while back. This was the only way to embed it into the main WordPress page.
    • SyntaxHighlighter Evolved – I have a bunch of source code snippets embedded in my blog entries, mostly C# and Ruby. This plugin made them look better than they ever did on my older blog with zero fuss.
    • WP Google Analytics – Despite the availability of WordPress stats, I’m sticking with Google Analytics. This plugin made the transition seamless.
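
As promised above, here is the general shape of an IIS URL Rewrite rule in web.config. The pattern and target URL are purely illustrative placeholders rather than my actual DasBlog-to-WordPress mapping, and the redirect is marked permanent (301) so that search engines transfer their rankings to the new URLs:

    <configuration>
      <system.webServer>
        <rewrite>
          <rules>
            <!-- Redirect a legacy DasBlog-style permalink to its new WordPress URL -->
            <rule name="LegacyBlogRedirect" stopProcessing="true">
              <match url="^blog/SomeOldPost\.aspx$" />
              <action type="Redirect" url="blog/some-old-post/" redirectType="Permanent" />
            </rule>
          </rules>
        </rewrite>
      </system.webServer>
    </configuration>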

Comments 1 Comment »

Since jumping back on the blogging bandwagon, I’ve been looking to get more familiar with the top social networking sites. I’ve had some experience with most of the major players except Twitter, which I never did manage to get into. I decided to give Twitter a fair chance and see if it worked for me. To do this, I felt some basic background / guidance was necessary before jumping in head-first. It turns out that The Twitter Book from Tim O’Reilly and Sarah Milstein was really all I needed. My Amazon review follows:

The Twitter Book

Think of The Twitter Book not as a book but rather as a longer, really well-done PowerPoint presentation. For the most part, the top of every other page carries a clear storyboard message that is then explained over the following two pages with creative examples, both textual and in simple, colorful graphics. As countless reviewers have already pointed out, it’s a case of the book medium emulating the tool it describes: terse and colorful.

The book is an easy read in an hour, give or take 10 minutes. It also functions well as a reference document if you need to go back and look up Twitter features, such as hashtags and retweets, as you gain more familiarity with the Twitter service. At 231 not-so-dense pages, the book is rightsized for a service that enforces a 140 character message limit.

If you’ve looked at Twitter before and didn’t get what all the fuss was about, give it another shot after reading this book. Try the “Three Weeks or Your Money Back – Guaranteed” plan in chapter 1. You’ve got lots to gain and very little to lose.

Comments No Comments »