July 20, 2010

Building Ruby Gem Native Extensions on Windows

If you’re using Ruby on Windows, but always encountering gems that require native extensions, then the new(ish) RubyInstaller project is for you.

When browsing the Ruby download page, you may have noticed the newfangled Windows installer for download. They’ve swapped out the old installer (ever wonder where the option for installing SciTE went?) in favor of packaging now being done by the kind folks at RubyInstaller.

Besides just providing newer/better Ruby for Windows, the RubyInstaller team has also been working on the RubyInstaller Development Kit (DevKit), an add-on for building native extensions under Windows. You’ll find a download link for DevKit here and instructions for installation here.

Installing DevKit is pretty easy. It amounts to just extracting some files to your Ruby install path. Once done, building native extension just works (at least the ones I tried). This is great for gems like ruby-debug-ide which haven’t been shipping pre-compiled Windows extensions with the latest releases.

It looks like RubyInstallers first stable releases came out around March of this year. I didn’t notice it until now, but I’m glad someone is putting in the effort to make Windows Ruby development more seamless.

June 21, 2009

MySQL Join Performance

Earlier this week I was curious about the performance of JOINs in MySQL. How severe is the performance hit of joins? How much slower is a string join over an integer join? I decided to do some tests, and I’m going to share my results here.

I did these tests on a computer with the following specs and software:

  • 3.2 GHz Pentium 4
  • 2 GB of RAM
  • Windows XP
  • MySQL 5.0.51b running in development mode (no query caching)

I made two tables:

CREATE TABLE `parents` (
`id` int(10) unsigned NOT NULL auto_increment,
`uuid` varchar(128) default NULL,
`time` datetime default NULL,
`string` varchar(128) default NULL,
PRIMARY KEY  (`id`),
UNIQUE KEY `uuid` (`uuid`),
KEY `time` (`time`),
KEY `string` (`string`)
)

CREATE TABLE `children` (
`id` int(10) unsigned NOT NULL auto_increment,
`uuid` varchar(128) default NULL,
`parent_id` int(10) unsigned NOT NULL,
`parent_uuid` varchar(128) default NULL,
`time` datetime default NULL,
`string` varchar(128) default NULL,
PRIMARY KEY  (`id`),
UNIQUE KEY `uuid` (`uuid`),
KEY `parent_id` (`parent_id`),
KEY `parent_uuid` (`parent_uuid`),
KEY `time` (`time`),
KEY `string` (`string`)
)

I filled these tables with records using the following Ruby code:

require 'mysql'
require 'digest/sha1'
require 'base64'

RECORDS   = 10_000
stringS = ['blue', 'red', 'green'].freeze

my = Mysql::new("localhost", "user", "password", "db_perf")

my.query("DELETE FROM parents")
my.query("DELETE FROM children")
my.query("ALTER TABLE parents  AUTO_INCREMENT = 1")
my.query("ALTER TABLE children AUTO_INCREMENT = 1")

def make_uuid(arg)
Digest::SHA1.hexdigest(arg.to_s)
end

start_time = Time.now.tv_sec
(1..RECORDS).each do |i|
if ((i % 500) == 0)
puts "Iteration #{i}"
time    = Time.now.tv_sec
elapsed = time - start_time
time_per_record = elapsed.to_f/i
time_to_complete = time_per_record * RECORDS
puts "Minutes to complete: #{time_to_complete/60}"
end

uuid   = make_uuid(i)
time   = Time.at(i * 3600).strftime('%Y-%m-%d %H:%M')
string = stringS[i % stringS.length]
res  = my.query(
"INSERT INTO parents "                 +
"(uuid, time, string) VALUES " +
"('#{uuid}', '#{time}', '#{string}')" )
res  = my.query(
"INSERT INTO children "         +
"(uuid, time, string, parent_id, parent_uuid) VALUES " +
"('#{uuid}', '#{time}', '#{string}', #{i}, '#{uuid}')" )
end

puts ''
puts 'DONE'
end_time = Time.now.tv_sec
elapsed = end_time - start_time
puts "Elapsed time in minutes: #{elapsed.to_f/60}"

The table engine you use and the character encoding will make a difference. Why? Because the engine will affect how keys are stored and used, and the character encoding will affect the speed of string comparisons.

I use Rails these days, and when dealing with Rails, InnoDB (which is transactional) and UTF8 are common, so let’s look at them:

image

image

Blue designates queries without a join. They are:

Select Group by String SELECT * FROM children GROUP BY string
Select Group By Time Func SELECT * FROM children GROUP BY YEAR(time), DAYOFYEAR(time)
Select SELECT * FROM children

Red designates queries with a join, which are:

Join Integer Group by String SELECT * FROM children INNER JOIN (parents) ON (children.parent_id = parents.id) GROUP BY parents.string
Join Integer Group by Time Func SELECT * FROM children INNER JOIN (parents) ON (children.parent_id = parents.id) GROUP BY YEAR(children.time), DAYOFYEAR(children.time)
Join String Group by String SELECT * FROM children INNER JOIN (parents) ON (children.parent_uuid = parents.uuid) GROUP BY parents.string
Join String Group by Time Func SELECT * FROM children INNER JOIN (parents) ON (children.parent_uuid = parents.uuid) GROUP BY YEAR(children.time), DAYOFYEAR(children.time)
Join Integer SELECT * FROM children INNER JOIN (parents) ON (children.parent_id = parents.id)
Join String SELECT * FROM children INNER JOIN (parents) ON (children.parent_uuid = parents.uuid)

The 10K queries were run 10 times, and the 100K queries were run 6 times. I did a warmup round before each measured round (so for a 10K query, that means 10 times to warmup, 10 times to measure). The times you see in the charts are the average times. Be sure to turn off query caching.

The results speak for themselves, so I’m offering them up with no analysis for now.

June 6, 2008

More on Maglev

After a little more searching around, I found some good pieces which bring back to reality the performance claims of Maglev and potential problems with object oriented databases. Let’s not forget that Maglev isn’t even out yet, and we have no idea if it’ll be what it claims to be. In the mean time, there are other well known Ruby intrepreters and VMs, such as Ruby MRI, JRuby and YARV.

Check out these in depth looks at Maglev:

This is by Charles Nutter, contributor to a popular Ruby intrepreter written in Java called JRuby. In it, he calls Maglev vaporware, thus Gemstone’s Maglev performance claims don’t mean much at this point. The key to his point is that we have no idea if Maglev is fully Ruby compliant, and to prove his point, he shows us actual numbers for performance gains that can be achieved in JRuby by removing some compliance features.

Sho Fukamachi questions the Maglev performance claims and the usefulness of OODBs.

My Thoughts

My point in the last post was that mapping from OO languages to relational DBs can get ugly. Naturally then, I’m curious about other ways of looking at the problem, but other solutions could have problems, too.

Design patterns such as Ruby’s ActiveRecord have popped up to abstract relational systems to an OO style. However, abstracting relational DBs to an OO model can often result in more DB queries, and it’s easy for inexperienced programmers to write bad, query heavy code. They just see everything as objects, and they don’t understand the consequences of their manipulations.

For this reason, it’s no mystery to me why people often complain about Ruby’s scalability. It isn’t necessarily Ruby that’s the problem. It might be the programmers. Perhaps an OODB is the solution since it would remove the need to map and reduce the aforementioned consequences, but I really don’t know.

May 31, 2008

On Database Abstraction, PHP, and Ruby

It took me a couple days on my current PHP/MySQL project to get the DB abstraction with proper error handling, input validation, and relational support coded to the point where I’m happy with the model. This was after trying Zend_Db, Doctrine, and Propel, which are all good libraries, but I hit points where the work of getting it to do what I wanted efficiently just wasn’t worth it. I decided it would be faster to just roll my own slim library.

I’m always mystified at why DB models are such a hassle to code, so I spent a little time reading. I think I get it now. It has to do with the mapping problem between relational systems and object oriented systems.

Ted Neward, writer of several books on C# and Java, calls this “The Vietnam of Computer Science“. What he’s saying is that object-relational mapping (ORM) quickly reaches a point of diminishing returns. The problem stems from the conceptual disconnect between the language and the data store.

Hoping to shed additional light on the subject, I went looking for alternative perspectives to my PHP/MySQL solution. Since it’s all the rage for ease of use, I was mostly curious about Ruby, the Rails framework, and object oriented databases.

Ruby is certainly different, but after some reading, I’m starting to understand why the language lends itself better to the OO style of DB abstraction. On top of that, Rails has powerful scaffolding that lets you flesh out pieces very fast without the code getting ugly. With ActiveRecord in mind from the ground up, things can be pretty elegant some times.

I also found some really cool systems for bridging the gap between languages and data stores. For instance, check out MagLev for Smalltalk/Ruby. Imagine using an OO DB so you don’t have to do the relational mapping, doing it in a distributed fashion, and using things like shared memory more efficiently as an object staging area between your running site scripts and the DB. That’s kind of MagLev.

All of this is interesting, but for the current project I’m sticking with PHP. However, on the next project I might try Ruby on Rails. I want to see first hand if and how it overcomes some of the challenges.

On a slightly tangential subject, a part of me really dislikes the weak typing of both Ruby and PHP. The net effect in PHP is that you pretty much end up using arrays for everything, making it hard to keep track of structure. Ruby on Rails does get around some of the weak typing issues with some better built-in validation, scaffolding, and member attributes (I’m not sure if that’s what they call them). Sometimes I’d rather be doing C#.