MySQL Join Performance

Earlier this week I was curious about the performance of JOINs in MySQL. How severe is the performance hit of joins? How much slower is a string join over an integer join? I decided to do some tests, and I’m going to share my results here.

I did these tests on a computer with the following specs and software:

  • 3.2 GHz Pentium 4
  • 2 GB of RAM
  • Windows XP
  • MySQL 5.0.51b running in development mode (no query caching)

I made two tables:

CREATE TABLE `parents` (
`id` int(10) unsigned NOT NULL auto_increment,
`uuid` varchar(128) default NULL,
`time` datetime default NULL,
`string` varchar(128) default NULL,
PRIMARY KEY  (`id`),
UNIQUE KEY `uuid` (`uuid`),
KEY `time` (`time`),
KEY `string` (`string`)
)

CREATE TABLE `children` (
`id` int(10) unsigned NOT NULL auto_increment,
`uuid` varchar(128) default NULL,
`parent_id` int(10) unsigned NOT NULL,
`parent_uuid` varchar(128) default NULL,
`time` datetime default NULL,
`string` varchar(128) default NULL,
PRIMARY KEY  (`id`),
UNIQUE KEY `uuid` (`uuid`),
KEY `parent_id` (`parent_id`),
KEY `parent_uuid` (`parent_uuid`),
KEY `time` (`time`),
KEY `string` (`string`)
)

I filled these tables with records using the following Ruby code:

require 'mysql'
require 'digest/sha1'
require 'base64'

RECORDS   = 10_000
stringS = ['blue', 'red', 'green'].freeze

my = Mysql::new("localhost", "user", "password", "db_perf")

my.query("DELETE FROM parents")
my.query("DELETE FROM children")
my.query("ALTER TABLE parents  AUTO_INCREMENT = 1")
my.query("ALTER TABLE children AUTO_INCREMENT = 1")

def make_uuid(arg)
Digest::SHA1.hexdigest(arg.to_s)
end

start_time = Time.now.tv_sec
(1..RECORDS).each do |i|
if ((i % 500) == 0)
puts "Iteration #{i}"
time    = Time.now.tv_sec
elapsed = time - start_time
time_per_record = elapsed.to_f/i
time_to_complete = time_per_record * RECORDS
puts "Minutes to complete: #{time_to_complete/60}"
end

uuid   = make_uuid(i)
time   = Time.at(i * 3600).strftime('%Y-%m-%d %H:%M')
string = stringS[i % stringS.length]
res  = my.query(
"INSERT INTO parents "                 +
"(uuid, time, string) VALUES " +
"('#{uuid}', '#{time}', '#{string}')" )
res  = my.query(
"INSERT INTO children "         +
"(uuid, time, string, parent_id, parent_uuid) VALUES " +
"('#{uuid}', '#{time}', '#{string}', #{i}, '#{uuid}')" )
end

puts ''
puts 'DONE'
end_time = Time.now.tv_sec
elapsed = end_time - start_time
puts "Elapsed time in minutes: #{elapsed.to_f/60}"

The table engine you use and the character encoding will make a difference. Why? Because the engine will affect how keys are stored and used, and the character encoding will affect the speed of string comparisons.

I use Rails these days, and when dealing with Rails, InnoDB (which is transactional) and UTF8 are common, so let’s look at them:

image

image

Blue designates queries without a join. They are:

Select Group by String SELECT * FROM children GROUP BY string
Select Group By Time Func SELECT * FROM children GROUP BY YEAR(time), DAYOFYEAR(time)
Select SELECT * FROM children

Red designates queries with a join, which are:

Join Integer Group by String SELECT * FROM children INNER JOIN (parents) ON (children.parent_id = parents.id) GROUP BY parents.string
Join Integer Group by Time Func SELECT * FROM children INNER JOIN (parents) ON (children.parent_id = parents.id) GROUP BY YEAR(children.time), DAYOFYEAR(children.time)
Join String Group by String SELECT * FROM children INNER JOIN (parents) ON (children.parent_uuid = parents.uuid) GROUP BY parents.string
Join String Group by Time Func SELECT * FROM children INNER JOIN (parents) ON (children.parent_uuid = parents.uuid) GROUP BY YEAR(children.time), DAYOFYEAR(children.time)
Join Integer SELECT * FROM children INNER JOIN (parents) ON (children.parent_id = parents.id)
Join String SELECT * FROM children INNER JOIN (parents) ON (children.parent_uuid = parents.uuid)

The 10K queries were run 10 times, and the 100K queries were run 6 times. I did a warmup round before each measured round (so for a 10K query, that means 10 times to warmup, 10 times to measure). The times you see in the charts are the average times. Be sure to turn off query caching.

The results speak for themselves, so I’m offering them up with no analysis for now.

Comments

Twitter Search

A few weeks ago, I finally got around to trying Twitter’s search. All it took was a few queries for me to realize its potential.

(For those that don’t know, Twitter gained in-house search functionality when it acquired Summize last year.)

One afternoon, while I was mucking around with Ruby on Rails, I encountered a problem with a release candidate. Annoyed, I took to the Internet searching for a solution. Google wasn’t of much help; the RC was just too fresh for there to be much information on regular web pages.

Disappointed, I started thinking about other ways to find an answer. The Rails IRC channel was one possibility, their bug tracker was another. That’s when it hit me – why not try Twitter?

It seemed logical. After all, Twitter is the place for soapbox style venting. Someone must have complained about the same problem. Maybe they even offered up a link to a solution.

I did a search, and to my amazement, I actually saw some interesting comments. For the first time, Twitter actually helped me save time rather than waste it.

Therein lies most of the value in Twitter. Forget for a moment the mundane, narcissistic, attention whoring tweets that occupy most of Twitter. If you cut out all the garbage, the rest of Twitter is a rich source of real-time information about problems, products, places, news, and more.

Social marketers already know this. They’re using Twitter search to monitor brands and engage their customers. But allow me to go even broader. Twitter search isn’t just for social marketers, it holds great promise for the average seeker of information.

Want to hear what’s going on at an event? Favorite web site down? Check Twitter.

This is a new breed of real-time information. Think of it as collective experience. Sure, you could say Facebook has been doing it for ages, but Twitter is special. Twitter’s culture encourages public sharing. That’s what makes it great for searching.

Over time, people will slowly discover its value. As the mainstream starts to search Twitter and similar public streams, third parties like Google will start to realize the value in indexing it. They will provide new views into the information that we haven’t seen yet. Startups will also jump on board and innovate.

One could even argue that this is half the point of Google Friend Connect. Google is inserting itself into the chain so it can start analyzing and presenting that information for search.

Let’s not forget that a new source of information means a new area for targeted advertising. This is great for a company like Twitter, because it gives them another option to consider for their business model.

On that note, I’m starting to come around on the idea that Twitter could be a viable business. Expanding their reach, introducing users to their search, and then monetizing the results could be a great source of revenue just like it is for Google.

I’d also like to see Twitter start introducing business accounts. They should give businesses ways to manage groups of accounts so employees can participate in company tweeting.

Another possibility is to provide features to analyze the response to tweets (e.g., see how many people are tuning in, and what kinds of people are listening).

Twitter has already started taking steps in the right direction, and I really do hope it works out.

Comments

The Past Few Months

It’s been a while since I’ve updated my blog, so I thought I’d provide a rundown of what I’ve been doing.

Startup

We really found our focus over the past couple months. We are committed to making tools for mobile developers. Our name is Localytics, and our first tool is an analytics service for mobile applications.

We launched an early version of our analytics engine last week at a DemocampBoston1. Democamp involves a quick succession of presentations by local startups. Check out my recap.

We captured a video of our presentation. The presentation was 5 minutes with a 5 minute Q&A at the end. Check it out:

Registration is closed right now, but we will be opening it up in a couple weeks. During that time, we will add more features, documentation, and hopefully some guides on how to get the most out of the service.

For now, you can check out our demo, which might not make a whole lot of sense, but at least it’s there and it works.

Gaming

I managed to get through a couple of the 2008 holiday season games. I have mixed feelings about most of them. The only ones I’d flat out recommend are Gears of War 2 and Left 4 Dead. Perhaps I’ll go into more detail in future posts.

I finally discovered the wonder that is Xbox Live Arcade. There are some awesome games on there, many of them with a retro feel. Braid, Castle Crashers, and Geometry Wars are all great. Each is an update of classic game styles with gorgeous HD visuals.

Braid is a side-scroller with time manipulation, Castle Crashers invokes memories of Teenage Muntant Ninja Turtles beat-em-up games, and Geometry Wars is Asteroids on crack. I especially like how Geometry Wars pulls in high scores from your friends, encouraging you to beat them.

Comments

What an insane fall lineup

The games coming out this fall season are an assault on the senses. This is a way better lineup than last year. Games I care about include:

  • Dead Space – released Oct 13 to great reviews
  • Fable 2 – Oct 21
  • Far Cry 2 – Oct 22
  • Command and Conquer: Red Alert 3 – Oct 28
  • Fallout 3 – Oct 28
  • Gears of War 2 – Nov 7
  • Call of Duty: World at War – Nov 11
  • Mirror’s Edge – Nov 11
  • World of Warcraft: Wrath of the Lich King – Nov 13
  • Left 4 Dead – Nov 17
  • Prince of Persia – Dec 2

I’ve bolded the ones I’m definitely going to buy. Almost all of these are Xbox 360 titles. There are a couple good PS3 games (such as Little Big Planet), but I don’t have a PS3.

I’m cautious about Fallout 3. It has probably the biggest advertising campaign out of all of them, but I have a feeling it won’t meet expectations.

Lots of sequels in there. In fact, the only ones that aren’t a sequel or in some way part of an existing series are Dead Space, Mirror’s Edge, and Left 4 Dead.

Comments

Yahoo Fire Eagle Now Public

Yahoo Fire Eagle has come out of private beta, and it’s now open to everyone. Fire Eagle provides a unified way for applications and web sites to share location data tied to specific users.

For your average web surfer or mobile device user, this means a single, unified way for sharing your location across all of the sites and applications you use (that are on Fire Eagle). When you update your location in one application, other applications get updated as well. Privacy controls let you decide exactly how that sharing should work.

So imagine you are using an application on your phone that works with Fire Eagle. Let’s say that app uses your GPS location to find restaurants near you. When it does so, it notifies Fire Eagle where you are. Now suppose a completely different web site exists that lets people tell their friends where they are. If it works with Fire Eagle, then it’ll see the update from the restaurant finder on your phone.

As a developer, this opens up a whole new range of possibilities, and it also increases the amount of data you could observe.

This is relevant to one of my previous posts on location tracking, and it’s an area I’m really interested in. Will developers embrace it? Will users fear the privacy implications? We’ll see.

Comments

Languages and Their Library Cultures

I saw an interesting blog post today titled Why I stick with Perl. In short, the author’s reason for sticking with Perl was that he believes it to have the best “library culture.” A strong library culture is more important to him than the language itself.

What does “library culture” even mean? I would define it as the practices, beliefs, and institutions regarding the sharing of modular code. In other words, it refers to the systems for sharing code and the practices of developers concerning those systems.

Perl certainly has a very strong library culture. CPAN is one of the most extensive library repositories out there. In my experiences with the Perl community, the use of CPAN has always been encouraged, and I’ve generally been pleased with code quality and documentation.

Besides Perl, this made me think about other library cultures. I never realized how drastically different some of them are. Here are my thoughts on a few of the more interesting cases.

Ruby on Rails

Ruby on Rails is very much so a cut and paste world. Despite the existence of RubyForge, I still find myself grabbing code from blogs, wikis, or some random guy’s self-hosted repository way too often. It’s always a little unsettling.

For example, a typical plugin hunt involves going to the Rails wiki, grabbing the plugin from a random guy’s subversion repo, making the changes suggested on the Rails wiki, then going to a forum or blog and copy-pasting the changes suggested there. After all this work, you later find out that some other guy did all this and checked it into GitHub in a new repo. Now whose changes do I follow, the original random guy’s repo or the new repo from a new random guy? And who are these people anyway? It all feels very questionable.

PHP

PHP provides so much in the core extensions that I rarely need to look outside of what’s already there. Whenever I do need something, PEAR or a professional framework like Zend Framework often has it. I can’t think of a single time I ever needed code from a random repo.

C# and .NET

.NET is a monstrous library which has everything under the sun. Microsoft also provides great documentation. Due to these two factors, C# doesn’t even really need a strong library culture, it already has a behemoth with a ton of money taking care of it.

Microsoft can’t cover everything, though, so they’ll always be the need for other libraries. Unfortunately, the third-party library culture isn’t as open as it is in languages like Perl. The Microsoft world encourages licensing libraries instead of collaborative repositories.

Comments

Cuil is Uncool

For anyone who missed the flurry of news over the startup Cuil, I think it’s an interesting look at what’s wrong (at times) with Silicon Valley. Despite being full of ex-Google employees and $30+ million in funding, all they have is a crappy search engine that sucks. It might be because they were too busy eating muffins and strawberries.

It’ll be interesting to see where Cuil goes. The search relevance is so bad that some people speculate it isn’t about search, it’s about flipping their indexing algorithms. Considering the high amount of funding, I have a hard time believing anyone will pay for this.

Quite possibly one of the funniest things to come out of this was the jab from Yahoo. Using their BOSS API, Yahoo created a search mashup called Yuil. Yahoo has since taken down Yuil, but it was a funny shot at Cuil while it lasted.

Despite how much the product sucks, I do have to hand it to Cuil or whoever did their marketing/PR/advertising. They were all over the blogosphere and major new sites. I haven’t seen a launch blitz like this in a long time. Hell, they got me to say something.

Comments

We’re Getting Close to Photo-realism in Games

Check out this screenshot from Crysis Warhead, and this one, too. Now if only the colors could be a little less brown and gritty, and maybe go a little less on the bloom.

Comments

The State of the Web Illustration – Summer 2008

I liked this illustration of 2008 Internet and tech trends by Matt Inman. He did one back in 2007, too. It’s not particularly comprehensive, I just think he’s a good illustrator. His portfolio is creative and vibrant. Now that I’m working on my own web projects, I’m starting to appreciate good, original design work.

Comments

Mobile Web Panelist Discussion

Just yesterday I attended a panelist discussion about mobile web browsing organized by Mobile Monday Boston. I did a write-up of the event here.

This is a good opportunity to plug my new blog, Boston Startups (where the write-up exists).  The purpose of this new blog is to more properly separate my coverage of Boston area startups from my personal blog, as well as to provide a voice for the Boston startup community. I welcome guest posts and coverage of events.

Regarding the event, I enjoyed it. Turn out was somewhere between 50 and 100 people. At an after party, I learned about some companies relevant to one of my current projects: Admob and Quattro Wireless. Both deal in mobile advertising, and Quattro is local to Waltham, MA.

Comments

Close
E-mail It