TechStars Application Tips
With the deadline looming for TechStars Boston 2010, I've been asked for tips from a few people. It's time to share them in a blog post. Hopefully this helps.
Disclaimer: this is just my perspective from Localytics, a TechStars Boston 2009 company. My tips are anecdotal at best. As always, exercise your own judgement.
- There are a lot of applications, so short and sweet works well. Eliminate fluff. Most of our answers were less than a few concise paragraphs. The only answer that was long was for "Tell us about each founder." That's because...
- ... it's more about the founders than it is about the idea. A non-trivial percentage of TechStars companies change there idea, and it's not a big deal. Your application should help people like David and Shawn get to know you. Highlight cool things you've worked on in the past. Show that you're motivated and smart.
- Be mentor-able. TechStars is all about mentoring. You need to be open to advice while still being able to think critically and make decisions. To the extent that you can, convey those personality traits in your application.
- Be dedicated to the startup. TechStars strongly favors founding teams that are 100% committed and ready to go.
- Get your application in early - TechStars is already looking at them. Getting in early means your application has more time to float in front of different eyes. You are at less risk of being lost in the last minute flood.
- Get feedback. If you're connected to any alums or mentors, see if they'll review your application. We resubmitted our application several times based on feedback (though be warned, I'm not sure if resubmitting is possible on the application form, we sent updates over email).
- If your company/product is already chugging along, then demonstrate progress. Show how your company has met goals or achieved milestones like X number of users or high profile client Y.
- Try to get into TechStars for a Day. It gives you a chance for face time. If you do go, bring a laptop and be ready to informally demo your product.
- Apply! It isn't that much work, so just do it.
You should also check out:
- TechStars Boston Application Tips from Healy Jones
Best of luck to all of the applicants! TechStars is a truly amazing program.
Ruby MD5 and SHA1 Digest Benchmark
I did a benchmark of MD5 and SHA1 digests in Ruby. The benchmark was done in Ruby 1.8.6 on Windows.
The code used to benchmark:
require 'benchmark'
require 'digest/sha1'
require 'digest/md5'
require 'base64'
# Demonstration of digests
puts "SHA1.hexdigest #{Digest::SHA1.hexdigest('test')}"
puts "MD5.hexdigest #{Digest::MD5.hexdigest('test')}"
puts "SHA1.digest #{Base64.encode64(Digest::SHA1.digest('test'))}"
puts "MD5.digest #{Base64.encode64(Digest::MD5.digest('test'))}"
puts "MD5.digest.inspect #{Digest::MD5.digest('test').inspect}"
print "\nSHA1.digest bytes "
Digest::SHA1.digest('test').each_byte {|c| printf("%4d", c) }
print "\nMD5.digest bytes "
Digest::MD5.digest( 'test').each_byte {|c| printf("%4d", c) }
puts "\n\n"
# Benchmark
TIMES = 50_000
Benchmark.bmbm do |b|
b.report("sha1 hexdig") { TIMES.times do |x|; Digest::SHA1.hexdigest(x.to_s); end }
b.report("md5 hexdig") { TIMES.times do |x|; Digest::MD5.hexdigest(x.to_s); end }
b.report("sha1 digest") { TIMES.times do |x|; Digest::SHA1.digest(x.to_s); end }
b.report("md5 digest") { TIMES.times do |x|; Digest::MD5.digest(x.to_s); end }
end
The output:
SHA1.hexdigest a94a8fe5ccb19ba61c4c0873d391e987982fbbd3 MD5.hexdigest 098f6bcd4621d373cade4e832627b4f6 SHA1.digest qUqP5cyxm6YcTAhz05Hph5gvu9M= MD5.digest CY9rzUYh03PK3k6DJie09g== MD5.digest.inspect "\t\217k\315F!\323s\312\336N\203&'\264\366" SHA1.digest bytes 169 74 143 229 204 177 155 166 28 76 8 115 211 145 233 135 152 47 187 211 MD5.digest bytes 9 143 107 205 70 33 211 115 202 222 78 131 38 39 180 246 Rehearsal ----------------------------------------------- sha1 hexdig 0.797000 0.000000 0.797000 ( 0.796000) md5 hexdig 0.641000 0.000000 0.641000 ( 0.641000) sha1 digest 0.625000 0.000000 0.625000 ( 0.625000) md5 digest 0.500000 0.000000 0.500000 ( 0.500000) -------------------------------------- total: 2.563000sec user system total real sha1 hexdig 0.750000 0.032000 0.782000 ( 0.781000) md5 hexdig 0.593000 0.000000 0.593000 ( 0.594000) sha1 digest 0.594000 0.031000 0.625000 ( 0.625000) md5 digest 0.469000 0.000000 0.469000 ( 0.468000)
As expected, digest methods are faster than hexdigest and MD5 is faster than SHA1.
Database Normalization: First, Second, and Third Normal Forms
I read a great explanation of first, second, and third normal form a few weeks ago. For those that know what database normalization is but haven't seen the "forms", the different forms are essentially rules for having a well normalized relation DB. Keeping them in mind when doing DB design is key to keeping a great database. I'd like to make an attempt at condensing the linked tutorial into its essentials.
First Normal Form (1NF): No repeating elements or groups of elements
The point here is to not repeat yourself within a table. Don't do this:
| OrderId | ItemId1 | ItemId2 | … |
| 1 | 100 | 101 |
Instead, break it out into relational tables.
Second Normal Form (2NF): No partial dependencies on a concatenated key
This is a complex way of saying that if a column isn’t intrinsically related to the entire primary key, then you should break out the primary key into different tables.
Example:
| OrderId (PK) | ItemId (PK) | OrderDate | … |
| 1 | 100 | 2009-01-01 | |
| 1 | 101 | 2009-01-01 |
The primary key here is (OrderId, ItemId). Consider OrderDate. It is conceptually part of an order. An order always occurs at some time. But is an OrderDate related to an Item. Not really.
You may be saying, “but items are part of an order!”, and you would be right. But that’s not what I’m getting at. OrderDate is independent of the item itself.
Look at another way: in the table above the OrderDate will always be the same for a given OrderId regardless of the value of the ItemId column. This means data duplication, which is anti-normalization.
Here’s how we correct the problem:
| Orders | ||
| OrderId (PK) | OrderDate | … |
| 1 | 2009-01-01 |
| Order_Items | ||
| OrderId (PK) | ItemId (PK) | … |
| 1 | 100 | |
| 1 | 101 |
Here is an excellent line from the article, “All we are trying to establish here is whether a particular order on a particular date relies on a particular item.”
Third Normal Form (3NF): No dependencies on non-key attributes
2NF covers the case of multi-column primary keys. 3NF is meant to cover single column keys. Simply stated, pull out columns that don’t directly relate to the subject of the row (the primary key), and put them in their own table.
Example:
| Orders | |||
| OrderId (PK) | OrderDate | CustomerName | CustomerCity |
| 1 | 2009-01-01 | John Smith | Chicago |
Customer information could be the subject of its own table. Pull out customer name and other customer fields into another table, and then put a Customer foreign key into Orders.
Wikipedia has a great quote from Bill Kent: “every non-key attribute ‘must provide a fact about the key, the whole key, and nothing but the key’.”
MySQL Join Performance
Earlier this week I was curious about the performance of JOINs in MySQL. How severe is the performance hit of joins? How much slower is a string join over an integer join? I decided to do some tests, and I’m going to share my results here.
I did these tests on a computer with the following specs and software:
- 3.2 GHz Pentium 4
- 2 GB of RAM
- Windows XP
- MySQL 5.0.51b running in development mode (no query caching)
I made two tables:
CREATE TABLE `parents` ( `id` int(10) unsigned NOT NULL auto_increment, `uuid` varchar(128) default NULL, `time` datetime default NULL, `string` varchar(128) default NULL, PRIMARY KEY (`id`), UNIQUE KEY `uuid` (`uuid`), KEY `time` (`time`), KEY `string` (`string`) ) CREATE TABLE `children` ( `id` int(10) unsigned NOT NULL auto_increment, `uuid` varchar(128) default NULL, `parent_id` int(10) unsigned NOT NULL, `parent_uuid` varchar(128) default NULL, `time` datetime default NULL, `string` varchar(128) default NULL, PRIMARY KEY (`id`), UNIQUE KEY `uuid` (`uuid`), KEY `parent_id` (`parent_id`), KEY `parent_uuid` (`parent_uuid`), KEY `time` (`time`), KEY `string` (`string`) )
I filled these tables with records using the following Ruby code:
require 'mysql'
require 'digest/sha1'
require 'base64'
RECORDS = 10_000
stringS = ['blue', 'red', 'green'].freeze
my = Mysql::new("localhost", "user", "password", "db_perf")
my.query("DELETE FROM parents")
my.query("DELETE FROM children")
my.query("ALTER TABLE parents AUTO_INCREMENT = 1")
my.query("ALTER TABLE children AUTO_INCREMENT = 1")
def make_uuid(arg)
Digest::SHA1.hexdigest(arg.to_s)
end
start_time = Time.now.tv_sec
(1..RECORDS).each do |i|
if ((i % 500) == 0)
puts "Iteration #{i}"
time = Time.now.tv_sec
elapsed = time - start_time
time_per_record = elapsed.to_f/i
time_to_complete = time_per_record * RECORDS
puts "Minutes to complete: #{time_to_complete/60}"
end
uuid = make_uuid(i)
time = Time.at(i * 3600).strftime('%Y-%m-%d %H:%M')
string = stringS[i % stringS.length]
res = my.query(
"INSERT INTO parents " +
"(uuid, time, string) VALUES " +
"('#{uuid}', '#{time}', '#{string}')" )
res = my.query(
"INSERT INTO children " +
"(uuid, time, string, parent_id, parent_uuid) VALUES " +
"('#{uuid}', '#{time}', '#{string}', #{i}, '#{uuid}')" )
end
puts ''
puts 'DONE'
end_time = Time.now.tv_sec
elapsed = end_time - start_time
puts "Elapsed time in minutes: #{elapsed.to_f/60}"
The table engine you use and the character encoding will make a difference. Why? Because the engine will affect how keys are stored and used, and the character encoding will affect the speed of string comparisons.
I use Rails these days, and when dealing with Rails, InnoDB (which is transactional) and UTF8 are common, so let’s look at them:
Blue designates queries without a join. They are:
| Select Group by String | SELECT * FROM children GROUP BY string |
| Select Group By Time Func | SELECT * FROM children GROUP BY YEAR(time), DAYOFYEAR(time) |
| Select | SELECT * FROM children |
Red designates queries with a join, which are:
| Join Integer Group by String | SELECT * FROM children INNER JOIN (parents) ON (children.parent_id = parents.id) GROUP BY parents.string |
| Join Integer Group by Time Func | SELECT * FROM children INNER JOIN (parents) ON (children.parent_id = parents.id) GROUP BY YEAR(children.time), DAYOFYEAR(children.time) |
| Join String Group by String | SELECT * FROM children INNER JOIN (parents) ON (children.parent_uuid = parents.uuid) GROUP BY parents.string |
| Join String Group by Time Func | SELECT * FROM children INNER JOIN (parents) ON (children.parent_uuid = parents.uuid) GROUP BY YEAR(children.time), DAYOFYEAR(children.time) |
| Join Integer | SELECT * FROM children INNER JOIN (parents) ON (children.parent_id = parents.id) |
| Join String | SELECT * FROM children INNER JOIN (parents) ON (children.parent_uuid = parents.uuid) |
The 10K queries were run 10 times, and the 100K queries were run 6 times. I did a warmup round before each measured round (so for a 10K query, that means 10 times to warmup, 10 times to measure). The times you see in the charts are the average times. Be sure to turn off query caching.
The results speak for themselves, so I’m offering them up with no analysis for now.
The Past Few Months
It's been a while since I've updated my blog, so I thought I'd provide a rundown of what I've been doing.
Startup
We really found our focus over the past couple months. We are committed to making tools for mobile developers. Our name is Localytics, and our first tool is an analytics service for mobile applications.
We launched an early version of our analytics engine last week at a DemocampBoston1. Democamp involves a quick succession of presentations by local startups. Check out my recap.
We captured a video of our presentation. The presentation was 5 minutes with a 5 minute Q&A at the end. Check it out:
Registration is closed right now, but we will be opening it up in a couple weeks. During that time, we will add more features, documentation, and hopefully some guides on how to get the most out of the service.
For now, you can check out our demo, which might not make a whole lot of sense, but at least it's there and it works.
Gaming
I managed to get through a couple of the 2008 holiday season games. I have mixed feelings about most of them. The only ones I'd flat out recommend are Gears of War 2 and Left 4 Dead. Perhaps I'll go into more detail in future posts.
I finally discovered the wonder that is Xbox Live Arcade. There are some awesome games on there, many of them with a retro feel. Braid, Castle Crashers, and Geometry Wars are all great. Each is an update of classic game styles with gorgeous HD visuals.
Braid is a side-scroller with time manipulation, Castle Crashers invokes memories of Teenage Muntant Ninja Turtles beat-em-up games, and Geometry Wars is Asteroids on crack. I especially like how Geometry Wars pulls in high scores from your friends, encouraging you to beat them.
What an insane fall lineup
The games coming out this fall season are an assault on the senses. This is a way better lineup than last year. Games I care about include:
- Dead Space - released Oct 13 to great reviews
- Fable 2 - Oct 21
- Far Cry 2 - Oct 22
- Command and Conquer: Red Alert 3 - Oct 28
- Fallout 3 - Oct 28
- Gears of War 2 - Nov 7
- Call of Duty: World at War - Nov 11
- Mirror's Edge - Nov 11
- World of Warcraft: Wrath of the Lich King - Nov 13
- Left 4 Dead - Nov 17
- Prince of Persia - Dec 2
I've bolded the ones I'm definitely going to buy. Almost all of these are Xbox 360 titles. There are a couple good PS3 games (such as Little Big Planet), but I don't have a PS3.
I'm cautious about Fallout 3. It has probably the biggest advertising campaign out of all of them, but I have a feeling it won't meet expectations.
Lots of sequels in there. In fact, the only ones that aren't a sequel or in some way part of an existing series are Dead Space, Mirror's Edge, and Left 4 Dead.
Yahoo Fire Eagle Now Public
Yahoo Fire Eagle has come out of private beta, and it's now open to everyone. Fire Eagle provides a unified way for applications and web sites to share location data tied to specific users.
For your average web surfer or mobile device user, this means a single, unified way for sharing your location across all of the sites and applications you use (that are on Fire Eagle). When you update your location in one application, other applications get updated as well. Privacy controls let you decide exactly how that sharing should work.
So imagine you are using an application on your phone that works with Fire Eagle. Let's say that app uses your GPS location to find restaurants near you. When it does so, it notifies Fire Eagle where you are. Now suppose a completely different web site exists that lets people tell their friends where they are. If it works with Fire Eagle, then it'll see the update from the restaurant finder on your phone.
As a developer, this opens up a whole new range of possibilities, and it also increases the amount of data you could observe.
This is relevant to one of my previous posts on location tracking, and it's an area I'm really interested in. Will developers embrace it? Will users fear the privacy implications? We'll see.
Languages and Their Library Cultures
I saw an interesting blog post today titled Why I stick with Perl. In short, the author's reason for sticking with Perl was that he believes it to have the best "library culture." A strong library culture is more important to him than the language itself.
What does "library culture" even mean? I would define it as the practices, beliefs, and institutions regarding the sharing of modular code. In other words, it refers to the systems for sharing code and the practices of developers concerning those systems.
Perl certainly has a very strong library culture. CPAN is one of the most extensive library repositories out there. In my experiences with the Perl community, the use of CPAN has always been encouraged, and I've generally been pleased with code quality and documentation.
Besides Perl, this made me think about other library cultures. I never realized how drastically different some of them are. Here are my thoughts on a few of the more interesting cases.
Ruby on Rails
Ruby on Rails is very much so a cut and paste world. Despite the existence of RubyForge, I still find myself grabbing code from blogs, wikis, or some random guy's self-hosted repository way too often. It's always a little unsettling.
For example, a typical plugin hunt involves going to the Rails wiki, grabbing the plugin from a random guy's subversion repo, making the changes suggested on the Rails wiki, then going to a forum or blog and copy-pasting the changes suggested there. After all this work, you later find out that some other guy did all this and checked it into GitHub in a new repo. Now whose changes do I follow, the original random guy's repo or the new repo from a new random guy? And who are these people anyway? It all feels very questionable.
PHP
PHP provides so much in the core extensions that I rarely need to look outside of what's already there. Whenever I do need something, PEAR or a professional framework like Zend Framework often has it. I can't think of a single time I ever needed code from a random repo.
C# and .NET
.NET is a monstrous library which has everything under the sun. Microsoft also provides great documentation. Due to these two factors, C# doesn't even really need a strong library culture, it already has a behemoth with a ton of money taking care of it.
Microsoft can't cover everything, though, so they'll always be the need for other libraries. Unfortunately, the third-party library culture isn't as open as it is in languages like Perl. The Microsoft world encourages licensing libraries instead of collaborative repositories.
Cuil is Uncool
For anyone who missed the flurry of news over the startup Cuil, I think it's an interesting look at what's wrong (at times) with Silicon Valley. Despite being full of ex-Google employees and $30+ million in funding, all they have is a crappy search engine that sucks. It might be because they were too busy eating muffins and strawberries.
It'll be interesting to see where Cuil goes. The search relevance is so bad that some people speculate it isn't about search, it's about flipping their indexing algorithms. Considering the high amount of funding, I have a hard time believing anyone will pay for this.
Quite possibly one of the funniest things to come out of this was the jab from Yahoo. Using their BOSS API, Yahoo created a search mashup called Yuil. Yahoo has since taken down Yuil, but it was a funny shot at Cuil while it lasted.
Despite how much the product sucks, I do have to hand it to Cuil or whoever did their marketing/PR/advertising. They were all over the blogosphere and major new sites. I haven't seen a launch blitz like this in a long time. Hell, they got me to say something.