Archive for April, 2007

Wikipedia Visualization

Monday, April 30th, 2007

My coworker Chris showed me a page that has beautiful graphs visualizing wikipedia links. I love the way he’s laid out the connections. Large scale graphs are always cool to look at but this seems like it might actually be useful in showing some patterns.

Learning to Play Go

Thursday, April 26th, 2007

Many people have been asking me recently about the best way to learn to play Go. Since I now have a blog I can put it here and just forward the link :).

I would start by downloading GnuGo or Many Faces of Go (Windows and 9×9 only, but a very nice program), and playing with it for a while.

The next thing I would do is buy all the books in the Learn to Play Go by Janice Kim. There are many good introductory Go books out there, but these are the most fun to read and very clearly written. Also they Go much further than most of the books and will teach you lots of useful beginner strategy.

At this point you should be able to beat GnuGo, and you will have to find some humans to play with :).

As the chess master (and European 1-dan) at work told me the other day,

When I play chess I’m sure I’m wasting my time. When I play go I’m not completely sure.

Why Buildings Fall Down

Tuesday, April 24th, 2007

I just finished reading Why Buildings Fall Down: How Structures Fail. It was one of those books that you can read in two or three pages increments every night before passing out, since it’s all short case studies in building collapses. It’s also one of those books that my friends would probably make fun of me for reading, so it’s nice to read in bed where they can’t see me.

But I found the whole thing absolutely fascinating. I’m in complete awe of engineers that build giant structures where failures have the potential to kill thousands of people.

One chapter ends with the following quote that I can’t get out of my mind:

Is there a moral to this story? There is indeed. It is well known to all creative designers and was clearly articulated by LeMessurier: “Any time you depart from established practice, make ten times the effort, ten times the investigations.”

Actually pretty much every chapter ends with a quote to that effect. I have to say, I would be a terrible civil engineer.

Netflix

Sunday, April 22nd, 2007

I’ve been watching the Netflix contest with interest since it started. For my non-nerd friends, Netflix is offering one million dollars to someone who comes up with an algorithm to improve their recommendation engine by ten percent.

A user made a graph of progress over time:

After the initial quick gains, progress still seems to be steadier than I might have thought. Looking at the current leaderboard it seems the leader has actually made it up to 7%, so the improvements still haven’t stopped, although it does seem to slowing somewhat.

There a short paper draft from Netflix discussing more about the current progress towards the prize. I think there’s a lot to be learned from this contest.

R&D by M&A

Friday, April 20th, 2007

I was interviewing a job candidate and he used the phrase “R&D by M&A”, meaning the trend where companies buy startups instead of investing in research labs. It’s a really catchy term, and there is at least a kernel of truth to it.

(I think it might have come from this Wired article.)

I’ve been asking friends both at startups and research labs what they think about it (unsurprisingly the startup friends like it, the research friends don’t :) ). I’m not sure companies really see a tradeoff between research and acquisitions. From first hand experience I think I can say that Flickr, Konfabulator, upcoming.org, etc. would never have come out of Yahoo Research Labs. Not that they weren’t doing interesting research there, but I don’t think any of these successful Web 2.0 companies are doing something that wasn’t technically possible before. Clearly Yahoo, et al. are investing in R&D and M&A.

Fiction

Tuesday, April 17th, 2007

I wish I read more fiction. But it feels like such an anti-social thing. It’s easy to talk about fun facts from Freakonomics, Omnivore’s Dilemma, Happiness, etc. (btw: all great books in my opinion) but harder to talk about stories. Recently I was eating breakfast with three friends and we couldn’t find a single novel we had all read.

So here’s my offer. I’ll write down what I’ve read recently that I liked, and if you want to read it, I’ll give you the book:

Norwegian Wood by Haruki Murakami. I can never decide if I like Murakami, but I loved this book. I felt a little lame, since it’s well known as his most “mainstream” novel, but it really moved me.

Deadwood by Pete Dexter. A lot like Blood Meridian in that it’s a extremely well researched historical Western with really interesting characters. It’s also really dirty. It was so entertaining I could have sat down and read it in one sitting.

You Don’t Love Me Yet by Jonathan Lethem. I love Jonathan Lethem, but as the New York Times Book Review said this might be for the “Lethem completist”. I’m not totally sure I would recommend this book, but I definitely enjoyed reading it. I think you have to appreciate books where the focus is really on the characters and not the plot.

Any recommendations? I’m looking for more good fiction to read, although the top of the queue right now has to be Killing John Fry by Walter Mosley.

A Completely New Puzzle

Monday, April 16th, 2007

I loved the comments on the king/wife puzzle.

Kenny’s comment, while not as funny as Tess’s or Will’s made me think of another (completely new!) puzzle.

Kenny said:

The biggest problem I see with the applicability of this one is the question of why we want to maximize the chance of finding the best one. Shouldn’t we maximize the expected quality of the one we eventually choose?

Which made me think, what if the king (or inteviewer :) ) wanted to minimize the expected rank of the chosen wife applicant (or job applicant)? It looks like a trickier puzzle… The answer isn’t obvious to me, but it seems tractable. I’ll think about it and post my answer soon.

Harmonic Series

Saturday, April 14th, 2007

My coworker Paul Pedersen showed me another interesting observation/puzzle the other day:

I remember learning in high school that the harmonic series doesn’t converge:

1, 1+(1/2), 1+(1/2)+(1/3), 1+(1/2)+(1/3)+(1/4) …

There is a nice intuitive proof on Mathworld that dates back to the 14th century

So while 1 + (1/2) + (1/4) … never passes 2, the harmonic series reaches every number, but it grows very, very slowly, as you can see from this graph:

Paul had an interesting and surprising observation:

While the series passes an infinite number of integers at an increasingly slow rate, it never actually has an integer value. (Except for n=1)

For example it starts (3/2),(5/6),(13/12) just missing 2.

I’m still thinking about how to prove this fact. If you have a sum, (1/2) + (1/3) + … (1/n), you can reduce it to [(3*4*...*n)+(2*4*...*n)+...+(2*3*...*n-1)]/(2*3*…*n).

So in order for this to be an integer it has to be the case that for some integer N, [(3*4*...*n)+(2*4*...*n)+...+(2*3*...*n-1)] = (2*3*…*n)*N

We can rewrite this as [n*((n-1)*((n-2)*... (2+3) + 2*3) + 2*3*4) ... ) + 2*3*...*(n-1)] = (2*3*…*n)*N

We can call this n*A + 2*3*…*(n-1) = (2*3*…*n)*N.

If n is prime, we have the sum of a multiple of n and a non-multiple of n, and I think it’s clear that there is no N that will make this equation true.

In fact you have no chance of satisfying the equation until the 2n, since there will always be exactly one term missing the n factor and in that term no factor will be divisible by n.

Is it true that there will always be a prime between n and 2n? It seems like it should be true, since the density of prime numbers is on the order of (1/log n). I guess if someone has proved this then the proof is done, but based on the way Paul presented the problem to me, I think there might be a better way to show this…

Greatest Blog Ever

Friday, April 13th, 2007

Check this out: a blog about making fractals in the R (a statistical functional programming language. I feel like the internet keeps teaching me over and over that I’m not unique. Although my R fractals wouldn’t have so many for loops… Maybe I’ll see if he will accept a guest post :).

TextRunner

Thursday, April 12th, 2007

Some of my coworkers have been playing with a project called TextRunner that extracts all kinds of relations from a large web corpus. You can type in any of Arg1, relation and arg2 and it will find matching tuples. For example I tried “Sharks” and “eat” and the top three results were “plankton” and “a fish” and “about 1 % to 10 % of its total body weight”. Most of the results were plausible, and it certainly works better on this particular query than Google. I tried just the relation: “was discovered by” and I find some interesting results along with some junk. For example it got Indium was discovered by Ferdinand Reich, but it also found things like Aspertane was discovered by accident.

There are several cool things about there approach, one of which is to use a parser on a small, well-formed sample of documents to generate features for a classifier that’s output is used as training data for classifier using more basic features. Using the more basic features makes it much faster, which is nice because the corpus is large and parsing is slow, and it also is more reliable in domains such as the big, ugly web where a statistical parser trained on the Penn Treebank doesn’t generalize well.