Archive for May, 2007

Amazon Review

Wednesday, May 30th, 2007

Peter Norvig, the Director of Research at Google, wrote a really funny Amazon review of Chris Manning and Hinrich Schütze’s seminal natural language processing textbook:

There are lots of books (and even more junk email) with titles like “Get Rich Quick”. On the surface, this book is the exact opposite: a scholarly, scientific text aimed at comprehensive, accurate description, not at commercial hype. But if someone told me I had to make a million bucks in one year, and I could only refer to one book to do it, I’d grab a copy of this book and start a web text-processing company. Your return on investment might not be $1M, but this book delivers everything it promises. For all the major practical applications of statistical text processing, this book accurately and clearly surveys the major techniques. It often has pretty good advice about which techniques to prefer, but sometimes reads more like a catalog of listings (this reflects not on the authors’ failing, but rather on the field’s immaturity).

It’s worth comparing this book to the other recent NLP text: Jurafsky and Martin’s. (Disclaimer: I worked with them on the preparation of their text.) Jurafsky and Martin cover much more ground, including many aspects that are ignored by Manning and Schutze. So if you want a general overview of natural language, if you want to know about the syntax of English, or the intricacies of dialog, then Jurafsky and Martin is for you. But if your needs are more focused on the algorithms for lower-level text processing with statistical techniques, then Manning and Schutze is far more comprehensive. If you’re a serious student or professional in NLP, you just have to have both.

Math as Natural Language

Tuesday, May 29th, 2007

Hal argues that Math is a natural language.

It seems to me that Math is different from natural language in that it refers (maybe ambiguously) to something unambiguous. But maybe people would argue that natural language does the same thing? Anyway, it's a really interesting post, and I agree there is a ton of information written in “Math” and it would be interesting to see more study of math-as-a-language.

Go Artificial Intelligence

Sunday, May 27th, 2007

My friend Arvel told me about an article in Scientific American claiming that new technology was making Go AI better. I was surprised that I hadn’t heard about this, so I did a little poking and came across this recent Reuters article and this ZDNet article.

The Reuters article contains the quotation:

“On a nine by nine board we are not far from reaching the level of a professional Go player,” said Levente Kocsis at the Hungarian Academy of Sciences’ computing lab SZTAKI.

If this quote is real, Levente has absolutely no idea what he’s talking about.

The article refers to an approach where Go is formulated as an MDP and a Monte Carlo simulation is used to evaluate positions. Basically, to determine the value of a board position they repeatedly play a fast randomized algorithm to the end of the game and aggregate the scores of the different runs. It’s an old method and it works badly on 9×9 boards and not at all on full size 19×19 boards.
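
To make the idea concrete, here is a minimal sketch of playout-based evaluation. It uses tic-tac-toe as a toy stand-in for Go (an assumption made purely for brevity), and the function names are invented for illustration; nothing here comes from the articles or from any real Go program.

```python
import random

# Toy stand-in for Go: a tic-tac-toe board is a list of 9 cells ('X', 'O', or None).
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


def random_playout(board, to_move, rng):
    """Play uniformly random moves to the end of the game.

    Returns +1 if X wins, -1 if O wins, and 0 for a draw.
    """
    board = list(board)  # work on a copy so the caller's position is untouched
    while True:
        w = winner(board)
        if w is not None:
            return 1 if w == 'X' else -1
        empty = [i for i, cell in enumerate(board) if cell is None]
        if not empty:
            return 0
        board[rng.choice(empty)] = to_move
        to_move = 'O' if to_move == 'X' else 'X'


def monte_carlo_value(board, to_move, n_playouts=2000, seed=0):
    """Estimate a position's value (from X's point of view) by averaging playouts."""
    rng = random.Random(seed)
    total = sum(random_playout(board, to_move, rng) for _ in range(n_playouts))
    return total / n_playouts


empty_board = [None] * 9
print(monte_carlo_value(empty_board, 'X'))
```

The estimate is only as good as the random playouts behind it, which is one way to see why a plain version of this method struggles as the board gets bigger.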

The new program MoGo that they seem to be talking about is reasonably strong but not necessarily even state of the art. Based on recent tournament performance it seems roughly similar in strength to GnuGo.

Science reporting is often bad, but the Reuters article is just ridiculous.

I’ve never understood why, but even informed people grossly overestimate the quality of Go programs. I’ve heard estimates that GnuGo is around 5 kyu or better, but I can easily beat it by 50 points while giving it a nine-stone handicap. And I’m nowhere near a professional. I think people are confused because programs typically have large dictionaries of moves, so they look fairly strong in standard situations, but as soon as they are put in an unusual tactical position they completely break down.

Gilgamesh

Tuesday, May 22nd, 2007

Several people have asked me why I take the time to write this blog, and why I’m not more scared of having my random thoughts recorded forever. I guess one reason is how cool it is to make random connections with people that you would otherwise never interact with outside of work. For example, John Battelle writes an outstanding search blog that I’ve read for years to find out about what’s going on in the industry.

One day he starts rambling about how much he loved the story Gilgamesh. Probably to 99 percent of his readers he sounds off his rocker. Well, I don’t think I’ve ever met anyone who likes Gilgamesh as much as I do. But maybe it just never came up in conversation. If I ever meet John, we’ll have something to talk about.

When Paul Moved to England…

Monday, May 21st, 2007

Paul commented about the Will Rogers phenomenon, which I had never heard of. Good old Wikipedia tells us that it’s when you move an element from one set to another and in doing so increase the average of both sets. The name comes from this excellent quote:

When the Okies left Oklahoma and moved to California, they raised the average intelligence level in both states.

I’m gonna try to figure out a way to casually work that quote into a conversation soon.
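
For anyone who wants to see the arithmetic, here is a tiny sketch with made-up numbers. The trick is that the element you move has to be below the average of the set it leaves and above the average of the set it joins.

```python
from statistics import mean

# Hypothetical scores, invented for illustration.
source = [5, 6, 7, 8, 9]      # average 7
destination = [1, 2, 3, 4]    # average 2.5
mover = 5                     # below the source average, above the destination average

before = (mean(source), mean(destination))

source_after = [x for x in source if x != mover]
destination_after = destination + [mover]
after = (mean(source_after), mean(destination_after))

print("before:", before)
print("after: ", after)
```

Both averages go up even though no individual score changed.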

Backgammon Strategy

Friday, May 18th, 2007

Recently I’ve been playing a lot of backgammon with my coworker Manny. I love playing games with Manny because he takes the strategy as seriously as I do.

I used to play backgammon as a kid but I never played with a doubling cube. The way it works is you bet, say, a dollar. At any point either player can decide to pass the cube and double the value of the game. The other player can accept the cube or forfeit at the original bet. After that, at any point during the game, the player who accepted can pass the doubling cube back to their opponent, redoubling the value of the game. This can go on indefinitely.

Assuming you aren’t risk-averse and you and your opponent have the same skill and are rational, at what probability of winning should you try to pass the cube (double the bet)? If you like this kind of problem, think about it for a minute. It’s not very hard, but it’s not obvious either.

One strategy would be to pass the cube with any chance of winning greater than 50%. But in passing the cube you hand the right to double over to your opponent, so you lose something.

To see this, imagine you have a 95% chance of winning and you hold the doubling cube. You can pass the cube and your opponent will resign, so the game is worth $1. If you don’t hold the doubling cube, there is a 5% chance that you will lose, so the game is only worth $0.90.
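
Here is that comparison as a quick sketch, assuming a $1 stake and that the game is simply played to the end when nobody can double. The function name is made up for illustration.

```python
from fractions import Fraction


def play_it_out(p_win, stake=1):
    """Expected winnings if the game is played to the end at the current stake."""
    return p_win * stake - (1 - p_win) * stake


p = Fraction(95, 100)

# Without the cube you have to play on: win $1 with 95%, lose $1 with 5%.
value_without_cube = play_it_out(p)

# With the cube you double, a rational opponent resigns, and you collect the stake.
value_with_cube = Fraction(1)

print(float(value_without_cube), float(value_with_cube))
```

The ten cent gap between the two numbers is what holding the cube is worth in this position.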

When should you accept a double? The naive calculation is to solve P(win)*2*bet - P(loss)*2*bet = -bet.

This would suggest that you should offer a double if you have a three in four chance of winning and accept a double if you have at least a one in four chance of winning. But it ignores the fact that if you accept the double you have the advantage of holding the doubling cube. This article shows that the best strategy under the above assumptions, plus a few more, is to accept a double if you have at least a one in five chance of winning.
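
As a sanity check on the naive calculation only (it ignores the value of owning the cube, so it does not reproduce the one in five figure), here is a small sketch. The function names are hypothetical.

```python
from fractions import Fraction


def ev_accept(p_win, bet=1):
    """Naive expected value of accepting: play on at doubled stakes, no further doubles."""
    return p_win * 2 * bet - (1 - p_win) * 2 * bet


def ev_decline(bet=1):
    """Declining forfeits the game at the original bet."""
    return -bet


# Setting ev_accept(p) equal to ev_decline() gives 4p - 2 = -1, so p = 1/4.
naive_take_point = Fraction(1, 4)

for p in (Fraction(1, 5), naive_take_point, Fraction(1, 3)):
    print(p, ev_accept(p), ev_decline())
```

Below one in four, accepting loses more on average than the bet you would forfeit by declining; above it, accepting loses less.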

So when should you offer a double? I’m still not completely sure. The article suggests at the same point, but I’m not completely convinced of that, even under its assumptions.

True backgammon has a few extra rules that change things, as well as a cutoff where a player wins a match above a certain number of wins, so in practice people have different strategies. Accepted strategy seems to be to offer at around a 60% chance of winning and accept at around a 20% chance of winning. There are a bunch of interesting backgammon strategy articles on the web.

Facebook

Tuesday, May 15th, 2007

I finally made a Facebook account! Be my friend so I don’t look unpopular…

It’s strange writing a profile that everyone from high school friends to future employers will read. Kind of like this blog I guess. I wonder if the internet will make us all more honest about ourselves.

Real Climate

Thursday, May 10th, 2007

I love the Real Climate blog. It gives me a window into the world of climate science that I would never get from mainstream science news (which generally omits or screws up the math enough to make it boring) or from academic publications (which I would never read). Climate science involves a pretty interesting set of data mining and philosophical problems.

This recent post, where they show that Republican control of the Senate is due to the solar magnetic cycle, is really outstanding.

I tried to get my dad (an energy policy consultant) to start a Real Economists blog that was more about climate policy and he said:

With climate science it is the “real scientists” who make sense and speak to the real issues. With regard to environmental policy the “real economists” are mostly nuts (albeit PhD nuts who publish papers in peer reviewed journals and hold important jobs in government and consulting firms…).

(Dad — I hope you don’t mind me quoting you here…)

Chmess

Wednesday, May 9th, 2007

I liked my friend Kenny’s post on Mathematics, Philosophy and Chmess:

Chmess, for those of you who haven’t heard, is just like chess - except the king can move two squares in either direction. As Daniel Dennett has pointed out, Chmess provides a rich source of a priori truths to explore. However, the a priori truths of chmess are not particularly worthy of exploration. Dennett’s challenge of Chmess is to explain the difference between describing the a priori truths of Chmess and practicing philosophy.

If that were really the case about mathematics, then the problem of Chmess would be extreme. Why shouldn’t a top university hire someone who was an incredible genius and had made all sorts of deep and subtle discoveries about Chmess?

I guess the obvious question is what makes math or philosophy different from Chmess. Kenny makes an interesting argument about how they are different.

I’m really not sure what I think about this, but I am pretty sure that if everyone started playing Chmess, a large interesting body of knowledge would be built, and then it would become a reasonably interesting thing to study, and one day top universities might hire someone who had made all sorts of deep and subtle discoveries.

Lots of people study computer graphics, which as far as I can tell is just a way to make better movies and video games, and other people work on search engines, which are just a way to help people find those movies and video games. Is that so different from being a Chmess master?

Simpson’s Paradox

Wednesday, May 9th, 2007

Simpson’s paradox is easier for me to understand than Stein’s Paradox (in my last post), but it is also a surprising result.

I think it’s best phrased like a puzzle:

Player A has a higher batting average against left handed pitchers than Player B and a higher batting average against right handed pitchers than Player B. Is it possible that Player B has a higher batting average than Player A?

The answer (of course) is yes, but it’s a little counter-intuitive. According to Wikipedia and my coworker Brendan, this actually came up in a gender discrimination study at Berkeley, where women had lower acceptance rates than men in every grad school, but higher acceptance rates overall, or maybe vice-versa…
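
Here is a small sketch of the batting puzzle with made-up numbers (the hit and at-bat counts are invented for illustration, not real statistics):

```python
from fractions import Fraction

# Hypothetical (hits, at_bats) records against each kind of pitcher.
records = {
    "A": {"left": (4, 10), "right": (20, 100)},
    "B": {"left": (30, 100), "right": (1, 10)},
}


def average(hits, at_bats):
    """Batting average as an exact fraction."""
    return Fraction(hits, at_bats)


def overall(player):
    """Batting average over all of a player's at-bats combined."""
    hits = sum(h for h, _ in records[player].values())
    at_bats = sum(n for _, n in records[player].values())
    return Fraction(hits, at_bats)


for side in ("left", "right"):
    a = average(*records["A"][side])
    b = average(*records["B"][side])
    print(f"vs {side}-handers: A = {float(a):.3f}, B = {float(b):.3f}")

print(f"overall: A = {float(overall('A')):.3f}, B = {float(overall('B')):.3f}")
```

Player A is ahead against both kinds of pitcher, but in these numbers Player B took most of his at-bats against the left-handers, who are easier to hit, so his combined average comes out higher.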