Archive for February, 2008

reCAPTCHA

Wednesday, February 27th, 2008

Last week I saw Luis von Ahn give a talk about reCAPTCHA, his latest project to stop spammers and do OCR on scanned books at the same time. Ordinary CAPTCHAs, used by sites like Ticketmaster and Yahoo, show you an image of random characters that are distorted so that humans can read them but computer programs can’t.

reCAPTCHA takes words from old scanned books that the OCR program transcribed with low confidence. It shows one of those words to a user, and at the same time it shows a second word that the OCR transcribed correctly.

If the user enters the known word correctly, they are assumed to be human. The user’s transcription of the unknown word is then used as a gold standard transcription. I believe Luis said that if they require two people to transcribe a given word in this way, the accuracy is above 99 percent.
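The scheme described above can be sketched in a few lines of Python. This is a toy model, not reCAPTCHA's actual code: the class name, the method name, the word ids, and the rule that two matching answers settle a word are all assumptions made for illustration, based only on the description in this post.

```python
from collections import Counter, defaultdict

# Assumption: two matching transcriptions are enough to accept a word,
# following the "two people" figure recalled from the talk.
AGREEMENT_THRESHOLD = 2


class RecaptchaSketch:
    """Toy model of the known-word / unknown-word check (hypothetical)."""

    def __init__(self):
        # unknown word id -> counts of each transcription users have typed
        self.votes = defaultdict(Counter)
        # unknown word id -> transcription accepted as the gold standard
        self.accepted = {}

    def submit(self, known_answer, typed_known, unknown_id, typed_unknown):
        """Return True if the user passes; record their guess only if so."""
        if typed_known.strip().lower() != known_answer.strip().lower():
            # Failed the known word: treat as a bot and ignore the guess.
            return False
        guess = typed_unknown.strip().lower()
        self.votes[unknown_id][guess] += 1
        if self.votes[unknown_id][guess] >= AGREEMENT_THRESHOLD:
            # Keep the first transcription that reaches the threshold.
            self.accepted.setdefault(unknown_id, guess)
        return True
```

For example, if two users both pass the known word and both type "upon" for the same unknown word, `accepted` ends up mapping that word's id to "upon", while a guess from someone who fails the known word is never counted.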

Apparently users on the internet do something like 60,000,000 CAPTCHAs a day, and transcribing a word costs around 0.5 cents, so this project is making lots of transcriptions of books that wouldn’t otherwise be possible. It helps people with bad eyesight and makes a great training corpus for OCR research.
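As a back-of-the-envelope check on those numbers, here is a small sketch. The two inputs come from the figures quoted above; the assumption that every CAPTCHA carries one unknown word, and that each word needs two solvers, is mine and only gives a rough estimate.

```python
CAPTCHAS_PER_DAY = 60_000_000    # figure quoted above
COST_PER_WORD_MILLIDOLLARS = 5   # 0.5 cents, kept as an integer
SOLVERS_PER_WORD = 2             # assumption: two agreeing users per word

# Upper bound: every solved CAPTCHA counts as one transcribed word.
max_value_dollars = CAPTCHAS_PER_DAY * COST_PER_WORD_MILLIDOLLARS // 1000

# More conservative: each word is shown to two users before it is accepted.
words_per_day = CAPTCHAS_PER_DAY // SOLVERS_PER_WORD
value_dollars = words_per_day * COST_PER_WORD_MILLIDOLLARS // 1000

print(max_value_dollars, words_per_day, value_dollars)
```

Under these assumptions, that is somewhere between 150,000 and 300,000 dollars of transcription work per day.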

The beauty of this project is that it will always be one step ahead of OCR programs. CAPTCHAs have had to get tougher and tougher over the years as OCR systems get better. But as long as Luis is using an OCR program that is near state of the art, if his program can’t figure out the correct transcription it’s an impossible task for other OCR programs.

I’ve set it up so that if you try to enter a comment on this blog, you can see reCAPTCHA at work. :)

Anti-Portfolio

Tuesday, February 26th, 2008

I love that these VCs (Bessemer Venture Partners) have an Anti-Portfolio of good investments that they turned down.

Contrast that with Battery Ventures saying that passing on Facebook “may turn out to have been a mistake”.

Political Circular Mill

Sunday, February 24th, 2008

Pop-science favorite The Wisdom of Crowds talks about how army ants, using simple local heuristics, will occasionally start following each other in a circle until all or most of them die of exhaustion. I was poking around on the internet about this phenomenon and found two articles:

Republican Ants March in “Circular Mill” of Death

Political Entomology, Part II: Liberal Ants and Their Circular Mill

These aren’t responses to each other; the authors seem to have come up with their observations completely independently.

Why is the NY Times IT department so good? (and the BBC so bad?)

Thursday, February 21st, 2008

How did the NY Times get such awesome engineers and designers? They use cool new technologies like Hadoop/EC2/S3 to deploy archive search and push out beautiful and informative interactive graphs every week.

In contrast, the BBC deploys the infamous Perl on Rails because their infrastructure sucks.

If you just look at the front pages, it’s clear that the NY Times is using the web medium much better than the BBC.

I wonder how this happens. Is it a difference in budgets? A few critical decisions that go right or wrong? The personality of the person in charge?

Hollywood and Silicon Valley

Wednesday, February 20th, 2008

I have a random distant connection to Marshall Herskovitz, the creator of Quarterlife (and a lot of well known mainstream TV shows and movies). Recently I passed his contact info along to my friends who founded Episodic, an online video advertising startup. It sounds like he’s a really nice guy and the meeting went well.

He wrote an article about Silicon Valley in Slate today, and I can’t help but wonder if meeting my friends had some influence on it.

He says,

Geeks, engineers, and boys. And because the DNA of the Internet is entirely male, it exudes the best and worst of what males have to offer. On the plus side—it’s brilliant, complex, competitive, audacious in how it’s changed our way of organizing experience. On the negative side—it’s linear, utilitarian, cold, emotionless, disconnected.

I think it would be slightly more specific and accurate to say “the DNA of the Internet is entirely nerd”.

If the Slate article is Hollywood’s take on Silicon Valley, Marc Andreessen’s excellent post, Rebuilding Hollywood in Silicon Valley’s image, has to be Silicon Valley’s take on Hollywood.

iPaper

Tuesday, February 19th, 2008

My friends at Scribd just launched iPaper, a nice alternative to PDF. The demo looks very nice. More discussion on digg.

Yahoo using Hadoop

Tuesday, February 19th, 2008

Cool to see Yahoo has switched over its webmap search infrastructure to the open source Hadoop project.

From Jeremy Zawodny’s blog:

OLPC on Mechanical Turk

Tuesday, February 12th, 2008

I got one of the One Laptop per Child laptops a few months ago. It’s pretty cool.

I was really surprised to see a task on Amazon’s Mechanical Turk to review the laptop in the OLPC forums for five dollars. How many people with OLPCs are on Mechanical Turk? Me, I guess… Made an easy five bucks.

Microsoft-Yahoo question

Monday, February 11th, 2008

Two facts I can’t reconcile:

(1) There’s a group of shareholders trying to force Yahoo to sell to Microsoft at the original offer (31 dollars, half cash, half stock): http://breakoutperformance.blogspot.com/2008/02/yahoo-shareholders-in-favor-of-selling.html

(2) Yahoo’s shares are trading at 29.55, which is more than Microsoft’s current offer, since MSFT’s stock has fallen since the original offer.

Fact (1) seems reasonable: I would be pretty angry about Yahoo playing chicken with my money. They don’t have an obvious counter offer, and if MSFT walked away you would guess that the stock would fall back to 19 per share.

Fact (2) seems possible: maybe Yahoo’s game of chicken is working and we can expect MSFT to raise its bid, so the share price reflects that.

But how can both be true? If you want Yahoo to sell to MSFT at the original price right now, don’t blog about it; sell your shares on the open market right now and you can get an extra 40 cents per share. Seriously, can anyone explain this?
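To make the arithmetic behind that 40 cents explicit, here is a rough sketch. The three input numbers are the ones quoted in this post; the simplification that exactly half of the 31 dollars was fixed cash and the other half was a fixed number of MSFT shares is an assumption, and the function names are made up for illustration.

```python
OFFER_PER_SHARE = 31.00  # original bid, from the post
YAHOO_PRICE = 29.55      # Yahoo's trading price, from the post
PREMIUM = 0.40           # market price minus current offer value, from the post


def offer_value(msft_decline):
    """Per-share value of the bid after MSFT's stock falls by msft_decline.

    Assumes half the original bid is fixed cash and the other half is a
    fixed number of MSFT shares, so only the stock half shrinks.
    """
    cash_half = OFFER_PER_SHARE / 2
    stock_half = (OFFER_PER_SHARE / 2) * (1 - msft_decline)
    return cash_half + stock_half


def implied_decline(current_offer_value):
    """Invert offer_value: the MSFT decline that yields this offer value."""
    stock_half_now = current_offer_value - OFFER_PER_SHARE / 2
    return 1 - stock_half_now / (OFFER_PER_SHARE / 2)


current_offer = YAHOO_PRICE - PREMIUM      # what the bid is worth today
decline = implied_decline(current_offer)   # fraction MSFT must have fallen

print(f"offer now worth about {current_offer:.2f} per share")
print(f"implied MSFT decline since the bid: {decline:.1%}")
```

Under this simplification the numbers in the post hang together if MSFT has dropped by roughly 12 percent since the bid, which leaves the offer worth about 29.15 against a market price of 29.55.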

So many things about this takeover have ruined my faith in efficient markets.

Huckabee

Sunday, February 10th, 2008

Why is Huckabee staying in the race? He says,

“I didn’t major in math. I majored in miracles, and I still believe in them, too.” (NYT)

Also, he ate fried squirrel out of a popcorn popper.