Boosting
I don’t get much time to read papers these days, but the JMLR article “Evidence Contrary to the Statistical View of Boosting” was fascinating (found via Inductio Ex Machina).
The article uses a discussion format: the authors present their thesis, several people respond, and the authors close with a short rejoinder. It is a gold mine of practical tricks for getting good performance out of boosted decision trees.
One of the main questions the article discusses is whether weak learners should be restricted to the minimum power necessary to fit the data. For example, if the underlying model is additive, should the weak learners be restricted to stumps? The textbook answer is yes, but in practice using trees larger than the true interaction order can improve performance. This question came up more than once at previous jobs.
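The textbook reasoning rests on the fact that a tree of depth d can only capture interactions among at most d variables, so a sum of stumps is an additive model. Below is a minimal pure-Python sketch of that reasoning, not the paper’s experiments: least-squares gradient boosting with greedy regression trees of a fixed depth. The helpers `fit_tree` and `boost`, the toy data and the parameter values are hypothetical, chosen only for illustration.

```python
import itertools

def fit_tree(X, r, depth):
    """Greedy least-squares regression tree of a fixed depth (hypothetical helper).
    Returns a function mapping a feature vector to a prediction."""
    mean = sum(r) / len(r)
    if depth == 0 or len(r) < 2:
        return lambda x: mean
    best = None
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue
            # Sum of squared errors if each side predicts its own mean.
            sse = 0.0
            for idx in (left, right):
                m = sum(r[i] for i in idx) / len(idx)
                sse += sum((r[i] - m) ** 2 for i in idx)
            if best is None or sse < best[0]:
                best = (sse, j, t, left, right)
    if best is None:
        return lambda x: mean
    _, j, t, left, right = best
    lo = fit_tree([X[i] for i in left], [r[i] for i in left], depth - 1)
    hi = fit_tree([X[i] for i in right], [r[i] for i in right], depth - 1)
    return lambda x: lo(x) if x[j] <= t else hi(x)

def boost(X, y, depth, n_rounds=20, learning_rate=0.5):
    """Least-squares boosting: each tree is fit to the current residuals."""
    F = [0.0] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - fi for yi, fi in zip(y, F)]
        tree = fit_tree(X, residuals, depth)
        F = [fi + learning_rate * tree(x) for fi, x in zip(F, X)]
    return F

def mse(y, F):
    return sum((a - b) ** 2 for a, b in zip(y, F)) / len(y)

# Toy target: a pure two-way interaction of x0 and x1; x2 is irrelevant.
X = [list(p) for p in itertools.product([-1, 1], repeat=3)]
y = [x[0] * x[1] for x in X]

stump_mse = mse(y, boost(X, y, depth=1))  # stumps: additive model only
tree_mse = mse(y, boost(X, y, depth=2))   # depth-2 trees: can model x0*x1
print(f"stumps: {stump_mse:.3f}, depth-2 trees: {tree_mse:.2e}")
```

On this balanced grid every stump predicts zero on both sides of its split, so boosted stumps never move off a training MSE of 1.0, while depth-2 trees drive it toward zero. This only illustrates why depth is tied to interaction order; it does not reproduce the article’s empirical point that trees deeper than the true interaction order can still help.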
Friedman makes a great point that boosted decision trees optimize a loss function on the predicted class probabilities, whereas the authors’ claims are all based on classification accuracy.
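The two criteria can rank models differently. The sketch below assumes log-loss (the mean negative log-likelihood of the predicted probability of class 1) as one common example of a probability-based loss; it is not taken from the article, and the two sets of predictions are made up for illustration.

```python
import math

def log_loss(y, p):
    """Mean negative log-likelihood of predicted P(y = 1)."""
    return -sum(math.log(pi) if yi == 1 else math.log(1 - pi)
                for yi, pi in zip(y, p)) / len(y)

def accuracy(y, p, threshold=0.5):
    """Fraction of labels recovered by thresholding the probabilities."""
    return sum((pi > threshold) == (yi == 1) for yi, pi in zip(y, p)) / len(y)

y = [1, 1, 0, 0]
timid = [0.51, 0.51, 0.49, 0.49]      # every label right, but barely
confident = [0.95, 0.95, 0.05, 0.55]  # one label wrong, the rest confidently right

for name, p in [("timid", timid), ("confident", confident)]:
    print(f"{name}: accuracy={accuracy(y, p):.2f}, log-loss={log_loss(y, p):.3f}")
```

Here the timid predictions win on accuracy while the confident ones win on log-loss, so evidence gathered purely in terms of classification accuracy does not directly speak to how well the probability-based loss is being optimized.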