Hello, Science!

UCI Website Revamped

First of all, happy New Year 2008 to all readers!

UCI hosts a well-known collection of datasets, the UCI Machine Learning Repository. Recently, they completely revamped the website and have started to offer new datasets. This is a great service to the machine learning community, but I would like to see us take it one step further: we should pair this repository of datasets with a repository of algorithms. This would not only let us compare algorithms, but also give us a lot of intuition about the nature of the datasets: a well-understood algorithm that does great on, say, digit classification but performs really poorly on the Wisconsin breast cancer dataset teaches us something about the nature of the data. A recent JMLR paper* calling for more open source machine learning software mentions a project at the University of Toronto called Delve that was meant to do exactly this. Unfortunately, the project seems to have been dead since 2003 or so.

* The Need for Open Source Software in Machine Learning - Sören Sonnenburg, Mikio L. Braun, Cheng Soon Ong, Samy Bengio, Léon Bottou, Geoffrey Holmes, Yann LeCun, Klaus-Robert Müller, Fernando Pereira, Carl Edward Rasmussen, Gunnar Rätsch, Bernhard Schölkopf, Alexander Smola, Pascal Vincent, Jason Weston, Robert Williamson; Journal of Machine Learning Research; 8(Oct):2443-2466, 2007.
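To make the datasets-plus-algorithms idea a little more concrete, here is a minimal sketch of what such a pairing could look like, assuming a hypothetical registry where every algorithm is a function from training data to a predictor and every dataset is a list of (features, label) pairs. It scores each algorithm on each dataset with leave-one-out accuracy. The two learners (a majority-class baseline and 1-nearest-neighbour) and the two tiny datasets are made-up illustrations, not UCI data and not how Delve actually worked.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical data model: a dataset is a list of (feature_vector, label) pairs.
Example = Tuple[List[float], str]
Dataset = List[Example]
Predictor = Callable[[List[float]], str]
# Hypothetical algorithm interface: training data in, predictor out.
Learner = Callable[[Dataset], Predictor]


def majority_class(train: Dataset) -> Predictor:
    """Baseline: always predict the most frequent label in the training data."""
    counts: Dict[str, int] = {}
    for _, label in train:
        counts[label] = counts.get(label, 0) + 1
    best = max(counts, key=lambda label: counts[label])
    return lambda x: best


def nearest_neighbor(train: Dataset) -> Predictor:
    """1-nearest-neighbour classifier using squared Euclidean distance."""
    def predict(x: List[float]) -> str:
        _, label = min(
            train, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x))
        )
        return label
    return predict


def leave_one_out_accuracy(learner: Learner, data: Dataset) -> float:
    """Hold out each example once, train on the rest, return the fraction predicted correctly."""
    correct = 0
    for i, (x, y) in enumerate(data):
        predictor = learner(data[:i] + data[i + 1:])
        if predictor(x) == y:
            correct += 1
    return correct / len(data)


def benchmark(learners: Dict[str, Learner],
              datasets: Dict[str, Dataset]) -> Dict[str, Dict[str, float]]:
    """Score every registered algorithm on every registered dataset."""
    return {
        algo: {name: leave_one_out_accuracy(learn, data) for name, data in datasets.items()}
        for algo, learn in learners.items()
    }


# Illustrative registries; the datasets are tiny made-up toys, not UCI data.
LEARNERS: Dict[str, Learner] = {"majority": majority_class, "1-NN": nearest_neighbor}
DATASETS: Dict[str, Dataset] = {
    # Two well-separated clusters: the label is smooth in feature space.
    "blobs": [([0.0], "a"), ([0.1], "a"), ([0.2], "a"),
              ([5.0], "b"), ([5.1], "b"), ([5.2], "b")],
    # Every point's closest neighbour carries the other label.
    "interleaved": [([-0.2], "a"), ([0.0], "b"), ([0.1], "a"),
                    ([9.8], "a"), ([10.0], "b"), ([10.1], "a")],
}

results = benchmark(LEARNERS, DATASETS)
for algo, scores in results.items():
    print(algo, {name: round(acc, 2) for name, acc in scores.items()})
```

Neither learner wins everywhere: 1-NN scores 1.0 on `blobs` and 0.0 on `interleaved`, while the majority-class baseline scores 0.0 and about 0.67 (the 0.0 on `blobs` is a known quirk of leave-one-out with perfectly balanced classes). Reading one dataset across all algorithms compares the algorithms; reading one well-understood algorithm across all datasets hints at how each dataset is structured, which is the kind of intuition a shared algorithm repository could offer at scale.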
