I'm a PhD student working in machine learning, and I highly recommend this library. I've used it for all sorts of problems within and outside my research, and it just works great. I've used it from C++, Python, and Matlab.
Their papers are excellent too if anyone is interested in reading about large-scale optimization problems for SVM.
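If anyone wants to kick the tires from Python without touching the C++ API, here's a minimal sketch using scikit-learn, whose SVC class is built on top of libsvm (this is just an illustration with a toy dataset, not the official libsvm bindings):

    # Minimal sketch (assumes scikit-learn is installed): SVC wraps libsvm under the hood.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # RBF kernel, solved by libsvm
    clf.fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))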
Sofia-ml is a very fast C++ package for linear SVMs and classification. It supports Pegasos as well as logistic regression and learning to rank. It has no bindings for other languages, which is a bit of a downside, but it's still a useful command-line tool.
It also includes a package for very fast mini-batch K-Means (http://code.google.com/p/sofia-ml/wiki/SofiaKMeans). By combining the two, one can effectively learn a "kernelized" model while staying linear and therefore very fast (at least that's the claim; I haven't tried it myself).
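To make the "k-means features + linear model" idea concrete, here's a rough sketch with scikit-learn's MiniBatchKMeans and LinearSVC standing in for sofia-ml; the dataset, cluster count, and batch size are just placeholder choices:

    # Rough sketch of the cluster-features + linear-model trick described above.
    from sklearn.cluster import MiniBatchKMeans
    from sklearn.datasets import make_moons
    from sklearn.svm import LinearSVC

    X, y = make_moons(n_samples=2000, noise=0.2, random_state=0)  # not linearly separable

    km = MiniBatchKMeans(n_clusters=50, batch_size=256, random_state=0).fit(X)
    # transform() gives distances to each cluster centre; using these as features
    # lets a purely linear classifier carve out non-linear decision boundaries.
    Z = km.transform(X)

    acc_raw = LinearSVC(max_iter=10000).fit(X, y).score(X, y)
    acc_clusters = LinearSVC(max_iter=10000).fit(Z, y).score(Z, y)
    print(acc_raw, acc_clusters)  # the cluster features usually score noticeably higher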
I've used both the SVM and k-means packages and they work very well. For sparse datasets with >500 dimensions and >10 million rows, file I/O time was <15 sec and training time <3 sec. K-means is slower but still orders of magnitude faster than standard batch k-means.
Finally, Vowpal Wabbit is a very fast package that also uses stochastic gradient descent as its workhorse. It also has a nice feature-hashing compression scheme which is being widely adopted (e.g. in Mahout, and in sofia-ml above).
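For anyone who hasn't seen the hashing trick before, here's a toy from-scratch sketch of the idea (not VW's actual implementation): feature names are hashed directly into a fixed-size vector, so you never build or store a feature dictionary:

    # Toy illustration of the hashing trick: feature names -> fixed-size vector.
    import hashlib
    import numpy as np

    def hash_features(tokens, n_bins=2**20):
        """Map a bag of string features into a fixed-length vector."""
        v = np.zeros(n_bins)
        for tok in tokens:
            h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
            idx = h % n_bins                      # collisions are accepted as noise
            sign = 1 if (h >> 64) % 2 else -1     # signed variant reduces collision bias
            v[idx] += sign
        return v

    doc = "the quick brown fox jumps over the lazy dog".split()
    x = hash_features(doc, n_bins=2**10)
    print(np.count_nonzero(x), "non-zero entries out of", x.size)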
SVMs are awesome for pattern matching. I first encountered them on a project to identify pedestrians from IR images and was blown away by the simplicity of the underlying math.
For anyone curious, it basically boils down to roughly "put your data points in a plane and draw a line separating the two clusters with the largest possible margin from each of them".
Except that the straight line is in feature space and not input space; the computations are done using only a kernel function, which takes vectors in the input space and computes their dot product in the feature space.
This is a very important distinction because while the method is linear in the feature space, it can solve non-linear problems in the input space.
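A quick numerical check of that point, using the degree-2 polynomial kernel as an example: the kernel evaluated on input-space vectors gives exactly the dot product of an explicit feature map, even though training never has to construct that map (the vectors below are just arbitrary illustrative values):

    # Kernel on input-space vectors == dot product in an (explicit, here 3-D) feature space.
    import numpy as np

    def phi(v):
        """Explicit feature map for the degree-2 polynomial kernel k(x, z) = (x.z)^2."""
        x1, x2 = v
        return np.array([x1 * x1, x2 * x2, np.sqrt(2) * x1 * x2])

    x = np.array([1.0, 2.0])
    z = np.array([3.0, -1.0])

    kernel_value = np.dot(x, z) ** 2       # computed purely in input space
    feature_dot = np.dot(phi(x), phi(z))   # computed in the feature space
    print(kernel_value, feature_dot)       # both equal 1.0 here -> identical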
It provides a nice wrapper around libsvm, liblinear, and a whole bunch of other classification libraries. Plus it provides things like HDF5 support; Octave, Matlab, Python, and R bindings; more esoteric kernels (e.g., on strings); as well as one-class and multi-class SVMs.
I have also used libsvm a lot and can heartily recommend it - but only for non-linear kernels. If you wish to use a linear SVM (which, if you aren't familiar with machine learning, you should probably try first), then for your own sake try libocas:
It uses the SVMlight format and also has a mex wrapper (MATLAB). More importantly, I found that for linear SVMs it was around 100-1000 times faster than libsvm (I shit ye not).
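libocas itself is C with a MATLAB mex wrapper, but if you just want the same workflow (SVMlight-format data plus a linear SVM) from Python, here's a stand-in sketch with scikit-learn's LinearSVC, which wraps liblinear rather than libocas; "train.svmlight" is a placeholder path:

    # Stand-in sketch: SVMlight-format data + a linear SVM, via scikit-learn.
    from sklearn.datasets import load_svmlight_file
    from sklearn.svm import LinearSVC

    X, y = load_svmlight_file("train.svmlight")   # sparse matrix straight from disk
    clf = LinearSVC(C=1.0)
    clf.fit(X, y)
    print("training accuracy:", clf.score(X, y))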
It might also be worthwhile to have a look at WEKA; it's a UI / Java implementation of all kinds of machine learning algorithms. It makes it really easy to just test stuff, because most of the time there isn't really a way to tell in advance which machine learning algorithm will work best.
Does anyone know of a solid Ruby interface for this? When I tried using it for a recent project I had a lot of problems getting the gems to work on OS X. Other than that I've heard a lot of praise for it...