Multiple Models Book: Active Learning with Mixture Models

Most machine learning algorithms assume that the learner is passive, having no control over the training data it sees. In many practical situations, however, learners have the ability to be active, by selecting or exerting some control over the training data. An intelligent active learning strategy can sharply reduce the number of queries, actions, or experiments needed for the learner to achieve good performance.

Recent years have seen a great number of heuristic approaches to active learning, each with the goal of providing an efficient means for gathering training data. In this chapter, we show that for statistically based machine learning algorithms, one can compute the ``optimal'' way to select training data. We describe one such model, the mixture of Gaussians, in which the data is modelled locally by Gaussians distributions. We then show how to efficiently select queries or actions which minimise the expected variance of the model's estimate, resulting in a learner which significantly outperforms learners which gather data randomly or heuristically.


R Murray-Smith
Last modified: Tue Mar 25 16:25:24 GMT