More Than Semi-supervised Learning
Semi-supervised learning (SSL) has grown into an important research area in machine learning, motivated by the fact that human labeling is expensive while unlabeled data are relatively easy to obtain. A basic assumption in traditional SSL is that unlabeled data and labeled data share the same distribution. However, this assumption may not hold: unlabeled data may exhibit covariate shift, come from a related but different domain, or contain irrelevant data. Yet when the distribution of unlabeled data diverges from that of labeled data, very little academic literature addresses how to choose or adapt machine learning algorithms to these different settings. This book therefore introduces a new unified view of learning with different settings of unlabeled data. It consists of two parts: the first analyzes the fundamental assumptions of SSL and proposes several efficient SSL algorithms; the second discusses three learning frameworks for dealing with the other settings of unlabeled data. The book should be helpful to researchers and graduate students in areas with an abundance of unlabeled data, such as computer vision, bioinformatics, web mining, and natural language processing.