Abstract
Active learning (AL) has been widely used to address the shortage of labeled datasets. Yet most AL techniques require an initial set of labeled data as the knowledge base for active querying. The informativeness of this initial labeled set significantly affects the subsequent active queries, and hence the overall performance of active learning. In this paper, a new clustering-based active learning framework, namely Active Learning using a Clustering-based Sampling (ALCS), is proposed to consider the representativeness and informativeness of samples simultaneously, without any prior label information. A density-based clustering approach is employed to explore the cluster structure of the data without requiring exhaustive parameter tuning. A simple yet effective distance-based querying strategy is adopted to adjust the sampling weight between center-based and boundary-based selection. A novel bi-cluster boundary-based sample query procedure is introduced to select the most uncertain samples across the boundary between adjacent clusters. Additionally, we developed an effective diversity exploration strategy to address redundancy among the queried samples. Extensive experiments compare ALCS with state-of-the-art methods and show that it performs statistically better than, or comparably to, those methods.
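The distinction between center-based (representative) and boundary-based (informative) selection can be illustrated with a minimal sketch. This is not the ALCS implementation: it assumes cluster labels are already available from some clustering step, uses plain centroids and Euclidean distance, and scores "boundary" samples by a simple margin between the nearest other centroid and the sample's own centroid. The function names `centroid` and `select_queries` and the toy data are hypothetical, and the weighting between the two selections and the diversity step described in the abstract are omitted.

```python
import math


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def select_queries(points, labels):
    """Pick one center-based and one boundary-based sample per cluster.

    Assumes at least two clusters. Returns two dicts mapping
    cluster label -> index into `points`.
    """
    # Group sample indices by their (precomputed) cluster label.
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    cents = {lab: centroid([points[i] for i in idxs])
             for lab, idxs in clusters.items()}

    center_picks, boundary_picks = {}, {}
    for lab, idxs in clusters.items():
        # Center-based: the sample closest to its own cluster centroid.
        center_picks[lab] = min(
            idxs, key=lambda i: math.dist(points[i], cents[lab]))

        # Boundary-based: the sample with the smallest margin between the
        # nearest *other* centroid and its own centroid (illustrative score).
        def margin(i):
            own = math.dist(points[i], cents[lab])
            other = min(math.dist(points[i], c)
                        for other_lab, c in cents.items() if other_lab != lab)
            return other - own

        boundary_picks[lab] = min(idxs, key=margin)
    return center_picks, boundary_picks


# Toy data: two clusters, each with one sample stretching toward the other.
points = [(0, 0), (1, 0.4), (0, 1), (2.4, 0.5),
          (5, 0), (6, 0.6), (5, 1), (3.6, 0.5)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
center_picks, boundary_picks = select_queries(points, labels)
```

In this toy setup the center-based picks are the samples nearest each centroid, while the boundary-based picks are the two samples lying between the clusters, which are the ones a classifier would be least certain about.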
| Original language | English |
|---|---|
| Pages (from-to) | 13250-13267 |
| Number of pages | 18 |
| Journal | Applied Intelligence |
| Volume | 52 |
| Issue number | 11 |
| State | Published - Sep 1 2022 |
Keywords
- Active learning
- Boundary-based selection
- Center-based selection
- Clustering
- Informative-based query
- Representative-based query
Fingerprint
Dive into the research topics of 'A clustering-based active learning method to query informative and representative samples'. Together they form a unique fingerprint.