TY - JOUR
T1 - A clustering-based active learning method to query informative and representative samples
AU - Yan, Xuyang
AU - Nazmi, Shabnam
AU - Gebru, Biniam
AU - Anwar, Mohd
AU - Homaifar, Abdollah
AU - Sarkar, Mrinmoy
AU - Gupta, Kishor Datta
PY - 2022/9/1
Y1 - 2022/9/1
AB - Active learning (AL) has been widely used to address the shortage of labeled datasets. Yet most AL techniques require an initial set of labeled data as the knowledge base for active querying. The informativeness of this initial labeled set significantly affects the subsequent active queries and hence the performance of active learning. In this paper, a new clustering-based active learning framework, Active Learning using a Clustering-based Sampling (ALCS), is proposed to consider the representativeness and informativeness of samples simultaneously, using no prior label information. A density-based clustering approach is employed to explore the cluster structure of the data without requiring exhaustive parameter tuning. A simple yet effective distance-based querying strategy is adopted to adjust the sampling weight between center-based and boundary-based selections. A novel bi-cluster boundary-based sample query procedure is introduced to select the most uncertain samples along the boundary between adjacent clusters. Additionally, an effective diversity exploration strategy is developed to reduce redundancy among queried samples. Extensive experiments compare ALCS with state-of-the-art methods and show that ALCS achieves performance that is statistically better than or comparable to theirs.
KW - Active learning
KW - Boundary-based selection
KW - Center-based selection
KW - Clustering
KW - Informative-based query
KW - Representative-based query
UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85125249992&origin=inward
UR - https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=85125249992&origin=inward
U2 - 10.1007/s10489-021-03139-y
DO - 10.1007/s10489-021-03139-y
M3 - Article
SN - 0924-669X
VL - 52
SP - 13250
EP - 13267
JO - Applied Intelligence
JF - Applied Intelligence
IS - 11
ER -