Abstract
As an effective dimensionality reduction technique, feature selection is widely used in the preprocessing stage of data mining. It is valued for its ability to mitigate the effect of noisy data and to simplify the analysis of high-dimensional data. In this paper, a novel unsupervised feature selection procedure based on a clustering algorithm is proposed to evaluate the goodness of features and select a set of useful features without losing the characteristics of the data. The procedure consists of two steps: clustering and feature evaluation. In the clustering step, a novel clustering algorithm based on fitness proportionate sharing (FPS-clustering) is adopted to separate the data into distinct clusters without any prior knowledge about the data, which makes it more applicable to the analysis of unknown datasets. The feature evaluation step then uses the information extracted during clustering to evaluate the usefulness of each feature and select good features. The proposed method is compared in simulation with four well-known existing feature selection algorithms. Simulation results on both synthetic and real datasets demonstrate that the proposed procedure can effectively evaluate the significance of features and obtains a better subset of features than the four other algorithms.
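The two-step idea (cluster first, then score each feature using the cluster assignments) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' method: the abstract does not specify how FPS-clustering or the feature-evaluation step works, so the classic leader algorithm stands in for FPS-clustering (it needs no preset number of clusters, but unlike FPS-clustering as described it does need a distance threshold, `radius`), and a Fisher-score-style ratio of between-cluster to within-cluster variance serves as a hypothetical evaluation criterion. All function names are hypothetical.

```python
from statistics import mean


def leader_clustering(data, radius):
    """Stand-in for FPS-clustering (NOT the paper's algorithm): the leader
    algorithm. Each point joins the nearest existing leader within `radius`,
    otherwise it becomes the leader of a new cluster."""
    leaders, labels = [], []
    for point in data:
        best, best_dist = None, radius
        for idx, leader in enumerate(leaders):
            dist = sum((a - b) ** 2 for a, b in zip(point, leader)) ** 0.5
            if dist <= best_dist:
                best, best_dist = idx, dist
        if best is None:
            leaders.append(point)
            best = len(leaders) - 1
        labels.append(best)
    return labels


def feature_scores(data, labels):
    """Hypothetical criterion: between-cluster variance divided by
    within-cluster variance, computed separately for each feature."""
    scores = []
    for j in range(len(data[0])):
        column = [row[j] for row in data]
        overall = mean(column)
        between = within = 0.0
        for c in sorted(set(labels)):
            values = [column[i] for i, lab in enumerate(labels) if lab == c]
            centre = mean(values)
            between += len(values) * (centre - overall) ** 2
            within += sum((v - centre) ** 2 for v in values)
        if within > 0:
            scores.append(between / within)
        else:
            # Perfectly separated feature scores highest; constant feature scores 0.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def select_features(data, radius, k):
    """Step 1: cluster without a preset cluster count.
    Step 2: rank features by how well they separate the clusters; keep top k."""
    labels = leader_clustering(data, radius)
    scores = feature_scores(data, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k], labels, scores


# Toy data: feature 0 separates two groups; feature 1 is noise-like.
data = [
    (0.0, 0.3), (0.2, 0.9), (0.1, 0.5),  # group A
    (5.0, 0.4), (5.2, 0.8), (5.1, 0.6),  # group B
]
selected, labels, scores = select_features(data, radius=1.5, k=1)
print(selected, labels)  # [0] [0, 0, 0, 1, 1, 1]
```

On this toy data the leader algorithm recovers the two groups, and feature 0 (score about 937.5) is kept over feature 1 (score about 0.006). The design keeps the two steps decoupled, so any clustering routine that returns one label per sample could replace `leader_clustering` without changing the scoring step.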
| Field | Value |
|---|---|
| Original language | English |
| Title of host publication | Unknown book |
| Pages | 1355-1360 |
| State | Published - 2018 |