Abstract
A vast amount of unstructured text containing valuable information is available on the web. This text changes and proliferates continuously, making it hard for people to read, process, and remember. Data mining and information extraction algorithms are used to develop new automation techniques for processing unstructured text. This publicly available text includes a considerable number of online medical articles, which provide valuable information about diseases, symptoms, operations, treatments, drugs, etc. Automatic classification of unstructured text offers practical information management that does not depend on subjective classification criteria. It provides useful information by obtaining and correlating relevant data present in documents; it also classifies, identifies, and presents all sources of knowledge, and it reduces retrieval time by simplifying access to content. Medical information therefore needs to be classified into its respective categories (such as Diabetes, Cancer, Depression, Pediatrics, etc.). In this paper, we propose a deep learning approach for unstructured medical text classification at the document level. Our classification model uses two types of features: (i) content-based features (stylistic and complexity), and (ii) health domain-specific features. Moreover, rather than dealing with binary classification, this work handles multiclass classification of medical articles. The classification is based on linguistic features extracted from the text and also incorporates medical domain-specific terms/keywords as part of the feature set. These domain-specific features are extracted by applying a topic modeling technique to identify the most probable terms for each medical class. Our experiments show reasonable classification accuracy for such a large number of classes.
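The two feature families named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function `extract_features`, the three complexity measures, and the `CLASS_KEYWORDS` lists are all hypothetical. In particular, the keyword lists are hand-written stand-ins for the terms the paper obtains through topic modeling, and the deep learning classifier itself is not shown.

```python
import re

# Hypothetical per-class keyword lists. The paper derives such terms with
# topic modeling; that step is not reproduced here.
CLASS_KEYWORDS = {
    "Diabetes": {"insulin", "glucose", "diabetes"},
    "Cancer": {"tumor", "chemotherapy", "cancer"},
    "Depression": {"mood", "antidepressant", "depression"},
}


def extract_features(text):
    """Return a flat feature dict for one document."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z]+", text.lower())
    n_words = max(len(words), 1)  # guard against empty input

    features = {
        # Content-based (stylistic / complexity) features, chosen for illustration.
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "type_token_ratio": len(set(words)) / n_words,
    }
    # Domain-specific features: share of words matching each class's keywords.
    for label, keywords in CLASS_KEYWORDS.items():
        hits = sum(1 for w in words if w in keywords)
        features[f"kw_{label}"] = hits / n_words
    return features


doc = "Insulin regulates blood glucose. Patients with diabetes monitor glucose daily."
features = extract_features(doc)
```

A vector like `features` would then be fed to a multiclass classifier; which measures and network architecture the paper actually uses is not stated in the abstract.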
| Original language | English |
|---|---|
| Title of host publication | 16th ACS/IEEE International Conference on Computer Systems and Applications, AICCSA 2019 |
| Volume | 2019- |
| DOIs | |
| State | Published - 2019 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
Title: Unstructured medical text classification using linguistic analysis: A supervised deep learning approach