
Unstructured medical text classification using linguistic analysis: A supervised deep learning approach

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

A vast amount of unstructured text containing valuable information is available on the web. This text changes and proliferates constantly, making it hard for people to process, read, and remember. Data mining and information extraction algorithms are used to develop new automation techniques for processing unstructured text. Among this publicly available text is a considerable number of online medical articles, which provide valuable information about diseases, symptoms, operations, treatments, drugs, etc. Automatic unstructured text classification offers practical information management that does not depend on subjective classification criteria. It also provides useful information by obtaining and correlating relevant data present in documents, and it classifies, identifies, and presents all sources of knowledge, reducing the time needed to retrieve information by simplifying access to content. Therefore, medical information needs to be classified into its respective categories (such as Diabetes, Cancer, Depression, Pediatrics, etc.). In this paper, we propose a deep learning approach for unstructured medical text classification at the document level. Our classification model uses two types of features: (i) content-based features (stylistic and complexity), and (ii) health domain-specific features. Moreover, rather than dealing with binary classification, this work handles multiclass classification of medical articles. The classification is based on linguistic features extracted from the text and also incorporates medical domain-specific terms/keywords as part of the feature set. These domain-specific features are extracted by applying a topic modeling technique to identify the most probable terms for each medical class. Our experiments show reasonable classification accuracy for such a large number of classes.
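
The two feature families named in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not list the exact stylistic or complexity measures, so average sentence length, average word length, and type-token ratio stand in for them, and the hypothetical `DOMAIN_TERMS` lists are hand-written placeholders for the terms the authors obtain through topic modeling. The deep learning classifier itself is omitted.

```python
import re
from collections import Counter

# Hypothetical per-class keyword lists. In the paper these terms come from
# topic modeling; here they are hand-picked placeholders.
DOMAIN_TERMS = {
    "Diabetes": {"insulin", "glucose", "diabetes"},
    "Cancer": {"tumor", "chemotherapy", "cancer"},
}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(text):
    """Split on sentence-ending punctuation, dropping empty pieces."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def extract_features(text):
    """Return a dict of content-based and domain-specific features."""
    tokens = tokenize(text)
    sentences = split_sentences(text)
    n_tokens = max(len(tokens), 1)  # avoid division by zero on empty input

    # Content-based features (assumed stand-ins for stylistic/complexity).
    features = {
        "avg_sentence_length": len(tokens) / max(len(sentences), 1),
        "avg_word_length": sum(len(t) for t in tokens) / n_tokens,
        "type_token_ratio": len(set(tokens)) / n_tokens,
    }

    # Domain-specific features: share of tokens matching each class's terms.
    counts = Counter(tokens)
    for label, terms in DOMAIN_TERMS.items():
        features[f"terms_{label}"] = sum(counts[t] for t in terms) / n_tokens
    return features
```

Calling `extract_features` on each document would yield one feature vector per article, which could then be fed to any multiclass classifier.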
Original language: English
Title of host publication: 16th ACS/IEEE International Conference on Computer Systems and Applications, AICCSA 2019
Volume: 2019-
State: Published - 2019

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being
