From social media to public health surveillance: Word embedding based clustering method for twitter classification

Xiangfeng Dai, Marwan Bikdash, Bradley Meyer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Social media provide a low-cost alternative data source for public health surveillance, and health-related classification plays an important role in identifying useful information. In this paper, we summarize recent classification methods that use social media in public health. These methods rely on the bag-of-words (BOW) model and have difficulty capturing the semantic meaning of texts. In contrast, we present a word-embedding-based clustering method. Word embedding is one of the strongest current trends in Natural Language Processing (NLP). It learns word vectors from surrounding words, and these vectors can represent the semantic information of words. A tweet can be represented as a few vectors and divided into clusters of similar words. Based on similarity measures over all the clusters, the tweet can then be classified as related or unrelated to a topic (e.g., influenza). Our simulations show good performance, with a best accuracy of 87.1%. Moreover, the proposed method is unsupervised: it does not require manual labeling of training data and can be readily extended to other classification problems or other diseases.
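The pipeline outlined in the abstract (embed the words of a tweet, group similar words into clusters, then compare the clusters with a topic) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the three-dimensional vectors in `TOY_VECTORS` are invented, the greedy centroid clustering and cosine similarity stand in for the clustering process and similarity measures that the abstract does not specify, and the function names and thresholds are hypothetical. A real system would load vectors trained with a model such as Word2Vec.

```python
import math

# Toy 3-dimensional "embeddings", invented for illustration only.
# A real system would use vectors trained on a large corpus (e.g. Word2Vec).
TOY_VECTORS = {
    "flu":     [0.90, 0.10, 0.00],
    "fever":   [0.80, 0.20, 0.10],
    "cough":   [0.85, 0.15, 0.05],
    "pizza":   [0.00, 0.90, 0.20],
    "dinner":  [0.10, 0.80, 0.30],
    "movie":   [0.10, 0.20, 0.90],
    "tonight": [0.20, 0.30, 0.80],
}


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(words):
    """Component-wise mean of the vectors of the given words."""
    vectors = [TOY_VECTORS[w] for w in words]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster_words(words, threshold=0.9):
    """Greedy clustering (an assumed stand-in): each word joins the first
    cluster whose centroid is similar enough, otherwise starts a new one."""
    clusters = []
    for word in words:
        for cluster in clusters:
            if cosine(TOY_VECTORS[word], centroid(cluster)) >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


def classify(tweet, topic_word="flu", topic_threshold=0.9):
    """Label a tweet 'related' if any word cluster is close to the topic vector."""
    # Whitespace tokenization only; words without a vector are ignored.
    words = [w for w in tweet.lower().split() if w in TOY_VECTORS]
    if not words:
        return "unrelated"
    topic_vec = TOY_VECTORS[topic_word]
    best = max(cosine(centroid(c), topic_vec) for c in cluster_words(words))
    return "related" if best >= topic_threshold else "unrelated"


if __name__ == "__main__":
    print(classify("fever and cough tonight"))
    print(classify("pizza for dinner and a movie tonight"))
```

No labeled training tweets appear anywhere in the sketch, which mirrors the unsupervised character claimed in the abstract: the only inputs are word vectors, a topic word, and two thresholds that would need tuning on real data.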

Original language: English
Title of host publication: IEEE SoutheastCon 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781538615393
State: Published - May 10 2017
Event: IEEE SoutheastCon 2017 - Charlotte, United States
Duration: Mar 30 2017 → Apr 2 2017

Publication series

Name: Conference Proceedings - IEEE SOUTHEASTCON
Volume: 0
ISSN (Print): 1091-0050
ISSN (Electronic): 1558-058X

Conference

Conference: IEEE SoutheastCon 2017
Country/Territory: United States
City: Charlotte
Period: 03/30/17 → 04/2/17

Keywords

  • Big data
  • Clustering Process
  • Machine learning
  • Natural Language Processing
  • Public Health
  • Similarity Measure
  • Social Network
  • Surveillance
  • Twitter
  • Unsupervised Classification
  • Word Embeddings
  • Word2Vec
