Hybrid classification for tweets related to infection with influenza

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Traditional public health surveillance methods, such as those employed by the United States Centers for Disease Control and Prevention (CDC), rely on regular clinical reports, which are almost always manual and labor intensive. Twitter, a popular micro-blogging service, offers the possibility of automated public health surveillance. Tweets, however, are limited to 140 characters and do not provide enough word occurrences for conventional classification methods to work reliably. The complexity of natural language makes health-related classification more challenging still. In this study, we use flu-related classification as a demonstration and propose a hybrid classification method that combines two kinds of features: manually defined features and features generated automatically by machine learning. Preprocessing based on Natural Language Processing (NLP) is used to help extract useful information and to eliminate noisy features. Our simulations show improved accuracy.
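
The abstract does not list the actual features or name the classifier, so the following is only a minimal sketch of what combining manually defined and automatically generated features could look like. The word lists, feature names, and functions (`preprocess`, `manual_features`, `auto_features`, `hybrid_features`) are all hypothetical, and plain bag-of-words counts stand in for whatever machine-learned features the paper uses.

```python
import re

# Hypothetical word lists for illustration only; they are not taken from the paper.
STOPWORDS = {"a", "an", "the", "is", "to", "of", "and", "so", "in", "for"}
FLU_TERMS = {"flu", "influenza", "fever", "cough", "chills"}
SELF_TERMS = {"i", "im", "my", "me"}


def preprocess(tweet):
    """Lowercase, drop URLs and @mentions, tokenize, and remove stopwords."""
    text = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    tokens = [t.replace("'", "") for t in re.findall(r"[a-z']+", text)]
    return [t for t in tokens if t and t not in STOPWORDS]


def manual_features(tokens):
    """Hand-written indicators, e.g. a self-reference next to a flu term."""
    words = set(tokens)
    has_flu = bool(words & FLU_TERMS)
    has_self = bool(words & SELF_TERMS)
    return {
        "manual:flu_term": int(has_flu),
        "manual:self_reference": int(has_self),
        "manual:self_and_flu": int(has_flu and has_self),
    }


def auto_features(tokens):
    """Bag-of-words counts, standing in for automatically generated features."""
    counts = {}
    for token in tokens:
        key = "bow:" + token
        counts[key] = counts.get(key, 0) + 1
    return counts


def hybrid_features(tweet):
    """Merge both feature families into one vector (a dict) for a classifier."""
    tokens = preprocess(tweet)
    features = auto_features(tokens)
    features.update(manual_features(tokens))
    return features


if __name__ == "__main__":
    print(hybrid_features("I think I have the flu http://t.co/x @bob"))
```

The merged dictionary could then be passed to any standard classifier. The prefixes `manual:` and `bow:` keep the two feature families distinguishable, so their contributions can be inspected separately.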

Original language: English
Title of host publication: SoutheastCon 2015
Pages: 1-5
State: Published - 2015
