We created pipelines for processing and classifying real-time Twitter data related to vaping and e-cigarettes. Using human-annotated gold-standard data, we compared several machine learning algorithms for classifying tweets along several dimensions: relevant versus non-relevant, commercial, pro-vape, and anti-vape. Tweets were collected with the RITHM software, developed at the Center for Research on Media, Technology and Health (MTH) at the University of Pittsburgh School of Medicine. The analysis was conducted in collaboration with the Department of Biomedical Informatics at the University of Pittsburgh and the Pittsburgh Supercomputing Center (PSC).
GitHub repository: https://github.com/CRMTH/RITHM
Publication(s): Visweswaran S, Colditz JB, O’Halloran P, Han NR, Taneja SB, Welling J, Chu KH, Sidani JE, Primack BA, Machine Learning Classifiers for Twitter Surveillance of Vaping: Comparative Machine Learning Study, J Med Internet Res 2020;22(8):e17478, URL: https://www.jmir.org/2020/8/e17478, DOI: 10.2196/17478
Funding: This work was supported by awards from the National Cancer Institute of the National Institutes of Health (R01-CA225773), the National Library of Medicine of the National Institutes of Health (R01-LM012095), and the National Science Foundation (ACI-1548562 and ACI-1445606 to the Pittsburgh Supercomputing Center).