Noelle Toumey Reetz
Office of the Vice President for Research and Economic Development
ATLANTA—A Georgia State University researcher is mining social media data to document the experiences of so-called “long-haulers,” people who remain sick long after being diagnosed with COVID-19.
Experts know little about the clinical course of COVID-19. In the early days of the pandemic, clinicians did not believe coronavirus symptoms could persist past two or three weeks; patients tended either to recover quickly or to die from the infection. In late July, however, the Centers for Disease Control and Prevention published a report acknowledging that in a third of patients, including young adults with no preexisting conditions, COVID-19 can result in prolonged illness.
Juan Banda, assistant professor of computer science at Georgia State, has amassed one of the world’s largest publicly available datasets of COVID-19 Twitter chatter, made up of more than 602 million individual Tweets. Collaborating with researchers from Oxford University and Harvard Medical School, he used the dataset to identify common symptoms shared by long-haulers, some of whom take months to recover. The work is important because clinical reports documenting long-term symptoms of COVID-19 are not accessible to the public.
“Clinical data is not easily available, and it does not always capture detailed follow-up of the patients,” Banda said. “However, those patients are sharing their experiences on social media, allowing us to study the progression of the disease based on self-reported experiences.”
The researchers analyzed Tweets published from May, more than 60 days after the start of the pandemic, through July. The 10 most commonly mentioned symptoms were malaise, fatigue, labored breathing, tachycardia or heart palpitations, chest pain, insomnia and other sleep disorders, cough, headache, joint pain and fever.
Some of the most serious health impacts reported for long-haulers were acute respiratory failure and acute organ injury, including kidney injury in 20 percent of patients and damage to the heart muscle in 20 to 30 percent of patients. As a result, long-haulers may be at increased risk of developing chronic conditions such as chronic kidney disease, heart failure and chronic obstructive pulmonary disease.
Banda and his colleagues chose to make the preliminary study public through the preprint server medRxiv to aid the global push to learn more about the clinical manifestations of COVID-19.
“We have demonstrated that researchers can leverage social media data, specifically from Twitter, to conduct long-term studies of self-reported symptoms,” Banda said.
The work is part of a massive research project to collect and track social media chatter related to COVID-19. Banda's team at Georgia State began collecting Tweets about the coronavirus on March 10 and continues to collect nearly 4.5 million Tweets each day. Banda has also used the dataset to investigate the spread of misinformation relating to COVID-19 in various geographic areas. The dataset is publicly available as a resource for the global research community.
“We have been contacted by scientists who are studying things like conspiracy theories, views and sentiments around personal protective equipment and the effects of the lockdown on society,” Banda said. “There’s a wide range of topics that other groups and experts are analyzing using our dataset.”
Banda’s collaboration with Dani Prieto-Alhambra (Oxford), Gurdas Singh (Oxford) and Osaid Alser (Harvard Medical School) came about through the Observational Health Data Sciences and Informatics (OHDSI) collaborative, an interdisciplinary network focused on applying large-scale analytics to health data.
This is preliminary research that has not been peer reviewed. The study results require further analysis and should not be considered conclusive or used to guide clinical decision-making.
Banda, who trained as a computer scientist and mathematician, does work that bridges fields from astroinformatics to big data mining to biomedical informatics. He works with large volumes of image data, extracting computer vision features from images and transforming them into large content-based image retrieval systems for NASA's Solar Dynamics Observatory mission.