ATLANTA—Researchers at Georgia State University have released a trove of data from the social media site Twitter to help represent the spread and impacts of the global coronavirus outbreak.
The dataset of more than 140 million tweets is publicly available as a resource for the global research community.
The work is part of research that collects and tracks social media chatter to clarify mobility patterns during natural disasters. The rare step of making the work public before the results are finalized highlights the unprecedented threat posed by the global pandemic.
Juan Banda, an assistant professor of computer science, is heading up the project and working with epidemiologists and data scientists. The dataset will be updated every two days and could have wide-reaching implications.
“It was a big decision to make to release the data before having a few papers prepared on it, but it is for the common good,” said Banda. “We are all on the same planet together, and any additional data that could be easily available for other researchers to analyze can make the difference. I am a big believer in open-science, and this is definitely a time where it’s important to have the greatest number of eyes on the research.”
The work provides unique insight into the outbreak, including information on travel, displacement, diagnoses, treatment and a historical record of the timing. Banda is collaborating with Gerardo Chowell, a professor of mathematical epidemiology and chair of the Department of Population Health Sciences in the School of Public Health. Chowell said the work can identify how people are getting and using information on social media.
“This dataset,” Chowell said, “will allow researchers to investigate the spread of misinformation relating to COVID-19, study the change in population behaviors and sentiments as the virus spreads in different geographic areas and quantify the effects of social distancing efforts and changes in human mobility patterns over the course of the pandemic.”
The research team, including research assistant Ramya Tekumalla, began collecting tweets dedicated to coronavirus on March 10. They have collected millions of impressions that could help scientists identify clues they might otherwise overlook. Chowell and Banda used similar research to identify patterns during the recent global Zika outbreak.
“These data provide another view of the pandemic’s impact,” said Banda. “While most efforts are focused on infection rates, hospitalizations and death toll for epidemiological use, our dataset can be used to measure from where people are getting their information (or disinformation) and gauge the sentiment of people with respect to the measures our government is taking, and more.”
So far, researchers have been collecting close to 4.5 million tweets every day. This is part of a revolution in data collection and computer science that offers new ways to track people living through a pandemic in real time, something that wasn’t possible even 10 years ago.
As scientists around the world work to reduce the toll from the outbreak, Banda and his team hope the work can improve future outcomes and even encourage the public to change behavior.
“Indirectly, by being able to tackle sources of disinformation and highlight instances of people not following rules, I believe we can get everybody to do their part in flattening the curve,” said Banda. “In a future scenario, having this data will allow researchers to be better prepared and build systems to detect community transmission, and devise interventions to not be in the current position we are now.”
Additional data were provided by Guanyu Wang of the University of Missouri, Jingyuan Yu of the Universitat Autònoma de Barcelona, Tuo Liu of Carl von Ossietzky Universität Oldenburg and Yuning Ding of the Universität Duisburg-Essen.
Trained as a computer scientist and mathematician, Banda does work that bridges fields from astroinformatics to big data mining to biomedical informatics. He works with large volumes of image data, extracting and transforming computer vision image features into large content-based image retrieval systems for NASA’s Solar Dynamics Observatory mission.
Population Health Sciences
Chowell employs mathematical models to investigate the role of environmental characteristics in the transmission dynamics of SARS-CoV-2 in various settings. The team studies the extent to which changes in the environment could drive the course of the pandemic over the next few months. In the School of Public Health, his research team is producing daily incidence forecasts of the pandemic’s series of deaths and global trajectory.