Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter
T. Alshaabi, J. L. Adams, M. V. Arnold, J. R. Minot, D. R. Dewhurst, A. J. Reagan, C. M. Danforth, and P. S. Dodds
Science Advances, 7, eabe6534, 2021
Times cited: 48
Logline: The Storywrangler project is a curation of Twitter into day-scale usage ranks and frequencies of n-grams for over 100 billion tweets in 100 languages from 2008 through to mid 2020. The massive sociolinguistic data set accounts for social amplification of n-grams via retweets, which can be visualized through time series contagiograms. The project is intended to enable or enhance the study of any large-scale temporal phenomena where people matter including culture, politics, economics, linguistics, public health, conflict, climate change, and data journalism.
Abstract:
In real time, Twitter strongly imprints world events, popular culture, and the day-to-day; Twitter records an ever growing compendium of language use and change; and Twitter has been shown to enable certain kinds of socioeconomic measurement and prediction. Vitally, and absent from many standard corpora such as books and news archives, Twitter also encodes popularity and spreading through retweets. Here, we describe Storywrangler, an ongoing, day-scale curation of over 100 billion tweets containing around 1 trillion 1-grams from 2008 to 2020. For each day, we break tweets into 1-, 2-, and 3-grams across 150+ languages, record usage frequencies, and generate Zipf distributions. We make the data set available through an interactive time series viewer, and as downloadable time series and daily distributions. We showcase a few examples of the many possible avenues of study we aim to enable including how social amplification can be visualized through 'contagiograms'.
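As a rough illustration of the per-day processing the abstract describes (splitting tweets into n-grams, recording usage frequencies, and ranking them), the following is a minimal Python sketch. It is not the Storywrangler pipeline: the function names `ngrams` and `daily_ngram_table` are hypothetical, tokenization is assumed to be plain lowercase whitespace splitting, ties in rank are broken alphabetically, and language detection and retweet amplification are not modeled.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def daily_ngram_table(tweets, n):
    """Build a ranked n-gram table for one day's tweets.

    Returns a list of (ngram, count, relative_frequency, rank) tuples,
    sorted by descending count. Assumption: whitespace tokenization and
    ordinal ranks with alphabetical tie-breaking, chosen for simplicity.
    """
    counts = Counter()
    for text in tweets:
        tokens = text.lower().split()
        counts.update(ngrams(tokens, n))

    total = sum(counts.values())
    # Sort by count (largest first), then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        (gram, count, count / total, rank)
        for rank, (gram, count) in enumerate(ranked, start=1)
    ]


if __name__ == "__main__":
    day = ["the cat sat", "the cat ran", "a dog"]
    for n in (1, 2, 3):
        print(n, daily_ngram_table(day, n)[:3])
```

Plotting rank against relative frequency on log-log axes for such a table gives the kind of Zipf distribution the abstract refers to.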
BibTeX:
@Article{alshaabi2021c, author = {Alshaabi, Thayer and Adams, Jane L. and Arnold, Michael V. and Minot, Joshua R. and Dewhurst, David R. and Reagan, Andrew J. and Danforth, Christopher M. and Dodds, Peter Sheridan}, title = {Storywrangler: {A} massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using {T}witter}, journal = {Science Advances}, year = {2021}, key = {stories,social media,complex systems,language}, volume = {7}, pages = {eabe6534}, }