Preserving and Analysing Large-Scale Twitter Data

Thursday, 13 June 2024
13:00 to 14:00

Online via Zoom

Organized by

Speaker(s):
Dr. Dimitar Dimitrov

This event is part of the series “GESIS online talks on Social Science Methods and Research Data”.

Preserving data from social media is crucial for many scientific disciplines. Publicly available social media archives facilitate research in the social sciences and provide corpora for training and testing a wide range of machine learning and natural language processing methods. To reduce the reliance on commercial gatekeepers, we decided in 2013 to create a large-scale longitudinal archive of tweets from X (then Twitter) for research purposes. We collected data from the random 1% sample of all tweets that Twitter’s streaming API made freely available at the time. In this talk, we will introduce TweetsKB, a knowledge base of tweets enriched with named entities and sentiments. We will also show how TweetsKB can be used to create topic-specific sub-corpora focusing on important societal events such as the COVID-19 pandemic. Understanding the COVID-19 discourse, how it differs from the general Twitter discourse, and how it relates to real-world events or (mis)information can yield valuable insights.

Registration via Zoom.
