Thesis & Internship proposals

These projects are proposals for interested Master's or PhD students who would like to join CENTAI for an internship or for their Master's thesis with me. Students from Università di Torino and Politecnico di Torino are particularly encouraged to get in contact, but remote or visiting students are welcome as well.

Here are some examples of published works that we developed this way with interested students.

This page might not be very up-to-date, so if you are interested in my research topics feel free to drop me an email!

Determinants of Climate Change stances on Reddit

Which factors drive opinion change on climate change, and thus support for climate change mitigation policies? This question is of particular interest because anticipating how public opinion will react to such policies is fundamental to the success of mitigation strategies. While this problem has been studied with traditional methodologies such as surveys and small-scale qualitative methods, so far research has ignored the potential offered by Reddit, in terms of depth of analysis and amount of available data.

To leverage this source of information, the first step would be building a data set of Reddit posts mentioning Climate Change (CC) or Global Warming (GW). Then, the idea is to develop a probabilistic model to study which factors drive opinion change or a novel interest in this issue, especially considering:

Demographic factors (e.g., young age)
Geographic factors (e.g., areas more affected by catastrophic events)
Cultural factors (e.g., higher education)
Political side

These determinants can be automatically obtained for Reddit users by using state-of-the-art methods [1]. Besides these factors, the communities in which users participate have been used extensively as a valuable source of information about their sociological background [2]. It would therefore be possible to understand how mentions of climate change (our dependent variable, as a proxy for new interest in the issue) are affected by these factors (independent variables: participation in communities and sociodemographic attributes). Even if such factors are not the main drivers of stance adoption, the large scale of Reddit data would make it possible to identify and investigate these small but important patterns [2, 3].
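As a first descriptive step before any probabilistic model, the relationship between a single binary attribute and mentions of the topic can be summarized with an odds ratio. The following is only a minimal sketch: the keyword pattern, the record layout (`young`, `text`), and the toy posts are assumptions made up for illustration, not part of an existing pipeline or data set.

```python
import re
from collections import Counter

# Illustrative keyword pattern (an assumption); a real study would need a validated list.
CLIMATE_PATTERN = re.compile(r"\b(climate change|global warming)\b", re.IGNORECASE)


def mentions_climate(text):
    """Return True if the text mentions climate change or global warming."""
    return CLIMATE_PATTERN.search(text) is not None


def odds_ratio(records, attribute):
    """Odds ratio of mentioning the topic for authors with vs. without a binary attribute.

    `records` is a list of dicts, one per post, with a boolean `attribute` key
    describing the author and a `text` key (hypothetical field names).
    """
    counts = Counter((bool(r[attribute]), mentions_climate(r["text"])) for r in records)
    with_yes, with_no = counts[(True, True)], counts[(True, False)]
    without_yes, without_no = counts[(False, True)], counts[(False, False)]
    # Haldane-Anscombe correction: add 0.5 to each cell to avoid division by zero.
    return ((with_yes + 0.5) * (without_no + 0.5)) / ((with_no + 0.5) * (without_yes + 0.5))


# Toy posts, invented for this example.
records = [
    {"young": True, "text": "Global warming is accelerating"},
    {"young": True, "text": "Climate change policy thread"},
    {"young": True, "text": "Best pizza in Turin?"},
    {"young": False, "text": "climate change again..."},
    {"young": False, "text": "Football results"},
    {"young": False, "text": "New phone review"},
]

ratio = odds_ratio(records, "young")
print(f"Odds ratio for 'young': {ratio:.2f}")
```

An odds ratio above 1 would suggest that the attribute is associated with more mentions; a full analysis would instead model all attributes jointly and account for confounders.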

Requirements: Good knowledge of Python (pandas, numpy, scikit-learn, jupyter), background in statistics, willingness to learn big data techniques (e.g., pyspark).

References:

[1] Quantifying social organization and political polarization in online platforms. Isaac Waller & Ashton Anderson

[2] J Massachs, Corrado Monti, GDF Morales, F Bonchi. Roots of Trumpism: Homophily and Social Feedback in Donald Trump Support on Reddit. 12th ACM Conference on Web Science, 2020

[3] "The language of opinion change on social media under the lens of communicative action". Corrado Monti, Luca Maria Aiello, Gianmarco De Francisci Morales, Francesco Bonchi. Scientific Reports 12, February 2021

Direct observation of the Backfire effect on Reddit

An open question in social science is the existence of the backfire effect: are two individuals with different opinions on a topic more likely to become more polarized if they interact? This question has grown in importance due to the large number of interactions enabled by social media [3, 4].

We previously found that conflictual interactions are a large part of political interactions on Reddit [1]. We tackled the question of backfire through a model-driven machine learning framework [2]; however, these were proof-of-concept results, hardly conclusive. We would therefore be interested in evaluating this question in the most straightforward way possible, using Reddit data. In other words, we would like to design the simplest possible model and experiment that can help answer this question convincingly.

The experimental design would involve first identifying a "treatment" group of Reddit users who experienced conflictual interactions; second, a control group of different users (and/or the same users at a different point in time); third, leveraging their participation in sociopolitical subreddits as a proxy for polarization. Finally, we will combine all these data sources into a causal experimental design (e.g., involving interrupted time series analysis).
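To give an idea of what an interrupted time series analysis could look like here, the sketch below fits one straight line before and one after the interruption, and reports the jump in level and the change in slope at the interruption point. It is a simplified illustration under stated assumptions: the function names, the outcome (the monthly share of a group's comments posted in sociopolitical subreddits), and all numbers are hypothetical, and a real analysis would also need uncertainty estimates and checks for autocorrelation.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def interrupted_time_series(series, t0):
    """Level and slope change at time t0 (index of the first post-interruption point)."""
    pre_slope, pre_intercept = fit_line(list(range(t0)), series[:t0])
    post_slope, post_intercept = fit_line(list(range(t0, len(series))), series[t0:])
    # Compare the observed post-interruption fit with the extrapolated pre-trend at t0.
    level_change = (post_intercept + post_slope * t0) - (pre_intercept + pre_slope * t0)
    return level_change, post_slope - pre_slope


# Hypothetical monthly shares; the conflictual interaction happens at month 6.
treated = [0.10, 0.11, 0.12, 0.13, 0.14, 0.15, 0.25, 0.28, 0.31, 0.34, 0.37, 0.40]
control = [0.10 + 0.01 * t for t in range(12)]

treated_effect = interrupted_time_series(treated, 6)
control_effect = interrupted_time_series(control, 6)
print("treated (level, slope) change:", treated_effect)
print("control (level, slope) change:", control_effect)
```

Comparing the estimated changes in the treatment group against those in the control group helps separate the effect of the interaction from trends that affect all users.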

Requirements: Good knowledge of Python (pandas, numpy, scikit-learn, jupyter), background in statistics, willingness to learn big data techniques (e.g., pyspark).

References:

[1] Gianmarco De Francisci Morales, Corrado Monti, and Michele Starnini. No echo in the chambers of political interactions on Reddit. Scientific Reports 11(1), February 2021

[2] Corrado Monti, Gianmarco De Francisci Morales, and Francesco Bonchi. Learning Opinion Dynamics From Social Traces. 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD2020)

[3] Pennycook, G., Epstein, Z., Mosleh, M., Arechar, A.A., Eckles, D. and Rand, D.G., 2021. Shifting attention to accuracy can reduce misinformation online. Nature, 592(7855).

[4] Sippitt A. "The backfire effect. Does it exist?", 2019.

Who are the gatekeepers?

Gatekeepers are social media users that, to a certain degree, control what comes across the barrier of echo chambers and filter bubbles [1, 2]. They are typically characterized from a network science perspective by a higher centrality [2].

However, it would be (even more) interesting to find out who they are from a demographic point of view. Are they rich or poor? Old or young? Urban or rural? Educated or uneducated? And so on. In other words, did social media truly democratize mass communication, or do certain social groups keep a (loose) control of what is happening? Since we already observed [3] that echo chambers might appear around geographical communities, it is reasonable to hypothesize further connections between offline and online attributes.

We have developed tools able to extract reliable probabilistic information about demographic characteristics from social media (esp. Reddit) users: the question now is how to connect the realm of network centrality with the offline world.
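One simple way to connect the two realms would be to weight each user's centrality by the probability that they belong to a given demographic group. The sketch below uses degree centrality as a stand-in for whichever centrality measure is eventually chosen; the graph, the user names, and the `p_urban` probabilities are all invented for illustration.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree centrality of each node in an undirected graph: degree / (n - 1)."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    n = len(neighbours)
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


def expected_centrality(centrality, membership_prob):
    """Probability-weighted mean centrality of a demographic group."""
    total_weight = sum(membership_prob[u] for u in centrality)
    return sum(centrality[u] * membership_prob[u] for u in centrality) / total_weight


# Toy interaction graph and hypothetical probabilities of being an urban user.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e")]
p_urban = {"a": 1.0, "b": 0.5, "c": 0.0, "d": 1.0, "e": 0.0}
p_rural = {user: 1.0 - p for user, p in p_urban.items()}

centrality = degree_centrality(edges)
urban_centrality = expected_centrality(centrality, p_urban)
rural_centrality = expected_centrality(centrality, p_rural)
print(f"urban: {urban_centrality:.2f}, rural: {rural_centrality:.2f}")
```

Using probabilities as weights avoids forcing a hard label on each user, which fits demographic estimates that are uncertain by construction.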

Requirements: Good knowledge of Python (pandas, numpy, scikit-learn, jupyter), background in statistics, willingness to learn big data techniques (e.g., pyspark).

References:

[1] Welbers K, Opgenhaffen M. Social media gatekeeping: An analysis of the gatekeeping influence of newspapers’ public Facebook pages. New Media & Society. 2018 Dec;20(12).

[2] Garimella K, De Francisci Morales G, Gionis A, Mathioudakis M. Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship. In Proceedings of the 2018 World Wide Web Conference, 2018 Apr 23.

[3] Gianmarco De Francisci Morales, Corrado Monti, and Michele Starnini. “No echo in the chambers of political interactions on Reddit.” Scientific Reports 11(1), February 2021.

Reddit: GreatAwakening versus QAnonCasualties

Understanding how susceptible different groups are to conspiracy theories is of prime interest because of the impact such groups have on public life. We would like to use Reddit to characterize those communities, focusing on r/GreatAwakening, a former subreddit dedicated to the QAnon conspiracy theory (banned on 12 September 2018), and on r/QAnonCasualties, an active support subreddit for people with a friend or loved one taken in by QAnon. These two subreddits existed over different time spans, but we will study the users who participated in them and track their history from public data. We wish to investigate through machine learning (1) the similarities and links between these two communities; (2) which elements characterize each of the two communities; and (3) whether elements common to both allow us to distinguish between the topics related to QAnon and the social environment in which both QAnon believers and their social circle are embedded.
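Before any machine learning, a simple baseline for point (1) could be the overlap between the other subreddits frequented by each community's users, together with the set of users active in both. The sketch below measures this with Jaccard similarity; the user names, the placeholder subreddits (`sub_a` to `sub_d`), and the participation histories are invented for illustration.

```python
FOCAL = {"GreatAwakening", "QAnonCasualties"}


def members(histories, subreddit):
    """Users whose history includes the given subreddit."""
    return {user for user, subs in histories.items() if subreddit in subs}


def community_profile(histories, subreddit):
    """Other (non-focal) subreddits visited by the members of a subreddit."""
    profile = set()
    for user in members(histories, subreddit):
        profile |= histories[user] - FOCAL
    return profile


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical participation histories: user -> set of subreddits.
histories = {
    "user1": {"GreatAwakening", "sub_a", "sub_b"},
    "user2": {"GreatAwakening", "sub_b", "sub_c"},
    "user3": {"QAnonCasualties", "sub_b", "sub_d"},
    "user4": {"QAnonCasualties", "GreatAwakening", "sub_c"},
}

similarity = jaccard(
    community_profile(histories, "GreatAwakening"),
    community_profile(histories, "QAnonCasualties"),
)
shared_users = members(histories, "GreatAwakening") & members(histories, "QAnonCasualties")
print(f"profile similarity: {similarity:.2f}, shared users: {sorted(shared_users)}")
```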

Requirements: Good knowledge of Python (pandas, numpy, scikit-learn, jupyter), background in statistics, some basics of machine learning, willingness to learn big data techniques (e.g., pyspark).

References:

Massachs, Corrado Monti, De Francisci Morales, Bonchi. “Roots of Trumpism: Homophily and Social Feedback in Donald Trump Support on Reddit.” ACM WebSci 2020.

Gianmarco De Francisci Morales, Corrado Monti, and Michele Starnini. “No echo in the chambers of political interactions on Reddit.” Scientific Reports 11(1), February 2021.

Papasavva et al, "Is it a Qoincidence?": An Exploratory Study of QAnon on Voat, WWW 2021.

Prediction of reach in Facebook Political Ads

Since February 2019, amid accusations of spreading misinformation, Facebook has opened access to the political ads it has run through the Facebook Ads Library, which offers an open API. This library is a rich yet underexplored data source, and many basic questions therefore still need to be answered. One of these is: which features make a political ad more or less popular, either because of Facebook's algorithms or because of advertisers' decisions? Is emotional language more likely to be associated with a wider audience? Since the Facebook Ads Library also offers information about demographic segments, such questions can also be asked for separate segments of the audience: e.g., is the presence of a video or a link associated with a wider reach among younger people? It is particularly interesting to answer such questions for political ads, in order to understand more about political campaigning on Facebook.
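A first exploratory step could be to compare the typical reach of ads with and without a given feature. The sketch below assumes that reach is available as a range with a lower and an upper bound and uses its midpoint; the record layout (`has_video`, `impressions_lower`, `impressions_upper`) and all values are invented for illustration and do not reproduce the actual API response format.

```python
from statistics import median


def reach_midpoint(ad):
    """Midpoint of the reported impressions range (hypothetical field names)."""
    return (ad["impressions_lower"] + ad["impressions_upper"]) / 2


def median_reach_by_feature(ads, feature):
    """Median reach of ads with and without a boolean feature."""
    groups = {True: [], False: []}
    for ad in ads:
        groups[bool(ad[feature])].append(reach_midpoint(ad))
    return {flag: median(values) for flag, values in groups.items() if values}


# Toy ads, invented for this example.
ads = [
    {"has_video": True, "impressions_lower": 1000, "impressions_upper": 4999},
    {"has_video": True, "impressions_lower": 5000, "impressions_upper": 9999},
    {"has_video": True, "impressions_lower": 10000, "impressions_upper": 14999},
    {"has_video": False, "impressions_lower": 0, "impressions_upper": 999},
    {"has_video": False, "impressions_lower": 1000, "impressions_upper": 4999},
]

reach = median_reach_by_feature(ads, "has_video")
print("median reach with video:", reach[True])
print("median reach without video:", reach[False])
```

A predictive model would go further by combining many features (text, media, targeting) and by controlling for the money spent on each ad, which is likely to be a strong confounder of reach.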

Requirements: Good knowledge of Python (pandas, numpy, scikit-learn, jupyter), background in statistics, interest in Machine Learning & Natural Language Processing.

References:

Capozzi, De Francisci Morales, Mejova, Corrado Monti, Panisson, Paolotti. “Facebook Ads: Politics of Migration in Italy.” SocInfo 2020.

Capozzi, De Francisci Morales, Mejova, Corrado Monti, Panisson, Paolotti. “Clandestino or Rifugiato? Anti-immigration Facebook Ad Targeting in Italy”. CHI2021.