Corrado Monti

Network Models & Analysis

13 papers found.

Learning Individual Behavior in Agent-Based Models with Graph Diffusion Networks

Francesco Cozzi, Marco Pangallo, Alan Perotti, André Panisson, Corrado Monti

Advances in Neural Information Processing Systems 2025 (NeurIPS 2025).

Link | PDF

Agent-Based Models (ABMs) are powerful tools for studying emergent properties in complex systems. In ABMs, agent behaviors are governed by local interactions and stochastic rules. However, these rules are, in general, non-differentiable, limiting the use of gradient-based methods for optimization, and thus integration with real-world data. We propose a novel framework to learn a differentiable surrogate of any ABM by observing its generated data. Our method combines diffusion models to capture behavioral stochasticity and graph neural networks to model agent interactions. Distinct from prior surrogate approaches, our method introduces a fundamental shift: rather than approximating system-level outputs, it models individual agent behavior directly, preserving the decentralized, bottom-up dynamics that define ABMs. We validate our approach on two ABMs (Schelling's segregation model and a Predator-Prey ecosystem) showing that it replicates individual-level patterns and accurately forecasts emergent dynamics beyond training. Our results demonstrate the potential of combining diffusion models and graph learning for data-driven ABM simulation.

Likelihood-Based Methods Improve Parameter Estimation in Opinion Dynamics Models

Jacopo Lenti, Corrado Monti, Gianmarco De Francisci Morales.

Proceedings of the 17th ACM International Conference on Web Search and Data Mining (WSDM '24).

Link | PDF | GitHub

We show that a maximum likelihood approach for parameter estimation in agent-based models (ABMs) of opinion dynamics outperforms the typical simulation-based approach. Simulation-based approaches simulate the model repeatedly in search of a set of parameters that generates data sufficiently similar to the observed data. In contrast, likelihood-based approaches derive a likelihood function that connects the unknown parameters to the observed data in a statistically principled way. We compare these two approaches on the well-known bounded-confidence model of opinion dynamics. We do so on three realistic scenarios of increasing complexity depending on data availability: (i) fully observed opinions and interactions, (ii) partially observed interactions, (iii) observed interactions with noisy proxies of the opinions. We highlight how identifying observed and latent variables is fundamental for connecting the model to the data. To realize the likelihood-based approach, we first cast the model into a probabilistic generative guise that supports a proper data likelihood. Then, we describe the three scenarios via probabilistic graphical models and show the nuances that go into translating the model. Finally, we implement the resulting probabilistic models in an automatic differentiation framework (PyTorch). This step enables easy and efficient maximum likelihood estimation via gradient descent. Our experimental results show that the maximum likelihood estimates are up to 4x more accurate and require up to 200x less computational time.
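To make the likelihood-based idea concrete, here is a minimal sketch for the simplest scenario (fully observed opinions and interactions). It is not the paper's implementation: it assumes a smoothed bounded-confidence rule in which two agents at opinion distance d interact successfully with probability sigmoid(rho * (epsilon - d)), with the steepness rho known and only the confidence bound epsilon to be estimated. The function names are hypothetical, and the gradient is written out by hand instead of relying on automatic differentiation.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def simulate_interactions(n, epsilon, rho, rng):
    """Draw n (opinion distance, outcome) pairs from the assumed model."""
    data = []
    for _ in range(n):
        # Opinions are assumed to live in [-1, 1].
        distance = abs(rng.uniform(-1, 1) - rng.uniform(-1, 1))
        p = sigmoid(rho * (epsilon - distance))
        data.append((distance, 1 if rng.random() < p else 0))
    return data


def fit_epsilon(data, rho, lr=0.01, steps=300, init=1.0):
    """Maximize the Bernoulli log-likelihood in epsilon by gradient ascent."""
    eps = init
    for _ in range(steps):
        # d/d(eps) of the mean log-likelihood is rho * mean(outcome - p).
        grad = rho * sum(s - sigmoid(rho * (eps - d)) for d, s in data) / len(data)
        eps += lr * grad
    return eps


rng = random.Random(0)
observed = simulate_interactions(2000, epsilon=0.6, rho=16.0, rng=rng)
estimate = fit_epsilon(observed, rho=16.0)
print(f"true epsilon = 0.6, estimate = {estimate:.3f}")
```

Because epsilon enters the logit linearly, the log-likelihood is concave in epsilon, so plain gradient ascent with a small step size is enough for this toy case. The partially observed and noisy-proxy scenarios need latent variables and are not covered by this sketch.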

Online conspiracy communities are more resilient to deplatforming

Corrado Monti, Matteo Cinelli, Carlo Valensise, Walter Quattrociocchi, Michele Starnini.

PNAS Nexus, Volume 2, Issue 10, October 2023.

Link

Online social media foster the creation of active communities around shared narratives. Such communities may turn into incubators for conspiracy theories—some spreading violent messages that could sharpen the debate and potentially harm society. To face these phenomena, most social media platforms implemented moderation policies, ranging from posting warning labels up to deplatforming, i.e. permanently banning users. Assessing the effectiveness of content moderation is crucial for balancing societal safety while preserving the right to free speech. In this article, we compare the shift in behavior of users affected by the ban of two large communities on Reddit, GreatAwakening and FatPeopleHate, which were dedicated to spreading the QAnon conspiracy and body-shaming individuals, respectively. Following the ban, both communities partially migrated to Voat, an unmoderated Reddit clone. We estimate how many users migrate, finding that users in the conspiracy community are much more likely to leave Reddit altogether and join Voat. Then, we quantify the behavioral shift within Reddit and across Reddit and Voat by matching common users. While in general the activity of users is lower on the new platform, GreatAwakening users who decided to completely leave Reddit maintain a similar level of activity on Voat. Toxicity strongly increases on Voat in both communities. Finally, conspiracy users migrating from Reddit tend to recreate their previous social network on Voat. Our findings suggest that banning conspiracy communities hosting violent content should be carefully designed, as these communities may be more resilient to deplatforming.

Evidence of Demographic rather than Ideological Segregation in News Discussion on Reddit

Corrado Monti, Jacopo D'Ignazi, Michele Starnini, Gianmarco De Francisci Morales

Proceedings of the ACM Web Conference 2023 (WWW2023), May 1-5, 2023, Austin, TX, USA. ACM

Link | PDF | GitHub | Dataset | Short video

We evaluate homophily and heterophily among ideological and demographic groups in a typical opinion formation context: online discussions of current news. We analyze user interactions across five years in the r/news community on Reddit, one of the most visited websites in the United States. Then, we estimate demographic and ideological attributes of these users. Thanks to a comparison with a carefully crafted network null model, we establish which pairs of attributes foster interactions and which ones inhibit them. Individuals prefer to engage with the opposite ideological side, which contradicts the echo chamber narrative. Instead, demographic groups are homophilic, as individuals tend to interact within their own group - even in an online setting where such attributes are not directly observable. In particular, we observe age and income segregation consistently across years: users tend to avoid interactions when belonging to different groups. These results persist after controlling for the degree of interest by each demographic group in different news topics. Our findings align with the theory that affective polarization - the difficulty in socializing across political boundaries - is more connected with an increasingly divided society, rather than ideological echo chambers on social media.

Communities, Gateways, and Bridges: Measuring Attention Flow in the Reddit Political Sphere

Cesare Rollo, Gianmarco De Francisci Morales, Corrado Monti, André Panisson.

International Conference on Social Informatics (SocInfo2022). Springer, 2022.

Link | PDF

Won the SocInfo Best Paper Award (with a monetary prize)! 🏆

Online social media have attracted large and vibrant communities, which shape how people interact online. Platforms such as Reddit provide a safe harbor for groups to discuss a variety of topics, including politics and even conspiracy theories. We propose a framework, dubbed attention-flow graph, to investigate the flow of users across Reddit communities from a network perspective. This graph concisely summarizes how users shift their attention from one subreddit to another over time, and allows us to capture its community structure. In addition, it enables the operationalization of the concepts of gateways and bridges: particular subreddits that support the transition of users towards specific communities. We apply this framework to identify political and conspiracy communities, thus discovering their bridges and gateways. We find that conspiracy theories help attract users to the alt-right community, both from occultist subreddits and by diverting users from the radical left.

No echo in the chambers of political interactions on Reddit

Gianmarco De Francisci Morales, Corrado Monti, and Michele Starnini.

Scientific Reports 11 (1), February 2021 (Nature Publishing Group)

Link | PDF | GitHub

Echo chambers in online social networks, whereby users’ beliefs are reinforced by interactions with like-minded peers and insulation from others’ points of view, have been decried as a cause of political polarization. Here, we investigate their role in the debate around the 2016 US elections on Reddit, a fundamental platform for the success of Donald Trump. We identify Trump vs Clinton supporters and reconstruct their political interaction network. We observe a preference for cross-cutting political interactions between the two communities rather than within-group interactions, thus contradicting the echo chamber narrative. Furthermore, these interactions are asymmetrical: Clinton supporters are particularly eager to answer comments by Trump supporters. Besides asymmetric heterophily, users show assortative behavior for activity, and disassortative, asymmetric behavior for popularity. Our findings are tested against a null model of random interactions, by using two different approaches: a network rewiring which preserves the activity of nodes, and a logit regression which takes into account possible confounding factors. Finally, we explore possible socio-demographic implications. Users show a tendency for geographical homophily and a small positive correlation between cross-interactions and voter abstention. Our findings shed light on public opinion formation on social media, calling for a better understanding of the social dynamics at play in this context.
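The rewiring null model mentioned in the abstract can be illustrated with a small sketch. This is one simple way to build such a null model and not necessarily the procedure used in the paper: it assumes interactions are directed (author, target) pairs and shuffles the targets among them, so every user keeps the number of comments written (activity) and received. The toy users, groups, and function names are made up for illustration.

```python
import random
from collections import Counter


def cross_fraction(edges, group):
    """Share of interactions whose author and target are in different groups."""
    return sum(group[u] != group[v] for u, v in edges) / len(edges)


def rewire(edges, rng):
    """Shuffle targets among interactions, keeping each node's out- and in-degree.

    This simple version may create self-replies; a fuller implementation
    would reject or repair them.
    """
    sources = [u for u, _ in edges]
    targets = [v for _, v in edges]
    rng.shuffle(targets)
    return list(zip(sources, targets))


# Hypothetical toy data: two supporters per side, eight reply interactions.
group = {"a": "T", "b": "T", "c": "C", "d": "C"}
edges = [("a", "c"), ("a", "d"), ("b", "c"), ("c", "a"),
         ("d", "b"), ("d", "a"), ("b", "a"), ("c", "d")]

rng = random.Random(42)
observed = cross_fraction(edges, group)
null_samples = [cross_fraction(rewire(edges, rng), group) for _ in range(1000)]
null_mean = sum(null_samples) / len(null_samples)
print(f"observed cross-group share: {observed:.2f}, null mean: {null_mean:.2f}")
```

An observed cross-group share well above the null distribution would point to heterophily, and one well below it to homophily. On real data one would compare against the full null distribution (for example with a z-score), not only its mean.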

Learning Opinion Dynamics From Social Traces

Corrado Monti, Gianmarco De Francisci Morales, and Francesco Bonchi.

Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD2020). ACM, 2020.

Link | PDF | GitHub | Short video

Opinion dynamics - the research field dealing with how people's opinions form and evolve in a social context - traditionally uses agent-based models to validate the implications of sociological theories. These models encode the causal mechanism that drives the opinion formation process, and have the advantage of being easy to interpret. However, as they do not exploit the availability of data, their predictive power is limited. Moreover, parameter calibration and model selection are manual and difficult tasks. In this work we propose an inference mechanism for fitting a generative, agent-like model of opinion dynamics to real-world social traces. Given a set of observables (e.g., actions and interactions between agents), our model can recover the most likely latent opinion trajectories that are compatible with the assumptions about the process dynamics. This type of model retains the benefits of agent-based ones (i.e., causal interpretation), while adding the ability to perform model selection and hypothesis testing on real data. We showcase our proposal by translating a classical agent-based model of opinion dynamics into its generative counterpart. We then design an inference algorithm based on online expectation maximization to learn the latent parameters of the model. Such an algorithm can recover the latent opinion trajectories from traces generated by the classical agent-based model. In addition, it can identify the most likely set of macro parameters used to generate a data trace, thus allowing testing of sociological hypotheses. Finally, we apply our model to real-world data from Reddit to explore the long-standing question about the impact of the backfire effect. Our results suggest a low prominence of the effect in Reddit's political conversation.

Generating Realistic Interest-Driven Information Cascades

Federico Cinus, Francesco Bonchi, Corrado Monti, and André Panisson.

International AAAI Conference on Weblogs and Social Media (ICWSM2020). AAAI, 2020.

Link

We propose a model for the synthetic generation of information cascades in social media. In our model the information “memes” propagating in the social network are characterized by a probability distribution in a topic space, accompanied by a textual description, i.e., a bag of keywords coherent with the topic distribution. Similarly, every user of the social media is described by a vector of interests defined over the same topic space. Information cascades are governed by the topic of the meme, its level of virality, the interests of each user, community pressure, and social influence. The main technical challenge we face towards our goal is the generation of realistic interest vectors, given a known network structure and a tunable level of homophily. We tackle this problem by means of a method based on non-negative matrix factorization, which is shown experimentally to outperform non-trivial baselines based on label propagation and random-walk-based graph embedding. As we showcase in our experiments, our model offers a small set of simple and easily interpretable “knobs” which allow us to study, in vitro, how each set of assumptions affects the resulting propagations. Finally, we show how to generate synthetic cascades that have similar macro-statistics to the real-world cascades for a dataset containing both the network and the cascades.

Estimating Latent Feature-Feature Interactions in Large Feature-Rich Graphs

Included in my PhD thesis.

Corrado Monti and Paolo Boldi.

Internet Mathematics, 2017.

Link | PDF

Real-world complex networks describe connections between objects; in reality, those objects are often endowed with some kind of features. How does the presence or absence of such features interplay with the network link structure? Although the situation described here is truly ubiquitous, there is a limited body of research dealing with large graphs of this kind. Many previous works considered homophily as the only possible transmission mechanism translating node features into links. Other authors, instead, developed more sophisticated models that are able to handle complex feature interactions, but are unfit to scale to very large networks. We expand on the MGJ model, where interactions between pairs of features can foster or discourage link formation. In this work, we investigate how to estimate the latent feature-feature interactions in this model. We propose two solutions: the first one assumes feature independence and is essentially based on Naive Bayes; the second one, which relaxes the independence assumption, is based on perceptrons. In fact, we show it is possible to cast the model equation in order to see it as the prediction rule of a perceptron. We analyze how classical results for perceptrons can be interpreted in this context; then, we define a fast and simple perceptron-like algorithm for this task, which can process $10^8$ links in minutes. We then compare these two techniques, first with synthetic datasets that follow our model, gaining evidence that the Naive independence assumptions are detrimental in practice. Secondly, we consider a real, large-scale citation network where each node (i.e., paper) can be described by different types of characteristics; there, our algorithm can assess how well each set of features can explain the links, and thus find meaningful latent feature-feature interactions.

Cleansing Wikipedia Categories Using Centrality

Included in my PhD thesis.

Paolo Boldi and Corrado Monti.

Proceedings of the 25th International Conference Companion on World Wide Web, ACM 2016.

Link | PDF | GitHub

We propose a novel general technique aimed at pruning and cleansing the Wikipedia category hierarchy, with a tunable level of aggregation. Our approach is endogenous, since it does not use any information coming from Wikipedia articles, but it is based solely on the user-generated (noisy) Wikipedia category folksonomy itself. We show how the proposed techniques can help reduce the level of noise in the hierarchy and discuss how alternative centrality measures affect the result differently.

Learning Latent Category Matrix to Find Unexpected Relations in Wikipedia

Included in my PhD thesis.

Paolo Boldi and Corrado Monti.

Proceedings of the 8th ACM Conference on Web Science, (WebSci2016), ACM 2016.

Link | PDF | GitHub

Besides finding trends and unveiling typical patterns, modern information retrieval is increasingly interested in the discovery of surprising information in textual datasets. In this work we focus on finding "unexpected links" in hyperlinked document corpora when documents are assigned to categories. To achieve this goal, we model the hyperlinks graph through node categories: the presence of an arc is fostered or discouraged by the categories of the head and the tail of the arc. Specifically, we determine a latent category matrix that explains common links. The matrix is built using a margin-based online learning algorithm (Passive-Aggressive), which enables us to process graphs with $10^{8}$ links in less than $10$ minutes. We show that our method provides better accuracy than most existing text-based techniques, with higher efficiency and relying on a much smaller amount of information. It also provides higher precision than standard link prediction, especially at low recall levels; the two methods are in fact shown to be orthogonal to each other and can therefore be fruitfully combined.
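The following sketch shows how a Passive-Aggressive update can learn a category-category matrix of this kind. It uses the basic PA rule for binary classification with hinge loss and is an illustration under stated assumptions, not the paper's code: the score of a pair is assumed to be the sum of matrix entries over the categories of the two documents, labels are +1 for an existing link and -1 for a sampled non-link, and the toy categories and function names are invented here.

```python
from collections import defaultdict


def score(W, cats_u, cats_v):
    """Sum of latent matrix entries over all category pairs (source, target)."""
    return sum(W[(i, j)] for i in cats_u for j in cats_v)


def pa_update(W, cats_u, cats_v, y):
    """One Passive-Aggressive step; y is +1 for a link, -1 for a non-link.

    Both category sets are assumed to be non-empty.
    """
    loss = max(0.0, 1.0 - y * score(W, cats_u, cats_v))
    if loss > 0.0:
        # The squared norm of the 0/1 pair-feature vector is |cats_u| * |cats_v|.
        tau = loss / (len(cats_u) * len(cats_v))
        for i in cats_u:
            for j in cats_v:
                W[(i, j)] += tau * y
    return loss


# Hypothetical toy stream of (source categories, target categories, label).
stream = [
    ({"physics", "science"}, {"math"}, +1),
    ({"cooking"}, {"math"}, -1),
    ({"science"}, {"cooking"}, -1),
]

W = defaultdict(float)
for cats_u, cats_v, y in stream:
    pa_update(W, cats_u, cats_v, y)

print(score(W, {"physics", "science"}, {"math"}))  # score of the linked pair
print(score(W, {"cooking"}, {"math"}))             # score of the non-linked pair
```

Each update changes the matrix just enough to give the current example a margin of 1, which is what makes the method a single cheap pass over the links. Under this reading, an existing link with a low score would be a candidate "unexpected link".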

A Network Model Characterized by a Latent Attribute Structure with Competition

Included in my PhD thesis.

Paolo Boldi, Irene Crimaldi, and Corrado Monti.

Information Sciences 354 (2016): 236–56.

Link | PDF

The quest for a model that is able to explain, describe, analyze and simulate real-world complex networks is of utmost practical as well as theoretical interest. In this paper we introduce and study a network model that is based on a latent attribute structure: each node is characterized by a number of features and the probability of the existence of an edge between two nodes depends on the features they share. Features are chosen according to a process of Indian-Buffet type but with an additional random "fitness" parameter attached to each node, that determines its ability to transmit its own features to other nodes. As a consequence, a node's connectivity does not depend on its age alone, so also "young" nodes are able to compete and succeed in acquiring links. One of the advantages of our model for the latent bipartite "node-attribute" network is that it depends on few parameters with a straightforward interpretation. We provide some theoretical, as well as experimental, results regarding the power-law behaviour of the model and the estimation of the parameters. By experimental data, we also show how the proposed model for the attribute structure naturally captures most local and global properties (e.g., degree distributions, connectivity and distance distributions) that real networks exhibit.

Keywords: complex network, social network, attribute matrix, Indian Buffet process

Liquid FM: Recommending Music through Viscous Democracy

Paolo Boldi, Corrado Monti, Massimo Santini, and Sebastiano Vigna.

Italian Information Retrieval Workshop, 2015.

Link | PDF | GitHub

Most modern recommendation systems use the approach of collaborative filtering: users that are believed to behave alike are used to produce recommendations. In this work we describe an application (Liquid FM) taking a completely different approach. Liquid FM is a music recommendation system that makes the user responsible for the recommended items. Suggestions are the result of a voting scheme, employing the idea of viscous democracy. Liquid FM can also be thought of as the first testbed for this voting system. In this paper we outline the design and architecture of the application, both from the theoretical and from the implementation viewpoints.
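As a rough illustration of the voting idea, the sketch below computes viscous-democracy style scores on a toy delegation graph. It assumes the common formulation in which each user delegates to at most one other user and a vote loses weight by a damping factor alpha at every delegation step. The value of alpha, the user names, and the function name are hypothetical, and how Liquid FM turns such scores into song recommendations is not shown.

```python
def viscous_scores(delegate, alpha=0.5):
    """Score each user by the damped votes that reach them through delegation.

    delegate maps every user to the user they delegate to, or None.
    A vote contributes alpha**k to the user reached after k delegation steps.
    """
    scores = {user: 0.0 for user in delegate}
    for voter in delegate:
        weight, current, seen = 1.0, voter, set()
        # Follow the delegation chain, stopping at its end or on a cycle.
        while current is not None and current not in seen:
            scores[current] += weight
            seen.add(current)
            weight *= alpha
            current = delegate[current]
    return scores


# Hypothetical delegations: ann -> bob -> cat, dan -> cat, cat delegates to no one.
delegations = {"ann": "bob", "bob": "cat", "cat": None, "dan": "cat"}
scores = viscous_scores(delegations, alpha=0.5)
print(scores)
```

With alpha close to 1 the scheme approaches fully transferable (liquid) delegation, while alpha close to 0 approaches counting only each user's own vote.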