information et réseaux sociaux 1

Telechargé par Ouafa Amrini
HAL Id: tel-01100255
https://tel.archives-ouvertes.fr/tel-01100255
Submitted on 6 Jan 2015
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of sci-
entic research documents, whether they are pub-
lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diusion de documents
scientiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Distributed under a Creative Commons Attribution - NoDerivatives| 4.0 International
License
Diusion de l’information dans les médias sociaux :
modélisation et analyse
Adrien Guille
To cite this version:
Adrien Guille. Diusion de l’information dans les médias sociaux : modélisation et analyse. Informa-
tique [cs]. Université Lumière Lyon 2, 2014. Français. �tel-01100255�
Thèse présentée pour obtenir le grade de
Docteur de l’Université Lumière Lyon 2
École Doctorale Informatique et Mathématiques (ED 512)
Laboratoire ERIC (EA 3083)
Discipline : Informatique
Diffusion de l’information dans les médias sociaux
Modélisation et analyse
Par : Adrien Guille
Présentée et soutenue publiquement le 25 novembre 2014, devant un jury composé de :
Pascal Poncelet,Professeur des Universités, Université Montpellier 2 Rapporteur
Emmanuel Viennet,Professeur des Universités, Université Paris 13 Rapporteur
Vincent Labatut,Maître de Conférences, Université d’Avignon et des Pays du Vaucluse Examinateur
Christine Largeron,Professeur des Universités, Université Jean Monnet Saint-Etienne Examinatrice
Cécile Favre,Maître de Conférences, Université Lumière Lyon 2 Co-directrice
Djamel Zighed,Professeur des Universités, Université Lumière Lyon 2 Directeur
Abstract
Social media have greatly modified the way we produce, diffuse and consume
information, and have become powerful information vectors. The goal of this thesis
is to help in the understanding of the information diffusion phenomenon in social
media by providing means of modeling and analysis.
First, we propose MABED (Mention-Anomaly-Based Event Detection), a statistical
method for automatically detecting events that most interest social media users from
the stream of messages they publish. In contrast with existing methods, it doesn’t
only focus on the textual content of messages but also leverages the frequency of
social interactions that occur between users. MABED also differs from the literature in
that it dynamically estimates the period of time during which each event is discussed
rather than assuming a predefined fixed duration for all events. Secondly, we propose
T-BASIC (Time-Based ASynchronous Independent Cascades), a probabilistic model
based on the network structure underlying social media for predicting information
diffusion, more specifically the evolution of the number of users that relay a given
piece of information through time. In contrast with similar models that are also based
on the network structure, the probability that a piece of information propagate from
one user to another isn’t fixed but depends on time. We also describe a procedure
for inferring the latent parameters of that model, which we formulate as functions of
observable characteristics of social media users. Thirdly, we propose SONDY (SOcial
Network DYnamics), a free and extensible software that implements state-of-the-art
methods for mining data generated by social media, i.e. the messages published by
users and the structure of the social network that interconnects them. As opposed
to existing academic tools that either focus on analyzing messages or analyzing the
network, SONDY permits the joint analysis of these two types of data through the
analysis of influence with respect to each detected event.
The experiments, conducted on data collected on Twitter, demonstrate the rele-
vance of our proposals and shed light on some properties that give us a better un-
derstanding of the mechanisms underlying information diffusion. First, we compare
3
the performance of MABED against those of methods from the literature and find that
taking into account the frequency of social interactions between users leads to more
accurate event detection and improved robustness in presence of noisy content. We
also show that MABED helps with the interpretation of detected events by providing
clear textual descriptions and precise temporal descriptions. Secondly, we demons-
trate the relevancy of the procedure we propose for estimating the pairwise diffusion
probabilities on which T-BASIC relies. For that, we illustrate the predictive power of
users’ characteristics, and compare the performance of the method we propose to es-
timate the diffusion probabilities against those of state-of-the-art methods. We show
the importance of having non-constant diffusion probabilities, which allows incorpo-
rating the variation of users’ level of receptivity through time into T-BASIC. We also
study how – and in which proportion – the social, topical and temporal characteristics
of users impact information diffusion. Thirdly, we illustrate with various scenarios the
usefulness of SONDY, both for non-experts – thanks to its advanced user interface and
adapted visualizations – and for researchers – thanks to its application programming
interface.
Keywords. Social media data mining ; Event detection and tracking ; Modeling and
predicting information diffusion ; Scientific software development.
4
1 / 190 100%
La catégorie de ce document est-elle correcte?
Merci pour votre participation!

Faire une suggestion

Avez-vous trouvé des erreurs dans linterface ou les textes ? Ou savez-vous comment améliorer linterface utilisateur de StudyLib ? Nhésitez pas à envoyer vos suggestions. Cest très important pour nous !