Title Identification of leaders, lurkers, associates and spammers in a social network: context-dependent and context-independent approaches
Publication Type Journal Article
Year of Publication 2011
Authors Fazeen, M, Dantu, R, Guturu, P
Journal Social Network Analysis and Mining
Volume 1
Pagination 241-254
ISSN 1869-5469
Keywords Context dependent and context independent data analysis, Fuzzy logic, MLP, Naive Bayesian classifier, Random Forest, Social networks, Twitter
Abstract

In this paper, we present two methods for classifying social network actors (individuals or organizations) into classes such as leaders (e.g., news groups), lurkers, spammers, and close associates. The first method is a two-stage process: a fuzzy-set theoretic (FST) approach evaluates the strengths of network links (or equivalently, actor-actor relationships), and a simple linear classifier then separates the actor classes. Because this method uses a great deal of contextual information, including actor profiles and actor-actor tweet and reply frequencies, it may be termed a context-dependent approach. To handle situations where limited actor data is available for learning network link strengths, we also present a second method that classifies actors by matching their short-term (roughly 25 days) tweet patterns against the generic tweet patterns of prototype actors of the different classes. Because little contextual information is used here, this can be called a context-independent approach. Our experiments with over 500 randomly sampled records from a Twitter database consisting of 441,234 actors, 2,045,804 links, 6,481,900 tweets, and 2,312,927 total reply messages indicate that, in the context-independent analysis, a multilayer perceptron outperforms the Bayes and Random Forest classifiers on both classification accuracy and a new F-measure for classification performance. However, as expected, the context-dependent analysis, which uses link strengths evaluated by the FST approach in conjunction with some actor information, reveals strong clustering of actor data by type, and hence can be considered the superior approach when abundant training data is available.

URL http://dx.doi.org/10.1007/s13278-011-0017-9
DOI 10.1007/s13278-011-0017-9
