A guest post by Phillip Brooker on the Chorus team’s social media data capture and analysis tools.
The promise of social media as a resource and topic of social science research is thus far as frustrating as it is tantalising. It is widely recognised that social media may have much to offer academic research, yet acquiring and making effective use of this material – part of what some refer to as ‘the big data challenge’ – seems to sit just outside our current technical skillset. Although computer science has enjoyed a recent burst of activity in the development of algorithms for capturing and processing social media data in various ways (such as Thelwall et al.’s (2010) SentiStrength, which produces values for positive and negative sentiment in short text, or Cui et al.’s (2011) TextFlow, a temporal topic model designed to capture the evolutionary aspects of unfolding topic flows), the majority of these computational techniques have failed to filter through to social science in any significant way. The chief barrier is the difficulty of securing a mutually productive relationship, which requires both a level of technical understanding from social scientists and a sensitivity to the methodological and analytic interests of social research on the part of computer scientists.
Tackling this barrier head-on, Chorus is a software development project that aims to facilitate social media research for social science by bringing together the existing algorithms and metrics from the computer sciences with the requirements and methodologies of the social sciences. The Chorus team has its origins in two projects with nodes located at Brunel University – MATCH (www.match.ac.uk), a research programme investigating various issues around medical device manufacture, and FoodRisC (www.foodrisc.org), a European initiative directed towards improving risk communication around food issues. The team is a small, close-knit, interdisciplinary collaboration of programmers, web developers, and social scientists in the role of requirements engineers. Hence, the Chorus project is an attempt to draw on a broad array of expertise to furnish social science with a bespoke social media data capture and analysis tool, for both quantitative and qualitative research, and to find a way of making the technical world of algorithms more user-friendly for an audience unused to dealing with them.
The Chorus package comprises two distinct programs. Firstly, we have Chorus-TC (TweetCatcher), a browser-based service for managing Twitter queries and automating the retrieval of new posts, with sentiment analysis (using SentiStrength: Thelwall et al., 2010) and archiving functionality. TweetCatcher allows users to sift Twitter for relevant data in two distinct ways: either by topical keywords appearing widely in Twitter conversation (i.e. semantically-driven data) or by identifying a network of Twitter users and following their daily ‘Twitter lives’ (i.e. user-driven data).
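To make the distinction between the two kinds of data concrete, here is a minimal sketch in Python. It is not how TweetCatcher is implemented, and it does not talk to Twitter at all: the `Tweet` record, the function names and the sample tweets are all hypothetical, and the example simply filters a small in-memory list in the two ways described above.

```python
import re
from dataclasses import dataclass


@dataclass
class Tweet:
    """Hypothetical record for one captured tweet."""
    user: str  # account name without the leading "@"
    text: str


def tokens(text):
    """Lower-case a tweet and split it into a set of simple word tokens."""
    return set(re.findall(r"[a-z0-9_']+", text.lower()))


def select_by_keywords(tweets, keywords):
    """Semantically-driven selection: keep tweets that mention any keyword."""
    wanted = {k.lower() for k in keywords}
    return [t for t in tweets if wanted & tokens(t.text)]


def select_by_users(tweets, users):
    """User-driven selection: keep tweets posted by a chosen set of accounts."""
    wanted = {u.lower() for u in users}
    return [t for t in tweets if t.user.lower() in wanted]


# Invented sample data, for illustration only.
sample = [
    Tweet("alice", "New guidance on food safety published today"),
    Tweet("bob", "Stuck in traffic again"),
    Tweet("carol", "Is this food recall affecting anyone else?"),
]

by_topic = select_by_keywords(sample, ["food"])
by_user = select_by_users(sample, ["bob", "carol"])
```

The keyword route gathers whoever happens to be talking about a topic, whereas the user route gathers everything a fixed group says, whatever the topic. That is why the two routes can yield quite different datasets from the same stream.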
Secondly, we have Chorus-TV (TweetVis), a visual analytic suite for facilitating both quantitative and qualitative approaches to social media data in social science. Visual analytics (VA) is an interdisciplinary computing methodology combining methods from data mining, information visualisation, human-computer interaction and cognitive psychology. The VA approach is highly relevant to the aims of Chorus, enabling exploratory analysis of social media data in an intuitive and user-friendly fashion. Two main views are available within Chorus-TV. The Time-Line Explorer (below) lets users analyse Twitter data across time and visualise the unfolding Twitter conversation according to various metrics (including tweet frequency, sentiment, semantic novelty and homogeneity, collocated words, and so on).
[Image: the Time-Line Explorer view in Chorus-TV]
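As a rough illustration of the kind of per-interval metrics a timeline view can plot, the following Python sketch groups tweets into fixed-width time intervals and reports the tweet frequency and mean sentiment for each. It is an assumption-laden toy, not the Chorus-TV implementation: the function name, the one-hour default interval and the single numeric sentiment score per tweet are all hypothetical, and the scores are invented rather than produced by SentiStrength.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def timeline(tweets, interval=timedelta(hours=1)):
    """Summarise (timestamp, sentiment) pairs per time interval.

    Returns a list of (interval_start, tweet_count, mean_sentiment)
    tuples, ordered by interval start.
    """
    bins = defaultdict(list)
    for timestamp, sentiment in tweets:
        # Floor the timestamp to the start of the interval it falls in.
        start = EPOCH + ((timestamp - EPOCH) // interval) * interval
        bins[start].append(sentiment)
    return [
        (start, len(scores), sum(scores) / len(scores))
        for start, scores in sorted(bins.items())
    ]


# Invented sample data: (time posted, hypothetical sentiment score).
sample = [
    (datetime(2013, 5, 1, 10, 5), 2),
    (datetime(2013, 5, 1, 10, 50), -1),
    (datetime(2013, 5, 1, 11, 10), 3),
]

rows = timeline(sample)
```

Plotting the count and the mean sentiment from each row against the interval start gives a basic frequency and sentiment timeline. Metrics such as semantic novelty or collocated words would need the tweet text as well and are not covered here.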
By contrast, the Cluster Explorer (below) allows users to delve into the semantic and topical makeup of their dataset in a way that is significantly less reliant on the chronological ordering of topics. The Cluster Explorer represents the semantic similarity of intervals, tweets and terms as their proximity to each other on a 2D cluster map. This provides access to interval-level, tweet-level and term (word)-level visualisations, and gives users a means to explore the different topics prevalent within their dataset and trace relationships between them via ‘topical nodes’ (which may form central ‘hub topics’ from which other sub-topics branch outwards).
[Image: the Cluster Explorer view in Chorus-TV]
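One common way to put a number on the semantic similarity of two short texts is the cosine similarity of their word-count vectors. The Python sketch below shows that measure only as an illustration. Whether Chorus-TV uses this or some other measure is not stated here, the function names are hypothetical, and the further step of projecting items onto a 2D map (for example by multidimensional scaling) is not shown.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Count the word tokens in a text (a simple bag-of-words vector)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors, from 0.0 to 1.0."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Invented sample texts, for illustration only.
shared_topic = cosine_similarity(term_vector("food risk"), term_vector("food safety"))
no_overlap = cosine_similarity(term_vector("food risk"), term_vector("traffic jam"))
```

Texts that share more of their vocabulary score closer to 1 and would sit nearer to each other on a similarity map, while texts with no words in common score 0. The same function can compare whole intervals if all the tweets in an interval are concatenated before counting.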
Our choice of Twitter as an initial case is based on its status as a ‘simplest case’ of social media data, since it consists essentially of short text and links to other media. However, one of the challenges for the future of Chorus will be to conceive of analytically useful ways of visualising data other than short text, including images and sounds, which would allow for an expansion of the software into other social media platforms (such as blogs, Facebook, Tumblr, Instagram, SoundCloud, FourSquare, and so on). More widely, the chief ongoing challenge for social media research as a field will be the continued development of a research-supporting software infrastructure (and accompanying methodologies that enable social scientists to make sensible use of software such as Chorus) that is both intuitive to use and flexible enough to be tailored to a wide range of specific and unspecified research questions. The technical development of software such as Chorus (and the continued feedback we hope to get from social science-trained users) is, we hope, the first step towards formalising a robust social science research programme that can take advantage of the possibilities of social media data in an empirically defensible way. To that end, we welcome any queries about our project and about gaining access to our tools, and are eager to hear the thoughts and comments of interested users via the email addresses listed below.
The Chorus Team are situated in Brunel University’s Department of Information Systems, Computing and Mathematics, and are:
Dr Tim Cribbin (Timothy.Cribbin@brunel.ac.uk)
Professor Julie Barnett (Julie.Barnett@brunel.ac.uk)
Dr Phillip Brooker (Phillip.Brooker@brunel.ac.uk)
Mr Hiran Basnayake (Hiran.Basnayake@brunel.ac.uk)
Cui, W., S. Liu, L. Tan, C. Shi, Y. Song, Z. J. Gao, X. Tong and H. Qu (2011) ‘TextFlow: Towards Better Understanding of Evolving Topics in Text’, IEEE Transactions on Visualization and Computer Graphics, 17(12), pp. 2412-2421.
Thelwall, M., K. Buckley, G. Paltoglou, D. Cai and A. Kappas (2010) ‘Sentiment Strength Detection in Short Informal Text’, Journal of the American Society for Information Science and Technology, 61(12), pp. 2544-2558.