Determining Twitter audiences: Geolocation and number of followers

altmetrics15
The 2015 Altmetrics Workshop
Amsterdam, 9 October 2015

Stefanie Haustein
Rodrigo Costas

Abstract

A considerable number of recent scientific journal articles are disseminated on Twitter (e.g., Haustein, Costas, & Larivière, 2015). Analyzing Twitter user information in order to characterize and model the different types of dissemination and audiences around scientific publications has been identified as a very promising development of altmetric indicators (Haustein, Bowman, & Costas, 2015). However, these developments rely heavily on the availability and reliability of user-centered data. Twitter offers several types of information that help to identify the type of user, such as the account description, number of followers, geographic location and language setting. Altmetric.com has been capturing tweets mentioning scientific papers since June 2011. According to Altmetric.com data from November 2014, its database contained 10,710,037 unique tweets (identified by the Twitter tweet ID) from 1,489,669 unique users (identified by the Twitter handle).

In this paper we focus on the geolocation and number of followers of Twitter users as captured by Altmetric.com, and discuss their reliability and potential use for determining the location of users and the size of the audiences of tweets. We provide basic statistics on the number of followers and the geolocations of Twitter accounts as recorded by Altmetric.com. Given the skewed distribution of tweets to papers per user (59% of all users appear only once in the Altmetric.com database), we also show results for the most active users, that is, those with at least 50 tweets recorded by Altmetric.com (n=27,009; ~2% of all users).
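
The user-level statistics above (unique tweets by tweet ID, the share of users appearing only once, and the 50-tweet threshold for the most active users) can be sketched as follows. This is a minimal illustration, not the procedure used for this paper: the flat list of (tweet_id, handle) pairs and the function names are hypothetical assumptions, and Altmetric.com's actual data format may differ.

```python
from collections import Counter

# Hypothetical input: one (tweet_id, handle) pair per captured tweet record.
# This layout is an illustrative assumption, not Altmetric.com's export format.
records = [
    ("t1", "@alice"), ("t2", "@alice"), ("t3", "@alice"),
    ("t4", "@bob"),
    ("t5", "@carol"), ("t5", "@carol"),  # same tweet ID captured twice
]


def tweets_per_user(records):
    """Count unique tweets (by tweet ID) per user (by Twitter handle)."""
    # Keying on the tweet ID keeps exactly one entry per tweet.
    handle_by_tweet = {tweet_id: handle for tweet_id, handle in records}
    return Counter(handle_by_tweet.values())


def share_of_single_tweet_users(counts):
    """Share of users who appear with exactly one tweet."""
    return sum(1 for n in counts.values() if n == 1) / len(counts)


def most_active_users(counts, min_tweets=50):
    """Users with at least `min_tweets` recorded tweets (50 in the text)."""
    return {handle for handle, n in counts.items() if n >= min_tweets}


counts = tweets_per_user(records)
print(counts)                                   # Counter({'@alice': 3, '@bob': 1, '@carol': 1})
print(share_of_single_tweet_users(counts))      # 0.6666666666666666
print(most_active_users(counts, min_tweets=3))  # {'@alice'}
```

The toy data uses a threshold of 3 so that the filter returns a user; with the default of 50, as in the text, no toy user qualifies.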

Number of followers
The number of followers of a Twitter account mentioning a paper can indicate the level of ‘exposure’, or potential audience size, of that tweet. Altmetric.com captures the number of followers of a user. We had assumed that this value reflects the number of followers at the time the tweet was sent. However, for 88% of all users and 17% of the 27,009 most active accounts, the recorded number of followers does not change (the former share includes the 59% of users who appear only once, for whom no change can be observed). According to personal communication with Euan Adie of Altmetric.com, for data before 2014 the number of followers was determined the first time a user was recorded. Since 2014, the number of followers for a given user has been overwritten each time a new tweet by that user is recorded. Thus the number of followers does not reflect the potential audience at the time of the tweet, but at the time of the last update. This limits the value of the number of followers as an indicator of the potential audience and prevents longitudinal studies based on the number of followers per account. When using the number of followers as a rough proxy of the overall potential audience of a Twitter user, we suggest using the maximum number of followers per account in Altmetric.com. Alternatively, the number of followers can be retrieved from the Twitter API to obtain relatively homogeneous values at the time of analysis.
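
The suggested proxy (the maximum recorded number of followers per account) and the check for unchanging follower counts might look like the sketch below. It assumes, hypothetically, that each captured tweet comes with a (handle, followers) pair; the names and values are made up for illustration. The `min_tweets` parameter is an illustrative addition that excludes single-tweet users, whose follower count cannot change by construction.

```python
from collections import defaultdict

# Hypothetical input: (handle, followers) as stored alongside each captured tweet.
# Values are made up for illustration.
observations = [
    ("@alice", 120), ("@alice", 120), ("@alice", 135),
    ("@bob", 40), ("@bob", 40),
    ("@carol", 800),
]


def followers_by_user(observations):
    """Collect all recorded follower counts per handle."""
    values = defaultdict(list)
    for handle, followers in observations:
        values[handle].append(followers)
    return values


def max_followers(values):
    """Maximum recorded follower count per user, a rough audience proxy."""
    return {handle: max(counts) for handle, counts in values.items()}


def share_unchanged(values, min_tweets=1):
    """Share of users with >= min_tweets records whose follower count never changes.

    Users with a single record count as unchanged by definition; set
    min_tweets=2 to consider only users for whom a change could be observed.
    """
    eligible = [c for c in values.values() if len(c) >= min_tweets]
    unchanged = sum(1 for c in eligible if len(set(c)) == 1)
    return unchanged / len(eligible)


values = followers_by_user(observations)
print(max_followers(values))                  # {'@alice': 135, '@bob': 40, '@carol': 800}
print(share_unchanged(values))                # 0.6666666666666666
print(share_unchanged(values, min_tweets=2))  # 0.5
```

Because Altmetric.com overwrites the stored value, the maximum only reflects the largest value that happened to be recorded, not the audience at the time of any particular tweet.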

Geolocation
Location information on Twitter is available through geotagged tweets, which indicate the location of the user at the time of tweeting. However, geotagging is not a default setting and is thus rarely used. Graham, Hale, and Gaffney (2014) reported that as few as 0.7% of 19.6 million tweets contained geo coordinates. For tweets mentioning specific cities, the share of geotagged tweets was between 2% and 5% (Severo, Giraud, & Pecout, 2015). The location of a Twitter user can also be obtained from a designated location field in the user profile. As these profile locations can be freely entered by users, they are problematic to use without extensive data cleaning. Of user accounts with tweets containing some location information, 7.5% contained latitude and longitude values, 57% a named location, and 20.4% information that helped to identify a country, while 15.1% provided humorous or non-spatial information (Takhteyev, Gruzd, & Wellman, 2012).
Altmetric.com curates profile locations to assign coordinates of cities and countries. Across all tweets captured by Altmetric.com, 65% of tweets had valid (i.e., not ‘null’ or ‘0’) latitude and longitude values, and 60% of unique users had such values in at least one of their tweets. Among the most active users, geolocation was available for 68%. Only 1% of all users and 4% of the most active users had more than one geolocation. This may be partly because users rarely change location, but is most likely caused by Altmetric.com overwriting Twitter profile information, as described above. Longitudinal studies of users' location changes can thus not be conducted based on Altmetric.com location information. Top countries included the US (22%), UK (8%), Canada (3%) and Japan (3%). The analysis of the number of Twitter accounts per geo coordinate revealed large cities such as London, New York City, Washington DC, Toronto and San Francisco among the most frequent locations, but also remote places in the UK and Kansas. These coordinates are assumed to be assigned when users do not specify a city. The implied accuracy of the geolocation should therefore be treated with care: location information provided by Altmetric.com seems more reliable at the country level than at the city level.
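
The geolocation counts above (users with at least one valid coordinate pair, users with more than one distinct geolocation, and the number of accounts per coordinate) can be sketched as below. This is an illustrative sketch under stated assumptions: the (handle, latitude, longitude) rows are hypothetical, and missing values are assumed to appear as None, "null", 0 or "0". Following the text's validity rule literally also discards genuine coordinates that are exactly 0.

```python
from collections import Counter, defaultdict

# Hypothetical input: (handle, latitude, longitude) per captured tweet.
# Missing values are assumed to appear as None, "null", 0 or "0".
rows = [
    ("@alice", 51.5074, -0.1278),
    ("@alice", 51.5074, -0.1278),
    ("@bob", None, None),
    ("@carol", "0", "0"),
    ("@dave", 40.7128, -74.0060),
    ("@dave", 51.5074, -0.1278),
]

MISSING = (None, "null", "0", 0)


def is_valid(value):
    """Apply the text's rule: 'null' or '0' does not count as a coordinate."""
    return value not in MISSING


def distinct_locations(rows):
    """Return all users and a map from user to distinct valid (lat, lon) pairs."""
    users = set()
    locations = defaultdict(set)
    for handle, lat, lon in rows:
        users.add(handle)
        if is_valid(lat) and is_valid(lon):
            locations[handle].add((float(lat), float(lon)))
    return users, locations


users, locations = distinct_locations(rows)
# Share of users with at least one valid coordinate pair.
with_location = len(locations) / len(users)
# Share of users with more than one distinct geolocation.
multiple = sum(1 for s in locations.values() if len(s) > 1) / len(users)
# Number of accounts per coordinate pair; unusually frequent pairs may be defaults.
accounts_per_coordinate = Counter(c for s in locations.values() for c in s)

print(with_location)                           # 0.5
print(multiple)                                # 0.25
print(accounts_per_coordinate.most_common(1))  # [((51.5074, -0.1278), 2)]
```

Ranking coordinates by the number of accounts is one simple way to surface the frequent locations discussed above. A coordinate shared by many accounts may be a large city or a default assigned when no city is given, and telling these apart still requires manual inspection.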
