This is version 0 of an abstract to be presented at altmetrics11.
Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield, UK
In recent decades the focus of public impact of science research has shifted from scientific literacy to public engagement. Social media present an opportunity for analyzing public attitudes towards scientific issues. Recent research has targeted them as a resource for opinion mining and cultural trend spotting, and their potential use as a resource for altmetrics has been proposed. However, it is unclear whether the level of public discussion of science issues on microblogging sites is sufficient to support the assessment of public impact of science for all but a few trends (in Twitter jargon, a trend is a term whose frequency spikes for a period), such as the H1N1 spike detected by Cheong & Lee during a flu pandemic. We argue that doing so requires methods that allow trends in relatively infrequent terms to be spotted. The work presented here takes a first step towards developing such methods by comparing the frequency of terms in tweets against a terminology baseline. We refer to a frequency peak above this baseline as a relative trend.
A small-scale, exploratory experiment was conducted to compare the frequency of a selection of scientific terms drawn from the UNESCO Thesaurus (http://www2.ulcc.ac.uk/unesco/thesaurus.htm) in samples collected using the Twitter API in March 2011 with their frequency in a baseline corpus. The samples were collected with the Twitter4J library v1.6.1 (http://twitter4j.org/en/index.html). The UNESCO Thesaurus was chosen because it is available in English, French and Spanish, giving potential for multilingual follow-on studies. Given the absence of a shared Twitter corpus, we used the Google Books NGrams corpus (http://ngrams.googlelabs.com/datasets) as a baseline sample (this is also available in multiple languages).
Table 1. UNESCO Thesaurus 1Gram Terms
|Topic||Terms|
|Physical Sciences||Ionization, Electromagnetism, Crystallography|
|Chemical Sciences||Phosphorus, Alkalinity, Microchemistry|
|Earth Sciences||Permafrost, Lithosphere, Glaciology|
2.1 Term Selection and Baseline
Sets of three terms each were selected from three UNESCO Thesaurus hierarchies: Physical Sciences, Chemical Sciences and Earth Sciences (see Table 1). Care was taken to select terms that we hoped would minimize noise caused by factors such as polysemy. The factors considered were as follows. 1) For practical reasons, the terms had to be 1Grams (the Google NGrams datasets grow by a factor of ten between the 1Gram set (ten 1GB files) and the 2Gram set (100 1GB files)). 2) Words with stemming issues (e.g. acid/acids/acidic) were excluded to ensure the same number of filters for each topic. 3) Words with obvious common-language usage were avoided, as these might bias harvesting towards non-science-related tweets. 4) Terms that might overlap between sets were avoided, e.g. Geochemistry overlaps Earth Sciences and Chemical Sciences. The Google 1Grams baseline corpus provides a large sample of usage. The total occurrences of all the sampled terms were 116029 for 2006, 126206 for 2007 and 111417 for 2008. The distribution of the terms in the baseline corpus by topic is shown in the three left-hand columns of Figure 1.
It is clear that, although there is some annual variation, Chemical Sciences are the largest group (50-60%) and Earth Sciences the smallest (approx. 10%).
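The per-topic baseline shares described above could be computed along the following lines. This is only a sketch, not the procedure actually used in the study: it assumes tab-separated 1Gram rows whose first three fields are the term, the year and the match count, and it assumes that differently capitalized forms of a term are merged. The function name `topic_shares` and the toy rows are hypothetical and the counts are made up for illustration.

```python
from collections import defaultdict

# Topic sets from Table 1, lower-cased for matching (merging case
# variants is an assumption of this sketch).
TOPICS = {
    "Physical Sciences": {"ionization", "electromagnetism", "crystallography"},
    "Chemical Sciences": {"phosphorus", "alkalinity", "microchemistry"},
    "Earth Sciences": {"permafrost", "lithosphere", "glaciology"},
}
TERM_TO_TOPIC = {term: topic for topic, terms in TOPICS.items() for term in terms}


def topic_shares(rows, years):
    """Return {year: {topic: share}} from tab-separated 1Gram rows.

    Assumed row layout: term, year, match count, then any further
    columns, which are ignored here.
    """
    counts = {year: defaultdict(int) for year in years}
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed rows
        topic = TERM_TO_TOPIC.get(fields[0].lower())
        year = int(fields[1])
        if topic is None or year not in counts:
            continue  # not one of the nine terms, or not a year of interest
        counts[year][topic] += int(fields[2])

    shares = {}
    for year, by_topic in counts.items():
        total = sum(by_topic.values())
        shares[year] = {t: by_topic[t] / total for t in TOPICS} if total else {}
    return shares


# Made-up toy rows for illustration only; these are NOT real corpus counts.
toy_rows = [
    "Phosphorus\t2006\t60\t50\t40",
    "phosphorus\t2006\t20\t18\t15",
    "Ionization\t2006\t10\t9\t8",
    "Permafrost\t2006\t10\t9\t8",
    "the\t2006\t999\t900\t800",
]
shares_2006 = topic_shares(toy_rows, years=[2006])[2006]
```

Streaming the rows one at a time keeps memory use small, which matters for files of the size mentioned above.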
2.2 Twitter Samples
The public Twitter feed was sampled for posts containing any of the nine terms. Each sample contained 300 consecutive posts. Collection periods are summarized in Table 2. Despite the small sample size, each sample took between 2 and 3 days to collect because these terms occur infrequently in tweets (Twitter currently claims 50 million tweets per day, giving an estimate of 0.003% of tweets containing these terms in the collection period). This supports our argument that methods for lower-frequency terms are needed and implies that the collection of more representative samples of scientific tweets would require a considerable investment of time.
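An estimate of this kind can be reproduced as a back-of-the-envelope calculation from the collection periods in Table 2. The sketch below assumes that every matching tweet posted during a period was delivered and that the daily volume was constant at the claimed 50 million; the result is sensitive to both assumptions. The helper names `elapsed_hours` and `matching_share` are hypothetical.

```python
from datetime import datetime

TWEETS_PER_SAMPLE = 300
TWEETS_PER_DAY = 50_000_000  # Twitter's claimed daily volume, as quoted in the text

# Collection periods from Table 2 (GMT).
PERIODS = {
    "T-300-1": (datetime(2011, 3, 1, 20, 56, 43), datetime(2011, 3, 3, 14, 22, 18)),
    "T-300-2": (datetime(2011, 3, 4, 2, 35, 55), datetime(2011, 3, 6, 18, 38, 5)),
    "T-300-3": (datetime(2011, 3, 7, 20, 31, 11), datetime(2011, 3, 9, 16, 21, 36)),
}


def elapsed_hours(start, end):
    """Length of a collection period in hours."""
    return (end - start).total_seconds() / 3600.0


def matching_share(n_matching, hours, tweets_per_day=TWEETS_PER_DAY):
    """Estimated fraction of all tweets in the period that matched a term.

    Assumes a constant daily volume and that no matching tweets were missed.
    """
    return n_matching / (tweets_per_day * hours / 24.0)


for sample_id, (start, end) in PERIODS.items():
    hours = elapsed_hours(start, end)
    share = matching_share(TWEETS_PER_SAMPLE, hours)
    print(f"{sample_id}: {hours:.1f} h elapsed, {share:.5%} of tweets")
```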
The distribution of terms by topic in the Twitter samples is shown in the three right-hand columns of Figure 1. As we would expect for small samples, the distribution is more variable than for the Google NGrams. However, it is broadly in line with the NGrams; in particular, Chemical Sciences is always the largest topic. This suggests that Google NGrams could be a “good enough” baseline despite being sampled from a very different source. For two of the samples (T-300-1, 24%, and T-300-2, 19%) the proportion of terms in the Earth Sciences group is noticeably larger than in the Google NGrams sample. For the third sample (T-300-3) the proportion of terms in the Chemical Sciences group is higher (69%). Could these deviations from the baseline distribution represent relative trends? To examine this in more detail, distributions for individual terms were plotted (see Figure 2; note that for clarity the three Google samples were merged for this plot as G-2006-2008).
Table 2. Twitter Samples
|Sample ID||Collection Period||Elapsed Time h|
|T-300-1||Tue Mar 01 20:56:43 GMT 2011 – Thu Mar 03 14:22:18 GMT 2011|
|T-300-2||Fri Mar 04 02:35:55 GMT 2011 – Sun Mar 06 18:38:05 GMT 2011|
|T-300-3||Mon Mar 07 20:31:11 GMT 2011 – Wed Mar 09 16:21:36 GMT 2011|
The two Twitter samples with higher coverage of Earth Sciences both show a peak for the term Permafrost (17% and 15%, cf. 5% in G-2006-2008). The third sample, T-300-3, shows a peak for the term Phosphorus (57%, cf. 47%), which contributes to the higher proportion of the Chemical Sciences topic in this sample. We also noted an apparent peak for the term Alkalinity in sample T-300-2 (20% vs 6%). The tweets for these three terms were examined.
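The comparison against the baseline can be illustrated with the percentages quoted above. The sketch below is not a method taken from the study: it simply reports, for each term, the ratio of its sample share to its baseline share and the difference in percentage points. The function name `compare_to_baseline` is hypothetical, and the assignment of the 17% and 15% Permafrost figures to T-300-1 and T-300-2 respectively is an assumption.

```python
# Term shares quoted in the text, as fractions of all term occurrences.
BASELINE = {"Permafrost": 0.05, "Phosphorus": 0.47, "Alkalinity": 0.06}  # G-2006-2008

SAMPLES = {
    # Assumption: 17% belongs to T-300-1 and 15% to T-300-2.
    "T-300-1": {"Permafrost": 0.17},
    "T-300-2": {"Permafrost": 0.15, "Alkalinity": 0.20},
    "T-300-3": {"Phosphorus": 0.57},
}


def compare_to_baseline(sample, baseline):
    """Return {term: (ratio, point_difference)} for terms present in both.

    ratio is sample share divided by baseline share; point_difference is
    the sample share minus the baseline share, in percentage points.
    """
    result = {}
    for term, share in sample.items():
        base = baseline.get(term)
        if not base:
            continue  # no baseline share (or zero), so no ratio can be formed
        result[term] = (share / base, (share - base) * 100.0)
    return result


for sample_id, sample in SAMPLES.items():
    for term, (ratio, diff) in compare_to_baseline(sample, BASELINE).items():
        print(f"{sample_id} {term}: {ratio:.2f}x baseline, {diff:+.0f} points")
```

On these figures the two measures rank the peaks differently: Permafrost and Alkalinity are each at least three times their baseline share, whereas Phosphorus has a ratio of only about 1.2 although its absolute difference is of similar size. Any rule for flagging relative trends would need to choose between, or combine, the two.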
Table 3. Number of Phosphorus tweets per category (Total includes uncategorized tweets)
|Sample ID||Total||Legislation||Nutrition||Other Sci.||Industry||White Phos.|
Permafrost: The term Permafrost was used in a variety of ways unrelated to science: only 41 of the total 113 tweets (36%) containing the term were judged to have scientific content. In addition to metaphorical references to cold, it referred to an online game server and a designer case for the iPhone.
Phosphorus: Phosphorus proved more interesting and was classified into five subcategories (see Table 3). Categories were: Legislation (to limit the use of phosphorus in fertilizers and soap), Nutrition (typically the phosphorus content of specific foods), Other Science topics (including peak phosphorus, phosphorus pollution, a new paper on the Redfield ratio in organisms, and the discovery of arsenic replacing phosphorus in a microbe), Industry (mergers and prices of phosphorus-containing goods) and White Phosphorus (concerning its use in Middle East wars). Trending categories in sample T-300-3 are Industry and White Phosphorus. In Industry, the takeover of a Brazilian company by the Indian firm United Phosphorus is prominent, whereas White Phosphorus is boosted by 17 retweets of an emotive message. Interest in the acquisition may be influenced by earlier discussion on peak phosphorus and fertilizer, particularly given that India's food security relies on the agricultural methods of the Green Revolution. We judge that Phosphorus could be a genuine relative trend in this sample, showing the impact of scientific issues on economics.
Alkalinity: None of the tweets we sampled for the term Alkalinity were judged to have scientific content. It occurred mainly in pseudo-scientific health advice. The peak in sample T-300-2 (31 of 60 tweets) was boosted by tweets about measuring pH in swimming pools, fish tanks etc. (T-300-2 was collected over a weekend, when more people are engaged in leisure activities).
Resource constraints imposed a number of restrictions on the study, including the use of 1Grams and small Twitter samples. These mean that the results need to be interpreted cautiously. As the experience with Permafrost and Alkalinity shows, the use of scientific terms on Twitter is often colloquial. However, the Phosphorus case shows that impacts of scientific knowledge on public discussion are sometimes observable. Google NGrams may not be an ideal baseline for Twitter studies, as it is known that social media differ in coverage from edited media. In general, the lack of a common baseline corpus of tweets casts doubt on the reproducibility of Twitter-based studies, which is an important issue for altmetrics. However, the ethical and legal issues associated with providing corpora of user-created content are considerable. Taking the Google NGrams corpus as a model, we wonder whether an NGram baseline of Twitter is feasible.
Thanks to Elizabeth Cano for discussions about the work.
Bauer, M., Allum, N. & Miller, S. 2007. 'What can we learn from 25 years of PUS (Public Understanding of Science) survey research? Liberating and expanding the agenda', Public Understanding of Science, 16: 79-95.
Tumasjan, A. et al. 2010. Predicting elections with Twitter: what 140 characters reveal about political sentiment. In 4th International AAAI Conference on Weblogs and Social Media (ICWSM 2010).
Cheong, M. & Lee, V. 2009. Integrating web-based intelligence retrieval and decision-making from the Twitter trends knowledge base. In SWSM 2009.
Priem, J. & Hemminger, B.M. 2010. Scientometrics 2.0: Toward new metrics of scholarly impact on the social web. First Monday, v15, no.7-5.
Michel, J.B. et al. 2011. Quantitative analysis of culture using millions of digitized books, Science, v.331, n.6014, pp. 176-182.
Halavais, A. & Lackaff, D. 2008. An analysis of topical coverage of Wikipedia, Journal of Computer-Mediated Communication, 13: 429–440.
Lane, J. 2010. Let's make science metrics more scientific, Nature, v.464, pp. 488-489.