The search for alternative metrics for taxonomy [v0]

This is version 0 of an abstract to be presented at altmetrics11.

Daphne Duin MSc
Prof. dr. Peter van den Besselaar
VU‐University Amsterdam, Organization Science & Network Institute

Introduction

Some scholars have concerns about the decline of traditional peer review mechanisms and impact measures triggered by the digitization of science. Others, as we argue here, are excited about the possibilities the Web offers for developing alternative metrics to measure scholarly impact and quality. These supporters of change criticize the slowness of traditional peer review, the incompleteness of the ISI Web of Knowledge (hereafter WoS) databases, and the poor fit of WoS measurements to the needs of certain disciplines. At the same time, peer review and WoS‐based indicators miss large parts of research impact, especially societal impact (Maassen van der Brink et al., 2010; De Jong et al., 2011). In this paper we focus on an alternative for post‐publication impact filtering in one particular field of science.

Our research is motivated by 1) a strong demand in the taxonomic community (a discipline in biology studying living and extinct organisms) to use alternative methods for impact filtering and 2) a general trend in research policy to couple funding to the socio‐economic impact of science (European Commission, 2009).

More generally, research and its results can have various (scientific and societal) audiences, and appropriate impact assessment methods should reflect this. Starting from the position that the socio‐economic impact of science is one of the relevant indicators of “good science”, we investigate in this paper if and how we can (i) identify the relevant audiences of research and (ii) measure the impact the research has on them in terms of use frequency. The hypothesis is that this can be studied from visits to websites that present the research results to the intended (and unintended) audiences. We study this first for a single research group in computer science as a preliminary test, and then apply the approach to the field of taxonomy using the web server data from Scratchpads. Scratchpads are an online platform for collaborative work, data sharing and analysis (a research infrastructure) with a global user community. We analyse the web server logs of the platform to explore whether they hold information that identifies user audiences and their affiliations. By comparing the shares the various audiences have in the web visits, we learn about the scientific and societal domains in which the infrastructure has impact. By comparing changes over time, we may (in the future) map the increase or decrease of impact in those domains.
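The comparison of audience shares, and of their change over time, can be illustrated with a minimal sketch. It assumes that each visit has already been assigned an audience label; the function names and the labels used in the usage note are hypothetical and are not part of our analysis pipeline.

```python
from collections import Counter


def audience_shares(visits):
    """Return each audience's fraction of the given visits.

    `visits` is an iterable of audience labels, one label per visit.
    """
    counts = Counter(visits)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {audience: n / total for audience, n in counts.items()}


def share_change(earlier, later):
    """Return the change in share per audience between two periods.

    A positive value means the audience's share of visits grew.
    """
    before = audience_shares(earlier)
    after = audience_shares(later)
    audiences = set(before) | set(after)
    return {a: after.get(a, 0.0) - before.get(a, 0.0) for a in audiences}
```

For example, calling `share_change(visits_2010, visits_2011)` on two hypothetical lists of labels would show which audiences gained or lost share between the two periods. Shares rather than absolute counts are compared because the interest here is in the spread of the audiences.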

Frame of reference

The scientific field of taxonomy is our frame of reference. Taxonomy is rooted in natural history museums and herbaria around the world and goes back to Linnaeus's time. The scientific output of the community is estimated at 5.4 million volumes on biodiversity, dating back to 1469, including 800,000 monographs and 40,000 journal titles (Gwinn & Rinaldo, 2009).

Over the last ten years or so, taxonomists and sympathizers (Agnarsson & Kuntner, 2007; Godfray, 2002; Krell, 2000, 2002; Valdecasas et al., 2000; Wheeler & Valdecasas, 2005) have waved a red flag to policy makers and funding bodies about a too rigid use of the WoS impact measures which, according to these scholars, are not suitable for measuring the quality of their work. One of the arguments against the use of the WoS Impact Factor is the long shelf life of taxonomic publications. The relevance of taxonomic publications does not decline over time: original descriptions have to be referred to forever, independent of the paper’s quality. Krell (2002) estimated that the average age at which a paper gets cited is 36 years. Another argument is that the community is characterized by a high level of specialization around specific organisms. “Therefore the chance to become cited by colleagues is relatively rare compared with other fields” (Krell, 2002, p. 957). At the same time, taxonomic publications seem to fit the environment of the Web very well. For instance, publications make use of dense and highly structured data sets (databases) that can be turned into machine‐readable pieces of information. Society at large has a great interest in linking the data of the taxonomic communities in order to build the larger picture of knowledge on global biodiversity. This picture is important for scientific and societal fields such as ecology, environmental studies, public health, biodiversity conservation and urban planning. Moving publications to the Web is seen by most critics of the current system as the way to arrive at a “fair citation count” (Godfray, 2002, p. 19), and at the same time it will serve the interests of other sciences and society at large.

Taxonomists were slow to embrace the Web and the use of digital data, but this is history (Penev et al., 2010). Today several major initiatives are running to digitize biological collections and library stocks and to link collection databases on a global scale (some digitization and Web data initiatives are: Biodiversity Heritage Library [BHL]; Global Biodiversity Information Facility [GBIF]; Encyclopedia of Life [EOL]). The Web is not only adopted at the institutional level: individual researchers have also started to collaborate online in so‐called “virtual research communities”, using platforms such as Scratchpads. This service allows geographically scattered specialists to collaborate and to share and analyze data online. The use of Scratchpads ranges from blog‐type discussions to analyzing data sets and collaboratively writing scientific papers (Smith et al., 2010). In April 2011 there were 200 community sites online with more than 2500 registered users.

Data and results

1. The computer science case. We analyzed the web server log file for a period of about a week. This showed us the distribution of users over types of audience. Interestingly, as table 1 shows, the research group under study has a very mixed audience, including universities and public research institutes (44%), non‐governmental organizations (5%) and companies (28%). We also found which of the group's activities have impact: clearly the software tools it developed, which account for more than 90% of the visits (Van den Besselaar & Heimeriks, 2011).

2. The Scratchpad case. Scratchpads are hosted on the Web server of the Natural History Museum in London. We used the Scratchpads web server log files to analyze the origin of incoming web traffic. We wanted to know whether an analysis of the IP addresses lets us retrieve categories of “users”, and we were particularly keen to see whether we can identify users from academia and government. We looked at 1) the combined lists of incoming traffic (IP addresses) to all Scratchpads together and 2) the traffic to one specific Scratchpad, MicroOrg (fictional name), whose maintaining researcher told us he had noticed a significant interest in the content of his site from outside academia (public services). Covering the period from July 2010 until April 2011, our data set contains 2245 unique IP addresses that reached the Scratchpads domain and 561 that visited the MicroOrg website. These data will be analyzed for the final paper.
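A minimal sketch of how unique IP addresses might be extracted from a log and assigned to audience categories follows. It is an illustration under stated assumptions, not the procedure used in this study: the logs are assumed to be in Common Log Format, categories are guessed from the reverse‐DNS host name, and the suffix rules and all function names are hypothetical. A real classification would need a curated list of institutions.

```python
import re
import socket
from collections import Counter

# Assumption: Common Log Format, with the client address as the first field.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+')

# Hypothetical, deliberately crude suffix rules (checked in order).
AUDIENCE_SUFFIXES = [
    (".edu", "academia"),
    (".ac.uk", "academia"),
    (".gov", "government"),
    (".gov.uk", "government"),
    (".org", "non-governmental"),
    (".com", "company"),
]


def unique_ips(lines):
    """Collect the distinct client addresses from well-formed log lines."""
    ips = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            ips.add(match.group(1))
    return ips


def reverse_dns(ip):
    """Look up the host name for an IP address; None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def classify(hostname):
    """Map a host name to an audience category using the suffix rules."""
    if not hostname:
        return "unknown"
    host = hostname.lower()
    for suffix, audience in AUDIENCE_SUFFIXES:
        if host.endswith(suffix):
            return audience
    return "other"


def audience_counts(lines, resolver=reverse_dns):
    """Count unique IP addresses per audience category."""
    return Counter(classify(resolver(ip)) for ip in unique_ips(lines))
```

The resolver is passed in as a parameter so that the classification can be tried on a fixed mapping of addresses to host names without network access. Many addresses will not resolve, or will resolve to an internet service provider, so the "unknown" and "other" categories are likely to be large in practice.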

Discussion

In many fields, taxonomy among them, there is a clear need to create alternative metrics to evaluate the scholarly and societal impact of scientific output. With the data and the community moving to the Web, a web‐based metric seems the most sensible way forward. In this paper we explored whether IP addresses retrieved from web server logs can be used to identify users of scientific information. This seems to work in the first case, and more cases will be analyzed. At this moment we are interested in the range of audiences rather than in absolute figures; the latter will be addressed at a later stage of our research. We will also discuss strengths and weaknesses of the approach and its relation to more traditional impact measures.

References

Agnarsson, I., & Kuntner, M. (2007). Taxonomy in a changing world. Seeking solutions for a science in crisis. Systematic Biology, 531‐539. doi: 10.1080/10635150701424546.

De Jong, S. P., Van Arensbergen, Daemen, F., Van der Meulen, B., & Van den Besselaar, P. (2011). Evaluation of research in context ‐ an approach and two cases. Research Evaluation, 20.

ETAN Expert Working Group (1999). Options and limits for assessing the socio‐economic impact of European RTD programmes. ftp://ftp.cordis.europa.eu/pub/etan/docs/master‐impact.pdf

European Commission (2009). Impact assessment guidelines. http://ec.europa.eu/governance/impact/commission_guidelines/docs/iag_2009_en.pdf

Ferrini, A., & Mohr, J. (2009). Uses, limitations, and trends in Web analytics. In B. J. Jansen, A. Spink & I. Taksa (Eds.), Handbook of research on Web log analysis (pp. 124‐142). Hershey, PA: IGI.

Godfray, H. C. J. (2002). Challenges for taxonomy. Nature, 417(6884), 17‐9. doi: 10.1038/417017a.

Gwinn, N. E., & Rinaldo, C. (2009). The Biodiversity Heritage Library: sharing biodiversity literature with the world. IFLA Journal. doi: 10.1177/0340035208102032.

Krell, F.T. (2000). Impact factors aren’t relevant to taxonomy. Nature, 405, 507‐508.

Krell, F.T. (2002). Why impact factors don’t work for taxonomy. Nature, 415(6875), 957. doi: 10.1038/415957a.

Maassen van der Brink, H., De Haas, M., Van den Heuvel, J., Spaapen, J., Elsen, M., Westenbrink, R., Van den Besselaar, P., Van der Meulen, B., & Van Drooge, L. (2010). Evaluating the societal benefits of academic research, a guide. Rathenau Instituut.

Penev, L., Roberts, D., Smith, V.S., Agosti, D., & Erwin, T. (2010). Taxonomy shifts up a gear: New publishing tools to accelerate biodiversity research. ZooKeys, 50, i‐iv. doi: 10.3897/zookeys.50.543.

Smith, V.S., Duin, D., Self, D., Brake, I., & Roberts, D. (2010). Motivating online publication of scholarly research through social networking tools. Conference proceedings paper delivered at COOP2010, the 9th International Conference on the Design of Cooperative Systems, on 18 May 2010 as part of a workshop titled Incentives and Motivation for Web‐Based Collaboration, pp. 329‐340.

Valdecasas, A. G., Castroviejo, S., & Marcus, L. F. (2000). Reliance on the citation index undermines the study of biodiversity. Nature, 403, 698.

Van den Besselaar, P., & Heimeriks, G. (2011). New media and communication networks in knowledge production. Forthcoming in Cybermetrics, 15.

Wheeler, Q. D., & Valdecasas, A. G. (2005). Ten challenges to transform taxonomy. Graellsia, 61(2), 151‐160.

 

One Comment

  1. Victoria Uren
    Posted May 19, 2011 at 4:45 pm

    Is the work on Research Objects (next paper in the workshop) relevant to you? There seem to me to be some broad similarities between taxonomy and software as kinds of research output, in that both are human generated artifacts (making them different from “raw” data sets as might be shared in eScience).

    regards Victoria
