Altmetrics for Eurekometrics [v0]

This is version 0 of an abstract to be presented at altmetrics11.

Samuel Arbesman
Department of Health Care Policy, Harvard Medical School, Boston, MA, USA
Institute for Quantitative Social Science, Harvard University, Cambridge, MA, USA

Scientometrics is no longer confined to analysis at the level of the publication. With the advent of vast computational resources and storage capacity available to scientists, as well as automated science [1–4], there is now the potential for a new type of scientific measurement: quantitatively examining scientific discoveries themselves. This study of discoveries, rather than of scientific publications alone, offers the opportunity to understand science at a deeper level. In a previous paper, we explored this discovery-based approach to scientometrics, which we termed eurekometrics [5]. Here I briefly explore eurekometrics and its potential growth, as well as how eurekometrics and discoveries can be integrated into the altmetrics framework.

What exactly does eurekometrics mean? One example of a eurekometric approach is the examination of how discoveries become more difficult over time, using discovery data for minor planets, mammalian species, and chemical elements [6]. Another is the prediction of future discovery in the realm of extrasolar planets, using only the properties of previously discovered planets [7]. And this sort of analysis will only grow, as we are on the verge of a massive wave of new discoveries from a variety of sources.

Due to automated discovery, large-scale data collection, and citizen science projects, we are set to have vast quantities of discovery data available. Examples of such projects come from astronomy [8–10], pharmaceuticals [11], mathematical theorems [12], marine biology [13, 14], protein folding [15], and even ornithology [16]. And the output of these initiatives isn't just publications, but findings.

How can eurekometrics be integrated into the field of alternative metrics? Specifically, when it comes to altmetrics, how can one's discoveries be properly accounted for and valued? Equating publication and discovery is a far-from-perfect solution. There are many examples of multiple discoveries within a single publication (e.g., the recent Kepler results), and many papers without even a single clear, quantifiable discovery (a methods paper, for example). Similarly, many discoveries are never published as an article in a peer-reviewed journal, such as many discoveries of minor planets or computationally generated theorems.

So how do we resolve this and provide a way of measuring discovery? This question is a subproblem of data citation. Recently, there has been a movement to create mechanisms for the citation of datasets using the Digital Object Identifier (DOI) [17]. However, discoveries must additionally be tagged as such, as they are a somewhat distinct type of dataset. Beyond a distinct class of discovery DOI, there must also be a way of identifying all discoveries of a certain type. For example, while all datasets might be citable, how can all species definitions or genomic sequences be easily examined?
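To illustrate what a discovery-tagged citation record might enable, here is a minimal sketch in Python. Everything in it is invented for illustration: the `DiscoveryRecord` structure, its field names, and the DOIs are all hypothetical, since no such discovery registry currently exists.

```python
from dataclasses import dataclass

# Hypothetical record: a DOI-style identifier plus a discovery-class tag,
# so that discoveries can be distinguished from ordinary datasets.
@dataclass
class DiscoveryRecord:
    doi: str              # made-up identifier, e.g. "10.1234/mp.0001"
    discovery_type: str   # made-up class tag, e.g. "minor_planet"
    title: str

# A small invented registry of tagged discoveries.
registry = [
    DiscoveryRecord("10.1234/mp.0001", "minor_planet", "Minor planet 2011 AB1"),
    DiscoveryRecord("10.1234/sp.0001", "species_definition", "A new mammalian species"),
    DiscoveryRecord("10.1234/mp.0002", "minor_planet", "Minor planet 2011 AB2"),
]

def discoveries_of_type(records, discovery_type):
    """Return all records tagged with the given discovery class."""
    return [r for r in records if r.discovery_type == discovery_type]

print([r.doi for r in discoveries_of_type(registry, "minor_planet")])
# → ['10.1234/mp.0001', '10.1234/mp.0002']
```

The point of the sketch is the class tag: once discoveries carry a type alongside their identifier, the question "show me all species definitions" becomes a simple query rather than a literature search.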

Two possible avenues are available for detailed classification. The first is a formal markup methodology for discoveries, such as a Discovery Markup Language (which does not yet exist). While this would have the advantage of providing a uniform method of analysis, such a formal language would be unlikely to handle the fast-changing nature of scientific discovery. A second path would be a mechanism for folksonomy-style classification [18], as long as the distinctive nature of a discovery is highlighted in the DOI citation.
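To make the folksonomy path concrete, here is a minimal sketch, again in Python, of how free-form tags applied to discovery records could yield an emergent classification. The identifiers and tags are all invented for illustration.

```python
from collections import Counter

# Hypothetical folksonomy: each discovery identifier maps to the free-form
# tags that various users have applied to it (all values are invented).
tags_by_discovery = {
    "10.1234/mp.0001": ["minor-planet", "astronomy", "survey"],
    "10.1234/sp.0001": ["species", "mammal", "taxonomy"],
    "10.1234/mp.0002": ["minor-planet", "astronomy"],
}

def tag_frequencies(tagged):
    """Aggregate tag counts across all discoveries; the resulting
    frequency distribution is the emergent, bottom-up classification
    that a folksonomy provides."""
    counts = Counter()
    for tags in tagged.values():
        counts.update(tags)
    return counts

freqs = tag_frequencies(tags_by_discovery)
print(freqs["minor-planet"])  # → 2
```

Unlike a formal markup language, nothing here has to be specified in advance: new kinds of discovery simply acquire new tags, which is what makes this path more resilient to the fast-changing nature of discovery.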

These mechanisms, in connection with data citation, will allow eurekometrics to blossom. Furthermore, they will give discoverers a more direct means of citing and cataloging their contributions to the scientific process.

References

1. Nature (2008) Community cleverness required. Nature 455:1.

2. Lazer D, Pentland A, Adamic L, Aral S, Barabasi AL, et al. (2009) Computational Social Science. Science 323: 721–723.

3. Evans J, Rzhetsky A (2010) Machine Science. Science 329: 399–400.

4. Waltz D, Buchanan BG (2009) Automating Science. Science 324: 43–44.

5. Arbesman S, Christakis NA. Eurekometrics: Analyzing the Nature of Discovery. PLoS Computational Biology.

6. Arbesman S (2011) Quantifying the ease of scientific discovery. Scientometrics 86: 245–250.

7. Arbesman S, Laughlin G (2010) A Scientometric Prediction of the Discovery of the First Potentially Habitable Planet with a Mass Similar to Earth. PLoS ONE.

8. Abazajian K, et al. (2003) The First Data Release of the Sloan Digital Sky Survey. The Astronomical Journal 126: 2081.

9. Stokes GH, Evans JB, Viggh HEM, Shelly FC, Pearce EC (2000) Lincoln Near-Earth Asteroid Program (LINEAR). Icarus 148: 21–28.

10. Land K, Slosar A, Lintott C, Andreescu D, Bamford S, et al. (2008) Galaxy Zoo: the large-scale spin statistics of spiral galaxies in the Sloan Digital Sky Survey. Monthly Notices of the Royal Astronomical Society 388: 1686–1692.

11. Caschera F, Gazzola G, Bedau MA, Bosch Moreno C, Buchanan A, et al. (2010) Automated Discovery of Novel Drug Formulations Using Predictive Iterated High Throughput Experimentation. PLoS ONE 5: e8546.

12. MacKenzie D (2004) Mechanizing Proof: Computing, Risk, and Trust (Inside Technology). The MIT Press, 439 pp. URL http://www.amazon.com/Mechanizing-Proof-Computing-Inside-Technology/dp/0262632950.

13. Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, et al. (2004) Environmental Genome Shotgun Sequencing of the Sargasso Sea. Science 304: 66–74.

14. Ausubel JH, Crist DT, Waggoner PE (2010) First Census of Marine Life 2010: Highlights of a Decade of Discovery.

15. Cooper S, Khatib F, Treuille A, Barbero J, Lee J, et al. (2010) Predicting protein structures with a multiplayer online game. Nature 466: 756–760.

16. Dunn EH, Francis CM, Blancher PJ, Drennan SR, Howe MA, et al. (2009) Enhancing the Scientific Value of the Christmas Bird Count. The Auk 122: 338–346.

17. Brase J (2010) DataCite – A Global Registration Agency for Research Data.

18. Trant J (2009) Studying Social Tagging and Folksonomy: A Review and Framework. Journal of Digital Information 10(1).

3 Comments

  1. Posted June 10, 2011 at 3:08 pm | Permalink

    Marking up discoveries is a fascinating problem, especially in the case of automated discovery. How are discoveries named? Are we doomed to a future of teaching kids that things work because of Theorem 27b/6, or will discoveries get human-readable names once deemed important enough?

    It wasn’t quite clear from my first read that “Discovery Markup Language” does not exist. Maybe you could reword that a little to clear it up? Nevertheless, what do you envisage would be in the DML? What are the pertinent pieces of information a DML would need to record?

  2. parra
    Posted June 14, 2011 at 3:20 pm | Permalink

    A fundamental question asked by one of the attendees at the altmetrics workshop is "What is a discovery?". I will rephrase it as "Up to what level of granularity do you consider a discovery to be such?". Is the finding of a correlation between being on LinkedIn and having a high h-index a discovery?

  3. Posted June 15, 2011 at 7:31 am | Permalink

    Both of these points are good. “Discovery” is a vague term and its meaning needs to be ironed out. I’m willing to err on the generous side and say that anything we can describe counts, but then we need a way to mark it up so it can be readable, which just pushes the problem along.
