Department of Health Care Policy, Harvard Medical School, Boston, MA, USA
Institute for Quantitative Social Science, Harvard University, Cambridge, MA, USA
What exactly does eurekometrics mean? One example of a eurekometric approach is the examination of how discoveries become more difficult over time, using discovery data for minor planets, mammalian species, and chemical elements [6]. Another is the prediction of future discoveries of extrasolar planets, using only the properties of previous discoveries [7]. And this sort of analysis will only grow, as we are on the verge of a massive wave of new discoveries from a variety of sources.
Due to automated discovery, large-scale data collection, and citizen science projects, we are set to have vast quantities of discovery data available. Examples of such projects come from the fields of astronomy [8–10], pharmaceuticals [11], mathematical theorems [12], marine biology [13, 14], protein folding [15], and even ornithology [16]. And the output of these initiatives is not just publications, but findings.
How can eurekometrics be integrated into the field of alternative metrics? Specifically, when it comes to altmetrics, how can one's discoveries be properly accounted for and valued? Equating publication with discovery is a far-from-perfect solution. There are many cases of multiple discoveries within a single publication (e.g., the recent Kepler results), and many papers without even a single clear quantifiable discovery (a methods paper, for example). Similarly, many discoveries are never published as an article in a peer-reviewed journal, such as many discoveries of minor planets or computationally generated theorems.
So how do we resolve this and provide a way of measuring discovery? This question is a subproblem of data citation. Recently, there has been a movement to create mechanisms for the citation of datasets using the Digital Object Identifier (DOI) [17]. However, discoveries must additionally be tagged as such, since they are a somewhat distinct type of dataset. And beyond a distinct class of discovery DOI, there must also be a way of identifying all discoveries of a given type. For example, while all datasets might be citable, how can all species definitions or genomic sequences be easily examined?
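To make this concrete, consider a minimal sketch of what discovery-level DOI metadata might look like, and how one would then retrieve all discoveries of a given type. The schema here, in particular the "discovery_type" field, is entirely hypothetical: existing DOI registries do not yet distinguish discoveries from ordinary datasets, which is precisely the gap described above.

```python
# Hypothetical discovery-level DOI records. The "discovery_type" field is
# an invented extension; current DOI metadata has no such distinction.
records = [
    {"doi": "10.1234/exo.0001", "discovery_type": "exoplanet",
     "title": "Detection of a super-Earth", "year": 2010},
    {"doi": "10.1234/sp.0042", "discovery_type": "species",
     "title": "New deep-sea polychaete", "year": 2009},
    {"doi": "10.1234/exo.0002", "discovery_type": "exoplanet",
     "title": "Hot Jupiter in a binary system", "year": 2011},
]

def discoveries_of_type(records, discovery_type):
    """Return every discovery record of a given type, ordered by year."""
    matches = [r for r in records if r["discovery_type"] == discovery_type]
    return sorted(matches, key=lambda r: r["year"])

for r in discoveries_of_type(records, "exoplanet"):
    print(r["doi"], r["year"])
```

With such a typed index in place, the eurekometric analyses mentioned earlier, such as discovery curves over time, reduce to simple queries over the registry.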
Two possible avenues are available for detailed classification. The first is a formal markup methodology for discoveries, such as a Discovery Markup Language. While this would have the advantage of providing a uniform method of analysis, such a formal language would be unlikely to keep pace with the fast-changing nature of scientific discovery. The second is a mechanism for folksonomy-style classification [18], as long as the distinctive nature of a discovery is highlighted in the DOI citation.
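The contrast between the two avenues can be sketched briefly. Below, a hypothetical Discovery Markup Language record fixes its fields in advance via a schema, while the folksonomy alternative simply attaches uncontrolled tags to the same DOI; every element and tag name here is invented for illustration, not drawn from any existing standard.

```python
import xml.etree.ElementTree as ET

# A hypothetical Discovery Markup Language record: fields are fixed by a
# formal schema, enabling uniform analysis across discoveries.
dml = """<discovery doi="10.1234/exo.0001">
  <type>exoplanet</type>
  <date>2010-09-29</date>
  <method>radial velocity</method>
</discovery>"""

root = ET.fromstring(dml)
print(root.attrib["doi"], root.findtext("type"))

# Folksonomy-style alternative: free-form, community-chosen tags attached
# to the same identifier, with no schema to fall out of date.
tags = {"10.1234/exo.0001": {"exoplanet", "habitable-zone", "radial-velocity"}}
print("exoplanet" in tags["10.1234/exo.0001"])
```

The trade-off is visible even at this scale: the markup record can be validated and compared field by field, while the tag set can absorb a new discovery type the moment someone coins a tag for it.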
These mechanisms, in connection with data citation, will allow eurekometrics to blossom. Furthermore, they will give discoverers a more direct means of citing and cataloging their contributions to the scientific process.
1. Nature (2008) Community cleverness required. Nature 455:1.
2. Lazer D, Pentland A, Adamic L, Aral S, Barabasi AL, et al. (2009) Computational Social Science. Science 323: 721–723.
3. Evans J, Rzhetsky A (2010) Machine Science. Science 329: 399–400.
4. Waltz D, Buchanan BG (2009) Automating Science. Science 324: 43–44.
5. Arbesman S, Christakis NA. Eurekometrics: Analyzing the Nature of Discovery. PLoS Computational Biology.
6. Arbesman S (2011) Quantifying the ease of scientific discovery. Scientometrics 86: 245–250.
7. Arbesman S, Laughlin G (2010) A Scientometric Prediction of the Discovery of the First Potentially Habitable Planet with a Mass Similar to Earth. PLoS ONE.
8. Abazajian K, et al. (2003) The First Data Release of the Sloan Digital Sky Survey. The Astronomical Journal 126: 2081.
9. Stokes GH, Evans JB, Viggh HEM, Shelly FC, Pearce EC (2000) Lincoln Near-Earth Asteroid Program (LINEAR). Icarus 148: 21–28.
10. Land K, Slosar A, Lintott C, Andreescu D, Bamford S, et al. (2008) Galaxy Zoo: the large-scale spin statistics of spiral galaxies in the Sloan Digital Sky Survey. Monthly Notices of the Royal Astronomical Society 388: 1686–1692.
11. Caschera F, Gazzola G, Bedau MA, Bosch Moreno C, Buchanan A, et al. (2010) Automated Discovery of Novel Drug Formulations Using Predictive Iterated High Throughput Experimentation. PLoS ONE 5: e8546.
12. MacKenzie D (2004) Mechanizing Proof: Computing, Risk, and Trust. The MIT Press, 439 pp.
13. Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, et al. (2004) Environmental Genome Shotgun Sequencing of the Sargasso Sea. Science 304: 66–74.
14. Ausubel JH, Crist DT, Waggoner PE (2010) First Census of Marine Life 2010: Highlights of a Decade of Discovery.
15. Cooper S, Khatib F, Treuille A, Barbero J, Lee J, et al. (2010) Predicting protein structures with a multiplayer online game. Nature 466: 756–760.
16. Dunn EH, Francis CM, Blancher PJ, Drennan SR, Howe MA, et al. (2009) Enhancing the Scientific Value of the Christmas Bird Count. The Auk 122: 338–346.
17. Brase J (2010) DataCite – A Global Registration Agency for Research Data.
18. Trant J (2009) Studying Social Tagging and Folksonomy: A Review and Framework. Journal of Digital Information 10(1).