ACM Web Science Conference 2012 Workshop
Evanston, IL, 21 June 2012
Center for Complex Networks and Systems Research, School of Informatics & Computing, Indiana University, Bloomington, USA
Maps of science have been created from citation data to visualize the structure of scientific activity. However, most scientific publications are now accessed online. Scholarly web portals provide access to publications in the natural sciences, social sciences and humanities, and they record detailed log data at a scale that exceeds the number of all existing citations combined. Such log data is recorded immediately upon publication and keeps track of the sequences of user requests (clickstreams) issued by a variety of users across many different domains. Given these advantages of log datasets over citation data, we investigate whether they can produce reliable, more current maps of science similar to the ones generated by Bollen et al. (2009). We validate the previous work by generating new maps of science from clickstream data provided by the California Digital Library (CDL) over a period of approximately three years, from April 2008 to March 2011.
Log datasets have attractive characteristics when compared to citation datasets: they can be aggregated to cover all scholarly disciplines, and they reflect the activities of a broader scholarly community. Most importantly, the immediacy of log datasets makes it possible to study the dynamics of scholarship in real time, not with a multi-year delay, as is currently the case with citation data. This opens the way to a wide variety of analyses of the structure and dynamics of scholarship, such as trend analysis and prediction.
Maps constructed from clickstream data can serve numerous functions. Like citation maps, they provide a means to visually assess the relationships between various domains and journals. Unlike citation maps, however, clickstream maps of science offer an immediate perspective on what is taking place in science, and can thus aid the detection of emerging trends, inform funding agencies, and help researchers explore the interdisciplinary relationships between scientific disciplines. Clickstream maps can furthermore serve as the basis for exploration and recommendation services that rank journals according to various parameters of network topology, so that researchers can identify influential journals in any particular domain of interest.
We collected nearly 31 million user interactions recorded by the California Digital Library, which supports the 10 campuses of the University of California. For each user interaction, the resulting dataset contained the article identifier, the date-time to the second, and the session identifier. We used the session identifier and date-time to reconstruct temporal sequences of interactions by the same user. These sequences were mapped to article clickstreams, each of which records the navigation of a user from one article to another. Since each article is published in a journal, these article clickstreams were translated to journal clickstreams. The resulting dataset is a collection of journal clickstreams that reflects the navigation of users from one journal to another when interacting with scholarly web portals. The clickstream model was validated by mapping the journals to the Essential Science Indicators from Thomson Reuters. To visualize the clickstream map of science, which has approximately 16,000 nodes and 1.8 million edges, we used only journal relationships for which we had a minimum number of observations to support the particular connection. We selected the 18,000 journal pairs with the highest values, and only the top six neighbors of each journal were considered in the weighted directed graph. The graph was laid out with the Fruchterman-Reingold algorithm, which can converge on different visualizations of the same network data. The resulting journal network, which outlines the relationships between various scientific domains, is shown in Figure 1.
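The processing steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the pipeline actually used: the row layout `(session_id, timestamp, article_id)`, the function names `journal_transitions` and `top_neighbors`, the `min_count` threshold parameter, and the choice to ignore navigation within the same journal are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import groupby
from operator import itemgetter


def journal_transitions(log_rows, article_to_journal):
    """Count journal-to-journal transitions.

    log_rows: iterable of (session_id, timestamp, article_id) tuples (assumed layout).
    article_to_journal: dict mapping article_id -> journal identifier.
    """
    counts = Counter()
    # Sort by session, then by time, so each session is in temporal order.
    ordered = sorted(log_rows, key=itemgetter(0, 1))
    for _, rows in groupby(ordered, key=itemgetter(0)):
        # Translate the article clickstream of one session into a journal clickstream.
        # Assumption: requests for articles with no known journal are dropped.
        journals = [article_to_journal[a] for _, _, a in rows if a in article_to_journal]
        for src, dst in zip(journals, journals[1:]):
            if src != dst:  # assumption: within-journal navigation is ignored
                counts[(src, dst)] += 1
    return counts


def top_neighbors(counts, k=6, min_count=1):
    """Keep, for each journal, its k most frequent successors with at least min_count observations."""
    by_source = defaultdict(list)
    for (src, dst), n in counts.items():
        if n >= min_count:
            by_source[src].append((dst, n))
    # Sort by descending count, breaking ties by journal identifier.
    return {
        src: sorted(nbrs, key=lambda pair: (-pair[1], pair[0]))[:k]
        for src, nbrs in by_source.items()
    }
```

The result of `top_neighbors` is a weighted directed edge list that could then be handed to any graph library offering a force-directed layout such as Fruchterman-Reingold.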
The visualization in Figure 1 shows clusters of related domains, distinguished by color. The colors are kept consistent with the maps of science constructed by Bollen et al. (2009). Immunology, pharmacology & toxicology, and clinical medicine form the red cluster. This cluster is connected to the yellow cluster, consisting of social sciences and psychiatry & psychology, through neurosciences & behavior (mustard). Microbiology and molecular biology & genetics form a green cluster, and chemistry (dusty blue) is located next to materials science (blue). Geosciences (green) and agricultural sciences (brown) are also connected. The visualization shows meaningful clusters and relations between various domains.
We were able to create meaningful maps of science using data from only one provider. This suggests that data from a single provider is meaningful and can be trusted for further analysis of clickstream data. Various network properties can be computed for the maps of science. Properties such as betweenness centrality and PageRank can help identify interdisciplinary journals or journals that are important within a specific domain. Because the data span a period of three years, betweenness and PageRank can also be computed over time. This can help in analyzing how a journal's importance changes and how journals are added to or removed from the map over time.
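As an illustration of one such network property, the following is a minimal sketch of weighted PageRank computed by power iteration over a directed journal graph. The function name `pagerank`, the edge representation as a dict of `(source, target) -> weight`, and the damping factor of 0.85 are assumptions made for this example and are not taken from the study.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration.

    edges: dict mapping (src, dst) -> positive weight (assumed representation).
    Returns a dict mapping each node to its rank; ranks sum to 1.
    """
    nodes = sorted({node for pair in edges for node in pair})
    n = len(nodes)
    # Total outgoing weight per node, used to normalize transition probabilities.
    out_weight = {node: 0.0 for node in nodes}
    for (src, _), w in edges.items():
        out_weight[src] += w
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for (src, dst), w in edges.items():
            if out_weight[src] > 0:
                new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank
```

To study change over time, the same function could be applied separately to the edge weights of each time slice (for example, one graph per year) and the resulting ranks compared per journal.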
 Bollen J, Van de Sompel H, Hagberg A, Bettencourt L, Chute R, et al. (2009) Clickstream data yields high-resolution maps of science. PLoS One 4: e4803.
 Bettencourt L, Kaiser D, Kaur J, Castillo-Chavez C, Wojick D (2008) Population modeling of the emergence and development of scientific fields. Scientometrics 75: 495-518.
 Kleinberg J (2006) Temporal dynamics of on-line information streams. Data Stream Management: Processing High-Speed Data Streams.
 Science Watch: Journal List. URL http://sciencewatch.com/about/met/journallist/. Accessed 2012.
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.