Copyright © 2009 Antony Williams
There are a lot of conversations going on in the community about Open Data, particularly on the Open Knowledge Foundation email list. A recent blog post announces a working group on Open Data in Science, and I’ve sent an email offering to provide input. Hosting ChemSpider has certainly allowed me to get engaged in frontline conversations about people’s willingness to share their data and about the perceived differences between Open and Free. I don’t have answers to all the questions, but this is a growing area of interest and concern for scientists and will likely remain in the spotlight for the foreseeable future. My judgment is that the majority of scientists do not care whether data in general are Free or Open, despite the consequences this distinction has for reuse. Scientists do, however, care whether their own data are Free or Open as soon as I discuss the differences with them (based on my own understanding of the differences!). See my previous post…
In this regard let’s chat about the Spectral Game for a moment. The Spectral Game is, even now, a resounding success and, in many ways, is surpassing our early expectations in terms of capability and usage. As of last week spectra had been viewed 20,261 times by 1,305 unique visitors from 47 countries. That’s quite amazing for an online game for chemists that is spreading by word of mouth (blogs, emails, RSS feeds) only. The Spectral Game is fed by Open Data and now has over 1000 spectra feeding into it. These have been supplied by scientists willing to make their data Open and by myself, sourcing and processing data during long evenings in front of a good movie. Until now, Open Data has been the criterion we have used for what goes into the Spectral Game.
Recently Jean-Claude Bradley and I were talking about expanding the dataset behind the Spectral Game to include more Mass Spectral, Infrared and UV-Vis data. The NIST Webbook is a rich source of such information and the data CAN be downloaded as JCAMP spectra for local processing. The people at NIST were gracious: our request to download and use some of their data in the Spectral Game was greeted with full support, so we have permission and have already started the process. An example set of spectra can be found for Cholesterol (here), where there is now HNMR, CNMR, EI-MS, UV and IR data. The data were downloaded via this page: http://webbook.nist.gov/cgi/cbook.cgi?Name=cholesterol&Units=SI . The data are NOT Open Data, however. If you visit the spectral pages you will see the ownership declared specifically. For the MS page it says:
Owner NIST Mass Spectrometry Data Center
Collection (C) 2007 copyright by the U.S. Secretary of Commerce on behalf of the United States of America. All rights reserved.
Origin T.IIDA NIHON UNIVERSITY, KORIYAMA, FUKUSHIMA-KEN, JAPAN
NIST MS number 67286
and for the IR spectra there are multiple sources:
Data compiled by: Coblentz Society, Inc.
* SOLID (KBr DISC) VS KBr
$$SEE 5095 FOR SOLUTION; PERKIN-ELMER 21 (GRATING); DIGITIZED BY COBLENTZ SOCIETY (BATCH I) FROM HARD COPY; 2 cm-1 resolution
* SOLID (MINERAL OIL MULL); Not specified, most likely a prism, grating, or hybrid spectrometer.; DIGITIZED BY NIST FROM HARD COPY; 4 cm-1 resolution
* SOLUTION 1% (CS2 FOR 2-15 microns, AND C2Cl4 FOR 5.5-7.2 microns)
$$SEE 5106 FOR KBr DISC; PERKIN-ELMER 21 (GRATING); DIGITIZED BY COBLENTZ SOCIETY (BATCH I) FROM HARD COPY; 2 cm-1 resolution
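JCAMP-DX files carry this kind of ownership information in `##LABEL=value` header records, with `$$` marking comments. As a rough sketch of how the owner and origin could be checked before a spectrum is loaded anywhere, the Python below reads those records into a dictionary. The function name `parse_jcamp_labels` and the `SAMPLE_HEADER` text are hypothetical: the sample is a hand-made fragment built from the fields quoted above, not a real Webbook download, and the parser skips multi-line values and the spectral data itself.

```python
def parse_jcamp_labels(text):
    """Collect ##LABEL=value header records from JCAMP-DX text into a dict.

    Simplified on purpose: continuation lines and data rows are ignored,
    and only the first occurrence of each label is kept.
    """
    labels = {}
    for raw_line in text.splitlines():
        # Anything after $$ is a JCAMP-DX comment, so drop it.
        line = raw_line.split("$$", 1)[0].strip()
        if not line.startswith("##") or "=" not in line:
            continue
        label, value = line[2:].split("=", 1)
        labels.setdefault(label.strip().upper(), value.strip())
    return labels


# Hypothetical header fragment, assembled by hand for illustration only.
SAMPLE_HEADER = """##TITLE=Cholesterol
##JCAMP-DX=4.24
##DATA TYPE=MASS SPECTRUM
##ORIGIN=T.IIDA NIHON UNIVERSITY, KORIYAMA, FUKUSHIMA-KEN, JAPAN
##OWNER=NIST Mass Spectrometry Data Center
##END=
"""

labels = parse_jcamp_labels(SAMPLE_HEADER)
print("Owner: ", labels.get("OWNER", "not declared"))
print("Origin:", labels.get("ORIGIN", "not declared"))
```

A check like this only reports what the file declares; whether the data may be reused still depends on the license and on permission from the owner.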
Both NIST and the Coblentz Society generate revenue from some of their data collections, even though these data on Webbook are offered free for viewing. The NIST MS database is the most widely distributed MS database in the world (I believe) and they also offer an IR database for sale. Other data are available (1). The Coblentz Society have been building their databases for decades and also offer them for sale. If you look at the prices of the Coblentz collection or the NIST IR collection, they are one to two hundred dollars per collection. Maybe some rich uncle could write a check and release them into the world of Open Data for all to use? Otherwise the groups maintaining these collections deserve to have their costs covered at a minimum, which is probably what their revenue streams from these databases allow.
We are thankful to NIST for allowing us to upload spectra from the Webbook, and we have started to upload data. It will only be a slice of the collection. We will flag the data on our side as NOT Open Data but “Can be accessed by Spectral Game”. In this way the game grows in the types of data it covers while we respect the licenses of the contributors. Open Data vs Free Data vs Pragmatic Usability…maybe the OKF can participate in negotiating the release of such data sources into the public domain and, where appropriate, in sourcing some funding to allow them to do it?
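For anyone wondering what that flag might look like in practice, here is a minimal Python sketch. It is an assumption for illustration, not the way ChemSpider or the Spectral Game actually stores things: the `Usage` enum, the `SpectrumRecord` class, the helper functions and the example records are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Usage(Enum):
    OPEN_DATA = "Open Data"
    GAME_ONLY = "Can be accessed by Spectral Game"


@dataclass(frozen=True)
class SpectrumRecord:
    compound: str
    technique: str
    source: str
    usage: Usage


def playable(records):
    """Every flagged record may be shown inside the game."""
    return list(records)


def redistributable(records):
    """Only Open Data records may be passed on for reuse elsewhere."""
    return [r for r in records if r.usage is Usage.OPEN_DATA]


# Hypothetical records; the second source name is a placeholder.
RECORDS = [
    SpectrumRecord("cholesterol", "EI-MS", "NIST Webbook", Usage.GAME_ONLY),
    SpectrumRecord("example compound", "HNMR", "contributing scientist", Usage.OPEN_DATA),
]

for record in RECORDS:
    print(record.compound, record.technique, "->", record.usage.value)
print("Redistributable:", [r.compound for r in redistributable(RECORDS)])
```

The point of keeping the flag on each record is that the game can draw on everything it has permission to show, while anything exported or shared onward is filtered down to the Open Data subset.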