Copyright©2012 Aileen Day
In December 2011 we posted about the ChemSpider plugin for IDBS’s Electronic Lab Notebook (ELN) which described a proof of concept plugin which allows chemical structures which are part of an ELN experiment to be published to ChemSpider. The plugin sent a single sdf file per deposition which contains the chemical structures (in mol format) and very basic metadata information about where it comes from (author, principal investigator, ELN experiment ID) in the associated data fields. A mapping file was set up in the ChemSpider deposition system to process associated data field names in the deposited sdf files from the ELN data source and map them onto internal ChemSpider field names. We would like to extend this initial proof of concept to integrate ChemSpider with more ELNs, to store more advanced metadata with each deposition and to be able to publish more types of ELN data e.g. spectra, reactions and properties. A major step towards this goal would be if the metadata were separated from the data file, were defined by a fixed schema and contained more extensive information (e.g. what is in the accompanying ELN data item, what is its source, and what are its access rights). If it were agreed as a standard ELN vendors and developers could build the ability to generate this metadata into their API’s, to be used either when sharing data to a repository e.g. ChemSpider, or also to exchange data from one ELN to another. We at ChemSpider would develop a deposition webservice to process metadata in this format (and accept depositions from any ELN which generated it). This would make the task of publishing spectra, reactions, chemical properties and other file types from a range of ELNs to ChemSpider much more manageable.
A working group met up on 9th December 2011 to work towards the aim of defining a metadata model to answer the question “What comprises an ELN record or an item in it”. The group was headed up by Dr Simon Coles from the University of Southampton, and comprised representatives from universities, ELN vendors, pharmaceutical companies, and RSC ChemSpider and was a smaller subset of the previous EPSRC Dial-a-Molecule “The Smart Laboratory: Towards a national ELN” meeting. We came up with a top level format for the exchange which describes what’s in the record, how do you get it, who contributed to it and access rights in xml format. Since then Simon and Colin Bird have formalised this format into an xml schema, the details of which will be published shortly in a journal article (in preparation).
Before committing to the development effort that would be required by the ELN vendors and ChemSpider to work towards this ultimate aim, it is necessary to finalise the definition of this schema and verify that it works with an example. As a first step towards this, the ‘Publish to ChemSpider’ IDBS plugin has been modified to generate the metadata that would accompany the mol files of structures in a separate file obeying this schema. In a future phase of work the metadata xml and ELN item would be sent to a ChemSpider webservice to be processed for publishing there. The video and screencaptures below show version 2 of the plugin generating this metadata in action:
And the generated result is as below:
While every effort was made to populate fields from generic information stored in the ELN system so that this plugin would work with any IDBS installation (not just that of the Chemistry department of the University of Cambridge who kindly allowed the plugin to be developed against their system), this was not possible for all fields since they are not readily available from extension points of the E-WorkBook software – which will need to be addressed if IDBS do develop an API to generate the elnItemManifest. For example, the names and email addresses of the author and principal investigator of the ELN experiment are defined in a configuration file whose settings can be edited via an interface in the ELN software. The license to release the data under, and an embargo period to wait before the data is released publicly are populated by user inputs which are requested when the user chooses to generate the file. The keywords, description and start date of the experiment are populated by customised ELN experiment fields which have been set up only in Cambridge University’s installation of E-WorkBook.
If you have access to a working version of IDBS’s E-WorkBook and would like to install the plugin to work with it please write to ChemSpiderDev@rsc.org and we will be happy to supply it to you.
Again, thanks to IDBS and the Department of Chemistry, University of Cambridge for allowing us to continue development of the ChemSpider plugin against their software and ELN installation respectively.