Archive for December, 2011

KNIME is an open-source data integration, processing, analysis, and exploration platform which can be used to create workflows to analyse data.

We have experimented with adding a node to a KNIME project that calls the ChemSpider webservices to perform a simple search, and the instructions below outline how to reproduce this. This was done with KNIME 2.5.0, with the KNIME extension “Generic Webservice Client” installed.

  1. From the Node Repository, find the “Generic Webservice Client” under the “Misc” folder and drag it into the KNIME project to add a new node.
  2. Right-click on this “Generic Webservice Client” and click on the “Configure…” option.
  3. The WSDL for each ChemSpider webservice can be found using the link from the page for the appropriate webservice. For example, the WSDL for the Search webservice is at http://www.chemspider.com/Search.asmx. However, if you enter this as the WSDL location you’ll get an error when you click the “Analyze” button (due to a SOAP exception: “undefined simple or complex type ‘soapenc:Array’”). This is something that we’re looking into addressing in ChemSpider, but for now a workaround is to copy the WSDL, replace the old-fashioned soapenc:Array type with tns:ArrayOfString, and save and use this amended WSDL locally. I have done this with the Search webservice and the resulting WSDL is available for download here. This file should be downloaded and extracted somewhere locally. Its location can then be entered in the “WSDL Location” field of the Generic Webservice Client in KNIME (using a location of the form file:/C:/temp/ChemSpiderSearchWSDL_no_soapencArray.WSDL), and it will then be processed correctly on clicking the “Analyze” button.
  4. Set the port, operation, inputs and outputs as required; see the screen capture below for the settings used in my demonstration. Note that you should use your own token as the value for the token input. If you don’t have one already, see the instructions here.
  5. Add input and output nodes which connect to and from this Generic Webservice Client node as required. For example, you could add a File Reader node as the initial node, which reads in the contents of a text file that simply contains a search term, and change the query input of SimpleSearch to map to this column rather than hardcoding a value to search for. The output csid could then be written to a csv file using a CSV Writer node.
  6. On executing the workflow, an output csv file is created which contains the ChemSpider ID(s) of any compounds that match the search term. In the case of a search for “benzene” the csid retrieved is 236.
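
The WSDL workaround in step 3 can also be scripted rather than done by hand. The following is only a minimal sketch in Python, not the method used to produce the downloadable file: the function name patch_wsdl is hypothetical, and it assumes that soapenc:Array appears in the WSDL as a complete quoted attribute value and that tns:ArrayOfString is already defined in the same WSDL.

```python
import re

# Matches soapenc:Array only when it is a complete quoted attribute value,
# so longer names that merely start with "soapenc:Array" are left untouched.
_SOAPENC_ARRAY = re.compile(r"""(["'])soapenc:Array\1""")


def patch_wsdl(wsdl_text: str) -> str:
    """Return the WSDL text with soapenc:Array replaced by tns:ArrayOfString.

    Assumption: tns:ArrayOfString is already defined in the WSDL's schema.
    """
    return _SOAPENC_ARRAY.sub(r"\g<1>tns:ArrayOfString\g<1>", wsdl_text)


if __name__ == "__main__":
    # Illustrative fragment only, not copied from the real ChemSpider WSDL.
    sample = '<s:restriction base="soapenc:Array"/>'
    print(patch_wsdl(sample))
```

The patched text would then be saved to a local file and that file's location entered in the “WSDL Location” field, as described in step 3.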

The functionality of electronic lab notebooks (ELNs) and that of ChemSpider overlap to a certain extent – both store chemical information including structures, data, spectra and reactions. However, the focus of most ELNs is to manage, track and audit that data, and that of ChemSpider is to publish and disseminate it to the world. We have been considering how best to use this complementary functionality to integrate an ELN with ChemSpider.

Some ELNs already look up information from ChemSpider and link to it. One example is the blog3 Web-logging (“blogging”) engine by Jeremy Frey, Simon Coles and Mark Borkum at Southampton University, which allows links to compounds from the ChemSpider database to be embedded directly into the content of a post. When a link to ChemSpider is detected, blog3 follows the link to retrieve additional information that is relevant to the compound, including: experimental and theoretical data; two- and three-dimensional depictions; and links to papers and journal articles. Another example is the eScience tool that Steven Wan from CSIRO has developed with the University of New South Wales to text mine LabTrove ELN blog posts to identify chemical names and link these to the relevant ChemSpider compounds.

At “The Smart Laboratory: Towards a national ELN” meeting (organised as part of the Dial-a-Molecule EPSRC Grand Challenge) in August this year, the seeds were sown to take the integration between ELNs and ChemSpider a step further. Cambridge University has the first Chemistry department in the UK to roll out a department-wide electronic lab notebook system, and the software that they’re using is IDBS’s E-WorkBook Suite. In collaboration with IDBS and Cambridge’s Chemistry department, we at ChemSpider have made a plug-in which can both dynamically retrieve information from ChemSpider into their ELN and publish from the ELN to ChemSpider. The Chemistry department at Cambridge (Dr Tim Dickens, Dr Brian Brooks, Prof Bobby Glenn and Prof Steven Ley) have been very helpful in granting access to their ELN to write the plug-in, and will be its first users, but the results will be freely available for any existing IDBS E-WorkBook Suite user.

About the extension, Prof Bobby Glenn has said: “Much of Chemistry is lost, it is simply not published and languishes in forgotten lab notebooks. Capturing novel molecules soon after synthesis on a searchable database like Chemspider is now an effortless process directly from the ELN, which will greatly encourage sharing of compounds, synthetic methods and all the associated data. It’s instant messaging for chemists”. Antony Williams (Vice-President of Strategic Development of ChemSpider) added: “The ability to now publish compound data from the IDBS ELN directly to ChemSpider offers a path to direct exposure of novel chemistry as well as the chemist doing the work. This public compound registration capability is the first move towards ultimately exposing synthetic methods and associated experimental data to the community. Our vision is coming to fruition through this collaboration.”

To see the plug-in in action, please view the demonstration movie of the ChemSpider E-WorkBook Suite Plugin.

Screen capture of launching Publish to ChemSpider plug-in

Compounds can be published to ChemSpider if they have been drawn out in full in an experiment, whether as an individual structure or as part of a reaction, and whether they are simply uploaded into the experiment as a reaction file or included in, for example, a spreadsheet item. Likewise, compound structures can be automatically loaded into a search of ChemSpider if you would like to find out more information about compounds that have been drawn out in full in an experiment, or if you have published a compound to ChemSpider and wish to see the resulting compound pages. The resulting compound pages in ChemSpider will have the data source “IDBS E-WorkBook Suite”. The external ID will show the ID of the experiment from which the structures came, and the depositor details will be those defined in the ChemSpider Settings of the ELN.

The ChemSpider IDBS E-WorkBook Suite plug-in is freely available to customers of IDBS E-WorkBook Suite, who can download it from IDBS and copy it to the appropriate place in their IDBS E-WorkBook Suite program files. It is compatible with E-WorkBook Suite versions 9.0 and 9.1.

This plug-in is an initial proof-of-concept to demonstrate that we can pass compound information between ChemSpider and an ELN in both directions. Future versions will allow more of the information within an experiment to be published to ChemSpider – for example to allow reactions along with a description of their methods to be published to ChemSpider SyntheticPages, or to deposit spectra along with compounds to ChemSpider. We would also like to integrate other ELNs with ChemSpider.