Copyright©2011 Aileen Day
KNIME is an open-source data integration, processing, analysis, and exploration platform which can be used to create workflows to analyse data.
We have experimented with adding a node to a project which would call the ChemSpider webservices to perform a simple search on it and the instructions below outline how to reproduce our experimentation. This was done with KNIME 2.5.0, with the KNIME extension “Generic Webservice Client” installed.
- From the Node Repository find the “Generic Webservice Client” under the “Misc” folder and drag it into the Knime project to add a new node
- Right-click on this “Generic Webservice Client” and click on the “Configure…” option
- The WSDL for each ChemSpider webservice can be found using the link from the page for the appropriate webservice. For example, the WSDL for the Search webservice is at http://www.chemspider.com/Search.asmx. However, if you enter this as the WSDL location you’ll get an error when you click the “Analyze” button (due to a SOAP exception “undefined simple or complext type ‘soapenc:Array’. This is something that we’re looking into addressing in ChemSpider, but for now a workaround is to copy the WSDL, replace the old fashioned soapenc:Array type with tns:ArrayOfString, and save and use this ammended WSDL locally. I have done this with the Search webservice and the resulting WSDL is available for download here. This file should be downloaded, adn extracted somwhere locally. It can then be entered in the “WSDL Location” field of the Generic Webservice client in KNIME (using a location of the form:
file:/C:/temp/ChemSpiderSearchWSDL_no_soapencArray.WSDL) which will then be processed correctly on clicking the “Analyze” button
- Set the Port, operation, inputs and outputs as required – see screencapture below for settings for my demonstration. Note that you should use your own token as the value for the token input – if you don’t have one already then see the instructions here for instructions.
- Add input and output nodes which connect to and from this Generic WebService Client node as required. For example, you could add a FileReader node as the initial node, which reads in the contents of a text file that simply contains a search term as an input (and adapt the Input value accepted as the query input value of the SimpleSearch to map to this column, rather than hardcoding in a value to search for). And the output csid could be written to a csv file using a CSV Writer node.
- On executing the workflow, an output csv file is created which contains the ChemSpider ID(s) of any compounds that match the search term. In the case of a search for “benzene” the csid retrieved is 236.