KNIME is an open-source data integration, processing, analysis, and exploration platform which can be used to create workflows to analyse data.

We have experimented with adding a node to a workflow that calls the ChemSpider webservices to perform a simple search, and the instructions below outline how to reproduce this. This was done with KNIME 2.5.0, with the KNIME extension “Generic Webservice Client” installed.

  1. In the Node Repository, find the “Generic Webservice Client” under the “Misc” folder and drag it into the KNIME project to add a new node
  2. Right-click on this “Generic Webservice Client” and click on the “Configure…” option
  3. The WSDL for each ChemSpider webservice can be found using the link on the page for that webservice. For example, the WSDL for the Search webservice is at http://www.chemspider.com/Search.asmx. However, if you enter this as the WSDL location you’ll get an error when you click the “Analyze” button (due to a SOAP exception: “undefined simple or complex type ‘soapenc:Array’”). This is something that we’re looking into addressing in ChemSpider, but for now a workaround is to copy the WSDL, replace the old-fashioned soapenc:Array type with tns:ArrayOfString, and save and use this amended WSDL locally. I have done this with the Search webservice and the resulting WSDL is available for download here. This file should be downloaded and extracted somewhere locally. Its location can then be entered in the “WSDL Location” field of the Generic Webservice Client in KNIME (using a location of the form file:/C:/temp/ChemSpiderSearchWSDL_no_soapencArray.WSDL), and it will then be processed correctly on clicking the “Analyze” button
  4. Set the port, operation, inputs and outputs as required – see the screen capture below for the settings used in my demonstration. Note that you should use your own token as the value for the token input – if you don’t have one already, see the instructions here.
  5. Add input and output nodes which connect to and from this Generic Webservice Client node as required. For example, you could add a File Reader node as the initial node, which reads in the contents of a text file that simply contains a search term (and map the query input of SimpleSearch to this column, rather than hardcoding a value to search for). The output csid could then be written to a CSV file using a CSV Writer node.
  6. On executing the workflow, an output csv file is created which contains the ChemSpider ID(s) of any compounds that match the search term. In the case of a search for “benzene” the csid retrieved is 236.
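
The WSDL amendment described in step 3 can also be scripted rather than done by hand. The following is only a minimal sketch in Python, assuming a plain text substitution of soapenc:Array with tns:ArrayOfString is all that is needed and that the WSDL already defines ArrayOfString in its tns namespace. The function names are hypothetical and are not part of KNIME or ChemSpider. Note also that Python writes the location as file:///..., which may need adjusting to the file:/... form shown in step 3.

```python
from pathlib import Path

# The type that the "Analyze" step rejects, and the replacement suggested in step 3.
OLD_TYPE = "soapenc:Array"
NEW_TYPE = "tns:ArrayOfString"


def amend_wsdl(wsdl_text: str) -> str:
    """Return the WSDL text with every soapenc:Array reference replaced."""
    return wsdl_text.replace(OLD_TYPE, NEW_TYPE)


def save_amended_wsdl(source: Path, target: Path) -> str:
    """Write an amended copy of `source` to `target` and return its file: URI.

    Both paths are supplied by the caller; nothing is downloaded here.
    """
    text = source.read_text(encoding="utf-8")
    target.write_text(amend_wsdl(text), encoding="utf-8")
    # as_uri() needs an absolute path, hence resolve().
    return target.resolve().as_uri()
```

For example, after saving the original WSDL locally, calling save_amended_wsdl(Path("Search.wsdl"), Path("ChemSpiderSearchWSDL_no_soapencArray.WSDL")) would write the amended copy and return a location to try in the “WSDL Location” field (both file names here are just placeholders).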


11 Responses to “How to use ChemSpider webservices from Knime”

  1. Peter Maas says:

    Hi Aileen,

    I got this all up and running and it seems to be working OK.
    However, I was trying to download all possible data sources for a certain entry using CSID2ExtRefs.
    Within Knime this requires a string for .
    When entering a datasource it does return the right information but I want all entries.
    When leaving blank Knime gives me an error ordering me to fill this field.
    Probably this is a limitation of knime.
    Is there a way to get around it?

    Thanks,

    Peter

  2. Aileen Day says:

    Hello,
    Glad you’ve got it further with this.
    The CSID2ExtRefs web service requires the entry of at least one datasource, and this is by design of the webservice itself (so that it can be of use to find links to specific data sources of interest, but to prevent the mass harvesting of links from ChemSpider) rather than a limitation of Knime.
    So I’m afraid I can’t advise you of any way around this…
    Regards,
    Aileen

  3. Peter Maas says:

    Hi Aileen,

    I can see why you want to limit this. What I wanted to do is look up, for a big list of compounds, which suppliers are available within ChemSpider. In that respect it would be a nice feature to be able to specify the type of supplier here (chemical vendors, data aggregators, bioinfo, and so on). In that way you would not do a mass download, just the info one is looking for.
    It’s just a suggestion. I managed to create a flow going through pubchem to do this task. However, some suppliers are included in chemspider but not in pubchem.

    Thanks,

    Peter

  4. Aileen Day says:

    I can log a feature request for the development team to consider adding a new web service which covers that functionality.
    So just to clarify – would you be happy with a webservice into which you enter a CSID and a data source type and which returns the number of links for each corresponding data source? Or were you thinking of a webservice into which you enter the ChemSpider ID and a data source type and which returns the corresponding data sources? Or would either of these options work for you?
    In the meantime (the development of a new web service is unlikely to happen immediately) if you email me at chemspiderdev@rsc.org so I have your email address, I could send you a list of all the data sources in ChemSpider with their corresponding data source type (“Contributor classification” in the corresponding Data Source page in ChemSpider). I think with this information and the existing CSID2ExtRefs web service you should have everything you need, even if it is via a longer route.

  5. Robert Mostyn says:

    Hi. I am at the beginning of a research project that involves referencing specific chemical compounds. The project aims to track the throughput of all chemical compounds through industrial processes.

    I am trying to work out whether ChemSpider will be of use to this project. I am trying to test the web service and the only service call I get a result from is GetDatabases. Easy – it doesn’t require any parameters! But other service calls are difficult because I cannot provide valid parameters – probably because the parameter names are not intuitive and I am not providing valid values.

    One success is SearchByFormula2… requires a Formula… I enter C9H8O4 and I get an array of numbers back. But I don’t know what to do with the numbers. Is there some documentation that describes the interface at a higher level than WSDLs?

  6. Robert Mostyn says:

    I have answered my own question… register for a security token. Once I got my token everything started to work!

  7. Aileen Day says:

    Great – glad you worked it out Robert!

  8. Brian Masek says:

    Hi – I’ve configured a KNIME workflow as you suggest in your example, but the Generic Webservice Client node status is red (not executable). I can’t figure out what’s wrong. Any suggestions?

  9. Aileen Day says:

    Perhaps you could email me (chemspiderdev@rsc.org) with more details of what you’re trying to do. For example, if you could send screen captures of how you’ve configured the node, that might help us track down the problem…

  10. Alex M says:

    This is a few years old – is this still valid? I tried with Knime V 2.10 (Win7) and get an error for the WSDL file upon analyze
    (Illegal character in path at index 9: …)

  11. Aileen Day says:

    I had another go at running this with:
    - the latest wsdl from ChemSpider (again amended to replace soapenc:Array with tns:ArrayOfString)
    - the latest Knime 2.10.3 and Generic Web Service Client extension
    But am getting a different error from you – “Execute failed: Webservice invocation failed on all rows, check log for details” so it looks like it’s no longer working. I’ll see if I can work out whether it’s because of changes on Knime’s side or ChemSpider’s.
