ChemGoggles? What on earth is ChemGoggles? Is this a pair of safety specs for chemists? No…what would be the fun, and the cheminformatics (!!), in that? ChemGoggles will be shown at the ACS meeting in Philadelphia in a couple of weeks and will be a very early display of our venture into the development of an Android app for “photographing” an image of a chemical and searching the ChemSpider database. It will be a matter of finding an image of a chemical (paper, publication etc), taking a photo using an Android device, using structure recognition software to convert the image to a chemical and then searching ChemSpider. It will be imperfect, an early version, but nevertheless a tantalizing display of some of the new directions we are presently taking at the Cheminformatics group here at RSC.

Chemistry is complex. Anybody who has been involved with the creation of electronic datafiles containing thousands of chemical compounds and associated data (chemical names, properties etc) will tell you that errors creep in. ChemSpider has >28 million unique chemical entities and these have been sourced from many different places/groups/individuals. Some of these have been deprecated as we have determined, both manually and algorithmically, that the data are in error. Over the years we have learned a lot about data quality and ways in which algorithms can be applied to data prior to deposition on ChemSpider.

Some obvious structure-based errors that can be checked for include: hypervalency (e.g. pentavalent carbons), charge imbalance (e.g. a charged compound with no neutralizing counterion) and incomplete stereochemistry (e.g. a compound with 12 possible stereocenters has only one assigned). There are many other such errors that can be detected algorithmically. As the old adage goes, why apply a human to what a computer can fix? With this in mind we have been working on a system called the ChemSpider Validation and Standardization Platform (CVSP for short). This system will serve multiple purposes. It will be one of the foundation blocks for checking structure-based data for our publications (i.e. catch bad chemistry before it is published!), it will be used for validating chemistry for our databases (Natural Product Updates, Methods in Organic Synthesis and Catalysts and Catalyzed Reactions), it will be used to check and validate depositions going into ChemSpider, it will serve data related to the Open PHACTS project and it will serve the community by providing an online website where you can upload your own SDF files (and other file formats in future) to validate the structures.
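To illustrate the kind of rule such a platform automates, here is a minimal Python sketch of two of the checks named above: hypervalency and net charge. It is not CVSP code. The `Molecule` class, the small `MAX_VALENCE` table and the assumption that every hydrogen is an explicit atom are simplifications made for this example, and the stereochemistry check is left out.

```python
from dataclasses import dataclass

# Illustrative maximum valences for neutral atoms (a small subset, not a full table).
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "Cl": 1}


@dataclass
class Molecule:
    atoms: list    # element symbols, with every hydrogen given explicitly
    charges: list  # formal charge of each atom
    bonds: list    # (atom_index, atom_index, bond_order) tuples


def valences(mol):
    """Sum the bond orders around each atom."""
    totals = [0] * len(mol.atoms)
    for i, j, order in mol.bonds:
        totals[i] += order
        totals[j] += order
    return totals


def validate(mol):
    """Return a list of human-readable problems found in the structure."""
    issues = []
    for idx, (element, total) in enumerate(zip(mol.atoms, valences(mol))):
        limit = MAX_VALENCE.get(element)
        # Only neutral atoms are checked; charged atoms (e.g. N+) legitimately differ.
        if limit is not None and mol.charges[idx] == 0 and total > limit:
            issues.append(f"hypervalent {element} at atom {idx}: valence {total} > {limit}")
    net = sum(mol.charges)
    if net != 0:
        issues.append(f"net charge {net:+d}: no neutralizing counterion")
    return issues


# A "CH5" carbon and a lone ammonium cation should be flagged; methane should not.
ch5 = Molecule(["C"] + ["H"] * 5, [0] * 6, [(0, k, 1) for k in range(1, 6)])
ammonium = Molecule(["N"] + ["H"] * 4, [1, 0, 0, 0, 0], [(0, k, 1) for k in range(1, 5)])
methane = Molecule(["C"] + ["H"] * 4, [0] * 5, [(0, k, 1) for k in range(1, 5)])

print(validate(ch5))       # ['hypervalent C at atom 0: valence 5 > 4']
print(validate(ammonium))  # ['net charge +1: no neutralizing counterion']
print(validate(methane))   # []
```

A production system would work from a full valence model and a cheminformatics toolkit rather than a hand-written table; the point here is only that each rule is a cheap, mechanical test a computer can apply to every record.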

I won’t go into detail here about all of the functionality and capability of the system as we will discuss this in further detail on this blog. However, we will be unveiling the system in its present form at the ACS meeting in Philadelphia. Come along and meet some of the team involved in building CVSP and give us your feedback!

In December 2011 we posted about the ChemSpider plugin for IDBS’s Electronic Lab Notebook (ELN), a proof of concept plugin which allows chemical structures that are part of an ELN experiment to be published to ChemSpider. The plugin sent a single sdf file per deposition, containing the chemical structures (in mol format) and very basic metadata about where they came from (author, principal investigator, ELN experiment ID) in the associated data fields. A mapping file was set up in the ChemSpider deposition system to process the associated data field names in sdf files deposited from the ELN data source and map them onto internal ChemSpider field names. We would like to extend this initial proof of concept to integrate ChemSpider with more ELNs, to store more advanced metadata with each deposition and to be able to publish more types of ELN data, e.g. spectra, reactions and properties. A major step towards this goal would be if the metadata were separated from the data file, were defined by a fixed schema and contained more extensive information (e.g. what is in the accompanying ELN data item, what its source is, and what its access rights are). If this were agreed as a standard, ELN vendors and developers could build the ability to generate this metadata into their APIs, to be used either when sharing data with a repository (e.g. ChemSpider) or when exchanging data from one ELN to another. We at ChemSpider would develop a deposition webservice to process metadata in this format (and accept depositions from any ELN which generated it). This would make the task of publishing spectra, reactions, chemical properties and other file types from a range of ELNs to ChemSpider much more manageable.
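The field-mapping step described above can be pictured with a short Python sketch: read the associated data fields of each record in an sdf file and rename them to internal field names. The field names `AUTHOR`, `PRINCIPAL_INVESTIGATOR` and `ELN_EXPERIMENT_ID`, and the internal names on the right-hand side of `FIELD_MAP`, are hypothetical; the actual mapping file used by the ChemSpider deposition system is not described here.

```python
import re

# Matches sdf data headers such as "> <AUTHOR>" or ">  1  <AUTHOR>".
DATA_HEADER = re.compile(r"^>.*?<([^>]+)>")

# Hypothetical mapping from ELN field names to internal ChemSpider field names.
FIELD_MAP = {
    "AUTHOR": "depositor_name",
    "PRINCIPAL_INVESTIGATOR": "principal_investigator",
    "ELN_EXPERIMENT_ID": "external_id",
}


def parse_sdf_fields(sdf_text):
    """Return one {field_name: value} dict per record, ignoring the mol block."""
    records = []
    for block in sdf_text.split("$$$$"):
        if not block.strip():
            continue
        fields, current, values = {}, None, []
        for line in block.splitlines():
            match = DATA_HEADER.match(line)
            if match:
                current, values = match.group(1), []
            elif current is not None:
                if line.strip():
                    values.append(line)
                else:
                    # A blank line ends the current data item.
                    fields[current] = "\n".join(values)
                    current = None
        if current is not None:
            fields[current] = "\n".join(values)
        records.append(fields)
    return records


def map_fields(fields, field_map=FIELD_MAP):
    """Keep only the fields the mapping knows about, under their internal names."""
    return {field_map[name]: value for name, value in fields.items() if name in field_map}


# A placeholder record: an empty mol block followed by two data items.
SAMPLE_SDF = """example-compound
  placeholder

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <AUTHOR>
A. Chemist

> <ELN_EXPERIMENT_ID>
EXP-0001

$$$$
"""

records = parse_sdf_fields(SAMPLE_SDF)
print(map_fields(records[0]))  # {'depositor_name': 'A. Chemist', 'external_id': 'EXP-0001'}
```

Because the mapping lives in data rather than code, supporting another ELN would, in this sketch, only mean adding another dictionary; that is the limitation a fixed, separate metadata schema aims to remove.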

A working group met up on 9th December 2011 to work towards the aim of defining a metadata model to answer the question “What comprises an ELN record or an item in it?”. The group was headed up by Dr Simon Coles from the University of Southampton, comprised representatives from universities, ELN vendors, pharmaceutical companies and RSC ChemSpider, and was a smaller subset of the previous EPSRC Dial-a-Molecule “The Smart Laboratory: Towards a national ELN” meeting. We came up with a top-level exchange format, in xml, which describes what is in the record, how to get it, who contributed to it and its access rights. Since then Simon and Colin Bird have formalised this format into an xml schema, the details of which will be published shortly in a journal article (in preparation).
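As a rough illustration of the four top-level questions (what is in the record, how to get it, who contributed to it and what its access rights are), the Python sketch below builds an XML document with the standard library's `xml.etree.ElementTree`. Only the root name `elnItemManifest` comes from this post; every child element, attribute name and value is a hypothetical placeholder, not the schema formalised by Simon Coles and Colin Bird.

```python
import xml.etree.ElementTree as ET


def build_manifest(item):
    """Build a hypothetical elnItemManifest document from a plain dict."""
    root = ET.Element("elnItemManifest")
    # What is in the record.
    ET.SubElement(root, "contents", mediaType=item["media_type"]).text = item["description"]
    # How to get it.
    ET.SubElement(root, "retrieval", href=item["location"])
    # Who contributed to it.
    contributors = ET.SubElement(root, "contributors")
    for name, role in item["contributors"]:
        ET.SubElement(contributors, "contributor", role=role).text = name
    # Access rights: a license plus an optional embargo date.
    rights = ET.SubElement(root, "accessRights", license=item["license"])
    if item.get("embargo_until"):
        ET.SubElement(rights, "embargoUntil").text = item["embargo_until"]
    return ET.tostring(root, encoding="unicode")


manifest = build_manifest({
    "media_type": "chemical/x-mdl-molfile",
    "description": "Product structure from ELN experiment EXP-0001",
    "location": "https://eln.example.org/items/EXP-0001/product.mol",
    "contributors": [("A. Chemist", "author"), ("B. Supervisor", "principalInvestigator")],
    "license": "CC-BY",
    "embargo_until": "2013-01-01",
})
print(manifest)
```

Keeping the manifest as a separate document from the mol file is what lets the same structure file travel to a repository or another ELN with its provenance and rights attached.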

Before committing to the development effort that would be required by the ELN vendors and ChemSpider to work towards this ultimate aim, it is necessary to finalise the definition of this schema and verify that it works with an example. As a first step towards this, the ‘Publish to ChemSpider’ IDBS plugin has been modified to generate the metadata that would accompany the mol files of structures in a separate file obeying this schema. In a future phase of work the metadata xml and ELN item would be sent to a ChemSpider webservice to be processed for publishing there. The video and screencaptures below show version 2 of the plugin generating this metadata in action:

And the generated result is as below:
Generated example elnItemManifest metadata.

While every effort was made to populate fields from generic information stored in the ELN system, so that this plugin would work with any IDBS installation (not just that of the Chemistry department of the University of Cambridge, who kindly allowed the plugin to be developed against their system), this was not possible for all fields, since some are not readily available from extension points of the E-WorkBook software. This will need to be addressed if IDBS do develop an API to generate the elnItemManifest. For example, the names and email addresses of the author and principal investigator of the ELN experiment are defined in a configuration file whose settings can be edited via an interface in the ELN software. The license under which the data is released, and the embargo period to wait before it is released publicly, are populated by user inputs which are requested when the user chooses to generate the file. The keywords, description and start date of the experiment are populated by customised ELN experiment fields which have been set up only in Cambridge University’s installation of E-WorkBook.

If you have access to a working version of IDBS’s E-WorkBook and would like to install the plugin to work with it please write to ChemSpiderDev@rsc.org and we will be happy to supply it to you.

Again, thanks to IDBS and the Department of Chemistry, University of Cambridge for allowing us to continue development of the ChemSpider plugin against their software and ELN installation respectively.

The eagle-eyed amongst you may have noticed that there was an update to ChemSpider just over a week ago. Many of the changes that were performed on the site were aimed at upgrading the underlying architecture of the site and ensuring that the performance of the ChemSpider site is constantly improving as the number of users of our site and services grows.

Here are a few of the changes to the site that are more visible:

  1. Clearer deprecation of records
  2. Citation details
  3. Visibility of average mass
  4. Layout of the structure search page
  5. Improvements to search messaging
  6. Clearer layout of the Experimental Properties section
  7. Support for foreign language help

So, to pick out a few of the key items from the above list…

Clearer deprecation of records

ChemSpider is designed so that, by default, deprecated records are not presented in your search results; this ensures that you don’t have to wade through data for records that are clearly wrong or lack any useful data. But of course there may be occasions where you happen across a deprecated record. In the past, it wasn’t always easy to see immediately that a record had been deprecated and to understand why. In the new design the notification message is far more prominent and we also make it easy to see the reason why the record was deprecated (this is a new requirement in the deprecation process, so for older deprecations this field may be blank).

Citation details

We commonly get requests from individuals asking about including data from a ChemSpider record in a presentation or thesis. As outlined in our FAQ page, where individuals reuse data we ask that they cite ChemSpider. And so to make this process simpler we have created an output that contains the basic information that users may need to include in a citation, and we have provided a button that makes it really easy to copy the data to your clipboard in one click.

Looking at the above image you can also see that the Average mass (which was accidentally hidden for a while) is now visible on the record again.

Layout of the structure search page

One of the most noticeable changes has been the rearrangement of the Structure search interface. While the actual functionality remains the same, the options are now presented in a way that (hopefully) makes it much easier to see everything that is available to you when you perform a structure search. This is the first phase of our work on this interface, so please let us know what you think about the changes so far.

Clearer layout of the Experimental Properties section

Another significant change that we have made is to the presentation of data in the Experimental Properties infobox. The data is presented in a tidier layout, and while we have always had the ability to provide links to the original datasource, this was not particularly obvious to some users. In this new design we explicitly display the name of the datasource that provided the data, and wherever possible the name acts as a link back to the relevant page/entry in that datasource.

We hope that you find all of these new features useful, and as always we welcome your feedback on these and any other aspects of the site.

For some time now it has been possible to access relevant SureChem patent information from a ChemSpider compound page in the Patents Infobox. ChemSpider compounds are also linked to and from the relevant RSC articles, which has allowed us to form a new partnership between RSC Publishing and SureChem which relies on ChemSpider taking the pivotal role of linking internet chemistry together.

In the RSC article landing pages there is a “Compounds” tab which shows the key compounds that the article is about – as shown in this example. For each compound there is now a link to view the SureChem patent information associated with that compound as below:

The RSC Publishing platform article landing page showing SureChem patent information

SureChem and SureChem’s new free offering, SureChemOpen, offer a suite of patent chemistry data solutions, for example allowing their patents to be found from a structure or substructure search. Now, for each compound returned from such a search it is possible to view any linked ChemSpider compound pages and the number of associated RSC publications (and follow a link to view these articles).

This linking between SureChem and the RSC publication platform relies on ChemSpider (and the standard InChI chemical identifier) providing a bridging link to both, which ensures that the system is accessible, standards-based and scalable, making it easy for future partners to join.
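The bridging idea can be sketched in a few lines of Python: if each side indexes its records by a standard InChIKey, ChemSpider can join a patent index and an article index without either partner needing to know about the other. Apart from benzene's InChIKey and its ChemSpider ID of 236, all of the data below is made up, and the dictionaries are stand-ins for services whose real interfaces are not described in this post.

```python
def bridge(chemspider_ids, patent_index, article_index):
    """Join two partner indexes through ChemSpider, using InChIKeys as the shared key."""
    links = {}
    for inchikey, csid in chemspider_ids.items():
        links[csid] = {
            "patents": patent_index.get(inchikey, []),
            "rsc_articles": article_index.get(inchikey, []),
        }
    return links


BENZENE = "UHOVQNZJYSORNB-UHFFFAOYSA-N"

chemspider_ids = {BENZENE: 236}                      # InChIKey -> ChemSpider ID
patent_index = {BENZENE: ["PATENT-A", "PATENT-B"]}   # hypothetical patent identifiers
article_index = {BENZENE: ["ARTICLE-1"]}             # hypothetical article identifiers

links = bridge(chemspider_ids, patent_index, article_index)
print(len(links[236]["rsc_articles"]))  # 1
```

In this sketch a new partner only has to supply one more InChIKey-keyed index, which is one way to read the claim that a standards-based bridge scales to future partners.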

A lot happens in a few weeks and the past couple of months have been no different. There have been numerous developments for ChemSpider and its related projects, including work on the GUI, the addition of new data and a lot of infrastructure work on the core of the ChemSpider platform.

We have the ACS meeting in San Diego just around the corner and are presently working hard this week to publish our most recent update to the live servers. For those of you going to San Diego do come and visit us at the RSC booth and we will give you a demo of our most recent project that we have been working on…I’m not going to announce it before the ACS but I encourage any attendees to stop by and hear what we’re up to!

There will be a number of presentations at the meeting and the details are all listed in our online Newsletter.

Alex Tropsha (UNC-Chapel Hill) and I (Antony Williams) will be hosting an InChI Symposium at the meeting so please come along and hear how people are using InChI and some of the directions for the future!

See you in San Diego hopefully!

As the ChemSpider content and data mappings have continued to expand, the demands on our web services have increased dramatically. With the popularity of the site continuing to increase we anticipate even heavier usage of our web services. This applies both to our involvement with the Open PHACTS project and to a number of software packages served up by analytical instrument vendors, especially in the mass spectrometry domain. Because of the increasing load on our systems, we have taken steps to avoid outgrowing our existing infrastructure and have implemented a new scalable, future-proof web services offering that your applications can rely upon.

Continual availability and business continuity for subscribers and academics

We have reinvented our web service infrastructure using Microsoft SQL Server replication technology in order to maintain multiple copies of the ChemSpider database. As a result, all system resources are dedicated purely to web services, with no background tasks running to affect performance. Also, the databases are read-only, which eliminates database lock contention entirely.

A standalone and scalable web service establishment for faster response times

The ChemSpider servers run on the VMware virtualization platform, which allows us to scale out the hardware by assigning more resources as required. This means we can continue to provide a consistently high-performance service even as usage increases further.

Over 1/4 million calls in the first 18 hours

Although ChemSpider web services are fast becoming a priority for us, we are still dedicated to ensuring the website experience is optimal. The changes we have implemented will reduce traffic to the website so you should already have noticed improvements in website performance and reliability.

Some examples of implementations of ChemSpider web service usage can be found here.

Access to the ChemSpider API is free to academic users; for commercial use please contact us at chemspider-at-rsc.org.

James Jack from Accelrys has developed a great example of using ChemSpider web services to add ChemSpider search functionality to the structure drawing tool Accelrys Draw.

It is now possible, with a new add-in, to perform advanced searches on ChemSpider from within the Accelrys Draw program itself: searching by text, by structure (exact, similarity and substructure), by elements (those present and those absent), by intrinsic and predicted properties, and by LASSO activities. All of the ChemSpider information about the compounds returned in the search can be viewed and their structure(s) loaded back into the main Accelrys Draw window for further editing.

If you’re interested in finding out more about this add-in or obtaining it then see James’ blog post about the add-in. He has also posted a video demonstrating its use:

Technical details for developers

James has modularised his code so as to separate out a .Net Client API to the ChemSpider Search web service that can be used from *any* .Net application without the need for additional assemblies (other than standard .Net) and requires minimal code. This makes it easy to add the same ChemSpider search functionality to other Accelrys products (e.g. Symyx Notebook).

In addition, he has released this ChemSpiderSearchClient code so that it is available to other ChemSpider users who would like to integrate ChemSpider web services with their code in similar ways.

The “ChemSpiderClient” solution should be opened with Visual Studio. It contains two projects: “ChemSpiderClient” is the main library project (which contains the ChemSpider API code) and “ChemSpider ClientTest(No Draw)” is a simple interface to run the library code (set this as the start-up project to debug the project). “ChemSpiderClient.cs” in “ChemSpiderClient” is the main code file that calls the ChemSpider webservices. Best practice for performing ChemSpider searches is observed: first launch a search to retrieve a transaction ID, then periodically check the status of the search using the GetAsyncSearchStatus operation of Search.asmx, and when the status is “ResultReady”, retrieve the resulting ChemSpider IDs. If the reference to Symyx.CustomUIControls from the ChemSpider Client is missing then add a reference to Symyx.CustomUIControls.dll in the top-level folder of the zip file.
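The launch, poll and retrieve pattern described above does not depend on .NET, so here is a minimal Python sketch of it. The `client` object and its `start_search`, `get_status` and `get_results` methods are hypothetical wrappers around the Search.asmx operations (only GetAsyncSearchStatus and the “ResultReady” status are named in this post), and treating “Failed” and “TooManyRecords” as terminal statuses is an assumption. `FakeClient` exists only so that the example runs offline.

```python
import time

# Assumed terminal failure statuses; "ResultReady" is the value named in the post.
FAILED_STATUSES = {"Failed", "TooManyRecords"}


def run_async_search(client, query, token, poll_interval=1.0, timeout=60.0, sleep=time.sleep):
    """Launch a search, poll its status until it is ready, then fetch the ChemSpider IDs."""
    transaction_id = client.start_search(query, token)
    deadline = time.monotonic() + timeout
    while True:
        status = client.get_status(transaction_id, token)
        if status == "ResultReady":
            return client.get_results(transaction_id, token)
        if status in FAILED_STATUSES:
            raise RuntimeError(f"search {transaction_id} ended with status {status}")
        if time.monotonic() > deadline:
            raise TimeoutError(f"search {transaction_id} not ready after {timeout} s")
        sleep(poll_interval)  # wait before asking again rather than hammering the service


class FakeClient:
    """Offline stand-in that replays a fixed sequence of statuses."""

    def __init__(self, statuses, results):
        self._statuses = iter(statuses)
        self._results = results

    def start_search(self, query, token):
        return "transaction-1"

    def get_status(self, transaction_id, token):
        return next(self._statuses)

    def get_results(self, transaction_id, token):
        return self._results


client = FakeClient(["Processing", "Processing", "ResultReady"], [236])
print(run_async_search(client, "benzene", "my-token", sleep=lambda seconds: None))  # [236]
```

Injecting `sleep` keeps the polling loop testable; in real use the default `time.sleep` and a sensible `poll_interval` keep the load on the web service low.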

Please note that a token is needed to access the ChemSpider webservices and by default the code is supplied without one specified, so that you need to input your own token value – the app.config file of “ChemSpider ClientTest(No Draw)” should be edited to enter a valid token that will be used by default. If this isn’t done, the user will need to supply a token when running the search via a pop-up box. To obtain a token, please complete the registration process – when you are registered the Security Token is listed on your Profile page.
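The token fallback described above (a configured default, otherwise a prompt) can be sketched in Python as follows. The configuration key `ChemSpiderToken` is hypothetical, the dictionary simply stands in for the settings in app.config, and the `prompt` parameter (defaulting to the built-in `input`) plays the role of the pop-up box.

```python
def resolve_token(config, prompt=input):
    """Return the configured ChemSpider token, or ask the user for one if it is missing."""
    token = (config.get("ChemSpiderToken") or "").strip()
    if not token:
        # Mirrors the pop-up box: ask only when no default token has been configured.
        token = prompt("ChemSpider security token: ").strip()
    if not token:
        raise ValueError("a ChemSpider security token is required to call the web services")
    return token


print(resolve_token({"ChemSpiderToken": "configured-token"}))  # configured-token
```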

We will soon be depositing data from SORD (the Selected Organic Reactions Database) onto ChemSpider. This will be done as two separate but related datasets under the SORD data source: Reactants and Products. If you don’t know what SORD is then who better to explain than Dick Wife, the “host” of the SORD database. Dick wrote the article below to provide an overview of SORD…ENJOY!

The Selected Organic Reactions (SOR) Database: capturing “Lost Chemistry”

Dick Wife, SORD B.V. The Netherlands (www.sord.nl; dick.wife@sord.nl)

A new database is capturing the 80% of Lost Chemistry in theses and dissertations which doesn’t make it into publications, and chemists who contribute their data get access to the entire database for free.

SORD, an independent Dutch company, is carefully selecting synthetic chemistry focused on Life Science research and making this chemistry available in their Selected Organic Reactions (SOR) Database. For the theses/dissertations which they select, SORD excerpts all of the reactions in the Experimental section. This means there will still be a small overlap of data with full publications. There will also be a larger overlap with publications such as Notes, Letters or Communications, but these do not contain the experimental details. The SOR Database brings all this chemistry to the desktop, every last detail written by the author.

Some time back, SORD looked at around 300k interesting drug-like compounds in the literature, at which countries they had come from and at the native language of those countries. The English-speaking countries accounted for only 37% of the total. German/Swiss dissertations are often written in English, but this is a recent development. The theses and dissertations in the other languages represent more than half of the total. SORD routinely translates German and French experimental texts into English. They are about to start on Chinese and Japanese translations and, if anyone can give them access to Russian theses, they will translate these as well!

A thesis or dissertation is the result of several years of hard work by a research student under the constant supervision of the research leader whose reputation is at stake if the work described is wrong or inaccurate. It is also examined by a committee who decide on awarding the degree, or not. They scrutinize closely the Results & Discussion as well as the Experimental sections. The chemistry is reliable.

Advanced Chemistry Development, Inc. (ACD/Labs) is partnering with SORD in developing this Database. The SOR Database is available for in-house use with ChemFolder Enterprise or on the Internet with ACD/Web Librarian™. This is a screenshot of a typical SOR Database record in Web Librarian.

The Reaction Scheme shows every atom (there are no abbreviations). The Experimental text is edited to ASCII format and the key parameters (Reagent(s), Solvent(s), Yield(s), MP(s) and Optical Rotation(s)) are displayed in separate fields, as are the full bibliographic data, making data-mining possible. There is also a link which enables the user to bring up the PDF of each reaction, containing all of the spectral and other physical data which SORD does not excerpt. The PDF-EX link is a powerful and unique feature of the SOR Database.

Now for some explanation of SORD’s excerption rules. What they call the Reaction Scheme (A + B → C, etc.) contains only the reacting and product compound structures. A Reagent is an essential reaction component of which no part ends up in the product; if any part does, it becomes a Reactant! When several reactions are performed before the product is isolated (and characterized), the Reagents and Solvents are listed in Steps. Failed reactions are not excerpted, but reactions with poor yields are.
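The Reagent/Reactant distinction is an editorial rule, but it can be expressed computationally when a reaction is atom-mapped. The Python sketch below assumes (this is not SORD's stated method) that each component is represented by the set of atom-map numbers of its heavy atoms: any component contributing at least one mapped atom to the product is a Reactant, and anything else is a Reagent. The Fischer esterification example and its map numbers are illustrative.

```python
def classify_components(components, product_map_numbers):
    """Label each component 'Reactant' if any of its mapped atoms ends up in the product."""
    roles = {}
    for name, map_numbers in components.items():
        roles[name] = "Reactant" if map_numbers & product_map_numbers else "Reagent"
    return roles


# Acetic acid + ethanol -> ethyl acetate, catalysed by sulfuric acid (heavy atoms only).
components = {
    "acetic acid": {1, 2, 3, 4},        # CH3, carbonyl C, carbonyl O, hydroxyl O
    "ethanol": {5, 6, 7},               # O, CH2, CH3
    "sulfuric acid": {8, 9, 10, 11, 12},
}
ethyl_acetate = {1, 2, 3, 5, 6, 7}      # the acid's hydroxyl O (4) leaves as water

print(classify_components(components, ethyl_acetate))
# {'acetic acid': 'Reactant', 'ethanol': 'Reactant', 'sulfuric acid': 'Reagent'}
```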

The SOR Database currently contains 170k reactions; the target is one million by the end of 2013. Even this number is a lot smaller than what you find today in the major commercial reaction databases. Back in the nineties, SORD researchers looked at one such large commercial database which then contained 9 million compounds. Sifting through the content for drug-like compounds resulted in just 450k, or 5%, of the records[1]. Size is one database metric; quality is much more important! In the SOR Database, you will only find characterized products, and no polymers or compounds with no molecular structure.

Users of the SOR Database also have access to the separate databases which contain the Reagents (ca. 3,000) and Solvents (ca. 450) which have been encountered so far. Often a Reagent is a catalyst (organic/organometallic) but they can also be simple entities like bases, acids, ammonium salts, etc. or complex chiral ligands. Authors give Reagents many different names and so each Reagent (and Solvent) in the SOR Database has been assigned a unique name. This enables rapid searches using the assigned names, again a novel feature of the database. Such searches can bring you to really nice chemistry.

As an example, the second-generation Grubbs olefin metathesis catalyst has been given the name Grubbs 2 catalyst. In the current SOR Database, there are more than 500 reactions in which it has been used. Some of these are straightforward; others are not and generate novel ring systems, like this one from the Martin group at the University of North Carolina at Chapel Hill:

Searches in the Reaction Scheme, or using Reagent/Solvent names with hit refinement, bring you to new chemistry which until now was only found on a dusty shelf in a library. The “Lost Chemistry” is now getting smaller as SORD carefully selects and excerpts the reactions which deserve a new life. The SOR Database is essential for novelty searches and it is a powerful supplement to the other commercial reaction databases.

Finally, some more good news for academic research chemists: your data will be readily accessible to the whole chemical world, who will cite your work in their publications. The chemistry which you never published may be just what others are looking for. SORD routinely excerpts the complete collection of theses and dissertations from research supervisors, and will be more than happy to see your work appear in the next SOR Database!


[1] de Laet, A.; Hehenkamp, J. J.; Wife, R. L. Finding Drug Candidates in Lost/Emerging Chemistry. J. Heterocycl. Chem. 2000, 37, 669–674.

The RSC’s objective is to advance the chemical sciences, not only at a research level but also to provide tools to train the next generation of chemists. ChemSpider contains a lot of useful information for students learning Chemistry but there is also a lot of information which is not relevant to their studies which might be confusing and distracting. For some time we have been considering the concept of an educational version of ChemSpider, aimed at students (and their teachers or lecturers) in their last years of school, and first years of university (ages 16-19), which restricts the compounds and the properties, spectra and links displayed for each, to those relevant to their studies. As a result, we are pleased to announce the launch of the Learn Chemistry Wiki which not only fulfils this aim, but also takes it further. This project was developed in a collaboration between Dr Martin Walker at the State University of New York at Potsdam, ChemSpider and the Royal Society of Chemistry’s Education team.

The Learn Chemistry Wiki contains over 2000 “substance” pages which correspond to simple compounds that would commonly be encountered during the last years of school and first years of university. Each of these pages corresponds to a ChemSpider compound, from which it dynamically retrieves compound images, a summary of its properties (molecular formula, mass, IUPAC name, appearance, melting and boiling points, solubility, etc.) and links to view safety sheets and spectra. Each substance page also displays text from Wikipedia, based on the Wikipedia links in ChemSpider.

The Learn Chemistry Wiki also goes a step further: rather than presenting compound information in isolation, it also contains laboratory experiments (with parallel sections containing an overview, teachers’ notes and students’ handouts), quizzes and tutorials, which are linked to the compound information to put it into context. The wiki is based on the MediaWiki platform (which allows multiple users to contribute collaboratively, as the website is intended to be a community website), but extends it with custom-made extensions that provide functionality similar to that of ChemSpider. For example, it is possible to draw structures using GGA’s Ketcher in order to find structures, or to draw answers to quiz questions (for example, to specify the product of a particular reaction). It is also possible to include an interactive spectrum retrieved from ChemSpider in any wiki page, using the ChemDoodle spectrum viewing widget in browsers which support the HTML5 canvas element, or the JSpecView applet in those that don’t.

For an overview and demonstration of the Learn Chemistry Wiki see the Learn Chemistry Wiki site tour webpage or the Learn Chemistry Wiki overview demo video:

The Learn Chemistry Wiki is part of the RSC’s new Learn Chemistry platform, which provides a central access point and search facility to make it easier to access the various teaching resources that the RSC provides.

KNIME is an open-source data integration, processing, analysis, and exploration platform which can be used to create workflows to analyse data.

We have experimented with adding a node to a KNIME workflow which calls the ChemSpider webservices to perform a simple search, and the instructions below outline how to reproduce our experiment. This was done with KNIME 2.5.0, with the KNIME extension “Generic Webservice Client” installed.

  1. From the Node Repository, find the “Generic Webservice Client” under the “Misc” folder and drag it into the KNIME project to add a new node
  2. Right-click on this “Generic Webservice Client” and click on the “Configure…” option
  3. The WSDL for each ChemSpider webservice can be found using the link from the page for the appropriate webservice. For example, the WSDL for the Search webservice is at http://www.chemspider.com/Search.asmx. However, if you enter this as the WSDL location you’ll get an error when you click the “Analyze” button (due to a SOAP exception: “undefined simple or complex type ‘soapenc:Array’”). This is something that we’re looking into addressing in ChemSpider, but for now a workaround is to copy the WSDL, replace the old-fashioned soapenc:Array type with tns:ArrayOfString, and save and use this amended WSDL locally. I have done this with the Search webservice and the resulting WSDL is available for download here. This file should be downloaded and extracted somewhere locally. It can then be entered in the “WSDL Location” field of the Generic Webservice Client in KNIME (using a location of the form file:/C:/temp/ChemSpiderSearchWSDL_no_soapencArray.WSDL), and will then be processed correctly on clicking the “Analyze” button
  4. Set the Port, operation, inputs and outputs as required; see the screencapture below for the settings used in my demonstration. Note that you should use your own token as the value for the token input; if you don’t have one already, see the instructions here.
  5. Add input and output nodes which connect to and from this Generic WebService Client node as required. For example, you could add a FileReader node as the initial node, which reads in the contents of a text file that simply contains a search term as an input (and adapt the Input value accepted as the query input value of the SimpleSearch to map to this column, rather than hardcoding in a value to search for). And the output csid could be written to a csv file using a CSV Writer node.
  6. On executing the workflow, an output csv file is created which contains the ChemSpider ID(s) of any compounds that match the search term. In the case of a search for “benzene” the csid retrieved is 236.
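For readers who prefer scripting to clicking, the two manual parts of the steps above can be sketched in Python: patching the downloaded WSDL (step 3) and the read, search and write shape of the workflow (steps 5 and 6). The regular expression only performs the textual substitution the workaround describes; it does not check that tns:ArrayOfString is declared in the WSDL. The `search` argument is a hypothetical stand-in for the SimpleSearch call made by the Generic Webservice Client, and all file names are placeholders.

```python
import csv
import re
from pathlib import Path

# Matches the exact QName "soapenc:Array" but not, e.g., "soapenc:arrayType".
SOAPENC_ARRAY = re.compile(r"\bsoapenc:Array\b")


def patch_wsdl(wsdl_text):
    """Replace the soapenc:Array type with tns:ArrayOfString (the step 3 workaround)."""
    return SOAPENC_ARRAY.sub("tns:ArrayOfString", wsdl_text)


def patch_wsdl_file(src, dest):
    """Write a patched copy of a locally saved WSDL file and return its path."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dest).write_text(patch_wsdl(text), encoding="utf-8")
    return Path(dest)


def run_workflow(term_file, csv_out, search):
    """Read a search term from a text file, search, and write the CSIDs to a csv file."""
    term = Path(term_file).read_text(encoding="utf-8").strip()
    csids = search(term)
    with open(csv_out, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["query", "csid"])
        for csid in csids:
            writer.writerow([term, csid])
    return csids


print(patch_wsdl('<s:restriction base="soapenc:Array">'))
# <s:restriction base="tns:ArrayOfString">
```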

The functionality of electronic lab notebooks (ELNs) and that of ChemSpider overlap to a certain extent – both store chemical information including structures, data, spectra and reactions. However, the focus of most ELNs is to manage, track and audit that data, and that of ChemSpider is to publish and disseminate it to the world. We have been considering how best to use this complementary functionality to integrate an ELN with ChemSpider.

Some ELNs already look up information from, and link to, ChemSpider. For example, the blog3 Web-logging (“blogging”) engine by Jeremy Frey, Simon Coles and Mark Borkum at Southampton University allows links to compounds from the ChemSpider database to be embedded directly into the content of a post. When a link to ChemSpider is detected, blog3 follows the link to retrieve additional information that is relevant to the compound, including: experimental and theoretical data; two- and three-dimensional depictions; and links to papers and journal articles. Another example is the eScience tool that Stephen Wan from CSIRO has developed with the University of New South Wales to text-mine LabTrove ELN blog posts, identify chemical names and link these to the relevant ChemSpider compounds.

At “The Smart Laboratory: Towards a national ELN” meeting (organised as part of the Dial-a-Molecule EPSRC Grand Challenge) in August this year, the seeds were sown to take the integration between ELNs and ChemSpider a step further. Cambridge University’s Chemistry department is the first in the UK to roll out a department-wide Electronic Lab Notebook system, and the software that they’re using is IDBS’s E-WorkBook Suite. In collaboration with IDBS and Cambridge’s Chemistry department, we at ChemSpider have made a plug-in which can both dynamically retrieve information from ChemSpider into the ELN and publish from the ELN to ChemSpider. The Chemistry department at Cambridge (Dr Tim Dickens, Dr Brian Brooks, Prof Bobby Glenn and Prof Steven Ley) have been very helpful in granting access to their ELN to write the plug-in, and will be its first users, but the results will be freely available to any existing IDBS E-WorkBook Suite user.

About the extension Prof Bobby Glenn has said: “Much of Chemistry is lost, it is simply not published and languishes in forgotten lab notebooks. Capturing novel molecules soon after synthesis on a searchable database like Chemspider is now an effortless process directly from the ELN, which will greatly encourage sharing of compounds, synthetic methods and all the associated data. It’s instant messaging for chemists”. Antony Williams (Vice-President of Strategic Development of ChemSpider) added “The ability to now publish compound data from the IDBS ELN directly to ChemSpider offers a path to direct exposure of novel chemistry as well as the chemist doing the work. This public compound registration capability is the first move towards ultimately exposing synthetic methods and associated experimental data to the community. Our vision is coming to fruition through this collaboration.”

To see the plug-in in action, please watch the demonstration movie of the ChemSpider E-WorkBook Suite Plugin.

Screen capture of launching Publish to ChemSpider plug-in

Compounds can be published to ChemSpider if they have been drawn out in full in an experiment, whether as an individual structure or as part of a reaction, and whether they were simply uploaded into the experiment as a reaction file or included in, for example, a spreadsheet item. Likewise, compound structures can be automatically loaded into a ChemSpider search if you would like to find out more about compounds that have been drawn out in full in an experiment, or if you have published a compound to ChemSpider and wish to see the resulting compound pages. The resulting compound pages in ChemSpider will have the data source “IDBS E-WorkBook Suite”. The external ID will show the ID of the experiment the structures came from, and the depositor details will be those defined in the ChemSpider Settings of the ELN.

The ChemSpider IDBS E-WorkBook Suite plug-in is freely available to customers of IDBS E-WorkBook Suite by downloading it from IDBS and copying it to the appropriate place in their IDBS E-WorkBook Suite program files. It is compatible with E-WorkBook Suite versions 9.0 and 9.1.

This plug-in is an initial proof-of-concept to demonstrate that we can pass compound information between ChemSpider and an ELN in both directions. Future versions will allow more of the information within an experiment to be published to ChemSpider – for example to allow reactions along with a description of their methods to be published to ChemSpider SyntheticPages, or to deposit spectra along with compounds to ChemSpider. We would also like to integrate other ELNs with ChemSpider.

Recently I have been programming a Java plug-in from which I needed to call the ChemSpider webservices. I found that this wasn’t as straightforward as I was expecting, so I thought I would post how to do it in case it’s useful for anyone else who wants to do likewise.
The basic method I used was to use Apache Axis2 to generate Java code from the WSDLs of the main ChemSpider webservices. This Java code is available here: chemspider_webservices_javasourcecode.zip and I have also made the compiled jar file available here: chemspider_webservices.jar. The ChemSpider webservices can be called from other Java code by referencing this jar file (and the Axis2 library files).
This blog post describes how I generated and used this jar file. I was using the Eclipse IDE, so some of what I describe will be specific to that.
There is a similar jar file of some ChemSpider webservices which is available by downloading MZMine (the file chemspider-api.jar in the lib directory), and an example of its use can be seen by downloading the MZMine source code and looking at the file src\net\sf\mzmine\modules\peaklistmethods\identification\dbsearch\databases\ChemSpiderGateway.java. That jar file was generated using the previous version of Axis (plain Axis, rather than Axis2). The example here may be an easier starting point, since the full range of ChemSpider webservices is included in the jar file, there is a full description of how it was generated, the code used to generate it is available, and there are more examples of its use.

Generating the chemspider_webservices.jar file

To generate the java code from the WSDL of the ChemSpider webservices I used the WSDL2Java functionality of Apache Axis2. This is available in different forms, including an Eclipse plug-in which will directly import the java code generated into a project, but I found various bugs when trying to use the latest version of that, so just used the command line version.
I started off with generating the java code from the WSDL of the ChemSpider MassSpecAPI webservice:

  • I downloaded and unzipped the latest version of the Apache Axis2 binary distribution from their download page. I used version 1.6.1 of Axis2.
  • In the “bin” directory of this download there should be a file called wsdl2java.bat. Running this batch file from a command line saves a lot of time trying to set up the class paths correctly to run WSDL2Java. Before using it you should set up the following two environment variables:
    • AXIS2_HOME: Must point to the top level of the AXIS2 files which you just downloaded
    • JAVA_HOME: Must point at your Java Development Kit installation directory (e.g. C:\Program Files\Java\jre6)
  • To see a full list of the options available when running WSDL2Java simply open a command prompt and run the batch file with no options to obtain the Usage options – more information about these can be found in the Apache Axis2 user guide:
    • > axis2-1.6.1\bin\wsdl2java.bat
  • I ran it with options to specify to use the SOAP 1.2 port of the ChemSpider MassSpecAPI webservice (most ChemSpider webservices have the option of SOAP 1.1, SOAP 1.2, HTTP GET or HTTP POST), to generate synchronous code only (not asynchronous), and to use adb databinding (this is the default anyway):
    • > axis2-1.6.1\bin\wsdl2java.bat -uri http://www.chemspider.com/MassSpecAPI.asmx?WSDL -pn MassSpecAPISoap12 -s -d adb
  • This then generated the file MassSpecAPIStub.java, which it automatically put in the package com.chemspider.www (so the appropriate folder structure was created above it)
  • I repeated this process with the other 4 main ChemSpider webservices:
    • > axis2-1.6.1\bin\wsdl2java.bat -uri http://www.chemspider.com/Search.asmx?WSDL -pn SearchSoap12 -s -d adb
    • > axis2-1.6.1\bin\wsdl2java.bat -uri http://www.chemspider.com/InChI.asmx?WSDL -pn InChISoap12 -s -d adb
    • > axis2-1.6.1\bin\wsdl2java.bat -uri http://www.chemspider.com/Spectra.asmx?WSDL -pn SpectraSoap12 -s -d adb
    • > axis2-1.6.1\bin\wsdl2java.bat -uri http://www.chemspider.com/OpenBabel.asmx?WSDL -pn OpenBabelWebServiceSoap12 -s -d adb
  • The folders and Java class files generated by WSDL2Java (MassSpecAPIStub.java, SearchStub.java, InChIStub.java, SpectraStub.java and OpenBabelWebServiceStub.java) are available in the zip file chemspider_webservices_javasourcecode.zip for further reference
  • I then started a new Eclipse project and imported the generated folders into it (as a File System import)
  • The generated classes rely on the Axis2 library files so these need to be added to the build path – in Eclipse this is done by right-clicking on the project in the Package Explorer, choosing Properties > Java Build Path > Libraries > Add External Jars and selecting all of the lib files in the lib folder of the Axis2 folder.
  • This project was exported as the jar file chemspider_webservices.jar
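Since the five wsdl2java runs above differ only in the service name and the SOAP 1.2 port name, they can also be scripted. The sketch below is a hypothetical helper (GenerateStubs is not part of Axis2 or of the downloadable code). It assumes Windows, the wsdl2java.bat script used above and an AXIS2_HOME environment variable, and it passes exactly the options listed above (-uri, -pn, -s, -d adb):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GenerateStubs {

    // Base URL of the ChemSpider webservices, as used in the commands above
    static final String BASE = "http://www.chemspider.com/";

    // Builds the wsdl2java argument list for one service: SOAP 1.2 port (-pn),
    // synchronous code only (-s) and ADB data binding (-d adb)
    static List<String> buildCommand(String axis2Home, String service, String port) {
        List<String> cmd = new ArrayList<String>();
        cmd.add(axis2Home + "\\bin\\wsdl2java.bat");
        cmd.addAll(Arrays.asList(
                "-uri", BASE + service + ".asmx?WSDL",
                "-pn", port,
                "-s",
                "-d", "adb"));
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        String axis2Home = System.getenv("AXIS2_HOME");
        if (axis2Home == null) {
            System.err.println("Set AXIS2_HOME first");
            return;
        }
        // Service name and SOAP 1.2 port name pairs, as listed above
        String[][] services = {
            {"MassSpecAPI", "MassSpecAPISoap12"},
            {"Search", "SearchSoap12"},
            {"InChI", "InChISoap12"},
            {"Spectra", "SpectraSoap12"},
            {"OpenBabel", "OpenBabelWebServiceSoap12"}
        };
        for (String[] s : services) {
            // Batch files are run through cmd.exe on Windows
            List<String> cmd = new ArrayList<String>(Arrays.asList("cmd", "/c"));
            cmd.addAll(buildCommand(axis2Home, s[0], s[1]));
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            System.out.println(s[0] + " exited with code " + p.waitFor());
        }
    }
}
```

Run it from the folder where you want the generated com\chemspider\www source tree to appear, as with the manual commands.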

Using the chemspider_webservices.jar file as an external library jar file

The chemspider_webservices.jar file and all of the Apache Axis2 library jar files need to be added to a Java project as referenced libraries before the webservices can be called. To do this in Eclipse, right-click on the project in the Package Explorer, choose Properties > Java Build Path > Libraries > Add External Jars and select:

  • the chemspider_webservices.jar file (download it from chemspider_webservices.jar and save it locally)
  • all of the lib files in the lib folder of the Axis2 folder.

Once this has been done then the ChemSpider webservices can be called from the project. An example is shown below, and is also downloadable in text format from here. This has been structured into (pretty well self-contained) functions which can be easily called to retrieve the results of a particular operation of a webservice. In the main function these functions are called and the output written out.

Please note that you need to obtain your own ChemSpider token and set it as the ChemSpiderToken value. To obtain this, register for a ChemSpider account and look up your token on your user Profile page after logging in. Some operations require your user account to be associated with the “Service Subscriber” role, which you can request from your user profile page.
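The example code keeps the token in the ChemSpiderToken string. If you would rather not paste your token into source code you might share, a minimal alternative is to read it from an environment variable at start-up. The variable name CHEMSPIDER_TOKEN and the TokenConfig class are hypothetical, chosen only for this sketch:

```java
import java.util.Map;

public class TokenConfig {

    // Hypothetical environment variable name; choose whatever suits your setup
    static final String TOKEN_VAR = "CHEMSPIDER_TOKEN";

    // Returns the trimmed token from the given environment map, or throws if it is
    // missing, so a forgotten token fails fast instead of causing an error at call time
    static String resolveToken(Map<String, String> env) {
        String token = env.get(TOKEN_VAR);
        if (token == null || token.trim().isEmpty()) {
            throw new IllegalStateException(
                    "Set " + TOKEN_VAR + " to the token shown on your ChemSpider profile page");
        }
        return token.trim();
    }

    public static void main(String[] args) {
        try {
            // System.getenv() returns the process environment as a Map<String, String>
            String token = resolveToken(System.getenv());
            System.out.println("Token loaded (" + token.length() + " characters)");
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

In WebServiceExamples you could then initialise ChemSpiderToken with TokenConfig.resolveToken(System.getenv()) instead of the placeholder string.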

package com.chemspider.www.examples;

import java.util.HashMap;
import java.util.Map;

import javax.swing.JOptionPane;

import org.apache.log4j.BasicConfigurator;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

import com.chemspider.www.*;
import com.chemspider.www.InChIStub.InChIToCSIDResponse;
import com.chemspider.www.SearchStub.GetAsyncSearchResultResponse;
import com.chemspider.www.SearchStub.GetAsyncSearchStatusResponse;
import com.chemspider.www.SearchStub.SimpleSearchResponse;
import com.chemspider.www.MassSpecAPIStub.ArrayOfInt;
import com.chemspider.www.MassSpecAPIStub.ArrayOfString;
import com.chemspider.www.MassSpecAPIStub.ExtendedCompoundInfo;
import com.chemspider.www.MassSpecAPIStub.GetDatabasesResponse;
import com.chemspider.www.MassSpecAPIStub.GetExtendedCompoundInfoArrayResponse;
import com.chemspider.www.MassSpecAPIStub.SearchByMassAsyncResponse;

public class WebServiceExamples {

/**
* @param args
*/

private static final Logger LOG = Logger.getLogger(WebServiceExamples.class.getName());

private static String ChemSpiderToken = "YOU NEED TO INSERT YOUR OWN TOKEN IN HERE";

public static void main(String[] args) {
BasicConfigurator.configure();

JOptionPane.showMessageDialog(null, "The compound with InChI InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H has CSID:"+get_InChI_InChIToCSID_Results("InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"));

int[] SimpleSearchResults = get_Search_SimpleSearch_Results("taxol", ChemSpiderToken);
JOptionPane.showMessageDialog(null, "The first of "+SimpleSearchResults.length+" ChemSpider compound(s) returned by a search for Taxol has CSID:"+SimpleSearchResults[0]);

int[] inputCSIDs = new int[2];
inputCSIDs[0] = 236;
inputCSIDs[1] = 238;
Map<Integer, Map<String, String>> GetExtendedCompoundInfoArrayResults = get_MassSpecAPI_GetExtendedCompoundInfoArray_Results(inputCSIDs, ChemSpiderToken);
Map<String, String> thisCompoundInfo = GetExtendedCompoundInfoArrayResults.get(238);
JOptionPane.showMessageDialog(null, "The Average Mass of the compound with CSID 238 is: "+thisCompoundInfo.get("AverageMass"));

String[] GetDatabaseResults = get_MassSpecAPI_GetDatabases_Results();
JOptionPane.showMessageDialog(null, "The first of "+GetDatabaseResults.length+" datasources in ChemSpider is:"+GetDatabaseResults[0]);

String SearchByMassAsyncResults = get_MassSpecAPI_SearchByMassAsync_Results(1100.0, 0.1,GetDatabaseResults, ChemSpiderToken);
JOptionPane.showMessageDialog(null, "Transaction ID for search on compounds with mass = 1100 +/- 0.1 from any data source is: " + SearchByMassAsyncResults);
JOptionPane.showMessageDialog(null, "The operation status of the search with this transaction ID is: " + get_Search_GetAsyncSearchStatus_Results(SearchByMassAsyncResults, ChemSpiderToken));
int[] GetAsyncSearchResultResults = get_Search_GetAsyncSearchResult_Results(SearchByMassAsyncResults, ChemSpiderToken);
JOptionPane.showMessageDialog(null, "And the first of "+GetAsyncSearchResultResults.length+" ChemSpider compound(s) returned by the search has CSID:"+GetAsyncSearchResultResults[0]);
}

/**
* Function to call the InChIToCSID operation of ChemSpider's InChI SOAP 1.2 webservice (http://www.chemspider.com/InChI.asmx?op=InChIToCSID)
* Convert InChI to ChemSpider ID.
*
* @param inchi: string representing inchi to search ChemSpider for
* @return: string representing CSID returned
*/
public static String get_InChI_InChIToCSID_Results(String inchi) {
String Output = null;
try {

final InChIStub thisInChIstub = new InChIStub();
com.chemspider.www.InChIStub.InChIToCSID InChIToCSIDInput = new com.chemspider.www.InChIStub.InChIToCSID();
InChIToCSIDInput.setInchi(inchi);
final InChIToCSIDResponse thisInChIToCSIDResponse = thisInChIstub.inChIToCSID(InChIToCSIDInput);
Output = thisInChIToCSIDResponse.getInChIToCSIDResult();
} catch (Exception e) {
LOG.log(Level.ERROR, "Problem retrieving ChemSpider webservices", e);
}
return Output;
}

/**
* Function to call the SimpleSearch operation of ChemSpider's Search SOAP 1.2 webservice (http://www.chemspider.com/search.asmx?op=SimpleSearch)
* Search by Name, SMILES, InChI, InChIKey, etc. Returns a list of found CSIDs (first 100 - please use AsyncSimpleSearch instead if you like to get the full list). Security token is required.
*
* @param query: String representing search term (can be Name, SMILES, InChI, InChIKey)
* @param token: string containing your user token (listed at your http://www.chemspider.com/UserProfile.aspx page)
* @return: int[] array containing the ChemSpider IDs. If more than 100 are found then only the first 100 are returned.
*/
public static int[] get_Search_SimpleSearch_Results(String query, String token) {
int[] Output = null;
try {
final SearchStub thisSearchStub = new SearchStub();
com.chemspider.www.SearchStub.SimpleSearch SimpleSearchInput = new com.chemspider.www.SearchStub.SimpleSearch();
SimpleSearchInput.setQuery(query);
SimpleSearchInput.setToken(token);
final SimpleSearchResponse thisSimpleSearchResponse = thisSearchStub.simpleSearch(SimpleSearchInput);
Output = thisSimpleSearchResponse.getSimpleSearchResult().get_int();
} catch (Exception e) {
LOG.log(Level.ERROR, "Problem retrieving ChemSpider webservices", e);
}
return Output;
}

/**
* Function to call the GetDatabases operation of ChemSpider's MassSpecAPI SOAP 1.2 webservice (http://www.chemspider.com/massspecapi.asmx?op=GetDatabases)
* Get the list of datasources in ChemSpider.
*
* @return: the list of datasources in ChemSpider as a String Array
*/
public static String[] get_MassSpecAPI_GetDatabases_Results() {
String[] Output = null;
try {

final MassSpecAPIStub thisMassSpecAPIStub = new MassSpecAPIStub();
com.chemspider.www.MassSpecAPIStub.GetDatabases getDatabaseInput = new com.chemspider.www.MassSpecAPIStub.GetDatabases();
final GetDatabasesResponse thisGetDatabasesResponse = thisMassSpecAPIStub.getDatabases(getDatabaseInput);
Output = thisGetDatabasesResponse.getGetDatabasesResult().getString();
} catch (Exception e) {
LOG.log(Level.ERROR, "Problem retrieving ChemSpider webservices", e);
}
return Output;
}

/**
* Function to call the GetExtendedCompoundInfoArray operation of ChemSpider's MassSpecAPI SOAP 1.2 webservice (http://www.chemspider.com/massspecapi.asmx?op=GetExtendedCompoundInfoArray)
* Get array of extended record details by an array of CSIDs. Security token is required.
*
* @param CSIDs: integer array containing the CSIDs of compounds for which information will be returned
* @param token: string containing your user token (listed at your http://www.chemspider.com/UserProfile.aspx page)
* @return: a Map<Integer, Map<String, String>> containing the results for each compound, keyed by CSID (with properties CSID, MF, SMILES, InChI, InChIKey, AverageMass, MolecularWeight, MonoisotopicMass, NominalMass, ALogP, XLogP, CommonName)
*/
public static Map<Integer, Map<String, String>> get_MassSpecAPI_GetExtendedCompoundInfoArray_Results(int[] CSIDs, String token) {
Map<Integer, Map<String, String>> Output = new HashMap<Integer, Map<String, String>>();
try {
final MassSpecAPIStub thisMassSpecAPIStub = new MassSpecAPIStub();
ArrayOfInt inputCSIDsArrayofInt = new ArrayOfInt();
inputCSIDsArrayofInt.set_int(CSIDs);
com.chemspider.www.MassSpecAPIStub.GetExtendedCompoundInfoArray getGetExtendedCompoundInfoArrayInput = new com.chemspider.www.MassSpecAPIStub.GetExtendedCompoundInfoArray();
getGetExtendedCompoundInfoArrayInput.setCSIDs(inputCSIDsArrayofInt);
getGetExtendedCompoundInfoArrayInput.setToken(token);
final GetExtendedCompoundInfoArrayResponse thisGetExtendedCompoundInfoArrayResponse = thisMassSpecAPIStub.getExtendedCompoundInfoArray(getGetExtendedCompoundInfoArrayInput);
ExtendedCompoundInfo[] thisExtendedCompoundInfo = thisGetExtendedCompoundInfoArrayResponse.getGetExtendedCompoundInfoArrayResult().getExtendedCompoundInfo();
for (int i = 0; i < thisExtendedCompoundInfo.length; i++) {
Map<String, String> thisCompoundExtendedCompoundInfoArrayOutput = new HashMap<String, String>();
thisCompoundExtendedCompoundInfoArrayOutput.put("CSID", Integer.toString(thisExtendedCompoundInfo[i].getCSID()));
thisCompoundExtendedCompoundInfoArrayOutput.put("MF", thisExtendedCompoundInfo[i].getMF());
thisCompoundExtendedCompoundInfoArrayOutput.put("SMILES", thisExtendedCompoundInfo[i].getSMILES());
thisCompoundExtendedCompoundInfoArrayOutput.put("InChI", thisExtendedCompoundInfo[i].getInChI());
thisCompoundExtendedCompoundInfoArrayOutput.put("InChIKey", thisExtendedCompoundInfo[i].getInChIKey());
thisCompoundExtendedCompoundInfoArrayOutput.put("AverageMass", Double.toString(thisExtendedCompoundInfo[i].getAverageMass()));
thisCompoundExtendedCompoundInfoArrayOutput.put("MolecularWeight", Double.toString(thisExtendedCompoundInfo[i].getMolecularWeight()));
thisCompoundExtendedCompoundInfoArrayOutput.put("MonoisotopicMass", Double.toString(thisExtendedCompoundInfo[i].getMonoisotopicMass()));
thisCompoundExtendedCompoundInfoArrayOutput.put("NominalMass", Double.toString(thisExtendedCompoundInfo[i].getNominalMass()));
thisCompoundExtendedCompoundInfoArrayOutput.put("ALogP", Double.toString(thisExtendedCompoundInfo[i].getALogP()));
thisCompoundExtendedCompoundInfoArrayOutput.put("XLogP", Double.toString(thisExtendedCompoundInfo[i].getXLogP()));
thisCompoundExtendedCompoundInfoArrayOutput.put("CommonName", thisExtendedCompoundInfo[i].getCommonName());
Output.put(thisExtendedCompoundInfo[i].getCSID(), thisCompoundExtendedCompoundInfoArrayOutput);
}

} catch (Exception e) {
LOG.log(Level.ERROR, "Problem retrieving ChemSpider webservices", e);
}
return Output;
}

/**
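* Function to call the SearchByMassAsync operation of ChemSpider's MassSpecAPI SOAP 1.2 webservice (http://www.chemspider.com/massspecapi.asmx?op=SearchByMassAsync)
* Start an asynchronous search of ChemSpider by mass +/- range. Security token is required.
*
* @param mass: the compounds returned have a mass (Double) within the range mass +/- range
* @param range: the compounds returned have a mass (Double) within the range mass +/- range
* @param dbs: String array of the names of the datasources to search
* @param token: string containing your user token (listed at your http://www.chemspider.com/UserProfile.aspx page)
* @return: String containing the transaction ID of the search, to pass to GetAsyncSearchStatus and GetAsyncSearchResult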
*/
public static String get_MassSpecAPI_SearchByMassAsync_Results(Double mass, Double range, String[] dbs, String token) {
String Output = null;
try {
final MassSpecAPIStub thisMassSpecAPIStub = new MassSpecAPIStub();
com.chemspider.www.MassSpecAPIStub.SearchByMassAsync getSearchByMassAsyncInput = new com.chemspider.www.MassSpecAPIStub.SearchByMassAsync();
getSearchByMassAsyncInput.setMass(mass);
getSearchByMassAsyncInput.setRange(range);
ArrayOfString inputDBsArrayofString = new ArrayOfString();
inputDBsArrayofString.setString(dbs);
getSearchByMassAsyncInput.setDbs(inputDBsArrayofString);
getSearchByMassAsyncInput.setToken(token);
final SearchByMassAsyncResponse thisSearchByMassAsyncResponse = thisMassSpecAPIStub.searchByMassAsync(getSearchByMassAsyncInput);
Output = thisSearchByMassAsyncResponse.getSearchByMassAsyncResult();
} catch (Exception e) {
LOG.log(Level.ERROR, "Problem retrieving ChemSpider webservices", e);
}
return Output;
}

/**
* Function to call the GetAsyncSearchStatus operation of ChemSpider's Search SOAP 1.2 webservice (http://www.chemspider.com/search.asmx?op=GetAsyncSearchStatus)
* Query asynchronous operation status. Requires transaction ID returned by AsynchSearch operation. Security token is required.
*
* @param rid: String representing transaction ID returned from a previous search
* @param token: string containing your user token (listed at your http://www.chemspider.com/UserProfile.aspx page)
* @return: String describing status of this search - can have values Unknown or Created or Scheduled or Processing or Suspended or PartialResultReady or ResultReady or Failed or TooManyRecords
*/
public static String get_Search_GetAsyncSearchStatus_Results(String rid, String token) {
String Output = null;
try {
final SearchStub thisSearchStub = new SearchStub();
com.chemspider.www.SearchStub.GetAsyncSearchStatus GetAsyncSearchStatusInput = new com.chemspider.www.SearchStub.GetAsyncSearchStatus();
GetAsyncSearchStatusInput.setRid(rid);
GetAsyncSearchStatusInput.setToken(token);
final GetAsyncSearchStatusResponse thisGetAsyncSearchStatusResponse = thisSearchStub.getAsyncSearchStatus(GetAsyncSearchStatusInput);
Output = thisGetAsyncSearchStatusResponse.getGetAsyncSearchStatusResult().toString();
} catch (Exception e) {
LOG.log(Level.ERROR, "Problem retrieving ChemSpider webservices", e);
}
return Output;
}

/**
* Function to call the GetAsyncSearchResult operation of ChemSpider's Search SOAP 1.2 webservice (http://www.chemspider.com/search.asmx?op=GetAsyncSearchResult)
* Returns the list of CSIDs found by AsynchSearch operation. Security token is required.
*
* @param rid: String representing transaction ID returned from a previous search
* @param token: string containing your user token (listed at your http://www.chemspider.com/UserProfile.aspx page)
* @return: int[] array containing the ChemSpider IDs.
*/
public static int[] get_Search_GetAsyncSearchResult_Results(String rid, String token) {
int[] Output = null;
try {
final SearchStub thisSearchStub = new SearchStub();
com.chemspider.www.SearchStub.GetAsyncSearchResult GetAsyncSearchResultInput = new com.chemspider.www.SearchStub.GetAsyncSearchResult();
GetAsyncSearchResultInput.setRid(rid);
GetAsyncSearchResultInput.setToken(token);
final GetAsyncSearchResultResponse thisGetAsyncSearchResultResponse = thisSearchStub.getAsyncSearchResult(GetAsyncSearchResultInput);
Output = thisGetAsyncSearchResultResponse.getGetAsyncSearchResultResult().get_int();
} catch (Exception e) {
LOG.log(Level.ERROR, "Problem retrieving ChemSpider webservices", e);
}
return Output;
}

}
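In main above, GetAsyncSearchStatus is queried once and GetAsyncSearchResult is called immediately afterwards, so the search may not have finished by then. A minimal sketch of polling instead is shown below. It assumes the status strings are those listed in the GetAsyncSearchStatus Javadoc and treats ResultReady, Failed and TooManyRecords as final; the StatusCheck interface and AsyncSearchPoller class are hypothetical and deliberately independent of the generated stubs, so the demo runs offline with a fake status source:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class AsyncSearchPoller {

    // Minimal callback so the poller does not depend on the generated stubs
    interface StatusCheck {
        String check();
    }

    // Statuses treated as final (taken from the GetAsyncSearchStatus Javadoc)
    static final Set<String> FINAL_STATUSES =
            new HashSet<String>(Arrays.asList("ResultReady", "Failed", "TooManyRecords"));

    // Calls statusCheck until it reports a final status or maxAttempts calls have been
    // made, sleeping delayMillis between calls; returns the last status seen (may be null)
    static String waitForFinalStatus(StatusCheck statusCheck, int maxAttempts, long delayMillis) {
        String status = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            status = statusCheck.check();
            if (status != null && FINAL_STATUSES.contains(status)) {
                return status;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag and stop polling
                    return status;
                }
            }
        }
        return status;
    }

    public static void main(String[] args) {
        // Stand-in for get_Search_GetAsyncSearchStatus_Results(rid, token):
        // reports "Processing" twice, then "ResultReady"
        final String[] replies = {"Processing", "Processing", "ResultReady"};
        final int[] calls = {0};
        StatusCheck fake = new StatusCheck() {
            public String check() {
                return replies[Math.min(calls[0]++, replies.length - 1)];
            }
        };
        String status = waitForFinalStatus(fake, 10, 100);
        System.out.println("Final status " + status + " after " + calls[0] + " checks");
    }
}
```

With the real stubs, check() would return get_Search_GetAsyncSearchStatus_Results(SearchByMassAsyncResults, ChemSpiderToken), and get_Search_GetAsyncSearchResult_Results would only be called once the final status is ResultReady.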

Disclaimer: I’m new to Java programming, so please excuse me if you are a Java expert and I’ve said something obvious, offended you with my code or used the wrong terminology anywhere.

In a way this is a taster, as we’re looking at our Search as part of the refresh of ChemSpider, and more detail will follow. Another motivation for posting was a couple of recent requests for ChemSpider functionality which is already available, a good prompt for us to think about how we present a lot of functionality in a clear interface. The two requests? One was that it would be great if a user could do a search from an input image (load an image, convert it to a structure and launch a search). The other wanted a way to look only for compounds that include a specific element. Both of these can be done on ChemSpider, and tragically both came in as anonymous feature requests. So, because I don’t think they’ve ever been fully itemised before, let me count the ways in which you can search ChemSpider.

Simple Search on ChemSpider

1. Search by name – systematic name, synonym, trade name
2. Search by chemical identifier – InChI, InChIKey, SMILES
3. Search by database identifier – registry number

Structure Search on ChemSpider

4. Search by exact structure drawn, substructure, similarity – exact match, all tautomers, same skeleton (including/excluding H), all isomers
5. Draw an exact structure – in one of several structure drawers

Load a structure from a file or an image

6. Load from mol, sdf, skc, cdx files
7. Load from an image of the structure – gif, png, jpg, tiff – to get an editable/correctable structure for search

Convert an identifier to search ChemSpider

8. Convert an identifier or name to a structure, to use or amend in the structure search

Advanced search options in ChemSpider

9. Search for compounds with or without a particular element or elements
10. Search by properties – molecular formula, mol wt, nominal mass, average mass, monoisotopic mass. Exact match or within a range
11. Search by calculated properties range – ACD/LogP, ACD/LogD (pH 5.5), ACD/LogD (pH 7.4), Rule Of 5, Number of Hydrogen Bond Acceptors, Number of Hydrogen Bond Donors, Number of Freely Rotatable Bonds, Polar Surface Area, Molar Volume, Refractive Index, Boiling Point, Flash Point, Density, Surface Tension
12. Search by data sources – select one or many individual data sources (from the 400 we hold), or one or many data source types from: Available Chemicals Databases, Biological Properties, Chemical Reactions, Chemical Safety Data, Drugs or Compounds in Development, Imaging Agents, Information Aggregators, Journal Publishers via MeSH, Ligand/binding/crystal Structure Databases, Metabolic Pathways, Molecular Libraries Screening Center Network, Natural Products, NIH Substance Repository, Patents, Personal Collections, Physical Properties (including SAR/QSAR databases), Protein 3D Structures, Publication or Magazine Article, Spectroscopy Databases, Substance Vendors, Theoretical Properties, Toxicology/Environmental Databases, Virtual Library, Web-based Article (blog or commentary)
13. Search by focussed library – Building Blocks, Screening Compounds, Building Stock, D-EXP014, Acetylcholinesterase (AChE), cAMP dependent protein kinase (PKA), Estrogen Receptor (Alpha), Phospholipase A2 (PLA(2)), Test Set for DILI modelling, Training Set for DILI modelling
14. Search by ligand screening – LASSO (Ligand Activity in Surface Similarity Order) similarity
15. Combine search to look for single- or multi-component structures
16. Combine search to look for, or disregard, isotopically labelled structures
17. Filter results with analytical data
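ChemSpider’s Advanced Search handles the element option (item 9) on the server. Purely to illustrate the with/without logic, here is a hypothetical client-side sketch that checks molecular formula strings, for example MF values like those returned by the GetExtendedCompoundInfoArray web service. It assumes conventionally capitalised formulas (so “Cl” is chlorine, while “CO” is carbon plus oxygen), and it is not how ChemSpider implements the search:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ElementFilter {

    // An element symbol: one capital letter, optionally followed by one lower-case letter
    private static final Pattern SYMBOL = Pattern.compile("[A-Z][a-z]?");

    // Distinct element symbols in a formula such as "C6H5Cl"; digits and markup
    // such as "_{6}" are skipped because only matches of SYMBOL are collected
    static Set<String> elementsIn(String formula) {
        Set<String> elements = new LinkedHashSet<String>();
        Matcher m = SYMBOL.matcher(formula);
        while (m.find()) {
            elements.add(m.group());
        }
        return elements;
    }

    // True if the formula contains every element in required and none in excluded
    static boolean matches(String formula, Set<String> required, Set<String> excluded) {
        Set<String> present = elementsIn(formula);
        if (!present.containsAll(required)) {
            return false;
        }
        for (String element : excluded) {
            if (present.contains(element)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> required = new HashSet<String>(Arrays.asList("Cl"));
        Set<String> excluded = new HashSet<String>(Arrays.asList("Br"));
        for (String mf : new String[] {"C6H6", "C6H5Cl", "C6H4BrCl"}) {
            System.out.println(mf + " -> " + matches(mf, required, excluded));
        }
    }
}
```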

ChemSpider APIs

18. Use our web services for mass spectrometry to search by molecular mass or elemental composition within ChemSpider or within particular data sources
19. Use our web services to search by chemical identifier, retrieve information about a ChemSpider record, or retrieve the chemical structure thumbnail
20. Use our web services for spectra to return all Open Data spectral information from ChemSpider, return spectral information on a compound, return identified spectra
21. You can show all spectra of a particular type on the spectra page

The free ChemSpider mobile app developed in collaboration with Alex Clark (innovator of the Mobile Molecular DataSheet, Reaction101 and Yield101) is now available for download from the iTunes store! The full details of the app, and some associated screenshots, are outlined on the SciMobileApps wiki here. A brief overview is given below…

“ChemSpider Mobile is a free iOS app (iPhone, iPod, iPad) for searching the ChemSpider online chemical database. It provides the ability to search by drawing a chemical structure, or entering a compound name. The app is very straightforward and easy to learn. Search results are shown in a list showing structure and names. Any search result can be examined in more detail by launching the mobile browser and viewing the structure on the ChemSpider web page. Although the ChemSpider web page is designed to work well on mobile browsers, the mobile app is more convenient to use, and is currently the best way to search by structure from a mobile device. The structure drawing capabilities are provided by the embedded version of the Mobile Molecular DataSheet. The app was built by Molecular Materials Informatics, on behalf of the Royal Society of Chemistry.”

We will look at developing an Android app for ChemSpider, taking into account what we learn from the early use of the iOS Mobile app.

A screencast of the functionality of ChemSpider Mobile is available below.

Only two days until the start of this year’s Fall ACS meeting in Denver. The ChemSpider team is busy preparing for the meeting, packing bags, polishing talks and honing workshop skills.

Please drop by and say “Hi!”

We’d like to repeat our invitation to everyone at the conference to drop by the RSC booth (Booth 1100). There, of course, you can chat with the ChemSpider team, get a quick demo (and find out more about our latest features), pick up our hot-off-the-press User Guide or scoop up some exclusive ChemSpider goodies!

To celebrate the release of the new iPhone/iPad app* we have a limited number of covers for 3G and 4G iPhones as well as iPads.

*The app itself is free to download from the AppStore.

You can also find out about lots of other things that the RSC does: from publishing books and journals to the promotion of chemistry worldwide. We’ll also have lots of information on our new e-membership option, which is making its debut at this meeting. Also keep an eye out for members of our Editorial staff from journals including OBC, MedChemComm, PCCP, Soft Matter and RSC Advances, who will be scouring the conference in search of lots of new and exciting research.

Natural Product & Synthetic Chemists

I’d like to make an extra special invitation to any Synthetic chemists and Natural products chemists – from PhD students to Professors (please pass this on to all your friends and colleagues who will be at the meeting). The ChemSpider team really wants to hear about your research. Tell us about your latest publication or the work that you are most proud of, and we can make sure that your key compounds from these publications are in ChemSpider, on a platform freely accessible to chemists everywhere. If you are more interested in methodology you shouldn’t feel left out – ask us about ChemSpider Synthetic Pages.

ChemSpider related talks and workshops

Antony Williams (most-definitely the hardest working man I know) is giving a number of talks and workshops (details below) which are sure to be entertaining as well as thought-provoking and will be well-worth squeezing into your schedule.

We look forward to meeting you.

“Aligning scientific expertise and passion through a career path in the chemical sciences”

Colorado Convention Center, Room: 110, Sunday 28th August 2011, 1.40PM – 2PM

“Chemistry in the hand: The delivery of structure databases and spectroscopy gaming on mobile devices”

Colorado Convention Center, Room: 110, Monday 29th August 2011, 9.05AM – 9.35AM

“ChemSpider: Does community engagement work to build a quality online resource for chemists?”

Colorado Convention Center, Room: 110, Tuesday 30th August 2011, 10.10AM – 10.50AM

“An Introduction to ChemSpider – A Combination Platform of Free Chemistry Database, Free Prediction Engines and Wiki Environment”

Colorado Convention Center, Room: 503, Wednesday 31st August 2011, 08.30AM – 11AM

“Structure representations in public chemistry databases: The challenges of validating the chemical structures for 200 top-selling drugs”

Colorado Convention Center, Room: 110, Wednesday 31st August 2011, 10.45AM – 11.05AM

As I mentioned in my blog post a few weeks ago, over the last few months we have been hard at work trying to improve how we organise all of the information and features that can be found when you view a ChemSpider record. And now you can see the fruits of our labour.

We hope that you find the changes we’ve made give you a better and easier user experience. While we think that the changes will be clear and intuitive, I’d like to highlight a few key features in my next few posts.

Inline help

When you look at compound pages and other useful pages, you should now see many more question mark symbols dotted throughout the pages. We’ve called this approach inline help: rather than giving you an in-depth help resource on a separate page or as a PDF, we think it is much more useful to have a little snippet of help right at the point in the page where you need it. Clicking on a question mark symbol should bring up a yellow text box with short guidance (where more complete help is needed, we’ll provide a link to a page which contains much more detailed information). Of course, do let us know if you have any suggestions for improvements to the help text.

Inline help text

Default infobox ordering*

Many users tell us they most often look for names (or name-structure associations), physical properties and spectral data, so we have put this information at the beginning of the record. Now when you come to a record, by default the Names infobox is listed first, followed by the Properties, Spectra and Articles infoboxes.

None of your favourite infoboxes have been removed (in fact we’ve created some new ones; see later). If you don’t like the default order, you can easily rearrange the infoboxes by clicking on a title bar and dragging the box up or down the record. ChemSpider will remember your order and use it for all future visits to the site from that PC (in the same browser/profile).

*If you have visited the site before, ChemSpider will remember your previous settings. To see the new default order you will need to clear your browser history or delete the ChemSpider cookies saved in your profile.

New infoboxes: Searches and Chemical Vendors

ChemSpider has always had great features, for instance:

The Similar Search, which allows you to find records for compounds that have the same skeleton but different stereochemistry or isotopic labels

The ability to load the structure from the current record into a structure search, so that you are able to modify it and construct a new search.

However, these features haven’t always been easy to find, so in our redesigned compound page we have aimed to make these powerful search tools easier to discover and understand.

The Searches Infobox

Now you can find these all together in the Searches infobox, along with our Google Scholar custom queries, which let you run one search across publications using all of the validated synonyms (saving you from having to perform many separate searches for individual synonyms). We also help you to perform ‘structure searches’ of Google (in the form of an InChIKey search).

The Searches infobox
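For the curious, the idea behind those custom queries is simple: one query that ORs together every validated synonym, plus a plain web search on the record’s InChIKey. Here is a minimal Python sketch of how such URLs could be assembled. It is an illustration rather than ChemSpider’s own code; the function names are hypothetical, and the synonyms and InChIKey are just sample inputs.

```python
from urllib.parse import urlencode


def scholar_query_url(synonyms):
    """Build one Google Scholar URL that ORs together every (quoted) synonym."""
    # Quote each name so multi-word synonyms are matched as exact phrases.
    query = " OR ".join(f'"{name}"' for name in synonyms)
    return "https://scholar.google.com/scholar?" + urlencode({"q": query})


def inchikey_search_url(inchikey):
    """Build a Google web search URL for an InChIKey (a text-based 'structure search')."""
    return "https://www.google.com/search?" + urlencode({"q": inchikey})


# Sample inputs only: two well-known names for the same compound.
print(scholar_query_url(["aspirin", "acetylsalicylic acid"]))
print(inchikey_search_url("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))
```

Because an InChIKey is a fixed-length string of letters and hyphens, it survives URL encoding unchanged, which is what makes it convenient for ordinary text search engines.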

The Chemical Vendors Infobox

We’ve also created an infobox just to display chemical vendor information, so that it is much quicker to find out whether the compound in the record is commercially available.

The record for Sparteine with its Chemical Vendors infobox

In my next post I’ll finish off discussing the improvements that we’ve made to the site. But of course, if you have any comments or questions about the features I’ve discussed here, please leave a comment below, or send an email to the ChemSpider inbox.

There are multiple structure drawing editors on ChemSpider. And we could add more! For example, we don’t have JSDraw, and we don’t yet have the ChemDoodle components in place, though I am VERY impressed with the spectral display components that are integrated into the SpectralGame that ChemSpider supports. Compared with just a few years ago there is now an abundance of structure drawing editors in the form of applets and JavaScript editors. So many, in fact, that it can be confusing to the user. In reality, users should not have to worry about the technology behind the editor. It should be simple, especially when the editor is just the interface for querying ChemSpider. It should display perfectly on the browser(s) and platform(s) the user works with, it should be intuitive and easy to use (preferably without having to resort to reading help files) and, essentially, it should “do what I want it to do”. Not at all an unreasonable list of demands, right? Not so easy to deliver on, mind you!

On ChemSpider we have multiple structure drawing editors. If you visit this page and open up the selection window by using “Click to Edit” you will see the editor below and, underneath the editor shown, a series of editors that you can choose from.

Structure Editors on ChemSpider

There has to be some order for listing the editors, but the listed order is NOT a preferred order from our point of view; it is just a list. We have heard feedback from numerous people about their preferred editor. Some live and breathe the Java Molecular Editor (JME). Some prefer Accelrys JDraw because they already use Accelrys Draw. Many think that Elemental is a great JavaScript editor.

We are left with a choice: keep all the editors (which costs time to support them, keep them updated, test them and so on) or reduce the number of editors to just two or three. So we welcome your input, either as a comment on this blog post or via the survey on SurveyMonkey here. We’d like your input to help steer our decision. Thanks!

Previously there was ChemMobi, then our implementation of ChemSpider for a mobile browser, and then ChemSpider SyntheticPages for a mobile browser. At next week’s ACS meeting in Denver we hope that the ChemSpider mobile app, developed in collaboration with Alex Clark (innovator of the Mobile Molecular DataSheet, Reaction101 and Yield101), will be available for download from the iTunes store! The full details of the app, and some associated screenshots, are outlined on the SciMobileApps wiki here. A brief overview is given below…

“ChemSpider Mobile is a free iOS app (iPhone, iPod, iPad) for searching the ChemSpider online chemical database. It provides the ability to search by drawing a chemical structure, or entering a compound name. The app is very straightforward and easy to learn. Search results are shown in a list showing structure and names. Any search result can be examined in more detail by launching the mobile browser and viewing the structure on the ChemSpider web page.

Although the ChemSpider web page is designed to work well on mobile browsers, the mobile app is more convenient to use, and is currently the best way to search by structure from a mobile device. The structure drawing capabilities are provided by the embedded version of the Mobile Molecular DataSheet. The app was built by Molecular Materials Informatics, on behalf of the Royal Society of Chemistry.”

An early-view screencast of the functionality of ChemSpider Mobile is now available. New movies showing the details of the app will follow in the near future, but this is an early view for interested parties.

We’ve rejigged our data to make searching more reliable.

What have we done?

We’ve regenerated all of the InChIs in the database with version 1.03 of the InChI code.

What does that mean?

The InChI (International Chemical Identifier) is a short piece of text that describes the structure of a molecule. Each one is generated by a free, open-source computer program, so the same structure should always give the same InChI and there shouldn’t be conflicting InChIs for the same molecule. You can’t really write them by hand, because they look like this:

InChI=1S/C10H22ClN2O5PS/c1-3-10(9-18-20(2,15)16)12-19(14)13(7-5-11)6-4-8-17-19/h10,12H,3-9H2,1-2H3
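That string is less mysterious than it looks. An InChI is a series of layers separated by “/”: after the prefix (“1S” means version 1, standard InChI) comes the molecular formula, then layers introduced by a lowercase letter, such as /c for the connections between the numbered non-hydrogen atoms and /h for the hydrogens. The following Python sketch (a simplified illustration, not the official InChI software) splits the example above into its layers and checks that the atom numbers in the /c layer run up to the 20 non-hydrogen atoms in the formula. It deliberately ignores multi-component formulas and repeated layers.

```python
import re


def parse_inchi(inchi):
    """Split an InChI into (version, formula, {layer prefix: layer content}).

    Simplified sketch: keeps only the first layer for each prefix letter, so
    repeated layers (e.g. in a fixed-H /f section) are ignored.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    version, formula, *rest = inchi[len("InChI="):].split("/")
    layers = {}
    for layer in rest:
        if layer:
            # The first character is the layer prefix, e.g. 'c' or 'h'.
            layers.setdefault(layer[0], layer[1:])
    return version, formula, layers


def element_counts(formula):
    """Count atoms per element in a single-component formula such as C10H22ClN2O5PS."""
    counts = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def max_atom_index(connections):
    """Highest atom number mentioned in a /c connection layer."""
    return max(int(n) for n in re.findall(r"\d+", connections))


inchi = ("InChI=1S/C10H22ClN2O5PS/c1-3-10(9-18-20(2,15)16)12-19(14)13(7-5-11)"
         "6-4-8-17-19/h10,12H,3-9H2,1-2H3")
version, formula, layers = parse_inchi(inchi)
counts = element_counts(formula)
heavy_atoms = sum(n for element, n in counts.items() if element != "H")
print(version, formula, heavy_atoms, max_atom_index(layers["c"]))  # 1S C10H22ClN2O5PS 20 20
```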

ChemSpider is built on InChIs. If two molecules have the same InChI, then they’re the same record in ChemSpider, and if you can’t InChIfy it, you can’t put it in ChemSpider. That’s why we can’t do, for example, polymers yet.
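In other words, the InChI acts as the key that decides whether a deposited structure becomes a new record or joins an existing one. Here is a toy Python illustration of that rule, assuming each deposition arrives as a (source, InChI) pair, with None for anything that could not be InChIfied. The supplier names are made up, and this is not ChemSpider’s actual loading code.

```python
from collections import defaultdict


def build_records(deposits):
    """Group deposited structures into one record per distinct InChI.

    `deposits` is a list of (source, inchi) pairs; an inchi of None marks a
    structure that could not be converted to an InChI (e.g. a polymer).
    """
    records = defaultdict(list)  # InChI -> every source that supplied that structure
    rejected = []                # sources whose structure could not be InChIfied
    for source, inchi in deposits:
        if inchi is None:
            rejected.append(source)
        else:
            records[inchi].append(source)
    return dict(records), rejected


deposits = [
    ("Supplier A", "InChI=1S/H2O/h1H2"),   # water
    ("Supplier B", "InChI=1S/H2O/h1H2"),   # water again, drawn by someone else
    ("Supplier C", "InChI=1S/CH4/h1H4"),   # methane
    ("Polymer set", None),                 # no InChI, so it cannot be loaded
]
records, rejected = build_records(deposits)
print(len(records), rejected)  # 2 ['Polymer set']
```

The two water depositions collapse into a single record that remembers both sources, which is exactly why a change in how the InChI is generated can matter for searching.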

We’re proud to be founder members of the InChI Trust, which supports this critical element in the sharing of chemical compound information.

InChI Trust Member 2011

What does all this mean for ChemSpider?

Because there is an active community supporting InChI that looks out for these things, version 1.03 contained some bug fixes. As a result, a very small number of the InChIs themselves have changed: only a few dozen out of the whole database.

  • P+–O and P+–S bonds are now treated slightly differently. This means that it will be easier to find the exact molecule you’re looking for, regardless of how it’s been drawn. (In principle this also applies to analogous bonds involving arsenic, selenium, tellurium and antimony, but I can’t see any examples of these in the database.)
  • There was a small bug where the InChI generated for a molecule containing an azide group sometimes varied according to the input drawing. That no longer happens.

This regeneration has also allowed us to catch and clean up some errors in the data.
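We haven’t described our processing framework here, but the basic audit after a regeneration run can be sketched: compare each record’s old and new InChI, flag records that failed to regenerate, and look for records that now share an InChI and may need merging. The Python below is a hypothetical illustration, with placeholder strings standing in for real InChIs.

```python
from collections import defaultdict


def compare_regeneration(old, new):
    """Compare {record_id: InChI} maps from before and after regeneration.

    Returns (changed, failed, collisions):
      changed    - IDs whose InChI string differs under the new version
      failed     - IDs that could not be regenerated (missing from `new`)
      collisions - new InChIs now shared by more than one record (merge candidates)
    """
    changed = sorted(rid for rid in old if rid in new and old[rid] != new[rid])
    failed = sorted(rid for rid in old if rid not in new)
    by_inchi = defaultdict(list)
    for rid, inchi in new.items():
        by_inchi[inchi].append(rid)
    collisions = {inchi: sorted(ids) for inchi, ids in by_inchi.items() if len(ids) > 1}
    return changed, failed, collisions


# Placeholder strings stand in for real InChIs.
old = {1: "inchi-A", 2: "inchi-B-old", 3: "inchi-C", 4: "inchi-D"}
new = {1: "inchi-A", 2: "inchi-B-new", 3: "inchi-A"}
print(compare_regeneration(old, new))  # ([2, 3], [4], {'inchi-A': [1, 3]})
```

In this made-up example, record 3 now produces the same InChI as record 1, so the two would be flagged for review as possible duplicates.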

What happens next?

Version 1.04 of the InChI code will be released soon. With our new framework for processing large amounts of data we’ll be able to update our InChIs much more quickly. The main changes in 1.04 that affect the InChI concern how it handles radical atoms in aromatic rings and the elements nobelium, lawrencium and rutherfordium, so we anticipate that very few InChIs will change!

COPIED FROM THE CHEMCONNECTOR BLOG

Unless you have no interest in sports, or have had your head under a stone, you will be aware that the next Olympics will be held in London in 2012. Peter Scott (one of the editors of ChemSpider SyntheticPages) and I were recently discussing how much of a role chemistry now plays in modern sports. I’m a runner, cyclist, swimmer and overall sporting type of guy, and I depend on wicking materials to keep me cool, nutritional support to get me through 100-150 mile bike rides in a day, glide stick to “stop me chafing” (ow!) and graphite grease to silence the rattling chain on my bike. In fact, whatever sport I am doing, it is easy to notice the influence that chemistry has on my improved performance at my tender age of, ahem, just over 40 (and holding, for a while now).

I was reminiscing with Peter that, about a year ago, Sir Graham Richards and I were chatting about pyrenes and remarked on how benzo[cd]pyrene, shown here, looks just like the Olympic rings. There is another rather well-known “Olympic molecule” of course, already captured on Wikipedia and named Olympiadane. It looks rather complex to synthesize, and personally I think the benzopyrene looks a lot more like the Olympic rings, so I attached the synonym Olympicene to it! In fact, if you search ChemSpider using the name Olympicene you will find it.

In a recent discussion about our online crowdsourced database of syntheses, ChemSpider SyntheticPages (and not distracted at all by the conversation about the Olympics going to the UK next year!!!), I mentioned the molecule Olympicene to Peter again and he searched ChemSpider to find it. We agreed that it would be fun to find out how easy it would be to synthesize it, and that if it were done it would be a good synthesis to add to ChemSpider SyntheticPages. That was enough to trigger Peter into action, and he chatted with one of his colleagues to see if he could make it.

And so it starts… the trials and tribulations of synthesizing Olympicene will be captured on ChemSpider SyntheticPages step by step. We’re not sure how complex a synthesis it will be; time will tell. It will be great to add the analytical data to ChemSpider too as it gets generated, including all the intermediate reaction steps and associated data. ChemSpider and CSSP were designed to support projects like this, so it will be fun to watch the story unfold.

If YOU have any thoughts about good synthetic approaches for what seems like a simple molecule, post them on this blog. Actually, why not try synthesizing it yourself and add your synthesis to SyntheticPages!? Every contribution is issued a DOI for your publication list!

It might be ideal to get a number of synthetic approaches posted on ChemSpider SyntheticPages and see which one is the best! Watch this space. I’ve also set up a Twitter account, @Olympicene, to capture the progress. Enjoy!

We will be hosting a training session for ChemSpider at the ACS meeting in Denver. Please register early.

An Introduction to ChemSpider – A Combination Platform of Free Chemistry Database, Free Prediction Engines and Wiki Environment

Where: Colorado Convention Center
Room: 503
When: Wednesday, August 31, 8:30 AM – 11:00 AM

>> Click here to register for this workshop
ChemSpider has become one of the premier free online chemistry resources, used by many thousands of chemists around the world every day. Hosting over 26 million unique chemical entities, sourced from over 400 separate data sources, ChemSpider provides access to experimental and predicted data and links to patents and publications, and uniquely offers users the ability to deposit and share their own data online. With the intention of integrating and curating public chemistry resources for the community, ChemSpider encourages participation from chemists around the world. Integrated with Wikipedia, Google Patents, Google Books, Google Scholar and PubMed, as well as with the RSC Publishing platform, ChemSpider provides access to the chemistry contained in millions of articles. This training session will provide an overview of searching ChemSpider and will discuss how to deposit data and participate in curating the existing information. We will also provide an overview of ChemSpider SyntheticPages, our venture into providing a community-based resource of semantically enriched synthetic procedures with community peer review. This will be an interactive session, and you are encouraged to bring your laptops to work along and ask questions about present and future capabilities. ChemSpider is built for the community and we welcome your comments about how to make it better for your needs.

I’m sure that by now everyone has noticed that the ChemSpider homepage design changed just over a month ago. A few features moved around, the Molecules of Interest section was retired and perhaps most significantly the Search box was given a dose of CSID: 5791, becoming bigger and more prominent.

The reason for this wasn’t just to make the site more attractive (though I think it does look ‘prettier’). Our motivation for the change is to deliver a site that is easier for users to interact with and understand, and by doing so, hopefully make it quicker and simpler for you to get your tasks done using ChemSpider. The refresh of the homepage illustrates this: as most users come to ChemSpider to search for information, we think it should be easy to get straight into a search, hence the greater emphasis on this feature.

In the next few days we will release another upgrade to the interface, centered on making it easier to understand the data presented in the compound Record View pages. I’ll follow it with a blog post covering some of the key features.

The development of ChemSpider is an ongoing process, and we are aware that even after this upgrade there will be aspects of the compound Record View pages that will need more work (and also other parts of the site that still need development). It’s not going to be easy: ChemSpider brings together a rich and varied set of data from a large number of sources – this poses many challenges. We also realise that there are many different tasks that each of you – as users – want to perform, and it is always going to be difficult to reconcile all of the different opinions/needs.

However, we are trying to make the site better for you. And therefore, we’d really like to know your opinions on the changes (please test new features for a few days first). We welcome your feedback on the redesign either in the form of blog comments or email feedback (chemspider-at-rsc.org).

Over the next week – keep your eyes peeled for the upgrade and my accompanying blog post which will endeavor to give you a good introduction to the new features.

We have text mined compound names from all RSC 2008-2010 journal articles and loaded these into ChemSpider, adding about 26,000 new-to-ChemSpider compounds with links back to the published articles. We’ve also simplified the view of compound name and chemical/biochemical term highlighting within the Publishing Platform HTML view, so readers can link out from compound names (direct to ChemSpider for related compound information) and from chemical and biochemical terms (to other linked articles). We’ll be extending this to cover our publications from 2011 onwards, then looking to go further back into our journal archive. Later this week we should also have the compounds visible from the article home page, also linking through to ChemSpider.
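We haven’t gone into how the text mining works, and real chemical named-entity recognition is far more involved (systematic names, typos, hyphenation, context). Purely as an illustration of the final linking step, here is a toy dictionary-based tagger in Python that finds known names in a sentence and attaches a record ID to each. The dictionary and record IDs are hypothetical.

```python
import re


def tag_compounds(text, name_to_record):
    """Find dictionary compound names in text and link each hit to a record ID.

    `name_to_record` maps lower-case names to record IDs. Names are tried
    longest first, so at any position the longest dictionary name wins.
    Returns a list of (start, end, matched_text, record_id) tuples.
    """
    names = sorted(name_to_record, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), name_to_record[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


# Hypothetical dictionary and record IDs, for illustration only.
dictionary = {
    "aspirin": "rec-aspirin",
    "acetylsalicylic acid": "rec-aspirin",
    "salicylic acid": "rec-salicylic-acid",
}
sentence = "Aspirin (acetylsalicylic acid) is hydrolysed to salicylic acid."
for hit in tag_compounds(sentence, dictionary):
    print(hit)
```

Trying the longest names first is what stops “acetylsalicylic acid” from being tagged as the shorter “salicylic acid”. Note that the `\b` word boundaries would need adjusting for names that begin or end with punctuation, such as stereodescriptors in parentheses.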

We have also worked with the Utopia Documents team (getutopia.com) to apply these enhancements to our PDFs, so with the free Utopia Documents PDF viewer (originally developed in conjunction with Portland Press for the Biochemical Journal), readers get any enhancements overlaid on top of the PDF as they’re reading and can link out just as they can from the HTML. As this is powered by an API from our Publishing Platform, any additional links we make in future will be reflected in real time without having to update the PDF. Anyone who’s seen Steve Pettifer’s Utopia demonstrations tends to say “wow” at the potential, so many thanks to the Utopia team in Manchester for adding support for RSC articles. As above, this will work on the 2008-2010 articles just being loaded, and as we extend the coverage Utopia will pick up and display the additional links for these papers.

The RSC’s free chemical database ChemSpider has added RDF functionality to its interface, in collaboration with the University of Southampton’s School of Chemistry. The availability of RDF allows the database records to be found and understood by semantic web tools, another step in ChemSpider’s mission to create a public chemical information infrastructure.

Richard Kidd, Informatics Manager at the RSC, says: “we are delighted to work with top academic teams pushing forward what’s possible with semantic chemistry, and we hope others will use the RDF representation of ChemSpider to support their own developments.”

ChemSpider as a Linked Data source for oreChem

The machine-processable representation was specifically developed in order to leverage the core competencies of the ChemSpider database: resolvable identifiers; high-quality, curated metadata; and rich linking to the extensive RSC corpus. Furthermore, as part of the Microsoft Research-funded oreChem project, OAI-ORE technology is being used to facilitate the discovery and re-use of the chemical information in the correct context.
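For readers new to RDF: each fact about a record is written as a triple of subject, predicate and object, where the subject and predicate are web identifiers (IRIs), which is what lets semantic web tools follow links between resources. The minimal Python sketch below writes a few triples in the N-Triples format. It is not ChemSpider’s actual RDF vocabulary: the record IRI, the link target and the inchi predicate under example.org are hypothetical, while dcterms:title and owl:sameAs are standard vocabulary terms.

```python
def ntriples(subject, statements):
    """Serialise (predicate, value) pairs about one subject as N-Triples lines.

    Heuristic for this sketch: values starting with http:// or https:// are
    written as IRIs, everything else as a plain string literal.
    """
    lines = []
    for predicate, value in statements:
        if value.startswith(("http://", "https://")):
            obj = f"<{value}>"
        else:
            # N-Triples literals must escape backslashes, quotes and newlines.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            obj = f'"{escaped}"'
        lines.append(f"<{subject}> <{predicate}> {obj} .")
    return "\n".join(lines)


DCTERMS_TITLE = "http://purl.org/dc/terms/title"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
HAS_INCHI = "http://example.org/vocab/inchi"  # hypothetical predicate

record = "http://example.org/record/water"  # hypothetical record IRI
print(ntriples(record, [
    (DCTERMS_TITLE, "water"),
    (HAS_INCHI, "InChI=1S/H2O/h1H2"),
    (OWL_SAME_AS, "http://example.org/other-database/water"),  # hypothetical link target
]))
```

A production serialiser would use an RDF library and explicit typing rather than guessing IRIs from a string prefix; the heuristic just keeps the sketch short.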

Prof Jeremy Frey and Dr Simon Coles commented: “it is a pleasure for Southampton to work with the RSC’s ChemSpider as a culmination of our contribution to the Microsoft-funded oreChem project. As a member of the Southampton Chemistry eResearch team, this work forms the core of graduate student Mark Borkum’s PhD thesis.”

“Enabling open, semantic chemistry in this way is a monumental step forward for the domain,” notes Lee Dirks, director of Education & Scholarly Communication for Microsoft Research. “We’re thrilled to have played a role in facilitating the creation of this resource and extremely pleased to see Southampton and the RSC innovating and leading the field.”

Another oreChem participant, Carl Lagoze, Associate Professor of Information Science at Cornell University and Co-Director of the Open Archives Initiative, added: “it’s wonderful to see the results of our work on OAI-ORE in this exciting application. It fulfils our goal of making the results of research easier to disseminate and reuse.”
