We recently published an update to the ChemSpider website which, in addition to fixing a number of bugs, has added some useful new features. Three of these features are highlighted in this post – one which you might have noticed already, and two which you may not have discovered yet.

Auto-Complete

We have reinstated the auto-complete feature on the ChemSpider homepage. Now, when you begin typing in the search box, ChemSpider makes suggestions based on what you have typed. This makes it easier than ever to find what you are looking for – even if you aren’t quite sure how to spell it.

Autocomplete on the ChemSpider homepage

 

Combined Structure/Property Searches

People frequently ask if there is a way to search by substructure and by other properties, such as molecular weight or molecular formula, at the same time. This update makes it possible to perform this kind of combined search from our improved Advanced Search page.

For example, if you are interested in finding compounds that are structurally similar to Valium, you can enter a benzodiazepinone substructure and restrict the results to compounds with a molecular weight of 275-325.

Substructure and Molecular Weight search


This search then returns Valium along with other similar drugs like clonazepam, nitrazepam and lorazepam.

There are many other search options that can be combined with a substructure/similarity search so look at the Advanced Search page and have a play.

Molecular Formula Range Searching

You can also search a range of molecular formulae at once. To specify the range for a given element, put the range in parentheses after the element. For example, C7H(10-12)O(0-1) would return all compounds containing exactly 7 carbons and between 10 and 12 hydrogens, and which may or may not contain an oxygen. This type of search can be performed from the Simple Search page, as part of an Advanced Search or from the ChemSpider homepage.
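To make the range syntax concrete, here is a minimal Python sketch of how a query like C7H(10-12)O(0-1) could be read and checked against a compound's element counts. This is not how ChemSpider implements the search; the function names are made up for illustration, and the sketch assumes that a bare element symbol means exactly one atom and that elements not named in the query must be absent.

```python
import re

# One token: an element symbol, optionally followed by a (min-max) range or an exact count.
TOKEN = re.compile(r"([A-Z][a-z]?)(?:\((\d+)-(\d+)\)|(\d+))?")


def parse_formula_range(query):
    """Return {element: (min_count, max_count)} for a query such as C7H(10-12)O(0-1)."""
    ranges = {}
    pos = 0
    while pos < len(query):
        m = TOKEN.match(query, pos)
        if not m:
            raise ValueError(f"cannot parse {query!r} at position {pos}")
        element, lo, hi, exact = m.groups()
        if lo is not None:
            ranges[element] = (int(lo), int(hi))
        elif exact is not None:
            ranges[element] = (int(exact), int(exact))
        else:
            ranges[element] = (1, 1)  # assumption: bare symbol means exactly one atom
        pos = m.end()
    return ranges


def matches(counts, ranges):
    """True if every count lies in its range; unlisted elements are assumed not allowed."""
    if any(element not in ranges for element in counts):
        return False
    return all(lo <= counts.get(element, 0) <= hi
               for element, (lo, hi) in ranges.items())


ranges = parse_formula_range("C7H(10-12)O(0-1)")
print(matches({"C": 7, "H": 12, "O": 1}, ranges))  # a C7H12O compound
print(matches({"C": 7, "H": 8, "O": 1}, ranges))   # a C7H8O compound (too few hydrogens)
```

The same parser reads queries with two-letter elements, such as C12H(0-7)Cl(3-10).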

Best of all, this can be combined with any of the other search parameters on the Advanced Search page, including the substructure search. For example, if you wanted to find polychlorinated biphenyls containing at least three chlorines, you could perform a substructure search for a biphenyl with a molecular formula of C12H(0-7)Cl(3-10).

Substructure and Molecular Weight search


In our next post, we will cover some new ways you can search by properties that are stored in our records such as melting point, density, etc.

I’ll be talking at the 6th Joint Sheffield Conference on Cheminformatics in July on Validation and Standardization of Molecular Structures in General and Sugars in Particular. This is a taster.

Sugars in Particular

One of the big problems with chemical structure algorithms is that they can’t, in general, cope with the ways that chemists are accustomed to drawing sugar molecules. They will lose the stereochemistry around the sugar ring, collapsing D-glucose, say, on to L-glucose, not to mention allose, altrose, gulose and all the others.

(ChemDraw, I should note, can interpret chair stereo properly, but it is very much an exception.)

The first step in determining correct stereochemistry for a chair atom is recognizing a chair hexagon. That is the subject of this post.


We have previously described initial steps to integrate ChemSpider with ELNs, working with IDBS, and to define the elnItemManifest metadata model.

We have now taken further steps to integrate ChemSpider with Southampton University’s ELN, LabTrove. This follows on from an eScience tool that Stephen Wan from CSIRO developed with the University of New South Wales, which text-mines LabTrove ELN blog posts to identify chemical names and link them to the relevant ChemSpider compounds. LabTrove is an open source blog-based system which can be used for recording and sharing experimental findings. Previously, adding an image of a compound to an experiment blog post meant either uploading it as an image (after drawing it in a separate drawing package) or pasting in a link to the image on another website (after a separate internet search in another browser window). We have now added a button, available when adding or editing an experiment, that launches a search of ChemSpider. When the required compound is found, an image of it can be added to the post simply by clicking on it, as can be seen in this demonstration video:

The editing controls in LabTrove are based on TinyMCE, a WYSIWYG editor which is used in a range of blogs, including WordPress. This means that the same ChemSpider plugin can also be used to insert compound images from ChemSpider into any other blog or website that uses a TinyMCE editor.

If you would like to add the ChemSpider plugin to a LabTrove installation, simply update your installation with the latest source code from LabTrove’s SourceForge website.

If you would like to add the ChemSpider plugin to a website or blog which uses a TinyMCE editor, simply download this zip file, extract the folder in it and move the resulting “chemspider” directory to your TinyMCE plugins folder. Then, in your TinyMCE initialization process, add the plugin “chemspider” and the button “chemspider”.

ChemSpider SyntheticPages is one of those projects we support for which I have particular affection. For those who haven’t yet taken a look at it – please do so, it is a community resource made by chemists for chemists and is free to access – you don’t even need to register to look at the articles.

The original concept of SyntheticPages was brought to life by a group of academics who developed the original platform and format (and of course the members of the research community who embraced it and submitted articles). When ChemSpider became part of the RSC the concept of a community resource for reactions seemed like a complementary partner to the database of chemical compounds that we had established. With this in mind we were fortunate to collaborate with the hosts of the original SyntheticPages platform and, combining our resources and visions, we provided a new platform for submission. A short presentation about CSSP is online here.

CSSP today is well known only within a small community of chemists, but the audiences we present the work to are very positive about the value of the platform and the way that we have developed it to date. Certainly the authors can get tens of thousands of hits on their articles, based on the published statistics! The “Leaderboards” are all available online for anyone to review.

We believe that everyone can see the value of building a directory of reliable, robust reactions that can continue to evolve through feedback and questions. But more than that, we see the potential benefits for:

  • Young scientists as a portfolio of their work that can enhance a resumé
  • Building systems that can contribute to Alternative Metrics – people are already developing platforms, such as Impact Story, and CSSP presents the perfect opportunity to build such systems. Online contributions will become increasingly visible and important for a scientist, in parallel, of course, with the present metrics for contribution and reputation.

We are presently working on a new system for “rewards and recognition” for contributors to our online databases and we will be rolling this out in more detail in the near future. It will be our way of recognizing the contributions of our users for their commitment to communicating science to the community using our platform as one of their vehicles to do so. As part of this activity we are also choosing to recognize present and future authors for their contribution of 5 or more SyntheticPages to CSSP. We will be contacting previous authors to ensure that they receive a brand spanking new, off the press, CSSP Lab coat to thank them for making their syntheses available!

Discussing the project to recognise and celebrate the top contributors to CSSP, Dr James Milne, Managing Director, RSC Publishing, said the following:

“The ChemSpider SyntheticPages lab coats are a great idea, as they highlight a number of fantastic contributors, and also the role of CSSP within the broader publishing context. RSC Publishing strives to serve the needs of researchers worldwide, through publishing and disseminating high quality content, and this database of practical synthetic procedures certainly adds to this knowledge base.  I’d personally like to thank these contributors for supporting CSSP through their publications.”

If you haven’t already qualified for a CSSP lab coat by submitting 5 or more procedures, what’s stopping you? We look forward to reading your submissions…

 

Okay, I’ll admit that the title of this entry is not quite what Samuel Taylor Coleridge wrote in The Rime of the Ancient Mariner – but it does sum up this post pretty well.

Image taken from Wikipedia (http://en.wikipedia.org/wiki/File:Plughole.JPG#file)

Water is one of those chemicals that we tend to take for granted until it reminds us, usually because we have too much or too little of it. In one way or another, water seems to have been insistently nagging me this year. In the Spring in the UK there was talk of water restrictions and droughts, while now many places are flooded, and only a few weeks ago in the US, Hurricane Sandy proved that water could be as formidable a force as the winds.

Don’t forget water is a chemical!

Water has a huge impact on the chemical sciences – after all, it is one of the most common chemicals in the world. As such, water features in many of the activities of the RSC, to list just a few recent examples…

Well, what about this webinar?

I have to admit that when I was still a bench chemist I only thought of water as something used in extractions, or to be excluded from reactions (and occasionally in tackling the mountain of dirty glassware that I’d accumulated). But looking at the title of the latest Chemistry World Webinar, it looks like there are still many aspects of water that I have to learn about. The webinar is free: if the details below pique your interest, you only need to follow the link and sign up to watch the live Webinar. If you can’t watch at that time, or are reading this post after the Webinar has taken place, don’t worry – you can access the archive of all of the Chemistry World Webinars at: http://chemistryworld.gav.co.uk/webcasts/past-events.php.

The importance of water quality in the laboratory

4 December 2012, 13:00 – 14:00 (GMT)
Free webinar

Speaker: Dr Estelle Riché – Senior Scientist, Merck Millipore

How are water contaminants affecting your lab results?

Join us for our next live and interactive Chemistry World webinar to learn why and how water is purified to yield the various water qualities used in the laboratory.

By the end of this free one-hour knowledge-share, you will be able to:
• identify the different contaminants potentially present in laboratory water
• understand the potential impact of these contaminants on laboratory applications such as HPLC, LC-MS, etc.
• understand how various water purification technologies remove these contaminants from laboratory water
• make better choices for the water you use in your laboratory work
Click here to find out more and register for free
This webinar is brought to you by Chemistry World in partnership with Merck Millipore.

 

(This is not a post about carbohydrates, despite the title!)

Dodgy stereochemistry is a persistent problem.  Even if someone knows all of the stereocentres in a particular molecule, they might not necessarily draw them in a way that a machine, or even a person, can interpret.  There are rules about whether the pointy end or the blunt end of a bond indicates the stereocentre, and it’s surprising how often you see them done wrongly.

Today I’m going to talk about a particular IUPAC recommendation for drawing stereocentres that might at first glance seem surprising, the rule that you may only have one stereobond at a given stereocentre. If you have a wedged bond attached to an atom, you can’t have a hashed bond attached to the same atom. And vice versa.
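As a rough illustration of how a machine might check this rule, here is a small Python sketch. It is only a sketch under stated assumptions: the bond representation (a tuple of narrow-end atom, wide-end atom and drawing style) and the function name are invented for this example, and it assumes the narrow end of a wedged or hashed bond sits on the stereocentre it describes.

```python
from collections import Counter

# Drawing styles that count as stereobonds in this sketch.
STEREO_STYLES = {"wedge", "hash"}


def overloaded_stereocentres(bonds):
    """Return the atoms that have more than one stereobond starting at them.

    bonds: iterable of (narrow_end_atom, wide_end_atom, style) tuples,
    where style is "plain", "wedge" or "hash" (a hypothetical representation).
    """
    counts = Counter(
        narrow_end
        for narrow_end, _wide_end, style in bonds
        if style in STEREO_STYLES
    )
    return sorted(atom for atom, n in counts.items() if n > 1)


drawing = [
    (1, 2, "wedge"),
    (1, 3, "hash"),   # second stereobond on atom 1: breaks the rule
    (1, 4, "plain"),
    (5, 6, "wedge"),  # atom 5 has a single stereobond: fine
    (5, 7, "plain"),
]
print(overloaded_stereocentres(drawing))
```

A real checker would read the bond styles from a molfile or a drawing package rather than from hand-written tuples.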

Why is this?

You might think that as you’re supplying more information, you’re making the diagram easier to interpret. However, you’re running directly counter to the normal principles of communication.  You’re being more informative than required, and this sets off alarm bells in the reader.  What are you trying to say?  If you ask a passerby the time and they say “Well, it’s half past six Greenwich Mean Time” you’re entitled to wonder why they’re quoting the timezone. Maybe they’re trying to be funny.

Paul Grice thought about this whole problem in the 1970s and came up with a set of four principles, summarized in maxims, that listeners (or readers) assume that speakers are following.  These are they:

  • Be Truthful. Do not say what you believe to be false. Do not say that for which you lack adequate evidence.

Let us hope that this one is implicit in any chemical drawing!

  • Make your contribution as informative as is required.  Do not make your contribution more informative than required.

If you have two methyl groups coming off an atom, do not make one wedgy and one hashy. You are adding no new information!

Do not mark carbons with the letter C unless your target audience is schoolchildren.

  • Be relevant:

On the grand scale: do not illustrate an article with any old molecule—make sure the molecule mentioned is actually relevant.

On the scale of the drawing itself, however: If you have three bonds about an ordinary p-block atom, for example, make sure they’re at 120 degrees to each other.  If they aren’t, for example if two of them are at right angles, the reader will infer that something odd is going on.

  • Be clear:

Make sure all your double bonds actually look like double bonds rather than a single bond parallel to another single bond.  I suspect a lot of the success of ChemDraw is down to the fact that it produces attractive, clear chemical drawings.

Do people ever flout the maxims on purpose?

Oh yes.  People often flout the maxims when trying to be funny, or in a political interview.  Similarly there are all kinds of Gricean violations in the chemical drawings you see in patents: bonds which do not quite extend all the way to atoms, R groups labelled as Y (particularly dangerous as Y is yttrium!) or Q or W (also tungsten) or some other unusual letter and so forth.  Exactly why this happens so much more often in patents than in journal articles is left as an exercise for the reader.

Do you know about Natural Product Updates?

Natural Product Updates (NPU) gives you the molecules involved in key developments in natural product chemistry. Thanks to our work in interpreting what chemists mean, not just what chemists draw, ChemSpider now links to NPU’s data for 13800 natural products since 2005, of which 7800 are brand new to ChemSpider.

Where ChemSpider has information on a compound in NPU you will see the image above, as in, for example, calothrixin B. This is a link to the NPU page on the RSC Publishing Platform.

Soon we’ll be integrating more of our graphical databases into ChemSpider. Watch this space!

For those of you who were interested in our previous blog post, ‘Publish to ChemSpider’ ELN plugin generates elnItemManifest, and are at ACS Fall 2012 in Philadelphia: Dr Simon Coles (Southampton University) will describe this project in more detail in his oral presentation “Towards publishing semantic descriptions of Electronic Laboratory Notebook records” (paper ID: 17061, final paper number: 90). The talk is in the “Herman Skolnik Award Symposium” session of CINF, the Division of Chemical Information, on August 21, 2012 from 10:50 am to 11:05 am at the Philadelphia Marriott Downtown, Room 302/303.

If you are at ACS Fall 2012, don’t forget the other ChemSpider events at ACS Fall 2012 in Philadelphia.

You might not think so, but you’re very good at taking a two-dimensional drawing and converting it into a three-dimensional shape in your head. No, really, you are.

Fig. 1. Galactose in perspective.

Take the drawing of galactose in Fig. 1. Even if you’re not a chemist, you can tell which bits of the ring are at the front and at the back, which bonds point up and which bonds point down. If you actually are a chemist, you’ve been trained to apply this geometrical intuition to work out what’s going on at each of the five stereocentres.

However, if you ask the InChI algorithm about the stereochemistry of this molecule, it’ll say that there is no stereochemistry in there and you’re looking at a stereoless description of which atom is attached to which. Since we use the InChI algorithm to say whether two records describe the same molecule, this puts us in a quandary, and there are thousands of entries in ChemSpider that come from just such a drawing and hence lack stereochemistry.


As part of the Royal Society of Chemistry, the ChemSpider team likes to get involved with all of the other projects that are going on within the RSC, and we were really excited to be asked to provide our expertise to the SpectraSchool resource. This HE STEM funded program provides a range of resources to help in the understanding of the principles and practice of spectroscopy and spectroscopic methods.
SpectraSchool brings together Spectroscopy resources, an Introduction to Spectroscopy*, Interactive Spectra and the Spectroscopy in a Suitcase scheme which affords school children the chance to use modern spectroscopic equipment in their classroom.

The SpectraSchool resource was originally developed with the University of Leicester, who collected and assigned many of the spectra that are displayed within the site. Now that SpectraSchool is part of Learn Chemistry we have helped to integrate new features, including a new HTML 5 based spectrum viewer that provides interactive display of spectra. The fact that this is based on HTML 5 means that the spectrum can be viewed on just about any device that has a modern browser (e.g. computers, tablets, phones or even touch screen TVs).

A student visiting the site can zoom in on peaks and, by selecting either a peak or a particular part of the structure, see which features of a chemical structure give rise to a particular peak in a spectrum (see the highlighting of the methyl group in the structure of caffeine below and the corresponding peak in the adjacent 1H NMR spectrum).

SpectraSchool and Chemistry in the Olympics are great examples of the RSC’s new microsites which bring together lots of great resources and tools in a fresh and exciting interface.

Take a look at SpectraSchool and LearnChemistry today. We welcome feedback through the in-page feedback links, or you can connect with us and other chemistry educators in the Talk Chemistry forums. Why not start exploring this great (free) educational resource today?

 

 

* The Introduction to Spectroscopy was developed in collaboration with the University of Cardiff

Once again the RSC will be attending the American Chemical Society’s Fall meeting which will be held in Philadelphia, Pennsylvania, August 19-23, 2012, where the RSC stand will be located at booth 701.

Several members of the ChemSpider team will be attending the conference, both to give presentations and to chat and answer questions on the booth. If you are attending the conference please drop by, say hello and ask any questions that you have (you might even be able to get a free coffee – available on a first-come, first-served basis from 11 am on both Monday and Tuesday). We will also be running an exciting ChemSpider competition to coincide with the conference. You can get details from our Booth #701, or by checking out the ChemSpider blog.

There will be two key ChemSpider events in Philadelphia:
A special On-Stand demo – Monday 20 August, 11 am, Booth 701
“ChemSpider and You: A workshop exploring how ChemSpider can help you find chemical information” – A 2 h workshop for both newcomers to ChemSpider and experienced searchers alike. 10am-12pm Tuesday 21 August, Exhibit Halls A-B, Workshop Room 2 (You can register for the workshop via the conference website – we will try and accommodate anyone who just turns up on the day.)

In addition, members of the ChemSpider team are giving a number of talks, including some early glimpses of exciting new tools that we are working on. The presentations are listed below; for more details, including the abstracts for each of the talks, see the Technical program.

‘Mining public domain data as a basis for drug repurposing’, Philadelphia Marriott Downtown, Room 302/303, Sunday 19th August, 4.15PM – 4.40PM

‘Putting chemistry into the hands of students – chemistry made mobile using resources from the Royal Society of Chemistry’, Pennsylvania Convention Center, Room 109B, Sunday 19th August, 10.50AM – 11.10AM

‘Feeding and consuming data to support Open Notebook Science via the ChemSpider platform’, Philadelphia Marriott Downtown, Conference Room 307, Monday 20th August, 2.05PM – 2.30PM

‘Approaches for extraction and “digital chromatography” of chemical data – a perspective from the RSC’, Hilton Garden Inn Philadelphia, Salon D, Monday 20th August, 2.30PM – 2.55PM

‘Delivering an online service for validating and standardizing chemical structure files using the ChemSpider platform’, Philadelphia Marriott Downtown, Franklin Hall 6, Tuesday 21st August, 9.15AM – 9.35AM

‘ChemSpider compound database as one of the pillars of a semantic web for chemistry’, Philadelphia Marriott Downtown, Grand Ballroom Salon H, Tuesday 21st August, 4.55PM – 5.10PM

‘How can the International Chemical Identifier (InChI) be extended to non-trivial chemicals?’, Philadelphia Marriott Downtown, Franklin Hall 6, Thursday 23rd August, 9.35AM – 9.55AM

‘Serving up and consuming community content for chemists using wikis’, Philadelphia Marriott Downtown, Franklin Hall 6, Thursday 23rd August, 9.55AM – 10.15AM

 

We look forward to seeing you at the conference!

ChemSpider has become one of the world’s primary online resources for finding data, information, links, images, spectra…and on and on…about “chemicals”. Building a database of over 28 million chemicals that grows in some way in content, functionality and richness on a daily basis is, to say the least, a lot of work. But our cheminformatics team here at the RSC is not scared of work. We like it! So when we decided that it was time to enhance our efforts around the management of chemical reactions, moving from ChemSpider SyntheticPages to a database containing tens if not hundreds of thousands of reactions, the question was how. What software platform would we use? Where would we source reactions? What functionality would we need to roll out as an early display of capability to entice users to test it out, give feedback and, ultimately, get involved? We made those decisions and we will be showing off the results of our project “ChemSpider Reactions Database” (yes, we’re very creative with our project titles, aren’t we!) at the Fall ACS in Philadelphia.

If you want to learn what we are up to with regard to chemical reactions, come and visit us at the ACS booth…we’ll show an early view of over a quarter of a million reactions in an online, free to access database. We’ll chat about some of our future plans and hopefully engage you in a discussion about whether or not you would be willing to contribute reactions to the database. Wouldn’t it be good if we could provide synthetic chemists with a platform for accessing and managing reactions, as we have done for chemicals? Of course, seamlessly integrated and platform independent…served up by the latest web technologies and mobile-enabled. What the future could look like… exciting times!

ChemGoggles? What on earth is ChemGoggles? Is this a pair of safety specs for chemists? No…what would be the fun, and the cheminformatics (!!), in that? ChemGoggles will be shown at the ACS meeting in Philadelphia in a couple of weeks and will be a very early display of our venture into the development of an Android app for “photographing” an image of a chemical and searching the ChemSpider database. It will be a matter of finding an image of a chemical (paper, publication etc), taking a photo using an Android device, using structure recognition software to convert the image to a chemical and then searching ChemSpider. It will be imperfect, an early version, but nevertheless a tantalizing display of some of the new directions we are presently taking at the Cheminformatics group here at RSC.

Chemistry is complex. Anybody who has been involved with the creation of electronic datafiles containing thousands of chemical compounds and associated data (chemical names, properties etc) will tell you that errors creep in. ChemSpider has >28 million unique chemical entities and these have been sourced from many different places/groups/individuals. Some of these have been deprecated as we have determined, both manually and algorithmically, that the data are in error. Over the years we have learned a lot about data quality and ways in which algorithms can be applied to data prior to deposition on ChemSpider.

Some obvious structure-based errors that can be checked for would include: hypervalency (e.g. pentavalent carbons), charge imbalance (a compound has no neutralizing counterion for example), absence of stereochemistry (e.g. a compound with 12 possible stereocenters only has one assigned). There are many other such errors that can be detected algorithmically. It’s the old adage of why apply a human to what a computer can fix. With this in mind we have been working on a system called the ChemSpider Validation and Standardization Platform (CVSP for short). This system will serve multiple purposes. It will be one of the foundation blocks for checking structure-based data for our publications (i.e. catch bad chemistry before it is published!), it will be used for validating chemistry for our databases (Natural Product Updates, Methods in Organic Synthesis and Catalysts and Catalyzed Reactions), it will be used to check and validate depositions going into ChemSpider, it will serve data related to the Open PHACTS project  and it will serve the community by providing an online website where you can upload your own SDF files (and other file formats in future) to validate the structures.
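To give a flavour of what checks like these look like in code, here is a deliberately simplified Python sketch of the three examples above (hypervalency, charge imbalance and missing stereochemistry). It is not CVSP: the record layout, the tiny valence table and the function name are all assumptions made for illustration.

```python
# Usual maximum valence for a few neutral atoms (illustrative, far from complete).
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1}


def validate(record):
    """Return a list of human-readable issues found in one structure record.

    record is a hypothetical dict with:
      "atoms": list of {"element", "charge", "bond_order_sum"} dicts
      "possible_stereocentres" and "defined_stereocentres": integers
    """
    issues = []

    # Hypervalency: a neutral atom with more bonds than its usual maximum.
    for index, atom in enumerate(record["atoms"]):
        limit = MAX_VALENCE.get(atom["element"])
        if limit is not None and atom["charge"] == 0 and atom["bond_order_sum"] > limit:
            issues.append(f"hypervalent {atom['element']} at atom {index}")

    # Charge imbalance: the charges in the record do not sum to zero.
    net_charge = sum(atom["charge"] for atom in record["atoms"])
    if net_charge != 0:
        issues.append(f"net charge {net_charge:+d} with no counterion")

    # Missing stereochemistry: fewer centres defined than are possible.
    possible = record["possible_stereocentres"]
    defined = record["defined_stereocentres"]
    if defined < possible:
        issues.append(f"only {defined} of {possible} stereocentres defined")

    return issues


suspect = {
    "atoms": [
        {"element": "C", "charge": 0, "bond_order_sum": 5},
        {"element": "N", "charge": 1, "bond_order_sum": 4},
    ],
    "possible_stereocentres": 12,
    "defined_stereocentres": 1,
}
for issue in validate(suspect):
    print(issue)
```

In practice the atom and bond data would come from a cheminformatics toolkit reading the structure file, and the rule set would be far larger than these three checks.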

I won’t go into detail here about all of the functionality and capability of the system as we will discuss this in further detail on this blog. However, we will be unveiling the system in its present form at the ACS meeting in Philadelphia. Come along and meet some of the team involved in building CVSP and give us your feedback!

In December 2011 we posted about the ChemSpider plugin for IDBS’s Electronic Lab Notebook (ELN), a proof of concept plugin which allows chemical structures that are part of an ELN experiment to be published to ChemSpider. The plugin sent a single sdf file per deposition, containing the chemical structures (in mol format) and very basic metadata about where they came from (author, principal investigator, ELN experiment ID) in the associated data fields. A mapping file was set up in the ChemSpider deposition system to process the associated data field names in the deposited sdf files from the ELN data source and map them onto internal ChemSpider field names.

We would like to extend this initial proof of concept to integrate ChemSpider with more ELNs, to store more advanced metadata with each deposition and to be able to publish more types of ELN data, e.g. spectra, reactions and properties. A major step towards this goal would be for the metadata to be separated from the data file, defined by a fixed schema and extended to contain more information (e.g. what is in the accompanying ELN data item, what is its source, and what are its access rights). If it were agreed as a standard, ELN vendors and developers could build the ability to generate this metadata into their APIs, to be used either when sharing data to a repository such as ChemSpider, or when exchanging data from one ELN to another. We at ChemSpider would develop a deposition webservice to process metadata in this format (and accept depositions from any ELN which generated it). This would make the task of publishing spectra, reactions, chemical properties and other file types from a range of ELNs to ChemSpider much more manageable.
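The field-mapping step described for the original plugin can be pictured with a short Python sketch. This is an illustration only: the sdf tag names ("Author", "PI", "ExperimentID") and the internal field names they map to are hypothetical, as the real mapping file is not shown here, and the parser handles only the simple “> <NAME>” data items of a single sdf record.

```python
# Hypothetical mapping from sdf data field names to internal field names.
FIELD_MAP = {
    "Author": "depositor_name",
    "PI": "principal_investigator",
    "ExperimentID": "eln_experiment_id",
}


def read_sdf_fields(record_text):
    """Collect the '> <NAME>' data items of one sdf record into a dict."""
    fields = {}
    lines = record_text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith(">") and "<" in line and ">" in line[1:]:
            # The field name sits between the angle brackets of the header line.
            name = line[line.index("<") + 1 : line.rindex(">")]
            i += 1
            values = []
            # Data lines run until the next blank line or the record terminator.
            while i < len(lines) and lines[i].strip() and lines[i].strip() != "$$$$":
                values.append(lines[i])
                i += 1
            fields[name] = "\n".join(values)
        else:
            i += 1
    return fields


def map_fields(fields, mapping=FIELD_MAP):
    """Rename known fields to their internal names; drop fields with no mapping."""
    return {mapping[name]: value for name, value in fields.items() if name in mapping}


sample = "\n".join([
    "M  END",
    "> <Author>",
    "A. Chemist",
    "",
    "> <ExperimentID>",
    "EXP-42",
    "",
    "$$$$",
])
print(map_fields(read_sdf_fields(sample)))
```

Keeping the mapping in a plain table like this is what makes the approach hard to scale: every new ELN source needs its own table, which is part of the motivation for a shared metadata schema.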

A working group met on 9th December 2011 to work towards defining a metadata model to answer the question “What comprises an ELN record or an item in it?”. The group was headed up by Dr Simon Coles from the University of Southampton, and comprised representatives from universities, ELN vendors, pharmaceutical companies and RSC ChemSpider; it was a smaller subset of the previous EPSRC Dial-a-Molecule “The Smart Laboratory: Towards a national ELN” meeting. We came up with a top level exchange format, in XML, which describes what is in the record, how to get it, who contributed to it and its access rights. Since then Simon and Colin Bird have formalised this format into an XML schema, the details of which will be published shortly in a journal article (in preparation).

Before committing to the development effort that would be required by the ELN vendors and ChemSpider to work towards this ultimate aim, it is necessary to finalise the definition of this schema and verify that it works with an example. As a first step towards this, the ‘Publish to ChemSpider’ IDBS plugin has been modified to generate the metadata that would accompany the mol files of structures in a separate file obeying this schema. In a future phase of work the metadata xml and ELN item would be sent to a ChemSpider webservice to be processed for publishing there. The video and screencaptures below show version 2 of the plugin generating this metadata in action:

And the generated result is as below:
Generated example elnItemManifest metadata.

Every effort was made to populate fields from generic information stored in the ELN system, so that this plugin would work with any IDBS installation (not just that of the Chemistry department of the University of Cambridge, who kindly allowed the plugin to be developed against their system). This was not possible for all fields, however, since some are not readily available from the extension points of the E-WorkBook software; this will need to be addressed if IDBS do develop an API to generate the elnItemManifest. For example, the names and email addresses of the author and principal investigator of the ELN experiment are defined in a configuration file whose settings can be edited via an interface in the ELN software. The license to release the data under, and an embargo period to wait before the data is released publicly, are populated by user inputs which are requested when the user chooses to generate the file. The keywords, description and start date of the experiment are populated by customised ELN experiment fields which have been set up only in Cambridge University’s installation of E-WorkBook.

If you have access to a working version of IDBS’s E-WorkBook and would like to install the plugin to work with it please write to ChemSpiderDev@rsc.org and we will be happy to supply it to you.

Again, thanks to IDBS and the Department of Chemistry, University of Cambridge for allowing us to continue development of the ChemSpider plugin against their software and ELN installation respectively.

The eagle-eyed amongst you may have noticed that there was an update to ChemSpider just over a week ago. Many of the changes that were performed on the site were aimed at upgrading the underlying architecture of the site and ensuring that the performance of the ChemSpider site is constantly improving as the number of users of our site and services grows.

Here are a few of the changes to the site that are more visible:

  1. Clearer deprecation of records
  2. Citation details
  3. Visibility of average mass
  4. Layout of the structure search page
  5. Improvements to search messaging
  6. Clearer layout of the Experimental Properties section
  7. Support for foreign language help

So to pick out a few of the key items from the above list….

 

Clearer deprecation of records

ChemSpider is designed so that by default, deprecated records are not presented in your search results – this ensures that you don’t have to wade through data for records that are clearly wrong or lack any useful data. But, of course there may be occasions where you happen across a deprecated record. In the past, it wasn’t always easy to immediately see that a record had been deprecated and understand the reason that it had been deprecated. In the new design the notification message is far more prominent and we also make it easy to see the reason why the record was deprecated (this is a new requirement in the deprecation process and so for older deprecations this field may be blank).

 

Citation details

We commonly get requests from individuals asking about including data from a ChemSpider record in a presentation or thesis. As outlined in our FAQ page, where individuals reuse data we ask that they cite ChemSpider. And so to make this process simpler we have created an output that contains the basic information that users may need to include in a citation, and we have provided a button that makes it really easy to copy the data to your clipboard in one click.

Looking at the above image you can also see that the Average mass (which was accidentally hidden for a while) has now been made visible in the record again.

Layout of the structure search page

One of the most noticeable changes has been the rearrangement of the Structure search interface. While the actual functionality remains the same, the options have been presented in a way that (hopefully) makes it much easier to see all of the options that are available to you when you perform a structure search. This is the 1st phase of our work on this interface, so please let us know what you think about the changes so far.

 

Clearer layout of the Experimental Properties section

Another significant change that we have made is to the presentation of data in the Experimental properties infobox. The data is presented in a tidier layout, and while we have always had the ability to provide links to the original datasource, this was not particularly obvious to some users. In this new design we explicitly display the name of the datasource that provided the data, and wherever possible the name will act as a link back to the relevant page/entry in that datasource.

We hope that you find all of these new features useful, and as always we welcome your feedback on these and any other aspects of the site.

For some time now it has been possible to access relevant SureChem patent information from a ChemSpider compound page in the Patents Infobox. ChemSpider compounds are also linked to and from the relevant RSC articles, which has allowed us to form a new partnership between RSC Publishing and SureChem which relies on ChemSpider taking the pivotal role of linking internet chemistry together.

In the RSC article landing pages there is a “Compounds” tab which shows the key compounds that the article is about – as shown in this example. For each compound there is now a link to view the SureChem patent information associated with that compound as below:

The RSC Publishing platform article landing page showing SureChem patent information


SureChem and SureChem’s new free offering, SureChemOpen, offer a suite of patent chemistry data solutions, for example allowing their patents to be found from a structure or substructure search. Now, for each compound returned from such a search it is possible to view any linked ChemSpider compound pages and the number of associated RSC publications (and follow a link to view these articles).

This linking between SureChem and the RSC publication platform relies on ChemSpider (and the standard InChI chemical identifier) providing a bridging link to both, which ensures that the system is accessible, standards-based and scalable, making it easy for future partners to join.
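One way to picture this bridging role is as two lookups chained through a shared identifier. The sketch below is purely illustrative: the class, method and article names are hypothetical placeholders and the maps stand in for the real services, which are not called here. It assumes that a patent compound is known by its standard InChI, that ChemSpider maps the InChI to a ChemSpider ID, and that articles are indexed by that ID.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: the maps are in-memory placeholders, not real services. */
final class InChIBridgeSketch {

    /**
     * Follows compound -> InChI -> ChemSpider ID -> articles.
     * Returns an empty list if either lookup has no entry.
     */
    static List<String> articlesFor(String inchi,
                                    Map<String, Integer> csidByInChI,
                                    Map<Integer, List<String>> articlesByCsid) {
        Integer csid = csidByInChI.get(inchi);
        if (csid == null) {
            return Collections.emptyList();
        }
        return articlesByCsid.getOrDefault(csid, Collections.emptyList());
    }

    public static void main(String[] args) {
        // Standard InChI for benzene; the article names are made-up placeholders.
        String benzene = "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H";

        Map<String, Integer> csidByInChI = new HashMap<>();
        csidByInChI.put(benzene, 236);

        Map<Integer, List<String>> articlesByCsid = new HashMap<>();
        articlesByCsid.put(236, Arrays.asList("placeholder-article-1", "placeholder-article-2"));

        System.out.println(articlesFor(benzene, csidByInChI, articlesByCsid));
    }
}
```

Because neither side needs to know about the other's internal identifiers, a further partner would only need to supply its own InChI-keyed lookup to join in.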

A lot happens in a few weeks and this past couple of months has been no different. There have been numerous developments for ChemSpider and its related projects including working on the GUI, adding in new data and a lot of infrastructure work on the core of the ChemSpider platform.

We have the ACS meeting in San Diego just around the corner and are presently working hard this week to publish our most recent update to the live servers. For those of you going to San Diego do come and visit us at the RSC booth and we will give you a demo of our most recent project that we have been working on…I’m not going to announce it before the ACS but I encourage any attendees to stop by and hear what we’re up to!

There will be a number of presentations at the meeting and the details are all listed in our online Newsletter.

Alex Tropsha (UNC-Chapel Hill) and I (Antony Williams) will be hosting an InChI Symposium at the meeting so please come along and hear how people are using InChI and some of the directions for the future!

See you in San Diego hopefully!

As the ChemSpider content and data mappings have continued to expand, the demands on our web services have increased dramatically. With the popularity of the site continuing to increase we anticipate even heavier usage of our web services. This is true for our involvement with the Open PHACTS project as well as from a number of software packages served up by analytical instrument vendors, especially in the mass spectrometry domain. Because of the increasing load on our systems, we have taken steps to prevent us from outgrowing our existing infrastructure and have implemented a new scalable, future-proof web services offering that your applications can rely upon.

Continual availability and business continuity for subscribers and academics

We have reinvented our web service infrastructure using Microsoft SQL Server replication technology in order to maintain multiple copies of the ChemSpider database. As a result all system resources are dedicated purely to web services with no background tasks running to affect the performance. Also, the databases are read-only which results in database lock contention being completely eliminated.

A standalone and scalable web service establishment for faster response times

The ChemSpider servers run on the VMWare virtualization platform which allows us to scale out the hardware by assigning more resources as required. In the future we can easily provide a consistently high-performance service even as usage further increases.

Over 1/4 million calls in the first 18 hours

Although ChemSpider web services are fast becoming a priority for us, we are still dedicated to ensuring the website experience is optimal. The changes we have implemented will reduce traffic to the website so you should already have noticed improvements in website performance and reliability.

Some examples of implementations of ChemSpider web service usage can be found here.

Access to the ChemSpider API is free to academic users; for commercial use please contact us at chemspider-at-rsc.org.

James Jack from Accelrys has developed a great example of using ChemSpider web services to add ChemSpider search functionality to the structure drawing tool Accelrys Draw.

It is now possible, with a new add-in, to perform advanced searches on ChemSpider with the Accelrys Draw program itself, searching by text, structure searches (exact, similarity and substructure), elements (those present and those absent), intrinsic and predicted properties, and LASSO activities. All of the ChemSpider information about the compounds returned in the search can be viewed and their structure(s) loaded back into the main Accelrys Draw window for further editing.

If you’re interested in finding out more about this add-in or obtaining it then see James’ blog post about the add-in. He has also posted a video demonstrating its use:

Technical details for developers

James has modularised his code so as to separate out a .Net Client API to the ChemSpider Search web service that can be used from *any* .Net application without the need for additional assemblies (other than standard .Net) and requires minimal code. This makes it easy to add the same ChemSpider search functionality to other Accelrys products (e.g. Symyx Notebook).

In addition, he has released this ChemSpiderSearchClient code so that it is available to other ChemSpider users who would like to integrate ChemSpider web services with their code in similar ways.

The “ChemSpiderClient” solution should be opened with Visual Studio. It contains two projects – “ChemSpiderClient” is the main library project (which contains the ChemSpider API code) and “ChemSpider ClientTest(No Draw)” is a simple interface to run the library code (set this as the start up project to debug the project). “ChemSpiderClient.cs” in “ChemSpiderClient” is the main code file that calls the ChemSpider webservices. Best practice for performing ChemSpider searches is observed – first launching a search to retrieve a transaction ID, then checking the status of the search at intervals using the GetAsyncSearchStatus operation of Search.asmx and, once the status is “ResultReady”, retrieving the resulting ChemSpider IDs. If the reference to Symyx.CustomUIControls from the ChemSpider Client is missing then add a reference to Symyx.CustomUIControls.dll in the top-level folder of the zip file.
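For readers who would like to see the launch / poll / retrieve pattern in outline, here is a minimal sketch in Java. It is not James’ code: the SearchService interface, its method names other than the status check, the “Processing” status value and the in-memory fake service are all hypothetical stand-ins so that the example runs without a token or a network connection. Only the “ResultReady” status and the GetAsyncSearchStatus operation name come from the description above.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the launch / poll / retrieve pattern; not the ChemSpiderClient code. */
final class AsyncSearchSketch {

    /** Stand-in for the search operations; the method names are assumptions. */
    interface SearchService {
        String startSearch(String query);              // returns a transaction ID
        String getAsyncSearchStatus(String rid);       // e.g. "ResultReady"
        List<Integer> getAsyncSearchResult(String rid);
    }

    /** Launches a search, polls until "ResultReady" (or gives up), then fetches the IDs. */
    static List<Integer> search(SearchService service, String query, long pollMillis, int maxPolls) {
        String rid = service.startSearch(query);
        for (int i = 0; i < maxPolls; i++) {
            if ("ResultReady".equals(service.getAsyncSearchStatus(rid))) {
                return service.getAsyncSearchResult(rid);
            }
            try {
                Thread.sleep(pollMillis); // wait before asking again
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while polling", e);
            }
        }
        throw new IllegalStateException("Search " + rid + " not ready after " + maxPolls + " polls");
    }

    /** In-memory fake: reports a placeholder "Processing" status a fixed number of times. */
    static SearchService fakeService(final int pollsBeforeReady, final List<Integer> ids) {
        return new SearchService() {
            private int polls = 0;
            public String startSearch(String query) { return "rid-" + query; }
            public String getAsyncSearchStatus(String rid) {
                polls++;
                return polls > pollsBeforeReady ? "ResultReady" : "Processing";
            }
            public List<Integer> getAsyncSearchResult(String rid) { return ids; }
        };
    }

    public static void main(String[] args) {
        System.out.println(search(fakeService(2, Arrays.asList(236)), "benzene", 10, 5));
    }
}
```

Capping the number of polls is a design choice in this sketch so that a search which never completes cannot hang the caller; a real client would probably also want to handle failure statuses explicitly.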

Please note that a token is needed to access the ChemSpider webservices and by default the code is supplied without one specified, so that you need to input your own token value – the app.config file of “ChemSpider ClientTest(No Draw)” should be edited to enter a valid token that will be used by default. If this isn’t done, the user will need to supply a token when running the search via a pop-up box. To obtain a token, please complete the registration process – when you are registered the Security Token is listed on your Profile page.

We will soon be depositing data from the SORD databases (Selected Organic Reactions Database) onto ChemSpider. This will be done as two separate but related datasets under the SORD data source: Reactants and Products. If you don’t know what SORD is then who better to explain than Dick Wife, the “host” of the SORD database. Dick wrote the article below to provide an overview of what SORD is…ENJOY!

The Selected Organic Reactions (SOR) Database: capturing “Lost Chemistry”

Dick Wife, SORD B.V. The Netherlands (www.sord.nl; dick.wife@sord.nl)

A new database is capturing the 80% of Lost Chemistry from theses and dissertations which doesn’t make it into publications and chemists who contribute their data get access to the entire database for free.

SORD, an independent Dutch company, is carefully selecting the synthetic chemistry focused on Life Science research and making this chemistry available in their Selected Organic Reactions (SOR) Database. For the theses/dissertations which they select, SORD excerpts all of the reactions in the Experimental section. This means there will still be a small overlap of data with full publications. There will also be a larger overlap with publications such as Notes, Letters or Communications but these do not contain the experimental details. The SOR Database brings all this chemistry to the desktop, every last detail written by the author.

Some time back, SORD looked at around 300k interesting drug-like compounds in the literature and which countries they had come from, and the native language. The English-speaking countries accounted for only 37% of the total. German/Swiss dissertations are often written in English but this is new. The theses and dissertations in the other languages represent more than half of the total. SORD routinely translates German and French experimental texts into English. They are about to start on Chinese and Japanese translations and, if anyone can give them access to Russian theses, they will translate these as well!

A thesis or dissertation is the result of several years of hard work by a research student under the constant supervision of the research leader whose reputation is at stake if the work described is wrong or inaccurate. It is also examined by a committee who decide on awarding the degree, or not. They scrutinize closely the Results & Discussion as well as the Experimental sections. The chemistry is reliable.

Advanced Chemistry Development, Inc (ACD/Labs) is partnering SORD in developing this Database. The SOR Database is available for in-house use with ChemFolder Enterprise or on the Internet with ACD/Web Librarian™. This is a screen-shot of a typical SOR Database record in Web Librarian.

The Reaction Scheme shows every atom (there are no abbreviations). The Experimental text is edited to ASCII format and the key parameters (Reagent(s), Solvent(s), yield(s), MP(s) and Optical Rotation(s)) are displayed in separate Fields, as are the full bibliographic data, making data-mining possible. There is also a link which enables the user to bring up the PDF of each reaction containing all of the spectral and other physical data which SORD does not excerpt. The PDF-EX link is a powerful and unique feature of the SOR Database.

Now some explanation about SORD’s excerption rules. What they call the Reaction Scheme (A + B → C, etc.) contains only the reacting and product compound structures. A Reagent is an essential reaction component of which no part ends up in the product – if it does, it becomes a Reactant! When several reactions are performed before the product is isolated (and characterized) the Reagents and Solvents are listed in Steps. Failed reactions are not excerpted but reactions with poor yields are.

The SOR Database currently contains 170k reactions; the target is one million at the end of 2013. Even this number is a lot smaller than what you find today in the major commercial reaction databases. Back in the nineties, SORD researchers looked at one such large commercial database which then contained 9 million compounds. Sifting through the content for drug-like compounds resulted in just 450k or 5% of the records[1]. Size is one database metric; quality is much more important! In the SOR Database, you will only find characterized products – and no polymers, or compounds with no molecular structure.

Users of the SOR Database also have access to the separate databases which contain the Reagents (ca. 3,000) and Solvents (ca. 450) which have been encountered so far. Often a Reagent is a catalyst (organic/organometallic) but they can also be simple entities like bases, acids, ammonium salts, etc. or complex chiral ligands. Authors give Reagents many different names and so each Reagent (and Solvent) in the SOR Database has been assigned a unique name. This enables rapid searches using the assigned names, again a novel feature of the database. Such searches can bring you to really nice chemistry.
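The idea of one assigned name per Reagent can be pictured as a simple synonym lookup. The sketch below is only an illustration and says nothing about how the SOR Database is actually built: the class and method names are hypothetical, and the synonyms are examples of the kind of variation authors use, mapped to the assigned name “Grubbs 2 catalyst” that SORD uses for the second generation Grubbs catalyst.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative synonym table; not a description of SORD's implementation. */
final class ReagentNameSketch {

    private final Map<String, String> assignedBySynonym = new HashMap<>();

    /** Registers an assigned name together with the author spellings that should map to it. */
    void register(String assignedName, String... synonyms) {
        assignedBySynonym.put(key(assignedName), assignedName);
        for (String synonym : synonyms) {
            assignedBySynonym.put(key(synonym), assignedName);
        }
    }

    /** Returns the assigned name, or the trimmed input if the name is not in the table. */
    String assignedName(String authorName) {
        return assignedBySynonym.getOrDefault(key(authorName), authorName.trim());
    }

    /** Lookup key: case-insensitive, with runs of spaces and hyphens collapsed to one space. */
    private static String key(String name) {
        return name.trim().toLowerCase(Locale.ROOT).replaceAll("[\\s\\-]+", " ");
    }

    public static void main(String[] args) {
        ReagentNameSketch names = new ReagentNameSketch();
        names.register("Grubbs 2 catalyst", "Grubbs II", "Grubbs second-generation catalyst");
        System.out.println(names.assignedName("grubbs  ii"));
    }
}
```

Searching on the assigned name then finds every reaction that used the Reagent, whatever the author called it.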

As an example, the second generation Grubbs olefin metathesis catalyst has been given the name Grubbs 2 catalyst. In the current SOR Database, there are more than 500 reactions where it has been used. Some of these are straightforward; some are not and generate novel ring systems like this one from the Martin group at the University of North Carolina at Chapel Hill:

Searches in the Reaction Scheme, or using Reagent/Solvent names and hit refinement, bring you to new chemistry which until now was only found on a dusty shelf in a library. The “Lost Chemistry” is now getting smaller as SORD carefully selects and excerpts the reactions which deserve a new life. The SOR Database is essential for novelty searches and it is a powerful supplement for the other commercial reaction databases.

Finally some more good news for academic research chemists; your data will be readily accessible to the whole chemical world who will cite your work in their publications. The chemistry which you never published may be just what others are looking for. Routinely SORD excerpts the complete collection of theses and dissertations from research supervisors; they will be more than happy to see your work appear in the next SOR Database!


[1] de Laet, A.; Hehenkamp, J. J.; Wife, R. L. Finding Drug Candidates in Lost/Emerging Chemistry. J. Heterocycl. Chem. 2000, 37, 669–674.

The RSC’s objective is to advance the chemical sciences, not only at a research level but also to provide tools to train the next generation of chemists. ChemSpider contains a lot of useful information for students learning Chemistry but there is also a lot of information which is not relevant to their studies which might be confusing and distracting. For some time we have been considering the concept of an educational version of ChemSpider, aimed at students (and their teachers or lecturers) in their last years of school, and first years of university (ages 16-19), which restricts the compounds and the properties, spectra and links displayed for each, to those relevant to their studies. As a result, we are pleased to announce the launch of the Learn Chemistry Wiki which not only fulfils this aim, but also takes it further. This project was developed in a collaboration between Dr Martin Walker at the State University of New York at Potsdam, ChemSpider and the Royal Society of Chemistry’s Education team.
The Learn Chemistry Wiki contains over 2000 “substance” pages which correspond to simple compounds that would commonly be encountered during the last years of school and first years of University. Each of these pages corresponds to a ChemSpider compound, from which it dynamically retrieves compound images, a summary of its properties (molecular formula, mass, IUPAC name, appearance, melting and boiling points, solubility, etc.) and links to view safety sheets and spectra. It also contains text from Wikipedia to display in the substance page based on the Wikipedia links in ChemSpider.

The Learn Chemistry Wiki also goes a step further and not only contains compound information in isolation but also contains laboratory experiments (each with parallel sections which contain an overview, teachers’ notes and students’ handouts), quizzes, and tutorials which are linked to the compound information to put them into context. The wiki is based on the MediaWiki platform (which allows multiple users to contribute collaboratively since the website is intended to be a community website), but extends it to incorporate functionality similar to that of ChemSpider, invoked via custom-made extensions. For example, it is possible to draw structures using GGA’s Ketcher in order to find structures, or to draw answers to quiz questions (for example to specify the product of a particular reaction). It is also possible to include an interactive spectrum retrieved from ChemSpider in any wiki page, using the ChemDoodle spectrum viewing widget in browsers which support canvases or the JSpecView applet in those that don’t.

For an overview and demonstration of the Learn Chemistry Wiki site see the Learn Chemistry Wiki site tour webpage or the Learn Chemistry Wiki overview demo video:

The Learn Chemistry Wiki is part of the RSC’s new Learn Chemistry platform which provides a central access point and search facility to make it easier to access the various RSC teaching resources.

KNIME is an open-source data integration, processing, analysis, and exploration platform which can be used to create workflows to analyse data.

We have experimented with adding a node to a project which would call the ChemSpider webservices to perform a simple search on it and the instructions below outline how to reproduce our experimentation. This was done with KNIME 2.5.0, with the KNIME extension “Generic Webservice Client” installed.

  1. From the Node Repository find the “Generic Webservice Client” under the “Misc” folder and drag it into the KNIME project to add a new node
  2. Right-click on this “Generic Webservice Client” and click on the “Configure…” option
  3. The WSDL for each ChemSpider webservice can be found using the link from the page for the appropriate webservice. For example, the WSDL for the Search webservice is at http://www.chemspider.com/Search.asmx. However, if you enter this as the WSDL location you’ll get an error when you click the “Analyze” button (due to a SOAP exception: “undefined simple or complex type ‘soapenc:Array’”). This is something that we’re looking into addressing in ChemSpider, but for now a workaround is to copy the WSDL, replace the old-fashioned soapenc:Array type with tns:ArrayOfString, and save and use this amended WSDL locally. I have done this with the Search webservice and the resulting WSDL is available for download here. This file should be downloaded and extracted somewhere locally. It can then be entered in the “WSDL Location” field of the Generic Webservice Client in KNIME (using a location of the form: file:/C:/temp/ChemSpiderSearchWSDL_no_soapencArray.WSDL), which will then be processed correctly on clicking the “Analyze” button
  4. Set the Port, operation, inputs and outputs as required – see screencapture below for settings for my demonstration. Note that you should use your own token as the value for the token input – if you don’t have one already then see the instructions here for how to obtain one.
  5. Add input and output nodes which connect to and from this Generic WebService Client node as required. For example, you could add a FileReader node as the initial node, which reads in the contents of a text file that simply contains a search term as an input (and adapt the Input value accepted as the query input value of the SimpleSearch to map to this column, rather than hardcoding in a value to search for). And the output csid could be written to a csv file using a CSV Writer node.
  6. On executing the workflow, an output csv file is created which contains the ChemSpider ID(s) of any compounds that match the search term. In the case of a search for “benzene” the csid retrieved is 236.
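If you would rather amend the WSDL from step 3 yourself than download my copy, the replacement can be scripted. The following is a minimal sketch, assuming that the offending type appears in the WSDL as the quoted attribute value "soapenc:Array" and that tns:ArrayOfString is already defined in the same WSDL; the class and method names are my own, and the sample element in main is made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Sketch of the WSDL workaround: swap soapenc:Array for tns:ArrayOfString. */
final class WsdlPatchSketch {

    /** Replaces the quoted type value only, so other soapenc attributes are left alone. */
    static String patch(String wsdl) {
        return wsdl.replace("\"soapenc:Array\"", "\"tns:ArrayOfString\"");
    }

    /** Reads a saved WSDL, patches it and writes the amended copy. */
    static void patchFile(Path in, Path out) throws IOException {
        String wsdl = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);
        Files.write(out, patch(wsdl).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            // Usage: java WsdlPatchSketch <saved WSDL> <amended WSDL>
            patchFile(Paths.get(args[0]), Paths.get(args[1]));
        } else {
            // Made-up element, only to show the effect of the replacement.
            System.out.println(patch("<s:element name=\"example\" type=\"soapenc:Array\" />"));
        }
    }
}
```

The amended file can then be used as the “WSDL Location” in the same way as the downloadable one.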

The functionality of electronic lab notebooks (ELNs) and that of ChemSpider overlap to a certain extent – both store chemical information including structures, data, spectra and reactions. However, the focus of most ELNs is to manage, track and audit that data, and that of ChemSpider is to publish and disseminate it to the world. We have been considering how best to use this complementary functionality to integrate an ELN with ChemSpider.

Some ELNs already look up information and link to ChemSpider. One example is the blog3 Web-logging (“blogging”) engine by Jeremy Frey, Simon Coles and Mark Borkum at Southampton University, which allows links to compounds from the ChemSpider database to be embedded directly into the content of a post. When a link to ChemSpider is detected, blog3 follows the link to retrieve additional information that is relevant to the compound, including: experimental and theoretical data; two- and three-dimensional depictions; and links to papers and journal articles. Another example is the eScience tool that Stephen Wan from CSIRO has developed with the University of New South Wales to text mine LabTrove ELN blog posts to identify chemical names and link these to the relevant ChemSpider compounds.

At “The Smart Laboratory: Towards a national ELN” meeting (organised as part of the Dial-a-Molecule EPSRC Grand Challenge) in August this year, the seeds were sown to take the integration between ELNs and ChemSpider a step further. Cambridge University has the first Chemistry department in the UK to roll out a department-wide Electronic Lab notebook system, and the software that they’re using is IDBS’s E-WorkBook Suite. In collaboration with IDBS and Cambridge’s Chemistry department, we at ChemSpider have made a plug-in which could both dynamically retrieve information from ChemSpider into their ELN, and publish to it the other way. The Chemistry department at Cambridge (Dr Tim Dickens, Dr Brian Brooks, Prof Bobby Glenn and Prof Steven Ley) have been very helpful in granting access to their ELN to write the plug-in, and will be its first users, but the results will be freely available for any existing IDBS E-WorkBook suite user.

About the extension Prof Bobby Glenn has said: “Much of Chemistry is lost, it is simply not published and languishes in forgotten lab notebooks. Capturing novel molecules soon after synthesis on a searchable database like Chemspider is now an effortless process directly from the ELN, which will greatly encourage sharing of compounds, synthetic methods and all the associated data. It’s instant messaging for chemists”. Antony Williams (Vice-President of Strategic Development of ChemSpider) added “The ability to now publish compound data from the IDBS ELN directly to ChemSpider offers a path to direct exposure of novel chemistry as well as the chemist doing the work. This public compound registration capability is the first move towards ultimately exposing synthetic methods and associated experimental data to the community. Our vision is coming to fruition through this collaboration.”

To view the plug-in in action please view the demonstration movie of ChemSpider E-WorkBook Suite Plugin.

Screen capture of launching Publish to ChemSpider plug-in

Compounds can be published to ChemSpider if they have been drawn out in full in an experiment – whether this is as an individual structure or part of a reaction, and whether they are simply uploaded into the experiment as a reaction file, or included in for example a spreadsheet item. Likewise, compound structures can be automatically loaded into a search of ChemSpider if you would like to find out more information about compounds that have been drawn out in full in an experiment, or if you have published a compound to ChemSpider and wish to see the resulting compound pages. The resulting compound pages in ChemSpider will have the data source “IDBS E-WorkBook Suite”. The external ID will show the ID of the experiment from which the structures came, and the depositor details are as defined in the ChemSpider Settings of the ELN.

The ChemSpider IDBS E-WorkBook Suite plug-in is freely available to customers of IDBS E-WorkBook Suite by downloading it from IDBS, and copying it to the appropriate place in their IDBS E-WorkBook Suite program files. It is compatible with E-WorkBook Suite versions 9.0 and 9.1.

This plug-in is an initial proof-of-concept to demonstrate that we can pass compound information between ChemSpider and an ELN in both directions. Future versions will allow more of the information within an experiment to be published to ChemSpider – for example to allow reactions along with a description of their methods to be published to ChemSpider SyntheticPages, or to deposit spectra along with compounds to ChemSpider. We would also like to integrate other ELNs with ChemSpider.

Recently I have been programming a java plug-in from which I needed to call the ChemSpider webservices, and I found that this wasn’t as straightforward as I was expecting, so I thought I would post how to do it in case it’s useful for anyone else who wants to do likewise.
The basic method I used was to use Apache Axis2 to generate java code for the WSDLs of the main ChemSpider webservices. This java code is available here: chemspider_webservices_javasourcecode.zip and I have also made the compiled jar file available here: chemspider_webservices.jar. The ChemSpider webservices can be called from other java code by referencing this jar file (and the other axis library files).
This blog post describes how I generated and used this jar file. I was using the Eclipse IDE, so some of what I describe will be specific to that.
There is a similar jar file of some ChemSpider webservices which is available by downloading MZMine (the file chemspider-api.jar in the lib directory), and an example of its use can be seen by downloading the source code and looking at the file src\net\sf\mzmine\modules\peaklistmethods\identification\dbsearch\databases\ChemSpiderGateway.java. That jar file was generated using the previous version of Axis (just plain Axis, rather than Axis2). The example here may be easier to use as a starting point since the full range of ChemSpider webservices are included in the jar file, there is a full description of how it was generated, the code used to generate the jar file is available and there are more examples of its use.

Generating the chemspider_webservices.jar file

To generate the java code from the WSDL of the ChemSpider webservices I used the WSDL2Java functionality of Apache Axis2. This is available in different forms, including an Eclipse plug-in which will directly import the java code generated into a project, but I found various bugs when trying to use the latest version of that, so just used the command line version.
I started off with generating the java code from the WSDL of the ChemSpider MassSpecAPI webservice:

  • I downloaded and unzipped the latest version of the Apache Axis2 binary distribution from their download page. I used version 1.6.1 of Axis2.
  • In the “bin” directory of this download there should be a file called wsdl2java.bat. Running this batch file from a command line saves a lot of time trying to set up the class paths correctly to run WSDL2Java. Before using it you should set up the following two environment variables:
    • AXIS2_HOME: Must point to the top level of the AXIS2 files which you just downloaded
    • JAVA_HOME: Must point at your Java Development Kit installation directory (e.g. C:\Program Files\Java\jre6)
  • To see a full list of the options available when running WSDL2Java simply open a command prompt and run the batch file with no options to obtain the Usage options – more information about these can be found in the Apache Axis2 user guide:
    • > axis2-1.6.1\bin\wsdl2java.bat
  • I ran it with options to specify to use the SOAP 1.2 port of the ChemSpider MassSpecAPI webservice (most ChemSpider webservices have the option of SOAP 1.1, SOAP 1.2, HTTP GET or HTTP POST), to generate synchronous code only (not asynchronous), and to use adb databinding (this is the default anyway):
    • > axis2-1.6.1\bin\wsdl2java.bat -uri http://www.chemspider.com/MassSpecAPI.asmx?WSDL -pn MassSpecAPISoap12 -s -d adb
  • This then generated the file MassSpecAPIStub.java which it automatically put in the package com.chemspider.www (so the appropriate folder structure was created above it accordingly)
  • I repeated this process with the other 4 main ChemSpider webservices:
    • > axis2-1.6.1\bin\wsdl2java.bat -uri http://www.chemspider.com/Search.asmx?WSDL -pn SearchSoap12 -s -d adb
    • > axis2-1.6.1\bin\wsdl2java.bat -uri http://www.chemspider.com/InChI.asmx?WSDL -pn InChISoap12 -s -d adb
    • > axis2-1.6.1\bin\wsdl2java.bat -uri http://www.chemspider.com/Spectra.asmx?WSDL -pn SpectraSoap12 -s -d adb
    • > axis2-1.6.1\bin\wsdl2java.bat -uri http://www.chemspider.com/OpenBabel.asmx?WSDL -pn OpenBabelWebServiceSoap12 -s -d adb
  • The folders and java class files generated by WSDL2Java (MassSpecAPIStub.java, SearchStub.java, InChIStub.java, SpectraStub.java and OpenBabelWebServiceStub.java) are available in the zip file chemspider_webservices_javasourcecode.zip for further reference
  • I then started a new Eclipse project and imported the generated file system into it
  • The generated classes rely on the Axis2 library files so these need to be added to the build path – in Eclipse this is done by right-clicking on the project in the Package Explorer, choosing Properties > Java Build Path > Libraries > Add External Jars and selecting all of the lib files in the lib folder of the Axis2 folder.
  • This project was exported as the jar file chemspider_webservices.jar
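The five WSDL2Java calls above differ only in the service name and the SOAP 1.2 port name, so they can be generated rather than typed out. The following is a minimal sketch of that idea; the class Wsdl2JavaCommands and its methods are hypothetical helpers (not part of Axis2 or ChemSpider), and it only builds the command strings shown above rather than running them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: rebuilds the WSDL2Java command lines listed above.
final class Wsdl2JavaCommands {

    // Path to the batch file, as used in the commands above.
    static final String TOOL = "axis2-1.6.1\\bin\\wsdl2java.bat";
    static final String BASE_URL = "http://www.chemspider.com/";

    // Maps each .asmx service name to its SOAP 1.2 port name (the -pn value).
    static Map<String, String> services() {
        Map<String, String> ports = new LinkedHashMap<String, String>();
        ports.put("MassSpecAPI", "MassSpecAPISoap12");
        ports.put("Search", "SearchSoap12");
        ports.put("InChI", "InChISoap12");
        ports.put("Spectra", "SpectraSoap12");
        ports.put("OpenBabel", "OpenBabelWebServiceSoap12");
        return ports;
    }

    // -s: synchronous code only, -d adb: ADB databinding (same options as above).
    static String command(String service, String port) {
        return TOOL + " -uri " + BASE_URL + service + ".asmx?WSDL -pn " + port + " -s -d adb";
    }

    // One command per service, in the order they appear above.
    static List<String> allCommands() {
        List<String> commands = new ArrayList<String>();
        for (Map.Entry<String, String> entry : services().entrySet()) {
            commands.add(command(entry.getKey(), entry.getValue()));
        }
        return commands;
    }

    public static void main(String[] args) {
        for (String command : allCommands()) {
            System.out.println(command);
        }
    }
}
```

Running main prints the five commands, which could then be pasted into a command prompt or saved as a batch file.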

Using the chemspider_webservices.jar file as an external library jar file

The chemspider_webservices.jar file and all of the Apache Axis2 library jar files need to be added to a Java project as referenced libraries before the webservices can be called. To do this in Eclipse right-click on the project in the Package Explorer, choose Properties > Java Build Path > Libraries > Add External Jars and select:

  • the chemspider_webservices.jar file (download it from chemspider_webservices.jar and save it locally)
  • all of the lib files in the lib folder of the Axis2 folder.
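If it is unclear whether both sets of jars were picked up, a quick check is to ask the class loader for one class from each. This is a minimal sketch under some assumptions: ClasspathCheck is a hypothetical helper, com.chemspider.www.SearchStub is one of the generated classes named above, and org.apache.axis2.client.Stub is assumed to be the Axis2 base class that the generated stubs extend.

```java
// Hypothetical helper: reports whether the required classes can be found.
final class ClasspathCheck {

    // Returns true if the named class can be loaded (without initialising it).
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // The class was found but something it depends on is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "com.chemspider.www.SearchStub",   // from chemspider_webservices.jar
            "org.apache.axis2.client.Stub"     // from the Axis2 lib folder (assumed class name)
        };
        for (String name : required) {
            System.out.println(name + ": " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```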

Once this has been done then the ChemSpider webservices can be called from the project. An example is shown below, and is also downloadable in text format from here. This has been structured into (pretty well self-contained) functions which can be easily called to retrieve the results of a particular operation of a webservice. In the main function these functions are called and the output written out.

Please note that you should obtain your own ChemSpider token and set it as the ChemSpiderToken value – to obtain this, register for a ChemSpider account, and look up your token on your user Profile page after logging in. Some tokens require your user account to be associated with the “Service Subscriber” role, which you can request from your user profile page.
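One way to avoid pasting the token into the source code is to read it from an environment variable. The sketch below assumes a variable called CHEMSPIDER_TOKEN; that name and the TokenConfig class are made up for illustration and are not something ChemSpider defines.

```java
// Hypothetical helper: loads the ChemSpider token from the environment.
final class TokenConfig {

    // Assumed variable name, chosen for this example only.
    static final String ENV_NAME = "CHEMSPIDER_TOKEN";

    // Trims the candidate value and rejects a missing or blank token early.
    static String resolveToken(String candidate) {
        if (candidate == null || candidate.trim().isEmpty()) {
            throw new IllegalStateException(
                "No ChemSpider token found; set the " + ENV_NAME + " environment variable");
        }
        return candidate.trim();
    }

    public static void main(String[] args) {
        try {
            String token = resolveToken(System.getenv(ENV_NAME));
            System.out.println("Token loaded (" + token.length() + " characters)");
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

The resolved value could then be assigned to ChemSpiderToken in place of the placeholder string.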

package com.chemspider.www.examples;

import java.util.HashMap;
import java.util.Map;

import javax.swing.JOptionPane;

import org.apache.log4j.BasicConfigurator;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

import com.chemspider.www.*;
import com.chemspider.www.InChIStub.InChIToCSIDResponse;
import com.chemspider.www.SearchStub.GetAsyncSearchResultResponse;
import com.chemspider.www.SearchStub.GetAsyncSearchStatusResponse;
import com.chemspider.www.SearchStub.SimpleSearchResponse;
import com.chemspider.www.MassSpecAPIStub.ArrayOfInt;
import com.chemspider.www.MassSpecAPIStub.ArrayOfString;
import com.chemspider.www.MassSpecAPIStub.ExtendedCompoundInfo;
import com.chemspider.www.MassSpecAPIStub.GetDatabasesResponse;
import com.chemspider.www.MassSpecAPIStub.GetExtendedCompoundInfoArrayResponse;
import com.chemspider.www.MassSpecAPIStub.SearchByMassAsyncResponse;

public class WebServiceExamples {

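private static final Logger LOG = Logger.getLogger(WebServiceExamples.class.getName());

private static String ChemSpiderToken = "YOU NEED TO INSERT YOUR OWN TOKEN IN HERE";

/**
* Calls each of the example functions below in turn and shows the result in a message dialog.
*
* @param args command-line arguments (not used)
*/
public static void main(String[] args) {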
BasicConfigurator.configure();

JOptionPane.showMessageDialog(null, "The compound with InChI InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H has CSID:"+get_InChI_InChIToCSID_Results("InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"));

int[] SimpleSearchResults = get_Search_SimpleSearch_Results("taxol", ChemSpiderToken);
JOptionPane.showMessageDialog(null, "The first of "+SimpleSearchResults.length+" ChemSpider compound(s) returned by a search for Taxol has CSID:"+SimpleSearchResults[0]);

int[] inputCSIDs = new int[2];
inputCSIDs[0] = 236;
inputCSIDs[1] = 238;
Map<Integer, Map<String, String>> GetExtendedCompoundInfoArrayResults = get_MassSpecAPI_GetExtendedCompoundInfoArray_Results(inputCSIDs, ChemSpiderToken);
Map<String, String> thisCompoundInfo = GetExtendedCompoundInfoArrayResults.get(238);
JOptionPane.showMessageDialog(null, "The Average Mass of the compound with CSID 238 is: "+thisCompoundInfo.get("AverageMass"));

String[] GetDatabaseResults = get_MassSpecAPI_GetDatabases_Results();
JOptionPane.showMessageDialog(null, "The first of "+GetDatabaseResults.length+" datasources in ChemSpider is:"+GetDatabaseResults[0]);

String SearchByMassAsyncResults = get_MassSpecAPI_SearchByMassAsync_Results(1100.0, 0.1,GetDatabaseResults, ChemSpiderToken);
JOptionPane.showMessageDialog(null, "Transaction ID for search on compounds with mass = 1100+/- 0.1 from any data source is: " + SearchByMassAsyncResults);
JOptionPane.showMessageDialog(null, "The operation status of the search with this transaction ID is: " + get_Search_GetAsyncSearchStatus_Results(SearchByMassAsyncResults, ChemSpiderToken));
// Note: the results are only complete once the status above is ResultReady, so a slow search may need its status checked again before this call
int[] GetAsyncSearchResultResults = get_Search_GetAsyncSearchResult_Results(SearchByMassAsyncResults, ChemSpiderToken);
JOptionPane.showMessageDialog(null, "And the first of "+GetAsyncSearchResultResults.length+" ChemSpider compound(s) returned by the search has CSID:"+GetAsyncSearchResultResults[0]);
}

/**
* Function to call the InChIToCSID operation of ChemSpider's InChI SOAP 1.2 webservice (http://www.chemspider.com/InChI.asmx?op=InChIToCSID)
* Convert InChI to ChemSpider ID.
*
* @param inchi: string representing inchi to search ChemSpider for
* @return: string representing CSID returned
*/
public static String get_InChI_InChIToCSID_Results(String inchi) {
String Output = null;
try {

final InChIStub thisInChIstub = new InChIStub();
com.chemspider.www.InChIStub.InChIToCSID InChIToCSIDInput = new com.chemspider.www.InChIStub.InChIToCSID();
InChIToCSIDInput.setInchi(inchi);
final InChIToCSIDResponse thisInChIToCSIDResponse = thisInChIstub.inChIToCSID(InChIToCSIDInput);
Output = thisInChIToCSIDResponse.getInChIToCSIDResult();
} catch (Exception e) {
LOG.log(Level.ERROR, "Problem retrieving ChemSpider webservices", e);
}
return Output;
}

/**
* Function to call the SimpleSearch operation of ChemSpider's Search SOAP 1.2 webservice (http://www.chemspider.com/search.asmx?op=SimpleSearch)
* Search by Name, SMILES, InChI, InChIKey, etc. Returns a list of found CSIDs (first 100 - please use AsyncSimpleSearch instead if you would like to get the full list). Security token is required.
*
* @param query: String representing search term (can be Name, SMILES, InChI, InChIKey)
* @param token: string containing your user token (listed at your http://www.chemspider.com/UserProfile.aspx page)
* @return: int[] array containing the ChemSpider IDs. If more than 100 are found then only the first 100 are returned.
*/
public static int[] get_Search_SimpleSearch_Results(String query, String token) {
int[] Output = null;
try {
final SearchStub thisSearchStub = new SearchStub();
com.chemspider.www.SearchStub.SimpleSearch SimpleSearchInput = new com.chemspider.www.SearchStub.SimpleSearch();
SimpleSearchInput.setQuery(query);
SimpleSearchInput.setToken(token);
final SimpleSearchResponse thisSimpleSearchResponse = thisSearchStub.simpleSearch(SimpleSearchInput);
Output = thisSimpleSearchResponse.getSimpleSearchResult().get_int();
} catch (Exception e) {
LOG.log(Level.ERROR, "Problem retrieving ChemSpider webservices", e);
}
return Output;
}

/**
* Function to call the GetDatabases operation of ChemSpider's MassSpecAPI SOAP 1.2 webservice (http://www.chemspider.com/massspecapi.asmx?op=GetDatabases)
* Get the list of datasources in ChemSpider.
*
* @return: the list of datasources in ChemSpider as a String Array
*/
public static String[] get_MassSpecAPI_GetDatabases_Results() {
String[] Output = null;
try {

final MassSpecAPIStub thisMassSpecAPIStub = new MassSpecAPIStub();
com.chemspider.www.MassSpecAPIStub.GetDatabases getDatabaseInput = new com.chemspider.www.MassSpecAPIStub.GetDatabases();
final GetDatabasesResponse thisGetDatabasesResponse = thisMassSpecAPIStub.getDatabases(getDatabaseInput);
Output = thisGetDatabasesResponse.getGetDatabasesResult().getString();
} catch (Exception e) {
LOG.log(Level.ERROR, "Problem retrieving ChemSpider webservices", e);
}
return Output;
}

/**
* Function to call the GetExtendedCompoundInfoArray operation of ChemSpider's MassSpecAPI SOAP 1.2 webservice (http://www.chemspider.com/massspecapi.asmx?op=GetExtendedCompoundInfoArray)
* Get array of extended record details by an array of CSIDs. Security token is required.
*
* @param CSIDs: integer array containing the CSIDs of compounds for which information will be returned
* @param token: string containing your user token (listed at your http://www.chemspider.com/UserProfile.aspx page)
* @return: a Map<Integer, Map<String, String>> keyed by CSID, containing the results for each CSID (with Properties CSID, MF, SMILES, InChI, InChIKey, AverageMass, MolecularWeight, MonoisotopicMass, NominalMass, ALogP, XLogP, CommonName)
*/
public static Map<Integer, Map<String, String>> get_MassSpecAPI_GetExtendedCompoundInfoArray_Results(int[] CSIDs, String token) {
Map<Integer, Map<String, String>> Output = new HashMap<Integer, Map<String, String>>();
try {
final MassSpecAPIStub thisMassSpecAPIStub = new MassSpecAPIStub();
ArrayOfInt inputCSIDsArrayofInt = new ArrayOfInt();
inputCSIDsArrayofInt.set_int(CSIDs);
com.chemspider.www.MassSpecAPIStub.GetExtendedCompoundInfoArray getGetExtendedCompoundInfoArrayInput = new com.chemspider.www.MassSpecAPIStub.GetExtendedCompoundInfoArray();
getGetExtendedCompoundInfoArrayInput.setCSIDs(inputCSIDsArrayofInt);
getGetExtendedCompoundInfoArrayInput.setToken(token);
final GetExtendedCompoundInfoArrayResponse thisGetExtendedCompoundInfoArrayResponse = thisMassSpecAPIStub.getExtendedCompoundInfoArray(getGetExtendedCompoundInfoArrayInput);
ExtendedCompoundInfo[] thisExtendedCompoundInfo = thisGetExtendedCompoundInfoArrayResponse.getGetExtendedCompoundInfoArrayResult().getExtendedCompoundInfo();
for (int i=0; i<thisExtendedCompoundInfo.length; i++) {
Map<String, String> thisCompoundExtendedCompoundInfoArrayOutput = new HashMap<String, String>();
thisCompoundExtendedCompoundInfoArrayOutput.put("CSID", Integer.toString(thisExtendedCompoundInfo[i].getCSID()));
thisCompoundExtendedCompoundInfoArrayOutput.put("MF", thisExtendedCompoundInfo[i].getMF());
thisCompoundExtendedCompoundInfoArrayOutput.put("SMILES", thisExtendedCompoundInfo[i].getSMILES());
thisCompoundExtendedCompoundInfoArrayOutput.put("InChI", thisExtendedCompoundInfo[i].getInChI());
thisCompoundExtendedCompoundInfoArrayOutput.put("InChIKey", thisExtendedCompoundInfo[i].getInChIKey());
thisCompoundExtendedCompoundInfoArrayOutput.put("AverageMass", Double.toString(thisExtendedCompoundInfo[i].getAverageMass()));
thisCompoundExtendedCompoundInfoArrayOutput.put("MolecularWeight", Double.toString(thisExtendedCompoundInfo[i].getMolecularWeight()));
thisCompoundExtendedCompoundInfoArrayOutput.put("MonoisotopicMass", Double.toString(thisExtendedCompoundInfo[i].getMonoisotopicMass()));
thisCompoundExtendedCompoundInfoArrayOutput.put("NominalMass", Double.toString(thisExtendedCompoundInfo[i].getNominalMass()));
thisCompoundExtendedCompoundInfoArrayOutput.put("ALogP", Double.toString(thisExtendedCompoundInfo[i].getALogP()));
thisCompoundExtendedCompoundInfoArrayOutput.put("XLogP", Double.toString(thisExtendedCompoundInfo[i].getXLogP()));
thisCompoundExtendedCompoundInfoArrayOutput.put("CommonName", thisExtendedCompoundInfo[i].getCommonName());
Output.put(thisExtendedCompoundInfo[i].getCSID(), thisCompoundExtendedCompoundInfoArrayOutput);
}

} catch (Exception e) {
LOG.log(Level.ERROR, "Problem retrieving ChemSpider webservices", e);
}
return Output;
}

/**
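* Function to call the SearchByMassAsync operation of ChemSpider's MassSpecAPI SOAP 1.2 webservice (http://www.chemspider.com/massspecapi.asmx?op=SearchByMassAsync)
* Start an asynchronous search of ChemSpider by mass +/- range. Security token is required.
*
* @param mass: The compounds returned have a mass (Double) within the range mass +/- range
* @param range: The compounds returned have a mass (Double) within the range mass +/- range
* @param dbs: String array of the datasources to search (e.g. the list returned by GetDatabases)
* @param token: string containing your user token (listed at your http://www.chemspider.com/UserProfile.aspx page)
* @return: String containing the transaction ID of the search, which can be passed to GetAsyncSearchStatus and GetAsyncSearchResult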
*/
public static String get_MassSpecAPI_SearchByMassAsync_Results(Double mass, Double range, String[] dbs, String token) {
String Output = null;
try {
final MassSpecAPIStub thisMassSpecAPIStub = new MassSpecAPIStub();
com.chemspider.www.MassSpecAPIStub.SearchByMassAsync getSearchByMassAsyncInput = new com.chemspider.www.MassSpecAPIStub.SearchByMassAsync();
getSearchByMassAsyncInput.setMass(mass);
getSearchByMassAsyncInput.setRange(range);
ArrayOfString inputDBsArrayofString = new ArrayOfString();
inputDBsArrayofString.setString(dbs);
getSearchByMassAsyncInput.setDbs(inputDBsArrayofString);
getSearchByMassAsyncInput.setToken(token);
final SearchByMassAsyncResponse thisSearchByMassAsyncResponse = thisMassSpecAPIStub.searchByMassAsync(getSearchByMassAsyncInput);
Output = thisSearchByMassAsyncResponse.getSearchByMassAsyncResult();
} catch (Exception e) {
LOG.log(Level.ERROR, "Problem retrieving ChemSpider webservices", e);
}
return Output;
}

/**
* Function to call the GetAsyncSearchStatus operation of ChemSpider's Search SOAP 1.2 webservice (http://www.chemspider.com/search.asmx?op=GetAsyncSearchStatus)
* Query asynchronous operation status. Requires transaction ID returned by AsynchSearch operation. Security token is required.
*
* @param rid: String representing transaction ID returned from a previous search
* @param token: string containing your user token (listed at your http://www.chemspider.com/UserProfile.aspx page)
* @return: String describing status of this search - can have values Unknown or Created or Scheduled or Processing or Suspended or PartialResultReady or ResultReady or Failed or TooManyRecords
*/
public static String get_Search_GetAsyncSearchStatus_Results(String rid, String token) {
String Output = null;
try {
final SearchStub thisSearchStub = new SearchStub();
com.chemspider.www.SearchStub.GetAsyncSearchStatus GetAsyncSearchStatusInput = new com.chemspider.www.SearchStub.GetAsyncSearchStatus();
GetAsyncSearchStatusInput.setRid(rid);
GetAsyncSearchStatusInput.setToken(token);
final GetAsyncSearchStatusResponse thisGetAsyncSearchStatusResponse = thisSearchStub.getAsyncSearchStatus(GetAsyncSearchStatusInput);
Output = thisGetAsyncSearchStatusResponse.getGetAsyncSearchStatusResult().toString();
} catch (Exception e) {
LOG.log(Level.ERROR, "Problem retrieving ChemSpider webservices", e);
}
return Output;
}

/**
* Function to call the GetAsyncSearchResult operation of ChemSpider's Search SOAP 1.2 webservice (http://www.chemspider.com/search.asmx?op=GetAsyncSearchResult)
* Returns the list of CSIDs found by AsynchSearch operation. Security token is required.
*
* @param rid: String representing transaction ID returned from a previous search
* @param token: string containing your user token (listed at your http://www.chemspider.com/UserProfile.aspx page)
* @return: int[] array containing the ChemSpider IDs.
*/
public static int[] get_Search_GetAsyncSearchResult_Results(String rid, String token) {
int[] Output = null;
try {
final SearchStub thisSearchStub = new SearchStub();
com.chemspider.www.SearchStub.GetAsyncSearchResult GetAsyncSearchResultInput = new com.chemspider.www.SearchStub.GetAsyncSearchResult();
GetAsyncSearchResultInput.setRid(rid);
GetAsyncSearchResultInput.setToken(token);
final GetAsyncSearchResultResponse thisGetAsyncSearchResultResponse = thisSearchStub.getAsyncSearchResult(GetAsyncSearchResultInput);
Output = thisGetAsyncSearchResultResponse.getGetAsyncSearchResultResult().get_int();
} catch (Exception e) {
LOG.log(Level.ERROR, "Problem retrieving ChemSpider webservices", e);
}
return Output;
}

}
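
The main function above checks the status of the asynchronous search once and then asks for the results straight away, which may be too early for a slow search. A possible refinement is to poll the status until it reaches a final value. The sketch below is only an illustration: AsyncSearchPoller is a hypothetical helper, it needs Java 8 or later for Supplier, and treating ResultReady, Failed and TooManyRecords as the final statuses is an assumption based on the status names listed in the GetAsyncSearchStatus comment.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical helper: polls a status check until the search has finished.
final class AsyncSearchPoller {

    // Assumption: these three statuses mean that no further change will happen.
    static boolean isFinished(String status) {
        return "ResultReady".equals(status)
                || "Failed".equals(status)
                || "TooManyRecords".equals(status);
    }

    // Calls statusCheck up to maxAttempts times, pausing delayMillis between calls.
    // Returns the last status seen, which may not be final if attempts ran out.
    static String waitForFinalStatus(Supplier<String> statusCheck, int maxAttempts, long delayMillis) {
        String status = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            status = statusCheck.get();
            if (isFinished(status) || attempt == maxAttempts) {
                break;
            }
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag and stop polling
                break;
            }
        }
        return status;
    }

    public static void main(String[] args) {
        // Stand-in for the real webservice call, so the sketch runs offline.
        Iterator<String> fakeStatuses = Arrays.asList("Created", "Processing", "ResultReady").iterator();
        String status = waitForFinalStatus(fakeStatuses::next, 10, 10L);
        System.out.println("Final status: " + status); // prints "Final status: ResultReady"
    }
}
```

In the example class, the supplier would wrap get_Search_GetAsyncSearchStatus_Results with the transaction ID and token, and get_Search_GetAsyncSearchResult_Results would only be called if the returned status is ResultReady.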

Disclaimer: I’m new to Java programming, so please excuse me if you are a Java expert and I’ve said something obvious, offended you with my code or used the wrong terminology anywhere.