Archive for the Community Building Category

We are pleased to announce that we have just imported into ChemSpider 1047 CIFs of crystal structures that were previously reported in RSC papers (and are available as ESI for those papers), attached them to the relevant compounds, and linked them back to the original articles and to the CCDC’s webCSD, e.g. example compound with RSC article CIF (see the CIF infobox). Since each CIF that is uploaded into ChemSpider must be associated with a ChemSpider compound, the difficult part of this task was working out a 2D molecular structure (in .mol file format) for each 3D crystal structure (in .cif file format). This is particularly difficult because CIFs only contain information about each atomic position, and not how the atoms are bonded to each other in the crystal or whether they are charged.
Ultimately we would like this CIF to mol conversion (and the whole upload) to be performed programmatically without human intervention. However, there is no reliable way to do that currently – although programs such as OpenBabel can be used to extract mols from each CIF, the reliability of this conversion isn’t 100%.
So as one of our student intern projects at the University of Southampton this summer (in parallel with another intern project there to share thesis data in ChemSpider) we used OpenBabel (version 2.3.2, run from the command line with the options -i cif inputfilename.txt -o mol -m --unique -d --AddPolarH) to extract mols for all the CIFs in the RSC archive (over 43,000 files as of June 2013). We enlisted Julija Kezina (shown below) to review the results of these conversions, both to ensure that only good structure and CIF pairs would be deposited to ChemSpider and to better understand the problems in the conversion process with a view to fixing them. One problem that became immediately apparent was that the 2D structure obtained was just a projection of the 3D structure along the a cell axis, which is not always the orientation that shows the molecule most clearly, even when the chemical connections between the atoms were right. All mol structures were therefore run through OpenEye’s cleaning algorithm before being reviewed.
Julija Kezina - Southampton University intern who examined CIF to Mol conversion
Julija compared each structure in the output mol files with those in the original CIF files to judge whether the conversion was accurate or not. In addition, as an extra check, all of the output mol structures were submitted to the ChemSpider validation and standardisation platform to filter out molecules with structural problems (e.g. stereochemistry, valence or congestion issues).
Overall, approximately 30% of the CIF to mol conversions that Julija checked were good, with the right connectivity of atoms and ions (although approximately 30% of these needed the atomic positions to be repositioned to clean or tidy up the structure, either manually or using ChemDraw’s cleaning functionality). The 1047 of these mols which contain only a single molecule (without solvent molecules or cocrystals etc.) are those which have been deposited into ChemSpider with their corresponding CIFs.
The journals which had the highest successful conversion percentage were Molecular BioSystems (57%), MedChemComm (51%), Organic and Biomolecular Chemistry (44%) and Green Chemistry (44%) – the journals which in general are about small organic molecules.
Julija was working in the National Crystallography Service’s office at the University of Southampton, under the co-supervision of Professor Simon Coles, and we are grateful to them for their help and advice about the finer points of the CIF file format.

Unsuccessful CIF to mol conversions

Running and evaluating OpenBabel on such a large and varied set of structures has given us a useful opportunity to identify and categorise the most common problems encountered. Here we share these and give examples that would enable the identification of some easy fixes in the pipeline that might benefit the whole community and be used as test cases when doing so. We will report these bugs to the OpenBabel forum and because OpenBabel is open source, hope to resolve at least some of these issues in the future through collaboration with its other developers.

The following OpenBabel bugs look like they might be most straightforward to fix:

Details Example
  • Category: BAD_NITRO
  • Frequency: 233
  • Description: there are different ways of representing nitro groups in structure drawers. OpenBabel currently does so by producing a mol with a pentavalent nitrogen; in ChemSpider we choose to avoid this in favour of a charge-separated nitro.
  • Solution: Allow OpenBabel to have a different output option for nitro groups to output them as shown in corrected mol file.
BAD_NITRO example: ob_b209378b_1.jpg

Files: ob_b209378b_1.zip

  • Category: BAD_MULT
  • Frequency: 434
  • Description: Duplicate (exactly identical, including stereochemistry) molecules are present in the resulting mol file despite running OpenBabel with the --unique option (which should filter out duplicate molecules based on their InChIs)
  • Solution: Fix the --unique option in OpenBabel so that duplicates are actually removed.
BAD_MULT example: nj_b306072a_1.jpg

Files: nj_b306072a_1.zip
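
The behaviour we expected from the duplicate filter can be sketched in a few lines of Python. This is only an illustration of de-duplicating by InChI, not OpenBabel's implementation: the function name unique_by_inchi and the (name, InChI) tuple layout are our own assumptions, and the InChI strings are assumed to have been computed elsewhere.

```python
def unique_by_inchi(molecules):
    """Keep the first molecule seen for each InChI string.

    `molecules` is a list of (name, inchi) tuples; this layout is a
    hypothetical stand-in for whatever the conversion step produces.
    """
    seen = set()
    unique = []
    for name, inchi in molecules:
        if inchi in seen:
            continue  # an identical molecule has already been kept
        seen.add(inchi)
        unique.append((name, inchi))
    return unique


if __name__ == "__main__":
    mols = [
        ("water_1", "InChI=1S/H2O/h1H2"),
        ("ethanol", "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"),
        ("water_2", "InChI=1S/H2O/h1H2"),
    ]
    for name, inchi in unique_by_inchi(mols):
        print(name, inchi)
```

Keeping the first occurrence preserves the order of the molecules in the output file, which makes the result easier to compare against the original CIF.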

  • Category: BAD_MISSINGPARTOFMOLECULE
  • Frequency: 724
  • Description: Part of the molecule is missing
  • Cause: OpenBabel doesn’t understand crystal symmetry – only the atoms in the CIF that are explicitly listed with positions are included in the resulting mol file, and those that are inferred by symmetry are not.
  • Solution: Make OpenBabel generate the full molecule from the symmetry in the CIF file, or recommend that a script/program that can process a CIF to generate another CIF with all atoms is run before OpenBabel.
BAD_MISSINGPARTOFMOLECULE example: ce_b202304k_5

Files: ce_b202304k_5.zip
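
As a rough illustration of the pre-processing step suggested above, the sketch below applies CIF-style symmetry operations to the explicitly listed atoms to generate their symmetry copies. It is a minimal sketch under several assumptions: the operations are given as strings of the kind typically listed in a CIF (e.g. under _symmetry_equiv_pos_as_xyz), coordinates are fractional, and the function names are hypothetical. It does not add lattice translations or work out which copies belong to the same molecule.

```python
import re

# One signed term of a symmetry operation: a fraction, a decimal or an axis name.
_TERM = re.compile(r"([+-]?)\s*(\d+/\d+|\d*\.?\d+|[xyz])")


def apply_symop(symop, xyz):
    """Apply an operation such as '-x, y+1/2, -z' to fractional coordinates."""
    coords = dict(zip("xyz", xyz))
    result = []
    for component in symop.lower().split(","):
        total = 0.0
        for sign, token in _TERM.findall(component):
            if token in coords:
                value = coords[token]
            elif "/" in token:
                numerator, denominator = token.split("/")
                value = int(numerator) / int(denominator)
            else:
                value = float(token)
            total += -value if sign == "-" else value
        result.append(total)
    return tuple(result)


def expand_by_symmetry(atoms, symops, tol=1e-3):
    """Return every symmetry copy of `atoms`, skipping coincident duplicates.

    `atoms` is a list of (label, (x, y, z)) in fractional coordinates.
    """
    expanded = []
    for label, xyz in atoms:
        for symop in symops:
            new_xyz = apply_symop(symop, xyz)
            duplicate = any(
                other_label == label
                and all(abs(a - b) < tol for a, b in zip(other_xyz, new_xyz))
                for other_label, other_xyz in expanded
            )
            if not duplicate:
                expanded.append((label, new_xyz))
    return expanded


if __name__ == "__main__":
    ops = ["x, y, z", "-x, -y, -z"]
    for label, xyz in expand_by_symmetry([("C1", (0.1, 0.2, 0.3))], ops):
        print(label, xyz)
```

The duplicate check matters for atoms on special positions (for example a metal sitting on an inversion centre), which would otherwise appear twice in the expanded list.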

  • Category: BAD_PARTIALOCCUPANCY
  • Frequency: 432
  • Description: partial occupancy of multiple sites for a particular atom in the CIF file
  • Cause: CIF files sometimes specify multiple alternative sites for an atom, each with an occupancy of less than one. OpenBabel doesn’t recognise this and effectively assumes that the occupancy of every site is one, so some atoms or fragments are duplicated in the mol file.
  • Solution: Where the _atom_site_occupancy is less than one, group together atoms into those which are alternatives of each other (by type, proximity, and those which add up to a total occupancy of 1) and choose only one of them to include in the final mol file (that with the highest site occupancy, or if two have equal occupancies of e.g. 0.5 then pick one at random). Note that there needs to be consistency, so that if for example a C is discarded, then all of the adjoining H’s with partial occupancy are also discarded but those bonded to the C that is included are included (as in the attached example).
BAD_PARTIALOCCUPANCY example: md_c2md20054f_1.jpg

Files: md_c2md20054f_1.zip
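
The core of the selection rule proposed above (group alternative sites by type and proximity, then keep the one with the highest occupancy) might look something like the following. This is a simplified sketch rather than a complete solution: the dictionary layout, the function name and the 1.0 Å cutoff are our own assumptions, coordinates are assumed to be Cartesian in Å, ties are resolved by keeping the first site listed rather than a random one, and neither the check that occupancies sum to 1 nor the consistent handling of attached hydrogens is attempted.

```python
import math


def pick_major_sites(atoms, cutoff=1.0):
    """Keep fully occupied atoms plus one site from each group of alternatives.

    Each atom is a dict with 'label', 'el' (element), 'xyz' (Cartesian, in
    angstroms) and 'occ' (site occupancy). Partially occupied atoms of the
    same element lying within `cutoff` of each other are treated as
    alternatives, and only the one with the highest occupancy is kept.
    """
    full = [a for a in atoms if a["occ"] >= 1.0]
    partial = [a for a in atoms if a["occ"] < 1.0]

    groups = []
    for atom in partial:
        for group in groups:
            if any(
                atom["el"] == other["el"]
                and math.dist(atom["xyz"], other["xyz"]) <= cutoff
                for other in group
            ):
                group.append(atom)
                break
        else:
            groups.append([atom])  # no nearby alternative yet: start a new group

    # max() returns the first of several equal maxima, so ties are deterministic.
    kept = [max(group, key=lambda a: a["occ"]) for group in groups]
    return full + kept


if __name__ == "__main__":
    atoms = [
        {"label": "C1", "el": "C", "xyz": (0.0, 0.0, 0.0), "occ": 1.0},
        {"label": "C2A", "el": "C", "xyz": (1.5, 0.0, 0.0), "occ": 0.7},
        {"label": "C2B", "el": "C", "xyz": (1.5, 0.4, 0.0), "occ": 0.3},
    ]
    print([a["label"] for a in pick_major_sites(atoms)])
```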

Many of the problems were caused by idiosyncrasies or errors in the input CIFs. On the whole OpenBabel did not handle these well (e.g. by writing an error message and terminating): in the majority of cases it went into an infinite loop and the program hung. Because of this, and because the OpenBabel conversions were part of a longer script, all OpenBabel jobs had to be run with an arbitrary timeout and killed if they were still running when it expired, which may have discarded some valid but long-running jobs. We will investigate whether there is a validation program that can be run automatically on CIFs to filter out those with these problems (similar to the CCDC’s EnCIFer, but able to be run programmatically). However, it would be relatively straightforward to make OpenBabel more reliable by having it exit cleanly when it encounters these problems, so that pre-validation would not be necessary. These problems are listed in the table below:

Details Example
  • Category: CIF_NOCOORDINATES
  • Frequency: 378
  • Description: CIF doesn’t contain any coordinates
  • Cause: Some CIFs contain e.g. powder diffraction refinement data and don’t contain coordinates.
  • Solution: OpenBabel already issues an error: “CIF Error: no atom found ! (in data block:XXX)” – simply abort the program if this is found (rather than trying to continue).

Files: CC_B502254A_3.txt

  • Category: CIF_MISSINGLOOP
  • Frequency: 85
  • Description: CIF is missing a “loop_” line
  • Solution: Do an initial check that there is at least one loop_ line in the expected place before attempting to do the conversion.
CIF_MISSINGLOOP example: ob_c2ob25400j_2.jpg

Files: ob_c2ob25400j_2.zip

  • Category: CIF_COMMENTEDFIELD
  • Frequency: 36
  • Description: if there is a CIF field name in a commented section of the CIF, OpenBabel doesn’t ignore it and goes into an infinite loop
  • Solution: It would be trivial to make sure that OpenBabel ignores CIF field names which are commented out (between a pair of semicolons).
CIF_COMMENTEDFIELD example: dt_c3dt33040k_1.jpg

Files: dt_c3dt33040k_1.zip
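
Until such fixes are made, the two safeguards discussed above (a quick pre-check of the CIF, and a timeout around the conversion) can be sketched together in Python. This is an illustrative sketch and not the script we actually ran: the function names are hypothetical, the pre-check only looks for a loop_ line and an atom-site coordinate tag outside semicolon-delimited text and # comment lines, and the command passed to run_with_timeout is whatever conversion command you use.

```python
import subprocess
import sys


def strip_text_fields(cif_text):
    """Return CIF lines outside semicolon-delimited text fields and '#' comments."""
    kept, in_text_field = [], False
    for line in cif_text.splitlines():
        if line.startswith(";"):
            in_text_field = not in_text_field  # a leading ';' opens or closes a field
            continue
        if in_text_field or line.lstrip().startswith("#"):
            continue
        kept.append(line.strip())
    return kept


def precheck_cif(cif_text):
    """Return the problem categories found; an empty list means 'try converting'."""
    lines = [line.lower() for line in strip_text_fields(cif_text)]
    problems = []
    if "loop_" not in lines:
        problems.append("CIF_MISSINGLOOP")
    coordinate_tags = ("_atom_site_fract_x", "_atom_site_cartn_x")
    if not any(line.startswith(coordinate_tags) for line in lines):
        problems.append("CIF_NOCOORDINATES")
    return problems


def run_with_timeout(command, timeout_s):
    """Run `command`; return ('ok' | 'failed' | 'timeout', captured stdout)."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return "timeout", ""
    return ("ok" if result.returncode == 0 else "failed"), result.stdout


if __name__ == "__main__":
    cif = "data_demo\n_cell_length_a 10.0\n"
    print(precheck_cif(cif))
    # A stand-in command; a real pipeline would pass the conversion command here.
    print(run_with_timeout([sys.executable, "-c", "print('converted')"], 10))
```

Recording a distinct "timeout" status, rather than treating it as a failure, makes it possible to re-run only those jobs later with a longer limit.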

The following OpenBabel bugs were the most frequent in occurrence, but will be difficult to fix. They arise from the problem that the CIF format does not record charges on atoms/ions or the types of bond between them, so OpenBabel needs to work them out, which is hard to do correctly.

Details Example
  • Category: BAD_CHARGEMISSING
  • Frequency: 830
  • Description: One or more ions in the molecule have the wrong charge on them in the resulting mol file
BAD_CHARGEMISSING example: md_c2md20105d_1.jpg

Files: md_c2md20105d_1.zip

  • Category: BAD_WRONGCOORDINATION
  • Frequency: 747
  • Description: One or more atoms or ions in the molecule have the wrong coordination – problem observed in metal ions, S, P, Se and B
BAD_WRONGCOORDINATION example: ob_b314176d_1.jpg

Files: ob_b314176d_1.zip

  • Category: BAD_BONDMISSING
  • Frequency: 587
  • Description: One or more of the bonds in the molecule are of the wrong order e.g. a single bond instead of a double bond.
BAD_BONDMISSING example: MD_c3md00077j_1.jpg

Files: c3md00077j_1.zip

  • Category: BAD_WRONGBOND
  • Frequency: 452
  • Description: Wrong sequence of single/double bonds.
BAD_WRONGBOND example: nj_b301045g_3.jpg

Files: nj_b301045g_3.zip

  • Category: BAD_NOCOORDL
  • Frequency: 52
  • Description: no coordination to a ligand.
BAD_NOCOORDL example: ob_b307014j_1.jpg

Files: ob_b307014j_1.zip

  • Category: BAD_MISSINGH
  • Frequency: 18
  • Description: missing hydrogen.
BAD_MISSINGH example: ob_b311669g_3.jpg

Files: ob_b311669g_3.zip

There were also some problem mol files produced which either won’t be able to be fixed by OpenBabel (since they resulted from either errors or limitations of the input CIF files which cannot be fixed retrospectively) or are too difficult to fix and/or too infrequently occurring to be worth the effort:

  • There were 237 cases where there were solvent molecules in the CIF (many of which have missing hydrogens, partial occupancy of the molecule or part of the molecule etc.) which give rise to spurious oxygens, fragments of molecules and radicals in the resulting mol file (see example files for nj_b306778e_1.zip). 148 of these cases are just water solvent molecules either with missing or detached hydrogen atoms. The poor definition of the solvent molecules is a limitation of CIF files from diffraction so it is not possible for OpenBabel to better define them in the output mol that is derived from them. However, running OpenBabel with the -r option to remove all but the largest contiguous fragment was quite successful at removing these problem solvent molecules, so no further action is required to deal with this problem and we will use this option in the future.
  • There were 81 cases where there was at least one missing hydrogen in the original CIF (or in 3 cases, all hydrogens missing) – see example files for ob_B500173K_3.zip.
  • Some CIFs contain crystal structures which correspond to continuous networks rather than small molecules (e.g. polymers, MOFs, zeolites, POMs) which cannot meaningfully be captured in mol format – see example files for ce_b309410c_3.zip.
  • There were a few (24) cases where the stereochemistry in the mol file obtained is incorrectly defined. However, because on the whole the stereochemistry was well interpreted by OpenBabel and these cases were relatively few, it probably isn’t worth upsetting the apple cart to investigate these further – see example files for ob_b407215b_4.zip.

This summer there have been a number of students from the University of Southampton doing internships on joint projects between the university and the Royal Society of Chemistry and ChemSpider. Three of these students have been sifting through theses from past members of Richard Whitby’s research group in order to extract the compound, spectra and reaction data in them (and in linked lab notebooks and archive spectra files) and share these in LabTrove, ChemSpider, and CSSP. The students – Alex Hartke, Yet Wai Lee and Josh Whittam (all 2nd year undergraduates) – are shown below together with the boxes of thesis data, lab notebooks and spectra printouts that they digitised.
Southampton University Interns
Between them they digitised 7 theses, by A. Henderson, L. Sayer, D. Owen, D. Macfarlane, F. Giustiniano, G. Saluste and J. Stec, which resulted in 1035 LabTrove pages being published to the Whitby Group’s LabTrove blog.

The theses were a rich source of compound information – including compound structures, names, properties and spectra, all of which were also deposited into ChemSpider resulting in 208 new compound pages, and about 600 spectra.

For this project the students manually deposited the compound information into LabTrove and then deposited the compounds and spectra to ChemSpider. However, we are currently developing a range of ChemSpider jQuery widgets which can be integrated into web-based ELNs such as LabTrove. These will make it easier to enter compound information from ChemSpider into experiments, and also to publish compound and reaction data from the ELNs to ChemSpider, CSSP and ChemSpider Reactions. This will follow on from the initial proof of concept to retrieve ChemSpider information and enter it into LabTrove pages.

With this long-term aim in view, the LabTrove pages in which the interns stored the compound and reaction data were structured using LabTrove templates, and this structuring will make it easier for publishing widgets to understand the data and process it in the correct way. In this way, the project was partly a test to ensure that the templates were suitable for storing compound data in LabTrove. As well as the ChemSpider compound and associated data template (with corresponding help page), templates were also written to store reaction data in a formatted way, since the theses were primarily focused on the synthesis of compounds. At their simplest, basic reaction data can be stored in LabTrove using the ChemSpider Reactions template (and corresponding help page), and eventually posts written in this format will be easily publishable to ChemSpider Reactions. More detailed reaction data can be stored using the ChemSpider SyntheticPages style reaction template (and corresponding help page). The initial aim was to deposit all of this reaction data into ChemSpider SyntheticPages, but it became clear that it was difficult for anyone other than the researcher who conducted the reaction, or their supervisor, to supply the level of detail necessary for CSSP submissions, and in particular that this level of detail couldn’t easily be reached by retrospectively abstracting theses. As a result, only a handful of reactions were submitted to CSSP, and the majority (over 500) were stored in LabTrove for future submission to ChemSpider Reactions.

If reactions can be published easily from ELNs to ChemSpider Reactions, and ChemSpider Reactions can easily be queried by other researchers and their applications when performing new reactions, this will be a major step towards the aims of Dial-a-Molecule (an EPSRC Grand Challenge network). An important part of the reaction data which needs to be captured is the stoichiometry table of substances used and produced in a reaction. However, these stoichiometry tables are too complicated to incorporate into a LabTrove template, so the LabTrove reaction templates will be used in conjunction with a new ChemSpider jQuery widget that constructs them, which is currently being integrated with LabTrove (more details to follow on this blog shortly!). The widget performs ChemSpider lookups to retrieve compound information and will calculate equivalents, thereby saving the researcher time when working out the amounts of reactants needed or the yields of products obtained. An example of a reaction post which was initially created using the ChemSpider Reactions template and then supplemented by adding a stoichiometry table to it using the ChemSpider Edit Stoichiometry Table widget is shown here.
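
To give a feel for the arithmetic behind a stoichiometry table, here is a small Python sketch of the equivalents and yield calculations. It is not the widget's code: the function names and the row layout are hypothetical, the molecular weights are typed in by hand rather than looked up in ChemSpider, and the quantities are purely illustrative.

```python
def stoichiometry_table(rows, limiting):
    """Add millimoles and equivalents to each row.

    Each row is a dict with 'name', 'mass_g' and 'mw' (g/mol); `limiting`
    names the substance that counts as 1.0 equivalent.
    """
    moles = {row["name"]: row["mass_g"] / row["mw"] for row in rows}
    reference = moles[limiting]
    return [
        {
            "name": row["name"],
            "mmol": round(moles[row["name"]] * 1000, 2),
            "equiv": round(moles[row["name"]] / reference, 2),
        }
        for row in rows
    ]


def percent_yield(product_mass_g, product_mw, limiting_mol):
    """Yield of product relative to the moles of limiting reagent (1:1 assumed)."""
    return 100.0 * (product_mass_g / product_mw) / limiting_mol


if __name__ == "__main__":
    reagents = [
        {"name": "benzaldehyde", "mass_g": 1.0612, "mw": 106.12},
        {"name": "sodium borohydride", "mass_g": 0.5675, "mw": 37.83},
    ]
    for row in stoichiometry_table(reagents, limiting="benzaldehyde"):
        print(row)
    print(percent_yield(0.865, 108.14, 0.010))
```

Note that percent_yield assumes one mole of product per mole of limiting reagent; a real table would also carry stoichiometric coefficients.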

If you are a LabTrove user and wish to use the ChemSpider templates, their source is available via their links above, and instructions for using templates in LabTrove are documented here.

For some time now it has been possible to access relevant SureChem patent information from a ChemSpider compound page in the Patents Infobox. ChemSpider compounds are also linked to and from the relevant RSC articles, which has allowed us to form a new partnership between RSC Publishing and SureChem which relies on ChemSpider taking the pivotal role of linking internet chemistry together.

In the RSC article landing pages there is a “Compounds” tab which shows the key compounds that the article is about – as shown in this example. For each compound there is now a link to view the SureChem patent information associated with that compound as below:

The RSC Publishing platform article landing page showing SureChem patent information


SureChem and SureChem’s new free offering, SureChemOpen, offer a suite of patent chemistry data solutions, for example allowing their patents to be found from a structure or substructure search. Now, for each compound returned from such a search it is possible to view any linked ChemSpider compound pages and the number of associated RSC publications (and follow a link to view these articles).

This linking between SureChem and the RSC publication platform relies on ChemSpider (and the standard InChI chemical identifier) providing a bridging link to both, which ensures that the system is accessible, standards-based and scalable, making it easy for future partners to join.

James Jack from Accelrys has developed a great example of using ChemSpider web services to add ChemSpider search functionality to the structure drawing tool Accelrys Draw.

It is now possible, with a new add-in, to perform advanced searches on ChemSpider from within the Accelrys Draw program itself, searching by text, structure (exact, similarity and substructure), elements (those present and those absent), intrinsic and predicted properties, and LASSO activities. All of the ChemSpider information about the compounds returned in the search can be viewed and their structure(s) loaded back into the main Accelrys Draw window for further editing.

If you’re interested in finding out more about this add-in or obtaining it then see James’ blog post about the add-in. He has also posted a video demonstrating its use:

Technical details for developers

James has modularised his code so as to separate out a .Net Client API to the ChemSpider Search web service that can be used from *any* .Net application without the need for additional assemblies (other than standard .Net) and requires minimal code. This makes it easy to add the same ChemSpider search functionality to other Accelrys products (e.g. Symyx Notebook).

In addition, he has released this ChemSpiderSearchClient code so that it is available to other ChemSpider users who would like to integrate ChemSpider web services with their code in similar ways.

The “ChemSpiderClient” solution should be opened with Visual Studio. It contains two projects: “ChemSpiderClient” is the main library project (which contains the ChemSpider API code) and “ChemSpider ClientTest(No Draw)” is a simple interface to run the library code (set this as the start-up project to debug the project). “ChemSpiderClient.cs” in “ChemSpiderClient” is the main code file that calls the ChemSpider web services. Best practice for performing ChemSpider searches is observed: first launching a search to retrieve a transaction ID for the search, then intermittently checking the status of the search using the GetAsyncSearchStatus operation of Search.asmx, and finally, when the status of the search is “ResultReady”, retrieving the resulting ChemSpider IDs. If the reference to Symyx.CustomUIControls from the ChemSpider Client is missing then add a reference to Symyx.CustomUIControls.dll in the top-level folder of the zip file.
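
For readers who are not using .Net, the launch, poll and retrieve pattern described above can be sketched generically in Python. This is not James’ code and it makes no real web service calls: the three callables passed in (start_search, get_status, get_results) are hypothetical stand-ins for whatever wraps the Search.asmx operations in your own client, and the only status value taken from the service is “ResultReady”; the polling interval and the limit on the number of polls are arbitrary choices.

```python
import time


def run_async_search(start_search, get_status, get_results,
                     poll_interval_s=1.0, max_polls=60):
    """Launch a search, poll until its status is 'ResultReady', return the results.

    The three callables are supplied by the caller (hypothetical wrappers
    around the web service): start_search() returns a transaction ID, and
    get_status(id) and get_results(id) take that ID.
    """
    transaction_id = start_search()
    for _ in range(max_polls):
        if get_status(transaction_id) == "ResultReady":
            return get_results(transaction_id)
        time.sleep(poll_interval_s)  # wait before asking again
    raise TimeoutError(
        f"search {transaction_id} not ready after {max_polls} polls"
    )


if __name__ == "__main__":
    # Fake service for demonstration; 'Pending' is a placeholder status name.
    statuses = iter(["Pending", "Pending", "ResultReady"])
    ids = run_async_search(
        start_search=lambda: "demo-transaction",
        get_status=lambda tx: next(statuses),
        get_results=lambda tx: [101, 102, 103],
        poll_interval_s=0.1,
    )
    print(ids)
```

Passing the service calls in as functions keeps the polling logic testable without a token or a network connection.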

Please note that a token is needed to access the ChemSpider webservices and by default the code is supplied without one specified, so that you need to input your own token value – the app.config file of “ChemSpider ClientTest(No Draw)” should be edited to enter a valid token that will be used by default. If this isn’t done, the user will need to supply a token when running the search via a pop-up box. To obtain a token, please complete the registration process – when you are registered the Security Token is listed on your Profile page.

The RSC’s objective is to advance the chemical sciences, not only at a research level but also to provide tools to train the next generation of chemists. ChemSpider contains a lot of useful information for students learning Chemistry but there is also a lot of information which is not relevant to their studies which might be confusing and distracting. For some time we have been considering the concept of an educational version of ChemSpider, aimed at students (and their teachers or lecturers) in their last years of school, and first years of university (ages 16-19), which restricts the compounds and the properties, spectra and links displayed for each, to those relevant to their studies. As a result, we are pleased to announce the launch of the Learn Chemistry Wiki which not only fulfils this aim, but also takes it further. This project was developed in a collaboration between Dr Martin Walker at the State University of New York at Potsdam, ChemSpider and the Royal Society of Chemistry’s Education team.
The Learn Chemistry Wiki contains over 2000 “substance” pages which correspond to simple compounds that would commonly be encountered during the last years of school and first years of university. Each of these pages corresponds to a ChemSpider compound, from which it dynamically retrieves compound images, a summary of its properties (molecular formula, mass, IUPAC name, appearance, melting and boiling points, solubility, etc.) and links to view safety sheets and spectra. Each substance page also displays text from Wikipedia, found via the Wikipedia links in ChemSpider.

The Learn Chemistry Wiki also goes a step further and not only contains compound information in isolation but also contains laboratory experiments (with parallel sections which contain an overview, teachers’ notes and students’ handouts) for each, quizzes, and tutorials which are linked to the compound information to put them into context. The wiki is based on the MediaWiki platform (which allows multiple users to contribute collaboratively since the website is intended to be a community website), but extends it to incorporate functionality similar to that of ChemSpider, invoked via custom-made extensions. For example, it is possible to draw structures using GGA’s Ketcher in order to find structures, or to draw answers to quiz questions (for example to specify the product of a particular reaction). It is also possible to include an interactive spectrum retrieved from ChemSpider in any wiki page, using the ChemDoodle spectrum viewing widget in browsers which support canvases or JSpecView applet in those that don’t.

For an overview and demonstration of the Learn Chemistry Wiki site see the Learn Chemistry Wiki site tour webpage or the Learn Chemistry Wiki overview demo video:

The Learn Chemistry Wiki is part of the RSC’s new Learn Chemistry platform, which provides a central access point and search facility to make it easier to access the various RSC teaching resources.

The functionality of electronic lab notebooks (ELNs) and that of ChemSpider overlap to a certain extent – both store chemical information including structures, data, spectra and reactions. However, the focus of most ELNs is to manage, track and audit that data, and that of ChemSpider is to publish and disseminate it to the world. We have been considering how best to use this complementary functionality to integrate an ELN with ChemSpider.

Some ELNs already look up information in, and link to, ChemSpider. One example is the blog3 Web-logging (“blogging”) engine by Jeremy Frey, Simon Coles and Mark Borkum at Southampton University, which allows links to compounds from the ChemSpider database to be embedded directly into the content of a post. When a link to ChemSpider is detected, blog3 follows the link to retrieve additional information that is relevant to the compound, including: experimental and theoretical data; two- and three-dimensional depictions; and links to papers and journal articles. Another example is the eScience tool that Stephen Wan from CSIRO has developed with the University of New South Wales to text mine LabTrove ELN blog posts to identify chemical names and link these to the relevant ChemSpider compounds.

At “The Smart Laboratory: Towards a national ELN” meeting (organised as part of the Dial-a-Molecule EPSRC Grand Challenge) in August this year, the seeds were sown to take the integration between ELNs and ChemSpider a step further. Cambridge University has the first Chemistry department in the UK to roll out a department-wide electronic lab notebook system, and the software that they’re using is IDBS’s E-WorkBook Suite. In collaboration with IDBS and Cambridge’s Chemistry department, we at ChemSpider have made a plug-in which can both dynamically retrieve information from ChemSpider into their ELN and publish from the ELN to ChemSpider. The Chemistry department at Cambridge (Dr Tim Dickens, Dr Brian Brooks, Prof Bobby Glenn and Prof Steven Ley) have been very helpful in granting access to their ELN to write the plug-in, and will be its first users, but the results will be freely available for any existing IDBS E-WorkBook Suite user.

About the extension Prof Bobby Glenn has said: “Much of Chemistry is lost, it is simply not published and languishes in forgotten lab notebooks. Capturing novel molecules soon after synthesis on a searchable database like Chemspider is now an effortless process directly from the ELN, which will greatly encourage sharing of compounds, synthetic methods and all the associated data. It’s instant messaging for chemists”. Antony Williams (Vice-President of Strategic Development of ChemSpider) added “The ability to now publish compound data from the IDBS ELN directly to ChemSpider offers a path to direct exposure of novel chemistry as well as the chemist doing the work. This public compound registration capability is the first move towards ultimately exposing synthetic methods and associated experimental data to the community. Our vision is coming to fruition through this collaboration.”

To view the plug-in in action please view the demonstration movie of ChemSpider E-WorkBook Suite Plugin.

Screen capture of launching Publish to ChemSpider plug-in

Compounds can be published to ChemSpider if they have been drawn out in full in an experiment, whether as an individual structure or as part of a reaction, and whether they are simply uploaded into the experiment as a reaction file or included in, for example, a spreadsheet item. Likewise, compound structures can be automatically loaded into a search of ChemSpider if you would like to find out more information about compounds that have been drawn out in full in an experiment, or if you have published a compound to ChemSpider and wish to see the resulting compound pages. The resulting compound pages in ChemSpider will have the data source “IDBS E-WorkBook Suite”. The external ID will show the ID of the experiment from which the structures came, together with the depositor details as defined in the ChemSpider Settings of the ELN.

The ChemSpider IDBS E-WorkBook Suite plug-in is freely available to customers of IDBS E-WorkBook Suite by downloading it from IDBS, and copying it to the appropriate place in their IDBS E-WorkBook Suite program files. It is compatible with E-WorkBook Suite versions 9.0 and 9.1.

This plug-in is an initial proof-of-concept to demonstrate that we can pass compound information between ChemSpider and an ELN in both directions. Future versions will allow more of the information within an experiment to be published to ChemSpider – for example to allow reactions along with a description of their methods to be published to ChemSpider SyntheticPages, or to deposit spectra along with compounds to ChemSpider. We would also like to integrate other ELNs with ChemSpider.

Only two days until the start of this year’s Fall ACS meeting in Denver. The ChemSpider team is busy preparing for the meeting, packing bags, polishing talks and honing workshop skills.

Please drop by and say “Hi!”

We’d like to repeat our invitation to everyone at the conference to drop by the RSC booth (Booth 1100), where of course you can chat with the ChemSpider team, get a quick demo (and find out more about our latest features), pick up our hot-off-the-press User Guide or scoop some exclusive ChemSpider goodies!

To celebrate the release of the new iPhone/iPad app* we have a limited number of covers for 3G and 4G iPhones as well as iPads.

*The app itself is free to download from the AppStore.

You can also find out about lots of other things that the RSC does: from publishing books and journals to the promotion of chemistry worldwide. We’ll also have lots of information on our new e-membership option, which is making its debut at this meeting. Also keep an eye out for members of our Editorial staff from journals including OBC, MedChemComm, PCCP, Soft Matter and RSC Advances, who will be scouring the conference in search of lots of new and exciting research.

Natural Product & Synthetic Chemists

I’d like to make an extra special invitation to any Synthetic chemists and Natural products chemists – from PhD students to Professors (please pass this on to all your friends and colleagues who will be at the meeting). The ChemSpider team really wants to hear about your research. Tell us about your latest publication or the work that you are most proud of, and we can make sure that your key compounds from these publications are in ChemSpider, on a platform freely accessible to chemists everywhere. If you are more interested in methodology you shouldn’t feel left out – ask us about ChemSpider Synthetic Pages.

 

ChemSpider related talks and workshops

Antony Williams (most definitely the hardest working man I know) is giving a number of talks and workshops (details below) which are sure to be entertaining as well as thought-provoking and will be well worth squeezing into your schedule.

We look forward to meeting you.

 

“Aligning scientific expertise and passion through a career path in the chemical sciences”

Colorado Convention Center, Room: 110, Sunday 28th August 2011, 1.40PM – 2PM

 

“Chemistry in the hand: The delivery of structure databases and spectroscopy gaming on mobile devices”

Colorado Convention Center, Room: 110, Monday 29th August 2011, 9.05AM – 9.35AM

 

“ChemSpider: Does community engagement work to build a quality online resource for chemists?”

Colorado Convention Center, Room: 110, Tuesday 30th August 2011, 10.10AM – 10.50AM

 

“An Introduction to ChemSpider – A Combination Platform of Free Chemistry Database, Free Prediction Engines and Wiki Environment”

Colorado Convention Center, Room: 503, Wednesday 31st August 2011, 08.30AM – 11AM

 

“Structure representations in public chemistry databases: The challenges of validating the chemical structures for 200 top-selling drugs”

Colorado Convention Center, Room: 110, Wednesday 31st August 2011, 10.45AM – 11.05AM

 

Last week Antony Williams gave three presentations about ChemSpider as a chemistry resource at the 241st ACS National Meeting & Exposition in Anaheim.

For those of you who were not able to attend, here are the presentations:

RSC ChemSpider as an environment for teaching and sharing chemistry 

Hosting a compound centric community resource for chemistry data 

How the web has weaved a web of interlinked chemistry data 

In January of this year we held a meeting in London with a group of interested parties who wanted to discuss with us how ChemSpider can be used to support metabolomics. From my point of view that was a very successful meeting in terms of providing an overview of what ChemSpider is capable of today as well as garnering feedback and input from a community of users applying mass spectrometry to perform metabolomics studies.

As a result we will be holding a round-table discussion here in the United States in Research Triangle Park, North Carolina, in April 2011, again for scientists interested in further refining how ChemSpider can be extended to serve the metabolomics community. An outline of the meeting is provided below. If you are interested in participating please respond to me directly at williams”AT”rsc”DOT”org by the deadline listed below. We will cap the attendance fairly quickly and are specifically looking for people who can be vocal about their needs and how we may be able to help with ChemSpider as a platform.

Metabolomics Round Table – Delivering Value to the Metabolomics Community via ChemSpider, a Public Domain Database

Hosts: John Shockcor, Waters and Antony Williams, Royal Society of Chemistry

When: April 21st, 2011

Venue: To Be Determined, Research Triangle Park, North Carolina

The metabolomics community presently utilizes public domain databases such as KEGG, LipidMap, DrugBank and a myriad of other online resources to assist in the analysis of data. However, rich as these resources are, they are limited in scope, are challenged by known data quality issues, and are not directly focused on serving the needs of the metabolomics community. ChemSpider is an online resource for the chemistry community hosted by the Royal Society of Chemistry with the intention of linking together online chemistry resources, cleaning and curating chemistry related data and collectively serving a number of communities. ChemSpider has been used by members of the mass spectrometry community, including instrument vendors, for the past 3 years. This roundtable meeting is to provide an overview of how ChemSpider is presently used by scientists working in the domain of metabolomics and garner feedback from the existing user base as well as new potential users to help define how ChemSpider can be enhanced to further support the needs of this community.

Antony Williams, VP of Strategic Development and host of ChemSpider at RSC, and John Shockcor, Director of Life Sciences Business Development at Waters Corp, invite you to attend this meeting to provide input to steer development of ChemSpider to address the needs of the metabolomics community. An agenda will be defined in the near future based on interest.

If you are interested in attending please express your interest by sending an email to williamsa@rsc.org

 

The Royal Society of Chemistry will be heading to California for the Spring ACS Meeting, where Antony Williams, the VP of Strategic Development for ChemSpider, will be presenting several papers and hosting a ChemSpider Training Session.

The Training Session – “ChemSpider: A Community Resource for Chemical Data” will be held on Wednesday, March 30th from 8:30-11:00 AM in the Anaheim Convention Center, Room 211 A.

This should be a lively and interactive session and is your opportunity to give feedback regarding present functionality and how you would like to see ChemSpider develop in the future. These sessions have proved popular in the past, so make sure to register early for your place here.

The titles and locations of the talks are:

RSC ChemSpider as an environment for teaching and sharing chemistry – Division of Chemical Education. March 28, 2011 from 9:45 am to 10:05 am. Disney’s Grand Californian Hotel, Room: Trillium B

Hosting a compound centric community resource for chemistry data – CINF: Division of Chemical Information. March 28, 2011 from 3:05 pm to 3:30 pm. Anaheim Convention Center, Room: 204 A

How the web has weaved a web of interlinked chemistry data – CINF: Division of Chemical Information. March 29, 2011 from 3:00 pm to 3:40 pm. Anaheim Convention Center, Room: 204 A

For a more personalized demonstration of ChemSpider you can also visit the RSC at Booth 903.

We look forward to meeting you in Anaheim.

On Monday 31 January, ChemSpider and Waters partnered with Chemistry World to deliver its first international live webinar and active audience event at Burlington House, London - Connecting Chemistry & Mass Spectrometry on the Internet.

Dr Antony Williams (RSC, ChemSpider) and Dr John Shockcor (Waters) presented a top-class and engaging event.

You can view the event by registering here:

http://chemistryworld.gav.co.uk/webcasts/event-detail/5/identification-of-metabolite-structures-using-mass-spectrometry.html

Earlier in the day we also hosted a round-table discussion for scientists interested in further refining how ChemSpider can be extended to serve the metabolomics community. Our thanks go out to all those who attended for a lively discussion.

We would be interested to hear your views. For example, are there any additional features that you would like to have available, or any other data sources that we should link to?

In response to your recent feedback we have now made it easier to see at a glance the Systematic name or PhysChem Properties for a compound.

Here is an example of the new record layout for glucose:

  glucose1

 

Clicking on any of the hyperlinks in the central column will expand the information available. The Search Google Scholar link will enable you to expand a search into the scientific literature based on the approved names and synonyms in ChemSpider.

Searching Similar will bring up a table of compounds which share the same skeleton, but may have variations in the stereochemistry. The results are displayed in a grid format. New visual icons are now available to help you select the relevant record. These icons will tell you if there is information from Wikipedia, or if spectra are available for that compound.

 glucose2

Clicking on the structure image or the ChemSpider ID will take you back to the record view for each individual compound.

Other icons indicate specific stereochemistry or double-bond geometry, or show whether the compound is a charged species or an isotope:

glucose5 = no. of defined stereocenters

glucose6 = double-bond geometry

glucose7 = charged species

glucose9 = non-regular isotope

glucose8 = spectra

glucose4 = Wikipedia

Scrolling down the record view will still give lots more information about the compound such as commercial vendors, links to biological, toxicity and safety data, as well as links to RSC journals, books and databases.

We have also made it easier for you to keep the information from different info boxes by adding a print button. For example you can now print a spectrum of interest.

 glucose3

 

Don’t forget that you can also add your comments, compounds, literature references and spectra to ChemSpider by clicking on the right-hand hyperlinks at the top of the record view.

To add anything other than comments you need to be a registered and logged-in user.

Please do continue to let us know what you think about these enhancements and if there is anything else that you would like to see on the website.

The new content delivery platform from RSC Publishing provides powerful, fast access to journals, books and databases. You can search across nearly one million articles using one simple interface and refine your results through intuitive filters.

With the latest release a new Compounds tab now displays the key chemical compounds from a journal article when it has been semantically enriched via RSC’s Project Prospect. Each compound links back to ChemSpider to access its 400 chemical data sources for compounds, and users can also find related RSC journal articles containing the same compound.

 Phenylglyinol

 

Try it now by clicking on the ‘Compounds’ tab in the article - Total synthesis of (±)-Vertine with Z-selective RCM as a key step, Laetitia Chausset-Boissarie, Roman Àrvai, Graham R. Cumming, Céline Besnard and E. Peter Kündig, Chem. Commun., 2010, 46, 6264.

Effectively, you can run a text search within the Publishing Platform, perhaps by searching for your research topic or favourite author, to identify new papers and view the properties for any compounds in the article within ChemSpider.

A recurring question which has come through our customer usability survey is “Can you copy and paste structures drawn in ChemDraw into ChemSpider?” The answer is yes, you can. Simply draw the structure as normal and from the Edit menu choose Select All and Copy. In ChemSpider choose Structure Search from the search menu and click on the structure image to activate one of the Java-based structure drawing applets.

  

 Structure image (3)

 

 

From the options given choose Draw/Edit and paste your structure into the drawing window, followed by Accept. You are now in a position to search your structure.

 

 Structure image (2)

 

 

Alternatively, rather than using this route you can save your structure drawing in ChemDraw as a .mol file and in the structure drawing applet of ChemSpider, select the option to Load, then navigate to the location of your saved .mol file, open and load.

 

 Structure image (1)

 


Are you looking for bioactivity information for small molecules? ChemSpider now provides a direct link to the ChEMBL database from the European Bioinformatics Institute (EBI).

For example, take a look at the record for Fluconazole, an anti-fungal drug, in ChemSpider. If you go to the Associated Data Sources box and select Biological Data you will find the following links:

ChEMBL1

Clicking on the External ID link associated with ChEMBL will take you to the ChEMBL record.

ChEMBL2

The EBI produce both ChEBI (Chemical Entities of Biological Interest) and ChEMBL, a database of approximately 500,000 bioactive compounds. The bioactivities listed are abstracted from the scientific literature and are linked directly to the article.

You could also start your search in ChEMBL and then link back to ChemSpider to find additional information using the Std. InChIKey displayed in the ChEMBL record.
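
If you are copying a Std. InChIKey by hand between the two sites, a quick format check can catch a truncated or mistyped key before you search. The snippet below is only a minimal sketch: the function names are hypothetical, it relies solely on the published layout of a standard InChIKey (14 letters, a hyphen, 10 letters ending in "SA", a hyphen, one letter), and it does not contact ChemSpider or ChEMBL. The key used in the usage line is an illustrative example and should be checked against the record you are actually viewing.

```python
import re

# Layout of a standard InChIKey: 14 letters, hyphen, 8 letters followed by
# "SA" (standard flag plus version "A"), hyphen, 1 final letter.
STD_INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{8}SA-[A-Z]$")


def is_standard_inchikey(text: str) -> bool:
    """Return True if text has the shape of a standard InChIKey.

    This checks the format only; it cannot tell whether the key
    belongs to a real compound. (Hypothetical helper name.)
    """
    return bool(STD_INCHIKEY_RE.match(text.strip()))


def same_connectivity(key_a: str, key_b: str) -> bool:
    """Return True if two keys share the first 14-letter block.

    The first block is derived from the connectivity of the molecule,
    so keys that match here describe the same skeleton and may differ
    in stereochemistry or isotopes. (Hypothetical helper name.)
    """
    return key_a.strip().split("-")[0] == key_b.strip().split("-")[0]


if __name__ == "__main__":
    # Illustrative key pasted with stray whitespace.
    print(is_standard_inchikey("  RFHAOTPXVQNOHP-UHFFFAOYSA-N "))
```

A key that fails the check has most likely lost characters during copy and paste, so it is worth copying it again from the ChEMBL record.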

We like to make things easy for our users.

We hope you’ve had an opportunity to take a look at the revamped website. If you would like to share your thoughts on usability and design or site performance please take a moment to click on the “Give Feedback” button on the website. This will really help us to make the ChemSpider user experience even better.

 Kampyle feedback

For all you Tweeters out there following Science Online, the Twitter account for Aileen and Dave at the RSC is ChemSpider.

Not to be confused with that of Antony Williams, who is still very much ChemSpiderman.

Nature, Mendeley, and the British Library are excited to present Science Online London 2010. How is the web changing the way we conduct, communicate, share, and evaluate research? How can we employ these trends for the greater good? This September, a brilliant group of scientists, bloggers, web entrepreneurs, and publishers will be meeting for two days to address these very questions.

ChemSpider will be there to hear and record what is being said. If you are going to be there look out for David Sharpe and Aileen Day.

We will of course report back on topics that pertain to ChemSpider and the greater world of chemistry publishing.

Earlier this month I reported on the integration of Infotherm to ChemSpider but at that time it would have been necessary for non-RSC members to pay for the data on Infotherm despite the fact that a search would have provided the links and you could have clicked through to the Infotherm data pages. Some good news from Fiz-Chemie though…they are waiving the fee for data on pure compounds accessed from ChemSpider and as a result giving access to over 200,000 tables of data. This is a great contribution to the community of ChemSpider users. Thanks Fiz-Chemie!

 

infotherm

We deposit a lot of data onto ChemSpider in a month and the database is growing daily. As an example of the ongoing depositions take a look at what has been deposited in a one-month timeframe from July to August. This is simply what has been published by me…not all depositions. It’s a pretty good indicator of ongoing efforts to enhance the quantity of content on the site.

published_in_a_month

The Chemicalize website from ChemAxon is gaining interest (1,2) and, likely, LOTS of users! Chemicalize is both a website for recognizing chemical names and converting to chemical structures as well as an integration path to their property prediction algorithms. Some basic testing of chemicalize shows that their chemical name detection and conversion to structures using either name to structure conversion (algorithmically) or name lookup (via dictionaries) is very good. Not perfect, but very good. Perfect chemical name lookups are impossible as the associated dictionaries grow every time a new natural product is found for example, or a new drug is released.

With ChemSpider we are more interested in the linking to the predicted property pages. For example, if you want to see the predicted properties for Penicillin visit here.

penicillin

Now, with ChemSpider ChemAxon were kind enough (and I mean applaud them, acknowledge them and send flowers!!) to give us a way to pass through a structure and initiate the predictions on the Chemicalize site. This is tremendous news for you all! Under the properties Infobox we provide a list of properties from ACD/Labs, a list of properties from EPISuite, a list of experimental properties, sourced from various places and now, the link to Predict Properties using Chemicalize.

properties

Clicking the tab for Predict Properties from ChemAxon displays the link through to Chemicalize as shown below.

chemicalize

So, now we have sets of prediction capabilities linked up to ChemSpider. The ACD/Labs predictions are pre-calculated, and every time there is an update to the algorithms we would in theory have to recalculate across the database and publish. This would take weeks across the almost 25 million structures, so it is not a frequent task. It is the same issue with EPISuite. With the Chemicalize integration, however, the predictions are live, run on the structure at the time it is passed to the algorithms. This has the advantage that the prediction algorithms can be incrementally improved and you will always get the latest and greatest results. However, having the predicted values from ACD/Labs available allows flexible searching as shown below. We are grateful to ChemAxon for allowing us to integrate Chemicalize. It gives LIVE access to the latest and greatest predictions as well as access to a whole series of new predictions for which we don’t have data on the database…especially pKa values, topology analysis, geometry and others. Thanks ChemAxon!

acdlabs

It’s a while since we first started the ChemSpider Forum and things have been a little quiet there recently as we made the transition to RSC ChemSpider, so we would like to invite you to re-visit the Forum and share your comments, suggestions and ideas with us.

On the newly revamped ChemSpider website you will find a link under the Help menu that will take you directly to the Forum.

Share some of your user experiences: How do you use ChemSpider? What problems did it solve for you? Did you look for something and didn’t find it?

The Forum will also be a place to find documentation such as Quick Guides on How to do Searching on ChemSpider and How to Add structures, spectra or reactions to ChemSpider so you can help to grow the community for chemistry.

We look forward to hearing from you.

iChemLabs, the developer of the popular ChemDoodle chemical drawing program, and Royal Society of Chemistry’s ChemSpider, a leading provider of chemical services and data on the internet, are excited to announce a software agreement that will provide significant benefits to customers.

iChemLabs, a developer of chemical software for students and professionals, announces a strategic partnership with RSC ChemSpider. iChemLabs has integrated ChemDoodle with the ChemSpider database containing almost 25 million compounds to search for pre-drawn chemical structures through the innovative MolGrabber widget. ChemSpider is integrating ChemDoodle Web Components into their service to provide users with a next generation HTML5 experience.

Search ChemSpider with ChemDoodle

Kevin Theisen, President of iChemLabs, states “ChemSpider is an excellent example of the creation of a popular and useful service by utilizing cutting-edge technology. By incorporating the HTML5 ChemDoodle Web Components, ChemSpider will take a further step towards creating the most advanced and futuristic chemical database on the web. Now that ChemDoodle Web Components are fully supported on iPhone OS and Android, our partners will be able to push rich media services to their customers across all browsers and mobile devices.”

Antony Williams, VP of Strategic Development for ChemSpider, adds “For many chemists ChemSpider has become their primary website to search for chemicals and related information, whether it be through a standard browser or via a mobile device. Our intention is to offer the best user experience possible and integrating to the HTML5-compliant ChemDoodle Web Components will afford users enhanced capabilities.”

ChemDoodle is available for download immediately. New users can request a free 30 day trial at http://www.chemdoodle.com. The free and open source ChemDoodle Web Components can be accessed at http://web.chemdoodle.com. ChemSpider is hosted at http://www.chemspider.com.

About iChemLabs, LLC.:
iChemLabs, LLC. is a scientific software company specializing in all forms of computational chemistry including NMR simulation, chemical visualization, and chemical informatics. iChemLabs provides expertise in desktop, mobile and web based technologies for both consulting and custom development. www.ichemlabs.com

About the Royal Society of Chemistry:
The RSC is the largest organisation in Europe for advancing the chemical sciences. Supported by a worldwide network of members and an international publishing business, our activities span education, conferences, science policy and the promotion of chemistry to the public. www.rsc.org

About ChemSpider:
ChemSpider offers a structure centric community for chemists to resource data. Offering access to almost 25 million unique chemical entities from over 400 data sources and by providing a platform for crowd sourced deposition, annotation and curation, it is the richest source of free integrated chemistry information available online. ChemSpider delivers data and services to enable the semantic web for chemistry. www.chemspider.com

Name Kevin Theisen
Phone 888-505-2436
Email sales@ichemlabs.com
Url http://www.ichemlabs.com
Address 200 Centennial Ave., Suite 200
City/Town Piscataway
State/Province NJ
Zip Code 08854
Country USA
   
Name Antony Williams
Phone 919-201-1516
Email info@chemspider.com
Url http://www.chemspider.com
Address 904 Tamaras Circle
City/Town Wake Forest
State/Province NC
Zip Code 27587

The INFOTHERM® database, produced by FIZ CHEMIE  Berlin, has now been linked into records in ChemSpider.

The database provides experimental thermodynamic and physical properties of 33,000 mixtures and 9,000 pure substances from a total of 12,000 compounds. In ChemSpider if you search for a compound and look under the Phys. Properties tab in the Associated Data Source Info box, you will find a link to a record in INFOTHERM.

ChemSpider record for Artemisinin showing INFOTHERM link

Clicking on the ID link will take you to records in INFOTHERM for that compound where you can look at the various thermodynamic properties available and make a selection of which property is of interest. By clicking on the button for Full access you can view the experimental data and find a link to the bibliographic reference.

 

Artemisinin's solubility in supercritical carbon dioxide

It is now a year since the Royal Society of Chemistry (RSC) acquired ChemSpider and much has happened in what is really quite a short space of time.

ChemSpider has added more compounds, data sources and significant compound collections. There are now links to RSC Publishing and PubMed, which effectively make these resources structure-searchable. We have set up a micro-publishing platform with the development of ChemSpider SyntheticPages, and we have facilitated mobile chemistry with the launch of ChemSpider Mobile.

We are impressed with these developments and we hope you are too. We are always interested to hear your views.

Look out for even more new features which will be released shortly.

If you are attending the Fall ACS Meeting in Boston, August 22-26, you can learn more about all of these developments by attending an interactive workshop which is being held on Tuesday, August 24, 3:30 PM – 6:00 PM in the Boston Convention & Exhibition Center, Room 102B. Register now.

You could also attend one of the lectures being given by Antony Williams:

How community crowdsourcing and social networking is helping to build a quality online resource for chemists – August 22, 2010 10:25 am
LOCATION: Seaport Hotel, Room: Seaport Ballroom A

Chemistry in your hand. Using mobile devices to access public chemistry compound data – August 26, 2010 1:30 pm
LOCATION: Boston Convention & Exhibition Center, Room: 156A

 We look forward to seeing you there.