Archive for the RSC Publishing Category

We are pleased to announce that we have just imported 1047 CIFs into ChemSpider. These are crystal structures that were previously reported in RSC papers (and are available as ESI for those papers); each has been attached to the relevant ChemSpider compound and linked back to the original article and to the CCDC’s webCSD, e.g. example compound with RSC article CIF (see the CIF infobox). Since each CIF that is uploaded into ChemSpider must be associated with a ChemSpider compound, the difficult part of this task was working out a 2D molecular structure (in .mol file format) for each 3D crystal structure (in .cif file format). This is particularly difficult because CIFs only contain information about each atomic position, and not how the atoms are bonded to each other in the crystal or whether they are charged.
Ultimately we would like this CIF to mol conversion (and the whole upload) to be performed programmatically without human intervention. However, there is no reliable way to do that currently – although programs such as OpenBabel can be used to extract mols from each CIF, the reliability of this conversion isn’t 100%.
So as one of our student intern projects at the University of Southampton this summer (in parallel with another student intern project there to share thesis data in ChemSpider) we used OpenBabel (version 2.3.2, run from the command line with the options -i cif inputfilename.txt -o mol -m --unique -d --AddPolarH) to extract mols for all the CIFs in the RSC archive (over 43,000 files as of June 2013). We enlisted Julija Kezina (shown below) to review the results of these conversions, both to ensure that only good structure and CIF pairs would be deposited to ChemSpider and to better understand the problems in the conversion process with a view to fixing them. One problem that became immediately apparent was that the 2D structure obtained was just a projection of the 3D structure along the a cell axis, which is not always the orientation that shows the molecule most clearly, so even structures with the right chemical connections between the atoms could be hard to read. All mol structures were therefore run through OpenEye’s cleaning algorithm before being reviewed.
Julija Kezina - Southampton University intern who examined CIF to Mol conversion
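The OpenBabel command line quoted above lends itself to being driven from a script. The following is a minimal sketch of how that might look in Python, not the script we actually used. The option list is copied from the text above, while the executable name (obabel) and the function names are assumptions made for illustration. A timeout is included because, as described later in this post, some conversions hang.

```python
import subprocess
import sys

# Options copied from the post; everything else here is illustrative.
OPTIONS_AFTER_INPUT = ["-o", "mol", "-m", "--unique", "-d", "--AddPolarH"]


def build_command(cif_path, executable="obabel"):
    """Assemble the conversion command (the executable name is an assumption)."""
    return [executable, "-i", "cif", cif_path] + OPTIONS_AFTER_INPUT


def run_with_timeout(command, timeout_s):
    """Run a command; the child is killed if it exceeds timeout_s seconds."""
    try:
        done = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # The job may have been valid but slow, so record it for later review.
        return "timeout", ""
    status = "ok" if done.returncode == 0 else "failed"
    return status, done.stdout


if __name__ == "__main__":
    # Demonstrate the timeout with a harmless stand-in for a hung job.
    hung_job = [sys.executable, "-c", "import time; time.sleep(10)"]
    print(run_with_timeout(hung_job, 0.5)[0])
```

Recording which files timed out, rather than silently dropping them, makes it possible to retry them later with a longer limit.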
Julija compared each structure in the output mol files with those in the original CIF files to judge whether the conversion was accurate or not. In addition, as an extra check, all of the output mol structures were submitted to the ChemSpider validation and standardisation platform to filter out molecules with structural problems (e.g. stereochemistry, valence or congestion issues).
Overall, approximately 30% of the CIF to mol conversions that Julija checked were good, with the right connectivity of atoms and ions (although approximately 30% of these needed their atoms to be repositioned to tidy up the structure, either manually or using ChemDraw’s cleaning functionality). The 1047 of these mols which contain only a single molecule (without solvent molecules or cocrystals etc.) are those which have been deposited into ChemSpider with their corresponding CIFs.
The journals which had the highest successful conversion percentage were Molecular BioSystems (57%), MedChemComm (51%), Organic and Biomolecular Chemistry (44%) and Green Chemistry (44%) – the journals which in general are about small organic molecules.
Julija was working in the National Crystallography Service’s office at the University of Southampton, under the co-supervision of Professor Simon Coles, and we are grateful to them for their help and advice about the finer points of the CIF file format.

Unsuccessful CIF to mol conversions

Running and evaluating OpenBabel on such a large and varied set of structures has given us a useful opportunity to identify and categorise the most common problems encountered. Here we share these and give examples that would enable the identification of some easy fixes in the pipeline that might benefit the whole community and be used as test cases when doing so. We will report these bugs to the OpenBabel forum and because OpenBabel is open source, hope to resolve at least some of these issues in the future through collaboration with its other developers.

The following OpenBabel bugs look like they might be most straightforward to fix:

Details Example
  • Category: BAD_NITRO
  • Frequency: 233
  • Description: there are different ways of representing nitro groups in structure drawers. OpenBabel currently produces a mol with a pentavalent nitrogen; in ChemSpider we choose to avoid this in favour of a charge-separated nitro.
  • Solution: Allow OpenBabel to have a different output option for nitro groups to output them as shown in corrected mol file.
BAD_NITRO example: ob_b209378b_1.jpg
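The conversion described in this row can be sketched on a toy connection table. This is only an illustration under stated assumptions: the atom and bond representation (dictionaries and [i, j, order] lists) and the function name are hypothetical and are not OpenBabel's data model or API.

```python
def charge_separate_nitro(atoms, bonds):
    """Rewrite pentavalent O=N=O nitro groups as charge-separated O=[N+][O-].

    atoms: list of dicts with 'element' and 'charge' keys (hypothetical format)
    bonds: list of [i, j, order] lists, modified in place
    """
    for n_idx, atom in enumerate(atoms):
        if atom["element"] != "N":
            continue
        n_bonds = [b for b in bonds if n_idx in (b[0], b[1])]
        double_to_o = []
        for b in n_bonds:
            other = b[1] if b[0] == n_idx else b[0]
            if b[2] == 2 and atoms[other]["element"] == "O":
                double_to_o.append((b, other))
        # Pentavalent nitro: total bond order 5 with two N=O double bonds.
        if sum(b[2] for b in n_bonds) == 5 and len(double_to_o) == 2:
            bond, o_idx = double_to_o[0]
            bond[2] = 1                    # N=O becomes N-O
            atom["charge"] += 1            # N becomes N+
            atoms[o_idx]["charge"] -= 1    # O becomes O-
    return atoms, bonds


# Heavy atoms of nitromethane drawn with a pentavalent nitrogen.
atoms = [{"element": e, "charge": 0} for e in ("C", "N", "O", "O")]
bonds = [[0, 1, 1], [1, 2, 2], [1, 3, 2]]
charge_separate_nitro(atoms, bonds)
print(bonds, [a["charge"] for a in atoms])
```

Which of the two oxygens receives the negative charge is arbitrary here (the first one found), since the two forms are equivalent by resonance.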


  • Category: BAD_MULT
  • Frequency: 434
  • Description: Duplicate (exactly identical, including stereochemistry) molecules are present in the resulting mol file despite running OpenBabel with the --unique option (which should filter out duplicate molecules based on their InChIs)
  • Solution: Fix the --unique option in OpenBabel so that duplicate molecules are actually removed.
BAD_MULT example: nj_b306072a_1.jpg
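Until the option is fixed, duplicates could be filtered in a post-processing step. The sketch below assumes that an InChI string has already been generated for each fragment by some other means; the (inchi, molblock) pair format and the function name are hypothetical.

```python
def drop_duplicate_fragments(fragments):
    """Keep the first fragment seen for each InChI, preserving input order.

    fragments: iterable of (inchi, molblock) pairs (hypothetical format)
    """
    seen = set()
    unique = []
    for inchi, molblock in fragments:
        if inchi not in seen:
            seen.add(inchi)
            unique.append((inchi, molblock))
    return unique


fragments = [
    ("InChI=1S/H2O/h1H2", "water, copy 1"),
    ("InChI=1S/CH4/h1H4", "methane"),
    ("InChI=1S/H2O/h1H2", "water, copy 2"),
]
print([mol for _, mol in drop_duplicate_fragments(fragments)])
```

Because a standard InChI encodes stereochemistry, two fragments only collapse into one when they are identical in that respect too, which matches the duplicates described in this row.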


  • Category: BAD_MISSINGPARTOFMOLECULE
  • Frequency: 724
  • Description: Part of the molecule is missing
  • Cause: OpenBabel doesn’t understand crystal symmetry – only the atoms in the CIF that are explicitly listed with positions are included in the resulting mol file, and those that are inferred by symmetry are not.
  • Solution: Make OpenBabel generate the full molecule from the symmetry in the CIF file, or recommend that a script/program that can process a CIF to generate another CIF with all atoms is run before OpenBabel.
BAD_MISSINGPARTOFMOLECULE example: ce_b202304k_5
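To illustrate what generating atoms from symmetry involves, here is a minimal sketch that applies one CIF-style symmetry operation string (such as "-x, y+1/2, -z") to fractional coordinates. The function name is hypothetical, and the parser only covers simple operations built from x, y, z and constant offsets; it is not a general implementation.

```python
import re
from fractions import Fraction

# A term is an optional sign followed by x, y, z, a fraction or a decimal.
TERM = re.compile(r"([+-]?)([xyz]|\d+/\d+|\d*\.?\d+)")


def apply_symop(op, xyz):
    """Return the image of fractional coordinates xyz under operation op."""
    values = dict(zip("xyz", xyz))
    image = []
    for component in op.lower().replace(" ", "").split(","):
        total = 0.0
        for sign, term in TERM.findall(component):
            value = values[term] if term in values else float(Fraction(term))
            total += -value if sign == "-" else value
        image.append(total)
    return tuple(image)


print(apply_symop("-x, y+1/2, -z", (0.25, 0.25, 0.5)))
```

Applying every operation of the space group to every listed atom gives all symmetry images. Deciding which of those images belong to the same molecule still needs a distance-based bonding check, which is not shown here.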


  • Category: BAD_PARTIALOCCUPANCY
  • Frequency: 432
  • Description: partial occupancy of multiple sites for a particular atom in the CIF file
  • Cause: CIF files sometimes specify multiple alternative sites for an atom, each with an occupancy of less than one. OpenBabel doesn’t recognise this and effectively assumes that the occupancy of every site is one, so that there are duplicates of some atoms or fragments in the mol file.
  • Solution: Where the _atom_site_occupancy is less than one, group together atoms into those which are alternatives of each other (by type, proximity, and those which add up to a total occupancy of 1) and choose only one of them to include in the final mol file (that with the highest site occupancy, or if two have equal occupancies of e.g. 0.5 then pick one at random). Note that there needs to be consistency, so that if for example a C is discarded, then all of the adjoining H’s with partial occupancy are also discarded but those bonded to the C that is included are included (as in the attached example).
BAD_PARTIALOCCUPANCY example: md_c2md20054f_1.jpg
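The selection rule proposed in this row can be sketched as follows. This is a simplified illustration, not a tested fix: the Site class, the 1.0 Å grouping distance and the function name are assumptions, ties are broken by taking the first site rather than a random one, and the consistency rule for attached hydrogens is not implemented.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Site:
    label: str
    element: str
    xyz: tuple        # Cartesian coordinates in angstroms
    occupancy: float


def pick_major_sites(sites, max_sep=1.0):
    """Keep fully occupied sites plus one site from each group of alternatives."""
    full = [s for s in sites if s.occupancy >= 1.0]
    groups = []
    for site in (s for s in sites if s.occupancy < 1.0):
        for group in groups:
            first = group[0]
            # Same element and close together: treat as alternative positions.
            if first.element == site.element and dist(first.xyz, site.xyz) <= max_sep:
                group.append(site)
                break
        else:
            groups.append([site])
    # max() returns the first site when occupancies are equal.
    return full + [max(g, key=lambda s: s.occupancy) for g in groups]


sites = [
    Site("C1", "C", (5.0, 5.0, 5.0), 1.0),
    Site("C2A", "C", (0.0, 0.0, 0.0), 0.6),
    Site("C2B", "C", (0.5, 0.0, 0.0), 0.4),
    Site("O1A", "O", (3.0, 0.0, 0.0), 0.5),
    Site("O1B", "O", (3.4, 0.0, 0.0), 0.5),
]
print([s.label for s in pick_major_sites(sites)])
```

A fuller version would also check that the occupancies within each group add up to 1, as suggested above, before treating the sites as alternatives.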


Many of the problems were caused by idiosyncrasies or errors in the input CIFs. On the whole OpenBabel did not handle these well (e.g. by writing an error message and terminating); in the majority of cases it went into an infinite loop and the program hung. Because of this, and because the OpenBabel conversions were part of a longer script, all OpenBabel jobs had to be run with an arbitrary timeout and killed if they were still running when it expired, which may have discarded some valid but long-running jobs. We will investigate whether there is a validation program that can be run automatically on CIFs to filter out ones with these problems (similar to the CCDC’s EnCIFer, but which can be run programmatically). It would, however, be relatively straightforward to make OpenBabel more reliable by having it exit cleanly when it encounters these problems, so that pre-validation would not be necessary. These problems are listed in the table below:

Details Example
  • Frequency: 378
  • Description: CIF doesn’t contain any coordinates
  • Cause: Some CIFs contain e.g. powder diffraction refinement data and don’t contain coordinates.
  • Solution: OpenBabel already issues an error: “CIF Error: no atom found ! (in data block:XXX)” – simply abort the program if this is found (rather than trying to continue).

Files: CC_B502254A_3.txt

  • Category: CIF_MISSINGLOOP
  • Frequency: 85
  • Description: CIF is missing a “loop_” line
  • Solution: Do an initial check that there is at least one loop_ line in the expected place before attempting to do the conversion.
CIF_MISSINGLOOP example: ob_c2ob25400j_2.jpg


  • Category: CIF_COMMENTEDFIELD
  • Frequency: 36
  • Description: if there is a CIF field name in a commented section of the CIF, OpenBabel doesn’t ignore it and goes into an infinite loop
  • Solution: It would be trivial to make sure that OpenBabel ignores CIF field names which are commented out (between a pair of semicolons).
CIF_COMMENTEDFIELD example: dt_c3dt33040k_1.jpg
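The three checks in this table could be combined into a simple pre-validation step that runs before OpenBabel. The sketch below is only a rough filter under stated assumptions, not a CIF parser: it skips text held between lines that begin with a semicolon, then looks for a loop_ line and for the standard coordinate field names _atom_site_fract_x or _atom_site_Cartn_x. The function name and the wording of the messages are hypothetical.

```python
def precheck_cif(text):
    """Return a list of problems that would make conversion pointless."""
    kept = []
    in_text_block = False
    for line in text.splitlines():
        if line.startswith(";"):
            # A semicolon in the first column opens or closes a text block.
            in_text_block = not in_text_block
            continue
        if not in_text_block:
            kept.append(line.strip().lower())

    problems = []
    if not any(line.startswith("loop_") for line in kept):
        problems.append("no loop_ line")
    coordinate_fields = ("_atom_site_fract_x", "_atom_site_cartn_x")
    if not any(line.startswith(coordinate_fields) for line in kept):
        problems.append("no atomic coordinates")
    return problems


good = "\n".join([
    "data_test", "loop_", "_atom_site_label",
    "_atom_site_fract_x", "_atom_site_fract_y", "_atom_site_fract_z",
    "C1 0.1 0.2 0.3",
])
print(precheck_cif(good))
```

Note that this only confirms the coordinate field names are declared, not that any rows of data follow them, so it would reduce rather than eliminate the need for a timeout.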


The following OpenBabel bugs were the most frequent, but will be difficult to fix. They arise because the CIF format does not record charges on atoms/ions or the types of bond between them, so OpenBabel needs to work them out, which is hard to do correctly.

Details Example
  • Category: BAD_CHARGEMISSING
  • Frequency: 830
  • Description: One or more ions in the molecule have the wrong charge on them in the resulting mol file
BAD_CHARGEMISSING example: md_c2md20105d_1.jpg


  • Frequency: 747
  • Description: One or more atoms or ions in the molecule have the wrong coordination – problem observed in metal ions, S, P, Se and B
BAD_CHARGEMISSING example: ob_b314176d_1.jpg


  • Category: BAD_BONDMISSING
  • Frequency: 587
  • Description: One or more of the bonds in the molecule are of the wrong order e.g. a single bond instead of a double bond.
BAD_BONDMISSING example: MD_c3md00077j_1.jpg


  • Category: BAD_WRONGBOND
  • Frequency: 452
  • Description: Wrong sequence of single/double bonds.
BAD_WRONGBOND example: nj_b301045g_3.jpg


  • Category: BAD_NOCOORDL
  • Frequency: 52
  • Description: no coordination to a ligand.
BAD_NOCOORDL example: ob_b307014j_1.jpg


  • Category: BAD_MISSINGH
  • Frequency: 18
  • Description: missing hydrogen.
BAD_MISSINGH example: ob_b311669g_3.jpg


There were also some problem mol files which either cannot be fixed by OpenBabel (since they resulted from errors or limitations of the input CIF files, which cannot be fixed retrospectively) or are too difficult to fix and/or occur too infrequently to be worth the effort:

  • There were 237 cases where there were solvent molecules in the CIF (many of which have missing hydrogens, partial occupancy of the molecule or part of the molecule etc.) which give rise to spurious oxygens, fragments of molecules and radicals in the resulting mol file (see example files). 148 of these cases are just water solvent molecules with either missing or detached hydrogen atoms. The poor definition of the solvent molecules is a limitation of CIF files from diffraction, so it is not possible for OpenBabel to define them better in the output mol that is derived from them. However, running OpenBabel with the -r option to remove all but the largest contiguous fragment was quite successful at removing these problem solvent molecules, so no further action is required and we will use this option in the future.
  • There were 81 cases where there was at least one missing hydrogen in the original CIF (or in 3 cases, all hydrogens missing) – see example files.
  • Some CIFs contain crystal structures which correspond to continuous networks rather than small molecules (e.g. polymers, MOFs, zeolites, POMs) which cannot meaningfully be captured in mol format – see example files.
  • There were a few (24) cases where the stereochemistry in the mol file obtained is incorrectly defined. However, because on the whole the stereochemistry was well interpreted by OpenBabel and these cases were relatively few, it probably isn’t worth upsetting the apple cart to investigate these further – see example files.
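For readers unfamiliar with what keeping the largest contiguous fragment means in practice, the idea can be sketched as a connected-component search over the bond list. This is an illustration only and not OpenBabel's implementation; the function name is hypothetical, and taking "largest" to mean "most atoms" is an assumption.

```python
from collections import defaultdict, deque


def largest_fragment(n_atoms, bonds):
    """Return the sorted atom indices of the biggest connected fragment.

    n_atoms: number of atoms, indexed 0..n_atoms-1
    bonds: iterable of (i, j) index pairs
    """
    neighbours = defaultdict(set)
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    best = []
    for start in range(n_atoms):
        if start in seen:
            continue
        # Breadth-first search collects everything bonded to this atom.
        component = [start]
        seen.add(start)
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nb in neighbours[current]:
                if nb not in seen:
                    seen.add(nb)
                    component.append(nb)
                    queue.append(nb)
        if len(component) > len(best):
            best = component
    return sorted(best)


# A three-atom chain, a two-atom fragment and a lone oxygen (index 5).
print(largest_fragment(6, [(0, 1), (1, 2), (3, 4)]))
```

A stray water oxygen with no bonded hydrogens forms a fragment of one atom, which is why this approach removes the spurious solvent atoms described above.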

For some time now it has been possible to access relevant SureChem patent information from a ChemSpider compound page in the Patents Infobox. ChemSpider compounds are also linked to and from the relevant RSC articles, which has allowed us to form a new partnership between RSC Publishing and SureChem which relies on ChemSpider taking the pivotal role of linking internet chemistry together.

In the RSC article landing pages there is a “Compounds” tab which shows the key compounds that the article is about – as shown in this example. For each compound there is now a link to view the SureChem patent information associated with that compound as below:

The RSC Publishing platform article landing page showing SureChem patent information

SureChem and SureChem’s new free offering, SureChemOpen, offer a suite of patent chemistry data solutions, for example allowing their patents to be found from a structure or substructure search. Now, for each compound returned from such a search it is possible to view any linked ChemSpider compound pages and the number of associated RSC publications (and follow a link to view these articles).

This linking between SureChem and the RSC publication platform relies on ChemSpider (and the standard InChI chemical identifier) providing a bridging link to both, which ensures that the system is accessible, standards-based and scalable, making it easy for future partners to join.

The RSC’s objective is to advance the chemical sciences, not only at a research level but also to provide tools to train the next generation of chemists. ChemSpider contains a lot of useful information for students learning Chemistry but there is also a lot of information which is not relevant to their studies which might be confusing and distracting. For some time we have been considering the concept of an educational version of ChemSpider, aimed at students (and their teachers or lecturers) in their last years of school, and first years of university (ages 16-19), which restricts the compounds and the properties, spectra and links displayed for each, to those relevant to their studies. As a result, we are pleased to announce the launch of the Learn Chemistry Wiki which not only fulfils this aim, but also takes it further. This project was developed in a collaboration between Dr Martin Walker at the State University of New York at Potsdam, ChemSpider and the Royal Society of Chemistry’s Education team.
The Learn Chemistry Wiki contains over 2000 “substance” pages which correspond to simple compounds that would commonly be encountered during the last years of school and first years of university. Each of these pages corresponds to a ChemSpider compound, from which it dynamically retrieves compound images, a summary of its properties (molecular formula, mass, IUPAC name, appearance, melting and boiling points, solubility, etc.) and links to view safety sheets and spectra. Each substance page also displays text from Wikipedia, retrieved via the Wikipedia links in ChemSpider.

The Learn Chemistry Wiki also goes a step further: as well as compound information in isolation, it contains laboratory experiments (each with parallel sections containing an overview, teachers’ notes and students’ handouts), quizzes and tutorials, which are linked to the compound information to put them into context. The wiki is based on the MediaWiki platform (which allows multiple users to contribute collaboratively, since the website is intended to be a community website), but extends it to incorporate functionality similar to that of ChemSpider, invoked via custom-made extensions. For example, it is possible to draw structures using GGA’s Ketcher in order to find structures, or to draw answers to quiz questions (for example to specify the product of a particular reaction). It is also possible to include an interactive spectrum retrieved from ChemSpider in any wiki page, using the ChemDoodle spectrum viewing widget in browsers which support canvases or the JSpecView applet in those that don’t.

For an overview and demonstration of the Learn Chemistry Wiki site see the Learn Chemistry Wiki site tour webpage or the Learn Chemistry Wiki overview demo video:

The Learn Chemistry Wiki is part of the RSC’s new Learn Chemistry platform, which provides a central access point and search facility to make it easier to access the RSC’s various teaching resources.

Only two days until the start of this year’s Fall ACS meeting in Denver. The ChemSpider team is busy preparing for the meeting, packing bags, polishing talks and honing workshop skills.

Please drop by and say “Hi!”

We’d like to repeat our invitation to everyone at the conference to drop by the RSC booth (Booth 1100), where of course you can chat with the ChemSpider team, get a quick demo (and find out more about our latest features), pick up our hot-off-the-press User Guide or scoop some exclusive ChemSpider goodies!

To celebrate the release of the new iPhone/iPad app* we have a limited number of covers for 3G and 4G iPhones as well as iPads.

*The app itself is free to download from the AppStore.

You can also find out about lots of other things that the RSC does: from publishing books and journals to the promotion of chemistry worldwide. We’ll also have lots of information on our new e-membership option, which is making its debut at this meeting. Also keep an eye out for members of our Editorial staff from journals including OBC, MedChemComm, PCCP, Soft Matter and RSC Advances, who will be scouring the conference in search of lots of new and exciting research.

Natural Product & Synthetic Chemists

I’d like to make an extra special invitation to any Synthetic chemists and Natural products chemists – from PhD students to Professors (please pass this on to all your friends and colleagues who will be at the meeting). The ChemSpider team really wants to hear about your research. Tell us about your latest publication or the work that you are most proud of, and we can make sure that your key compounds from these publications are in ChemSpider, on a platform freely accessible to chemists everywhere. If you are more interested in methodology you shouldn’t feel left out – ask us about ChemSpider Synthetic Pages.


ChemSpider related talks and workshops

Antony Williams (most definitely the hardest working man I know) is giving a number of talks and workshops (details below) which are sure to be entertaining as well as thought-provoking and will be well worth squeezing into your schedule.

We look forward to meeting you.


“Aligning scientific expertise and passion through a career path in the chemical sciences”

Colorado Convention Center, Room: 110, Sunday 28th August 2011, 1.40PM – 2PM


“Chemistry in the hand: The delivery of structure databases and spectroscopy gaming on mobile devices”

Colorado Convention Center, Room: 110, Monday 29th August 2011, 9.05AM – 9.35AM


“ChemSpider: Does community engagement work to build a quality online resource for chemists?”

Colorado Convention Center, Room: 110, Tuesday 30th August 2011, 10.10AM – 10.50AM


“An Introduction to ChemSpider – A Combination Platform of Free Chemistry Database, Free Prediction Engines and Wiki Environment”

Colorado Convention Center, Room: 503, Wednesday 31st August 2011, 08.30AM – 11AM


“Structure representations in public chemistry databases: The challenges of validating the chemical structures for 200 top-selling drugs”

Colorado Convention Center, Room: 110, Wednesday 31st August 2011, 10.45AM – 11.05AM

We have text mined compound names from all RSC 2008-2010 journal articles and loaded these into ChemSpider, adding about 26,000 new-to-ChemSpider compounds with links back to the published articles. We’ve also simplified the view of compound name and chemical/biochemical term highlighting within the Publishing Platform HTML view, so readers can link out from compound names (direct to ChemSpider for related compound information) and from chemical and biochemical terms (to other linked articles). We’ll be extending this to cover our 2011-and-then-ongoing publications, then looking to go further back into our journal archive. Later this week we should also have the compounds visible from the article home page, also linking through to ChemSpider.

We have also worked with the Utopia Documents team to apply these enhancements to our PDFs, so with the free Utopia Documents PDF viewer (originally developed in conjunction with Portland Press for the Biochemical Journal), readers get any enhancements overlaid on top of the PDF as they’re reading and can link out just as they can from the HTML. As this is powered by an API from our Publishing Platform, any additional links we make in future will be reflected in real time without having to update the PDF. Anyone who’s seen Steve Pettifer’s Utopia demonstrations tends to say “wow” at the potential, so many thanks to the Utopia team in Manchester for adding support for RSC articles. As above, this will work on the 2008-2010 articles just being loaded, and as we extend the coverage Utopia will pick up and display the additional links for these papers.

I recently talked about ChemSpider and some of our recent and future developments at the STM Innovations Seminar 2010, in a flash session of 20-slide talks timed @ 15 secs a slide. Great fun to do, and a format which ensures much less suffering for the audience – or at least shorter but more concentrated suffering. Anyway, here’s ChemSpider in 5 minutes for an audience of non-chemists. The other slides and videos from the Seminar are also available. Thanks to STM for hosting the event and River Valley for filming it.

The new content delivery platform from RSC Publishing provides powerful, fast access to journals, books and databases. You can search across nearly one million articles using one simple interface and refine your results through intuitive filters.

With the latest release, a new Compounds tab displays the key chemical compounds from a journal article when it has been semantically enriched via RSC’s Project Prospect. Each compound links back to ChemSpider, giving access to its 400 chemical data sources, and users can also find related RSC journal articles containing the same compound.



Try it now by clicking on the ‘Compounds’ tab in the article - Total synthesis of (±)-Vertine with Z-selective RCM as a key step, Laetitia Chausset-Boissarie, Roman Àrvai, Graham R. Cumming, Céline Besnard and E. Peter Kündig, Chem. Commun., 2010, 46, 6264.

Effectively, you can run a text search within the Publishing Platform, perhaps by searching for your research topic or favourite author, to identify new papers and view the properties for any compounds in the article within ChemSpider.

ALPSP Publishing Innovation award 2010

Some of the team were present at the ALPSP Conference last Thursday – as the envelope was opened to announce ChemSpider as the winner of the ALPSP Publishing Innovation award for 2010! The judging panel commented that “[ChemSpider] has quickly become a highly valued and comprehensive community resource and has immense potential for future development”.

We’re especially proud as we were up against the other excellent shortlisted finalists of DataSalon’s Mastervision (which was highly commended, and we use it ourselves), the Semantic Biochemical Journal from Portland Press and the University of Manchester, and the AIP’s UniPHY social networking site.

We also managed to recreate the prize giving with Antony & Valery this morning – difficult to recreate the atmosphere of a conference dinner at 9am on an autumn Monday morning though…

Pics after the jump


Last night I gave a presentation at the BAGIM meeting in Boston. The abstract is below together with the embedded presentation from Slideshare.

ChemSpider – Is This The Future of Linked Chemistry on the Internet?
ChemSpider was developed with the intention of aggregating and indexing available sources of chemical structures and their associated information into a single searchable repository and making it available to everybody, at no charge. There are now hundreds of chemical structure databases covering literature data, chemical vendor catalogs, molecular properties, environmental data, toxicity data, analytical data etc. and no single way to search across them. Despite the diversity of databases available online, their inherent quality, accuracy and completeness is lacking in many regards. ChemSpider was established to provide a platform whereby the chemistry community could contribute to cleaning up the data, improving the quality of data online and expanding the information available to include data such as reaction syntheses, analytical data and experimental properties. ChemSpider has now grown into a database of almost 25 million chemical substances, grows daily, and is integrated with over 400 sources, many of these directly supporting the Life Sciences. This presentation will provide an overview of our efforts to improve the quality of data online, to provide a foundation for a linked web for chemistry and to provide access to a set of online tools and services to support access to these data.

When you’re viewing a compound page in ChemSpider e.g. hydrogen peroxide there are several ways to find more detailed RSC information (articles and books) about that compound:

  1. In the “Articles” infobox, the results under the “Links & References” tab are links to various journals that have either been deposited or added by ChemSpider users. When the RSC enhances one of its articles, the most important compounds in the article are submitted to ChemSpider for deposition (see here for more details), and when this happens, the article details are also deposited so that a link will appear in this box. As such, there usually won’t be thousands of links in this box, but those that are there will, for example, pick up references to compounds which aren’t named explicitly in the article (but are drawn out in a figure, say) and as such couldn’t be found by a simple text search.
  2. The results under the “RSC Journals” tab in the “Articles” infobox are the results of passing a search into the RSC publishing platform to retrieve all of the journal articles that contain any approved synonym for the compound (in the “Identifiers” infobox under the “Names and Synonyms” tab). Since these lookups first appeared in ChemSpider 6 months ago (see here for more details) this platform has progressed from being a beta version to being the fully-fledged publishing platform for searching on and delivering RSC journals, books and databases. To investigate the functionality of this platform more and refine your search results list, click on the link above the list of results to “Click here to explore results” and this will allow you to sort the results by date, or apply filters (e.g. to restrict the results set by author, date range, journal etc.). This is useful since for common chemicals, the list of results returned can be long.
  3. The “RSC Books” tab under “Articles” performs a similar search on occurrences of the approved synonyms but on RSC books rather than RSC journal articles.
  4. Likewise, the “RSC Databases” infobox shows search results of the same approved synonyms but this time the results are from the various RSC abstract databases named in its various tabs. This means that they contain references to the compounds in non-RSC articles.

Links to RSC Articles

Laying out intrinsically three-dimensional molecular structures in a readable way on a two-dimensional page is a hard problem for human beings, let alone for algorithms, which is why ChemSpider stores a 2D layout alongside the InChI, which only describes which atom is connected to which other atom.

This is a really valuable resource for enhancing our RSC journal articles, so we’ve been experimenting with adding galleries to compounds, with examples here and here. Is this what PDFs you download from the website should look like? Would a digest gallery of the latest articles published be more useful? Do let us know.

As this is my first posting on the ChemSpider blog I should introduce myself. I’m Colin Batchelor and I’m in the Informatics team at the RSC. Some of my work is on ChemSpider, but I also work on informatics for RSC Publishing, and I’m a member of the InChI subcommittee.

Richard Kidd

We’re very proud that ChemSpider is one of the shortlisted finalists for the ALPSP 2010 Publishing Innovation award – the other finalists are Mastervision from DataSalon, Semantic Biochemical Journal from Portland Press, and UniPHY from the American Institute of Physics. The winners will be announced on 9 September at the ALPSP International Conference, so good luck to all, fingers crossed here and we’ll let you know the result.

First post for me on the ChemSpider blog, so a quick introduction. I’m Richard Kidd, and I manage the Informatics team at the RSC in Cambridge. We work on the technical enhancements of our publications (such as 2007 ALPSP Innovation award winner RSC Prospect) and we’re supporting the current and future ChemSpider developments alongside the original team – and this also includes support for ChemSpider users and depositors. We’re all here.

18 June 2010

ChemSpider, the Royal Society of Chemistry’s online chemistry community and database, scooped the Innovative Software Award at the iExpo/KM Forum 2010.

The prize is organised by GFII (the Association for Professionals of the Information Industry) and recognises leading software providers in the information industry for their innovative capabilities and user interfaces.

The organisers, with the intention of promoting the industry of professional information and knowledge management, dedicated the prize for the 16th consecutive year to the most innovative organisations.

Presented by Didier Benard, from Sanofi Aventis R & D, the award recognises a non-commercial initiative in enhancing information online whether for the professional community or for the general public. The jury selected ChemSpider as an award winner for providing free access to data on chemical information (both text and structure-based), which is reliable and controlled by an international expert community.

ChemSpider links together compound information across the web, providing free text and structure search access of millions of chemical structures. With an abundance of additional property information, tools to curate and use the data, and integration to a multitude of other online services, ChemSpider is the richest single source of structure-based chemistry information available online.

Antony Williams, VP of Strategic Development, ChemSpider, said: “The recognition of our efforts to provide the internet’s premier online search engine for chemistry by such a distinguished panel of judges is very flattering. This award further encourages us to remain focused on the delivery of the world’s primary free chemistry portal.”

I gave a talk at the ACS Meeting in the Future of Scholarly Communication Meeting yesterday. The abstract is below…and I DID talk about how Viagra keeps flower stems stiff. It was recorded and should go online soon and I will point to it again. For now I have linked the Slideshare presentation

Enhancing discoverability across Royal Society of Chemistry content by integrating to ChemSpider, an online database of chemical structures

The ability to query across a chemistry publisher’s content using chemical structure searching can dramatically enhance discoverability. RSC has been applying a number of procedures to integrate RSC’s ChemSpider community resource with our published content and databases. These include: 1) entity extraction procedures 2) chemical name conversion procedures using software algorithms and curated dictionaries 3) semantic markup and 4) crowdsourced curation processes. This presentation will provide an overview of the processes we have utilized in order to provide structure-based integration to RSC content. We will discuss our ongoing efforts to extend the approaches to the mining of data from the rich supplementary information sections of many RSC publications. Our intention is to provide access to synthesis procedures and analytical data and further enrich the ChemSpider database for the benefit of the chemistry community.

The functionality discussed below will be released at the ACS Spring Meeting during the week of March 21st 2010

Following on from the last post regarding integrating to RSC Databases via the RSC Publishing Beta web services layer, this post expands on the nature of the integration that we have been able to introduce. The RSC Publishing beta gives us access to over 500,000 journal articles, book chapters and database records through one simple search interface. Using a similar approach to that outlined for the RSC database searches, namely using validated synonyms as the basis of the search for chemicals, we are able to search across the entire ePlatform of articles and retrieve hits as shown below. The hits are under the RSC journals tab.

Since the RSC publishing platform segregates the journals from the books, the same search will also return results from RSC books. Our tests show that this is incredibly fast and highly accurate. This is our first venture into tapping into the chemical compounds sitting inside the RSC archive. More work is coming…

If you look at the tabs below you will also see that we have integrated to Google Books, Google Scholar and the Microsoft Academic Search. We are truly integrating to available internet resources to bring together the benefits of all of the primary search engines available.


The functionality discussed below will be released at the ACS Spring Meeting during the week of March 21st 2010

The Royal Society of Chemistry has a whole series of databases. None of them have been structure searchable…until now. As with our PubMed integration and our Google Patents integration rolling out shortly, just because a database hasn’t had the chemical structures extracted and indexed doesn’t mean that those resources cannot be made “structure searchable”. It’s not a subtle distinction however, as discussed in the Google Patents blog post. These types of integrations depend on the correct association between chemical names and structures, access to an API allowing facile and flexible searching and, something that is purely serendipitous in nature, the absence of overlaps between chemical names and common language.

We have used the recently announced RSC Publishing beta platform and the API made available to us to enable the searching. As my colleague Graham McCann announced recently “(the) platform gives access to over 500,000 journal articles, book chapters and database records through one simple search interface. The new platform delivers faster browsing, intelligent searching and more intuitive navigation and is open for beta testing now.”

Our approach has been to search the title and the abstract for each of the databases for all of the validated identifiers. It works. It is FAST and it provides “structure-related” access to all six RSC databases. An example screen shot is below where a search on chlorobenzene retrieves data from each of the following databases: Mass Spectrometry Bulletin, Laboratory Hazards Bulletin, Methods in Organic Synthesis, Catalysts and Catalysed Reactions, Natural Product Updates and Analytical Abstracts. The screen shot below shows the Analytical Abstracts records linked by the term chlorobenzene in the title or abstract itself: 284 hits in a fraction of a second. The abstract is linked out to the original article via DOI, where possible.
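To make the idea concrete, here is a minimal sketch of searching titles and abstracts for a compound’s validated identifiers. It is not the RSC Publishing API or the ChemSpider code: the `Record` class, the function names and the sample records are all hypothetical, and the whole-word matching rule is just one assumed way of avoiding false hits such as "dichlorobenzene" when searching for "chlorobenzene".

```python
import re
from dataclasses import dataclass


@dataclass
class Record:
    """A hypothetical database record with only the fields we search."""
    database: str
    title: str
    abstract: str


def build_pattern(synonyms):
    """Compile one case-insensitive, whole-name pattern for all synonyms."""
    # Longest names first so multi-word names are tried before shorter ones.
    ordered = sorted(synonyms, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    # Reject matches glued to letters, digits or hyphens on either side,
    # so "chlorobenzene" does not hit "1,2-dichlorobenzene".
    return re.compile(r"(?<![\w-])(?:" + alternatives + r")(?![\w-])", re.IGNORECASE)


def search(records, synonyms):
    """Return the records whose title or abstract mentions any synonym."""
    if not synonyms:
        return []
    pattern = build_pattern(synonyms)
    return [r for r in records if pattern.search(r.title) or pattern.search(r.abstract)]


if __name__ == "__main__":
    # Made-up records, for illustration only.
    records = [
        Record("Analytical Abstracts", "Determination of chlorobenzene in water", ""),
        Record("Methods in Organic Synthesis", "Nitration of 1,2-dichlorobenzene", ""),
        Record("Laboratory Hazards Bulletin", "Solvent hazards", "Phenyl chloride vapour exposure"),
    ]
    for hit in search(records, ["chlorobenzene", "phenyl chloride"]):
        print(hit.database, "|", hit.title)
```

The quality of the results depends entirely on the synonym list: a name that is also a common English word would match far too much, which is why only validated identifiers are used.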


My personal favorites in the set of databases are the Natural Product Updates (NPU) and the Methods in Organic Synthesis (MOS) databases. The NPU database contains tens of thousands of natural product chemical structures, together with chemical names, references and some physical properties. Rich resources for ChemSpider. MOS includes reaction schemes, titles and bibliographic details. Rich resources to connect to ChemSpider SyntheticPages in the future.

We have only just started to tap into the riches contained within the RSC archive. It’s like stumbling across a roomful of rubies to pick up diamonds. There is content all around us waiting for us to connect. We will connect this up to ChemSpider and make it available. Access to the databases will be shown at the ACS Meeting in San Francisco.

From the early days of the acquisition of ChemSpider by the RSC we have been focused on accessing the rich content that the RSC has contained in its databases and in its rich archive. We have been working hard for a number of months now to integrate systems, projects and processes into ChemSpider so that RSC chemistry is more discoverable. What we will be unveiling in the next few days we believe is big. We’ll roll it out one piece at a time. The last blog post discussed the deposition of new compounds from RSC prospected articles into ChemSpider. The email below results from the deposition of compounds from one article: one set of 10 structures that are deposited directly into ChemSpider when the article goes live. These are compounds that are deposited and live immediately, not abstracted later. Imagine when we are doing this for all RSC articles, databases and books….

ALL of the compounds below are NEW to the ChemSpider database…every one of them. While not all RSC articles are only about novel compounds, clearly there are new compounds moving into the database from the RSC publications.

Dear RSC Prospect,

This email is to notify that your deposition (#3427) has been published. Below please find a list of links to the structures that belong to your deposition:



The structures link back directly to the RSC article via DOI as shown below.


We’ve taken the first step towards users being able to seamlessly bounce back and forth between finding compounds of interest using the ChemSpider search and selection tools and finding more information about them in RSC journals…

I’m pleased to announce that we’ve just switched on a deposition system which will take compounds from the prospected version of RSC articles as they are published and automatically deposit them into ChemSpider, making a link back to the original article from the new compound page. An example of a new compound is here which was generated when this article was prospected. The same deposition process is used to make links from existing ChemSpider compounds to new RSC articles, for example here was generated when this article was published.

This is basically a way to stick our toe in the water to investigate how much intervention and cleaning is necessary to deposit compounds when all the information that we have been storing for them is the InChI without any 2D layout information (which is an issue that other potential data sources may also face). To do this we’ve been making use of the ChemSpider webservices to download the mol files of InChIs already in ChemSpider, or using the InChItoMol webservice to generate new mol files where they don’t exist already. Tracking and fixing problems as they crop up at this manageable rate will help us when we face the larger task of importing all of the compounds that have been prospected in the past into ChemSpider.
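The lookup-then-generate logic described above can be sketched roughly as follows. This does not call the real ChemSpider webservices: `fetch_existing` and `convert` are hypothetical stand-ins (plain Python callables) for "download the mol file for an InChI already in ChemSpider" and "generate a mol file from an InChI", and the function name and return shape are assumptions made for illustration.

```python
def prepare_deposition(inchis, fetch_existing, convert):
    """Pair each InChI with a mol file, preferring one that already exists.

    fetch_existing(inchi) -> mol text, or None if the compound is not stored.
    convert(inchi)        -> newly generated mol text.
    Both callables are hypothetical placeholders for web service calls.
    Returns (ready, problems) so failures can be tracked and fixed later.
    """
    ready, problems = [], []
    for inchi in inchis:
        try:
            mol = fetch_existing(inchi)
            source = "existing"
            if mol is None:
                # Nothing stored yet, so fall back to generating a layout.
                mol = convert(inchi)
                source = "generated"
            if not mol:
                raise ValueError("empty mol file")
            ready.append((inchi, source, mol))
        except Exception as exc:  # keep going; record the problem instead
            problems.append((inchi, str(exc)))
    return ready, problems


if __name__ == "__main__":
    # Toy stand-ins: a dict plays the existing store, a stub plays the converter.
    stored = {"InChI=1S/CH4/h1H4": "MOL-FROM-STORE"}

    def fake_convert(inchi):
        return "MOL-GENERATED"

    ready, problems = prepare_deposition(
        ["InChI=1S/CH4/h1H4", "InChI=1S/H2O/h1H2"], stored.get, fake_convert
    )
    for inchi, source, _ in ready:
        print(inchi, "->", source)
```

Collecting failures in a separate list, rather than stopping at the first one, mirrors the stated aim of tracking and fixing problems as they crop up.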

I’d never published in an RSC journal until recently, when Sean Ekins and I published in Lab on a Chip. The process was good…fast and efficient. Now, I am an employee of the RSC, but these are objective comments and I’m looking forward to publishing more with the RSC. I do still publish with my co-authors with other publishers because of the nature of the work we do – cheminformatics fits well into ACS’ JCIM and the Journal of Cheminformatics, and NMR papers still fit well into Wiley’s Magnetic Resonance in Chemistry. Our J Cheminf paper is still the most accessed article in that journal.

I was impressed to get a letter from the Editorial Director of the RSC in my inbox last week. It was directed to me as an author in an RSC journal. What was impressive was the fact that he took the time to issue this to all authors as well as the fact that the letter discussed the impact factors for RSC now matching those of the ACS. Very impressive and a nice touch from Jim Milne!

Over the past few months there have been a number of discussions in the blogosphere about the future importance and value of classical impact factors, but for now they remain the industry’s primary measure of, well, impact. With that in mind it was good to see this announcement from the RSC regarding recently published impact factors for chemistry-related journals. Nice…it’s good to know that we joined an organization that continues to focus on its core mission and strengths and is acknowledged by the readers and the industry for its efforts.

25 June 2009
Publication of the 2008 impact factors, calculated by ISI, once again brought good news for authors and readers of RSC journals. 

Nearly all the RSC journals increased in impact factor, immediacy index and article influence, with an impressive average impact factor increase of 8.2%. Overall, the average impact factor for the RSC portfolio now stands at 4.7, equal to that of the ACS collection. 

RSC journals feature in the top 10 rankings (by impact factor and immediacy index) in 6 of the 7 core chemistry categories* as listed on ISI, and of the top 100 chemistry journals, ranked by impact factor, 15 are from RSC Publishing. 

Three individual journals to highlight include: 

ChemSocRev – with a 33% increase, its impact factor now stands at 17.419, confirming its position as a leading international chemistry journal. This flagship journal now contains the greatest number of chemical reviews published in 2008 of any chemistry review journal – making it truly, first in its class. 

Lab on a Chip – celebrates a 28% rise taking its impact factor to 6.48, placing it within the top ten journals in the multidisciplinary chemistry category. 

PCCP – rises over 20% to its highest ever value of 4.06. Additionally, its new immediacy index (0.81) remains the highest value for any journal publishing general primary research in the fields of physical chemistry and chemical physics. 

Editorial Director, Dr James Milne, reflected on the outstanding performance of the RSC journals, ‘the impressive increases in impact factor for the RSC portfolio of journals are a direct reflection on the world class authors who regularly publish in these prestigious international titles. Over the last five years, RSC journals have attracted a significant increase in submissions, with nearly 60% more material published during this same short period.’ He continues, ‘to provide more articles and also higher quality articles, is a clear reflection of the dedicated support the journals receive from authors, editors and referees throughout the world; for this contribution, I would like to sincerely thank all the scientists involved.’ 

Other RSC highlights include: 

Analyst – impact factor now 3.761 (32% rise over 3 years)

ChemComm – impact factor now 5.340 (21% rise over 3 years)

CrystEngComm – impact factor now 3.535, and leading journal for immediacy (0.684) in the crystallography category

Dalton Transactions – highest ever impact factor of 3.580 (an 11.5% increase)

Faraday Discussions – impressive impact factor of 4.604

Green Chemistry – impact factor of 4.542, and the leading journal in its field

JAAS – a 20% rise in impact factor to 4.028 (confirming its position as the leading journal in atomic spectrometry)

Journal of Environmental Monitoring – impact factor now 1.989 (26% rise over 3 years)

Journal of Materials Chemistry – highest ever impact factor of 4.646 (5th consecutive increase)

Molecular BioSystems – records an impressive second impact factor of 4.236

Natural Product Reports – at 7.450, the second highest impact journal in the medicinal and organic chemistry categories

New Journal of Chemistry – impact factor now 2.942, an 11% increase

Organic & Biomolecular Chemistry – enjoys its highest ever IF at 3.550 (a 12% rise)

Photochemical & Photobiological Sciences – the society-owned journal has an impact factor of 2.144

Soft Matter – impact factor now 4.586, and still the #1 journal for impact (and immediacy) in the field.

RSC is committed to providing a world-class publishing service to its authors, and to delivering cutting-edge chemical science to researchers throughout the world. The rise in citations, impact factors and immediacy indices provides a clear indication that more researchers than ever before are recognising journals from the RSC as a key resource to access the very best research.

Journals from RSC Publishing provide exceptional value for money, with high impact science, investment in award-winning technologies and flexible pricing models. To add RSC Journals to your collection, please contact our Sales team via the form below.

* The 7 Chemistry journal subject-categories as listed by ISI: Chemistry, Analytical; Chemistry, Applied; Chemistry, Inorganic & Nuclear; Chemistry, Medicinal; Chemistry, Multidisciplinary; Chemistry, Organic; Chemistry, Physical.


The Impact Factor provides an indication of the average number of citations per paper. Produced annually by ISI®, Impact Factors are calculated by dividing the number of citations made in a given year to articles published in the preceding two years by the number of citable articles published in those two years.

The immediacy index is a measure of how topical and urgent the papers published by a journal are. It is calculated by dividing the number of citations made in a given year to articles published in that same year by the number of articles published in that year.
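As a quick illustration of the two definitions above, the sketch below computes both ratios. The function and parameter names are our own, and the citation and article counts in the example are invented purely to show the arithmetic; they are not ISI data for any journal.

```python
def impact_factor(citations_to_prev_two_years, citable_items_prev_two_years):
    """Citations made this year to items from the previous two years,
    divided by the citable items published in those two years."""
    if citable_items_prev_two_years <= 0:
        raise ValueError("need at least one citable item")
    return citations_to_prev_two_years / citable_items_prev_two_years


def immediacy_index(citations_to_same_year, articles_same_year):
    """Citations made this year to articles published this year,
    divided by the number of articles published this year."""
    if articles_same_year <= 0:
        raise ValueError("need at least one article")
    return citations_to_same_year / articles_same_year


if __name__ == "__main__":
    # Invented counts for a hypothetical journal.
    print(f"impact factor:   {impact_factor(940, 200):.3f}")
    print(f"immediacy index: {immediacy_index(81, 100):.3f}")
```

Because the impact factor looks back two years while the immediacy index only counts same-year citations, the immediacy index is normally much smaller for the same journal.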

Data based on 2008 Impact Factors, calculated by ISI®, released June 2009.