Archive for the ChemSpider Services Category

Most of you in the Open Source and cheminformatics world will know of Open Babel. In the development team’s own words, “Open Babel is a chemical toolbox designed to speak the many languages of chemical data. It’s an open, collaborative project allowing anyone to search, convert, analyze, or store data from molecular modeling, chemistry, solid-state materials, biochemistry, or related areas.” We have now implemented Open Babel on the Services page to convert SMILES strings and InChI strings to chemical structures. Previously, the conversion of chemical names, trade names, SMILES and InChI was handled by the ACD/Labs Name to Structure batch software running in a service transaction mode. As previously discussed, we are no longer using that application and have replaced the SMILES and InChI conversion with Open Babel. We were very impressed by the ease of implementation and by the community dialog and support around this Open Source component. Our thanks go to the community supporting Open Babel. We are committed to giving back any developments we add.

The Name to Structure conversion originally provided by ACD/Labs Name to Structure has been replaced by a name “look-up” through the millions of names, synonyms and registry numbers we have in the index.
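As a rough illustration of what a name “look-up” means in contrast to algorithmic name-to-structure conversion, here is a minimal Python sketch. It is not ChemSpider's implementation: the record labels, the `normalize` rule and the function names are all assumptions made up for this example.

```python
from typing import Dict, List, Optional

# Illustrative sketch only: a tiny in-memory stand-in for a name index.
# The record labels and the normalization rule are assumptions.


def normalize(name: str) -> str:
    """Fold case and collapse whitespace so trivial variants still match."""
    return " ".join(name.lower().split())


def build_index(records: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every name, synonym or registry number to its record label."""
    index: Dict[str, str] = {}
    for record_label, identifiers in records.items():
        for identifier in identifiers:
            index[normalize(identifier)] = record_label
    return index


def look_up(index: Dict[str, str], query: str) -> Optional[str]:
    """Return the record label for a query, or None if it is not indexed."""
    return index.get(normalize(query))


# Hypothetical records: names and CAS registry numbers for two compounds.
RECORDS = {
    "record-A": ["Aspirin", "Acetylsalicylic acid", "50-78-2"],
    "record-B": ["Caffeine", "58-08-2"],
}
INDEX = build_index(RECORDS)
```

The point of the sketch is that a look-up can only return what is already in the index. A name that nobody has deposited returns nothing, whereas a name-to-structure algorithm would try to parse it.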

Open Babel Structure Generation

Please note that it is now possible to become a registered user of ChemSpider. The process is described here. Registration gives you certain capabilities, including the ability to actively curate structure synonyms, IDs and associated numbers, as well as to upload spectra to the ChemSpider database. Additional capabilities will be made available to registered users in the near future, so make sure to become a registered user.

Also, if you are not already subscribed to this blog and would like to receive the postings directly by email rather than visiting this webpage, sign up via FeedBurner by typing your email address in the box shown below.

Feedburner

The ChemSpider database is populated with millions of structures and associated identifiers in the form of systematic names, trade names, synonyms and associated structure numbers. Unfortunately, many of these identifiers are incorrect. We have performed some robotic cleansing of the database and continue to improve the algorithms used to cleanse the data. An update containing the cleansed data will be released shortly.

We are conscious of the fact that, despite our best efforts, a level of manual curation is necessary. With this in mind we have opened manual curation to registered users. Once you are logged in, every record with one or more names or identifiers will show a red cross, a green tick or a “revert” sign next to each one, as shown below.

Curating the synonym data

Any name or identifier that is deemed to be suspicious or incorrect can be removed from the list by clicking on the red cross. This puts a strike-through on the name but does NOT remove it from the database. Only a master curator, a role granted to a very small subset of ChemSpider users, can approve the final removal of the data from the database. If a synonym is “approved” by clicking on the green tick, it is underlined and moved to the top of the list. The list after some level of curation is shown below.

Curation level 1


The green double-headed arrow allows the curation to be reversed. Every curation activity is logged, and with time we will be able to identify new master curators. In the near future we will be locking in a list of names, synonyms and identifiers based on the input of our master curators and our own processes for identifying the best identifiers.
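The curation rules described above (flagging strikes a name out without deleting it, approving moves it to the top, any action can be reverted, everything is logged, and only a master curator can make a removal final) can be sketched as a small state model. This is a minimal Python sketch under stated assumptions, not ChemSpider's actual code: the class names, status strings and log format are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical status values: "unreviewed", "flagged" (struck out), "approved".


@dataclass
class Synonym:
    text: str
    status: str = "unreviewed"


@dataclass
class Record:
    synonyms: List[Synonym]
    log: List[Tuple[str, str, str]] = field(default_factory=list)

    def _set_status(self, user: str, text: str, status: str) -> None:
        for synonym in self.synonyms:
            if synonym.text == text:
                synonym.status = status
                self.log.append((user, text, status))  # every action is logged
                return
        raise KeyError(text)

    def flag(self, user: str, text: str) -> None:
        """Red cross: strike the name out, but keep it in the record."""
        self._set_status(user, text, "flagged")

    def approve(self, user: str, text: str) -> None:
        """Green tick: mark the name as approved."""
        self._set_status(user, text, "approved")

    def revert(self, user: str, text: str) -> None:
        """Double-headed arrow: undo an earlier flag or approval."""
        self._set_status(user, text, "unreviewed")

    def remove_flagged(self, user: str, is_master_curator: bool) -> None:
        """Final removal of flagged names; master curators only."""
        if not is_master_curator:
            raise PermissionError("only a master curator can remove data")
        self.synonyms = [s for s in self.synonyms if s.status != "flagged"]
        self.log.append((user, "*", "removed-flagged"))

    def display_order(self) -> List[Synonym]:
        """Approved names first; sorted() is stable, so other order is kept."""
        return sorted(self.synonyms, key=lambda s: s.status != "approved")
```

Keeping flagged names in the list until a master curator acts is what makes the revert action possible: nothing is lost until the final removal step.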




We encourage everyone to help us clean up the identifiers. If you believe there are other issues with a structural record, use the other mode of curation by selecting Help Curate Data at the top right-hand side of each record. Details about general curation are given elsewhere.

There are two common text string formats for chemical structures available today: SMILES, which has been around a long time and is well established, and InChI, the IUPAC Identifier, a fairly new player but quickly gaining ground.
There are a number of ways to generate these strings from a chemical structure representation. One I recommend for the desktop (I’m biased, since it’s one of the products I manage in my day job) is ACD/ChemSketch. There is a commercial version and a freeware version, downloaded over 780,000 times.

As an online service, however, you can use ChemSpider to convert SMILES strings or InChI strings. Simply visit the Services page, paste in your SMILES string or InChI string and click Convert to Structure. The services provided here utilize the ACD/Name to Structure capabilities, but the same conversion can also be performed using the Open Babel libraries. Following conversion of the identifier, the structure can be downloaded to the desktop for further manipulation.

For example, the SMILES string shown here can be converted as shown below: COc4cc1c(CCN3CC\C2=C\C[C@@H](C[C@@]123)OC)cc4OC
A similar operation can be conducted for InChI string conversion. In a later “Did You Know” posting we will show you how to generate SMILES and InChIs using ChemSpider services.

SMILES to structure
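For readers who want to try the same kind of conversion locally, the following is a minimal sketch using the Open Babel Python bindings (`pybel`). It assumes Open Babel 3.x is installed, and it is not the code behind the ChemSpider Services page. The helper names `detect_format` and `to_molfile` are made up for this example. The format check relies only on the fact that InChI strings begin with `InChI=`, and the sketch does not generate atom coordinates.

```python
# Example SMILES from the post, kept as a raw string because it contains
# backslashes (used in SMILES to mark double-bond geometry).
EXAMPLE_SMILES = r"COc4cc1c(CCN3CC\C2=C\C[C@@H](C[C@@]123)OC)cc4OC"


def detect_format(identifier: str) -> str:
    """Return the Open Babel format code for the input: 'inchi' or 'smi'."""
    if identifier.strip().startswith("InChI="):
        return "inchi"
    return "smi"


def to_molfile(identifier: str) -> str:
    """Convert a SMILES or InChI string to MDL molfile text.

    Assumes the Open Babel 3.x Python bindings are installed. The import is
    done lazily so that detect_format() works without them.
    """
    from openbabel import pybel

    text = identifier.strip()
    molecule = pybel.readstring(detect_format(text), text)
    return molecule.write("mol")
```

With Open Babel installed, `to_molfile(EXAMPLE_SMILES)` returns molfile text that can be saved with a `.mol` extension and opened in a structure editor.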