Copyright © 2014 Aileen Day
Following on from our previous blog post about extracting chemical structures (as mol files) from their crystal structures (CIF files) in the RSC archive using OpenBabel, it transpired that the Crystallography Open Database (COD) were conducting a similar project to extract the chemical connectivity (in SMILES format) from their large collection of openly accessible CIF files, also using OpenBabel. This opened up the possibility of linking ChemSpider to COD (and vice versa) by comparing these SMILES with ChemSpider structures, and it has resulted in 34,768 new links being made, each with a corresponding CIF in ChemSpider.
At the beginning of February there were 262,817 CIFs in COD, of which 78,473 had been converted into SMILES (both numbers have been increasing daily since then). We downloaded these SMILES and ran a structure search of ChemSpider on each one using the StructureSearch operation of the ChemSpider Search web service. Those SMILES that were not yet in ChemSpider were converted into mol files using OpenEye and reviewed by a ChemSpider curator with a view to depositing the suitable structures into ChemSpider as new compounds. This curation has also allowed us to give feedback to COD about SMILES that look suspicious and suggest a problem with the conversion process, for example charge and radical issues, undefined stereochemistry for sugars, missing stereochemistry, and the duplication of molecules or fragments within the same CIF. Since ChemSpider is primarily a collection of small organic molecules, many of the large number of metal-organic complexes were omitted simply because they weren’t within our scope.
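The matching step described above can be sketched in Python roughly as follows. This is only an illustration of the workflow, not the code we actually ran: `partition_by_match` and `exact_string_lookup` are hypothetical names, and the lookup is a stand-in that compares SMILES strings literally, whereas a real run would call the StructureSearch operation (which matches on structure rather than on text). The second COD identifier is made up for the example.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def partition_by_match(
    records: Iterable[Tuple[str, str]],
    search_fn: Callable[[str], List[int]],
) -> Tuple[Dict[str, List[int]], List[str]]:
    """Split (cod_id, smiles) records into matched and unmatched entries.

    search_fn is assumed to return a list of ChemSpider IDs for a SMILES
    string, or an empty list when nothing is found.
    """
    matched: Dict[str, List[int]] = {}
    unmatched: List[str] = []
    for cod_id, smiles in records:
        hits = search_fn(smiles)
        if hits:
            # Candidate link between a COD entry and ChemSpider compound(s).
            matched[cod_id] = hits
        else:
            # No hit: a candidate for curator review and possible deposition.
            unmatched.append(cod_id)
    return matched, unmatched


def exact_string_lookup(smiles: str) -> List[int]:
    """Stand-in for a structure search: literal string lookup only."""
    known = {"CC(C)Cc1ccc(cc1)C(C)C(=O)O": [3544]}  # ibuprofen
    return known.get(smiles, [])


records = [
    ("2006278", "CC(C)Cc1ccc(cc1)C(C)C(=O)O"),  # COD entry for ibuprofen
    ("hypothetical-0001", "[Fe]"),              # made-up entry, no match
]
matched, unmatched = partition_by_match(records, exact_string_lookup)
print(matched)
print(unmatched)
```

Passing the search in as a function keeps the bookkeeping separate from whichever service or toolkit performs the comparison, so the stand-in can be swapped for a real web service call without touching the rest.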
After the deposition of the suitable new compounds, we identified 34,768 ChemSpider compounds which corresponded to COD crystal structures. These links have been added in the “Datasources” infobox under the “Spectral Data” tab, and the corresponding CIF added to ChemSpider so that it will show in the “CIFs” infobox with a link to the relevant COD webpage.
An example compound that has been linked to COD is Ibuprofen (ChemSpider ID 3544) which has been linked to http://www.crystallography.net/2006278.html. The reciprocal links are due to be added to COD shortly.
We would like to thank Miguel Quirós Olozábal (COD) for his help and cooperation with this project.