Following on from our previous blog post about extracting chemical structures (as mol files) from crystal structures (CIF files) in the RSC archive using OpenBabel, it transpired that the Crystallography Open Database (COD) was conducting a similar project to extract the chemical connectivity (in SMILES format) from its large collection of openly accessible CIF files, also using OpenBabel. This opened up the possibility of linking ChemSpider to COD (and vice versa) by comparing these SMILES with ChemSpider structures, and has resulted in 34,768 new links being made, each with a corresponding CIF in ChemSpider.
ChemSpider-COD linking example
At the beginning of February there were 262,817 CIFs in COD, of which 78,473 had been converted into SMILES (both numbers have been increasing daily since then). We downloaded these SMILES and searched ChemSpider for each of them using the StructureSearch operation of the ChemSpider Search webservice. SMILES that were not already in ChemSpider were converted into mol files using OpenEye and reviewed by a ChemSpider curator, with a view to depositing the suitable structures into ChemSpider as new compounds. This curation also allowed us to give feedback to COD about SMILES that looked suspicious, as if there may have been a problem with the conversion process: for example, charge and radical issues, undefined stereochemistry for sugars, missing stereochemistry, and the duplication of molecules or fragments within the same CIF. Since ChemSpider is primarily a collection of small organic molecules, many of the large number of metal-organic complexes were omitted simply because they weren’t within our scope.
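The triage step described above (search each COD SMILES, link the hits, pass the misses to a curator) can be sketched roughly as follows. This is only an illustration, not the code we ran: the `search` callable stands in for a call to the StructureSearch operation, `triage_cod_smiles` and `fake_search` are hypothetical names, and the COD identifiers are placeholders rather than real COD entries.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Assumed shape of a structure search: SMILES in, matching ChemSpider IDs out.
# The real webservice call is not reproduced here.
StructureSearch = Callable[[str], List[int]]


def triage_cod_smiles(
    records: Iterable[Tuple[str, str]],
    search: StructureSearch,
) -> Tuple[Dict[str, List[int]], List[Tuple[str, str]]]:
    """Split (cod_id, smiles) records into linked entries and curation candidates."""
    links: Dict[str, List[int]] = {}
    to_curate: List[Tuple[str, str]] = []
    for cod_id, smiles in records:
        csids = search(smiles)
        if csids:
            # Already in ChemSpider: record the COD -> ChemSpider link.
            links[cod_id] = csids
        else:
            # Not found: queue for conversion to a mol file and curator review.
            to_curate.append((cod_id, smiles))
    return links, to_curate


def fake_search(smiles: str) -> List[int]:
    """Stand-in for the webservice, backed by a tiny in-memory lookup."""
    known = {"CC(C)Cc1ccc(cc1)C(C)C(=O)O": [3544]}  # ibuprofen, ChemSpider ID 3544
    return known.get(smiles, [])


if __name__ == "__main__":
    sample = [
        ("cod-entry-A", "CC(C)Cc1ccc(cc1)C(C)C(=O)O"),  # placeholder COD identifier
        ("cod-entry-B", "C1CCCCC1"),                    # placeholder COD identifier
    ]
    linked, pending = triage_cod_smiles(sample, fake_search)
    print("linked:", linked)
    print("needs curation:", pending)
```

Passing the search in as a function keeps the sketch independent of any particular webservice client, and the exact-string lookup in `fake_search` is a simplification: a real structure search matches on the structure itself, not on how the SMILES happens to be written.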
After the deposition of the suitable new compounds, we identified 34,768 ChemSpider compounds which corresponded to COD crystal structures. These links have been added in the “Datasources” infobox under the “Spectral Data” tab, and the corresponding CIF added to ChemSpider so that it will show in the “CIFs” infobox with a link to the relevant COD webpage.
An example compound that has been linked to COD is Ibuprofen (ChemSpider ID 3544) which has been linked to The reciprocal links are due to be added to COD shortly.
We would like to thank Miguel Quirós Olozábal (COD) for his help and cooperation with this project.

5 Responses to “Linking from ChemSpider to the Crystallography Open Database”

  1. Egon Willighagen says:

    Well done on the OpenData icon, Jmol integration, and linking to COD! You mentioned that the COD list of structures brought compounds to ChemSpider that you did not have before; how many was that, and can you list those?

  2. Aileen Day says:

    Thank you! There were about 10,049 new compounds from COD, which correspond to ChemSpider IDs 30651485 to 30661534.

  3. Ethyl 4-aminobenzoate says:

    please chart crystallography for Ethyl 4-aminobenzoate

  4. Aileen Day says:

    If you browse to and click on the “More” button under the “More details” section, you will be able to select “Crystal CIFs” and see a number of CIFs that contain this molecule in them.

  5. Sofi says:

    It’s very impressive to have this facility of open access crystallography☺
    Can I get crystallography for fibrinogen?
