Copyright©2009 Antony Williams
InChIs are a powerful way to communicate chemical structures. They are going to enable internet chemistry and when we roll out the InChI Resolver shortly then the community will have access to a resource to resolve InChIKeys and ultimately navigate chemistry on the web. We commonly receive chemical structures in the form of InChIs and in order to deposit the structures we have to convert the InChIs back to chemical structures, commonly into SDF format for batch deposition. For simple organics this is not a difficult process…the tools we have at our disposal can deal with the layout of simple organics. However, for some of the chemical structures we receive optimizing 2D layout is very challenging. Many of the issues come with fullerenes (See examples below) but not only. Carbohydrates, complex cycles etc are big challenges.
In building the InChI resolver we hope to provide attractive visual depictions of the associated structures. Without AuxInfo data carrying the coordinates, or without the deposition of SDF files containing the layout coordinates we have a major challenge ahead of us. Auxinfo data are shown below for erythromycin. These data are rarely generated when people generate InChIKeys and the issue of structure layout will dominate the interpretation of complex structures.
Since beauty is in the eye of the beholder my judgement is that automatc layour algorithms should only assist in the appropriate layout and eyeballs will need to make the final decision. That is why it is better to deposit SDF files of InChIs with Auxinfo carrying the coordinates than it is to deposit InChIs only and leave the structure layout to an algorithm. It will fail.
I am interested in seeing what people can do with their structure cleaning algorithms on InChIs like this:
The images below show the iterative application of DIFFERENT structure layout algorithms. One caution…your layout algorithm should produce the SAME InChI at the end and NOT flip stereocenters. Interesting challenge. Who says cheminformatics isn’t challenging? And who thought building an InChI Resolver would be easy?