Copyright©2008 Antony Williams
The recent post regarding the InChIKey resolver has catalyzed a number of conversations, as many off the blog as in the comments on the original posting. One point that came up several times was that there is no such thing as a unique InChIKey.
One specific question was whether the InChI is sensitive to tautomers. This all comes down to option settings. An InChIString, from which the key is derived, is built from a number of layers. The InChIString (and therefore the InChIKey) generated for a particular structure depends on the settings for those layers. I won’t review the layers again, as that has been done many times elsewhere, notably at the unofficial InChI FAQ page.
Suffice it to say that the mobile proton perception setting DOES allow individual InChIKeys to be generated for different tautomers: with it switched off, each tautomer gets its own key. See below the four tautomers of guanine and their different InChIKeys.
Note that the first set of characters, in front of the dash, carries the “connectivity” information between atoms, while the second set carries the content of the layers: stereochemistry, mobile protons, charge and isotopes. In the four guanine structures the connectivities are identical.
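As a rough illustration of the two-part layout described above, here is a minimal Python sketch that splits a key at the dash and compares the connectivity part of two keys. The function names are my own for illustration, and the second key in the usage example is a made-up placeholder, not a real InChIKey.

```python
def split_inchikey(key: str) -> tuple[str, str]:
    """Split an InChIKey at its first dash into (connectivity, remainder)."""
    connectivity, sep, remainder = key.partition("-")
    if not sep or not connectivity or not remainder:
        raise ValueError(f"not a dash-separated InChIKey: {key!r}")
    return connectivity, remainder


def same_connectivity(key_a: str, key_b: str) -> bool:
    """True when two keys share the block in front of the dash."""
    return split_inchikey(key_a)[0] == split_inchikey(key_b)[0]


guanine_key = "UYTPUPDQBNUYGX-UHFFFAOYAE"      # guanine key quoted in this post
placeholder_key = "UYTPUPDQBNUYGX-XXXXXXXXXX"  # hypothetical second block

print(split_inchikey(guanine_key))
print(same_connectivity(guanine_key, placeholder_key))
```

Comparing only the first block is a quick way to confirm that two records share a skeleton even when their layer settings differ.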
When mobile proton perception is switched on, all tautomers give the SAME InChIKey, UYTPUPDQBNUYGX-UHFFFAOYAE. This capability can be very valuable when building a searchable database. For example, every structure could be stored in the database with keys generated both with and without mobile proton perception. This would allow searching not only for individual tautomers but also for all members of the same tautomer family.
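The dual-key idea can be sketched in a few lines of Python. This is only an in-memory toy under stated assumptions: the class and method names are hypothetical, the keys are assumed to have been generated elsewhere by an InChI tool, and the per-tautomer keys in the usage example are placeholders rather than real guanine keys.

```python
from collections import defaultdict


class TautomerIndex:
    """Toy index holding two keys per structure (hypothetical design)."""

    def __init__(self):
        self._by_specific = {}                # tautomer-specific key -> record id
        self._by_family = defaultdict(list)   # shared key -> all record ids

    def add(self, record_id, specific_key, family_key):
        # specific_key: key generated so that each tautomer is distinct
        # family_key: key generated so that all tautomers coincide
        self._by_specific[specific_key] = record_id
        self._by_family[family_key].append(record_id)

    def find_exact(self, specific_key):
        """Return the single record for one tautomer, or None."""
        return self._by_specific.get(specific_key)

    def find_family(self, family_key):
        """Return every record in the same tautomer family."""
        return list(self._by_family.get(family_key, []))


index = TautomerIndex()
family = "UYTPUPDQBNUYGX-UHFFFAOYAE"  # shared guanine key quoted in this post
for n in range(1, 5):
    # Placeholder tautomer-specific keys, NOT real InChIKeys.
    index.add(f"guanine-{n}", f"PLACEHOLDERKEY{n}", family)

print(index.find_exact("PLACEHOLDERKEY2"))
print(index.find_family(family))
```

In a real database the two keys would simply be two indexed columns on the structure table; the point is that both lookups become cheap string matches.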
What this means is that a whole series of InChIStrings and InChIKeys can be generated for a single molecule, depending on the settings. There are moves afoot to define a set of standard settings for the generation of InChIs. Until then, variability is possible. This is compounded by the need to input correct structures before generating InChIs. Perform a search for Taxol on ChemSpider and you will get three structures with the same mass and the same connectivity (check the keys in Table View).
Check the InChIKeys below and you will see the different layers. Check CAREFULLY for differences in stereochemistry and you will see question marks for undefined stereochemistry. The FULL stereochemistry is in the bottom InChI only.
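Spotting those question marks by eye is error-prone, so here is a minimal Python sketch of an automated check. It assumes that undefined centers appear as entries ending in "?" inside the tetrahedral ("/t") and double-bond ("/b") stereo layers of the InChI string; the function name is hypothetical, and the first example string is hand-edited for illustration, not generator output.

```python
def undefined_stereo(inchi: str) -> list[str]:
    """Return stereo entries marked '?' in the /t and /b layers."""
    flagged = []
    # Element 0 is the "InChI=1" prefix; the rest are slash-separated layers.
    for layer in inchi.split("/")[1:]:
        if layer[:1] in ("t", "b"):
            for entry in layer[1:].split(","):
                if entry.endswith("?"):
                    flagged.append(entry[:-1])  # drop the trailing '?'
    return flagged


# Hand-edited illustrative string with one undefined center (atom 2).
partial = "InChI=1/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2?"
# Same skeleton with the center defined.
full = "InChI=1/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"

print(undefined_stereo(partial))
print(undefined_stereo(full))
```

A non-empty result is a cue to go back to the structure drawing before trusting the key.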
It appears that the following depositors do not have full stereochemistry for Taxol in their databases. Maybe this is because the structure was drawn before full characterization?
ChemBank, ChemExper Chemical Directory, DiscoveryGate, Emory University Molecular Libraries Screening Center, KEGG, NINDS Approved Drug Screening Program, PubChem, San Diego Center for Chemical Genomics, Thomson Pharma, CambridgeSoft Corporation
I believe the issue with appropriate InChI generation is not down to settings, as these could be set as defaults within the majority of InChI generators, especially with a centralized InChI resolver where structures would be submitted and strings and keys generated on the fly. I believe the issue is primarily the accuracy of structure drawings. A central service could provide tools to check for undefined stereochemistry, highlight it, and ask for resolution or submission as is. There are many more questions…