The recent post on the InChIKey resolver has sparked a number of conversations, as many off the blog as in the comments on the original post. One point that came up several times was the claim that there is no such thing as a unique InChIKey.

One specific question was whether InChI is sensitive to tautomers. This is all down to option settings. An InChIString, from which the key is derived, is built from a number of layers, and the InChIString (and therefore the InChIKey) generated for a particular structure depends on the settings for those layers. I won’t review the layers again as it has been done many times elsewhere, especially at the unofficial InChI FAQ page.

Suffice it to say that the mobile proton perception setting DOES allow individual InChIKeys to be generated for different tautomers: with it switched off, each tautomer gets its own key. See below the four tautomers of guanine and their different InChIKeys.

guanine-inchikeys.png

Note that the first set of characters, in front of the dash, carries the “connectivity” information between atoms, while the second set of characters carries the content of the other layers: stereochemistry, mobile protons, charge and isotopes. In the four guanine structures the connectivities are identical.
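As a rough illustration of the two-block reading described above, here is a minimal Python sketch that splits a key at the first dash and compares the blocks in front of it. The function names are my own, and the second key is invented purely for the comparison; only UYTPUPDQBNUYGX-UHFFFAOYAE comes from this post.

```python
def split_inchikey(key: str) -> tuple[str, str]:
    """Split an InChIKey at the first dash into (connectivity block, layers block)."""
    connectivity, _, layers = key.strip().partition("-")
    if not connectivity or not layers:
        raise ValueError(f"not a dash-separated InChIKey: {key!r}")
    return connectivity, layers


def same_connectivity(key_a: str, key_b: str) -> bool:
    """True when two keys share the block in front of the dash."""
    return split_inchikey(key_a)[0] == split_inchikey(key_b)[0]


# The guanine key quoted in this post.
family_key = "UYTPUPDQBNUYGX-UHFFFAOYAE"
# Hypothetical key: same first block, invented second block (not a real InChIKey).
made_up_key = "UYTPUPDQBNUYGX-XXXXXXXXXX"

print(split_inchikey(family_key))                  # ('UYTPUPDQBNUYGX', 'UHFFFAOYAE')
print(same_connectivity(family_key, made_up_key))  # True
```

Comparing only the first block is a quick way to spot records that share a skeleton but differ in one of the other layers.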

When mobile proton perception is switched on, all tautomers give the SAME InChIKey, UYTPUPDQBNUYGX-UHFFFAOYAE. This capability can be very valuable when building a searchable database. For example, every structure could be entered into the database with keys generated both with and without mobile proton perception. This would allow searching not only for individual tautomers but also for all members of the same tautomer family.
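The two-key idea can be sketched with a toy in-memory index. This is only an illustration under my own assumptions: the class and method names are hypothetical, and the per-tautomer keys are placeholders, not real InChIKeys. Only the family key is the one quoted above.

```python
from collections import defaultdict


class TautomerIndex:
    """Toy index that stores two keys per structure."""

    def __init__(self):
        # Key generated without mobile proton perception -> structure name.
        self.by_tautomer_key = {}
        # Key generated with mobile proton perception -> names in that family.
        self.by_family_key = defaultdict(list)

    def add(self, name, tautomer_key, family_key):
        self.by_tautomer_key[tautomer_key] = name
        self.by_family_key[family_key].append(name)

    def find_tautomer(self, tautomer_key):
        """Exact tautomer lookup; returns None when the key is unknown."""
        return self.by_tautomer_key.get(tautomer_key)

    def find_family(self, family_key):
        """All structures that collapse to the same mobile-proton key."""
        return list(self.by_family_key.get(family_key, []))


GUANINE_FAMILY = "UYTPUPDQBNUYGX-UHFFFAOYAE"  # key quoted in this post

index = TautomerIndex()
# Placeholder per-tautomer keys, invented for the example.
index.add("guanine tautomer 1", "UYTPUPDQBNUYGX-PLACEHOLD1", GUANINE_FAMILY)
index.add("guanine tautomer 2", "UYTPUPDQBNUYGX-PLACEHOLD2", GUANINE_FAMILY)

print(index.find_tautomer("UYTPUPDQBNUYGX-PLACEHOLD1"))
print(index.find_family(GUANINE_FAMILY))
```

A real database would do the same thing with two indexed columns; the point is simply that one lookup answers "this exact tautomer" and the other answers "anything in this tautomer family".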

What this means is that a whole series of InChIStrings and InChIKeys can be generated for a single molecule, depending on the settings. There are moves afoot to define a set of standard settings for the generation of InChIs. Until then variability is possible. This is compounded by the need to input correct structures before generating InChIs. Perform a search for Taxol on ChemSpider and you will get three structures with the same mass and the same connectivity (check the keys in Table View).

taxol.png

Check the InChIs below and you will see the different layers. Look CAREFULLY at the stereochemistry layer (the part starting with /t) and you will see question marks for undefined stereochemistry. Only the bottom InChI has the FULL stereochemistry.

InChI: InChI=1/C47H51NO14/c1-25-31(60-43(56)36(52)35(28-16-10-7-11-17-28)48-41(54)29-18-12-8-13-19-29)23-47(57)40(61-42(55)30-20-14-9-15-21-30)38-45(6,32(51)22-33-46(38,24-58-33)62-27(3)50)39(53)37(59-26(2)49)34(25)44(47,4)5/h7-21,31-33,35-38,40,51-52,57H,22-24H2,1-6H3,(H,48,54)/t31-,32-,33+,35-,36+,37-,38?,40-,45+,46-,47+/m0/s1
InChI: InChI=1/C47H51NO14/c1-25-31(60-43(56)36(52)35(28-16-10-7-11-17-28)48-41(54)29-18-12-8-13-19-29)23-47(57)40(61-42(55)30-20-14-9-15-21-30)38-45(6,32(51)22-33-46(38,24-58-33)62-27(3)50)39(53)37(59-26(2)49)34(25)44(47,4)5/h7-21,31-33,35-38,40,51-52,57H,22-24H2,1-6H3,(H,48,54)/t31-,32-,33?,35?,36?,37+,38?,40?,45+,46?,47+/m0/s1
InChI: InChI=1/C47H51NO14/c1-25-31(60-43(56)36(52)35(28-16-10-7-11-17-28)48-41(54)29-18-12-8-13-19-29)23-47(57)40(61-42(55)30-20-14-9-15-21-30)38-45(6,32(51)22-33-46(38,24-58-33)62-27(3)50)39(53)37(59-26(2)49)34(25)44(47,4)5/h7-21,31-33,35-38,40,51-52,57H,22-24H2,1-6H3,(H,48,54)/t31-,32-,33+,35-,36+,37+,38-,40-,45+,46-,47+/m0/s1
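Spotting the question marks by eye is error-prone, so here is a small Python sketch that pulls them out of the stereochemistry layer. It is plain string handling, not a call into any InChI library, and the function name is my own. It assumes a single-component InChI and reads only the first /t layer, which is all the Taxol strings above need.

```python
import re

# One stereocenter entry in the /t layer: atom number followed by -, + or ?
_CENTER = re.compile(r"(\d+)([-+?])")


def undefined_stereocenters(inchi: str) -> list[int]:
    """Return the atom numbers marked '?' in the first /t layer of an InChI."""
    for layer in inchi.split("/")[1:]:
        if layer.startswith("t"):
            return [int(atom) for atom, mark in _CENTER.findall(layer[1:]) if mark == "?"]
    return []  # no stereochemistry layer at all


# Everything the three Taxol InChIs above have in common (formula, /c and /h layers).
TAXOL_SKELETON = (
    "InChI=1/C47H51NO14/c1-25-31(60-43(56)36(52)35(28-16-10-7-11-17-28)48-41(54)"
    "29-18-12-8-13-19-29)23-47(57)40(61-42(55)30-20-14-9-15-21-30)38-45(6,32(51)"
    "22-33-46(38,24-58-33)62-27(3)50)39(53)37(59-26(2)49)34(25)44(47,4)5"
    "/h7-21,31-33,35-38,40,51-52,57H,22-24H2,1-6H3,(H,48,54)"
)

# The three differing tails, in the order listed above.
STEREO_TAILS = [
    "/t31-,32-,33+,35-,36+,37-,38?,40-,45+,46-,47+/m0/s1",
    "/t31-,32-,33?,35?,36?,37+,38?,40?,45+,46?,47+/m0/s1",
    "/t31-,32-,33+,35-,36+,37+,38-,40-,45+,46-,47+/m0/s1",
]

for tail in STEREO_TAILS:
    print(undefined_stereocenters(TAXOL_SKELETON + tail))
# [38]
# [33, 35, 36, 38, 40, 46]
# []
```

An empty list for the third string matches the observation that only the bottom InChI carries full stereochemistry.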

It appears that the following depositors do not have full stereochemistry for Taxol in their databases. Maybe this is because the structure was drawn before full characterization?

ChemBank, ChemExper Chemical Directory, DiscoveryGate, Emory University Molecular Libraries Screening Center, KEGG, NINDS Approved Drug Screening Program, PubChem, San Diego Center for Chemical Genomics, Thomson Pharma, CambridgeSoft Corporation

I do not have access to all of these databases to confirm this, but checking the sources listed on the PubChem record suggests the observation is true.

I believe the issue with appropriate InChI generation is not down to settings, as these could be set as defaults within the majority of InChI generators, especially with a centralized InChI resolver where structures would be submitted and strings and keys generated on the fly. I believe the issue is primarily the accuracy of structure drawings. A central service could provide tools to check for undefined stereochemistry, highlight it, and ask for either resolution or submission as is. There are many more questions…


2 Responses to “Does InChI Account for Tautomers?”

  1. hko says:

    Thanks for recalling some sophisticated hints concerning
    tautomers and related inchistrings and inchikeys.

  2. Eric Milgram says:

    The concepts of tautomers, mobile protons, resonance structures, and the like raise some interesting questions for designers of chemical structure management systems. For more than 10 years now, every organization where I have worked has struggled to reconcile these chemical abstractions with their corresponding informatics implementations.

    In the pharmaceutical industry, there is tremendous awareness of the effect that pH can have on the biological activity of a compound. Obviously, the pH determines the chemical “state” in which the compound will exist, which in turn, affects its properties. For example, the solubility of a compound, as well as its permeability, just to name a few parameters, can vary greatly as a function of pH. However, when I search a chemical database for a given structure, how should pH be taken into account? For example, if I structure search for acetic acid, should “acetate” be included in my search results? Do any of the major chemical databases today allow one to specify that results be restricted by pH?

    I remember one early chemical database that didn’t support the delocalized depiction of aromatic rings. The database required that individual double bonds be drawn, Kekulé style. When performing a search, the database wasn’t sophisticated enough to give you all “chemically” equivalent results, so that if you didn’t search for both alternate forms, you might miss a result. Based on my very limited understanding of InChI, it has the potential to resolve these kinds of issues in a rather facile way.
