Copyright©2009 Antony Williams
In January of this year IUPAC announced the release of version 1.02 of the InChI algorithms and software with the Standard InChI and InChIKey. The definition is given below.
In response to user requests, a Standard InChI (i.e. without options for properties such as tautomerism and stereoconfiguration) has been defined as follows:
- Standard InChI is for the purposes of interoperability/compatibility between large databases/web searching and information exchange.
- Standard InChI and non-standard InChI are always distinguishable.
- Standard InChI is a stable identifier; however, periodic updates may be necessary; they are reflected in the identifier version designation, which is included in the InChI string.
- Any shortcomings in standard InChI may be addressed using non-standard InChI (currently obtainable using InChI version 1.02beta).
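The second and third points can be illustrated with a small sketch. It assumes, based on published InChI strings rather than on anything stated in this post, that the version number and an optional "S" (for standard) sit between "InChI=" and the first "/", as in "InChI=1S/...". The function name `inchi_prefix` is hypothetical and is not part of the IUPAC software.

```python
import re
from typing import Tuple

# Assumption: the identifier starts with "InChI=", then the version digits,
# then an optional "S" marking a Standard InChI, then the first "/".
_PREFIX = re.compile(r"InChI=(\d+)(S?)/")


def inchi_prefix(inchi: str) -> Tuple[int, bool]:
    """Return (version, is_standard) read from the start of an InChI string."""
    match = _PREFIX.match(inchi)
    if match is None:
        raise ValueError("string does not start with a recognisable InChI prefix")
    return int(match.group(1)), match.group(2) == "S"


# Usage: the same layers with and without the standard marker.
standard = inchi_prefix("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
non_standard = inchi_prefix("InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3")
```

Only the prefix is inspected here; the sketch does not check that the rest of the string is a valid InChI.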
In response to user feedback the format of InChIKey has been changed; it is different from that in InChI software v. 1.02-beta, having 27 characters rather than 25.
Standard InChIKey has five distinct components:
- a 14-character hash of the basic (Mobile-H) InChI layer;
- an 8-character hash of the remaining layers (except for the “/p” segment, which accounts for added or removed protons; it is not hashed at all, and the number of protons is instead encoded at the end of the Standard InChIKey);
- 1 flag character;
- 1 version character;
- 1 [de]protonation indicator as the last character.
The overall length of InChIKey is fixed at 27 characters, including separators (dashes):
AAAAAAAAAAAAAA-BBBBBBBBFV-P
This is significantly shorter than a typical InChI string.
(1) AAAAAAAAAAAAAA is a 14-character hash.
(2) BBBBBBBB is an 8-character hash.
(3) F is a flag indicating a standard InChIKey (produced from a Standard InChI): it always has the value ‘S’.
(4) V is the InChI version character: ‘A’ for version 1, ‘B’ for version 2, etc.
(5) P is an indicator for the number of protons; this number is not encoded in the hash but is indicated as a separate 2-character block at the end, where one character is a hyphen: -N for neutral, -M for -1 hydrogen, -O for +1 hydrogen, etc.
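The component breakdown above can be sketched as a short parser. This is only an illustration under stated assumptions: the names `InChIKeyParts` and `split_standard_inchikey` are hypothetical, every non-separator character is assumed to be an uppercase letter, and only the three proton indicators named in the text (N, M, O) are decoded.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Layout from the text: 14-character hash, dash, 8-character hash followed by
# the flag and version characters, dash, one proton-indicator character.
# Assumption: all non-separator characters are uppercase letters.
_KEY_PATTERN = re.compile(r"([A-Z]{14})-([A-Z]{8})([A-Z])([A-Z])-([A-Z])")

# Only the indicator values spelled out in the text; other letters stay undecoded.
_PROTON_SHIFT = {"N": 0, "M": -1, "O": 1}


@dataclass
class InChIKeyParts:
    first_hash: str               # (1) 14-character hash of the basic layer
    second_hash: str              # (2) 8-character hash of the remaining layers
    is_standard: bool             # (3) True when the flag character is 'S'
    version: int                  # (4) 'A' -> 1, 'B' -> 2, ...
    proton_shift: Optional[int]   # (5) None when the indicator is not N, M or O


def split_standard_inchikey(key: str) -> InChIKeyParts:
    """Split a 27-character InChIKey into the five components described above."""
    match = _KEY_PATTERN.fullmatch(key)
    if match is None:
        raise ValueError("not a 27-character key of the form A{14}-B{8}FV-P")
    first, second, flag, version_char, proton_char = match.groups()
    return InChIKeyParts(
        first_hash=first,
        second_hash=second,
        is_standard=(flag == "S"),
        version=ord(version_char) - ord("A") + 1,
        proton_shift=_PROTON_SHIFT.get(proton_char),
    )


# Usage with a sample key of the documented shape.
parts = split_standard_inchikey("LFQSCWFLJHTTHZ-UHFFFAOYSA-N")
```

The parser checks only the shape of the key. It cannot confirm that the hashes correspond to any particular InChI, since the hash is one-way.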
We have now generated Standard InChI strings and InChIKeys across the entire database, and each record now shows four variations of the InChI.
It is now also possible to search across the entire database by Standard InChI.