The recent blog post on the InChIKey Resolver has sparked a lot of interest. I’ve been talking with Alex Tropsha, a member of our Advisory Group, about hosting the InChIKey Resolver project at UNC-Chapel Hill, and the decision is that we will move ahead with setting up a system under their control. They are presently looking for a developer interested in relocating to Chapel Hill to support both this project and some other exciting projects running in their laboratories. If anyone is interested in a cheminformatics role, please contact me at the usual email address and I will connect you to the appropriate person. I’m excited since we’d get to work together on the project. Don’t be shy… UNC-Chapel Hill is a superb school, Alex and his team are doing excellent science, and the environment is simultaneously one of fun, creativity and hard work.

5 Responses to “Hunting for a Cheminformatics Person to Support InChIKey Resolver”

  1. Joe Krahn says:

    There is no reason to “develop” an InChIKey Resolver. The ONLY way to go from a hash key back to the string that generated it is a database lookup. For every “known” InChIKey, the InChI string that generated it must be saved to a database. There is NO other way to do it. It is the nature of hashes. Trying to decode the hash is like trying to break an SSL encryption key. If it were possible, then secure HTTPS internet wouldn’t work.

    So, the solution is trivial. Just run a database, like MySQL, and start loading known InChIKeys. There is no software to develop. You don’t need a software developer, just a database administrator.
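As a minimal sketch of the lookup described in this comment: the block below stores key/string pairs in a table and resolves a key with a single query. It uses Python's built-in `sqlite3` as a stand-in for MySQL, and the table name `inchi_lookup` and the helper names `make_resolver` and `resolve` are assumptions made for illustration, not part of any existing resolver.

```python
import sqlite3


def make_resolver(pairs):
    """Create an in-memory lookup table from (inchikey, inchi) pairs.

    sqlite3 stands in for MySQL here; the schema is a hypothetical one.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inchi_lookup ("
        "inchikey TEXT PRIMARY KEY, inchi TEXT NOT NULL)"
    )
    # INSERT OR IGNORE keeps the first string stored for a repeated key.
    conn.executemany("INSERT OR IGNORE INTO inchi_lookup VALUES (?, ?)", pairs)
    conn.commit()
    return conn


def resolve(conn, inchikey):
    """Return the stored InChI string for a key, or None if the key is unknown."""
    row = conn.execute(
        "SELECT inchi FROM inchi_lookup WHERE inchikey = ?", (inchikey,)
    ).fetchone()
    return row[0] if row else None


# Example rows: water and ethanol.
conn = make_resolver([
    ("XLYOFNOQVPJJNP-UHFFFAOYSA-N", "InChI=1S/H2O/h1H2"),
    ("LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"),
])
print(resolve(conn, "XLYOFNOQVPJJNP-UHFFFAOYSA-N"))
```

A key that was never deposited simply returns `None`, which is the point made above: the resolver can only answer for strings it has already seen.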

  2. Antony Williams says:

    Joe, the majority of this has all been covered in a previous blog post: http://www.chemspider.com/blog/we-need-an-inchikey-resolver-and-we-need-it-now.html

    The basic capabilities of looking up a key to give an InChIString to convert to a structure are lookup-based. I agree.

    However, the ability to deposit structures online, generate appropriate depictions of the structures, build APIs to poke into the data, and deposit data in a highly automated fashion takes more work than throwing a few tens of millions of strings onto MySQL. As commented above, the role is partly to support this effort, but there are many other exciting projects going on that need a developer to support them.

  3. Joe Krahn says:

    Another thing to consider is the design of InChIKeys, if it is not too late to change it. It would be good to generate the hash in parts, instead of a single hash for the whole InChI string. InChI is annotated in parts divided by “/”. The first part of the key could be a hash of the first part of the InChI. So, you could easily look up just the heavy-atom part. It would be much easier to have a rather complete database of this first part.

    In fact, if the main part of the InChI is known, it becomes feasible to do a combinatorial search to decipher the remaining part(s) of the hash. It makes the hash design much more useful. I think that the added value should be sufficient to go through the trouble of a re-design. Also, I would add a version tag so that this and any future revisions can be handled effectively (e.g. prefix with “1:”).

    Also, one of the main problems with any unique ID is that it pre-determines which features of the structure make it unique. It will always be too precise for some people and not precise enough for others. My preference would be to have a variable-length key, so that varying levels of detail can be defined, just as with InChI strings.
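The part-wise hashing suggested in this comment could be sketched roughly as follows. This is only an illustration of the idea, not the actual InChIKey algorithm: the choice of SHA-256, the 12-character truncation, the “1:” version prefix, and the function names `short_hash` and `layered_key` are all assumptions. The formula and the connectivity (“c”) layer are hashed together as the main part, and all remaining layers are hashed as a second part.

```python
import hashlib


def short_hash(text, length=12):
    """Truncated SHA-256 hex digest (hash and length chosen arbitrarily)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:length]


def layered_key(inchi, version="1"):
    """Build a hypothetical two-part key: '<version>:<main hash>-<rest hash>'.

    The main part covers the formula and the connectivity ('c') layer;
    the rest covers every other layer (hydrogens, stereo, isotopes, ...).
    """
    _, _, body = inchi.partition("/")  # drop the leading 'InChI=...' prefix
    layers = body.split("/")
    main = [l for i, l in enumerate(layers) if i == 0 or l.startswith("c")]
    rest = [l for i, l in enumerate(layers) if i != 0 and not l.startswith("c")]
    return f"{version}:{short_hash('/'.join(main))}-{short_hash('/'.join(rest))}"


print(layered_key("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"))
```

With a key built this way, two structures that share formula and connectivity but differ in a later layer share the first half of the key, so a database of first halves alone would already support skeleton-level lookups.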

  4. Tom Transue says:

    Antony and Joe,

    I guess there are two issues: 1) a DB to link keys to InChIs (which we seem to agree is easy), and 2) a set of tools for comparing structures so that we could, for example, establish sets of structures which differ only by stereochemistry, tautomerization, isomerization, isotopic make-up, etc. I guess that I don’t (yet) see InChI as having a failure in its ability to specify structure so much as an inability (or at least shortcomings) in generalizing structure (i.e. maybe it is too specific?!?). I *think* that developing a DB of relationships between chemicals (or maybe just the ability to calculate them on the fly) is desired so that a person could submit an InChIKey and ask for all the matches which are equivalent (according to selected criteria of equivalence).

    I don’t see any reason that the establishment of the “resolver” should wait for part 2 above, though. Therefore, why not throw together the actual InChIKey lookup DB and work on the other part when you find the people/money? The second part would involve downloading all InChI strings from the resolver and running some clever comparison tools to start lining up related structures. As a coarse grouping, you could group chemicals with the same formula (the first part of the InChI string itself), although you might have to be smart with hydrogen, counter-ions, and charge state. Then each of those groups could be split into more specific groups.

    Tom
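The coarse grouping by formula described in this comment might look something like the sketch below. It takes the text between the first and second “/” of each InChI string as the formula and groups on it; the function names `formula_layer` and `group_by_formula` are illustrative assumptions, and the sketch deliberately ignores the hydrogen, counter-ion, and charge-state subtleties mentioned above.

```python
from collections import defaultdict


def formula_layer(inchi):
    """Return the formula layer (text between the first and second '/')."""
    parts = inchi.split("/")
    return parts[1] if len(parts) > 1 else ""


def group_by_formula(inchis):
    """Group InChI strings by formula layer as a first, coarse pass."""
    groups = defaultdict(list)
    for inchi in inchis:
        groups[formula_layer(inchi)].append(inchi)
    return dict(groups)


groups = group_by_formula([
    "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",  # ethanol
    "InChI=1S/C2H6O/c1-3-2/h1-2H3",       # dimethyl ether
    "InChI=1S/H2O/h1H2",                  # water
])
for formula, members in groups.items():
    print(formula, len(members))
```

Each group could then be split further, for example by comparing the connectivity layer next, then the stereochemistry and isotope layers.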

  5. Annabel says:

    Hi Tom,

    Whatever happened to your idea of an InChI comparison tool?
    Did you end up working on that?
    Do you know of any tool or software available for comparing two InChI strings and figuring out their relationship?

    Thanks,
    Annabel
