We’ve rejigged our data to make searching more reliable.
What have we done?
We’ve regenerated all of the InChIs in the database with version 1.03 of the InChI code.
What does that mean?
The InChI (International Chemical Identifier) is a short piece of text that describes the structure of a molecule. Each one is generated by a free and open-source computer program, so the same molecule should always get the same InChI and there shouldn’t be conflicting InChIs for the same molecule. You can’t really write them by hand, because they look like this:
InChI=1S/C10H22ClN2O5PS/c1-3-10(9-18-20(2,15)16)12-19(14)13(7-5-11)6-4-8-17-19/h10,12H,3-9H2,1-2H3
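To make that string a little less opaque, here is a minimal Python sketch that splits it into its slash-separated layers. The function name `split_inchi` is made up for this post, and the sketch is deliberately simplified: it assumes the formula layer is present and that each one-letter layer prefix appears only once, which is true for the example above but not for every InChI.

```python
def split_inchi(inchi: str) -> dict:
    """Split an InChI string into version, formula and one-letter-prefixed layers.

    Simplified illustration only: assumes a formula layer is present and that
    no layer prefix is repeated (a repeated prefix would overwrite the earlier one).
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    layers = {"version": parts[0]}
    if len(parts) > 1:
        layers["formula"] = parts[1]
    for part in parts[2:]:
        if not part:
            continue  # skip empty segments
        # Later layers start with a one-letter prefix, e.g. c = connections, h = hydrogens
        layers[part[0]] = part[1:]
    return layers


example = (
    "InChI=1S/C10H22ClN2O5PS/c1-3-10(9-18-20(2,15)16)12-19(14)13(7-5-11)6-4-8-17-19"
    "/h10,12H,3-9H2,1-2H3"
)
for name, value in split_inchi(example).items():
    print(name, value)
```

The `1S` at the start is the version layer (version 1, standard InChI), followed by the molecular formula, the connection layer and the hydrogen layer.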
ChemSpider is built on InChIs. If two molecules have the same InChI, then they’re the same record in ChemSpider, and if you can’t InChIfy it, you can’t put it in ChemSpider. That’s why we can’t do, for example, polymers yet.
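As a rough illustration of the "same InChI, same record" idea (this is not ChemSpider’s actual code, and the function and field names are invented for the example), depositions can be grouped with the InChI as the key:

```python
from collections import defaultdict


def group_by_inchi(depositions):
    """Group (name, inchi) pairs so that each distinct InChI becomes one record.

    Hypothetical sketch: anything without an InChI is skipped, mirroring the
    rule that a structure which cannot be given an InChI cannot be stored.
    """
    records = defaultdict(list)
    for name, inchi in depositions:
        if not inchi:
            continue  # no InChI, so no record
        records[inchi].append(name)
    return dict(records)


depositions = [
    ("ethanol", "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"),
    ("ethyl alcohol", "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"),
    ("methane", "InChI=1S/CH4/h1H4"),
    ("some polymer", None),  # cannot be given an InChI, so it is dropped
]
records = group_by_inchi(depositions)
print(len(records))
```

Two names deposited with the same InChI end up in one record, and the entry with no InChI never makes it in.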
We’re proud to be founder members of the InChI Trust, which supports this critical element in the sharing of chemical compound information.
What does all this mean for ChemSpider?
InChI has an active community that looks out for bugs, and version 1.03 contains some fixes. As a result, a very small number of the InChIs themselves have changed: only a few dozen out of the whole database.
- P+–O– and P+–S– bonds are now treated slightly differently. This means that it will be easier to find the exact molecule you’re looking for, regardless of how it’s been drawn. (In principle this will also apply to analogous bonds containing arsenic, selenium, tellurium and antimony, but I can’t see any examples of this in the database.)
- There was a small bug where the InChI generated for a molecule containing an azide group sometimes varied according to the input drawing. That no longer happens.
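Finding which records are affected by a regeneration boils down to comparing the old and new InChI for each record. A minimal sketch, assuming both sets are available as dictionaries keyed by a record ID (the function name, the IDs and the placeholder strings below are all invented for illustration and are not real data):

```python
def changed_inchis(old, new):
    """Return {record_id: (old_inchi, new_inchi)} for records whose InChI differs.

    Only record IDs present in both mappings are compared.
    """
    return {
        record_id: (old[record_id], new[record_id])
        for record_id in old.keys() & new.keys()
        if old[record_id] != new[record_id]
    }


# Placeholder values, not real InChIs from either version of the software
old = {1: "InChI=1S/CH4/h1H4", 2: "OLD-PLACEHOLDER"}
new = {1: "InChI=1S/CH4/h1H4", 2: "NEW-PLACEHOLDER"}
print(changed_inchis(old, new))
```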
This regeneration has also allowed us to catch and clean up some errors in the data.
What happens next?
Version 1.04 of the InChI code will be released soon. With our new framework for processing large amounts of data we’ll be able to update our InChIs much more quickly. The main changes in 1.04 that affect the InChI are to how it handles radical atoms in aromatic rings and the elements nobelium, lawrencium and rutherfordium, so we anticipate that there shouldn’t be very many changed InChIs!
