Those of you performing curation activities on ChemSpider will likely have noticed the ability to mark a new type of identifier, a shorthand formula. We have enabled this because it has become clear that shorthand formulae could be a useful part of document markup in our ChemMantis system. For example, consider the article excerpt shown below.

In the excerpt you can see a number of highlighted terms. All of them are shorthand formulae, identified not by name-to-structure conversion algorithms but by a lookup dictionary. Each of these terms is linked to ChemSpider for direct lookup of information associated with the chemical. The list of shorthand formulae extracted from a couple of hundred articles is actually only a couple of hundred formulae at present. It includes the most obvious compounds: CH3OH, MeOH, CH3CN, MeCN, CH3COOH, NaCl, NaF, NaCN, KBr, KCl and so on. All of these are immediately interpretable by chemists. There are likely a few more to be found over the coming months, but in the past week of reviewing articles from various sources we have added only a couple of new formulae.

We have also seen value in linking up ions and elements as appropriate. We are likely to add filters to display or hide elements and ions, since we’re of the opinion that displaying every occurrence of an element in an article is of little value. Just imagine how many times you might see the word carbon or hydrogen in an article: carbon-carbon bonds, hydrogen bonding, etc. So, we’re switching them off by default.

We’ll keep reporting on how we are improving ChemMantis. Based on the review of a stack of articles, the system has improved dramatically. We are asking for your articles now. Combining shorthand formulae and chemical name markup will highlight a document as shown below.
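
For readers curious what dictionary-based markup of this kind might look like, here is a minimal Python sketch. It is not the ChemMantis implementation: the table `LOOKUP`, the `Hit` record, the function `mark_up` and its `kinds` parameter are all hypothetical names chosen for illustration, and the table holds only a few of the formulae listed above. It assumes exact, case-sensitive matching of whole tokens, and it leaves elements switched off unless asked for.

```python
import re
from typing import NamedTuple

# Hypothetical lookup table (an assumption, not the real dictionary):
# term -> (chemical name, kind of identifier).
LOOKUP = {
    "CH3OH": ("methanol", "formula"),
    "MeOH": ("methanol", "formula"),
    "CH3CN": ("acetonitrile", "formula"),
    "MeCN": ("acetonitrile", "formula"),
    "CH3COOH": ("acetic acid", "formula"),
    "NaCl": ("sodium chloride", "formula"),
    "KBr": ("potassium bromide", "formula"),
    "carbon": ("carbon", "element"),
    "hydrogen": ("hydrogen", "element"),
}


class Hit(NamedTuple):
    start: int  # offset of the first character of the term
    end: int    # offset just past the last character
    term: str   # the text as it appears in the document
    name: str   # the name the term resolves to
    kind: str   # "formula" or "element"


# A token is a letter followed by letters or digits, e.g. "CH3OH".
TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def mark_up(text, kinds=frozenset({"formula"})):
    """Return dictionary hits in text, keeping only the requested kinds.

    Elements are excluded by default, so a phrase such as
    "carbon-carbon bonds" produces no highlights.
    """
    hits = []
    for match in TOKEN.finditer(text):
        entry = LOOKUP.get(match.group())
        if entry is None:
            continue  # not a known term
        name, kind = entry
        if kind not in kinds:
            continue  # filtered out, e.g. elements by default
        hits.append(Hit(match.start(), match.end(), match.group(), name, kind))
    return hits


if __name__ == "__main__":
    sample = "Dissolve NaCl in MeOH; carbon-carbon bonds remain."
    for hit in mark_up(sample):
        print(hit.term, "->", hit.name)
```

Passing `kinds={"formula", "element"}` switches the element hits back on. A real system would also need to handle ions, subscripts and case variation, none of which this sketch attempts.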

3 Responses to “Supporting Shorthand Formulae to Support ChemMantis”

  1. Rich Apodaca says:

    Tony, very nice. It seems likely that a dictionary of abbreviations/connection tables built and improved by testing against real documents might be useful outside the scope of ChemSpider/Mantis.

    Any plans to offer this dictionary separately?

  2. Antony Williams says:

    Rich, we are presently working with a number of closed-access articles and are not seeing much difference in their shorthand formulae either, so I think our list is actually fairly good at present. The dictionaries we are building would certainly be of significant benefit to groups working in text mining and chemistry document markup. We hope to recoup our costs through the licensing of these dictionaries to interested parties.

  3. Paul Schulwitz says:

    You know, something like this would be very nice for patent searchers. I’m constantly having to refer to the specification of patents to find the chemical structure of shorthand terms listed in the claims. Even then I often need to go to another resource to identify a structure.
    Just having the ability to quickly identify structures for common protecting groups (BOC, Cbz, etc.) and amino acids, especially uncommon amino acids, would be a great time saver.

