A Chemical Dictionary from Adam Azman with Help from ChemSpider
Posted by: Antony Williams in Community Building
Copyright ©2008 Antony Williams
A few months ago I met with Adam Azman in Chapel Hill to discuss how the names in our ChemSpider database could be used to expand his Chemical Dictionary. It seemed that we were sitting on a treasure trove of name fragments that could help him in his efforts. So we supplied Adam with 1.3 million identifiers, and he has worked for the last few months to generate his Chemical Dictionary. He extracted over 100,000 name fragments from our collection, as he has described in his blog post here.
Extracted from Adam’s blog is his so-called Administrivia: “The dictionary is licensed under the Creative Commons Attribution 3.0 License. … The dictionary is compatible for Microsoft Office (Windows or Mac), and Open Office (Windows or Linux). The install file includes instructions for upgrading old versions and installing it for the first time. The dictionary should be useful for all chemists. However, I am an organic chemist. Thus, the dictionary was created from an organic chemist’s mindset. It will probably be most useful for organic chemists.”
Adam has explained in detail how he did the work. I encourage you to read his post to fully understand the nature of the work and how much heavy-lifting he actually did.
It’s been a pleasure to help Adam and the community by supplying our own form of a “dictionary” to him for his particular treatment. It took a few hours of work from our side and months of hard work from him. I encourage you to take advantage of his efforts. If you are a chemist, this is a real gift for the season. The dictionary can be downloaded from our site here.
Now I want you to consider timing. We are working hard on our ChemMantis project, a system for entity extraction and document markup. Part of this includes the generation of dictionaries for finding chemical names. We’ve already expanded our chemical dictionary using the database of identifiers from ChemSpider, but those of you working with other systems, such as OSCAR3 or the commercial markup systems that depend on chemical dictionaries, will likely find Adam’s contribution significant. Enjoy.
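For readers wondering what dictionary-driven markup looks like in practice, here is a minimal Python sketch. It is not how ChemMantis or OSCAR3 work internally; the token set, the function names, and the square-bracket markup are all made up for illustration, and a real list would be loaded from the dictionary file itself.

```python
import re

# Hypothetical sample of name tokens; a real set would be loaded
# from the downloaded dictionary file.
CHEMICAL_TOKENS = {"methyl", "ethyl", "benzene", "chloro", "acetate", "toluene"}

# Runs of letters only; real chemical names also contain digits,
# hyphens, and brackets, which this sketch ignores.
WORD_RE = re.compile(r"[A-Za-z]+")


def find_chemical_tokens(text, dictionary=CHEMICAL_TOKENS):
    """Return (start, end, word) for each word found in the dictionary."""
    hits = []
    for match in WORD_RE.finditer(text):
        word = match.group(0)
        if word.lower() in dictionary:
            hits.append((match.start(), match.end(), word))
    return hits


def mark_up(text, dictionary=CHEMICAL_TOKENS):
    """Wrap each dictionary hit in square brackets (illustrative markup only)."""
    out = []
    last = 0
    for start, end, word in find_chemical_tokens(text, dictionary):
        out.append(text[last:start])
        out.append("[" + word + "]")
        last = end
    out.append(text[last:])
    return "".join(out)
```

Calling `mark_up("Add ethyl acetate to the flask.")` brackets the two words that appear in the sample set and leaves the rest of the sentence untouched. A larger token list simply makes the lookup set bigger; the matching logic stays the same.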
December 20th, 2008 at 10:44 am
[...] Through David, I was introduced to Antony Williams from chemspider.com. I met with him one afternoon in February, and he agreed to release his database of 1.3 million identifiers for me to integrate into the next upgrade. (Update: read Tony’s writeup here) [...]
December 20th, 2008 at 10:45 am
Thanks for the link and the kind words. And for all your help.
December 20th, 2008 at 7:35 pm
Is it possible to also provide a ChemSpider identifier list for the very same chemical name list? This could be a fantastic starting point for Word, OpenOffice, Google Docs, or any other kind of mash-up with chemical content.
December 21st, 2008 at 9:30 pm
The majority of the dictionary is made up of chemical name tokens, not full names, but some WOULD have associated CSIDs. I’ll discuss with Adam.
December 22nd, 2008 at 1:36 pm
I guess in this case a substructure search identifier would be better.