Archive for February 13th, 2008

ChemSpider IS polluted with interesting identifiers associated with chemical structures, and I have blogged many times about our efforts to clean it up. I’ve also suggested that systems such as ChemSpider, and there are many, need an easy way to provide feedback, and we have done this as discussed here. All of us hosting such large data collections deal with these issues. Today I found a classic, though. A search on a CAS Number brought me to this page:

estrone1.png

The information seems fair enough but the list of names is quite amusing:

estrone2.png

These might be a new form of “International Name”. We have had disasters just like this on our own site. At the weekend a user informed me that one of our structures had over 70,000 identifiers! We looked at it. It was the ONLY structure in the database with more than 300 identifiers, and this one user found it. We’ve cleaned it out now. Hosting services like this is a lot of fun :-)
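For anyone curating a similar collection, a check of this kind is easy to sketch. The snippet below is only an illustration, not how ChemSpider does it: the function name, the (structure, identifier) pair layout, and the sample data are all made up, and the 300 cutoff is simply borrowed from the numbers above.

```python
from collections import defaultdict


def flag_overloaded_structures(pairs, threshold=300):
    """Return {structure_id: distinct identifier count} for every
    structure with more than `threshold` distinct identifiers.

    `pairs` is an iterable of (structure_id, identifier) tuples; this
    layout is an assumption made for the sketch.
    """
    identifiers = defaultdict(set)
    for structure_id, identifier in pairs:
        # A set ignores repeated identifiers for the same structure.
        identifiers[structure_id].add(identifier)
    return {
        structure_id: len(names)
        for structure_id, names in identifiers.items()
        if len(names) > threshold
    }


# Hypothetical sample data: one tidy record and one polluted record.
pairs = [("estrone", f"name-{i}") for i in range(5)]
pairs += [("polluted-record", f"junk-{i}") for i in range(301)]

print(flag_overloaded_structures(pairs))
```

Running something like this routinely, rather than waiting for a user to stumble on the outlier, is the obvious design choice; the right threshold depends on the collection.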


An article entitled The Search for Unusual Suspects discusses how scaffold hopping expands the range of core molecular shapes available for lead generation. This is of particular interest to us here at ChemSpider because it discusses LASSO. Those of you watching the blog will know that we are in the process of implementing LASSO here on ChemSpider. One of the specific advantages of the LASSO descriptor is the ability to scaffold-hop. This is defined in the article and quoted here:

The term “scaffold hopping” was coined by former Hoffmann-La Roche researcher Gisbert Schneider. “It defines the techniques used to identify isofunctional molecules — molecules that have the same bioactivity but different architecture — in other words, different chemotypes,”

Rather than try to do justice to the article, I recommend reading it here.

Also, expect to see LASSO roll out in the very near future here on ChemSpider.
