common-chemistry

The Chemical Abstracts Service have announced their first foray into providing Public Domain data. CommonChemistry.org was announced at the ACS meeting and is now online for all to visit. From the “About Common Chemistry” webpage the site is defined as:

“This database contains the CAS Registry Number®, chemical names (both formal and common), molecular formulas, and structures or sequences for ~7800 chemicals of widespread general public interest. These substances are of global commercial use or importance and have been cited 1,000 or more times in the CAS databases. Examples of substances included are aspirin, biotin, benzoyl peroxide, and boric acid. The Common Chemistry database also includes all 118 elements of the Periodic Table, although not all of the elements may meet the 1,000 references threshold.

Links to Wikipedia records (when available) have been provided by the Wikipedia Chemicals WikiProject in collaboration with Chemical Abstracts Service.

You can quickly and easily confirm a chemical name, CAS Registry Number, or structure from this database of common, everyday chemicals.

You can search for substances in Common Chemistry by either their CAS Registry Number or by their chemical name. Chemical name searches can be by exact name if you have one or by name fragment. CAS Registry Number searches are exact search only. Consult the Help page for additional search tips and details.

This database will be updated periodically. Information such as Wikipedia links may be added on a more frequent basis as it becomes available.”

A search on Xanax or Aspirin produces a hit very quickly and the record example for Xanax is given here. The result is a validated CAS Number for Xanax, a list of chemical names and the chemical structure. You can compare that to the ChemSpider record for Xanax here. I personally prefer our structure images on ChemSpider. The comparison is below…ChemSpider is on the right. We have a lot more info on the ChemSpider website and a lot of it is validated by the community.

[Image: Xanax structure as shown by Common Chemistry (left) and ChemSpider (right)]

Of note is the fact that the CAS number provided with the CAS image is not separated by dashes. I had never seen that before.
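For anyone who runs into an undashed number like that, here is a minimal Python sketch of how the dashes could be put back and the number sanity-checked. It assumes the usual CAS Registry Number layout (two to seven digits, then two digits, then a single check digit) and the standard check-digit rule; the function names are my own and have nothing to do with the CommonChemistry.org site itself.

```python
def format_cas(raw: str) -> str:
    """Insert dashes into a CAS number given as bare digits.

    Hypothetical helper: assumes the layout <2-7 digits>-<2 digits>-<1 digit>.
    """
    digits = raw.replace("-", "").strip()
    if not digits.isdigit() or not 5 <= len(digits) <= 10:
        raise ValueError(f"not a plausible CAS number: {raw!r}")
    return f"{digits[:-3]}-{digits[-3:-1]}-{digits[-1]}"


def has_valid_check_digit(cas: str) -> bool:
    """Check the final digit against the standard CAS checksum.

    Each digit before the check digit is weighted by its position
    counted from the right (1, 2, 3, ...); the weighted sum modulo 10
    should equal the check digit.
    """
    digits = cas.replace("-", "").strip()
    if not digits.isdigit() or len(digits) < 2:
        return False
    body, check = digits[:-1], int(digits[-1])
    total = sum(pos * int(d) for pos, d in enumerate(reversed(body), start=1))
    return total % 10 == check


if __name__ == "__main__":
    for raw in ("7732185", "50000"):
        dashed = format_cas(raw)
        print(dashed, has_valid_check_digit(dashed))
```

A passing checksum only shows that the number is well formed. It says nothing about whether the number belongs to the compound you think it does, which is what the curation work is for.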

We have already created the CommonChemistry.org Data Source on ChemSpider in case anyone wants to connect up records from ChemSpider with CommonChemistry as they are curating our dataset. I’ve already linked a few records to CommonChemistry.org and maybe that will happen at Wikipedia too. Some basic checking on a few records shows that we have good validation on the registry numbers on ChemSpider already. I checked 5 records and we were correct in all cases. This is unlikely to hold true across the entire database but is a good sign.

It is unclear what licensing is on the data. I doubt it’s Open but that won’t matter to the majority of users…they are looking for a piece of information or to confirm something and are unlikely to be distracted by whether the data are Open or not…free access will suffice.

I haven’t tested the search capabilities too much and will do so in the next few days. I think that CAS should consider showing the lead of the Wikipedia article as well as linking out to other information. ChemSpider is a good one since we list articles, properties, analytical data etc. for a much enhanced record…see Cholesterol as an example. When the site is out of beta we’ll offer to produce ChemSpider IDs for the entire CommonChemistry database in case they want to link.

This website is an interesting shift for CAS and demonstrates a willingness to provide access to Public Domain data. It is a good start to open up the first 7800 structures with more than 1000 citations and there is much more that they can do in a similar vein, theoretically without threatening their business model. It’s going to be interesting to watch. Certainly CAS have helped in the validation of the CAS Numbers on Wikipedia and that has been an interesting project for all, with validated CAS numbers resulting. It has been a long and exacting project with many eyes poring over the data…all for the good of the community.


6 Responses to “CAS Announce CommonChemistry.org”

  1. Steven Bachrach says:

    Interesting development by CAS – moving into the “free information world”. I find it noteworthy that CAS does not provide the InChI as a synonym! Apparently they don’t believe in what you say Tony in your subsequent post “An article about the influence and proliferation of InChIs“.

  2. will says:

    It is a very clean, quick interface and is a good (initial) response to the free chemical structure searches available on the web which could in time steal market share from CAS.

    For real usefulness they will need to provide more associated links/sources of the data.

    This presents a bit of a paradox for them as to do so would undermine SciFinder (though not to do so would be a helping hand to free access services which link out to data sources – such as ChemSpider and Google Scholar in the long run)

  3. Joerg Kurt Wegner says:

    How to turn a good inspiration into something useless? Now we got yet-another-data-silo and web-service, which is only linking to Wikipedia.

    Honestly, I see no added value at all, beside getting a CAS number, I do not care about, because it can not be used for linking to other data silos. Not only are they not mentioning InChI, but also that ChemSpider was a leading member of the Wikipedia initiative, so, kudos for the chemistry 2.0 community, surely not for CAS, now following what we already created a long-time ago (in web standard time).

    I keep searching Google or ChemSpider directly, which gives me more information, also for the compounds with less than 1000 publications ! Anyway, I appreciate the effort and wish CAS a dramatic Alexa traffic increase from others, I will not come back to that site, if it stays like that.

  4. Rich Apodaca says:

    Tony, great find. Here’s another perspective:

    http://zusammen.metamolecular.com/2009/03/31/sixty-four-free-chemistry-databases-part-6-common-chemistry-from-chemical-abstracts-service

  5. Physchim62 says:

    It’s an interesting site, and has more information than I feared. CAS has not only opened up (very slightly) its file of official CAS Registry Numbers®, but also its file of synonyms. Of course, they have kept control over which records they release, and this is how they must be planning to protect their business model. It is not really huge news to anyone that the CAS Registry Number® of formaldehyde is [50-00-0], after all!

    As for the copyright (“licensing”) on the data, CAS’s position does not seem to have changed. CAS claims copyright on CAS Registry Numbers®, but will let you use, “without a license and without paying a fee”, up to 10,000 of them “in a catalog, website, or other product for which there is no charge”. Their Information Use Policies were updated on December 12, 2008, but I find no major changes to the points of contention from this time last year.

    Let’s be very clear about this: there is no copyright on information per se. Nobody can stop me telling you that the boiling point of water is 100 ºC, or 212 ºF, or that the CAS Registry Number® is [7732-18-5] (as can now be confirmed from CAS’s own site). On the other hand, there can be a copyright on collections of information. So you are free to reproduce an entry or two from the commonchemistry.org site, but you would need to be very careful before reproducing the entire database.

    The major change, as Antony says, is the change in attitude. This time last year, CAS was issuing thinly veiled threats to withdraw access to CAS databases from Wikipedia editors who used them to verify WP’s information; the response was a quite open threat of legal action against CAS under European competition law. At least now, both sides are co-operating for the benefit of the general public, which is of course Wikipedia’s main userbase.

  6. Joerg Kurt Wegner says:

    @Physchim62 – Thanks, very informative. Do you also have a reference for the statement of the copyright on ‘collections’?

    I am still wondering if ‘a publisher collection’ of chemistry articles (or PubChem) are not violating the CAS policy, already? The future and text mining will show … if this is indeed the case.

    Again, I also appreciate the efforts of CAS, but please link to other sources. Anyway, on the long-term I am still in favor of InChIs, they are not perfect, but they are the best we have right now, for breaking down chemistry data silo barriers.
