I have been invited to write an article regarding Open Access Chemistry Databases and am in the process of gathering information. During one of my Google searches I came across a statement I had been aware of but had forgotten until recently. It relates to the use of CAS numbers on a website. Specifically, the CAS Information Use Policies of 2005 state:

“A User or Organization may include, without a license and without paying a fee, up to 10,000 CAS Registry Numbers or CASRNs in a catalog, website, or other product for which there is no charge. The following attribution should be referenced or appear with the use of each CASRN: CAS Registry Number® is a Registered Trademark of the American Chemical Society. CAS recommends the verification of the CASRNs through CAS Client Services℠.”

I interpret this as meaning that above 10,000 CAS numbers, permission must be granted to the organization gathering together a data collection. Based on my experience there are a LOT of situations where collections of more than 10,000 CAS numbers exist. We are presently deduplicating and indexing another million structures on the ChemSpider index. We regularly receive SDF files (are these electronic “catalogs”?) containing structures and CAS numbers…and when these contain over 10,000 CAS numbers, are they inadvertently going against CAS policy? Are all of those online databases with a large number of structures doing so with permission (for example ChemIDPlus, ZINC DB, eMolecules and, of course, PubChem)?
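For anyone wondering which side of the 10,000 line a given SDF file falls on, here is a minimal Python sketch of how one might count the distinct CAS numbers in it. It is an illustration only, not how ChemSpider does it. The data field name "CAS" is an assumption (suppliers label this field in many different ways), the function names are made up for this example, and the sample text is a stripped-down stand-in for real SDF records with the structure blocks omitted. The check-digit test uses the standard CAS rule: weight the digits 1, 2, 3, ... from the right, sum them, and compare the sum modulo 10 with the final digit.

```python
import re

CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")
FREE_USE_LIMIT = 10_000  # figure quoted from the 2005 CAS policy


def is_valid_casrn(casrn: str) -> bool:
    """Check the format and the trailing check digit of a CAS Registry Number."""
    match = CAS_PATTERN.match(casrn.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def distinct_casrns(sdf_text: str, tag: str = "CAS") -> set[str]:
    """Collect distinct, valid CASRNs stored under the data field `tag`.

    The tag name is an assumption; adjust it to match the file in hand.
    """
    found = set()
    lines = sdf_text.splitlines()
    for i, line in enumerate(lines):
        # An SDF data header looks like "> <TAG>"; the value is on the next line.
        if line.startswith(">") and f"<{tag}>" in line and i + 1 < len(lines):
            value = lines[i + 1].strip()
            if is_valid_casrn(value):
                found.add(value)
    return found


# Hypothetical sample: four records, one duplicate and one bad check digit.
sample = "\n".join([
    ">  <CAS>", "7732-18-5", "", "$$$$",
    ">  <CAS>", "64-17-5", "", "$$$$",
    ">  <CAS>", "7732-18-5", "", "$$$$",
    ">  <CAS>", "7732-18-4", "", "$$$$",
])

casrns = distinct_casrns(sample)
print(len(casrns), "distinct CAS numbers; over the limit:", len(casrns) > FREE_USE_LIMIT)
```

Counting distinct values rather than records matters here, since the same registry number often appears many times across deposited datasets.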

I can only imagine the consequences if these large collections/websites/databases do not have permission to expose over 10,000 CAS numbers. What a public relations nightmare that could open up! Since we deposited the PubChem dataset to ChemSpider, it naturally includes any associated registry numbers. Since eMolecules has deposited portions (not all) of the PubChem dataset, they have also deposited the registry numbers.

I may be lighting a fire here, and might get some interesting calls as a result, but I am publicly asking the question…if you are managing a website or public data collection of over 10,000 CAS numbers (read that as any site exposing PubChem data) have you asked permission to expose the data? And … did you get permission? CAS numbers are everywhere…they are “phone numbers” for chemistry. On cans and boxes in our kitchen and garage. On webpages all over the place. This is a very interesting situation for “large chemistry databases”…

4 Responses to “How Many Electronic Databases Have More Than 10000 CAS Numbers?”

  1. Rich Apodaca says:

    On the other hand, this situation has been out there for a long time. This is a very messy legal area and it could be argued that CAS’ claim to copyright on CAS numbers may not hold H2O in court (disclaimer: IANAL). At least to my knowledge, their claim has never been tested. And if others have been violating the 10,000 CAS number limit for some time, the odds look even bleaker for CAS prevailing.

    I think CAS’ argument rests on something like the cases argued by Major League Baseball about their ownership of statistics. Check out this article:


    Replacing “baseball” with “chemistry”, and “Major League” with “CAS” provides some food for thought. The problem is that some act of creativity is involved in generating baseball statistics. You need leagues, rules, games, etc.

    CAS numbers are more like telephone numbers – a shorthand string of digits with much less creativity involved than in generating baseball statistics. There’s a well-known case in which a company using another company’s telephone numbers and contact info won the right to keep using them, royalty-free:


  2. Antony Williams says:

    In a discussion about this issue today with Wikipedia Chemistry I was referred to this reference case. It’s a great reference: http://en.wikipedia.org/wiki/Feist_Publications_v._Rural_Telephone_Service

