Copyright©2008 Antony Williams
I admit to not being fully knowledgeable in the details of CAS Numbers. If anyone has a short treatise regarding their history and breadth relative to generic/specific structures and “materials” I’d welcome getting pointed to it. That said, in the community in which I participate, CAS Registry Numbers appear to cause a great deal of confusion. One thing is for sure: the authority IS the Chemical Abstracts Service. They hold the reference data collection, of course.
In the public domain there is a “mess of data”, with various parties attempting to use it to full effect. It’s a problem. In a recent letter to C&E News (May 5, 2008, Volume 86, Number 18, pp. 4-7), Ms Deanna Morrow Hall of Stone Mountain, Georgia, commented on this confusion. I can’t paste the entire letter here because of copyright issues, of course, but I will excerpt it.
“The most common problem is the confusion between the number for the generic formula of a compound (intended to be used for a chemical entity when its exact composition is unknown or variable) versus the number for a compound of specific known formula.”
She gave as an example, Propanol:
Propanol (generic formula) : 62309-51-7
“First, a vendor (either in a product specification or in a material safety data sheet) uses 62309–51–7 as the registry number for one of the specific configurations. If a buyer uses the correct specific registry number to search for suppliers of one of the specific configurations, then he will not find that vendor.
Second, a vendor uses the correct specific registry number for one of the specific configurations. If a buyer uses 62309–51–7 to search for suppliers of one of the specific configurations, then he will not find that vendor.
Third, a vendor correctly uses 62309–51–7 to describe a mixture of the two specific configurations, but the buyer thinks he’s ordering one of the pure compositions.”
Her letter concludes with:
“Given that these errors occur with greater frequency than one might anticipate and are not trivial in their consequences, it seems appropriate that ACS should initiate a study to quantify the extent of the problem and to identify solutions to it.”
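The three failure modes Ms Hall describes all come down to exact-match lookups on different keys. Here is a minimal sketch of that, not anything taken from her letter: the vendor names and catalogue are hypothetical, and the two specific Registry Numbers (71-23-8 for 1-propanol, 67-63-0 for 2-propanol) are my additions for illustration rather than values quoted above.

```python
# The generic number is the one quoted from the letter; the two specific
# numbers are added here for illustration and are not quoted in this post.
GENERIC_PROPANOL = "62309-51-7"
PROPAN_1_OL = "71-23-8"
PROPAN_2_OL = "67-63-0"

# Hypothetical catalogue: vendor name -> RN printed on its spec sheet.
catalogue = {
    "Vendor A": GENERIC_PROPANOL,  # sells pure 2-propanol, but lists the generic RN
    "Vendor B": PROPAN_2_OL,       # sells pure 2-propanol, lists the specific RN
}

# Hypothetical lookup from a specific RN to its generic parent.
GENERIC_PARENT = {
    PROPAN_1_OL: GENERIC_PROPANOL,
    PROPAN_2_OL: GENERIC_PROPANOL,
}


def find_vendors(rn, catalogue):
    """Exact-match search: only vendors listing precisely this RN are found."""
    return sorted(name for name, listed in catalogue.items() if listed == rn)


def find_vendors_widened(rn, catalogue):
    """Also return vendors listing the generic parent, for manual follow-up."""
    wanted = {rn}
    if rn in GENERIC_PARENT:
        wanted.add(GENERIC_PARENT[rn])
    return sorted(name for name, listed in catalogue.items() if listed in wanted)


print(find_vendors(PROPAN_2_OL, catalogue))          # misses Vendor A
print(find_vendors(GENERIC_PROPANOL, catalogue))     # misses Vendor B
print(find_vendors_widened(PROPAN_2_OL, catalogue))  # finds both
```

Widening the search only helps the buyer find candidates. It cannot tell whether Vendor A is shipping a pure compound or a mixture, which is the third case in the letter.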
Access to Registry Numbers and just the related structure/material would be a great service to chemists. It would likely have an enormous impact on the ACS/CAS bottom line though. This is understandable. But what about the bottom line of communication between chemists? Ms. Hall’s examples are definitely real.
In the Wikipedia curation project outlined on this blog we have run into issues with validating CAS Numbers. Fortunately CAS have offered to help. The project is now rolling again after a hiatus and we are presently preparing 500 structures to upload…hopefully more. We definitely found errors and the validation process will be possible only with their help. What do we do moving forward though?
“A good example is Wikipedia. (…..) relies on the “wisdom of crowds”, but I think it works well in chemistry. Chemspider has harnessed the wisdom of crowds but I suspect that only a very small fraction of their entries have been human-curated and I give an example below which seems to need attention.”
The reality is that about ten times the number of chemicals on Wikipedia have been human-curated; I estimate about 50,000. What does curation mean in this case? It means validating the consistency between the structure displayed and the numerous identifiers allocated to that structure. We cannot validate predicted values, of course. Still, 50,000 human-curated records is significant.
Peter went on to discuss identifiers “Identifiers. Potentially identifiers are the easiest and most powerful tool. An identifier is a unique string associated by an authority with a substance (not necessarily pure). If an authority(X) asserts that substance A(X) and substance B(X) have the same identifier then they can be said to be equivalent. There are many authorities making such assertions. Ultimately it is only the authority(X) who can make assertions about its identifiers. To be widely useful the authority should provide a lookup (resolution) service which is both human- and machine-accessible. In practice many authorities don’t do this or provide only a toll-access service. The identifiers are also often copyright and may or may not be copied. This often leads to other authorities(Y) who copy identifiers without permission and make their own assertions which may or may not be compatible with the authority(X). Frequently also the source of the identifier is not given. Thus many people who submit information to Pubchem give identifiers and these are listed as “[RN]” = registry number. For aspirin for example, there seem to be many identifiers – in the Chemspider entry all the following link through to Pubchem, e.g. 2349-94-2[RN], 26914-13-6[RN], 98201-60-6[RN]”
When Peter commented “I give an example below which seems to need attention” I think he was pointing to the fact that aspirin has many Registry Numbers: “there seem to be many identifiers – in the Chemspider entry all the following link through to Pubchem, e.g. 2349-94-2[RN], 26914-13-6[RN], 98201-60-6[RN]”. Maybe that wasn’t the issue. Either way it’s a great foundation for examining CAS Numbers.
Are three RNs on ChemSpider appropriate? Well, we already know that MULTIPLE RNs are okay based on Ms Morrow Hall’s comments. Is ChemSpider on target with these three?
Landolt-Bornstein’s Property Index is very well known. They have aspirin here. They list the following CAS Numbers: 50-78-2, 2349-94-2, 11126-35-5, 11126-37-7, 26914-13-6, 98201-60-6
An online MSDS sheet for Aspirin is here and lists the registry numbers: 50-78-2, 98201-60-6, 26914-13-6, 2349-94-2, 11126-35-5, 11126-37-7
The German Institute of Medical Documentation and Information lists Aspirin here and lists the following CAS Numbers: 50-78-2, 2349-94-2, 11126-35-5, 11126-37-7, 26914-13-6, 98201-60-6.
The RTECS database lists for Aspirin:
The Registry of Toxic Effects of Chemical Substances
Salicylic acid, acetate
CAS #: 50-78-2
ALT CAS #: 2349-94-2
ALT CAS #: 11126-35-5
ALT CAS #: 11126-37-7
ALT CAS #: 26914-13-6
ALT CAS #: 98201-60-6
For the MSDS Sheet and the German Institute the CAS Numbers are the same as Landolt-Bornstein…maybe they were sourced there?
Peter had listed only three RNs on ChemSpider: “2349-94-2[RN], 26914-13-6[RN], 98201-60-6[RN]”. Checking ChemSpider showed we actually had the following there: one validated RN, 50-78-2 (the one declared as the primary number on the other sites), and the following list (NONE of them validated):
ALL of these are valid based on the other data sources EXCEPT for 337376-15-5, a totally unrelated compound detailed here. This one has been deleted using the usual synonyms curation process and the others approved.
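It is worth noting what an automated check can and cannot catch here. The last digit of a CAS Registry Number is a check digit: weight the other digits 1, 2, 3, … from right to left, sum them, and take the result modulo 10. A minimal sketch of that check follows; the function name and the en-dash normalisation are my own choices, not part of any official tool.

```python
import re

# Two to seven digits, two digits, one check digit.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def cas_checksum_ok(rn: str) -> bool:
    """Return True if rn is well-formed and its check digit is consistent."""
    # Normalise en dashes, which often creep in when numbers are copied from text.
    match = CAS_PATTERN.fullmatch(rn.strip().replace("\u2013", "-"))
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    # Rightmost digit gets weight 1, the next weight 2, and so on.
    total = sum(w * int(d) for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


for rn in ["50-78-2", "2349-94-2", "11126-35-5", "11126-37-7",
           "26914-13-6", "98201-60-6", "337376-15-5", "50-78-3"]:
    print(rn, cas_checksum_ok(rn))
```

All six aspirin numbers pass, and so does 337376-15-5. The check digit catches typing errors such as 50-78-3, but it says nothing about whether a well-formed number has been attached to the right substance. That part still needs a curator, and ultimately CAS.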
PubChem also lists ALL six Registry Numbers as shown below. There are those who believe registry numbers are not on PubChem. Not true.
So, ChemSpider, PubChem, MSDS sheets and many others have a consistent set of six Registry Numbers for aspirin. Are they correct? Only CAS could confirm. I believe this shows that multiple CAS Numbers are appropriate. What I cannot comment on is what each one stands for. This brings us back to Ms Morrow Hall’s comments.
Moving forward, how will we stop the proliferation of errors? How can we reduce the potential cost of mistakes made as a result of CAS Number miscommunications?
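The cross-checking I did by eye above can be sketched as a simple set comparison. The source lists below are the ones quoted in this post. The candidate list is an illustrative stand-in (the six agreed numbers plus the one that was deleted), not a copy of what was actually on the ChemSpider record, and the helper name is hypothetical.

```python
ASPIRIN_RNS = {"50-78-2", "2349-94-2", "11126-35-5",
               "11126-37-7", "26914-13-6", "98201-60-6"}

# Registry Numbers per source, as listed in this post.
SOURCES = {
    "Landolt-Bornstein": set(ASPIRIN_RNS),
    "MSDS": set(ASPIRIN_RNS),
    "German Institute": set(ASPIRIN_RNS),
    "RTECS": set(ASPIRIN_RNS),
}

# Numbers that every source agrees on.
consensus = set.intersection(*SOURCES.values())


def unsupported(candidates, sources):
    """Return candidate RNs that appear in none of the reference sources."""
    return sorted(rn for rn in candidates
                  if not any(rn in listed for listed in sources.values()))


# Illustrative stand-in for a record awaiting curation.
candidates = ASPIRIN_RNS | {"337376-15-5"}

print(sorted(consensus))
print(unsupported(candidates, SOURCES))
```

Agreement between sources is weak evidence, since they may all have copied from one another. The comparison is useful for flagging outliers for a human to look at, not for declaring the rest correct.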