I’ve blogged previously about the confusion that appears to persist around CAS Numbers. Most of you are probably aware of the Wikipedia Chemistry project, ongoing at present, to validate the set of structures on Wikipedia. The project is well underway now and I can comment that there were definitely incorrect CAS Numbers on Wikipedia associated with the chemical structures, but in reality the quality was very good. At present my estimate is about 20:1, meaning roughly 20 times more CAS Numbers on Wikipedia were CORRECT than incorrect. Very impressive, I say, considering how haphazardly they are used by people. More below...

Now, we DO have registry numbers on ChemSpider. Probably more than 10000 of them too, with an acknowledgment given to the statement on large electronic collections of structures and registry numbers. The majority of these are on PubChem too and proliferate across hundreds of chemical companies. Go and search for chemicals in China and see how they are listed! We have an increasing number of companies from offshore depositing their data into ChemSpider and we see issues showing up.

As a perfect example of the confusion of registry numbers that are showing up on ChemSpider check out this query: search for the number 1429-50-1 on ChemSpider and you will get this list of hits.

Confused? Well, it’s simply people misusing registry numbers when making their association. I doubt they are getting the numbers issued by CAS. Instead they label “a component” of their material with the CAS Number, throw in a few waters of hydration here and there, a counterion or two and whoops…proliferation of CAS numbers with the wrong association. For the example above the “primary component” should be clear and consistent between the six hits.
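To make the pattern concrete, here is a minimal sketch of how one registry number attached to several different formulas could be flagged automatically. It is not how ChemSpider does it. The function name and the record layout are my own assumptions, and the records are made up for illustration (the two formulas are borrowed from the first comment below, and the second registry number is a pure placeholder).

```python
from collections import defaultdict

def find_conflicting_registry_numbers(records):
    """Return {registry_number: sorted distinct formulas} for every
    registry number that is attached to more than one formula.

    `records` is an iterable of (registry_number, formula) pairs.
    This layout is an assumption, not a real deposition format.
    """
    formulas_by_rn = defaultdict(set)
    for rn, formula in records:
        # Normalise whitespace so "C6 H20 N2" and "C6H20N2" compare equal.
        formulas_by_rn[rn.strip()].add(formula.replace(" ", ""))
    return {
        rn: sorted(formulas)
        for rn, formulas in formulas_by_rn.items()
        if len(formulas) > 1
    }

# Made-up deposits, for illustration only.
records = [
    ("1429-50-1", "C6H20N2O12P4"),
    ("1429-50-1", "C6 H20 N2 O12 P4 . Na6"),
    ("1429-50-1", "C6 H20 N2 O12 P4"),
    ("0000-00-0", "PLACEHOLDER"),  # placeholder number, one formula, no conflict
]

conflicts = find_conflicting_registry_numbers(records)
for rn, formulas in conflicts.items():
    print(rn, "->", formulas)
```

A check like this only says that a number is ambiguous. It cannot say which of the formulas is the right one.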

ChemSpider, just like PubChem, cannot be responsible for the quality of what’s deposited with us. What we can do is use processes, robots and manual curation efforts to help clean it up.
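One of the simplest checks a robot can run is on the number itself. A CAS Number ends in a check digit: take the other digits, weight them 1, 2, 3 and so on counting from the right, add them up, and the last digit of the sum must match. The sketch below is my own illustration rather than ChemSpider's code, and the function name is hypothetical.

```python
import re

# Two to seven digits, two digits, one check digit, separated by hyphens.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")

def is_valid_cas_number(text):
    """Return True if `text` has the CAS Number layout and its check digit is consistent."""
    match = CAS_PATTERN.fullmatch(text.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost digit of the body.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return total % 10 == check_digit

print(is_valid_cas_number("1429-50-1"))
print(is_valid_cas_number("1429-50-2"))
```

Passing this test only means the string is a well-formed number. It catches typos and arbitrary strings, not a well-formed number attached to the wrong structure, which is the problem in the example above.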

So, what IS the correct chemical associated with 1429-50-1? No idea!  Anybody else know?


3 Responses to “The Confusion of Registry Numbers on ChemSpider”

  1. Deb Banville says:

    CAS RNs are assigned using strict business rules; however, few of us know what these rules are and how to apply them correctly. The phosphonic acid structures you drew appear to differ in their salts in some cases. The first structure without the salts is the correct structure for the CAS RN you gave (MF C6H20N2O12P4). It would have been easier to spot salt/parent compounds if the molecular formulas had the salts separated from their parent compounds, e.g. C6 H20 N2 O12 P4 . Na6.

    Better yet, avoid the RNs (outside of CAS databases) and rely on the structures themselves or a variety of representations like InChIs or SMILES that directly describe the molecule of interest. The InChI type of representations can be searched using wiki text searching capabilities and will be as correct as the structure they were generated from!

  2. Getting a CAS Number from a PubChem CID « So much to do, so little time says:

    [...] to have multiple CAS numbers in PubChem and this problem has been discussed by Antony a number of times. Finally, given the fact that it might be possible that some arbitrary string matches the CAS [...]

  3. Getting a CAS Number from a PubChem CID at So much to do, so little time says:

    [...] to have multiple CAS numbers in PubChem and this problem has been discussed by Antony a number of times. Finally, given the fact that it might be possible that some arbitrary string matches the CAS [...]
