The Wikipedia curation project I have been working on has been on hold from my side for the past few weeks as I focused on some other important tasks, including some presentations, 5 peer-reviewed papers and articles, and a whole series of technical advances on ChemSpider. It’s been good to get a break from eyeball curation, but I am ready to start again in mid-March if no new nighttime distractions show up.

Tonight I was catching up with my Watchlist on Wikipedia for the first time in a long time and noted that a comment had been added to the Wikipedia Project: CAS Validation page. This discussion page was started as a place for other members of the WP:Chem team to discuss a second validation of my work, and especially to deal with my concerns about CAS numbers not matching the structure drawn in the Chemical Box or Drug Box. For example, the CAS number might sometimes be for the chloride salt while the structure drawn is the neutral form. So, this was our discussion place. I believe there is general agreement among all participants at WP:Chem that CAS numbers have value for the users of Wikipedia and for chemists in general, so the presence of a CAS number in the boxes makes absolute sense and, of course, an encyclopedia should carry the correct CAS number for the structure. Therefore, validation and sourcing of CAS numbers has been pursued.

A comment from Eric Shively at CAS can be found online at Wikipedia. He comments:

Chemical Abstracts Service (CAS) objects to anyone encouraging the use of SciFinder® and STN® to curate third-party databases or chemical substance collections, including the one found in Wikipedia. SciFinder and STN are provided to researchers under formal license agreements, under which the researchers agree to refrain from using these tools to build databases. We urge and expect those researchers to respect the explicit terms of the agreements they have entered into. CAS is a division of the American Chemical Society. Please contact CAS if you have questions. Eric Shively, CAS, eshively@cas.org Eshively (talk) 20:56, 5 March 2008 (UTC)

It’s an interesting stance. This comes at a time when there is more focus on facilitating information exchange. In an environment where people are using resources such as Wikipedia to source information, one would assume that the availability of CAS numbers would actually be encouraged rather than so blatantly discouraged. It’s been said before that CAS numbers are like the phone numbers of the chemistry world, so if they were sourced from a vendor’s catalog, would that be acceptable? And how would anybody know where they were sourced anyway? If they were taken from a bottle of chemicals on the shelf and added to Wikipedia, is that acceptable?

Nevertheless, as Mr Shively comments, there are legal agreements in place and they are expected to be respected. Question: does every user of SciFinder read the agreement? When a large pharma company licenses access to SciFinder for its users, does it expect people to know the legalities of usage and train its users in such detail? Maybe…

As it is, I am not a user of SciFinder… though I’d like to be. I think it’s an incredible resource. So, I don’t have to worry about the legal repercussions of using the system (yet). In the meantime I will continue my work of curating, and I guess there will now be a discussion with the WP:Chem team about what to do about CAS numbers.


17 Responses to “CAS Discourages Using SciFinder to Help Curate Wikipedia Structures and CAS Numbers”

  1. Jean-Claude Bradley says:

    The fact that large collections of CAS numbers are tolerated on public company catalogs on the internet has always puzzled me, given knee-jerk responses such as this one that they usually provide. If you get some clarification about this at some point from CAS, please post it.
    Of course, how can you prove that a melting point or CAS number was looked up on one of their databases? This is an opportunity for CAS to collaborate with the community and try to recapture a bit of goodwill, without hurting sales of their products because they offer so much more value. But now people are going to devise ways of avoiding the use of CAS numbers and actually reduce the value of their system.
    Publishers of textbooks are making the same mistakes with respect to disallowing public use of small sections of their books for open courseware applications.

  2. steve heller says:

    To paraphrase Captain Renault (Casablanca): “I’m shocked, shocked to find that people all over the world are not abiding by the CAS contract agreements and the copyright of the CAS Registry Numbers.” As a long-time (almost 50 years) ACS member, I feel strongly that CAS should enforce its legal rights and either have people remove the CAS numbers from their printed and electronic publications or initiate legal action (as they have done with Google on a different matter). Besides the legal issue there is the quality control issue, which many people have pointed out. It is not easy to be sure you have the correct CAS number, for a variety of reasons, e.g., different salts and stereochemistry. Polluting printed and electronic publications with incorrect CAS numbers is not a good thing. The issue of quality control has been a concern for many years, but it seems people just don’t want to pay for it. I was involved with a project that had a large contract with CAS, which allowed us to obtain the correct CAS numbers for thousands of chemical structures. S. R. Heller, G. W. A. Milne, and R. J. Feldmann, Quality Control of Chemical Data Bases, J. Chem. Inf. Comput. Sci., 16, 232-233 (1976) – http://pubs.acs.org/cgi-bin/archive.cgi/jcisd8/1976/16/i04/pdf/ci60008a010.pdf
    Regrettably, after a few years, the ACS would not renew the contract under the same terms and conditions, so we stopped adding CAS numbers to the databases.

  3. Egon Willighagen says:

    In 1995 I started a Dutch website on organic chemistry [1], and the CAS number was as useful then as it is now; already then we knew we were not allowed to compose a database of CAS numbers. Not sure about the legal state of that, but our university had a license; not sure if students had access, but I do not believe so. Anyway, building a substantial list of CAS numbers was not allowed. So, we looked for other means of identifying molecular structures, which led us to CML… this was around ’96-’97 or so, at least before XML was released, and we started using CML when it was still in a more obscure SGML format :) Yeah, the XML recommendation was much appreciated!

    OK, so back to your blog item. You can imagine that the comment in WP by CAS does not surprise me at all; nothing really new. If they allowed this, it would set a precedent…

    The solution is, however, fairly easy. Use InChI(Key), PubChem CID, or ChemSpider CID; the latter two are on the same level as CAS numbers. CAS registry numbers are overrated. Not sure if they still hand out CAS numbers to mixtures too… (I guess not).

    Oh, and I agree with Cpt. Renault… people should really abide by legal requirements. Period. If you don’t like them, quit the legal agreement. As simple as that.

    [1] http://www.woc.science.ru.nl/

  4. Joerg Kurt Wegner says:

    I do not get this. It is clear that the contractual obligations of STN(R) and SciFinder(R) do not allow people to publish CAS numbers, right? But if this is the only source for accessing CAS numbers, why is it then possible that CAS numbers are available at all in publications and some databases? And what if people are using those sources for CAS numbers, but not STN(R) and SciFinder(R) directly? Is this not a widespread misuse?
    “The public domain can also be defined in contrast to trademarks. Names, logos, and other identifying marks used in commerce can be restricted as proprietary trademarks for a single business to use. Trademarks can be maintained indefinitely, but they can also lapse through disuse, negligence, or widespread misuse, and enter the public domain. It is possible, however, for a lapsed trademark to become proprietary again, leaving the public domain.” [http://en.wikipedia.org/wiki/Public_domain]

    I left a message on the CAS discussion and Eshively’s talk page and hope that he can comment on this issue:
    http://en.wikipedia.org/wiki/User_talk:Eshively#Curating_CAS_numbers_.28feedback_request.29

    Besides, I blogged about it too and finished with the sentence… I would say (again, this time in nicer words): ‘get the hell organized, scientists worldwide!’ If CAS can do it, we can do it? It may take longer to get the party started, but if we do not start, it will never happen.

  5. will says:

    There is an issue around whether the terms and conditions with respect to the usage of CAS numbers are legally enforceable.

    Remember:
    They are just numbers, i.e. descriptors.

    Their usage in itself does not represent a violation of CAS trademark unless implications are made by the user that they own or are associated with the CAS trademark. CAS does not enforce its so-called ‘right’ to limit DBs to 10,000 because it has no such right (unless CAS DBs are used to obtain them in the 1st place).

  6. Egon Willighagen says:

    Will, regarding your statement “they are just numbers i.e. descriptors”…

    I’m not sure about that argument… they are certainly not descriptors in the QSAR sense. Instead, there is (some, not much) creativity in those numbers (fortunately, they did not patent the concept :) … it’s much more like a phone book, indeed. I’m no legal expert, not trained in law at all actually, but I do not see CAS numbers as natural facts…

    The fact that they have not enforced the 10k limitation, e.g. with chemical vendors, makes it all the more important for them to make clear what they mean by the restriction, e.g. via the note in Wikipedia…

    I understand that what they would like you to do (if you do not have a copy of their database) and what they allow you to do (via the license for a copy of the database) are different. But curation can only happen against the database itself, which means that you licensed it; if they say in the license that you cannot even develop an alternative numbering scheme, then you agreed to that when you bought the license. I do not see why that would not be a legal agreement, and why it could not be upheld in court.

  7. Joerg Kurt Wegner says:

    @Will: Can you please elaborate on this or give some references?
    I used in my blog post a quote from the PublicDomain@Wikipedia article
    http://miningdrugs.blogspot.com/2008/03/cas-numbers-are-not-public-domain-are.html

    “Work created before the existence of copyright and patent laws also form part of the public domain. The Bible and the inventions of Archimedes are in the public domain. However, copyright may exist in translations or new formulations of this work.” [Wikipedia]

    This means, in my opinion, that a CAS identifier is a translation of a structure, right? As far as I understand the above statement, they can then have copyright on it, no?

  8. Science in the open » What to use as a the primary key for chemicals? says:

    [...] be found on most commercially supplied substances. Yet, as described by Peter Murray-Rust and Antony Williams recently you can’t look these up without paying for them. And indeed by recording them for your own [...]

  9. will says:

    First of all, I have no legal training either so this is certainly not legal advice.

    What I do (think) I know:

    1) When using a CAS database, their terms and conditions rule (unless they contradict).

    2) CAS no.s are identifiers. Without the substance details, they are useless, and that is what makes an identifier an identifier.

    3) Identifiers (titles, citations, and … CAS no.s) can be freely redistributed as long as trademark is respected (i.e. CAS are credited or not discredited as the trademark holder)

    4) Joerg: “translation of a structure” >>
    And a structure (or substance) is a fact. The translation methodology is the copyrightable stuff not its output.

    5) Egon: “curation can only happen against the database itself” >>
    Ideally, yes. But I think chemical companies are still good sources of info. Don’t know what you think, but the Sigma-Aldrich catalogue is where most people go for chemical data, as far as I can see. CAS have to have their numbers at the chemical companies (usably accurate) or they have no value in industry.

  10. Unilever Centre for Molecular Informatics, Cambridge - petermr’s blog » Blog Archive » What to use as a the primary key for chemicals? says:

    [...] be found on most commercially supplied substances. Yet, as described by Peter Murray-Rust and Antony Williams recently you can’t look these up without paying for them. And indeed by recording them for your own [...]

  11. The Curation of Almost 5000 Structures on Wikipedia at The ChemConnector Blog - Observations and Musings for the Chemistry Community says:

    [...] recently commented on the statement made by Eric Shively of CAS about the CAS Validation Project going on at Wikipedia. The basic premise of the work is the need [...]

  12. Unilever Centre for Molecular Informatics, Cambridge - petermr’s blog » Blog Archive » Compounds, substances and identifiers says:

    [...] has been discussion recently (e.g. CAS Discourages Using SciFinder to Help Curate Wikipedia Structures and CAS Numbers and the Wikipedia Project: CAS Validation page) about the use of CAS identifiers and possible [...]

  13. Physchim62 says:

    I trust that the chemical community will treat this missive from CAS with the contempt it deserves.
    *It is clear that CAS places the maximization of its revenue above the provision of chemical information. What does CAS object to here? That researchers use its products to find chemical information, or that this information is published? In either case, its stance is both ludicrous and profoundly anti-scientific.
    *In a discussion about CAS registry numbers, it should be pointed out that these are used by many governments and international organizations (see, e.g., 29 CFR 1910.1000) and innumerable commercial firms (e.g. chemical suppliers). Indeed, they would not be interesting for WP if they were not so widely used! CAS tacitly admits that it cannot control this use through copyright law, as has been discussed at length on WP, which is why it has to resort to contract law in the form of the draconian license terms it imposes for access to its databases.
    *However, CAS is effectively a monopoly supplier of much chemical information, as can be seen from the prices it manages to charge for access to its databases. The restrictions it purports to impose on the reuse of its “product” would appear to breach anti-trust legislation on both sides of the Atlantic. Users of CAS databases in the European Union can take heart from Art. 8.1 of the Database Directive (96/9/EC):
    **“The maker of a database which is made available to the public in whatever manner may not prevent a lawful user of the database from extracting and/or re-utilising insubstantial parts of its contents, evaluated qualitatively and/or quantitatively, for any purposes whatsoever.”
    I call on CAS to make it clear that the information contained in its databases may be freely reused in accordance with the principles of chemical science and the laws of the jurisdictions in which it operates.

  14. Peter W says:

    I must have missed something. CAS Reg Numbers have value? I have done a lot of purchasing, searching, and using of chemicals. I never cared what the CAS Reg Number was. (Well, not until someone told the NJ DEP chemicals needed them.)

    I should presume that no one would care what a sample’s notebook number or company ID number was as long as they knew its structure. Would it matter to anyone what the system was that any company used to give an ID to a sample? (Sure, I understand they would, because companies use the ID without the structure to evaluate compounds outside of the company. What is the compound?)

    CAS Reg Numbers were useful for CAS to identify compounds as unique entities. When you searched in Beilstein, in German, or Berichte, or Bull. Soc. chim. France and deconstructed the chemical name into a structure, there was always some uncertainty as to whether the compound you drew was correct. CAS did provide a second opinion and by linking it with a CAS Reg Number, at least it helped to correlate with another name. However, practically speaking, it didn’t confirm the structure nor the chemistry.

    Now, it is so easy to do structure and substructure searches. Do people really search Aldrich by CAS Reg Numbers? (This is a chemistry thread, right?) If I can find the structure, do I still need the ID? I thought the structure was the universal concordance, not the database ID given to it. Wikipedia doesn’t enter compounds under their CAS Registry numbers, does it? So, if the preferred way of finding a compound in Wikipedia is by name, and it gives a verifiable structure, of what value is an abstract ID number?

    Now, if this argument were about SMILES, that would be a whole different kettle of fish.

    Indeed, it is paradoxical: if you don’t like CAS and their policies, don’t use their Registry Numbers. The more they are used as a substitute for real names and structures, the more valued they become. The less they are used and correlated, the less valuable they become. I thought it was going to be very ironic if everyone used aliases in a discussion of a universal identity of a compound, but not so.

  15. John Shockcor says:

    This is just another example of how the American Comical Society is losing its credibility. I have for years seen this trend toward bureaucracy and commercial gain in all aspects of their activities. They are supposed to be enabling the science of chemistry, not hindering it.

  16. Freie Katalogdaten und Erschließungsmittel « Jakoblog — Das Weblog von Jakob Voß says:

    [...] are only usable to a limited extent. An example of an unusable indexing system is given by Anthony Williams, with a comment from Peter: the American Chemical Society (ACS) prohibits the CAS numbers [...]

  17. Peter Delashmit says:

    I have been following this debate over open source, in general on the web and in C&EN weekly.

    I am a product development chemist. I have been a member of the ACS for 8 years. I pay my own membership fees because the company management doesn’t particularly see this as important. They put the money into sales and marketing; the lab limps by, so to speak.

    Still, most of us are in science for the love of it (I hope), and so we make do.
    For all the years I have been in the ACS, truthfully, it has been of little or no benefit to me except for the periodic local section get-togethers.

    I have to assume there are others like myself who just don’t feel like writing, and so hopefully some of those people will read this post and voice their opinion.

    The prices the ACS wants for access to the literature (even after paying membership fees) are oriented toward institutions and organizations, and are way above what most companies in industry are going to pay. So, let’s say the individual is willing to pay out of pocket. No way can they afford the ACS prices. Think of us like teachers who use their own money to buy the kids’ school supplies.

    I love the open source stuff, and I understand the ACS needs to pay its bills, but I think it is alienating a lot of members and non-members in the science community by taking a hard-line approach to sharing scientific information. It makes them appear to be more interested in money than science.
