Previously I blogged about our intention to scrape CrystalEye data and publish onto ChemSpider. The original comments regarding the data on CrystalEye were as follows:

  1. pm286 Says:
    (1) All data come from Free sources – i.e. visible without a subscription. Some journals (Acta Crystallographica and RSC for example) do not copyright the data. Others like ACS add copyright notices. It is our contention, and Elsevier has agreed for its own material, that facts are not copyrightable. We have therefore extracted and transformed facts and mounted these. Where the original material (CIF) does not carry copyright we mount it on our pages – where it does we do not, but we have the transformed data. In those cases it would be possible to recreate the original CIF data in semantic form, but not the exact typographical layout which contains meaningless whitespace. I am not aware that ACS or Elsevier have ever made statements of any kind about our Open Data efforts.

    You may scrape anything, but you must honour the source and the metadata and you should add the Open Data sticker. If you scrape the link (simplest) you may simply point to our site. If you scrape more data you should ensure that the integrity of the data is maintained and that if it is re-used the re-used data should still clearly show our metadata.

We have already done the work to scrape certain data from the site but have chosen to be extra careful about accepting that the declaration of Open Data applies to all of the data sources. My primary worry was with the data scraped from the ACS journals. With this caution in mind I sent a letter to the copyright department at ACS as outlined here. In fact I made a couple of phone calls, resent the email about two more times and finally managed to talk to a nice gentleman from the ACS copyright department and brought my concerns to light. Since then we have exchanged multiple emails and spoken again on the phone, and I have been told that a meeting of minds from both Washington and Ohio was being scheduled to discuss the situation. That’s two months after my original email.

Today I received the following email, and I am excerpting from it:

“Thank you for your inquiry about the proposed use by ChemSpider of information in the CrystalEye database that has been published within certain ACS journal publications. In light of your query, we are examining the manner in which ACS published material is represented within that database as well as the nature of your proposed use, so that we can respond in an informed manner to your request.


If you will be attending the ACS National Meeting in New Orleans, perhaps we could confer with you at that time to discuss our findings and advise you appropriately?

Communicator’s name withheld”

What I thought was a simple question, asked with the intention of keeping ChemSpider safe, turns out not to be so simple. It could take until March 2008 to get an answer! At this stage we will not be publishing any of the CrystalEye data without confirmation from each of the publishers that this is allowed. I previously asked the question “Who gets to declare data open or not?” and even received the question “Why even offer the option of closed?” The primary reason is that we have turbulent times ahead of us around such issues of “openness”, and until these are navigated I am working to keep ChemSpider “safe”. I am willing to participate in, support and contribute to the evangelism of openness but am equally concerned with keeping ChemSpider alive for the close to 3000 users per day now accessing the service.

It was an interesting day to receive this email about a potential FIVE MONTH delay to a decision about Open Data especially now that Science Commons have released a Protocol for Implementing Open Access Data just yesterday. Read the entire post for details but the intent of the memo is as follows: “This memo does not specify an Internet standard of any kind, but does specify the requirements for gaining and using the Science Commons Open Access Data Mark and metadata, by using legal tools and norms that conform to the protocol specified. This memo is available under the Creative Commons Attribution 3.0 (unported jurisdiction) license and will be submitted to the World Wide Web Consortium for consideration.”

So, while protocols are exposed to the community by Science Commons, the challenge of utilizing them now begins… I will be in communication with members of Science Commons soon to determine how ChemSpider can fit into the model…


2 Responses to “Why We Can’t Publish Scraped CrystalEye Data Yet….And Science Commons Declare a Protocol for Implementing Open Access Data”

  1. Jeffrey Halbstein-Harris says:

    I find this topic quite interesting and wonder about the intention of “data owners”. In my world of HealthCare Quality Improvement data are at times not published for the benefit of society, for example: protected individual health information. Other cases include the issues of safety e.g. not releasing information that could be misused and cause harm to a patient.

    Unfortunately, many data sources do not share their information as it is core to their business model and intellectual property. This case happens so often that critical variables necessary for the evaluation of health-care outcomes are not available. So… for the protection of individual rights to ownership or corporate profit, the creative persons in the private sector are not able to create and test new products for the evaluation of cause and effect.

    What a mess our egos create!

  2. Gary Martin says:

    It would appear that Jeffrey has hit the proverbial nail on the head with his comment,

    “…many data sources do not share their information as it is core to their business model and intellectual property.”

    Would the ACS rather you were subscribing to some service that they provide to access information provided to them by authors that they have copyrighted?
