The past couple of days have seen an interesting exchange on the SimBioSys blog.

Zsolt Zsoldos is someone I respect, not only for his passion for his science but also for his desire to educate others about the challenges of developing software. I believe his blog post entitled “Crystal Structure Errors in CSD too” was an honest attempt to tell people to be “careful” when using data from databases. I don’t care whether the database is ChemSpider, PubChem, the CAS Registry or any of the other databases available via free access or commercial transaction: they ALL have errors. It is inevitable. Zsolt’s attempt to highlight that such errors exist was made, I believe, with pedagogical intent.

“J” then came back with some appropriate comments in response to Zsolt’s post, and the post and comments should be read in sequence. It appears there was some type of backroom conversation, likely with the CCDC, about how these comments were not prominent enough. Zsolt then posted this:

“Update: Since the posting of this blog entry, we have received 2 public comments — displayed in a standard way as all comments by the WordPress blog software, and some private emails originating from CCDC. One of the complaints from CCDC was that the second comment — which explains the problems and directs the blame on my naivity for my wrong expectations about the data — was not displayed as prominently as the original article.”

He then posted the comment into the original article. Huh? I am not sure why Zsolt should have felt obliged to do this for anyone: how comments are displayed is a WordPress matter, not something he needed to fix by inserting the text into the article. Zsolt then went on to comment about the licence agreement and permission to use the CSD. What is more interesting to me is his view here:

“On a personal opinion: such restrictions on the use of scientific facts do not seem to make much sense to me. As the IUCr position paper explains: There is a long-standing acceptance within crystallography of the principle that such primary data sets should be freely available for sharing and re-use (with appropriate credit) within the structural science community. Also the FAQ on the CrystalEye site explains: “As this supplementary data is a set of facts and is not part of the article full-text it does not fall under the copyright, and it should therefore be free to both view and download“. Nevertheless, CCDC has the legal right to stop us from using the data, since we signed a licensing agreement containing such conditions. That was a mistake on our part, one that we have to live with now. Let this case be a warning for others who have not yet made such mistake to sign the draconian agreement.”

Those of you who have been watching the discussion between myself and ACS over the past few months will know I have been trying to get confirmation that “supplementary data” are Open Data and that we could scrape the CIFs if we chose to. That conversation has now run for MANY months. The Unilever School at Cambridge, via Nick Day’s work, has generated CrystalEye and, after many conversations, we were provided with the data source and have it on ChemSpider now. We are awaiting constructive feedback from Nick and Peter Murray-Rust regarding our implementation of their data on our site. This is especially important when there are licensing issues such as those that appear to have been enforced on SimBioSys, evidenced by this Public Apology to CCDC. Read the post for details. It is Zsolt’s concluding statement that feeds directly into the value of Open Data in science and the value of CrystalEye to the community.

He comments: “One lesson I learned from this exchange is the importance of Open Data for scientific advancement (some scientists believe that research data must be free), e.g. such that is available from CrystalEye. When even non-profit organizations (registered as a charity) use draconian license agreements protecting data created and published by others, then fully commercial entities (like pharmaceutical companies) must be guarding their own data even stronger. It makes it difficult to make scientific progress if a single blog mention of an error in a data entry invites the wrath of the company who sells services on the data.”

As efforts like CrystalEye prevail, as the copyrightability of supplementary data and the position of publishers regarding it are resolved, and as groups such as ChemSpider gather Open Data and develop algorithms from these data, tensions like the one we see here are likely to increase.

One Response to “When a Scientific Blog Posting, Data Licensing and Open Data Access Come Together”

  1. Zsolt Zsoldos says:

    Thank you for the supporting comments. It is refreshing to see that ChemSpider also includes freely available crystallographic data. With open competition like CrystalEye and ChemSpider, CSD no longer has their former monopoly on the scientific data, which is a very good development on its own — competition is always better than monopoly. The community based curation process of ChemSpider will improve the quality of the data available here, so CCDC would be better off focusing on their value added services to keep and extend their user base instead of acting like Big Brother to enforce their Orwellian licensing policy.

