Archive for the Open Access Publishing Category

Last week I had a pleasant chat with a reporter from Nature magazine, a Mr Geoff Brumfiel. Geoff was interested in ChemSpider…what it was, how it ran, who used it, who supported it, who liked it, who curated it, who didn’t like it and so on.

The results of that discussion, and others he spoke to about ChemSpider, are here in his article.

Chemists spin a web of data p139
Chemspider website provides free information on millions of molecules.
Geoff Brumfiel
doi:10.1038/453139a

It is a rule at Nature, at least for this type of article, that I could not see the article before it went to press and therefore I didn’t get the chance to proofread and comment. Geoff has accurately captured the spirit of our discussions but a few detailed clarifications are needed too. I have pasted in black the article content and in italics the clarification.

providing the community with an open-access source of chemical information

I giggled and commented: please don’t say it’s Open Access. Say it’s Free Access. Say there are Open Data. And now we have Creative Commons licenses. But don’t say it’s Open Access, not strong, not weak, not gold, not green. Just Free Access. No price barriers to usage.

Chemist Antony Williams is hoping to change this in a move likely to ruffle the feathers of the American Chemical Society.

I commented that we are not purposely in competition with anyone. It’s not what drives us to do this. Whether others see us as competitive is for them to decide, not us. We don’t intentionally try to ruffle feathers. It doesn’t mean that what we are doing won’t ruffle feathers of course. Whether it’s ACS or others. It’s not the goal…it might be an outcome.

The modest project has made chemists interested in open access take notice — last week, the number of daily users of the site surpassed 5,000.

We have crossed 5500 users for the past two nights. The trend is positive.

“Other potential sources of information, such as Wikipedia, lack the algorithms needed to search chemicals according to their structure.”

Structure searching is “feasible” of course with InChI Strings. But substructure searching isn’t, and Wikipedia is treated as a text-based search by almost all of its users.

“The site is maintained with modest profits from advertising and the work of about 30 active volunteers who double-check the data pulled in from outside.”

The original investment in hardware and software costs has finally been recouped. Modest profits? No one gets paid for the work we do. There is a phenomenal sweat equity investment in the platform numbering many thousands of hours to get here. We are indebted to the many software collaborators, providers of tools and the people curating and depositing to the system. There have BEEN about 30 active volunteers. Right now I would say the number of active depositors and curators is around 10. But it is growing. I hadn’t checked the number of REGISTERED users for a long time. We have over 1150 registered users…those who CAN log in and curate data, deposit data, see new features etc. People do NOT have to register to use the site…but >1150 did. Wow. I didn’t know it was that many until I just checked (BIG SMILE)

“There’s an awful lot of chemical information, but there’s an awful lot of rubbish as well,” says Barrie Walker, a retired industrial chemist in Yorkshire, UK, who helps maintain the site.

Don’t know whether Barrie said this or not. He IS an honest guy and he is our QUALITY GURU and we are proud that he is willing to give us his fine eyes. There IS garbage on the site still. But, after a year online and active curation, it has been much reduced. About 200 edits a day are made to the site: names changed/deleted/added, spectra/structures/URLs/Publications added etc. It’s quite the pace. We have cleaned up hundreds of thousands of incorrect associations from the external data sources. It’s been and will remain an enormous task with an enormous payback for the community.

Williams adds that the site still has problems with certain searches. For example, it struggles to distinguish between isomers: molecules with the same chemical formula arranged in different structures.

We can distinguish isomers no problem. The PROBLEM is that there is a mixture of isomeric species submitted from multiple data sources and data are mixed and intermingled in a way that the user cannot get to the correct structure. Search taxol or Ginkgolide on the ChemSpider blog and read the multiple blog posts about this. We can of course search all isomers for a particular chemical formula…
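To make the isomer point concrete, here is a minimal Python sketch. It is not how ChemSpider itself works; it only illustrates the idea of finding all structures that share one chemical formula while still keeping them apart. The sample records and the helper names `formula_of` and `group_isomers` are hypothetical, and the sketch assumes every InChI string carries a formula layer right after the version prefix. Ethanol and dimethyl ether are used because they share the formula C2H6O.

```python
from collections import defaultdict

# Hypothetical sample records (name, InChI), for illustration only.
RECORDS = [
    ("ethanol", "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"),
    ("dimethyl ether", "InChI=1S/C2H6O/c1-3-2/h1-2H3"),
    ("water", "InChI=1S/H2O/h1H2"),
]


def formula_of(inchi):
    """Return the text between the first and second '/' of an InChI.

    Assumption: the string has a formula layer in that position.
    """
    return inchi.split("/")[1]


def group_isomers(records):
    """Map each formula to the set of distinct InChI strings that share it."""
    groups = defaultdict(set)
    for _name, inchi in records:
        groups[formula_of(inchi)].add(inchi)
    return dict(groups)


groups = group_isomers(RECORDS)
for formula, inchis in sorted(groups.items()):
    # One formula can map to several distinct structures (isomers).
    print(formula, len(inchis))
```

Grouping by formula finds the candidate isomers; comparing the full InChI strings is what tells them apart. It does nothing about the harder problem described above, where names and data from different sources end up attached to the wrong isomer.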

But Williams nevertheless believes that the service may be able to compete with for-profit services. “What I’m doing is highly disruptive,” he says. “I think it can be done and it needs to be done.”

I think what WE are doing…it’s not me…it’s we…is disruptive. In a good way. Many chemists will benefit. Will it have an impact on for-profit services? Yes, maybe. As an outcome but not as the target. Our team of people, both internal to ChemSpider’s development and Advisory Group, and the people we don’t even know who are cleaning and depositing into the system for their colleagues in the community, are creating a powerful resource for Chemists. The FOCUS of this effort is to Build a Structure Centric Community for Chemists. We will change that soon…the focus will broaden from Structure-Centric to cover Chemistry in general and to Build a Community for Chemists.

We are well on our way, and thanks to Nature, and Geoff in particular, for exposing it. My comments above are not meant to detract from Geoff’s reporting abilities but it was a long discussion and some clarification statements are of value, I believe.

Over the past year ChemSpider has been challenged over the nature of our offering in terms of Open Data etc. A small number of people focused a lot of time talking about this while we remained focused on improving the website and having it available for people to use as a Free Access website. I spoke to Peter Suber about Open Access and then John Wilbanks about Creative Commons.

Since ChemSpider is the aggregate of a number of people’s work (including provision of software by collaborators) I had to get into conversation to see what licenses would be acceptable to those groups.

With the redesign of the website we have structured ourselves in a way to add licenses as we see appropriate now. So, as of today we have added the Creative Commons Attribution Share Alike 3.0 United States License and the appropriate logo is on all sections of a Record View except for the predicted properties. Once we get approval from our collaborators for this same license (and discussions are underway) then the whole record view will be Licensed.

At that point, you are free:

  • to Remix — to make derivative works

Under the following conditions:

  • Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
  • Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.

Those of you frequenting the blog will know that we have a dedicated subset on ChemSpider for Molbank and that I have found the MDPI management and editorial team a pleasure to work with. I discussed my wish to stay in relationship with them in a recent blogpost and, as stated in that posting, followed up with them to make them aware of an error in their article and the ongoing discussions in the blogosphere about their “openness”. In case the readers of the blog aren’t set up to catch the comments on the blogposts, I am pointing to a comment made today by a member of MDPI.

“We are aware that our current MDPI copyright statement is not in line with the BBB definitions on open access. We are currently smoothly moving to a CC By Attribution License v3.0. Marine Drugs (http://www.mdpi.org/marinedrugs/) has already been published under that license since January 2008. IJMS (http://www.mdpi.org/ijms/) and other MDPI journals will start publishing under this license in the May respectively June 2008 issues. All previous content published by MDPI will be released under the CC By license within a couple of months on our new publication platform (now under testing). So this discussion about MDPI and open access will soon be part of history.”

My experience of working in the domain of creating a community for chemists is quite a simple one. If you want to know what a group is up to just ask them. Seems that MDPI has a clear path forward.

Recently I posted about our intention to post the full Molbank articles on ChemSpider. PMR commented on my potential over-extension of their Open Access nature:

“PMR: I also support publishers who make their material available. I don’t want to appear churlish, but Molbank use what is effectively a NC (non-commercial) license and this is what concerned me (and others) when I posted about 1 year ago. I don’t think it has changed. So sorry, Antony, it’s not “as Open Access as they can be” especially if one has to ask permission to mount the material.”

He may be right. What I do know is that I prefer to get into relationship with the groups/people I work with in the community. Simply grabbing their content/data without some connection doesn’t feel comfortable. AND, I realize in these days of search engines and scraping that’s quite acceptable.

When I approached MDPI, the publishers of Molbank, they were gracious in their willingness to have ChemSpider support, integrate and utilize their content. This is contrary to some of my experiences with some other advocates of Open Data and Open Access where trying to get their “Open Data” is like pulling teeth. MDPI appear to be the opposite, in my experience.

I commented on Peter’s blog tonight:

“Regarding your comment “especially if one has to ask permission to mount the material.” I think that’s a comment on the fact that I asked permission? I asked permission for the reason that I am focused on building a community for chemists and this includes me staying in relationship with publishers. I think you know this about me from my previous comments about CrystalEye

“http://www.chemspider.com/blog/intention-to-scrape-crystaleye-content-and-staying-in-relationship-with-publishers.html”

I judge it’s a better way to Build the Structure Centric Community for Chemists on ChemSpider. So, while I didn’t have to ask for permission, I did. The result was an excellent exchange, newfound relationships and an opportunity to build an enhanced relationship WITH support and permission.

Many bloggers, it appears, assume that “concerned parties” read their blogs. For example, when you posted this: http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1048 did you make the editors at Molbank aware of the error or did you just scrape their content and blog? I have adopted a new approach of late – when I see issues with people’s data, websites etc I inform them directly to help them clean up errors. I’ve done this for Drugbank, PubChem, a number of blogsites, and so on.

In case you didn’t inform them I will send them your blog link tonight…also to the original author since I’m sure they will appreciate it too. This, I believe, is being a member of the community, and since the authors and the publishers are taking actions to contribute to the Open Access community it’s part of my personal charge to help.”

I have sent an email to the original author and to the MDPI editors with the hope they might clean the article or post an Erratum. This is what I feel is appropriate as an active member of the community. If you see errors on ChemSpider please do let us know directly. We have an “Add: Feedback” option on every record page and do pay attention to your input.

Some of you may be aware of the Molbank Open Access Journal. I recently blogged about our dedicated website for this Open Access Journal described here. Murray-Rust has discussed MDPI journals previously and their nature of Open Access. I am happy to validate that they are as Open Access as they can be. They have given us the right to mirror their articles on our site and in the next few weeks we will do exactly that, host Molbank articles connected directly to the chemical structures. Watch this space for our expanding integrations with Open Access publishers.